fairlearn.preprocessing package
Preprocessing tools to help deal with sensitive attributes.

class fairlearn.preprocessing.CorrelationRemover(*, sensitive_feature_ids, alpha=1)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
A component that filters out sensitive correlations in a dataset.
CorrelationRemover applies a linear transformation to the non-sensitive feature columns in order to remove their correlation with the sensitive feature columns while retaining as much information as possible (as measured by the least-squares error).
 Parameters
Notes
This method changes the original dataset by removing all correlation with the sensitive values. To describe that mathematically, assume that the original dataset \(X\) contains a set of sensitive attributes \(S\) and a set of non-sensitive attributes \(Z\). The method then solves the following problem.
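\[\begin{split}\min _{\mathbf{z}_{1}, \ldots, \mathbf{z}_{n}} \sum_{i=1}^{n}\left\|\mathbf{z}_{i}-\mathbf{x}_{i}\right\|^{2} \\ \text{subject to} \\ \frac{1}{n} \sum_{i=1}^{n} \mathbf{z}_{i}\left(\mathbf{s}_{i}-\overline{\mathbf{s}}\right)^{T}=\mathbf{0}\end{split}\]
Here \(\mathbf{x}_{i}\) denotes the original non-sensitive features of row \(i\), \(\mathbf{z}_{i}\) their transformed values, \(\mathbf{s}_{i}\) the sensitive features of that row, and \(\overline{\mathbf{s}}\) the mean of the sensitive features over all \(n\) rows. The solution to this problem is found by centering the sensitive features, fitting a linear regression model that predicts the non-sensitive features from them, and reporting the residual.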
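The recipe above (center, regress, keep the residual) can be sketched directly in NumPy. This is an illustrative sketch on synthetic data, not fairlearn's own code; the variable names `S`, `X`, `beta` and `X_filtered` and the data-generating process are assumptions made for the example.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not taken from fairlearn).
rng = np.random.default_rng(0)
n = 200
S = rng.normal(size=(n, 1))                       # one sensitive column
X = np.hstack([
    2.0 * S + rng.normal(size=(n, 1)),            # correlated with S
    rng.normal(size=(n, 1)),                      # roughly uncorrelated with S
])

# Step 1: center the sensitive features.
S_centered = S - S.mean(axis=0)

# Step 2: least-squares fit of each non-sensitive column on the centered S.
beta, *_ = np.linalg.lstsq(S_centered, X, rcond=None)

# Step 3: keep the residual; it plays the role of the z_i in the problem.
X_filtered = X - S_centered @ beta

# Empirical version of the constraint: (1/n) * sum_i z_i (s_i - s_bar)^T.
cross_cov = X_filtered.T @ S_centered / n
print(np.abs(cross_cov).max())  # zero up to floating-point rounding
```

Because the residual of a least-squares fit is orthogonal to the regressors, the cross-covariance with the centered sensitive column vanishes, which is exactly the constraint in the optimization problem.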
The columns in \(S\) are dropped from the output, and the hyperparameter \(\alpha\) lets you control how much filtering is applied.
\[X_{\text{tfm}} = \alpha X_{\text{filtered}} + (1-\alpha) X_{\text{orig}}\]
Note that a lack of correlation does not imply statistical independence: nonlinear dependence on the sensitive features can remain. Therefore, we expect this to be most appropriate as a preprocessing step for (generalized) linear models.
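Both the \(\alpha\) blend and the caveat about dependence can be illustrated with a small NumPy experiment. The helper `filter_and_blend` below is hypothetical (it is not part of fairlearn's API), and the synthetic columns are assumptions chosen so that one depends on the sensitive feature linearly and the other only through its square.

```python
import numpy as np

def filter_and_blend(X, S, alpha=1.0):
    """Hypothetical helper (not fairlearn's API): filter X against S, then blend."""
    S_centered = S - S.mean(axis=0)
    beta, *_ = np.linalg.lstsq(S_centered, X, rcond=None)
    X_filtered = X - S_centered @ beta
    return alpha * X_filtered + (1 - alpha) * X

rng = np.random.default_rng(1)
n = 500
s = rng.normal(size=(n, 1))
X = np.hstack([
    3.0 * s + rng.normal(size=(n, 1)),         # linear in s
    s**2 + 0.1 * rng.normal(size=(n, 1)),      # depends on s only through s**2
])

# Correlation of the first column with s shrinks as alpha goes from 0 to 1.
corr_by_alpha = {}
for alpha in (0.0, 0.5, 1.0):
    X_tfm = filter_and_blend(X, s, alpha)
    corr_by_alpha[alpha] = np.corrcoef(X_tfm[:, 0], s[:, 0])[0, 1]
    print(f"alpha={alpha:.1f}  corr(col0, s)={corr_by_alpha[alpha]:+.3f}")

# With alpha=1 the second column is uncorrelated with s, yet still tracks s**2.
X_full = filter_and_blend(X, s, 1.0)
corr_with_s = np.corrcoef(X_full[:, 1], s[:, 0])[0, 1]
corr_with_s_squared = np.corrcoef(X_full[:, 1], s[:, 0] ** 2)[0, 1]
print(f"corr(col1, s)={corr_with_s:+.3f}  corr(col1, s**2)={corr_with_s_squared:+.3f}")
```

With \(\alpha = 0\) the data is returned unchanged, and with \(\alpha = 1\) the linear correlation is fully removed. The second column shows why a nonlinear downstream model could still recover information about the sensitive feature after filtering.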