fairlearn.preprocessing package

Preprocessing tools to help deal with sensitive attributes.

class fairlearn.preprocessing.CorrelationRemover(*, sensitive_feature_ids, alpha=1)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

A component that filters out sensitive correlations in a dataset.

CorrelationRemover applies a linear transformation to the non-sensitive feature columns in order to remove their correlation with the sensitive feature columns while retaining as much information as possible (as measured by the least-squares error).

Read more in the User Guide.

Parameters
  • sensitive_feature_ids (list) – list of columns to filter out. This can be a sequence of either int (when X is a numpy array) or str (when X is a pandas DataFrame).

  • alpha (float) – parameter to control how much to filter. For alpha=1.0 we filter out all correlation with the sensitive features, while for alpha=0.0 we don’t apply any filtering.

Notes

This method will change the original dataset by removing all correlation with sensitive values. To describe this mathematically, assume that the original dataset \(X\) contains a set of sensitive attributes \(S\) and a set of non-sensitive attributes \(Z\). The method then solves the following problem:

\[\begin{split}\min _{\mathbf{z}_{1}, \ldots, \mathbf{z}_{n}} \sum_{i=1}^{n}\left\|\mathbf{z}_{i} -\mathbf{x}_{i}\right\|^{2} \\ \text{subject to} \\ \frac{1}{n} \sum_{i=1}^{n} \mathbf{z}_{i}\left(\mathbf{s}_{i}-\overline{\mathbf{s}} \right)^{T}=\mathbf{0}\end{split}\]

The solution to this problem is found by centering the sensitive features, fitting a linear regression model that predicts the non-sensitive features from them, and reporting the residual.

The columns in \(S\) will be dropped, but the hyperparameter \(\alpha\) lets you tweak the amount of filtering that gets applied:

\[X_{\text{tfm}} = \alpha X_{\text{filtered}} + (1-\alpha) X_{\text{orig}}\]

Note that a lack of correlation does not imply statistical independence: nonlinear relationships with the sensitive features may remain. Therefore, we expect this to be most appropriate as a preprocessing step for (generalized) linear models.
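The procedure described in these notes (center the sensitive columns, regress, keep the residual, then blend with alpha) can be sketched with plain NumPy. This is a minimal illustration of the math, not fairlearn's own implementation. The helper name remove_correlation and the synthetic data are hypothetical.

```python
import numpy as np


def remove_correlation(X, sensitive_idx, alpha=1.0):
    """Hypothetical helper that illustrates the math in the notes."""
    X = np.asarray(X, dtype=float)
    sensitive_idx = list(sensitive_idx)
    other_idx = [j for j in range(X.shape[1]) if j not in sensitive_idx]

    S = X[:, sensitive_idx]          # sensitive columns
    X_orig = X[:, other_idx]         # non-sensitive columns
    S_centered = S - S.mean(axis=0)  # s_i - s_bar

    # Least-squares regression of the non-sensitive columns
    # on the centered sensitive columns.
    beta, *_ = np.linalg.lstsq(S_centered, X_orig, rcond=None)

    # The residual is orthogonal to the centered sensitive columns.
    X_filtered = X_orig - S_centered @ beta

    # Blend filtered and original features according to alpha.
    return alpha * X_filtered + (1 - alpha) * X_orig


# Synthetic data: the first non-sensitive column depends on s, the second does not.
rng = np.random.default_rng(0)
s = rng.normal(size=(200, 1))
noise = rng.normal(size=(200, 2))
X = np.hstack([s, 2.0 * s + noise[:, :1], noise[:, 1:]])

X_full = remove_correlation(X, sensitive_idx=[0], alpha=1.0)
X_none = remove_correlation(X, sensitive_idx=[0], alpha=0.0)

# Empirical cross-covariance with the centered sensitive column (the constraint above).
cross_cov = X_full.T @ (s - s.mean(axis=0)) / len(s)
```

With alpha=1.0 the cross-covariance is zero up to floating-point error, which is exactly the constraint in the optimization problem. With alpha=0.0 the non-sensitive columns are returned unchanged.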

Methods

  • fit(X[, y]) – Learn the projection required to make the dataset uncorrelated with sensitive columns.

  • fit_transform(X[, y]) – Fit to data, then transform it.

  • get_params([deep]) – Get parameters for this estimator.

  • set_params(**params) – Set the parameters of this estimator.

  • transform(X) – Transform X by applying the correlation remover.

fit(X, y=None)

Learn the projection required to make the dataset uncorrelated with sensitive columns.

transform(X)

Transform X by applying the correlation remover.