Preprocessing tools to help deal with sensitive attributes.
fairlearn.preprocessing.CorrelationRemover
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
A component that filters out sensitive correlations in a dataset.
CorrelationRemover applies a linear transformation to the non-sensitive feature columns in order to remove their correlation with the sensitive feature columns while retaining as much information as possible (as measured by the least-squares error).
sensitive_feature_ids (list) – columns to filter out. This can be a sequence of int (for NumPy arrays) or str (for pandas DataFrames).
alpha (float) – controls how much filtering is applied. With alpha=1.0 all correlation with the sensitive features is filtered out, while with alpha=0.0 no filtering is applied.
Notes
This method transforms the dataset by removing correlation with the sensitive values. To describe that mathematically, assume that the original dataset \(X\) contains a set of sensitive attributes \(S\) and a set of non-sensitive attributes \(Z\). The method finds transformed values that stay as close as possible to \(Z\) in the least-squares sense, subject to the constraint that they are uncorrelated with the centered columns of \(S\).
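In symbols, the problem can be written as below. This is a reconstruction from the description above, and the row-wise notation is an assumption introduced here: \(\mathbf{z}_i\) and \(\mathbf{s}_i\) denote the \(i\)-th rows of \(Z\) and \(S\), \(\overline{\mathbf{s}}\) is the mean of the rows of \(S\), and \(\tilde{\mathbf{z}}_i\) is the transformed counterpart of \(\mathbf{z}_i\).

\[
\min_{\tilde{\mathbf{z}}_1, \ldots, \tilde{\mathbf{z}}_n} \sum_{i=1}^{n} \left\| \tilde{\mathbf{z}}_i - \mathbf{z}_i \right\|^2
\quad \text{subject to} \quad
\frac{1}{n} \sum_{i=1}^{n} \tilde{\mathbf{z}}_i \left( \mathbf{s}_i - \overline{\mathbf{s}} \right)^{T} = \mathbf{0}
\]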
The solution to this problem is found by centering the sensitive features, fitting a linear regression model that predicts the non-sensitive features from the centered sensitive features, and reporting the residual.
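The three steps above can be sketched in plain NumPy. This is a minimal illustration and not the library's implementation: the helper name `remove_linear_correlation`, the synthetic data, and the choice of a no-intercept least-squares fit are assumptions made for the example.

```python
import numpy as np


def remove_linear_correlation(X, sensitive_idx):
    """Hypothetical helper: residualize non-sensitive columns against sensitive ones."""
    X = np.asarray(X, dtype=float)
    sensitive = set(sensitive_idx)
    keep = [j for j in range(X.shape[1]) if j not in sensitive]

    S = X[:, list(sensitive_idx)]   # sensitive columns
    Z = X[:, keep]                  # non-sensitive columns

    # Step 1: center the sensitive features.
    S_centered = S - S.mean(axis=0)

    # Step 2: least-squares fit predicting Z from the centered S (no intercept).
    beta, *_ = np.linalg.lstsq(S_centered, Z, rcond=None)

    # Step 3: report the residual; the sensitive columns are not returned.
    return Z - S_centered @ beta


# Synthetic data (assumption): column 0 is sensitive, column 1 depends on it.
rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=200).astype(float)
z1 = 2.0 * s + rng.normal(size=200)
z2 = rng.normal(size=200)
X = np.column_stack([s, z1, z2])

Z_filtered = remove_linear_correlation(X, [0])

# Sample cross-covariance between each filtered column and the sensitive column.
cross_cov = ((s - s.mean())[:, None] * Z_filtered).mean(axis=0)
print(cross_cov)
```

Because the sensitive columns are centered before the fit, the residual keeps the original column means of the non-sensitive features; only the component that is linearly explained by the sensitive features is removed.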
The columns in \(S\) are dropped from the output. The hyperparameter \(\alpha\) lets you control how much filtering is applied to the remaining columns.
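One way to picture the effect of \(\alpha\) is as a linear blend between the unfiltered and the fully filtered features. The sketch below assumes exactly that blend (it is not stated in the text above), and the function name `blend` and the synthetic data are hypothetical.

```python
import numpy as np

# Synthetic data (assumption): z is linearly related to the sensitive feature s.
rng = np.random.default_rng(1)
s = rng.normal(size=500)
z = 1.5 * s + rng.normal(size=500)

# Fully filtered version of z: residual of a regression on the centered s.
s_centered = s - s.mean()
coef = (s_centered @ z) / (s_centered @ s_centered)
z_filtered = z - coef * s_centered


def blend(z_original, z_filtered, alpha):
    """Assumed blend: alpha=0.0 keeps the original, alpha=1.0 is fully filtered."""
    return alpha * z_filtered + (1.0 - alpha) * z_original


# Correlation with s shrinks as alpha moves from 0.0 to 1.0.
corr_by_alpha = {
    alpha: np.corrcoef(blend(z, z_filtered, alpha), s)[0, 1]
    for alpha in (0.0, 0.5, 1.0)
}
print(corr_by_alpha)
```

Under this assumption, intermediate values of alpha trade off fidelity to the original features against the amount of remaining correlation.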
Note that a lack of correlation does not imply statistical independence: the transformed features may still depend on the sensitive features in nonlinear ways. Therefore, we expect this to be most appropriate as a preprocessing step for (generalized) linear models.
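A small constructed example makes the distinction between correlation and dependence concrete. The data below is purely illustrative: `z` is a deterministic function of `s`, yet the two are linearly uncorrelated.

```python
import numpy as np

# s is symmetric around zero, and z is completely determined by s.
s = np.linspace(-1.0, 1.0, 201)
z = s ** 2

# The linear (Pearson) correlation between s and z is zero up to rounding ...
linear_corr = np.corrcoef(s, z)[0, 1]

# ... but a nonlinear function of s recovers z perfectly.
nonlinear_corr = np.corrcoef(s ** 2, z)[0, 1]

print(linear_corr, nonlinear_corr)
```

A model that can represent nonlinear relationships could therefore still pick up information about `s` from `z`, which is why the warning above singles out (generalized) linear models.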
fit
Learn the projection required to make the dataset uncorrelated with sensitive columns.
transform
Transform X by applying the correlation remover.