Preprocessing tools to help deal with sensitive attributes.
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
A component that filters out sensitive correlations in a dataset.
CorrelationRemover applies a linear transformation to the non-sensitive feature columns
in order to remove their correlation with the sensitive feature columns while retaining
as much information as possible (as measured by the least-squares error).
Parameters:
sensitive_feature_ids (list) – list of the sensitive columns, whose correlation is filtered out
of the remaining columns. This can be a sequence of either ints, in the case of NumPy arrays,
or strings, in the case of pandas DataFrames.
alpha (float) – parameter that controls how much filtering is applied: with alpha=1.0 all
correlation with the sensitive columns is filtered out, while with alpha=0.0 no filtering is applied.
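The two ways of identifying sensitive columns can be sketched as follows. `split_sensitive` is a hypothetical helper, not part of fairlearn. It assumes that objects with a `columns` attribute are pandas DataFrames addressed by column name, and that anything else is array-like and addressed by integer position.

```python
import numpy as np


def split_sensitive(X, sensitive_feature_ids):
    """Split X into (non-sensitive, sensitive) float column blocks.

    Hypothetical helper: ints are column positions (NumPy),
    strings are column names (pandas DataFrame).
    """
    if hasattr(X, "columns"):
        # Assumed pandas DataFrame: select by name, keeping the original column order
        sensitive = [c for c in X.columns if c in sensitive_feature_ids]
        other = [c for c in X.columns if c not in sensitive_feature_ids]
        return X[other].to_numpy(dtype=float), X[sensitive].to_numpy(dtype=float)
    # NumPy array: treat the ids as integer column positions
    X = np.asarray(X, dtype=float)
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[list(sensitive_feature_ids)] = True
    return X[:, ~mask], X[:, mask]


X = np.array([[1.0, 0.0, 5.0], [2.0, 1.0, 6.0]])
Z, S = split_sensitive(X, [1])  # column 1 is treated as sensitive
```

With a DataFrame, the same call would take column names instead, for example `split_sensitive(df, ["sex"])`, where `df` and `"sex"` are illustrative.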
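This method changes the original dataset by removing all correlation with the sensitive
values. To describe that mathematically, let’s assume that in the original dataset \(X\)
we’ve got a set of sensitive attributes \(S\) and a set of non-sensitive attributes
\(Z\). Writing \(\mathbf{x}_i\) for the non-sensitive attributes of the \(i\)-th of the
\(n\) rows, \(\mathbf{s}_i\) for its sensitive attributes and \(\overline{\mathbf{s}}\)
for the mean of the \(\mathbf{s}_i\), this method solves the following problem:

\[
\min_{\mathbf{z}_1, \ldots, \mathbf{z}_n} \sum_{i=1}^{n} \left\| \mathbf{z}_i - \mathbf{x}_i \right\|^2
\quad \text{subject to} \quad
\frac{1}{n} \sum_{i=1}^{n} \mathbf{z}_i \left( \mathbf{s}_i - \overline{\mathbf{s}} \right)^{T} = \mathbf{0}
\]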
The solution to this problem is found by centering the sensitive features, fitting a
linear regression of the non-sensitive features on the centered sensitive features, and
reporting the residuals.
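The columns in \(S\) are dropped from the output, and the hyperparameter \(\alpha\)
lets you tweak the amount of filtering that gets applied by interpolating between the
filtered and the original non-sensitive columns:

\[
X_{\text{tfm}} = \alpha X_{\text{filtered}} + (1 - \alpha) X_{\text{orig}}
\]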
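A minimal NumPy sketch of this procedure, assuming `Z` and `S` are 2-D float arrays holding the non-sensitive and sensitive columns: center `S`, regress `Z` on it by least squares, and keep the residuals. The blend with the original columns assumes a linear interpolation controlled by `alpha`, which matches alpha=1.0 filtering everything and alpha=0.0 filtering nothing. `correlation_remover_sketch` is a hypothetical helper, not fairlearn's implementation.

```python
import numpy as np


def correlation_remover_sketch(Z, S, alpha=1.0):
    """Filter the non-sensitive columns Z (n, k) against the sensitive columns S (n, m).

    Hypothetical helper; not fairlearn's implementation.
    """
    S_centered = S - S.mean(axis=0)  # center each sensitive column
    # Least-squares regression of every column of Z on the centered S
    beta = np.linalg.lstsq(S_centered, Z, rcond=None)[0]
    Z_filtered = Z - S_centered @ beta  # residuals: the fully filtered columns
    # Assumed linear blend: alpha=1.0 keeps only residuals, alpha=0.0 returns Z
    return alpha * Z_filtered + (1.0 - alpha) * Z


rng = np.random.default_rng(0)
S = rng.normal(size=(500, 1))
# Column 0 depends linearly on S, column 1 is independent noise
Z = np.hstack([3.0 * S + rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])

for alpha in (0.0, 0.5, 1.0):
    Z_tfm = correlation_remover_sketch(Z, S, alpha)
    r = np.corrcoef(Z_tfm[:, 0], S[:, 0])[0, 1]
    print(f"alpha={alpha}: correlation of column 0 with S = {r:.3f}")
```

Column 0 starts out strongly correlated with `S`. Its correlation shrinks as alpha grows and vanishes at alpha=1.0, up to floating-point error, because least-squares residuals are orthogonal to the centered regressors.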
Note that the lack of correlation does not imply statistical independence: nonlinear
dependence on the sensitive features can remain.
Therefore, we expect this to be most appropriate as a preprocessing step for
(generalized) linear models.
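To illustrate the note above, the sketch below builds a feature that depends on a sensitive feature `s` only through `s**2`. It applies the same least-squares filtering in plain NumPy, not fairlearn's code. The linear correlation with `s` is removed exactly, but the filtered feature is still almost perfectly predictable from `s`.

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.normal(size=(10_000, 1))
# z depends on s deterministically, but only through s**2 (nonlinear)
z = s**2

s_c = s - s.mean(axis=0)
beta = np.linalg.lstsq(s_c, z, rcond=None)[0]
z_filtered = z - s_c @ beta

# Linear correlation with s is removed (up to floating-point error) ...
corr_with_s = np.corrcoef(z_filtered[:, 0], s[:, 0])[0, 1]
# ... yet z_filtered still tracks s**2 closely, so it is far from independent of s
corr_with_s_squared = np.corrcoef(z_filtered[:, 0], (s**2)[:, 0])[0, 1]
print(corr_with_s, corr_with_s_squared)
```

A nonlinear model trained on `z_filtered` could therefore still recover information about `s`. This is why the filtering fits best in front of (generalized) linear models.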
fit(X, y=None) – Learn the projection required to make the dataset uncorrelated with the sensitive columns.
transform(X) – Transform X by applying the correlation remover.
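The division of labour between the two methods can be sketched as follows. Fitting estimates and stores the sensitive-column means and the regression coefficients. Transforming reuses them, so new data is filtered with the projection learned on the training data. `SimpleCorrelationRemover` is a hypothetical, NumPy-only class, not fairlearn's implementation. It skips input validation, accepts integer column positions only, and assumes the linear alpha blend.

```python
import numpy as np


class SimpleCorrelationRemover:
    """Hypothetical NumPy-only sketch of a fit/transform correlation remover."""

    def __init__(self, sensitive_feature_ids, alpha=1.0):
        self.sensitive_feature_ids = sensitive_feature_ids  # integer positions only
        self.alpha = alpha

    def _split(self, X):
        # Return (non-sensitive, sensitive) column blocks
        mask = np.zeros(X.shape[1], dtype=bool)
        mask[list(self.sensitive_feature_ids)] = True
        return X[:, ~mask], X[:, mask]

    def fit(self, X, y=None):
        # Learn the sensitive means and the least-squares coefficients
        X = np.asarray(X, dtype=float)
        Z, S = self._split(X)
        self.sensitive_mean_ = S.mean(axis=0)
        self.beta_ = np.linalg.lstsq(S - self.sensitive_mean_, Z, rcond=None)[0]
        return self

    def transform(self, X):
        # Reuse the fitted projection; the sensitive columns are dropped
        X = np.asarray(X, dtype=float)
        Z, S = self._split(X)
        Z_filtered = Z - (S - self.sensitive_mean_) @ self.beta_
        return self.alpha * Z_filtered + (1.0 - self.alpha) * Z


rng = np.random.default_rng(3)
X = rng.normal(size=(300, 3))
X[:, 0] += 1.5 * X[:, 2]  # make column 0 correlated with the sensitive column 2
remover = SimpleCorrelationRemover(sensitive_feature_ids=[2]).fit(X)
X_tfm = remover.transform(X)
print(X_tfm.shape)  # (300, 2)
```

Keeping the learned mean and coefficients as fitted attributes, rather than refitting inside transform, follows the usual scikit-learn convention. It keeps test data from influencing the projection.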