fairlearn.preprocessing.CorrelationRemover
- class fairlearn.preprocessing.CorrelationRemover(*, sensitive_feature_ids, alpha=1)
- A component that filters out sensitive correlations in a dataset.
- CorrelationRemover applies a linear transformation to the non-sensitive feature columns in order to remove their correlation with the sensitive feature columns while retaining as much information as possible (as measured by the least-squares error).
- Read more in the User Guide.
- Parameters:
- sensitive_feature_ids – Identifiers of the sensitive feature columns to filter out.
- alpha (default=1) – Controls how much filtering is applied; see the Notes.
 - Notes
- This method changes the original dataset by removing all correlation with the sensitive values. To describe this mathematically, assume that the original dataset \(X\) contains a set of sensitive attributes \(S\) and a set of non-sensitive attributes \(Z\). The method solves the following problem:
\[\begin{split}\min _{\mathbf{z}_{1}, \ldots, \mathbf{z}_{n}} \sum_{i=1}^{n}\left\|\mathbf{z}_{i} -\mathbf{x}_{i}\right\|^{2} \\ \text{subject to} \\ \frac{1}{n} \sum_{i=1}^{n} \mathbf{z}_{i}\left(\mathbf{s}_{i}-\overline{\mathbf{s}} \right)^{T}=\mathbf{0}\end{split}\]
- The solution to this problem is found by centering the sensitive features, fitting a linear regression model that predicts the non-sensitive features from them, and reporting the residual.
- The columns in \(S\) are dropped from the output. The hyperparameter \(\alpha\) lets you tune how much filtering is applied:
\[X_{\text{tfm}} = \alpha X_{\text{filtered}} + (1-\alpha) X_{\text{orig}}\]
- Note that a lack of correlation does not imply statistical independence. Therefore, we expect this to be most appropriate as a preprocessing step for (generalized) linear models.
- New in version 0.6.
- Methods
- fit(X[, y]) – Learn the projection required to make the dataset uncorrelated with sensitive columns.
- fit_transform(X[, y]) – Fit to data, then transform it.
- get_metadata_routing() – Get metadata routing of this object.
- get_params([deep]) – Get parameters for this estimator.
- set_output(*[, transform]) – Set output container.
- set_params(**params) – Set the parameters of this estimator.
- transform(X) – Transform X by applying the correlation remover.
 - fit(X, y=None)
- Learn the projection required to make the dataset uncorrelated with sensitive columns. 
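The procedure described in the Notes (center the sensitive columns, regress the non-sensitive columns on them, keep the residual, then blend with \(\alpha\)) can be sketched in plain NumPy. This is an illustrative sketch only, not fairlearn's implementation: the helper name `remove_correlation` and the argument `sensitive_idx` are hypothetical, and the input is assumed to be a dense numeric array.

```python
import numpy as np


def remove_correlation(X, sensitive_idx, alpha=1.0):
    """Hypothetical helper mirroring the steps in the Notes (not fairlearn's code)."""
    X = np.asarray(X, dtype=float)
    sensitive_idx = list(sensitive_idx)
    keep_idx = [j for j in range(X.shape[1]) if j not in sensitive_idx]

    S = X[:, sensitive_idx]   # sensitive columns
    X_orig = X[:, keep_idx]   # non-sensitive columns

    # Step 1: center the sensitive columns.
    S_centered = S - S.mean(axis=0)

    # Step 2: least-squares fit of the non-sensitive columns on the
    # centered sensitive columns.
    beta, *_ = np.linalg.lstsq(S_centered, X_orig, rcond=None)

    # Step 3: the residual is uncorrelated with the sensitive columns.
    X_filtered = X_orig - S_centered @ beta

    # Step 4: blend filtered and original data; the sensitive columns
    # themselves are not part of the output.
    return alpha * X_filtered + (1.0 - alpha) * X_orig


# Synthetic data: column 0 is sensitive, column 1 depends on it, column 2 does not.
rng = np.random.default_rng(0)
s = rng.normal(size=(200, 1))
X = np.hstack([s, 2.0 * s + rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])

X_tfm = remove_correlation(X, sensitive_idx=[0], alpha=1.0)

# Empirical version of the constraint in the Notes; entries are zero up to
# floating-point error when alpha=1.
cross_cov = X_tfm.T @ (s - s.mean(axis=0)) / len(s)
```

With `alpha=0.0` the same helper returns the non-sensitive columns unchanged, which matches the blending formula above.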
 - fit_transform(X, y=None, **fit_params)
- Fit to data, then transform it.
- Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters:
- X (array-like of shape (n_samples, n_features)) – Input samples. 
- y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations). 
- **fit_params (dict) – Additional fit parameters. 
 
- Returns:
- X_new – Transformed array. 
- Return type:
- ndarray of shape (n_samples, n_features_new)
 
 - get_metadata_routing()
- Get metadata routing of this object.
- See the User Guide for how the routing mechanism works.
- Returns:
- routing – A MetadataRequest encapsulating routing information.
- Return type:
- MetadataRequest 
 
 - set_output(*, transform=None)
- Set output container.
- See Introducing the set_output API for an example on how to use the API.
- Parameters:
- transform ({"default", "pandas"}, default=None) – Configure output of transform and fit_transform.
- "default": Default output format of a transformer
- "pandas": DataFrame output
- None: Transform configuration is unchanged
 
- Returns:
- self – Estimator instance. 
- Return type:
- estimator instance 
 
 - set_params(**params)
- Set the parameters of this estimator.
- The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
- **params (dict) – Estimator parameters. 
- Returns:
- self – Estimator instance. 
- Return type:
- estimator instance