fairlearn.postprocessing package

This module contains methods which operate on a predictor, rather than an estimator.

The predictor’s output is adjusted to fulfill specified parity constraints. The postprocessors learn how to adjust the predictor’s output from the training data.

class fairlearn.postprocessing.ThresholdOptimizer(*, estimator=None, constraints='demographic_parity', objective='accuracy_score', grid_size=1000, flip=False, prefit=False, predict_method='deprecated')

Bases: sklearn.base.BaseEstimator, sklearn.base.MetaEstimatorMixin

A classifier based on the threshold optimization approach.

The classifier is obtained by applying group-specific thresholds to the provided estimator. The thresholds are chosen to optimize the provided performance objective subject to the provided fairness constraints.
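To make "group-specific thresholds" concrete, the minimal sketch below applies a separate cutoff to each group's scores and reports the resulting per-group selection rate, which is the quantity matched under 'demographic_parity'. The scores, group labels, and threshold values are hypothetical and chosen by hand. This is not fairlearn's implementation: ThresholdOptimizer learns the thresholds from data and, as the random_state parameter of predict indicates, may randomize its predictions.

```python
def apply_group_thresholds(scores, groups, thresholds):
    """Predict 1 when a score exceeds its group's threshold, else 0."""
    return [1 if s > thresholds[g] else 0 for s, g in zip(scores, groups)]


def selection_rates(predictions, groups):
    """Fraction of positive predictions within each group."""
    totals, positives = {}, {}
    for p, g in zip(predictions, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + p
    return {g: positives[g] / totals[g] for g in totals}


# Hypothetical scores (e.g. positive-class probabilities) and group labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]
groups = ["a", "b", "a", "a", "b", "a", "b", "b", "a", "b"]

# One hand-picked cutoff per group; a real postprocessor would learn these.
thresholds = {"a": 0.5, "b": 0.3}

preds = apply_group_thresholds(scores, groups, thresholds)
print(preds)                            # [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
print(selection_rates(preds, groups))   # {'a': 0.6, 'b': 0.6}
```

Here a single global cutoff of 0.5 would select 3 of 5 samples in group "a" but only 2 of 5 in group "b"; lowering the cutoff for "b" alone equalizes the two selection rates.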

Read more in the User Guide.

Parameters

estimator (object) – A scikit-learn compatible estimator whose output is postprocessed.
constraints (str, default='demographic_parity') –
Fairness constraints under which threshold optimization is performed. Possible inputs are:

'demographic_parity', 'selection_rate_parity' (synonymous)
match the selection rate across groups

'{false,true}_{positive,negative}_rate_parity'
match the named metric across groups

'equalized_odds'
match true positive and false positive rates across groups
objective (str, default='accuracy_score') –
Performance objective under which threshold optimization is performed. Not all objectives are allowed for all types of constraints. Possible inputs are:

'accuracy_score', 'balanced_accuracy_score'
allowed for all constraint types

'selection_rate', 'true_positive_rate', 'true_negative_rate'
allowed for all constraint types except 'equalized_odds'
grid_size (int, default=1000) – The values of the constraint metric are discretized according to a grid of the specified size over the interval [0, 1], and the optimization is performed with respect to the constraints achieving those values. In the case of 'equalized_odds', the constraint metric is the false positive rate.
flip (bool, default=False) – If True, then allow flipping the decision if it improves the resulting objective value.
prefit (bool, default=False) – If True, avoid refitting the given estimator. Note that this results in an error when used with sklearn.model_selection.cross_val_score() or sklearn.model_selection.GridSearchCV; in that case, use prefit=False.
predict_method ({'auto', 'predict_proba', 'decision_function', 'predict'}, default='auto') –
Defines which method of the estimator is used to get the output values.
- 'auto': use the first of predict_proba, decision_function, or predict that the estimator provides, checked in that order.
- 'predict_proba': use the second column from the output of predict_proba. It is assumed that the second column represents the positive outcome.
- 'decision_function': use the raw values given by decision_function.
- 'predict': use the hard values reported by the predict method if the estimator is a classifier, and the regression values if it is a regressor. This is equivalent to what is done in [1].
New in version 0.7: In previous versions only the predict method was used implicitly.

Changed in version 0.7: 'predict' is deprecated as the default value, and the default will change to 'auto' from v0.10.
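The 'auto' resolution order and the predict_proba column choice described for predict_method can be sketched as follows. This is a minimal illustration that mirrors the documented behavior, not fairlearn's own code; the helper names resolve_predict_method and scores_from and the estimator classes MarginOnly and WithProba are hypothetical stand-ins.

```python
def resolve_predict_method(estimator, predict_method="auto"):
    """Return (name, bound method), following the documented 'auto' order."""
    candidates = (
        ("predict_proba", "decision_function", "predict")
        if predict_method == "auto"
        else (predict_method,)
    )
    for name in candidates:
        method = getattr(estimator, name, None)
        if callable(method):
            return name, method
    raise AttributeError(f"estimator provides none of {candidates}")


def scores_from(estimator, X, predict_method="auto"):
    """Turn the chosen method's output into one score per sample."""
    name, method = resolve_predict_method(estimator, predict_method)
    output = method(X)
    if name == "predict_proba":
        # Second column, assumed to represent the positive outcome.
        return [row[1] for row in output]
    return list(output)


class MarginOnly:
    """Hypothetical estimator exposing decision_function and predict only."""

    def decision_function(self, X):
        return [x[0] - 4.5 for x in X]

    def predict(self, X):
        return [1 if x[0] > 4.5 else 0 for x in X]


class WithProba(MarginOnly):
    """Hypothetical estimator that additionally exposes predict_proba."""

    def predict_proba(self, X):
        return [[0.25, 0.75] for _ in X]


X = [[0], [5], [9]]
print(resolve_predict_method(MarginOnly())[0])  # decision_function
print(scores_from(MarginOnly(), X))             # [-4.5, 0.5, 4.5]
print(scores_from(WithProba(), X))              # [0.75, 0.75, 0.75]
print(scores_from(WithProba(), X, "predict"))   # [0, 1, 1]
```

Passing an explicit method name bypasses the search, which is why WithProba still yields hard 0/1 values when 'predict' is requested.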

Notes

The procedure is based on the algorithm of Hardt et al. (2016) [1].
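The role of grid_size under 'demographic_parity' can be illustrated with a deliberately simplified, deterministic variant: try every selection rate on an evenly spaced grid over [0, 1], give each group that rate by selecting its top-scoring samples, and keep the rate with the best accuracy. This is a sketch under stated assumptions, not the procedure of [1] or of ThresholdOptimizer, which can randomize between thresholds to hit a target rate exactly. The function names, the small default grid_size=11, and the scores are hypothetical.

```python
def top_k_predictions(scores, k):
    """Mark the k highest-scoring samples as positive (ties keep input order)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(order[:k])
    return [1 if i in chosen else 0 for i in range(len(scores))]


def dp_grid_search(scores, labels, groups, grid_size=11):
    """Search a grid of common selection rates (grid_size >= 2) for best accuracy."""
    by_group = {}
    for i, g in enumerate(groups):
        by_group.setdefault(g, []).append(i)

    best_rate, best_acc, best_preds = None, -1.0, None
    for step in range(grid_size):
        rate = step / (grid_size - 1)
        preds = [0] * len(scores)
        for idx in by_group.values():
            # Same target rate for every group, rounded to whole samples.
            k = round(rate * len(idx))
            group_preds = top_k_predictions([scores[i] for i in idx], k)
            for i, p in zip(idx, group_preds):
                preds[i] = p
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:  # strict ">" keeps the first best rate on ties
            best_rate, best_acc, best_preds = rate, acc, preds
    return best_rate, best_acc, best_preds


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]  # hypothetical
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a", "b", "a", "a", "b", "a", "b", "b", "a", "b"]

rate, acc, preds = dp_grid_search(scores, labels, groups)
print(rate, acc)  # 0.6 0.9
print(preds)      # [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]
```

A finer grid (larger grid_size) examines more candidate rates at the cost of more evaluations.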

References

[1] M. Hardt, E. Price, and N. Srebro, “Equality of Opportunity in Supervised Learning,” arXiv.org, 07-Oct-2016. [Online]. Available: https://arxiv.org/abs/1610.02413.

Examples

>>> from fairlearn.postprocessing import ThresholdOptimizer
>>> from sklearn.linear_model import LogisticRegression
>>> X                  = [[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]]
>>> y                  = [ 1 ,  1 ,  1 ,  1 ,  0,   0 ,  1 ,  0 ,  0 ,  0 ]
>>> sensitive_features = ["a", "b", "a", "a", "b", "a", "b", "b", "a", "b"]
>>> unmitigated_lr = LogisticRegression().fit(X, y)
>>> postprocess_est = ThresholdOptimizer(
...                    estimator=unmitigated_lr,
...                    constraints="false_negative_rate_parity",
...                    objective="balanced_accuracy_score",
...                    prefit=True,
...                    predict_method='predict_proba')
>>> postprocess_est.fit(X, y, sensitive_features=sensitive_features)
ThresholdOptimizer(constraints='false_negative_rate_parity',
                   estimator=LogisticRegression(),
                   objective='balanced_accuracy_score',
                   predict_method='predict_proba', prefit=True)

Methods

`fit`(X, y, *, sensitive_features, **kwargs)	Fit the model.
`get_params`([deep])	Get parameters for this estimator.
`predict`(X, *, sensitive_features[, random_state])	Predict label for each sample in X while taking into account sensitive features.
`set_params`(**params)	Set the parameters of this estimator.

fit(X, y, *, sensitive_features, **kwargs)

Fit the model.

The fit is based on training features and labels, sensitive features, and the fairness-unaware predictor or estimator. If an estimator was passed to the constructor and prefit=False, this method calls fit(X, y, **kwargs) on that estimator; with prefit=True, the estimator is not refitted.

Parameters

X (numpy.ndarray or pandas.DataFrame) – The feature matrix
y (numpy.ndarray, pandas.DataFrame, pandas.Series, or list) – The label vector
sensitive_features (numpy.ndarray, list, pandas.DataFrame, or pandas.Series) – sensitive features to identify groups by

predict(X, *, sensitive_features, random_state=None)

Predict label for each sample in X while taking into account sensitive features.

Parameters

X (numpy.ndarray or pandas.DataFrame) – feature matrix
sensitive_features (numpy.ndarray, list, pandas.DataFrame, pandas.Series) – sensitive features to identify groups by
random_state (int or numpy.random.RandomState instance, default=None) – Controls random numbers used for randomized predictions. Pass an int for reproducible output across multiple function calls.

Returns

The prediction as a scalar or vector. If X represents the data for a single example, the result is a scalar; otherwise, the result is a vector.

Return type

numpy.ndarray
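Because predictions may be randomized, random_state matters for reproducibility. The minimal sketch below, which assumes a hypothetical per-group parameterization of two thresholds plus a mixing probability, shows how seeding a generator makes randomized thresholding repeatable. It illustrates the idea only; it is not fairlearn's internal representation of the learned solution.

```python
import random


def randomized_predict(scores, groups, params, random_state=None):
    """Predict 0/1 using, for each sample, one of two per-group thresholds.

    params maps a group to (low_threshold, high_threshold, p_low): a sample
    is compared with low_threshold with probability p_low, otherwise with
    high_threshold. This layout is a hypothetical choice for illustration.
    """
    rng = random.Random(random_state)  # same seed -> same sequence of draws
    preds = []
    for s, g in zip(scores, groups):
        low, high, p_low = params[g]
        threshold = low if rng.random() < p_low else high
        preds.append(1 if s > threshold else 0)
    return preds


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]
groups = ["a", "b", "a", "a", "b", "a", "b", "b", "a", "b"]
params = {"a": (0.3, 0.5, 0.5), "b": (0.25, 0.45, 0.5)}  # hypothetical values

first = randomized_predict(scores, groups, params, random_state=42)
second = randomized_predict(scores, groups, params, random_state=42)
print(first == second)  # True: same seed, same predictions
```

With p_low set to 1.0 or 0.0 the mixing disappears and the result is an ordinary deterministic threshold per group.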

fairlearn.postprocessing.plot_threshold_optimizer(threshold_optimizer, ax=None, show_plot=True)

Plot the chosen solution of the threshold optimizer.

For fairlearn.postprocessing.ThresholdOptimizer objects whose constraint is set to ‘demographic_parity’, this produces a selection/error curve plot. For fairlearn.postprocessing.ThresholdOptimizer objects whose constraint is set to ‘equalized_odds’, this produces an ROC curve plot.

Parameters

threshold_optimizer (fairlearn.postprocessing.ThresholdOptimizer) – the ThresholdOptimizer instance for which the results should be illustrated.
ax (matplotlib.axes.Axes) – a custom matplotlib.axes.Axes object to use for the plots, default None
show_plot (bool) – whether or not the generated plot should be shown, default True
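The ROC curves drawn for 'equalized_odds' consist of per-group (false positive rate, true positive rate) pairs, the two rates that this constraint matches across groups. The sketch below computes such points with plain Python for a few hypothetical thresholds; it does not reproduce plot_threshold_optimizer's drawing code or the chosen solution, and it assumes every group contains both positive and negative labels (otherwise a rate would divide by zero).

```python
def roc_points(scores, labels, thresholds):
    """(false positive rate, true positive rate) of `score > t` for each t."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        preds = [1 if s > t else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.35, 0.3, 0.2, 0.1]  # hypothetical
labels = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a", "b", "a", "a", "b", "a", "b", "b", "a", "b"]
thresholds = [0.0, 0.5, 1.0]

for g in sorted(set(groups)):
    idx = [i for i, x in enumerate(groups) if x == g]
    pts = roc_points([scores[i] for i in idx], [labels[i] for i in idx], thresholds)
    print(g, pts)
# a [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
# b [(1.0, 1.0), (0.3333333333333333, 0.5), (0.0, 0.0)]
```

At the shared cutoff 0.5 the two groups land on different points, (0.0, 1.0) for "a" and (1/3, 0.5) for "b", which is the kind of gap an equalized-odds postprocessor tries to close.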