fairlearn.reductions package

This module contains algorithms implementing the reductions approach to disparity mitigation.

In this approach, disparity constraints are cast as Lagrange multipliers, which cause the reweighting and relabelling of the input data. This reduces the problem back to standard machine learning training.
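As a rough illustration of the reweighting and relabelling step, the sketch below turns a vector of signed weights into a weighted binary classification problem, in the spirit of the cost-sensitive reduction in Agarwal et al. (2018). The helper name relabel_and_reweight is hypothetical and is not part of fairlearn; the sign convention is an assumption.

```python
def relabel_and_reweight(signed_weights):
    """Turn signed weights into (labels, sample_weights).

    Assumption: a positive signed weight means that predicting 1 is the
    cheaper choice for that sample, so the sample is relabelled as 1.
    The magnitude of the signed weight becomes the sample weight.
    """
    labels = [1 if w > 0 else 0 for w in signed_weights]
    sample_weights = [abs(w) for w in signed_weights]
    return labels, sample_weights


labels, sample_weights = relabel_and_reweight([0.5, -1.25, 0.0, 2.0])
```

The resulting labels and sample weights could then be passed to any estimator whose fit method accepts sample weights.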

class fairlearn.reductions.AbsoluteLoss(min_val, max_val)[source]

Bases: object

Class to evaluate absolute loss.

eval(y_true, y_pred)[source]

Evaluate the absolute loss for the given set of true and predicted values.
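A minimal sketch of how such a loss could be evaluated is given below. It assumes, without confirmation from the text above, that min_val and max_val are used to clip both inputs before the difference is taken. The function name clipped_absolute_loss is hypothetical.

```python
def clipped_absolute_loss(y_true, y_pred, min_val, max_val):
    """Per-sample absolute loss after clipping both inputs to [min_val, max_val]."""

    def clip(value):
        return max(min_val, min(max_val, value))

    return [abs(clip(t) - clip(p)) for t, p in zip(y_true, y_pred)]


# The true value 2.5 is clipped to 2.0 before the difference is computed.
losses = clipped_absolute_loss(
    [0.0, 1.0, 2.5], [0.5, 1.0, 0.5], min_val=0.0, max_val=2.0
)
```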

class fairlearn.reductions.BoundedGroupLoss(loss, *, upper_bound=None)[source]

Bases: fairlearn.reductions.ConditionalLossMoment

Moment for constraining the worst-case loss by a group.

For more information refer to the user guide.
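To make the idea of a worst-case group loss concrete, the sketch below computes the mean loss per group and compares the largest one with an upper bound. It is an illustration only and does not use fairlearn; the function name worst_group_loss and the data are made up.

```python
from collections import defaultdict


def worst_group_loss(losses, groups):
    """Return (largest mean loss over groups, dict of mean loss per group)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for loss, group in zip(losses, groups):
        totals[group] += loss
        counts[group] += 1
    means = {group: totals[group] / counts[group] for group in totals}
    return max(means.values()), means


worst, per_group = worst_group_loss(
    [0.25, 0.75, 0.5, 1.0], ["a", "a", "b", "b"]
)
upper_bound = 0.6
satisfied = worst <= upper_bound  # the constraint holds only if every group is below the bound
```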

class fairlearn.reductions.ClassificationMoment[source]

Bases: fairlearn.reductions.Moment

Moment that can be expressed as weighted classification error.

class fairlearn.reductions.DemographicParity(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)[source]

Bases: fairlearn.reductions.UtilityParity

Implementation of demographic parity as a moment.

A classifier \(h(X)\) satisfies demographic parity if

\[P[h(X) = 1 | A = a] = P[h(X) = 1] \; \forall a\]

This implementation of UtilityParity defines a single event, all. Consequently, the prob_event pandas.Series will only have a single entry, which will be equal to 1. Similarly, the index property will have twice as many entries (corresponding to the Lagrange multipliers for positive and negative constraints) as there are unique values for the sensitive feature. The signed_weights() method will compute the costs according to Example 3 of Agarwal et al. (2018) [4].

This Moment also supports control features, which can be used to stratify the data, with the Demographic Parity constraint applied within each stratum, but not between strata. If the control feature groups are \(c \in \mathcal{C}\) then the above equation will become

\[P[h(X) = 1 | A = a, C = c] = P[h(X) = 1 | C = c] \; \forall a, c\]
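The demographic parity condition can be checked by hand on a small example. The sketch below is plain Python rather than fairlearn code: it estimates P[h(X) = 1 | A = a] for each group and compares it with the overall P[h(X) = 1]. The function name selection_rates and the data are illustrative.

```python
def selection_rates(y_pred, sensitive):
    """Return (overall selection rate, selection rate per sensitive group)."""
    overall = sum(y_pred) / len(y_pred)
    by_group = {}
    for a in sorted(set(sensitive)):
        preds = [p for p, s in zip(y_pred, sensitive) if s == a]
        by_group[a] = sum(preds) / len(preds)
    return overall, by_group


y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
sensitive = ["a", "a", "a", "a", "b", "b", "b", "b"]

overall, by_group = selection_rates(y_pred, sensitive)
# Demographic parity holds exactly when every gap is zero.
gaps = {a: rate - overall for a, rate in by_group.items()}
```

With control features, the same computation would be repeated separately within each stratum.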

References

[4] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach, “A Reductions Approach to Fair Classification,” arXiv.org, 16-Jul-2018. [Online]. Available: https://arxiv.org/abs/1803.02453.

load_data(X, y, *, sensitive_features, control_features=None)[source]

Load the specified data into the object.

short_name = 'DemographicParity'

class fairlearn.reductions.EqualizedOdds(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)[source]

Bases: fairlearn.reductions.UtilityParity

Implementation of equalized odds as a moment.

Adds conditioning on label compared to demographic parity, i.e.

\[P[h(X) = 1 | A = a, Y = y] = P[h(X) = 1 | Y = y] \; \forall a, y\]

This implementation of UtilityParity defines events corresponding to the unique values of the Y array.

The prob_event pandas.Series will record the fraction of the samples corresponding to each unique value in the Y array.

The index MultiIndex will have a number of entries equal to the number of unique values for the sensitive feature, multiplied by the number of unique values of the Y array, multiplied by two (for the Lagrange multipliers for positive and negative constraints).

With these definitions, the signed_weights() method will calculate the costs according to Example 4 of Agarwal et al. (2018) [7].

This Moment also supports control features, which can be used to stratify the data, with the constraint applied within each stratum, but not between strata.
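The following plain-Python sketch, which does not use fairlearn, illustrates the equalized odds condition and the constraint count described above: one gap per (sensitive value, label) pair, and two Lagrange multipliers per gap. The helper conditional_rate and the data are illustrative.

```python
def conditional_rate(y_pred, mask):
    """Fraction of positive predictions among the samples where mask is True."""
    selected = [p for p, m in zip(y_pred, mask) if m]
    return sum(selected) / len(selected)


y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]
sensitive = ["a", "a", "a", "a", "b", "b", "b", "b"]

gaps = {}
for y in sorted(set(y_true)):
    event = [t == y for t in y_true]  # the event Y = y
    event_rate = conditional_rate(y_pred, event)
    for a in sorted(set(sensitive)):
        in_group = [e and s == a for e, s in zip(event, sensitive)]
        gaps[(a, y)] = conditional_rate(y_pred, in_group) - event_rate

# Two multipliers (positive and negative) per (group, label) pair.
n_constraints = 2 * len(gaps)
```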

References

[7] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach, “A Reductions Approach to Fair Classification,” arXiv.org, 16-Jul-2018. [Online]. Available: https://arxiv.org/abs/1803.02453.

load_data(X, y, *, sensitive_features, control_features=None)[source]

Load the specified data into the object.

short_name = 'EqualizedOdds'

class fairlearn.reductions.ErrorRate[source]

Bases: fairlearn.reductions.ClassificationMoment

Misclassification error.

gamma(predictor)[source]

Return the gamma values for the given predictor.

load_data(X, y, *, sensitive_features, control_features=None)[source]

Load the specified data into the object.

project_lambda(lambda_vec)[source]

Return the lambda values.

signed_weights(lambda_vec=None)[source]

Return the signed weights.

short_name = 'Err'

class fairlearn.reductions.ErrorRateParity(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)[source]

Bases: fairlearn.reductions.UtilityParity

Implementation of error rate parity as a moment.

A classifier \(h(X)\) satisfies error rate parity if

\[P[h(X) \ne Y | A = a] = P[h(X) \ne Y] \; \forall a\]

This implementation of UtilityParity defines a single event, all. Consequently, the prob_event pandas.Series will only have a single entry, which will be equal to 1.

The index property will have twice as many entries (corresponding to the Lagrange multipliers for positive and negative constraints) as there are unique values for the sensitive feature.

The signed_weights() method will compute the costs according to Example 3 of Agarwal et al. (2018) [8]. However, in this scenario, g = abs(h(x)-y), rather than g = h(x).

This Moment also supports control features, which can be used to stratify the data, with the constraint applied within each stratum, but not between strata.
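As an illustration of the quantity being constrained, the sketch below evaluates g = abs(h(x)-y) per sample and compares each group's error rate with the overall error rate. It is plain Python, not fairlearn code, and the data are made up.

```python
y_true = [0, 1, 1, 0, 1, 0, 0, 1]
y_pred = [0, 1, 0, 0, 0, 1, 0, 0]
sensitive = ["a", "a", "a", "a", "b", "b", "b", "b"]

# For 0/1 labels, abs(h(x) - y) is 1 exactly when the prediction is wrong.
errors = [abs(p - t) for p, t in zip(y_pred, y_true)]
overall_error = sum(errors) / len(errors)

error_gaps = {}
for a in sorted(set(sensitive)):
    group_errors = [e for e, s in zip(errors, sensitive) if s == a]
    error_gaps[a] = sum(group_errors) / len(group_errors) - overall_error
```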

References

[8] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach, “A Reductions Approach to Fair Classification,” arXiv.org, 16-Jul-2018. [Online]. Available: https://arxiv.org/abs/1803.02453.

load_data(X, y, *, sensitive_features, control_features=None)[source]

Load the specified data into the object.

short_name = 'ErrorRateParity'

class fairlearn.reductions.ExponentiatedGradient(estimator, constraints, eps=0.01, max_iter=50, nu=None, eta0=2.0, run_linprog_step=True, sample_weight_name='sample_weight')[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.MetaEstimatorMixin

An Estimator which implements the exponentiated gradient approach to reductions.

The exponentiated gradient algorithm is described in detail by Agarwal et al. (2018).

Changed in version 0.3.0: Was a function before, not a class

Changed in version 0.4.6: Requires 0-1 labels for classification problems

Parameters
  • estimator (estimator) – An estimator implementing methods fit(X, y, sample_weight) and predict(X), where X is the matrix of features, y is the vector of labels (binary classification) or continuous values (regression), and sample_weight is a vector of weights. In binary classification labels y and predictions returned by predict(X) are either 0 or 1. In regression values y and predictions are continuous.

  • constraints (fairlearn.reductions.Moment) – The disparity constraints expressed as moments

  • eps (float) –

    Allowed fairness constraint violation. The solution is guaranteed to have an error within 2*best_gap of the best error under constraint eps, and the constraint violation is at most 2*(eps+best_gap).

    Changed in version 0.5.0: eps is now only responsible for setting the L1 norm bound in the optimization

  • max_iter (int) –

    Maximum number of iterations

    New in version 0.5.0: Used to be T

  • nu (float) – Convergence threshold for the duality gap, corresponding to a conservative automatic setting based on the statistical uncertainty in measuring classification error

  • eta0 (float) –

    Initial setting of the learning rate

    New in version 0.5.0: Used to be eta_mul

  • run_linprog_step (bool) –

    If True, each step of exponentiated gradient is followed by the saddle point optimization over the convex hull of classifiers returned so far. Default True.

    New in version 0.5.0.

  • sample_weight_name (str) –

    Name of the argument to estimator.fit() which supplies the sample weights (defaults to sample_weight)

    New in version 0.5.0.

fit(X, y, **kwargs)[source]

Return a fair classifier under specified fairness constraints.

Parameters

predict(X, random_state=None)[source]

Provide predictions for the given input data.

Predictions are randomized, i.e., repeatedly calling predict with the same feature data may yield different output. This non-deterministic behavior is intended and stems from the nature of the exponentiated gradient algorithm.

Notes

A fitted ExponentiatedGradient has an attribute predictors_, an array of predictors, and an attribute weights_, an array of non-negative floats of the same length. The prediction on each data point in X is obtained by first picking a random predictor according to the probabilities in weights_ and then applying it. Different predictors can be chosen on different data points.

Parameters
  • X (numpy.ndarray or pandas.DataFrame) – Feature data

  • random_state (int or RandomState instance, default=None) – Controls random numbers used for randomized predictions. Pass an int for reproducible output across multiple function calls.

Returns

The prediction. If X represents the data for a single example, the result will be a scalar. Otherwise, the result will be a vector.

Return type

Scalar or vector
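The randomized prediction described in the Notes can be sketched in plain Python as follows. This is not the library's implementation: the arguments predictors and weights stand in for predictors_ and weights_, the predictors are ordinary functions rather than fitted estimators, and the function name randomized_predict is hypothetical.

```python
import random


def randomized_predict(predictors, weights, X, seed=None):
    """For each data point, draw one predictor according to weights and apply it."""
    rng = random.Random(seed)
    chosen = rng.choices(predictors, weights=weights, k=len(X))
    return [predictor(x) for predictor, x in zip(chosen, X)]


def always_zero(x):
    return 0


def threshold(x):
    return int(x > 0.5)


X = [0.1, 0.9, 0.7, 0.3]

# The same seed gives the same draws, hence reproducible predictions.
first = randomized_predict([always_zero, threshold], [0.25, 0.75], X, seed=0)
second = randomized_predict([always_zero, threshold], [0.25, 0.75], X, seed=0)

# With all weight on one predictor the output is deterministic.
only_threshold = randomized_predict([always_zero, threshold], [0.0, 1.0], X)
```

Because a predictor is drawn independently for every data point, two rows of X can be handled by different predictors within a single call.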

class fairlearn.reductions.FalsePositiveRateParity(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)[source]

Bases: fairlearn.reductions.UtilityParity

Implementation of false positive rate parity as a moment.

Adds conditioning on label Y=0 compared to demographic parity, i.e.,

\[P[h(X) = 1 | A = a, Y = 0] = P[h(X) = 1 | Y = 0] \; \forall a\]

This implementation of UtilityParity defines the event corresponding to Y=0.

The prob_event pandas.Series will record the fraction of the samples corresponding to Y = 0 in the Y array.

The index MultiIndex will have a number of entries equal to the number of unique values of the sensitive feature, multiplied by the number of unique non-NaN values of the constructed event array, whose entries are either NaN or label=0 (so only one unique non-NaN value), multiplied by two (for the Lagrange multipliers for positive and negative constraints).

With these definitions, the signed_weights() method will calculate the costs for Y=0 as they are calculated in Example 4 of Agarwal et al. (2018), but will use weights equal to zero for Y=1 [6].

This Moment also supports control features, which can be used to stratify the data, with the constraint applied within each stratum, but not between strata.

References

[6] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach, “A Reductions Approach to Fair Classification,” arXiv.org, 16-Jul-2018. [Online]. Available: https://arxiv.org/abs/1803.02453.

load_data(X, y, *, sensitive_features, control_features=None)[source]

Load the specified data into the object.

short_name = 'FalsePositiveRateParity'

class fairlearn.reductions.GridSearch(estimator, constraints, selection_rule='tradeoff_optimization', constraint_weight=0.5, grid_size=10, grid_limit=2.0, grid_offset=None, grid=None, sample_weight_name='sample_weight')[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.MetaEstimatorMixin

Estimator to perform a grid search given a blackbox estimator algorithm.

The approach used is taken from section 3.4 of Agarwal et al. (2018) [1].

New in version 0.3.0.

Changed in version 0.4.6: Enabled for more than two sensitive feature values

Parameters
  • estimator (estimator) – An estimator implementing methods fit(X, y, sample_weight) and predict(X), where X is the matrix of features, y is the vector of labels (binary classification) or continuous values (regression), and sample_weight is a vector of weights. In binary classification labels y and predictions returned by predict(X) are either 0 or 1. In regression values y and predictions are continuous.

  • constraints (fairlearn.reductions.Moment) – The disparity constraints expressed as moments

  • selection_rule (str) – Specifies the procedure for selecting the best model found by the grid search. At the present time, the only valid value is “tradeoff_optimization” which minimizes a weighted sum of the error rate and constraint violation.

  • constraint_weight (float) – When the selection_rule is “tradeoff_optimization” this specifies the relative weight put on the constraint violation when selecting the best model. The weight placed on the error rate will be 1-constraint_weight

  • grid_size (int) – The number of Lagrange multipliers to generate in the grid

  • grid_limit (float) – The largest Lagrange multiplier to generate. The grid will contain values distributed between -grid_limit and grid_limit by default

  • grid_offset (pandas.DataFrame) – Shifts the grid of Lagrange multipliers by that value. Defaults to 0.

  • grid – Instead of supplying a size and limit for the grid, users may specify the exact set of Lagrange multipliers they desire using this argument.

  • sample_weight_name (str) –

    Name of the argument to estimator.fit() which supplies the sample weights (defaults to sample_weight)

    New in version 0.5.0.
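The tradeoff_optimization rule described above can be sketched as a weighted sum over candidate models. The sketch is plain Python and only illustrates the selection criterion: the dictionary keys "error" and "violation", the candidate values, and the function name select_by_tradeoff are all hypothetical.

```python
def select_by_tradeoff(candidates, constraint_weight=0.5):
    """Pick the candidate minimizing a weighted sum of violation and error."""

    def score(candidate):
        return (
            constraint_weight * candidate["violation"]
            + (1 - constraint_weight) * candidate["error"]
        )

    return min(candidates, key=score)


candidates = [
    {"name": "h0", "error": 0.10, "violation": 0.30},
    {"name": "h1", "error": 0.15, "violation": 0.05},
    {"name": "h2", "error": 0.30, "violation": 0.00},
]

balanced = select_by_tradeoff(candidates, constraint_weight=0.5)
accuracy_only = select_by_tradeoff(candidates, constraint_weight=0.0)
fairness_only = select_by_tradeoff(candidates, constraint_weight=1.0)
```

Setting constraint_weight to 0 or 1 recovers the most accurate and the least violating candidate respectively, which shows how the parameter moves the selection between the two extremes.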

References

[1] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach, “A Reductions Approach to Fair Classification,” arXiv.org, 16-Jul-2018. [Online]. Available: https://arxiv.org/abs/1803.02453.

fit(X, y, **kwargs)[source]

Run the grid search.

This will result in multiple copies of the estimator being made, and the fit(X) method of each one called.

Parameters

predict(X)[source]

Provide a prediction using the best model found by the grid search.

This dispatches X to the predict(X) method of the selected estimator, and hence the return type is dependent on that method.

Parameters

X (numpy.ndarray or pandas.DataFrame) – Feature data

predict_proba(X)[source]

Provide the result of predict_proba from the best model found by the grid search.

The underlying estimator must support predict_proba(X) for this to work. The return type is determined by this method.

Parameters

X (numpy.ndarray or pandas.DataFrame) – Feature data

class fairlearn.reductions.LossMoment(loss)[source]

Bases: fairlearn.reductions.Moment

Moment that can be expressed as weighted loss.

class fairlearn.reductions.Moment[source]

Bases: object

Generic moment.

Our implementations of the reductions approach to fairness described in Agarwal et al. (2018) make use of Moment objects to describe the disparity constraints imposed on the solution. This is an abstract class for all such objects.

bound()[source]

Return the vector of fairness bound constraints, which has the same length as gamma.

gamma(predictor)[source]

Calculate the degree to which constraints are currently violated by the predictor.

load_data(X, y, *, sensitive_features=None)[source]

Load a set of data for use by this object.

Parameters
  • X (array) – The feature array

  • y (pandas.Series) – The label vector

  • sensitive_features (pandas.Series) – The sensitive feature vector (default None)

project_lambda(lambda_vec)[source]

Return the projected lambda values.

signed_weights(lambda_vec)[source]

Return the signed weights.

property total_samples

Return the number of samples in the data.

class fairlearn.reductions.SquareLoss(min_val, max_val)[source]

Bases: object

Class to evaluate the square loss.

eval(y_true, y_pred)[source]

Evaluate the square loss for the given set of true and predicted values.

class fairlearn.reductions.TruePositiveRateParity(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)[source]

Bases: fairlearn.reductions.UtilityParity

Implementation of true positive rate parity as a moment.

Note

The true positive rate parity fairness criterion is also known as “equal opportunity”.

Adds conditioning on label Y=1 compared to demographic parity, i.e.,

\[P[h(X) = 1 | A = a, Y = 1] = P[h(X) = 1 | Y = 1] \; \forall a\]

This implementation of UtilityParity defines the event corresponding to Y=1.

The prob_event pandas.Series will record the fraction of the samples corresponding to Y = 1 in the Y array.

The index MultiIndex will have a number of entries equal to the number of unique values of the sensitive feature, multiplied by the number of unique non-NaN values of the constructed event array, whose entries are either NaN or label=1 (so only one unique non-NaN value), multiplied by two (for the Lagrange multipliers for positive and negative constraints).

With these definitions, the signed_weights() method will calculate the costs for Y=1 as they are calculated in Example 4 of Agarwal et al. (2018), but will use weights equal to zero for Y=0 [5].

This Moment also supports control features, which can be used to stratify the data, with the constraint applied within each stratum, but not between strata.

References

[5] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach, “A Reductions Approach to Fair Classification,” arXiv.org, 16-Jul-2018. [Online]. Available: https://arxiv.org/abs/1803.02453.

load_data(X, y, *, sensitive_features, control_features=None)[source]

Load the specified data into the object.

short_name = 'TruePositiveRateParity'

class fairlearn.reductions.UtilityParity(*, difference_bound=None, ratio_bound=None, ratio_bound_slack=0.0)[source]

Bases: fairlearn.reductions.ClassificationMoment

A generic moment for parity in utilities (or costs) under classification.

This serves as the base class for DemographicParity, EqualizedOdds, and others. All subclasses can be used as difference-based constraints or ratio-based constraints. Refer to the user guide for more information and example usage.

Constraints compare the group-level mean utility for each group with the overall mean utility (unless further events are specified, e.g., in equalized odds). Constraint violation for difference-based constraints starts if the difference between a group and the overall population with regard to a utility exceeds difference_bound. For ratio-based constraints, the ratio between the group-level and overall mean utility needs to be bounded between ratio_bound and its inverse (plus an additional additive ratio_bound_slack).
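The two kinds of constraint can be illustrated with a small plain-Python sketch. It is a simplification under stated assumptions: the ratio check ignores ratio_bound_slack, assumes a strictly positive overall mean utility, and both function names are hypothetical rather than part of fairlearn.

```python
def within_difference_bound(group_mean, overall_mean, difference_bound):
    """True if the group and overall mean utilities differ by at most the bound."""
    return abs(group_mean - overall_mean) <= difference_bound


def within_ratio_bound(group_mean, overall_mean, ratio_bound):
    """True if group_mean / overall_mean lies in [ratio_bound, 1 / ratio_bound].

    Assumes overall_mean > 0 and ratio_bound in (0, 1]; slack is ignored.
    """
    ratio = group_mean / overall_mean
    return ratio_bound <= ratio <= 1.0 / ratio_bound


ok_difference = within_difference_bound(0.75, 0.5, difference_bound=0.3)
bad_difference = within_difference_bound(0.75, 0.5, difference_bound=0.1)
ok_ratio = within_ratio_bound(0.45, 0.5, ratio_bound=0.8)   # ratio is 0.9
bad_ratio = within_ratio_bound(0.2, 0.5, ratio_bound=0.8)   # ratio is 0.4
```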

The index field is a pandas.MultiIndex corresponding to the constraint IDs. It is an index of various DataFrame and Series objects that are either required as arguments or returned by several of the methods of the UtilityParity class. It is the Cartesian product of:

  • The unique events defining the particular moment object

  • The unique values of the sensitive feature

  • The characters + and -, corresponding to the Lagrange multipliers for positive and negative violations of the constraint
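The Cartesian product above can be enumerated with the standard library, as in the sketch below. The event and group names are made up, and the order of the levels in the tuples is an assumption rather than a statement about the actual pandas.MultiIndex.

```python
from itertools import product

events = ["label=0", "label=1"]   # illustrative event names
groups = ["a", "b", "c"]          # unique values of the sensitive feature
signs = ["+", "-"]                # positive and negative violations

# One constraint ID per combination of sign, event and group.
constraint_ids = list(product(signs, events, groups))
n_constraints = len(constraint_ids)
```

A vector of Lagrange multipliers would then hold one non-negative value for each of these constraint IDs.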

Parameters
  • difference_bound (float) – The constraints’ difference bound for constraints that are expressed as differences, also referred to as \(\epsilon\) in documentation. If ratio_bound is used then difference_bound needs to be None. If neither ratio_bound nor difference_bound is set then a default difference bound of 0.01 is used for backwards compatibility. Default None.

  • ratio_bound (float) – The constraints’ ratio bound for constraints that are expressed as ratios. The specified value needs to be in (0,1]. If difference_bound is used then ratio_bound needs to be None. Default None.

  • ratio_bound_slack (float) – The constraints’ ratio bound slack for constraints that are expressed as ratios, also referred to as \(\epsilon\) in documentation. ratio_bound_slack is ignored if ratio_bound is not specified. Default 0.0.

bound()[source]

Return bound vector.

Returns

a vector of bound values corresponding to all constraints

Return type

pandas.Series

default_objective()[source]

Return the default objective for moments of this kind.

gamma(predictor)[source]

Calculate the degree to which constraints are currently violated by the predictor.

load_data(X, y, *, sensitive_features, event=None, utilities=None)[source]

Load the specified data into this object.

This adds a column event to the tags field.

The utilities argument is a 2-d array which corresponds to g(X,A,Y,h(X)) as described in Agarwal et al. (2018) [2]. It defaults to h(X), i.e. [0, 1] for each X_i. The first column is G^0 and the second is G^1. Binary classification with labels 0/1 is assumed.

\[utilities = [g(X,A,Y,h(X)=0), g(X,A,Y,h(X)=1)]\]
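To see why the default row [0, 1] corresponds to g = h(X), the sketch below builds the default utilities as a list of rows and looks up the utility for a given prediction. It uses plain Python lists rather than arrays, and both function names are hypothetical.

```python
def default_utilities(n_samples):
    """One row [G^0, G^1] = [0, 1] per sample, i.e. the utility equals h(X)."""
    return [[0.0, 1.0] for _ in range(n_samples)]


def realized_utilities(utilities, predictions):
    """Pick column 0 or 1 of each row according to the 0/1 prediction."""
    return [row[pred] for row, pred in zip(utilities, predictions)]


predictions = [1, 0, 1]
utilities = default_utilities(len(predictions))
realized = realized_utilities(utilities, predictions)
```

With the default rows, the realized utility of each sample is simply its prediction, which is the demographic parity setting.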

References

[2] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach, “A Reductions Approach to Fair Classification,” arXiv.org, 16-Jul-2018. [Online]. Available: https://arxiv.org/abs/1803.02453.

project_lambda(lambda_vec)[source]

Return the projected lambda values.

i.e., returns lambda which is guaranteed to lead to the same or higher value of the Lagrangian compared with lambda_vec for all possible choices of the classifier, h.

signed_weights(lambda_vec)[source]

Compute the signed weights.

Uses the equations for \(C_i^0\) and \(C_i^1\) as defined in Section 3.2 of Agarwal et al. (2018) in the ‘best response of the Q-player’ subsection to compute the signed weights to be applied to the data by the next call to the underlying estimator [3].

Parameters

lambda_vec (pandas.Series) – The vector of Lagrange multipliers indexed by index

References

[3] A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach, “A Reductions Approach to Fair Classification,” arXiv.org, 16-Jul-2018. [Online]. Available: https://arxiv.org/abs/1803.02453.

class fairlearn.reductions.ZeroOneLoss[source]

Bases: fairlearn.reductions.AbsoluteLoss

Class to evaluate a zero-one loss.
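The relationship to the absolute loss can be seen on 0/1 labels, where the absolute difference equals the misclassification indicator. The sketch below is plain Python for illustration only; the function name zero_one_loss is hypothetical and the sketch makes no claim about how the class handles non-binary inputs.

```python
def zero_one_loss(y_true, y_pred):
    """Per-sample absolute difference, which is 0 or 1 for 0/1 inputs."""
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


y_true = [0, 1, 1, 0]
y_pred = [0, 0, 1, 1]

losses = zero_one_loss(y_true, y_pred)
mismatches = [int(t != p) for t, p in zip(y_true, y_pred)]
```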