# Reductions

At a high level, the reduction algorithms within Fairlearn enable unfairness mitigation for an arbitrary machine learning model with respect to user-provided fairness constraints. All of the constraints currently supported by reduction algorithms are group-fairness constraints. For more information on the supported fairness constraints, refer to Fairness constraints for binary classification and Fairness constraints for regression.

Note

The choice of a fairness metric and fairness constraints is a crucial step in AI development and deployment, and choosing an unsuitable constraint can lead to more harms. For a broader discussion of fairness as a sociotechnical challenge and how to view Fairlearn in this context, refer to Fairness in Machine Learning.

The reductions approach for classification seeks to reduce binary
classification subject to fairness constraints to a sequence of weighted
classification problems (see [1]), and similarly for regression (see [2]).
As a result, the reduction algorithms
in Fairlearn only require wrapper access to any “base” learning algorithm.
By this we mean that the “base” algorithm only needs to implement `fit` and
`predict` methods, like any standard scikit-learn estimator, but it
does not need to have any knowledge of the desired fairness constraints or sensitive features.

From an API perspective, this looks as follows in all situations:

```
>>> reduction = Reduction(base_estimator, constraints, **kwargs)
>>> reduction.fit(X_train, y_train, sensitive_features=sensitive_features)
>>> reduction.predict(X_test)
```

Fairlearn doesn’t impose restrictions on the referenced `base_estimator`
other than the existence of `fit` and `predict` methods.
At the moment, the `base_estimator`’s `fit` method also needs to
accept a `sample_weight` argument, which the reduction techniques use
to reweight samples.
In the future Fairlearn will provide functionality to handle this even
without a `sample_weight` argument.
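To make the interface requirement concrete, here is a minimal sketch of a base estimator that exposes `fit` (accepting `sample_weight`) and `predict`. The class `WeightedMajorityClassifier` is hypothetical and only illustrates the shape of the interface; it is not part of Fairlearn, and any additional checks a reduction may perform on an estimator are not covered here.

```python
import numpy as np


class WeightedMajorityClassifier:
    """Hypothetical base estimator: predicts the label with the largest total weight."""

    def fit(self, X, y, sample_weight=None):
        y = np.asarray(y)
        if sample_weight is None:
            sample_weight = np.ones(len(y))
        sample_weight = np.asarray(sample_weight, dtype=float)
        # Total weight carried by each of the two labels.
        weight_of_ones = sample_weight[y == 1].sum()
        weight_of_zeros = sample_weight[y == 0].sum()
        self.majority_ = int(weight_of_ones > weight_of_zeros)
        return self

    def predict(self, X):
        # Same prediction for every row; the estimator knows nothing about fairness.
        return np.full(len(X), self.majority_)


X = np.array([[0], [1], [2], [3]])
y = np.array([1, 0, 0, 0])

unweighted = WeightedMajorityClassifier().fit(X, y).predict(X)
# Upweighting the single positive sample flips the prediction.
reweighted = WeightedMajorityClassifier().fit(X, y, sample_weight=[5, 1, 1, 1]).predict(X)
```

The point of the sketch is that the estimator reacts to the weights it receives, which is the only channel a reduction needs to steer it.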

Before looking more into reduction algorithms, this section
reviews the supported fairness constraints. All of them
are expressed as objects inheriting from the base class `Moment`.
`Moment`’s main purpose is to calculate the constraint violation of a
current set of predictions through its `gamma` function as well as to
provide `signed_weights` that are used to relabel and reweight samples.

## Fairness constraints for binary classification

All supported fairness constraints for binary classification inherit from
`UtilityParity`. They are based on some underlying metric called
*utility*, which can be evaluated on individual data points and is averaged
over various groups of data points to form the *utility parity* constraint
of the form

\[
\text{utility}_{a,e} = \text{utility}_e \quad \forall a, e
\]

where \(a\) is a sensitive feature value and \(e\) is an *event*
identifier. Each data point has only one value of a sensitive feature,
and belongs to at most one event. In many examples, there is only
a single event \(*\), which includes all the data points. Other
examples of events include \(Y=0\) and \(Y=1\). The utility
parity requires that the mean utility within each event equals
the mean utility of each group whose sensitive feature is \(a\)
within that event.

The class `UtilityParity` implements constraints that allow
some amount of violation of the utility parity constraints, where
the maximum allowed violation is specified either as a difference
or a ratio.

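The *difference-based relaxation* starts out by representing
the utility parity constraints as pairs of
inequalities

\[
\begin{aligned}
\text{utility}_{a,e} - \text{utility}_e &\leq 0 \quad \forall a, e \\
-\text{utility}_{a,e} + \text{utility}_e &\leq 0 \quad \forall a, e
\end{aligned}
\]

and then replaces zero on the right-hand side
with a value specified as `difference_bound`. The resulting
constraints are instantiated as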

```
>>> UtilityParity(difference_bound=0.01)
```

Note that satisfying these constraints does not mean
that the difference between the groups with the highest and
lowest utility in each event is bounded by `difference_bound`.
The value of `difference_bound` instead bounds
the difference between the utility of each group and the overall mean
utility within each event. This, however,
implies that the difference between groups in each event is
at most twice the value of `difference_bound`.
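The relationship between the per-group bound and the gap between groups can be checked numerically. The following is a minimal sketch on made-up data with a single event; the helper `group_deviations` is hypothetical and not a Fairlearn function.

```python
import numpy as np


def group_deviations(utility, groups):
    """Mean utility of each group minus the overall mean (single event)."""
    utility = np.asarray(utility, dtype=float)
    groups = np.asarray(groups)
    overall = utility.mean()
    return {str(g): utility[groups == g].mean() - overall for g in np.unique(groups)}


# Toy data: three groups with mean utilities 0.5, 0.75 and 0.25.
utility = np.array([1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1])
groups = np.array(["a"] * 4 + ["b"] * 4 + ["c"] * 4)

deviations = group_deviations(utility, groups)
difference_bound = 0.25

# Every group is within difference_bound of the overall mean ...
satisfied = all(abs(d) <= difference_bound for d in deviations.values())
# ... yet the gap between the best and worst group is twice the bound.
gap = max(deviations.values()) - min(deviations.values())
```

Here groups `"b"` and `"c"` sit exactly at the bound on opposite sides of the overall mean, so the gap between them reaches the worst case of `2 * difference_bound`.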

The *ratio-based relaxation* relaxes the parity
constraint as

\[
r \leq \frac{\text{utility}_{a,e}}{\text{utility}_e} \leq \frac{1}{r} \quad \forall a, e
\]

for some value of \(r\) in \((0,1]\). For example, if \(r=0.9\), this means that within each event \(0.9 \cdot \text{utility}_e \leq \text{utility}_{a,e}\), i.e., the utility for each group needs to be at least 90% of the overall utility for the event, and \(0.9 \cdot \text{utility}_{a,e} \leq \text{utility}_e\), i.e., the overall utility for the event needs to be at least 90% of each group’s utility.

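The two ratio constraints can be rewritten as

\[
\begin{aligned}
r \cdot \text{utility}_{a,e} - \text{utility}_e &\leq 0 \quad \forall a, e \\
-\text{utility}_{a,e} + r \cdot \text{utility}_e &\leq 0 \quad \forall a, e
\end{aligned}
\]

When instantiating the ratio constraints, we use `ratio_bound` for \(r\),
and also allow further relaxation by replacing the zeros on the right-hand side
by some non-negative `ratio_bound_slack`. The resulting instantiation
looks as follows: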

```
>>> UtilityParity(ratio_bound=0.9, ratio_bound_slack=0.01)
```

Similarly to the difference constraints, the ratio constraints do not directly bound the ratio between the pairs of groups, but such a bound is implied.
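As a short worked derivation of that implied bound, assume `ratio_bound_slack=0` and strictly positive utilities (both are assumptions of this sketch). For any two groups \(a\) and \(b\) within the same event \(e\), the group-to-overall ratio of \(a\) is at least \(r\), and the overall-to-group ratio of \(b\) is also at least \(r\), so:

```latex
\[
\frac{\text{utility}_{a,e}}{\text{utility}_{b,e}}
= \frac{\text{utility}_{a,e}}{\text{utility}_e} \cdot \frac{\text{utility}_e}{\text{utility}_{b,e}}
\geq r \cdot r = r^2
\]
```

With \(r=0.9\), for instance, the ratio between any two groups in an event is at least \(0.81\).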

Note

It is not possible to specify both `difference_bound` *and*
`ratio_bound` for the same constraint object.

### Demographic Parity

A binary classifier \(h(X)\) satisfies *demographic parity* if

\[
\mathbb{P}[h(X) = 1 \mid A = a] = \mathbb{P}[h(X) = 1] \quad \forall a
\]

In other words, the selection rate or percentage of samples with label 1 should be equal across all groups. Implicitly this means the percentage with label 0 is equal as well. In this case, the utility function is equal to \(h(X)\) and there is only a single event \(*\).

In the example below group `"a"` has a selection rate of 60%, and
`"b"` has a selection rate of 20%. The overall selection rate is 40%,
so `"a"` is 0.2 above the overall selection rate, and `"b"` is
0.2 below. Invoking the method `gamma` shows the values
of the left-hand sides of the constraints described
in Fairness constraints for binary classification, which are independent
of the provided `difference_bound`. Note that the left-hand sides
corresponding to different values of `sign` are just negatives
of each other.
The value of `y_true` is irrelevant to the calculations in this example,
because the underlying utility in demographic parity, selection rate, does not
consider performance relative to the true labels, but rather proportions in
the predicted labels.

Note

When providing `DemographicParity` to mitigation algorithms, only use
the constructor. The mitigation algorithm itself then invokes `load_data`.
The example below uses `load_data` to illustrate how `DemographicParity`
instantiates inequalities from Fairness constraints for binary classification.

```
>>> from fairlearn.reductions import DemographicParity
>>> from fairlearn.metrics import MetricFrame, selection_rate
>>> import numpy as np
>>> import pandas as pd
>>> dp = DemographicParity(difference_bound=0.01)
>>> X = np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
>>> y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
>>> y_pred = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
>>> sensitive_features = np.array(["a", "b", "a", "a", "b", "a", "b", "b", "a", "b"])
>>> selection_rate_summary = MetricFrame(metrics=selection_rate,
...                                      y_true=y_true,
...                                      y_pred=y_pred,
...                                      sensitive_features=pd.Series(sensitive_features, name="SF 0"))
>>> selection_rate_summary.overall
0.4
>>> selection_rate_summary.by_group
SF 0
a    0.6
b    0.2
Name: selection_rate, dtype: float64
>>> dp.load_data(X, y_true, sensitive_features=sensitive_features)
>>> dp.gamma(lambda X: y_pred)
sign  event  group_id
+     all    a           0.2
             b          -0.2
-     all    a          -0.2
             b           0.2
dtype: float64
```

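The ratio constraints for demographic parity with `ratio_bound`
\(r\) (and `ratio_bound_slack=0`) take the form

\[
\begin{aligned}
r \cdot \mathbb{P}[h(X) = 1 \mid A = a] - \mathbb{P}[h(X) = 1] &\leq 0 \quad \forall a \\
r \cdot \mathbb{P}[h(X) = 1] - \mathbb{P}[h(X) = 1 \mid A = a] &\leq 0 \quad \forall a
\end{aligned}
\]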

Revisiting the same example as above we get

```
>>> dp = DemographicParity(ratio_bound=0.9, ratio_bound_slack=0.01)
>>> dp.load_data(X, y_true, sensitive_features=sensitive_features)
>>> dp.gamma(lambda X: y_pred)
sign  event  group_id
+     all    a           0.14
             b          -0.22
-     all    a          -0.24
             b           0.16
dtype: float64
```
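The four values above can be reproduced without Fairlearn. The sketch below recomputes them with plain NumPy from the selection rates; the mapping of the `+` and `-` rows to the two inequalities is read off from the output above, and the dictionary `lhs` is just an illustrative container.

```python
import numpy as np

y_pred = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
groups = np.array(["a", "b", "a", "a", "b", "a", "b", "b", "a", "b"])
r = 0.9  # plays the role of ratio_bound

overall = y_pred.mean()  # overall selection rate, 0.4
lhs = {}
for g in ("a", "b"):
    rate = y_pred[groups == g].mean()  # selection rate of group g
    lhs[("+", g)] = r * rate - overall
    lhs[("-", g)] = -rate + r * overall
```

Unlike in the difference-based case, the `+` and `-` values are no longer negatives of each other, because \(r\) multiplies a different term in each inequality.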

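Following the expressions for the left-hand sides of the constraints, we obtain

\[
\begin{aligned}
r \cdot \mathbb{P}[h(X) = 1 \mid A = a] - \mathbb{P}[h(X) = 1] &= 0.9 \times 0.6 - 0.4 = 0.14 \\
r \cdot \mathbb{P}[h(X) = 1 \mid A = b] - \mathbb{P}[h(X) = 1] &= 0.9 \times 0.2 - 0.4 = -0.22 \\
r \cdot \mathbb{P}[h(X) = 1] - \mathbb{P}[h(X) = 1 \mid A = a] &= 0.9 \times 0.4 - 0.6 = -0.24 \\
r \cdot \mathbb{P}[h(X) = 1] - \mathbb{P}[h(X) = 1 \mid A = b] &= 0.9 \times 0.4 - 0.2 = 0.16
\end{aligned}
\]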

### True Positive Rate Parity and False Positive Rate Parity

A binary classifier \(h(X)\) satisfies *true positive rate parity* if

\[
\mathbb{P}[h(X) = 1 \mid A = a, Y = 1] = \mathbb{P}[h(X) = 1 \mid Y = 1] \quad \forall a
\]

and *false positive rate parity* if

\[
\mathbb{P}[h(X) = 1 \mid A = a, Y = 0] = \mathbb{P}[h(X) = 1 \mid Y = 0] \quad \forall a
\]

In the first case, we only have one event \(Y=1\) and ignore the samples with \(Y=0\), and in the second case vice versa. Refer to Equalized Odds for the fairness constraint type that simultaneously enforces both true positive rate parity and false positive rate parity by considering both events \(Y=0\) and \(Y=1\).

In practice this can be used in a difference-based relaxation as follows:

```
>>> from fairlearn.reductions import TruePositiveRateParity
>>> from fairlearn.metrics import MetricFrame, true_positive_rate
>>> import numpy as np
>>> tprp = TruePositiveRateParity(difference_bound=0.01)
>>> X = np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
>>> y_true = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
>>> y_pred = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 0])
>>> sensitive_features = np.array(["a", "b", "a", "a", "b", "a", "b", "b", "a", "b"])
>>> tpr_summary = MetricFrame(metrics=true_positive_rate,
...                           y_true=y_true,
...                           y_pred=y_pred,
...                           sensitive_features=sensitive_features)
>>> tpr_summary.overall
0.5714285714285714
>>> tpr_summary.by_group
sensitive_feature_0
a    0.75...
b    0.33...
Name: true_positive_rate, dtype: float64
>>> tprp.load_data(X, y_true, sensitive_features=sensitive_features)
>>> tprp.gamma(lambda X: y_pred)
sign  event    group_id
+     label=1  a           0.1785...
               b          -0.2380...
-     label=1  a          -0.1785...
               b           0.2380...
dtype: float64
```

Note

When providing `TruePositiveRateParity` or `FalsePositiveRateParity`
to mitigation algorithms, only use
the constructor. The mitigation algorithm itself then invokes `load_data`.
The example uses `load_data` to illustrate how `TruePositiveRateParity`
instantiates inequalities from Fairness constraints for binary classification.

Alternatively, a ratio-based relaxation is also available:

```
>>> tprp = TruePositiveRateParity(ratio_bound=0.9, ratio_bound_slack=0.01)
>>> tprp.load_data(X, y_true, sensitive_features=sensitive_features)
>>> tprp.gamma(lambda X: y_pred)
sign  event    group_id
+     label=1  a           0.1035...
               b          -0.2714...
-     label=1  a          -0.2357...
               b           0.1809...
dtype: float64
```

### Equalized Odds

A binary classifier \(h(X)\) satisfies *equalized odds* if it satisfies both
*true positive rate parity* and *false positive rate parity*, i.e.,

\[
\mathbb{P}[h(X) = 1 \mid A = a, Y = y] = \mathbb{P}[h(X) = 1 \mid Y = y] \quad \forall a, y
\]

The constraints represent the union of constraints for true positive rate and false positive rate.

```
>>> from fairlearn.reductions import EqualizedOdds
>>> eo = EqualizedOdds(difference_bound=0.01)
>>> eo.load_data(X, y_true, sensitive_features=sensitive_features)
>>> eo.gamma(lambda X: y_pred)
sign  event    group_id
+     label=0  a          -0.3333...
               b           0.1666...
      label=1  a           0.1785...
               b          -0.2380...
-     label=0  a           0.3333...
               b          -0.1666...
      label=1  a          -0.1785...
               b           0.2380...
dtype: float64
```
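To show how events partition the data, the following sketch recomputes the `+` rows of the output above with plain NumPy. The helper `parity_lhs` is hypothetical and only mirrors the definition of utility parity (group mean minus event mean); it does not reflect how Fairlearn computes `gamma` internally.

```python
import numpy as np


def parity_lhs(utility, groups, events):
    """Per event: mean utility of each group minus the mean utility of the event."""
    result = {}
    for e in np.unique(events):
        in_event = events == e
        event_mean = utility[in_event].mean()
        for g in np.unique(groups):
            mask = in_event & (groups == g)
            # Assumes every group has at least one sample in every event.
            result[(str(e), str(g))] = utility[mask].mean() - event_mean
    return result


y_true = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 0])
groups = np.array(["a", "b", "a", "a", "b", "a", "b", "b", "a", "b"])

# Equalized odds: the utility is the prediction, the event is the true label.
events = np.array([f"label={y}" for y in y_true])
plus_side = parity_lhs(y_pred, groups, events)
```

Restricting `events` to a single label recovers true positive rate parity (`label=1`) or false positive rate parity (`label=0`), and the `-` rows are the negatives of these values.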

### Error Rate Parity

*Error rate parity* requires that the error rates should be
the same across all groups. For a classifier \(h(X)\)
this means that

\[
\mathbb{P}[h(X) \ne Y \mid A = a] = \mathbb{P}[h(X) \ne Y] \quad \forall a
\]

In this case, the utility is equal to 1 if \(h(X)\ne Y\) and equal to
0 if \(h(X)=Y\), and so large values of utility here actually correspond
to poor outcomes. The difference-based relaxation specifies that
the error rate of any given group should not deviate from
the overall error rate by more than the value of `difference_bound`.

```
>>> from fairlearn.reductions import ErrorRateParity
>>> from sklearn.metrics import accuracy_score
>>> accuracy_summary = MetricFrame(metrics=accuracy_score,
...                                y_true=y_true,
...                                y_pred=y_pred,
...                                sensitive_features=sensitive_features)
>>> accuracy_summary.overall
0.6
>>> accuracy_summary.by_group
sensitive_feature_0
a    0.8
b    0.4
Name: accuracy_score, dtype: float64
>>> erp = ErrorRateParity(difference_bound=0.01)
>>> erp.load_data(X, y_true, sensitive_features=sensitive_features)
>>> erp.gamma(lambda X: y_pred)
sign  event  group_id
+     all    a          -0.2
             b           0.2
-     all    a           0.2
             b          -0.2
dtype: float64
```

Note

When providing `ErrorRateParity` to mitigation algorithms, only use
the constructor. The mitigation algorithm itself then invokes `load_data`.
The example uses `load_data` to illustrate how `ErrorRateParity`
instantiates inequalities from Fairness constraints for binary classification.

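Alternatively, error rate parity can be relaxed via ratio constraints as

\[
r \leq \frac{\mathbb{P}[h(X) \ne Y \mid A = a]}{\mathbb{P}[h(X) \ne Y]} \leq \frac{1}{r} \quad \forall a
\]

with a `ratio_bound` \(r\). The usage is identical to that of the other
constraints: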

```
>>> from fairlearn.reductions import ErrorRateParity
>>> erp = ErrorRateParity(ratio_bound=0.9, ratio_bound_slack=0.01)
>>> erp.load_data(X, y_true, sensitive_features=sensitive_features)
>>> erp.gamma(lambda X: y_pred)
sign  event  group_id
+     all    a          -0.22
             b           0.14
-     all    a           0.16
             b          -0.24
dtype: float64
```

### Control features

The above examples of `Moment` (Demographic Parity,
True and False Positive Rate Parity,
Equalized Odds and Error Rate Parity) all support the concept
of *control features* when applying their fairness constraints.
A control feature stratifies the dataset, and applies the fairness constraint
within each stratum, but not between strata.
One case where this might be useful is a loan scenario, where we might want
to apply a mitigation for the sensitive features while controlling for some
other feature(s).
This should be done with caution, since the control features may have a
correlation with the sensitive features due to historical biases.
In the loan scenario, we might choose to control for income level, on the
grounds that higher income individuals are more likely to be able to repay
a loan.
However, due to historical bias, there is a correlation between the income level
of individuals and their race and gender.

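Control features modify the above equations. Consider a control feature value \(c\), drawn from a set of valid values (that is, \(c \in \mathcal{C}\)). The equation given above for Demographic Parity will become:

\[
\mathbb{P}[h(X) = 1 \mid A = a, C = c] = \mathbb{P}[h(X) = 1 \mid C = c] \quad \forall a, c
\]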

The other constraints acquire similar modifications.
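As an illustration of the stratification, the sketch below compares each group's selection rate with the selection rate of its own stratum, once per control value. The data and the `"high"`/`"low"` income strata are made up for this example, and the code is plain NumPy rather than a Fairlearn call.

```python
import numpy as np

# Toy data; "control" is a hypothetical income-level control feature.
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])
sensitive = np.array(["a", "a", "b", "b", "a", "a", "b", "b"])
control = np.array(["high", "high", "high", "high", "low", "low", "low", "low"])

deviations = {}
stratum_rates = {}
for c in np.unique(control):
    in_stratum = control == c
    stratum_rates[str(c)] = y_pred[in_stratum].mean()  # P[h(X)=1 | C=c]
    for a in np.unique(sensitive):
        mask = in_stratum & (sensitive == a)
        group_rate = y_pred[mask].mean()  # P[h(X)=1 | A=a, C=c]
        deviations[(str(c), str(a))] = group_rate - stratum_rates[str(c)]
```

The selection rates of the two strata differ (0.5 versus 0.25), but that difference is not part of the constraint; only the deviations inside each stratum are.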

## Fairness constraints for multiclass classification

Reductions approaches do not yet support multiclass classification. If this is an important scenario for you, please let us know!

## Fairness constraints for regression

The performance objective in the regression scenario is to minimize the
loss of our regressor \(h\). The loss can be expressed as
`SquareLoss` or `AbsoluteLoss`. Both take constructor arguments
`min_val` and `max_val` that define the value range within which
the loss is evaluated. Values outside of the value range get clipped.

```
>>> from fairlearn.reductions import SquareLoss, AbsoluteLoss, ZeroOneLoss
>>> y_true = [0, 0.3, 1, 0.9]
>>> y_pred = [0.1, 0.2, 0.9, 1.3]
>>> SquareLoss(0, 2).eval(y_true, y_pred)
array([0.01, 0.01, 0.01, 0.16])
>>> # clipping at 1 reduces the error for the fourth entry
>>> SquareLoss(0, 1).eval(y_true, y_pred)
array([0.01, 0.01, 0.01, 0.01])
>>> AbsoluteLoss(0, 2).eval(y_true, y_pred)
array([0.1, 0.1, 0.1, 0.4])
>>> AbsoluteLoss(0, 1).eval(y_true, y_pred)
array([0.1, 0.1, 0.1, 0.1])
>>> # ZeroOneLoss is identical to AbsoluteLoss(0, 1)
>>> ZeroOneLoss().eval(y_true, y_pred)
array([0.1, 0.1, 0.1, 0.1])
```
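The effect of clipping can be sketched in a few lines of NumPy. The function `clipped_square_loss` below is hypothetical; it assumes that both the true values and the predictions are clipped to the value range before the difference is taken, which is consistent with the outputs above but is an assumption about the implementation rather than a statement of it.

```python
import numpy as np


def clipped_square_loss(y_true, y_pred, min_val, max_val):
    """Square loss after clipping both arguments to [min_val, max_val]."""
    y_true = np.clip(np.asarray(y_true, dtype=float), min_val, max_val)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), min_val, max_val)
    return (y_true - y_pred) ** 2


y_true = [0, 0.3, 1, 0.9]
y_pred = [0.1, 0.2, 0.9, 1.3]

wide = clipped_square_loss(y_true, y_pred, 0, 2)    # nothing is clipped
narrow = clipped_square_loss(y_true, y_pred, 0, 1)  # 1.3 is clipped to 1.0
```

In the narrow range the fourth prediction becomes 1.0, so its error against 0.9 drops from 0.4 to 0.1 before squaring.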

When using Fairlearn’s reduction techniques for regression, you must
specify the type of loss by passing the corresponding loss object when
instantiating the object that represents the fairness constraint. The only
supported type of constraint at this point is `BoundedGroupLoss`.

### Bounded Group Loss

*Bounded group loss* requires the loss of each group to be below a
user-specified amount \(\zeta\). If \(\zeta\) is chosen reasonably
small, the losses of all groups are very similar.
Formally, a predictor \(h\) satisfies bounded group loss at level
\(\zeta\) under a distribution over \((X, A, Y)\) if

\[
\mathbb{E}[\text{loss}(Y, h(X)) \mid A = a] \leq \zeta \quad \forall a
\]

In the example below we use `BoundedGroupLoss` with
`ZeroOneLoss` on two groups `"a"` and `"b"`.
Group `"a"` has an average loss of \(0.05\), while group
`"b"`’s average loss is \(0.5\).

```
>>> import numpy as np
>>> import pandas as pd
>>> from fairlearn.metrics import MetricFrame
>>> from fairlearn.reductions import BoundedGroupLoss, ZeroOneLoss
>>> from sklearn.metrics import mean_absolute_error
>>> bgl = BoundedGroupLoss(ZeroOneLoss(), upper_bound=0.1)
>>> X = np.array([[0], [1], [2], [3]])
>>> y_true = np.array([0.3, 0.5, 0.1, 1.0])
>>> y_pred = np.array([0.3, 0.6, 0.6, 0.5])
>>> sensitive_features = np.array(["a", "a", "b", "b"])
>>> mae_frame = MetricFrame(metrics=mean_absolute_error,
...                         y_true=y_true,
...                         y_pred=y_pred,
...                         sensitive_features=pd.Series(sensitive_features, name="SF 0"))
>>> mae_frame.overall
0.275
>>> mae_frame.by_group
SF 0
a    0.05
b    0.50
Name: mean_absolute_error, dtype: float64
>>> bgl.load_data(X, y_true, sensitive_features=sensitive_features)
>>> bgl.gamma(lambda X: y_pred)
group_id
a    0.05
b    0.50
Name: loss, dtype: float64
```

Note

In the example above the `BoundedGroupLoss` object does not use the
`upper_bound` argument. It is only used by reduction techniques
during the unfairness mitigation. As a result, the value returned by
`gamma` for each group is identical to that group’s mean absolute error.
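To connect the per-group losses back to the bound \(\zeta\), the following sketch compares them with the `upper_bound` value by hand. It is plain NumPy on the data from the example above; the dictionaries `group_loss` and `within_bound` are illustrative names, not Fairlearn attributes.

```python
import numpy as np

y_true = np.array([0.3, 0.5, 0.1, 1.0])
y_pred = np.array([0.3, 0.6, 0.6, 0.5])
groups = np.array(["a", "a", "b", "b"])
upper_bound = 0.1  # plays the role of zeta

# All values already lie in [0, 1], so clipping to that range changes nothing.
loss = np.abs(y_true - y_pred)
group_loss = {str(g): loss[groups == g].mean() for g in np.unique(groups)}
within_bound = {g: bool(v <= upper_bound) for g, v in group_loss.items()}
```

Group `"a"` (average loss 0.05) satisfies the bound, while group `"b"` (average loss 0.5) exceeds it, which is the kind of violation a reduction would try to remove during mitigation.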