Advanced Usage of MetricFrame#
In this section, we will discuss how MetricFrame can be used in more sophisticated scenarios.
All code examples will use the following definitions:
>>> y_true = [0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
>>> y_pred = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
>>> sf_data = ['b', 'b', 'a', 'b', 'b', 'c', 'c', 'c', 'a',
... 'a', 'c', 'a', 'b', 'c', 'c', 'b', 'c', 'c']
>>> from fairlearn.metrics import MetricFrame
Extra Arguments to Metric Functions#
The metric functions supplied to MetricFrame might require additional arguments.
These fall into two categories: ‘scalar’ arguments (which affect the operation
of the metric function), and ‘per-sample’ arguments (such as sample weights).
Different approaches are required to use each of these.
Scalar Arguments#
We do not directly support scalar arguments for the metric functions.
If these are required, then use functools.partial() to prebind the required arguments to the metric function:
>>> import functools
>>> from sklearn.metrics import fbeta_score
>>> fbeta_06 = functools.partial(fbeta_score, beta=0.6)
>>> metric_beta = MetricFrame(metrics=fbeta_06,
... y_true=y_true,
... y_pred=y_pred,
... sensitive_features=sf_data)
>>> metric_beta.overall.item()
0.56983...
>>> metric_beta.by_group
sensitive_feature_0
a 0.365591
b 0.850000
c 0.468966
Name: metric, dtype: float64
Per-Sample Arguments#
If there are per-sample arguments (such as sample weights), these can also be
provided in a dictionary via the sample_params argument.
The keys of this dictionary are the argument names, and the values are 1-D
arrays of the same length as y_true and y_pred:
>>> from sklearn.metrics import recall_score
>>> import pandas as pd
>>> pd.set_option('display.max_columns', 20)
>>> pd.set_option('display.width', 80)
>>> s_w = [1, 2, 1, 3, 2, 3, 1, 2, 1, 2, 3, 1, 2, 3, 2, 3, 1, 1]
>>> s_p = { 'sample_weight':s_w }
>>> weighted = MetricFrame(metrics=recall_score,
... y_true=y_true,
... y_pred=y_pred,
... sensitive_features=pd.Series(sf_data, name='SF 0'),
... sample_params=s_p)
>>> weighted.overall.item()
0.45...
>>> weighted.by_group
SF 0
a 0.500000
b 0.583333
c 0.250000
Name: recall_score, dtype: float64
If multiple metrics are being evaluated, then sample_params becomes a
dictionary of dictionaries.
The keys of the outer dictionary are the metric names as specified in the
metrics argument.
The keys of each inner dictionary are the argument names, and the values
are the 1-D arrays of sample parameters for that metric.
For example:
>>> s_w_2 = [3, 1, 2, 3, 2, 3, 1, 4, 1, 2, 3, 1, 2, 1, 4, 2, 2, 3]
>>> metrics = {
... 'recall' : recall_score,
... 'recall_weighted' : recall_score,
... 'recall_weight_2' : recall_score
... }
>>> s_p = {
... 'recall_weighted' : { 'sample_weight':s_w },
... 'recall_weight_2' : { 'sample_weight':s_w_2 }
... }
>>> weighted = MetricFrame(metrics=metrics,
... y_true=y_true,
... y_pred=y_pred,
... sensitive_features=pd.Series(sf_data, name='SF 0'),
... sample_params=s_p)
>>> weighted.overall
recall 0.500000
recall_weighted 0.454545
recall_weight_2 0.458333
dtype: float64
>>> weighted.by_group
recall recall_weighted recall_weight_2
SF 0
a 0.5 0.500000 0.666667
b 0.6 0.583333 0.600000
c 0.4 0.250000 0.272727
Note that there is no concept of a ‘global’ sample parameter (e.g. a set of sample weights to be applied for all metric functions). In such a case, the sample parameter in question must be repeated in the nested dictionary for each metric function.
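For instance, applying the s_w weights defined above to every metric would mean repeating that inner dictionary under each metric's key. A minimal sketch (reusing the metrics dictionary from the previous example):
>>> s_p_global = {
...     'recall' : { 'sample_weight':s_w },
...     'recall_weighted' : { 'sample_weight':s_w },
...     'recall_weight_2' : { 'sample_weight':s_w }
... }
Passing this dictionary as sample_params would then evaluate all three metrics with the same weights.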
No y_true or y_pred#
In some cases, a metric may not make use of one of the y_true or y_pred arguments, or
may not need either of them.
One example of this is the selection rate metric, which only considers
the y_pred values (selection rate is used when computing
demographic parity).
However, MetricFrame requires all supplied metric functions to
conform to the scikit-learn metric paradigm, where the first two arguments
to the metric function are the y_true and y_pred arrays.
The workaround in this case is to supply a dummy argument.
This is the approach we use in selection_rate(), which simply ignores
the supplied y_true argument.
When invoking MetricFrame, a y_true array of the appropriate length
must still be supplied.
For example:
>>> from fairlearn.metrics import selection_rate
>>> dummy_y_true = [x for x in range(len(y_pred))]
>>> sel_rate_frame = MetricFrame(metrics=selection_rate,
... y_true=dummy_y_true,
... y_pred=y_pred,
... sensitive_features=pd.Series(sf_data, name='SF 0'))
>>> sel_rate_frame.overall.item()
0.55555...
>>> sel_rate_frame.by_group
SF 0
a 0.75
b 0.50
c 0.50
Name: selection_rate, dtype: float64
More Complex Metrics#
Metric functions often return a single scalar value based on arguments which are vectors of
scalars.
This is how MetricFrame was introduced in the Performing a Fairness Assessment section above.
However, this need not be the case - indeed, we were rather vague about the
contents of the input vectors and the return value of the metric function.
We will now show how to use MetricFrame in cases where the result is not
a scalar, and when the inputs are not vectors of scalars.
Non-Scalar Results from Metric Functions#
Metric functions need not return a scalar value.
A straightforward example of this is the confusion matrix.
Such return values are fully supported by MetricFrame:
>>> from sklearn.metrics import confusion_matrix
>>> mf_conf = MetricFrame(
... metrics=confusion_matrix,
... y_true=y_true,
... y_pred=y_pred,
... sensitive_features=sf_data
... )
>>> mf_conf.overall
array([[2, 4],
[6, 6]]...)
>>> mf_conf.by_group
sensitive_feature_0
a [[0, 2], [1, 1]]
b [[1, 0], [2, 3]]
c [[1, 2], [3, 2]]
Name: confusion_matrix, dtype: object
Obviously, for such cases, operations such as MetricFrame.difference() have no meaning.
However, if scalar-returning metrics are also present, they will still be calculated:
>>> mf_conf_recall = MetricFrame(
... metrics={ 'conf_mat':confusion_matrix, 'recall':recall_score },
... y_true=y_true,
... y_pred=y_pred,
... sensitive_features=sf_data
... )
>>> mf_conf_recall.overall
conf_mat [[2, 4], [6, 6]]
recall 0.5
dtype: object
>>> mf_conf_recall.by_group
conf_mat recall
sensitive_feature_0
a [[0, 2], [1, 1]] 0.5
b [[1, 0], [2, 3]] 0.6
c [[1, 2], [3, 2]] 0.4
>>> mf_conf_recall.difference()
conf_mat NaN
recall 0.2
dtype: float64
We see that the difference between group recall scores has been calculated, while a value of
NaN has been returned for the meaningless ‘maximum difference between two confusion matrices’
entry.
Inputs are Arrays of Objects#
MetricFrame can also handle cases when the y_true and/or y_pred vectors
are not vectors of scalars.
It is the metric function(s) which gives meaning to these values - MetricFrame itself
just slices the vectors up according to the sensitive feature(s) and the control feature(s).
As a toy example, suppose that our y values (both true and predicted) are tuples representing
the dimensions of a rectangle.
the dimensions of a rectangle.
For some reason known only to our fevered imagination (although it might possibly be due to a
desire for a really simple example), we are interested in the areas of these rectangles.
In particular, we want to calculate the mean of the area ratios. That is:
>>> import numpy as np
>>> def area_metric(y_true, y_pred):
... def calc_area(a):
... return a[0] * a[1]
...
... y_ts = np.asarray([calc_area(x) for x in y_true])
... y_ps = np.asarray([calc_area(x) for x in y_pred])
...
... return np.mean(y_ts / y_ps)
This is a perfectly good metric for MetricFrame, provided we supply appropriate inputs.
>>> y_rect_true = [(4,9), (3,8), (2,10)]
>>> y_rect_pred = [(1,12), (2,1), (5, 2)]
>>> rect_groups = { 'sf_0':['a', 'a', 'b'] }
>>>
>>> mf_non_scalar = MetricFrame(
... metrics=area_metric,
... y_true=y_rect_true,
... y_pred=y_rect_pred,
... sensitive_features=rect_groups
... )
>>> print(mf_non_scalar.overall)
5.6666...
>>> print(mf_non_scalar.by_group)
sf_0
a 7.5
b 2.0
Name: area_metric, dtype: float64
For a more concrete example, consider an image recognition algorithm which draws a bounding box around some region of interest. We will want to compare the ‘true’ bounding boxes (perhaps from human annotators) with the ones predicted by our model. A straightforward metric for this purpose is the IoU or ‘intersection over union.’ As the name implies, this metric takes two rectangles, and computes the area of their intersection and divides it by the area of their union. If the two rectangles are disjoint, then the IoU will be zero. If the two rectangles are identical, then the IoU will be one. This is presented in full in our example notebook.
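To illustrate what such a metric might look like, here is a minimal sketch (not the code from the notebook), assuming axis-aligned boxes supplied as (x_min, y_min, x_max, y_max) tuples; bounding_box_iou and mean_iou are hypothetical helper names:
>>> def bounding_box_iou(box_a, box_b):
...     # Corners of the intersection rectangle
...     x_left = max(box_a[0], box_b[0])
...     y_top = max(box_a[1], box_b[1])
...     x_right = min(box_a[2], box_b[2])
...     y_bottom = min(box_a[3], box_b[3])
...     if x_right <= x_left or y_bottom <= y_top:
...         return 0.0  # the boxes are disjoint
...     intersection = (x_right - x_left) * (y_bottom - y_top)
...     area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
...     area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
...     return intersection / (area_a + area_b - intersection)
>>> def mean_iou(y_true, y_pred):
...     # Average the IoU over corresponding pairs of true and predicted boxes
...     return np.mean([bounding_box_iou(t, p) for t, p in zip(y_true, y_pred)])
>>> bounding_box_iou((0, 0, 2, 2), (0, 0, 2, 2))
1.0
>>> bounding_box_iou((0, 0, 1, 1), (2, 2, 3, 3))
0.0
A mean_iou function of this form could then be passed to MetricFrame in exactly the same way as area_metric above.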