In this section, we will describe the steps involved in performing a fairness assessment, and introduce some widely (if occasionally incautiously) used fairness metrics, such as demographic parity and equalized odds. We will show how MetricFrame can be used to evaluate the metrics identified during the course of a fairness assessment.

In the mathematical definitions below, \(X\) denotes a feature vector used for predictions, \(A\) is a single sensitive feature (such as age or race), and \(Y\) is the true label. Fairness metrics are phrased in terms of expectations with respect to the distribution over \((X,A,Y)\). Note that \(X\) and \(A\) may or may not share columns, depending on whether the model is allowed to ‘see’ the sensitive features. When we need to refer to particular values, we use lowercase letters. Since we will be comparing groups identified by the sensitive feature, \(\forall a \in A\) appears regularly to indicate that a property holds for all identified groups.
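To make the notation concrete, the following is a minimal plain-Python sketch of how quantities conditioned on \(A = a\) can be estimated from a sample. It does not use Fairlearn's API. The toy data and the helper `rate_by_group` are hypothetical and introduced only for illustration. `y_true` plays the role of \(Y\), `group` the role of \(A\), and `y_pred` stands for the output of some classifier applied to \(X\).

```python
from collections import defaultdict

# Hypothetical toy sample (not from the Fairlearn documentation).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]                   # Y: true labels
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]                   # predictions made from X
group = ["a", "a", "a", "a", "b", "b", "b", "b"]    # A: sensitive feature


def rate_by_group(values, groups, mask=None):
    """Mean of `values` within each group, optionally restricted by `mask`.

    Hypothetical helper: a sample estimate of E[value | A = a] for every
    group a, or of E[value | A = a, condition] when `mask` is given.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for i, (value, g) in enumerate(zip(values, groups)):
        if mask is not None and not mask[i]:
            continue
        totals[g] += value
        counts[g] += 1
    return {g: totals[g] / counts[g] for g in counts}


# Fraction of positive predictions per group.
selection_rate = rate_by_group(y_pred, group)

# True and false positive rates per group: condition on Y = 1 and Y = 0.
tpr = rate_by_group(y_pred, group, mask=[y == 1 for y in y_true])
fpr = rate_by_group(y_pred, group, mask=[y == 0 for y in y_true])

# Largest between-group gap for each quantity.
gaps = {
    name: max(rates.values()) - min(rates.values())
    for name, rates in [("selection_rate", selection_rate),
                        ("tpr", tpr), ("fpr", fpr)]
}
print(selection_rate, tpr, fpr, gaps)
```

In this toy sample both groups have the same selection rate, while their true and false positive rates differ. A comparison "for all \(a\)" can therefore hold for one quantity and fail for another, which is why the choice of metric matters in an assessment.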

Fairlearn dashboard

The Fairlearn dashboard was a Jupyter notebook widget for assessing how a model’s predictions impact different groups (e.g., different ethnicities), and also for comparing multiple models along different fairness and performance metrics.


The FairlearnDashboard is no longer being developed as part of Fairlearn. For more information on how to use it, refer to microsoft/responsible-ai-toolbox. Fairlearn provides some of the dashboard's functionality through matplotlib-based visualizations; refer to the Plotting section.