fairlearn.datasets.fetch_acs_income

fairlearn.datasets.fetch_acs_income(*, cache=True, data_home=None, as_frame=True, return_X_y=False, states=None)

Load the ACS Income dataset (regression).

Download it if necessary.
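A minimal loading sketch follows. The wrapper name `load_acs_income_xy` and its `fetch` argument are hypothetical, not part of fairlearn. The `fetch` argument exists only so the wrapper can be exercised without network access. By default, the wrapper calls `fetch_acs_income` itself, using only the keyword arguments documented on this page.

```python
def load_acs_income_xy(states=None, fetch=None):
    """Hypothetical wrapper: return (X, y) for the given states.

    `fetch` is an illustrative injection point; when omitted, fairlearn's
    fetch_acs_income is used (downloading on first call, then caching).
    """
    if fetch is None:
        # Lazy import: fairlearn is only needed when actually loading data.
        from fairlearn.datasets import fetch_acs_income as fetch
    # return_X_y=True returns (data, target) instead of a Bunch; with the
    # default as_frame=True both are pandas objects.
    X, y = fetch(states=states, return_X_y=True)
    return X, y
```

For example, `X, y = load_acs_income_xy(states=["CA", "TX"])` would load only the California and Texas rows. With the default `cache=True`, later calls can reuse the cached copy.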

Samples total     1664500
Dimensionality    10
Features          numeric, categorical
Target            numeric

Source: the paper by Ding et al. (2021) [1] and its corresponding repository, zykls/folktables.

Read more in the User Guide.

New in version 0.8.0.

Parameters
  • cache (bool, default=True) – Whether to cache downloaded datasets using joblib.

  • data_home (str, default=None) – Specify another download and cache folder for the datasets. By default, all fairlearn data is stored in ‘~/.fairlearn-data’ subfolders.

  • as_frame (bool, default=True) –

    If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric, string or categorical). The target is a pandas DataFrame or Series depending on the number of target columns. The Bunch will contain a frame attribute with the target and the data. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described above.

    Changed in version 0.9.0: Default value changed to True.

  • return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object.

  • states (list, default=None) – List containing two-letter (capitalized) state abbreviations. If None, data from all 50 US states and Puerto Rico will be returned. Note that Puerto Rico is the only US territory included in this dataset. The state abbreviations and codes can be found on page 1 of the data dictionary at ACS PUMS [2].
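The expected format for `states` can be illustrated with a small input-normalization helper. `ACS_STATE_CODES` and `normalize_states` are hypothetical names, not part of fairlearn. The code set mirrors the coverage stated above (the 50 states plus Puerto Rico). It is an assumption that fairlearn accepts exactly these codes, so consult the data dictionary [2] for the authoritative list.

```python
# Assumption: the 50 US states plus Puerto Rico, mirroring the documented coverage.
ACS_STATE_CODES = frozenset(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY PR".split()
)


def normalize_states(states):
    """Hypothetical helper: coerce user input into two-letter, capitalized codes."""
    if states is None:
        return None  # None keeps the default: all states and Puerto Rico
    normalized = [s.strip().upper() for s in states]
    unknown = [s for s in normalized if s not in ACS_STATE_CODES]
    if unknown:
        raise ValueError(f"Unknown state abbreviation(s): {unknown}")
    return normalized
```

With such a helper, a call could look like `fetch_acs_income(states=normalize_states(["ca", "ny"]))`.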

Returns

  • dataset (Bunch) – Dictionary-like object, with the following attributes.

    data : ndarray, shape (1664500, 10)

    Each row corresponds to the 10 feature values in order. If as_frame is True, data is a pandas DataFrame.

    target : numpy array of shape (1664500,)

    Integer denoting each person’s annual income. A threshold can be applied as a postprocessing step to frame this as a binary classification problem. If as_frame is True, target is a pandas object.

    feature_names : list of length 10

    Ordered list of the feature names used in the dataset.

    DESCR : string

    Description of the ACSIncome dataset.

    categories : dict or None

    Maps each categorical feature name to a list of values, such that the value encoded as i is the i-th entry in the list. If as_frame is True, this is None.

    frame : pandas DataFrame

    Only present when as_frame is True. DataFrame with data and target.

  • (data, target) (tuple if return_X_y is True)
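The thresholding step described for target can be sketched as follows. `binarize_income` is a hypothetical helper. The default cutoff of 50,000 is an assumption borrowed from the folktables ACSIncome task, which labels a person positive when income is strictly above $50,000. Choose the threshold that fits your use case.

```python
import numpy as np


def binarize_income(target, threshold=50_000):
    """Hypothetical helper: map annual income to a 0/1 label.

    1 means income strictly above `threshold`. Works on a NumPy array
    (as_frame=False) or a pandas Series (as_frame=True, index preserved).
    """
    return (target > threshold).astype(int)


# Made-up incomes, for illustration only.
incomes = np.array([20_000, 50_000, 75_000])
print(binarize_income(incomes))  # [0 0 1]
```

On the real data, this would be applied after loading, for example `y_binary = binarize_income(y)`.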
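When as_frame is False, the encoding described for categories (the value encoded as i is the i-th list entry) can be inverted with a simple lookup. `decode_categorical` is a hypothetical helper. The `COLOR` mapping is made-up example data, not an actual ACS feature. The sketch assumes the feature of interest is a key of `categories`.

```python
def decode_categorical(codes, categories, feature):
    """Hypothetical helper: turn integer codes back into category labels.

    `categories` is the Bunch's categories dict (it is None when
    as_frame=True); the value encoded as i is the i-th list entry.
    """
    labels = categories[feature]
    # int() also accepts float codes, e.g. values read from a float ndarray.
    return [labels[int(code)] for code in codes]


# Made-up mapping and codes, for illustration only.
example_categories = {"COLOR": ["red", "green", "blue"]}
print(decode_categorical([2, 0, 1.0], example_categories, "COLOR"))
# ['blue', 'red', 'green']
```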

Notes

Our API largely follows the API of sklearn.datasets.fetch_openml().

References

[1] Ding, F., Hardt, M., Miller, J., & Schmidt, L. (2021). “Retiring Adult: New Datasets for Fair Machine Learning.” Advances in Neural Information Processing Systems, 34.

[2] “2018 ACS PUMS Data Dictionary.” United States Census Bureau.