fairlearn.datasets.fetch_credit_card

fairlearn.datasets.fetch_credit_card(*, cache=True, data_home=None, as_frame=True, return_X_y=False)

Load the ‘Default of Credit Card clients’ dataset (binary classification).

Samples total	30000
Dimensionality	23
Features	real
Classes	2

Source: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

Reference: I-Cheng Yeh and Che-hui Lien, “The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients”, Expert Systems with Applications, 36(2), 2473-2480, 2009.

Parameters:

cache : boolean, default=True

Whether to cache downloaded datasets using joblib.

data_home : optional, default=None

Specify another download and cache folder for the datasets. By default, the data is stored in subfolders of ‘~/.fairlearn-data’.
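The default location described above can be sketched as follows. This is an illustrative helper, not fairlearn's own code: `resolve_data_home` is a hypothetical name, and it assumes that a data_home of None maps to the `.fairlearn-data` folder in the user's home directory, as the description states.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_data_home(data_home: Optional[Union[str, Path]] = None) -> Path:
    """Return the folder used for downloads and the cache (hypothetical helper).

    Assumption: mirrors the documented default, i.e. None means
    '~/.fairlearn-data'; any other value is used as given.
    """
    if data_home is None:
        return Path.home() / ".fairlearn-data"
    # Expand a leading '~' so user-supplied paths behave like the default.
    return Path(data_home).expanduser()
```

To keep the download somewhere else, pass the folder explicitly, for example `fetch_credit_card(data_home="/tmp/my-cache")`.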

as_frame : boolean, default=True

If True: the data is returned as a pandas DataFrame and the target as a pandas Series; the Bunch's frame attribute then holds both.
If False: the data and target are returned as NumPy arrays.

Changed in version 0.9.0: Default value changed to True.

return_X_y : boolean, default=False

If True: returns the tuple (data.data, data.target) instead of a Bunch.
If False: returns a scikit-learn Bunch object.
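Because the return type depends on return_X_y, code that may receive either form can normalize it. The sketch below is illustrative only: `split_xy` is a hypothetical helper, and `SimpleNamespace` stands in for a scikit-learn Bunch, which likewise exposes data and target as attributes.

```python
from types import SimpleNamespace
from typing import Any, Tuple


def split_xy(result: Any) -> Tuple[Any, Any]:
    """Return (X, y) from either output form of fetch_credit_card.

    Hypothetical helper: a tuple is assumed to come from return_X_y=True;
    anything else is treated as a Bunch-like object with .data and .target.
    """
    if isinstance(result, tuple):
        X, y = result
        return X, y
    return result.data, result.target


# Stand-ins for the two return forms (toy values, not the real dataset).
as_tuple = ([[1, 2], [3, 4]], [0, 1])
as_bunch = SimpleNamespace(data=[[1, 2], [3, 4]], target=[0, 1])
```

With the real loader, `split_xy(fetch_credit_card(return_X_y=True))` and `split_xy(fetch_credit_card())` should yield the same (X, y) pair, since both calls use the same as_frame default.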

Returns:

dataset : sklearn.utils.Bunch

Dictionary-like object, with the following attributes.

data : NumPy array or pandas DataFrame, shape (30000, 23): Each row contains the 23 feature values of one client, in order. If as_frame is True, data is a pandas DataFrame.
target : NumPy array or pandas Series, shape (30000,): Each value indicates whether the client defaulted on their credit card payment. If as_frame is True, target is a pandas Series.
feature_names : list of str, length 23: Ordered list of the feature names used in the dataset.
DESCR : str: Description of the UCI Default of Credit Card clients dataset.
categories : dict or None: Maps each categorical feature name to a list of values, such that the value encoded as i is the i-th entry in the list. If as_frame is True, this is None.
frame : pandas DataFrame: Only present when as_frame is True. DataFrame containing both data and target.
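The encoding rule for categories (the value encoded as i is the i-th entry of the list) can be illustrated with a small decoding helper. The helper name `decode_column`, the feature name, and the category values below are hypothetical placeholders, not the dataset's real categories; with the real loader, the mapping would come from the categories attribute of a Bunch returned with as_frame=False, if the dataset defines any categorical features.

```python
from typing import Dict, List, Sequence


def decode_column(codes: Sequence[float], values: List[str]) -> List[str]:
    """Map integer codes back to category labels: code i -> values[i]."""
    # Codes may arrive as floats in a numeric array, so cast to int first.
    return [values[int(code)] for code in codes]


# Hypothetical mapping in the documented shape: feature name -> list of values.
categories: Dict[str, List[str]] = {"feature_a": ["low", "medium", "high"]}

encoded = [2, 0, 1, 1]
decoded = decode_column(encoded, categories["feature_a"])
```

Here `decoded` holds the labels in the same order as the codes, so code 2 maps to the third listed value.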

(data, target) : tuple, returned instead of the Bunch if return_X_y is True

Notes

Our API largely follows the API of sklearn.datasets.fetch_openml().