fairlearn.datasets.fetch_credit_card

fairlearn.datasets.fetch_credit_card(*, cache=True, data_home=None, as_frame=True, return_X_y=False)

Load the ‘Default of Credit Card clients’ dataset (binary classification).

Samples total: 30000

Dimensionality: 23

Features: real

Classes: 2

Source: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

I-Cheng Yeh and Che-hui Lien, “The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients”, Expert Systems with Applications, 36(2), 2473-2480, 2009.

Parameters:
cache : boolean, default=True

Whether to cache downloaded datasets using joblib.

data_home : optional, default: None

Specify another download and cache folder for the datasets. By default, all fairlearn data is stored in ‘~/.fairlearn-data’ subfolders.

as_frame : boolean, default=True

If True, the data is returned as a Pandas DataFrame and the target as a Pandas Series.

If False, the data and the target are returned as NumPy arrays.

Changed in version 0.9.0: Default value changed to True.

return_X_y : boolean, default=False

If True, returns (data.data, data.target).

Otherwise, returns a scikit-learn Bunch object.
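As a minimal sketch of how these parameters combine, the helper below requests the (data, target) pair as pandas objects. The name load_credit_card_xy and its fetch argument are hypothetical and exist only for this illustration; fetch lets a stand-in callable replace the real download, and the real function is imported lazily so the helper can be defined without fairlearn installed.

```python
def load_credit_card_xy(fetch=None, data_home=None):
    """Return (X, y) for the 'Default of Credit Card clients' dataset.

    `fetch` is a hypothetical hook for this sketch: any callable that
    accepts the same keyword arguments as fetch_credit_card.
    """
    if fetch is None:
        # Lazy import: only needed when the real download is requested.
        from fairlearn.datasets import fetch_credit_card as fetch

    # Every parameter after `*` in the signature is keyword-only.
    X, y = fetch(
        cache=True,           # reuse a previously downloaded copy
        data_home=data_home,  # None keeps the default cache folder
        as_frame=True,        # pandas DataFrame / Series
        return_X_y=True,      # a (data, target) tuple instead of a Bunch
    )
    return X, y
```

Calling load_credit_card_xy() with no arguments downloads the dataset on first use, so it needs network access and a writable cache folder.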

Returns:
dataset : sklearn.utils.Bunch

Dictionary-like object, with the following attributes.

data : NumPy array or Pandas DataFrame, shape (30000, 23)

Each row corresponds to the 23 feature values in order. If as_frame is True, data is a Pandas DataFrame.

target : NumPy array or Pandas Series, shape (30000,)

Each value indicates whether an applicant defaulted on a credit loan. If as_frame is True, target is a Pandas Series.

feature_names : list of strings, length 23

Ordered list of the feature names used in the dataset.

DESCR : string

Description of the UCI Default of Credit Card Clients dataset.

categories : dict or None

Maps each categorical feature name to a list of values, such that the value encoded as i is the ith in the list. If as_frame is True, this is None.

frame : pandas DataFrame

Only present when as_frame is True. DataFrame with data and target.

(data, target) : tuple if return_X_y is True
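The attributes listed above can be inspected with a small helper such as the following sketch. The name summarize_bunch is hypothetical, and the helper assumes only that its argument exposes data, feature_names and DESCR as attributes, as the returned Bunch is documented to do. The frame attribute is read defensively because it is only present when as_frame is True.

```python
def summarize_bunch(bunch):
    """Collect a few basic facts about a fetched dataset object."""
    return {
        # len() gives the number of rows for both NumPy arrays and DataFrames.
        "n_samples": len(bunch.data),
        "n_features": len(bunch.feature_names),
        # `frame` may be missing or None, so fall back to None.
        "has_frame": getattr(bunch, "frame", None) is not None,
        # First characters of the dataset description.
        "description_head": bunch.DESCR[:60],
    }
```

Passing the object returned by fetch_credit_card(return_X_y=False) should yield counts that match the documented shape of (30000, 23).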

Notes

Our API largely follows the API of sklearn.datasets.fetch_openml().