fairlearn.datasets.fetch_bank_marketing

fairlearn.datasets.fetch_bank_marketing(*, cache=True, data_home=None, as_frame=True, return_X_y=False)

Load the UCI bank marketing dataset (binary classification).

Download it if necessary.

Samples total: 45211

Dimensionality: 16

Features: numeric, categorical

Classes: 2

Source:

  • UCI Repository [1]

  • Paper: Moro et al. [2]

The data relates to direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the client would subscribe to the product (a bank term deposit).

The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

New in version 0.5.0.

Parameters:
cache : bool, default=True

Whether to cache downloaded datasets using joblib.

data_home : str, default=None

Specify another download and cache folder for the datasets. By default, all fairlearn data is stored in ‘~/.fairlearn-data’ subfolders.

as_frame : bool, default=True

If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric, string or categorical). The target is a pandas DataFrame or Series depending on the number of target_columns. The Bunch will contain a frame attribute with the target and the data. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described above.

Changed in version 0.9.0: Default value changed to True.

return_X_y : bool, default=False

If True, returns (data.data, data.target) instead of a Bunch object.
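As a minimal sketch of how these parameters combine, the snippet below wraps the call in a hypothetical helper, `load_bank_marketing`, and adds a second hypothetical helper, `class_balance`, for inspecting the label distribution. Neither helper is part of fairlearn. The sketch assumes fairlearn is installed and that network access is available the first time the data is fetched.

```python
from collections import Counter


def load_bank_marketing(data_home=None):
    """Return (X, y) for the bank marketing data (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require fairlearn.
    from fairlearn.datasets import fetch_bank_marketing

    # All arguments are keyword-only, as in the signature above.
    return fetch_bank_marketing(
        cache=True,           # reuse a previously downloaded copy if present
        data_home=data_home,  # None falls back to the default cache folder
        as_frame=True,        # pandas objects rather than NumPy arrays
        return_X_y=True,      # (data, target) instead of a Bunch
    )


def class_balance(target):
    """Return the fraction of each label in an iterable of labels."""
    counts = Counter(target)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Calling `X, y = load_bank_marketing()` should yield a feature table with the documented shape of (45211, 16), and `class_balance(y)` then gives the share of 'yes' and 'no' labels.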

Returns:
dataset : Bunch

Dictionary-like object, with the following attributes.

data : ndarray, shape (45211, 16)

Each row corresponds to the 16 feature values in order. If as_frame is True, data is a pandas object.

target : numpy array of shape (45211,)

Each value indicates whether the client subscribed to a term deposit: ‘yes’ if the client subscribed and ‘no’ otherwise. If as_frame is True, target is a pandas object.

feature_names : list of length 16

Ordered list of the feature names used in the dataset.

DESCR : string

Description of the UCI bank marketing dataset.

categories : dict or None

Maps each categorical feature name to a list of values, such that the value encoded as i is ith in the list. If as_frame is True, this is None.

frame : pandas DataFrame

Only present when as_frame is True. DataFrame with data and target.

(data, target) : tuple if return_X_y is True
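The attributes listed above can be inspected with a small helper. The sketch below defines a hypothetical function, `summarize_bunch` (not part of fairlearn), and assumes the returned Bunch behaves like a dictionary, as scikit-learn's Bunch does, so that key lookup and `.get` are available.

```python
def summarize_bunch(bunch):
    """Summarize a dict-like dataset object that uses the documented keys."""
    return {
        # Number of feature columns; documented as 16 for this dataset.
        "n_features": len(bunch["feature_names"]),
        # 'frame' is only present when as_frame is True.
        "has_frame": bunch.get("frame") is not None,
        # 'categories' is documented as None when as_frame is True.
        "has_categories": bunch.get("categories") is not None,
    }
```

With fairlearn installed, `summarize_bunch(fetch_bank_marketing())` would be expected to report 16 features and a frame, since as_frame defaults to True.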

Notes

Our API largely follows the API of sklearn.datasets.fetch_openml().

References