fairlearn.datasets.fetch_acs_income
- fairlearn.datasets.fetch_acs_income(*, cache=True, data_home=None, as_frame=True, return_X_y=False, states=None)
Load the ACS Income dataset (regression).
Download it if necessary.
Samples total: 1664500
Dimensionality: 10
Features: numeric, categorical
Target: numeric
Source:
Paper: Ding et al. (2021) [1]
Repository: zykls/folktables
Read more in the User Guide.
New in version 0.8.0.
- Parameters:
- cache: bool, default=True
Whether to cache downloaded datasets using joblib.
- data_home: str, default=None
Specify another download and cache folder for the datasets. By default, all fairlearn data is stored in ‘~/.fairlearn-data’ subfolders.
- as_frame: bool, default=True
If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric, string or categorical). The target is a pandas DataFrame or Series depending on the number of target_columns. The Bunch will contain a frame attribute with the target and the data. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described above.
Changed in version 0.9.0: Default value changed to True.
- return_X_y: bool, default=False
If True, returns (data.data, data.target) instead of a Bunch object.
- states: list, default=None
List of two-letter (capitalized) state abbreviations. If None, data from all 50 US states and Puerto Rico will be returned. Note that Puerto Rico is the only US territory included in this dataset. The state abbreviations and codes can be found on page 1 of the data dictionary at ACS PUMS [2].
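The parameters above can be combined as in the following minimal sketch, which requests features and target for a subset of states. The helper names normalize_states and load_acs_income_subset are hypothetical and not part of fairlearn. The validation only checks the shape of each abbreviation; it does not confirm that the code is a real state. Calling the wrapper downloads the data on first use.

```python
def normalize_states(states):
    """Hypothetical helper: upper-case and sanity-check two-letter abbreviations."""
    if states is None:
        # None asks fetch_acs_income for all 50 US states and Puerto Rico.
        return None
    cleaned = [str(s).strip().upper() for s in states]
    for code in cleaned:
        if len(code) != 2 or not (code.isascii() and code.isalpha()):
            raise ValueError(
                f"expected a two-letter state abbreviation, got {code!r}"
            )
    return cleaned


def load_acs_income_subset(states=("CA", "PR"), data_home=None):
    """Hypothetical wrapper: return (X, y) as pandas objects for the given states."""
    # Imported lazily so normalize_states can be used without fairlearn installed.
    from fairlearn.datasets import fetch_acs_income

    return fetch_acs_income(
        states=normalize_states(states),
        data_home=data_home,
        as_frame=True,
        return_X_y=True,
    )
```

Because every argument of fetch_acs_income is keyword-only, the wrapper passes each one by name.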
- Returns:
- dataset: Bunch
Dictionary-like object, with the following attributes.
- data: ndarray, shape (1664500, 10)
Each row corresponds to the 10 feature values in order. If as_frame is True, data is a pandas object.
- target: numpy array of shape (1664500,)
Integer denoting each person’s annual income. A threshold can be applied as a postprocessing step to frame this as a binary classification problem. If as_frame is True, target is a pandas object.
- feature_names: list of length 10
Array of ordered feature names used in the dataset.
- DESCR: string
Description of the ACSIncome dataset.
- categories: dict or None
Maps each categorical feature name to a list of values, such that the value encoded as i is ith in the list. If as_frame is True, this is None.
- frame: pandas DataFrame
Only present when as_frame is True. DataFrame with data and target.
- (data, target): tuple if return_X_y is True
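As noted for target, the income values can be thresholded to obtain binary labels. The sketch below shows one way to do that. The helper binarize_income is hypothetical, and the 50,000 cut-off is an illustrative assumption rather than a value this page prescribes.

```python
import numpy as np


def binarize_income(target, threshold=50_000):
    """Hypothetical helper: 1 where income is strictly above threshold, else 0."""
    # np.asarray accepts a pandas Series, a NumPy array, or a plain list.
    return (np.asarray(target) > threshold).astype(int)


# Small made-up incomes, used only to show the behaviour without a download.
example_target = [12_000, 50_000, 75_500]
labels = binarize_income(example_target)
```

With the real dataset, the same helper could be applied to the target returned by fetch_acs_income(return_X_y=True). Whether the comparison should be strict is a modelling choice left to the user.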
Notes
Our API largely follows the API of sklearn.datasets.fetch_openml().
References