fairlearn.datasets.fetch_acs_income
- fairlearn.datasets.fetch_acs_income(*, cache=True, data_home=None, as_frame=True, return_X_y=False, states=None)
- Load the ACS Income dataset (regression).
- Download it if necessary.
- Samples total: 1664500
- Dimensionality: 10
- Features: numeric, categorical
- Target: numeric
- Source:
  - Paper: Ding et al. (2021) [1]
  - Repository: zykls/folktables
- Read more in the User Guide.
- New in version 0.8.0.
- Parameters:
- cache (bool, default=True) – Whether to cache downloaded datasets using joblib. 
- data_home (str, default=None) – Specify another download and cache folder for the datasets. By default, all fairlearn data is stored in ‘~/.fairlearn-data’ subfolders. 
- as_frame (bool, default=True) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric, string or categorical). The target is a pandas DataFrame or Series depending on the number of target_columns. The Bunch will contain a frame attribute with the target and the data. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described above. Changed in version 0.9.0: Default value changed to True.
- return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object.
- states (list, default=None) – List of two-letter (capitalized) state abbreviations. If None, data from all 50 US states and Puerto Rico will be returned. Note that Puerto Rico is the only US territory included in this dataset. The state abbreviations and codes can be found on page 1 of the data dictionary at ACS PUMS [2].
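The parameters above can be combined as in the following minimal sketch, which loads only a few states and returns `(data, target)` directly. The names `normalize_states` and `load_acs_income` are hypothetical helpers introduced here for illustration and are not part of fairlearn; the validation rule is only an assumption based on the documented "two-letter, capitalized" format.

```python
def normalize_states(states):
    """Upper-case and sanity-check state abbreviations (hypothetical helper)."""
    if states is None:
        # None asks fetch_acs_income for all 50 states plus Puerto Rico.
        return None
    cleaned = [s.strip().upper() for s in states]
    bad = [s for s in cleaned if len(s) != 2 or not s.isalpha()]
    if bad:
        raise ValueError(f"Expected two-letter state abbreviations, got: {bad}")
    return cleaned


def load_acs_income(states=None, data_home=None):
    """Fetch (data, target) as pandas objects for the requested states."""
    # Imported lazily so normalize_states stays usable without fairlearn installed.
    from fairlearn.datasets import fetch_acs_income

    return fetch_acs_income(
        states=normalize_states(states),
        data_home=data_home,  # None keeps the default ~/.fairlearn-data location
        return_X_y=True,      # return (data, target) instead of a Bunch
    )
```

A call such as `X, y = load_acs_income(states=["ca", "pr"])` would then download (or read from cache) only the California and Puerto Rico records.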
 
- Returns:
- dataset (Bunch) – Dictionary-like object, with the following attributes.
  - data (ndarray, shape (1664500, 10)) – Each row corresponds to the 10 feature values in order. If as_frame is True, data is a pandas object.
  - target (numpy array of shape (1664500,)) – Integer denoting each person’s annual income. A threshold can be applied as a postprocessing step to frame this as a binary classification problem. If as_frame is True, target is a pandas object.
  - feature_names (list of length 10) – Array of ordered feature names used in the dataset.
  - DESCR (string) – Description of the ACSIncome dataset.
  - categories (dict or None) – Maps each categorical feature name to a list of values, such that the value encoded as i is ith in the list. If as_frame is True, this is None.
  - frame (pandas DataFrame) – Only present when as_frame is True. DataFrame with data and target.
 
- (data, target) (tuple if return_X_y is True)
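The thresholding step mentioned for the target can be sketched as follows. The helper name `binarize_income` is hypothetical, and the default of 50,000 is an illustrative assumption rather than a value stated on this page; choose the threshold that suits your task.

```python
def binarize_income(target, threshold=50_000):
    """Label each income 1 if it is strictly above `threshold`, else 0."""
    if hasattr(target, "gt"):
        # pandas Series (the as_frame=True case): vectorized comparison.
        return target.gt(threshold).astype(int)
    # Plain sequences or NumPy arrays: fall back to a Python list of 0/1 labels.
    return [int(value > threshold) for value in target]
```

With the real data, this would be applied after `data, target = fetch_acs_income(return_X_y=True)` as `y_binary = binarize_income(target)`, leaving the features untouched.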
 
- Notes
  - Our API largely follows the API of sklearn.datasets.fetch_openml().
- References