- fairlearn.datasets.fetch_boston(*, cache=True, data_home=None, as_frame=True, return_X_y=False, warn=True)
Load the boston housing dataset (regression).
Download it if necessary.
Target: real, 5. - 50.
Source: OpenML  Paper: D. Harrison (1978) 
The Boston house-price data of D. Harrison and D. L. Rubinfeld.
Referenced in Belsley, Kuh & Welsch, ‘Regression diagnostics…’, Wiley, 1980.
This dataset has known fairness issues. There is a “lower status of population” (LSTAT) column that you need to look out for, and a column (B) that is derived from the proportion of people with a black skin color who live in a neighborhood. See the references at the bottom for more detailed information.
Here’s a table of all the variables in order:
CRIM – per capita crime rate by town
ZN – proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS – proportion of non-retail business acres per town
CHAS – Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX – nitric oxides concentration (parts per 10 million)
RM – average number of rooms per dwelling
AGE – proportion of owner-occupied units built prior to 1940
DIS – weighted distances to five Boston employment centres
RAD – index of accessibility to radial highways
TAX – full-value property-tax rate per $10,000
PTRATIO – pupil-teacher ratio by town
B – 1000(Bk - 0.63)^2 where Bk is the proportion of Black people by town
LSTAT – % lower status of the population
MEDV – median value of owner-occupied homes in $1000’s
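The formula behind the B column can be worked through directly. Because the proportion is squared around 0.63, two different proportions map to the same encoded value, so Bk cannot be recovered uniquely from B. The sketch below only restates the formula from the table; `encode_b` is a hypothetical helper name and is not part of fairlearn.

```python
import math


def encode_b(bk: float) -> float:
    """Apply the B column formula 1000 * (Bk - 0.63)^2 to a proportion Bk in [0, 1]."""
    if not 0.0 <= bk <= 1.0:
        raise ValueError("Bk is a proportion and must lie in [0, 1]")
    return 1000.0 * (bk - 0.63) ** 2


# Two proportions equally far from 0.63 produce the same B value,
# so the transformation is not invertible.
low, high = encode_b(0.53), encode_b(0.73)
assert math.isclose(low, high)

# The largest encoded value occurs at Bk = 0, not at Bk = 1.
assert encode_b(0.0) > encode_b(1.0)
```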
Read more in the User Guide.
New in version 0.5.0.
cache (bool, default=True) – Whether to cache downloaded datasets using joblib.
data_home (str, default=None) – Specify another download and cache folder for the datasets. By default, all fairlearn data is stored in ‘~/.fairlearn-data’ subfolders.
as_frame (bool, default=True) – If True, the data is a pandas DataFrame including columns with appropriate dtypes (numeric, string or categorical). The target is a pandas DataFrame or Series depending on the number of target_columns. The Bunch will contain a frame attribute with the target and the data. If return_X_y is True, then (data, target) will be pandas DataFrames or Series as described above.
Changed in version 0.9.0: Default value changed to True.
return_X_y (bool, default=False) – If True, returns (data.data, data.target) instead of a Bunch object.
warn (bool, default=True) – If True, it raises an extra warning to make users aware of the unfairness aspect of this dataset.
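A minimal usage sketch of the parameters above, assuming fairlearn is installed and the dataset can be downloaded on first use. The wrapper name `load_boston_xy` is hypothetical; only `fetch_boston` and its keyword arguments come from the signature on this page.

```python
def load_boston_xy(data_home=None):
    """Return (X, y) for the Boston housing data as pandas objects."""
    # Imported lazily so that defining this helper does not require fairlearn.
    from fairlearn.datasets import fetch_boston

    # return_X_y=True yields (data, target) instead of a Bunch;
    # warn=True keeps the fairness warning visible to the caller.
    X, y = fetch_boston(
        cache=True,
        data_home=data_home,
        as_frame=True,
        return_X_y=True,
        warn=True,
    )
    return X, y
```

Calling `X, y = load_boston_xy()` triggers the download (or reads the cache) and, per the description on this page, should yield 506 rows with 13 feature columns.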
Returns: (Bunch) – Dictionary-like object, with the following attributes.
- data ndarray, shape (506, 13)
Each row corresponds to the 13 feature values in order. If as_frame is True, data is a pandas object.
- target numpy array of shape (506,)
Each value corresponds to the median value of owner-occupied homes in units of $1,000. If as_frame is True, target is a pandas object.
- feature_names list of length 13
Array of ordered feature names used in the dataset.
- DESCR string
Description of the Boston housing dataset.
- categories dict or None
Maps each categorical feature name to a list of values, such that the value encoded as i is ith in the list. If as_frame is True, this is None.
- frame pandas DataFrame
Only present when as_frame is True. DataFrame with data and target.
(data, target) (tuple if return_X_y is True)
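To show how the documented attributes relate to one another, here is a small sketch that summarizes a Bunch-like object. `summarize_bunch` is a hypothetical helper, and the stand-in object below is filled with zeros and placeholder names purely for illustration; it is not the real dataset.

```python
from types import SimpleNamespace

import numpy as np


def summarize_bunch(bunch):
    """Collect the shapes of the documented attributes of a Bunch-like object."""
    return {
        "data_shape": tuple(bunch.data.shape),
        "target_shape": tuple(bunch.target.shape),
        "n_feature_names": len(bunch.feature_names),
        # `frame` is only present when as_frame is True.
        "has_frame": getattr(bunch, "frame", None) is not None,
    }


# Stand-in with the documented shapes (zeros, NOT the real data).
fake = SimpleNamespace(
    data=np.zeros((506, 13)),
    target=np.zeros(506),
    feature_names=[f"f{i}" for i in range(13)],
)
summary = summarize_bunch(fake)
assert summary["data_shape"] == (506, 13)
assert summary["target_shape"] == (506,)
```

The same helper can be pointed at the object returned by `fetch_boston(return_X_y=False)` to check it against the shapes listed above.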
Our API largely follows the API of sklearn.datasets.fetch_openml(). This dataset consists of 506 samples and 13 features. It is notorious for the fairness issues related to the B column. There’s more information in the references.
J. Vanschoren, “boston,” OpenML, 29-Sep-2014. [Online]. Available: https://www.openml.org/d/531.
D. Harrison and D. L. Rubinfeld, “Hedonic housing prices and the demand for clean air,” Journal of Environmental Economics and Management, vol. 5, no. 1, pp. 81–102, Mar. 1978.
D. A. Belsley, E. Kuh, and R. E. Welsch, Regression diagnostics: identifying influential data and sources of collinearity. Hoboken, NJ: Wiley-Interscience, 1980.
J. Sykes, “- B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town · Issue #16155 · scikit-learn/scikit-learn,” GitHub, 18-Jan-2020. [Online]. Available: https://github.com/scikit-learn/scikit-learn/issues/16155.
M. Carlisle, “racist data destruction?,” Medium, 13-Jun-2019. [Online]. Available: https://medium.com/@docintangible/racist-data-destruction-113e3eff54a8.