Diabetes 130-Hospitals Dataset
Introduction
The Diabetes 130-Hospitals Dataset consists of 10 years' worth of clinical care data from 130 US hospitals and integrated delivery networks [1]. Each record represents the hospital admission of a patient diagnosed with diabetes whose stay lasted between one and fourteen days, and during which laboratory tests were performed and medications were administered. The features describing each encounter include demographics, diagnoses, diabetic medications, the number of visits in the year preceding the encounter, and payer information, as well as whether the patient was readmitted after release and whether the readmission occurred within 30 days of release.
Strack et al. used the data to investigate the impact of HbA1c measurement on hospital readmission rates. The data was collected from the Health Facts database, a national data warehouse of clinical records from hospitals throughout the United States. After completing their research, Strack et al. submitted the dataset to the UCI Machine Learning Repository, making it available for later use.
Dataset Description
The original data can be found in the UCI Repository [2]. This version of the dataset was derived by the Fairlearn team for the SciPy 2021 tutorial “Fairness in AI Systems: From social context to practice using Fairlearn”. In this version, the target variable “readmitted” is binarized to indicate whether the patient was readmitted within thirty days. The full dataset pre-processing script can be found on GitHub. The dataset contains 101,766 rows. Each row describes a patient encounter and contains 25 features, which are described below:
Column name | Description |
---|---|
race | |
gender | |
age | |
discharge_disposition_id | |
admission_source_id | |
time_in_hospital | Integer number of days between admission and discharge. |
medical_specialty | |
num_lab_procedures | Integer number of lab tests performed during the encounter. |
num_procedures | Integer number of procedures (other than lab tests) performed during the encounter. |
num_medications | Integer number of distinct generic names administered during the encounter. |
primary_diagnosis | |
number_diagnoses | Integer number of diagnoses. |
max_glu_serum | |
A1Cresult | |
insulin | |
change | |
diabetesMed | Binary attribute indicating whether any diabetic medication was prescribed. |
medicare | Binary attribute indicating whether the patient had Medicare as insurance. |
medicaid | Binary attribute indicating whether the patient had Medicaid as insurance. |
had_emergency | Binary attribute indicating whether the patient had an emergency in the prior year. |
had_inpatient_days | Binary attribute indicating whether the patient had inpatient days in the prior year. |
had_outpatient_days | Binary attribute indicating whether the patient had outpatient days in the prior year. |
readmitted | |
readmit_binary | Binary attribute indicating whether the patient was readmitted. Can also be used as a target variable. |
The default target label is readmit_30_days. However, the “readmitted” or “readmit_binary” attribute can be used as the target instead, depending on the question of interest.
Column name | Description |
---|---|
readmit_30_days | Binary attribute indicating whether the patient was readmitted within 30 days. |
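The relationship between the three readmission columns can be illustrated with a small sketch. It assumes that “readmitted” keeps the three codes used in the original UCI data ("<30", ">30" and "NO"); that encoding, the function name `binarize_readmission`, and the use of Python booleans are assumptions made for illustration. This is not the Fairlearn pre-processing script, and the stored columns may be encoded differently (for example as 0/1).

```python
def binarize_readmission(readmitted):
    """Derive the two binary labels from a single `readmitted` code.

    Assumes the original UCI coding: "<30" (readmitted within 30 days),
    ">30" (readmitted after more than 30 days) and "NO" (not readmitted).
    """
    # Any readmission at all, regardless of when it happened.
    readmit_binary = readmitted != "NO"
    # Only readmissions that happened within 30 days of release.
    readmit_30_days = readmitted == "<30"
    return readmit_binary, readmit_30_days


# Hypothetical encounters, one per assumed code.
for code in ("<30", ">30", "NO"):
    print(code, binarize_readmission(code))
```

Under this assumption, a patient readmitted after more than 30 days is positive for readmit_binary but negative for readmit_30_days, which is why the choice of target changes the prediction task.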
Using the dataset
The dataset can be loaded via the fairlearn.datasets.fetch_diabetes_hospital() function. By default, the dataset is returned as a pandas.DataFrame.
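A minimal loading sketch follows. It assumes that fairlearn and pandas are installed, that the data can be downloaded on first use, and that the returned object exposes `data` and `target` attributes in the style of scikit-learn dataset loaders. The helper names `select_feature_columns` and `load_diabetes_hospital` are hypothetical. Because “readmitted” and “readmit_binary” are listed among the features but encode the outcome, the sketch drops them from the model inputs; this is a design choice made here, not something the loader does.

```python
# Alternative label columns listed in the feature table. They encode the
# outcome, so leaving them among the model inputs would leak the label.
ALTERNATIVE_TARGETS = ("readmitted", "readmit_binary")


def select_feature_columns(columns, excluded=ALTERNATIVE_TARGETS):
    """Return the column names that remain after dropping `excluded`."""
    return [name for name in columns if name not in excluded]


def load_diabetes_hospital():
    """Fetch the dataset and return (X, y).

    Requires fairlearn and pandas, and typically network access the
    first time the data is fetched.
    """
    # Imported lazily so the helper above works without fairlearn installed.
    from fairlearn.datasets import fetch_diabetes_hospital

    dataset = fetch_diabetes_hospital()
    # Keep only the columns that are not alternative targets.
    X = dataset.data[select_feature_columns(dataset.data.columns)]
    # Default target, described in this document as readmit_30_days.
    y = dataset.target
    return X, y
```

Calling `X, y = load_diabetes_hospital()` would then give a feature table and the default target. To predict a different outcome, one of the alternative target columns can be taken from `dataset.data` instead.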