Big Data: Advantages and Shortfalls

As new, resource-intensive interventions are introduced, predicting diabetes progression and end-organ failure becomes a necessity. Risk stratification can be improved either by developing new biomarkers or by using artificial intelligence (AI) algorithms to mine readily available but under-utilized clinical data. Such data reside in Electronic Medical Records (EMRs) and constitute what is commonly termed big data.

We identified 554,110 diabetic patients and 645,077 pre-diabetic individuals (NHS). All had at least two glucose tests and/or one HgbA1C test, a diagnostic code, and/or listed use of a hypoglycemic drug.
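The inclusion criteria can be expressed as a simple record filter. This is a minimal sketch that assumes one reading of the criteria: enough laboratory evidence (two or more glucose tests, or at least one HgbA1C test) AND either a diagnostic code or a listed hypoglycemic drug. The `PatientRecord` fields and the `meets_inclusion` helper are hypothetical and do not describe the study's actual EMR schema.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # Hypothetical, simplified summary of one patient's EMR history.
    glucose_tests: int          # number of glucose results on file
    hba1c_tests: int            # number of HgbA1C results on file
    has_diagnostic_code: bool   # a relevant diagnostic code is recorded
    on_hypoglycemic_drug: bool  # a hypoglycemic drug is listed


def meets_inclusion(p: PatientRecord) -> bool:
    """Assumed reading: lab evidence AND (diagnostic code OR hypoglycemic drug)."""
    has_labs = p.glucose_tests >= 2 or p.hba1c_tests >= 1
    has_clinical_evidence = p.has_diagnostic_code or p.on_hypoglycemic_drug
    return has_labs and has_clinical_evidence
```

With a different reading of the "and/or" clauses, only the boolean expression in `meets_inclusion` would change.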

Two models were trained. The first was trained to identify pre-diabetic individuals likely to progress to diabetes within one year of the index date. The second was trained to identify diabetic patients likely to present with microalbumin above 300 mg/g or eGFR below 45 mL/min/1.73 m² within one year.
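The two prediction targets amount to labels computed over a one-year window after the index date. The sketch below is illustrative only. It assumes "within 1 year" means the 365 days after (but not on) the index date and that lab results arrive as `(date, test_name, value)` tuples. The test names `"microalbumin"` and `"egfr"` and both helper functions are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

WINDOW = timedelta(days=365)  # assumed length of the 1-year outcome window


def in_window(index_date: date, when: date) -> bool:
    # Only events strictly after the index date and within the window count.
    return index_date < when <= index_date + WINDOW


def progressed_to_diabetes(index_date: date, diagnosis_date: Optional[date]) -> bool:
    """Model 1 label: diabetes diagnosed within the window after index_date."""
    return diagnosis_date is not None and in_window(index_date, diagnosis_date)


def kidney_outcome(index_date: date, labs: Iterable[Tuple[date, str, float]]) -> bool:
    """Model 2 label: any microalbumin > 300 mg/g or eGFR < 45 in the window."""
    for when, test, value in labs:
        if not in_window(index_date, when):
            continue
        if test == "microalbumin" and value > 300:
            return True
        if test == "egfr" and value < 45:
            return True
    return False
```

Restricting the window to dates after the index date keeps the label from overlapping the history used to build features.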

The performance of the first model, which drew on tens of signals, including historical lab results, to create over 900 features, was compared with that of a logistic regression model based on sex, age, glucose and HgbA1C. At every sensitivity level, the first model outperformed the logistic model, with a 50-100% increase in positive predictive value (PPV). As expected, the major contributors to its performance were glucose, HgbA1C, BMI, age and sex. Minor contributors included HDL, triglycerides, ALT, WBC, RBC, GGT and drugs.
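One common way to turn tens of longitudinal signals into hundreds of features is to summarize each signal over several look-back windows ending at the index date. The abstract does not describe the authors' feature pipeline. This is a hypothetical sketch: the signal names, window lengths and summary statistics are all assumptions. It only illustrates how the feature count grows as signals × windows × statistics.

```python
from datetime import date, timedelta
from statistics import mean


def window_features(index_date, history, signals, windows_days):
    """Summarize each signal over look-back windows ending at index_date.

    history: hypothetical dict mapping signal name -> list of (date, value).
    Returns len(signals) * len(windows_days) * 5 features.
    """
    feats = {}
    for sig in signals:
        points = sorted(history.get(sig, []))  # chronological order
        for w in windows_days:
            start = index_date - timedelta(days=w)
            # Use only values recorded on or before the index date.
            vals = [v for d, v in points if start <= d <= index_date]
            prefix = f"{sig}_{w}d"
            feats[f"{prefix}_count"] = len(vals)
            feats[f"{prefix}_last"] = vals[-1] if vals else None
            feats[f"{prefix}_mean"] = mean(vals) if vals else None
            feats[f"{prefix}_min"] = min(vals) if vals else None
            feats[f"{prefix}_max"] = max(vals) if vals else None
    return feats
```

Missing values are left as `None` here. Tree-based models often handle them natively, whereas a logistic baseline would need imputation.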

The performance of the second model was likewise compared with that of a logistic regression model based on sex, age, eGFR, creatinine and urinalysis. At every sensitivity level, the second model outperformed the logistic model, with a 35-70% increase in PPV. As expected, the major contributors to its performance were creatinine, eGFR, urinalysis, age and sex. Minor contributors included HDL, triglycerides, albumin, WBC, BMI, glucose, HgbA1C, Hgb and drugs.
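Both comparisons fix sensitivity and compare PPV. The sketch below shows one way to compute PPV at a target sensitivity: rank patients by score and lower the threshold until the target fraction of true cases has been flagged. It is a minimal illustration. Tied scores are broken by sort order rather than grouped, and the scores in the usage example are toy numbers, not study data.

```python
def ppv_at_sensitivity(y_true, scores, target_sensitivity):
    """PPV at the highest threshold whose sensitivity reaches the target."""
    if not 0 < target_sensitivity <= 1:
        raise ValueError("target_sensitivity must be in (0, 1]")
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("need at least one positive case")
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        if tp / positives >= target_sensitivity:
            return tp / (tp + fp)
    return tp / (tp + fp)  # not reached for targets <= 1; kept for safety


# Toy example: two models scored on the same eight patients.
y = [1, 0, 1, 0, 0, 1, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.1, 0.05]
model_b = [0.9, 0.95, 0.4, 0.8, 0.7, 0.3, 0.1, 0.2]
ppv_a = ppv_at_sensitivity(y, model_a, 0.6)
ppv_b = ppv_at_sensitivity(y, model_b, 0.6)
relative_increase = ppv_a / ppv_b - 1
```

In this toy case, model A reaches 2/3 sensitivity with a PPV of 2/3, versus 0.4 for model B. That is a relative PPV increase of about 67%, the same kind of figure the abstract reports.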

Machine learning (ML)-based tools can reduce the number needed to treat and increase the capture of those at risk, thereby reducing morbidity and mortality and supporting cost-effectiveness plans.
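The link between PPV and the number needed to treat (NNT) can be made concrete with a back-of-the-envelope calculation. This sketch assumes two things: every flagged patient is treated, and the flagged group's baseline event rate equals the model's PPV. It also assumes the intervention applies the same relative risk reduction to everyone. The absolute risk reduction is then PPV × RRR and NNT = 1 / (PPV × RRR). The RRR values used below are hypothetical.

```python
def number_needed_to_treat(ppv: float, relative_risk_reduction: float) -> float:
    """NNT among flagged patients under the simplifying assumptions above."""
    if not (0 < ppv <= 1 and 0 < relative_risk_reduction <= 1):
        raise ValueError("ppv and relative_risk_reduction must be in (0, 1]")
    # Baseline risk in the flagged group is assumed to equal the PPV.
    absolute_risk_reduction = ppv * relative_risk_reduction
    return 1 / absolute_risk_reduction


# Hypothetical intervention halving risk (RRR = 0.5).
nnt_baseline = number_needed_to_treat(0.2, 0.5)  # PPV 0.2 -> NNT of about 10
nnt_improved = number_needed_to_treat(0.4, 0.5)  # PPV doubled -> NNT of about 5
```

Under these assumptions, a 100% increase in PPV at fixed sensitivity halves the NNT while capturing the same share of at-risk patients.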

Presented by: Ran Goshen