Prediction of gestational diabetes based on nationwide electronic health records

ABSTRACT Gestational diabetes mellitus (GDM) poses an increased risk of short- and long-term complications for mother and offspring [1–4]. GDM is typically diagnosed at 24–28 weeks of gestation, but earlier detection is desirable as this may prevent or considerably reduce the risk of adverse pregnancy outcomes [5,6]. Here we used a machine-learning approach to predict GDM on retrospective data of 588,622 pregnancies in Israel for which comprehensive electronic health records were available. Our models predict GDM with high accuracy even at pregnancy initiation (area under the receiver operating characteristic curve (auROC) = 0.85), substantially outperforming a baseline risk score (auROC = 0.68). We validated our results on both a future validation set and a geographical validation set from the most populous city in Israel, Jerusalem, thereby emulating real-world performance. Interrogating our model, we uncovered previously unreported risk factors, including results of previous pregnancy glucose challenge tests. Finally, we devised a simpler model based on just nine questions that a patient could answer, with only a modest reduction in accuracy (auROC = 0.80). Overall, our models may allow early-stage intervention in high-risk women, as well as a cost-effective screening approach that could avoid the need for glucose tolerance tests by identifying low-risk women. Future prospective studies and studies on additional populations are needed to assess the real-world clinical utility of the model.
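The abstract compares models by auROC, which equals the probability that a randomly chosen GDM-positive pregnancy receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that computation using only the Python standard library. It is not the authors' code (which is not provided); the function name `auroc` and the toy labels and scores are hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for a positive case, 0 for a negative case.
    scores: predicted risk; higher means more likely positive.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("auROC needs at least one positive and one negative")

    # Sum the ranks of the positives, giving tied scores their average rank.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


# Hypothetical toy data: a continuous model score and a coarse integer
# risk score evaluated on the same eight pregnancies.
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]
toy_model_scores = [0.05, 0.10, 0.20, 0.65, 0.60, 0.40, 0.80, 0.90]
toy_baseline_scores = [1, 0, 2, 3, 2, 1, 3, 1]

print("model auROC:   ", auroc(toy_labels, toy_model_scores))
print("baseline auROC:", auroc(toy_labels, toy_baseline_scores))
```

In practice, `sklearn.metrics.roc_auc_score` from scikit-learn computes the same quantity; the hand-written version is shown only to make the definition explicit.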

DATA AVAILABILITY The data that support the findings of this study originate

from Clalit Health Services. Restrictions apply to the availability of these data and they are therefore not publicly available. Due to restrictions, these data can be accessed only by request to the authors and/or Clalit Health Services.

CODE AVAILABILITY The code that supports the findings of this study is tailored to the data and the fields of the Clalit Health Services database, and is thus not provided since it is of no use as a standalone without access to the data per se. The algorithmic models used the standard Python code package scikit-learn, which

is publicly available.

REFERENCES

* Lowe, L. P. et al. Hyperglycemia and Adverse Pregnancy Outcome (HAPO) Study: associations of maternal A1C and glucose with pregnancy outcomes. _Diabetes Care_ 35, 574–580 (2012).
* Lowe, W. L. et al. Association of gestational diabetes with maternal disorders of glucose metabolism and childhood adiposity. _JAMA_ 320, 1005–1016 (2018).
* Scholtens, D. M. et al. Hyperglycemia and Adverse Pregnancy Outcome Follow-up Study (HAPO FUS): maternal glycemia and childhood glucose metabolism. _Diabetes Care_ 42, 381–392 (2019).
* Zhao, P. et al. Maternal gestational diabetes and childhood obesity at age 9–11: results of a multinational study. _Diabetologia_ 59, 2339–2348 (2016).
* Koivusalo, S. B. et al. Gestational diabetes mellitus can be prevented by lifestyle intervention: the Finnish gestational diabetes prevention study (RADIEL): a randomized controlled trial. _Diabetes Care_ 39, 24–30 (2016).
* Wang, C. et al. A randomized clinical trial of exercise during pregnancy to prevent gestational diabetes mellitus and improve pregnancy outcome in overweight and obese pregnant women. _Am. J. Obstet. Gynecol._ 216, 340–351 (2017).
* Donovan, P. J. & McIntyre, H. D. Drugs for gestational diabetes. _Aust. Prescr._ 33, 141–144 (2010).
* American Diabetes Association. 2. Classification and diagnosis of diabetes: standards of medical care in diabetes—2018. _Diabetes Care_ 41, S13–S27 (2018).
* Hunt, K. J. & Schuller, K. L. The increasing prevalence of diabetes in pregnancy. _Obstet. Gynecol. Clin. N. Am._ 34, 173–199 (2007).
* Bain, E. et al. Diet and exercise interventions for preventing gestational diabetes mellitus. _Cochrane Database Syst. Rev._ CD010443 https://doi.org/10.1002/14651858.CD010443.pub2 (2015).
* Avati, A. et al. Improving palliative care with deep learning. _BMC Med. Inform. Decis. Mak._ 18(Suppl 4), 122 (2018).
* Silva, I., Moody, G., Scott, D. J., Celi, L. A. & Mark, R. G. Predicting in-hospital mortality of ICU patients: the PhysioNet/Computing in Cardiology Challenge 2012. _Comput. Cardiol. (2010)_ 39, 245–248 (2012).
* Razavian, N., Marcus, J. & Sontag, D. Multi-task prediction of disease onsets from longitudinal lab tests. Preprint at arXiv https://arxiv.org/abs/1608.00647 (2016).
* Oh, J. et al. A generalizable, data-driven approach to predict daily risk of _Clostridium difficile_ infection at two large academic health centers. _Infect. Control Hosp. Epidemiol._ 39, 425–433 (2018).
* Miotto, R., Li, L., Kidd, B. A. & Dudley, J. T. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. _Sci. Rep._ 6, 26094 (2016).
* Danilenko-Dixon, D. R., Van Winter, J. T., Nelson, R. L. & Ogburn, P. L. Universal versus selective gestational diabetes screening: application of 1997 American Diabetes Association recommendations. _Am. J. Obstet. Gynecol._ 181, 798–802 (1999).
* Qiu, H. et al. Electronic health record-driven prediction for gestational diabetes mellitus in early pregnancy. _Sci. Rep._ 7, 16417 (2017).
* Syngelaki, A. et al. First-trimester screening for gestational diabetes mellitus based on maternal characteristics and history. _Fetal Diagn. Ther._ 38, 14–21 (2015).
* US Department of Health and Human Services, National Institutes of Health & Eunice Kennedy Shriver National Institute of Child Health and Human Development. _Am I at Risk for Gestational Diabetes?_ https://www.nichd.nih.gov/sites/default/files/publications/pubs/Documents/gestational_diabetes_2012.pdf (2012).
* Steyerberg, E. W. et al. Assessing the performance of prediction models: a framework for traditional and novel measures. _Epidemiology_ 21, 128–138 (2010).
* Vickers, A. J. & Elkin, E. B. Decision curve analysis: a novel method for evaluating prediction models. _Med. Decis. Making_ 26, 565–574 (2006).
* Lundberg, S. & Lee, S.-I. A unified approach to interpreting model predictions. _Adv. Neural Inf. Proc. Syst._ 30, 4765–4774 (2017).
* Chu, S. Y. et al. Maternal obesity and risk of gestational diabetes mellitus. _Diabetes Care_ 30, 2070–2076 (2007).
* Williams, M. A., Qiu, C., Dempsey, J. C. & Luthy, D. A. Familial aggregation of type 2 diabetes and chronic hypertension in women with gestational diabetes mellitus. _J. Reprod. Med._ 48, 955–962 (2003).
* van Leeuwen, M. et al. Glucose challenge test for detecting gestational diabetes mellitus: a systematic review. _BJOG_ 119, 393–401 (2012).
* Donovan, L. et al. Screening tests for gestational diabetes: a systematic review for the US Preventive Services Task Force. _Ann. Intern. Med._ 159, 115–122 (2013).
* Lamain-de Ruiter, M. et al. External validation of prognostic models to predict risk of gestational diabetes mellitus in one Dutch cohort: prospective multicentre cohort study. _BMJ_ 354, i4338 (2016).
* Lao, T. T., Ho, L.-F., Chan, B. C. P. & Leung, W.-C. Maternal age and prevalence of gestational diabetes mellitus. _Diabetes Care_ 29, 948–949 (2006).
* Di Cianni, G. et al. Prevalence and risk factors for gestational diabetes assessed by universal screening. _Diabetes Res. Clin. Pract._ 62, 131–137 (2003).
* Teh, W. T. et al. Risk factors for gestational diabetes mellitus: implications for the application of screening guidelines. _Aust. N. Z. J. Obstet. Gynaecol._ 51, 26–30 (2011).
* Shepherd, E. et al. Combined diet and exercise interventions for preventing gestational diabetes mellitus. _Cochrane Database Syst. Rev._ 11, CD010443 (2017).
* Davey, R. X. Selective versus universal screening for gestational diabetes mellitus: an evaluation of predictive risk factors. _Medical J. Aust._ 174, 118–121 (2001).
* Kalter-Leibovici, O. et al. Screening and diagnosis of gestational diabetes mellitus: critical appraisal of the new International Association of Diabetes in Pregnancy Study Group recommendations on a national level. _Diabetes Care_ 35, 1894–1896 (2012).
* Phelan, M., Bhavsar, N. A. & Goldstein, B. A. Illustrating informed presence bias in electronic health records data: how patient interactions with a health system can impact inference. _EGEMS (Wash DC)_ 5, 22 (2017).
* Zhang, C. & Ning, Y. Effect of dietary and lifestyle factors on the risk of gestational diabetes: review of epidemiologic evidence. _Am. J. Clin. Nutr._ 94, 1975S–1979S (2011).
* Dudley, D. J. Diabetic-associated stillbirth: incidence, pathophysiology, and prevention. _Clin. Perinatol._ 34, 611–626 (2007). vii.
* _Data_. Clalit Research Institute; http://clalitresearch.org/about-us/our-data/ (accessed 23 July 2019).
* Vandorsten, J. P. et al. NIH consensus development conference: diagnosing gestational diabetes mellitus. _NIH Consens. State Sci. Statements_ 29, 1–31 (2013).
* State of Israel Ministry of Health. _Monitoring of Pregnancy and Medical Examinations During Pregnancy_ https://www.health.gov.il/English/Topics/Pregnancy/during/examination/Pages/permanent.aspx (accessed 23 July 2019).
* Hastie, T., Tibshirani, R. & Friedman, J. _The Elements of Statistical Learning_ (Springer, 2009).
* Fernández-Delgado, M., Cernadas, E., Barro, S. & Amorim, D. Do we need hundreds of classifiers to solve real world classification problems? _J. Mach. Learn. Res._ 15, 3133–3181 (2014).
* Omar, K. _XGBoost and LGBM for Porto Seguro’s Kaggle Challenge: A Comparison Semester Project_ (ETH, 2018).
* Biendata Competitions. _KDD Cup of Fresh Air_ https://biendata.com/competition/kdd_2018/winners/ (accessed 23 July 2019).
* Josse, J., Prost, N., Scornet, E. & Varoquaux, G. On the consistency of supervised learning with missing values. Preprint at arXiv https://arxiv.org/abs/1902.06931 (2019).
* Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In _Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining_ - KDD 2016 (eds Krishnapuram, B. et al.) 785–794 (ACM Press, 2016).
* Ke, G. et al. _LightGBM: A Highly Efficient Gradient Boosting Decision Tree_ https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf (2017).
* CBS. _Regional Statistics Section_ https://www.cbs.gov.il/EN/settlements/Pages/default.aspx?mode=Yeshuv (accessed 10 July 2018).
* Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. _Nat. Biomed. Eng._ 2, 749–760 (2018).

ACKNOWLEDGEMENTS We thank G. Barabash, E. Barkan, I. Kalka and members of the Segal group for discussions. E.S. is supported by the Crown Human Genome Center, by D. L. Schwarz, J. N. Halpern and L. Steinberg, and by grants funded by the

European Research Council and the Israel Science Foundation.

AUTHOR INFORMATION

Author notes
* These authors contributed equally: Nitzan Shalom Artzi, Smadar Shilo, Eran Hadar.

AUTHORS AND AFFILIATIONS
* Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel: Nitzan Shalom Artzi, Smadar Shilo, Hagai Rossman & Eran Segal
* Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel: Nitzan Shalom Artzi, Smadar Shilo, Hagai Rossman & Eran Segal
* Pediatric Diabetes Unit, Ruth Rappaport Children’s Hospital, Rambam Healthcare Campus, Haifa, Israel: Smadar Shilo
* Helen Schneider Hospital for Women, Rabin Medical Center, Petach Tikva, Israel: Eran Hadar, Shiri Barbash-Hazan, Avi Ben-Haroush & Arnon Wiznitzer
* Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel: Eran Hadar, Avi Ben-Haroush & Arnon Wiznitzer
* Clalit Research Institute, Clalit Health Services, Tel Aviv, Israel: Ran D. Balicer & Becca Feldman
* Department of Public Health, Faculty of Health Sciences, Ben-Gurion University, Beer-Sheva, Israel: Ran D. Balicer

CONTRIBUTIONS N.S.A., S.S. and E.H. conceived the project, designed and conducted the

analyses, interpreted the results and wrote the manuscript, and are listed in random order. H.R. conducted the analyses and wrote the manuscript. S.B.-H., A.B.-H., R.D.B. and B.F. interpreted the results. A.W. and E.S. conceived and directed the project and analyses, designed the analyses, interpreted the results, wrote the manuscript and supervised the project.

CORRESPONDING AUTHORS Correspondence to Arnon Wiznitzer or Eran Segal.

ETHICS DECLARATIONS

COMPETING INTERESTS The authors declare no competing interests.

ADDITIONAL INFORMATION

PEER REVIEW INFORMATION Joao Monteiro was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

PUBLISHER’S NOTE

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

EXTENDED DATA

EXTENDED DATA FIG. 1 BASELINE PREDICTION, BASED ON _BASELINE RISK SCORE_. A: Odds ratios for the parameters composing the risk score. Adjusted odds ratios were derived from a logistic regression model; both values are presented for the training set. B: Prevalence among women grouped by risk score. Error bars represent 90% confidence intervals on the training set. C: Histogram of risk scores in the training set. D: ROC curve for the NIH Risk Score and for a logistic regression model trained on its constituent parameters. Results are reported on the future validation set. The logistic regression model does not surpass the naive summation used in the risk score. (n = 82,678 for all panels).

EXTENDED DATA FIG. 2 EVALUATION OF THE MODEL ON THE GEOGRAPHICAL VALIDATION SET. A: Receiver Operating Characteristic (ROC) curve, comparing our model (solid) and the Baseline Risk Score (dashed). Lighter colored lines are ROC curves of stratified partitions of the validation set (not shown in ROC); bracketed values are 95% confidence intervals calculated through a normal fit of those curves. B: Precision-Recall (PR) curve, with the same properties as in A. C: The fraction of GDM-positive samples in every decile of the predicted probability. D: Predictions on different subsets of the cohort. auPR is shown for each subset, for our model (blue) and the baseline score (orange). Error bars show 95% confidence intervals, and dark blue lines show the prevalence in each subset. Shaded area is the distribution of the relevant score. E: Performance by gestational age at prediction. Every point is the evaluation score of a model built only with features available at this time point. (n = 46,002 for panels A-C. Subset sample sizes are listed in panel D).

EXTENDED DATA FIG. 3 EVALUATION OF THE MODEL ON THE GEO-TEMPORAL VALIDATION SET. A: Receiver Operating Characteristic (ROC) curve, comparing our model (solid) and the Baseline Risk Score (dashed). Lighter colored lines are ROC curves of stratified partitions of the validation set; bracketed values are 95% confidence intervals calculated through a normal fit of those curves. B: Precision-Recall (PR) curve, with the same properties as in A. C: The fraction of GDM-positive samples in every decile of the predicted probability. D: Predictions on different subsets of the cohort. auPR is shown for each subset, for our model (blue) and the baseline score (orange). Error bars show 95% confidence intervals, and dark blue lines show the prevalence in each subset. Shaded area is the distribution of the relevant score. E: Performance by gestational age at prediction. Every point is the evaluation score of a model built only with features available at this time point. (n = 8,540 for panels A-C. Subset sample sizes are listed in panel D).

EXTENDED DATA FIG. 4 Evaluation results in different validation sets.

EXTENDED DATA FIG. 5 BASIC UTILITY OF THE PREDICTOR. A: Calibration curve, showing the fraction of positive samples per bin versus the mean predicted probability of the bin. Blue and red bars represent the ratio of negative/positive samples in the bin, respectively. B: Decision curve, showing the net benefit versus the threshold probability, for both predictor and baseline. The predictor outperforms the baseline at all thresholds. (n = 82,678 for all panels).

EXTENDED DATA FIG. 6 ADDITIONAL DEPENDENCE PLOTS. Top 20 features are shown (ordered left to right, top to bottom). In each plot, the mean predicted relative risk is plotted versus feature value. Bands represent the standard deviation of the population in each bin, which is related to interactions between input features. (n = 82,678).

EXTENDED DATA FIG. 7 HISTOGRAM OF LAB TESTS DURING PREGNANCY, SHOWING THE WINDOW DEFINITION OF F0, F1 AND F2. The peaks are weekly and reflect the fact that patients tend to see a doctor on the same day of the week.

ABOUT THIS ARTICLE

CITE THIS ARTICLE Artzi, N.S., Shilo, S., Hadar, E. _et al._ Prediction of gestational diabetes based on nationwide electronic health records. _Nat Med_ 26, 71–76 (2020). https://doi.org/10.1038/s41591-019-0724-8

* Received: 23 July 2019
* Accepted: 26 November 2019
* Published: 13 January 2020
* Issue Date: January 2020
* DOI: https://doi.org/10.1038/s41591-019-0724-8
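The decision curve described for Extended Data Fig. 5B plots net benefit against threshold probability. As a minimal sketch, assuming the standard decision-curve definition (net benefit = TP/n − FP/n × p_t/(1 − p_t), following the Vickers and Elkin method cited in the reference list), the quantity can be computed as below. This is not the authors' code; the function names and the toy labels and probabilities are hypothetical.

```python
from typing import Sequence


def net_benefit(labels: Sequence[int], probs: Sequence[float],
                threshold: float) -> float:
    """Net benefit of flagging every case with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    true_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(labels: Sequence[int], threshold: float) -> float:
    """Reference strategy: flag everyone, regardless of predicted risk."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Hypothetical toy cohort: outcome labels and predicted probabilities.
toy_labels = [1, 0, 1, 0, 0, 0, 1, 0]
toy_probs = [0.90, 0.30, 0.55, 0.10, 0.05, 0.45, 0.20, 0.15]

for pt in (0.1, 0.25, 0.5):
    print(pt, net_benefit(toy_labels, toy_probs, pt),
          treat_all_net_benefit(toy_labels, pt))
```

Evaluating `net_benefit` over a grid of thresholds for each predictor, alongside the flag-everyone reference, yields the curves that a decision-curve plot compares.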