Diabetes Metab J. 2021 Jul;45(4):515-525. doi: 10.4093/dmj.2020.0081.

Development and Validation of a Deep Learning Based Diabetes Prediction System Using a Nationwide Population-Based Cohort

Affiliations
  • 1 Department of Endocrinology and Metabolism, Kyung Hee University School of Medicine, Seoul, Korea
  • 2 Integrative Research Center for Cerebrovascular and Cardiovascular Diseases, Yonsei University Health System, Yonsei University College of Medicine, Seoul, Korea
  • 3 Yonsei University College of Medicine, Yonsei University Health System, Seoul, Korea
  • 4 Division of Cardiology, Ewha Womans University School of Medicine, Seoul, Korea
  • 5 Division of Cardiology, Severance Cardiovascular Hospital, Yonsei University Health System, Yonsei University College of Medicine, Seoul, Korea

Abstract

Background
Previously developed prediction models for type 2 diabetes mellitus (T2DM) have limited performance. We developed a deep learning (DL) based model using a cohort representative of the Korean population.
Methods
This study was based on the National Health Insurance Service-Health Screening (NHIS-HEALS) cohort of Korea. Overall, 335,302 subjects without T2DM at baseline were included. We developed the model using 80% of the subjects and validated its performance in the remaining 20%. Predictive models for T2DM were constructed using a recurrent neural network with long short-term memory (RNN-LSTM) and the Cox longitudinal summary model. The performance of the two models over a 10-year period was compared using the time-dependent area under the curve (AUC).
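The 80%/20% division of subjects described in the Methods can be illustrated with a minimal sketch. The abstract does not state how the partition was drawn, so the random, subject-level split, the seed, and the function name `split_subjects` are assumptions made for illustration only.

```python
import random


def split_subjects(subject_ids, train_fraction=0.8, seed=0):
    """Randomly partition subject IDs into development and validation sets.

    Assumption: a simple random split at the subject level, so that all
    check-ups of one person fall on the same side of the partition.
    """
    ids = list(subject_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 335,302 subjects without T2DM at baseline (figure taken from the abstract).
N_SUBJECTS = 335_302
dev_ids, val_ids = split_subjects(range(N_SUBJECTS))
print(len(dev_ids), len(val_ids))
```

Splitting by subject rather than by individual check-up keeps repeated measurements from the same person out of both sets at once, which would otherwise inflate the apparent validation performance.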
Results
During a mean follow-up of 10.4±1.7 years, the mean frequency of periodic health check-ups was 2.9±1.0 per subject. During the observation period, T2DM was newly observed in 8.7% of the subjects. In each year of follow-up, the model created using the RNN-LSTM network outperformed the Cox model. The risk factors for T2DM derived from the two models were similar, although certain results differed.
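For orientation, the reported percentages can be converted into approximate counts. The sketch below is back-of-the-envelope arithmetic on the rounded figures in the abstract, not a result reported by the authors; in particular, the crude rate assumes that the mean follow-up of 10.4 years applies to the whole cohort.

```python
# Figures quoted in the abstract.
n_subjects = 335_302        # subjects without T2DM at baseline
incidence_fraction = 0.087  # 8.7% developed T2DM (rounded in the source)
mean_followup_years = 10.4  # mean follow-up

# Approximate number of incident cases implied by the rounded percentage.
approx_cases = round(n_subjects * incidence_fraction)

# Crude incidence per 1,000 person-years (assumption: person-time is
# approximated as n_subjects * mean_followup_years).
person_years = n_subjects * mean_followup_years
rate_per_1000_py = approx_cases / person_years * 1000

print(f"approx. incident cases: {approx_cases}")
print(f"approx. crude incidence: {rate_per_1000_py:.1f} per 1,000 person-years")
```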
Conclusion
The DL-based T2DM prediction model, constructed using a cohort representative of the population, performs better than the conventional model. After pilot testing, this model will be provided to all recipients of the Korean national health screening.

Keywords

Diabetes mellitus, type 2; Mass screening; Prediabetic state; Prediction

Figure

  • Fig. 1. Area under the curve (AUC) by year for the Cox longitudinal summary model (Cox Ls) and deep learning (DL) model.
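An AUC reported by year, as in Fig. 1, can be understood through a simplified sketch: at each horizon, subjects who developed T2DM by that year are treated as cases, subjects still event-free and under observation after that year as controls, and the AUC is the fraction of case-control pairs in which the case received the higher predicted risk. This is an illustration only. It ignores censoring weights, it is not the estimator used in the paper, and the function name `auc_at_horizon` and the toy data are hypothetical.

```python
def auc_at_horizon(risk, time, event, horizon):
    """Simplified cumulative/dynamic AUC at a given follow-up horizon.

    Cases: event observed at or before `horizon`.
    Controls: still under observation (event-free) after `horizon`.
    Subjects censored before `horizon` are dropped (no censoring weights).
    """
    cases = [r for r, t, e in zip(risk, time, event) if e and t <= horizon]
    controls = [r for r, t in zip(risk, time) if t > horizon]
    if not cases or not controls:
        return float("nan")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data: predicted risk, follow-up time (years), event flag.
risk = [0.9, 0.8, 0.3, 0.2, 0.95, 0.1]
time = [2.0, 5.0, 10.0, 10.0, 4.0, 10.0]
event = [1, 1, 0, 0, 0, 0]

for year in (3, 6):
    print(year, auc_at_horizon(risk, time, event, year))
```

Evaluating the same function at each year from 1 to 10 for two sets of predicted risks would give two curves of the kind compared in the figure.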


Cited by 1 article

Development of Various Diabetes Prediction Models Using Machine Learning Techniques
Juyoung Shin, Jaewon Kim, Chanjung Lee, Joon Young Yoon, Seyeon Kim, Seungjae Song, Hun-Sung Kim
Diabetes Metab J. 2022;46(4):650-657. doi: 10.4093/dmj.2021.0115.

