Diabetes Metab J. 2022 Jul;46(4):650-657. doi: 10.4093/dmj.2021.0115.

Development of Various Diabetes Prediction Models Using Machine Learning Techniques

Affiliations
  • 1 Health Promotion Center, Seoul St. Mary’s Hospital, Seoul, Korea
  • 2 Department of Endocrinology and Metabolism, College of Medicine, The Catholic University of Korea, Seoul, Korea
  • 3 LifeSemantics Corp., Seoul, Korea
  • 4 Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea

Abstract

Background
There are many models for predicting diabetes mellitus (DM), but their clinical implication remains vague. Therefore, we aimed to create various DM prediction models using easily accessible health screening test parameters.
Methods
Two sets of variables were used to develop eight DM prediction models (four model designs, each trained on both variable sets). The first set comprised 62 commonly used, easily accessible examination variables from a tertiary university hospital. The second set comprised the 27 of those 62 variables that are included in the national routine health checkups. Gradient boosting and random forest algorithms were used to develop the models. Internal validation was performed using stratified 10-fold cross-validation.
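The validation step described above can be sketched in a few lines. This is a minimal, standard-library-only illustration of stratified k-fold splitting and ROC-AUC scoring, not the authors' pipeline: the abstract does not say which software was used, and the function names `stratified_kfold` and `roc_auc` and the toy labels are assumptions made for this example. Model fitting is left out; in practice a library such as scikit-learn (`StratifiedKFold`, `GradientBoostingClassifier`, `RandomForestClassifier`, `roc_auc_score`) would usually fill that role.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's indices round-robin so every fold gets its share.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute ROC-AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy cohort: 20 cases (1) and 80 non-cases (0).
    toy_labels = [1] * 20 + [0] * 80
    for train_idx, test_idx in stratified_kfold(toy_labels, k=10, seed=1):
        n_cases = sum(toy_labels[i] for i in test_idx)
        print(len(train_idx), len(test_idx), n_cases)
```

Stratification matters here because diabetic subjects are a small minority of the cohort; without it, some test folds could contain very few cases and yield unstable ROC-AUC estimates.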
Results
The area under the receiver operating characteristic curve (ROC-AUC) for the 62-variable DM model making 12-month predictions for subjects without diabetes was the largest (0.928) among those of the eight DM prediction models. The ROC-AUC dropped by more than 0.04 when training with the simplified 27-variable set but still showed fairly good performance with ROC-AUCs between 0.842 and 0.880. The accuracy was up to 11.5% higher (from 0.714 to 0.807) when fasting glucose was included.
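As a reading aid, the two accuracy values quoted above (0.714 and 0.807) can be compared in absolute and relative terms. The sketch below is plain arithmetic; the helper name `accuracy_gap` is hypothetical. The abstract does not state how the 11.5% figure was computed, and the observation that it matches the gap taken relative to the higher accuracy is an inference from these numbers, not a statement by the authors.

```python
def accuracy_gap(lower, higher):
    """Return the gap between two accuracies in absolute and relative terms."""
    absolute = higher - lower
    return {
        "absolute_points": absolute,              # difference in accuracy
        "relative_to_lower": absolute / lower,    # gain as a share of the lower value
        "relative_to_higher": absolute / higher,  # gap as a share of the higher value
    }


gap = accuracy_gap(0.714, 0.807)
for name, value in gap.items():
    print(f"{name}: {value:.3f}")
```

Under this reading, the gap is about 9.3 percentage points, about 13.0% relative to the lower accuracy, and about 11.5% relative to the higher one.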
Conclusion
We created easily applicable diabetes prediction models that deliver good performance using parameters commonly assessed during tertiary university hospital and national routine health checkups. We plan to perform prospective external validation, hoping that the developed DM prediction models will be widely used in clinical practice.

Keywords

Diabetes mellitus; Electronic health records; Machine learning; Probability; Risk assessment

Figures

  • Fig. 1. Design of the four diabetes prediction models and selection of study subjects. The medical records contained 3,952 diabetic and 134,691 non-diabetic individuals. Model-1 and Model-2 were 2- and 1-year prediction models, respectively, for non-diabetic subjects. Subjects with data from the 24 months before diabetes mellitus (DM) diagnosis were included in Model-1 (752 diabetic and 26,175 non-diabetic individuals). Subjects with data from the 12 months before DM diagnosis were included in Model-2 (641 diabetic and 33,380 non-diabetic individuals). Model-3 and Model-4 were 1-year prediction models for prediabetic subjects; Model-4 was constructed after learning the difference between the data from 1 and 2 years before DM diagnosis. From the Model-2 subjects, those who were prediabetic in the previous 12 months were selected for Model-3 (519 diabetic and 6,345 prediabetic individuals). From the Model-3 subjects, those with data from the previous 24 months were selected for Model-4 (281 diabetic and 3,814 prediabetic individuals). Non-diabetic subjects were randomly selected from subjects without diabetes according to the design of each model. The number of non-diabetic or prediabetic subjects was adjusted to be the same in each model. Gradient boosting algorithms were used for Model-1, Model-2, and Model-3, and a random forest algorithm was used for Model-4.

  • Fig. 2. Variable importance in the simplified diabetes risk prediction models with 27 variables. The Gini importance of the 27 variables is presented in reference to that of fasting blood glucose, which was set as 1.0. BMI, body mass index; GTP, glutamyl transpeptidase; ALT, alanine aminotransferase; LDL, low-density lipoprotein; HDL, high-density lipoprotein; BP, blood pressure; AST, aspartate transaminase; DM, diabetes mellitus.
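The scaling described in the Fig. 2 caption, where each variable's Gini importance is expressed relative to that of fasting blood glucose (set to 1.0), amounts to dividing every importance by the reference value. A minimal sketch follows; the function name `relative_importance`, the variable keys, and the raw importance values are invented for illustration and are not taken from the paper.

```python
def relative_importance(importances, reference="fasting_glucose"):
    """Rescale importances so that the reference variable equals 1.0."""
    ref = importances[reference]
    if ref <= 0:
        raise ValueError("reference importance must be positive")
    return {name: value / ref for name, value in importances.items()}


# Hypothetical raw Gini importances (made-up numbers, not results from the study).
raw = {"fasting_glucose": 0.5, "bmi": 0.125, "age": 0.25}
scaled = relative_importance(raw)
for name, value in sorted(scaled.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.2f}")
```

Rescaling changes only the units of the axis, not the ranking of variables, which makes it easier to read each variable as a fraction of the strongest predictor.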

