Clin Exp Otorhinolaryngol. 2022 May;15(2):168-176. doi: 10.21053/ceo.2021.01536.

Machine Learning Models for Predicting the Occurrence of Respiratory Diseases Using Climatic and Air-Pollution Factors

Affiliations
  • 1Department of Biomedical Engineering, Chungnam National University College of Medicine, Daejeon, Korea
  • 2Department of Neurology, Columbia University, New York, NY, USA
  • 3Institute of Health Policy and Management, Medical Research Center, Seoul National University, Seoul, Korea
  • 4Department of Otorhinolaryngology-Head and Neck Surgery, Chung-Ang University College of Medicine, Seoul, Korea

Abstract


Objectives. Because climatic and air-pollution factors are known to influence the occurrence of respiratory diseases, we used these factors to develop machine learning models for predicting the occurrence of respiratory diseases.
Methods. We obtained the daily number of respiratory disease patients in Seoul and used climatic and air-pollution factors to predict the daily number of patients treated for respiratory diseases per 10,000 inhabitants. We applied the relief-based feature selection algorithm to evaluate the importance of each feature. We developed two prediction models, one using gradient boosting and the other using Gaussian process regression (GPR). We employed the holdout cross-validation method, in which 75% of the data was used to train each model and the remaining 25% was used to test the trained model. We obtained the estimated number of respiratory disease patients by applying the developed prediction models to the test set. To evaluate the performance of each model, we calculated the coefficient of determination (R²) and the root mean square error (RMSE) between the original and estimated numbers of respiratory disease patients. We used the Shapley Additive exPlanations (SHAP) approach to interpret the estimated output of each machine learning model.
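The holdout evaluation described in the Methods can be illustrated with a minimal sketch. It assumes a chronological split (consistent with the Fig. 4 caption, which places the test data in the latter part of 2018 and 2019) and the common definition of the coefficient of determination as 1 - SS_res/SS_tot; the abstract does not state which definition was used. The function names and the toy values are hypothetical and are not taken from the study.

```python
import math

def holdout_split(rows, train_fraction=0.75):
    """Split time-ordered rows: the first 75% for training, the rest for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(actual, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical daily patient rates per 10,000 inhabitants (not study data).
daily_rate = [100.0, 120.0, 90.0, 110.0, 130.0, 95.0, 105.0, 115.0]
train, test = holdout_split(daily_rate)

# Stand-in predictions for the two held-out days; a real model would produce these.
predicted = [104.0, 117.0]

print("RMSE:", rmse(test, predicted))
print("R^2:", r_squared(test, predicted))
```

A chronological cut keeps every test day later than every training day, so the test score reflects forecasting on genuinely unseen dates rather than interpolation between neighboring days.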
Results. Features with negative weights in the relief-based algorithm were excluded. When gradient boosting was applied to unseen test data, the R² and RMSE were 0.68 and 13.8, respectively. For GPR, the R² and RMSE were 0.67 and 13.9, respectively. SHAP analysis showed that reductions in average temperature, daylight duration, average humidity, sulfur dioxide (SO2), total solar insolation amount, and temperature difference increased the number of respiratory disease patients, whereas increases in atmospheric pressure, carbon monoxide (CO), and particulate matter ≤2.5 μm in aerodynamic diameter (PM2.5) increased the number of respiratory disease patients.
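The idea behind the SHAP attributions can be sketched with an exact Shapley value computation on a toy model. This is only the underlying definition, evaluated by brute force over feature coalitions; SHAP implementations for tree models use faster dedicated algorithms, and nothing here reproduces the study's models. The toy model, its coefficients, the baseline, and the sample values are all hypothetical, chosen so that lower temperature and higher PM2.5 raise the prediction.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(features):
    """Hypothetical linear model: [temperature, PM2.5] -> daily patient rate."""
    temperature, pm25 = features
    return 100.0 - 2.0 * temperature + 0.5 * pm25

baseline = [15.0, 20.0]   # assumed reference day
sample = [5.0, 40.0]      # a cold, polluted day
phi = shapley_values(toy_model, sample, baseline)

# Mean absolute Shapley value per feature over several samples,
# the quantity plotted as SHAP feature importance.
samples = [sample, [25.0, 10.0]]
all_phi = [shapley_values(toy_model, s, baseline) for s in samples]
importance = [sum(abs(p[i]) for p in all_phi) / len(all_phi) for i in range(2)]

print("Shapley values for the cold, polluted day:", phi)
print("Mean absolute Shapley values:", importance)
```

The Shapley values of one sample sum to the difference between the model output for that sample and the output for the baseline, which is what lets each prediction be decomposed into per-feature contributions.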
Conclusion. We successfully developed models for predicting the occurrence of respiratory diseases using climatic and air-pollution factors. These models could evolve into public warning systems.

Keywords

Machine Learning; Respiratory Diseases; Climate; Air Pollution; Gradient Boosting; Gaussian Process Regression

Figures

  • Fig. 1. Daily numbers of patients treated for respiratory disease (RD) per 10,000 inhabitants in Seoul from January 1, 2014, to December 31, 2019. Green dots indicate holidays, blue dots indicate the days after holidays, and black dots indicate regular weekdays.

  • Fig. 2. The overall procedure for the development of the machine learning (ML) prediction models. The training and test sessions with hyperparameter optimization were performed after data preprocessing and feature selection, after which Shapley Additive exPlanations (SHAP)-based interpretation for the developed models was performed.

  • Fig. 3. The results of the relief-based feature selection algorithm for 15 climatic and air-pollution factors. Higher feature weights indicate higher importance for the target response.

  • Fig. 4. The prediction results of the daily number of respiratory disease (RD) patients using unseen test data (the latter part of 2018 and 2019). (A) The prediction results using the developed gradient boosting model. (B) The prediction results using the developed Gaussian process regression (GPR) model. The black and blue dots indicate the actual and predicted daily numbers of RD patients per 10,000 inhabitants, respectively. The shaded area represents the 95% confidence interval. (C, D) Scatter plots between the actual and predicted RD patients for the developed gradient boosting and GPR models, respectively. The solid black line represents the Y=X line.

  • Fig. 5. Shapley Additive exPlanations (SHAP) feature importance (A) and summary plot (B). The SHAP feature importance (i.e., the mean absolute Shapley values) is shown for the gradient boosting model. In the SHAP summary plot, the features on the Y-axis are ordered based on their importance. The color bars indicate the amplitudes of feature values from low to high. Overlapping points are stacked in the Y-axis directions of both images to show the distribution of the Shapley values for each feature. Reductions in average temperature, daylight duration, average humidity, sulfur dioxide (SO2), total solar insolation amount, and temperature difference increased the number of respiratory disease patients, whereas increases in atmospheric pressure, carbon monoxide (CO), and particulate matter ≤2.5 µm in aerodynamic diameter (PM2.5) increased the number of respiratory disease patients. NO2, nitrogen dioxide; PM10, particulate matter ≤10 µm in aerodynamic diameter.
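The interval shown around the GPR predictions in Fig. 4 reflects a general property of Gaussian process regression: each prediction comes with a variance. The following minimal sketch shows how a posterior mean and a 95% interval can be obtained for a single input. It assumes a zero-mean prior, a squared-exponential kernel with unit length scale and unit variance, and a fixed noise level; the source does not state the kernel or hyperparameters used, and the toy data are hypothetical.

```python
import math

def rbf(a, b, length=1.0, variance=1.0):
    """Squared-exponential kernel (an assumed choice)."""
    return variance * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L * L^T = A for a positive definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gpr_predict(x_train, y_train, x_new, noise=1e-2):
    """Posterior mean and 95% interval of the latent function at x_new."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, y_train))
    k_star = [rbf(x, x_new) for x in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = rbf(x_new, x_new) - sum(vi * vi for vi in v)
    half_width = 1.96 * math.sqrt(max(var, 0.0))
    return mean, mean - half_width, mean + half_width

# Hypothetical centered observations at four inputs (not study data).
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [0.0, 0.8, 0.9, 0.1]

near = gpr_predict(x_train, y_train, 1.0)   # at an observed input
far = gpr_predict(x_train, y_train, 10.0)   # far from all observations
print("near:", near)
print("far:", far)
```

The interval is narrow close to observed inputs and widens toward the prior far from them, which is the behavior a shaded uncertainty band is meant to convey.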

