Healthc Inform Res. 2020 Jan;26(1):20-33. doi: 10.4258/hir.2020.26.1.20.

Prediction of Chronic Disease-Related Inpatient Prolonged Length of Stay Using Machine Learning Algorithms

  • 1Department of Industrial and Management System Engineering, University of South Florida, Tampa, FL, USA.
  • 2College of Engineering, University of South Florida, Tampa, FL, USA.


This study aimed to develop and compare supervised machine learning models for predicting prolonged length of stay (LOS) among hospitalized patients diagnosed with five different chronic conditions.
An administrative claims dataset (2008-2012) from a regional network of nine hospitals in the Tampa Bay area, Florida, USA, was used to develop the prediction models. Features were extracted from the dataset using the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes. Five learning algorithms, namely, decision tree C5.0, linear support vector machine (LSVM), k-nearest neighbors, random forest, and multi-layered artificial neural networks, were used to build the models, combined with semi-supervised anomaly detection and two feature selection methods. Class imbalance in the dataset was addressed using the Synthetic Minority Over-sampling Technique (SMOTE).
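The oversampling step can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic minority-class point is placed on the line segment between a minority sample and one of its nearest minority-class neighbors. This is a simplified illustration, not the study's implementation; the function name `smote`, the toy data, and the parameter values are assumptions made for the example.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight, uniform in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (e.g. prolonged-LOS cases)
prolonged = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(prolonged, n_synthetic=6)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating records exactly.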
LSVM with wrapper feature selection performed moderately well for all patient cohorts. Using SMOTE to counter data imbalance triggered a tradeoff between the model's sensitivity and specificity, which can be masked by a similar area under the curve. The proposed aggregate rank selection approach yielded a model with more balanced performance than the other selection criteria. Finally, factors such as comorbidity conditions, source of admission, and payer types were associated with an increased risk of a prolonged LOS.
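One plausible reading of aggregate rank selection is sketched below: rank the candidate models separately on each metric, sum the ranks, and pick the model with the lowest total. The abstract does not spell out the exact procedure, so the function `aggregate_rank`, the equal weighting of metrics, and all numbers are assumptions for illustration only, not results from the study.

```python
def aggregate_rank(scores, metrics=("auc", "sn", "sp")):
    """Return (best_model, rank_sums), where best_model has the lowest
    summed rank across the given metrics (rank 1 = highest score)."""
    rank_sums = {name: 0 for name in scores}
    for metric in metrics:
        # Order models from best to worst on this metric;
        # ties are broken by insertion order.
        ordered = sorted(
            scores, key=lambda name: scores[name][metric], reverse=True
        )
        for rank, name in enumerate(ordered, start=1):
            rank_sums[name] += rank
    return min(rank_sums, key=rank_sums.get), rank_sums


# Hypothetical scores: AUC is nearly identical, but sensitivity (sn)
# and specificity (sp) differ widely between models.
scores = {
    "model_a": {"auc": 0.79, "sn": 0.90, "sp": 0.50},
    "model_b": {"auc": 0.80, "sn": 0.72, "sp": 0.74},
    "model_c": {"auc": 0.78, "sn": 0.45, "sp": 0.92},
}
best, rank_sums = aggregate_rank(scores)
```

In this toy example the three models have nearly the same AUC, yet only `model_b` avoids a poor rank on either sensitivity or specificity, which is why it receives the lowest rank sum.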
Prolonged LOS is mostly associated with pre-intraoperative clinical and patient socioeconomic factors. Accurately identifying patients at risk of prolonged LOS with the selected model can give hospitals a better tool for planning early discharge and resource allocation, thus reducing avoidable hospitalization costs.


Length of Stay; Chronic Disease; Inpatients; Machine Learning; Discharge Planning

MeSH Terms

Chronic Disease
Cohort Studies
Decision Trees
International Classification of Diseases
Length of Stay*
Machine Learning*
Patient Discharge
Resource Allocation
Sensitivity and Specificity
Socioeconomic Factors
Supervised Machine Learning
Support Vector Machine


  • Figure 1 Data preprocessing steps for building predictive model. SVM: support vector machine.

  • Figure 2 Flowchart of the predictive model building and best performing model selection. CQ: chi-square feature selection, WR: support vector machine-based wrapper feature selection, AUC: area under the curve, SP: specificity, SN: sensitivity, SMOTE: Synthetic Minority Over-sampling Technique.

  • Figure 3 Performance metric changes (%) with and without SMOTE balancing. AMI: acute myocardial infarction, CHF: congestive heart failure, COPD: chronic obstructive pulmonary disease, DB: type 2 diabetes, PN: pneumonia, SP: specificity, SN: sensitivity, KNN: k-nearest neighbor, LSVM: linear support vector machine, RF: random forest, NN: multi-layer neural network, SMOTE: Synthetic Minority Over-sampling Technique.

