Stratified Sampling Design Based on Data Mining

Kim, Yeonkook J; Oh, Yoonhwan; Park, Sunghoon; Cho, Sungzoon; Park, Hayoung

Healthc Inform Res. 2013 Sep;19(3):186-195. doi: 10.4258/hir.2013.19.3.186.

Affiliations

¹Technology Management, Economics and Policy Graduate Program, Seoul National University, Seoul, Korea. hayoungpark@snu.ac.kr
²Department of Industrial Engineering, Seoul National University, Seoul, Korea.

KMID: 2166697
DOI: http://doi.org/10.4258/hir.2013.19.3.186

Abstract

OBJECTIVES
To explore data mining-based classification rules for defining strata in stratified sampling of healthcare providers, with the aim of improving sampling efficiency.
METHODS
We performed k-means clustering to group providers with similar characteristics and then constructed decision trees on the cluster labels to generate stratification rules. To evaluate the performance of the sampling design, we compared the variance explained by the stratification proposed in this study with that explained by conventional stratification. We constructed a study database from health insurance claims data and provider profile data made available to this study by the Health Insurance Review and Assessment Service of South Korea, and from population data from Statistics Korea. From this database, we used 2011 data for single-specialty clinics and hospitals in two specialties, general surgery and ophthalmology.
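The two-step procedure described above (cluster first, then learn rules that reproduce the cluster labels) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the toy `providers` data, the helper names `kmeans`, `gini`, and `best_split`, the deterministic choice of starting centroids, and the use of a single Gini-based split in place of a full decision tree. It is not the authors' implementation, and real features would normally be standardized before clustering.

```python
def kmeans(points, k, iters=50):
    """Lloyd's algorithm on tuples of floats; returns one cluster label per point."""
    # Assumption: deterministic start from k points spaced evenly through the data.
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(v) / n) ** 2 for v in set(labels))


def best_split(points, labels):
    """Return (feature_index, threshold) with the lowest weighted Gini impurity."""
    best = None
    n = len(points)
    for f in range(len(points[0])):
        values = sorted(set(p[f] for p in points))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for p, lab in zip(points, labels) if p[f] <= t]
            right = [lab for p, lab in zip(points, labels) if p[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best[1], best[2]


# Hypothetical providers: (inpatients per specialist, population density index).
providers = [(10, 1.0), (12, 1.2), (11, 0.9), (80, 1.1), (85, 1.0), (90, 0.8)]

labels = kmeans(providers, k=2)
feature, threshold = best_split(providers, labels)
print(f"stratum rule: feature {feature} <= {threshold}")
```

The point of the second step is interpretability: cluster membership is defined by distances in feature space, whereas a split such as "feature 0 <= threshold" is a rule that a survey designer can apply directly to a provider list. In practice a library such as scikit-learn (`KMeans`, `DecisionTreeClassifier`) would typically replace these hand-written helpers and grow a multi-level tree.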
RESULTS
Data mining yielded five strata in general surgery, defined by two stratification variables (the number of inpatients per specialist and the population density of the provider location), and five strata in ophthalmology, defined by two stratification variables (the number of inpatients per specialist and the number of beds). The proposed stratification explained 22% and 8% of the variance in annual changes in specialist productivity in general surgery and ophthalmology, respectively, whereas conventional stratification by type of provider location and number of beds explained 2% and 0.2%, respectively.
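As a rough illustration of what "percentage of variance explained by the stratification" can mean, the sketch below computes the share of the total sum of squares that lies between strata (one minus the within-strata share). The abstract does not state the exact formula used, so this definition is an assumption, as are the function name `variance_explained` and the toy numbers.

```python
def variance_explained(values, strata):
    """Share of total variation in `values` accounted for by stratum membership.

    Computed as 1 - (within-strata sum of squares / total sum of squares).
    """
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        raise ValueError("values have no variation to explain")

    # Group the outcome values by stratum label.
    groups = {}
    for v, s in zip(values, strata):
        groups.setdefault(s, []).append(v)

    # Within-strata sum of squares: deviations from each stratum's own mean.
    within_ss = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        within_ss += sum((v - mean) ** 2 for v in members)

    return 1.0 - within_ss / total_ss


# Hypothetical outcome (e.g. change in productivity) for six providers.
outcome = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]

# A stratification that separates low from high values explains most of the variance ...
good = variance_explained(outcome, ["A", "A", "A", "B", "B", "B"])
# ... while one that mixes them explains little.
poor = variance_explained(outcome, ["A", "B", "A", "B", "A", "B"])
print(f"good: {good:.3f}, poor: {poor:.3f}")
```

A higher value means strata are more homogeneous internally, which is what makes a stratified sample more efficient than a simple random sample of the same size.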
CONCLUSIONS
This study demonstrated that data mining methods can be used in designing efficient stratified sampling with variables readily available to the insurer and government; it offers an alternative to the existing stratification method that is widely used in healthcare provider surveys in South Korea.

Keywords

Sampling Studies; Decision Trees; Data Mining

MeSH Terms

Data Mining
Decision Trees
Efficiency
Health Personnel
Humans
Inpatients
Insurance Carriers
Insurance, Health
Korea
Ophthalmology
Population Density
Republic of Korea
Sampling Studies
Specialization

Figures

Figure 1. Decision tree induced to stratify general surgery (GS) clinics and hospitals. NPAT_SPEC: number of inpatients per specialist, POPD: population density of the region, LI: lengthiness index, DRG: Diagnosis Related Group.
Figure 2. Decision tree induced to stratify ophthalmology clinics and hospitals. NPAT_SPEC: number of inpatients per specialist, FAC_SICK: number of beds.

