Genomics Inform.  2019 Dec;17(4):e47. 10.5808/GI.2019.17.4.e47.

Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • 1Department of Statistics, Seoul National University, Seoul 08826, Korea.
  • 2Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea.


Genome-wide association studies have suggested ways to predict diseases such as type 2 diabetes (T2D) from single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models combine SNPs with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants over classically used demographic models. Because these prediction models include heritable traits such as body mass index, the contribution of SNPs estimated from unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching to match case and control samples, thereby avoiding this underestimation and determining the pure additive contribution of SNPs. To illustrate the proposed method, we used SNP data from the Korea Association Resources project and reported SNPs from the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), the least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. With these SNP sets, we built prediction models using SLR, LASSO, and EN as logistic regression modeling techniques. Prediction accuracy was compared in terms of the area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated as the difference in AUC between models using only demographic variables and models that also included the SNPs. The largest difference among our models showed that the AUC of a model using genetic variants together with demographic variables could be 0.107 higher than that of the corresponding model using only demographic variables.
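The two steps the abstract describes, matching cases to controls on a propensity score and then measuring the AUC gain from adding SNPs, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the article: the propensity scores are already estimated (for example from demographic variables), matching is greedy 1:1 nearest-neighbor without replacement, the caliper of 0.05 is arbitrary, and the function names `greedy_match`, `auc`, and `auc_gain` are hypothetical. The article may use a different matching algorithm.

```python
from typing import Dict, List, Sequence, Tuple


def greedy_match(case_ps: Dict[str, float],
                 control_ps: Dict[str, float],
                 caliper: float = 0.05) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching without replacement.

    Inputs map sample IDs to propensity scores (assumed precomputed).
    A case stays unmatched if no remaining control lies within the caliper.
    """
    available = dict(control_ps)
    pairs: List[Tuple[str, str]] = []
    # Visit cases in sorted ID order so the result is reproducible.
    for case_id in sorted(case_ps):
        if not available:
            break
        score = case_ps[case_id]
        # Closest remaining control; ties are broken by control ID.
        best = min(available, key=lambda c: (abs(available[c] - score), c))
        if abs(available[best] - score) <= caliper:
            pairs.append((case_id, best))
            del available[best]
    return pairs


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties count as 0.5. Labels are 1 for cases and 0 for controls.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_gain(labels: Sequence[int],
             demo_scores: Sequence[float],
             demo_snp_scores: Sequence[float]) -> float:
    """AUC of the demographic+SNP model minus AUC of the demographic-only model."""
    return auc(labels, demo_snp_scores) - auc(labels, demo_scores)


# Toy data, invented purely for illustration.
cases = {"c1": 0.30, "c2": 0.62, "c3": 0.95}
controls = {"k1": 0.28, "k2": 0.60, "k3": 0.33, "k4": 0.10}
matched = greedy_match(cases, controls)  # c3 has no control within the caliper

labels = [1, 1, 0, 0]
gain = auc_gain(labels, [0.5, 0.5, 0.5, 0.5], [0.9, 0.4, 0.6, 0.2])
```

In this toy setup the demographic-only scores are identical for every matched sample, so any AUC above 0.5 in the combined model comes from the added predictors. That is the intuition behind evaluating the additive contribution on matched samples.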


Keywords: genome-wide association study; penalized regression model; propensity score; type 2 diabetes

MeSH Terms

Area Under Curve
Body Mass Index
Case-Control Studies
Genome-Wide Association Study
Logistic Models
Polymorphism, Single Nucleotide
Propensity Score*
ROC Curve