Korean J Anesthesiol. 2019 Dec;72(6):558-569. doi: 10.4097/kja.19087.

Multicollinearity and misleading statistical results

Affiliations
  • Department of Anesthesiology and Pain Medicine, School of Medicine, Daegu Catholic University, Daegu, Korea. usmed12@gmail.com

Abstract

Multicollinearity represents a high degree of linear intercorrelation between explanatory variables in a multiple regression model and can lead to incorrect results in regression analyses. Diagnostic tools for multicollinearity include the variance inflation factor (VIF), the condition index and condition number, and the variance decomposition proportion (VDP). Multicollinearity can be expressed by the coefficient of determination ($R_h^2$) of a multiple regression model in which one explanatory variable ($X_h$) is the response variable and the others ($X_i$, $i \neq h$) are the explanatory variables. The variance ($\sigma_h^2$) of each regression coefficient in the final regression model is proportional to its VIF, $\mathrm{VIF}_h = 1/(1 - R_h^2)$. Hence, an increase in $R_h^2$ (stronger multicollinearity) increases $\sigma_h^2$. A larger $\sigma_h^2$ makes the P values and confidence intervals of the regression coefficients unreliable (larger P values, wider intervals). The condition index is the square root of the ratio of the maximum eigenvalue to each eigenvalue of the correlation matrix of the standardized explanatory variables, $\sqrt{\lambda_{\max}/\lambda_k}$, and the condition number is the largest condition index. Multicollinearity is considered present when the VIF exceeds 5 to 10 or the condition indices exceed 10 to 30. However, these measures cannot identify which explanatory variables are multicollinear. VDPs obtained from the eigenvectors can identify the multicollinear variables by showing how much of each $\sigma_h^2$ is inflated in association with each condition index. When two or more VDPs corresponding to a common condition index higher than 10 to 30 are higher than 0.8 to 0.9, their associated explanatory variables are multicollinear. Excluding multicollinear explanatory variables leads to statistically stable multiple regression models.
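The diagnostics summarized above can be sketched in pure Python (standard library only). This is a minimal illustration under stated assumptions, not the author's code: it takes the eigen-decomposition of the correlation matrix of the explanatory variables (no intercept column), as the abstract describes, and obtains each VIF as the h-th diagonal element of the inverse correlation matrix, $\sum_k v_{hk}^2/\lambda_k$, which equals $1/(1 - R_h^2)$. Condition indices and VDPs come from the same eigenvalues and eigenvectors. All function names, the example data, and the cut-offs in the hypothetical helper `flag_multicollinear` (10 for the condition index and 0.8 for the VDP, the lower ends of the ranges quoted above) are illustrative choices. Some statistical packages instead scale the uncentered design matrix including the intercept, so their condition indices can differ.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length data columns (one list per variable)."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        centered = [x - mean for x in col]
        norm = math.sqrt(sum(c * c for c in centered))
        z.append([c / norm for c in centered])  # centered, unit-length column
    # Cross-products of centered unit-length columns are Pearson correlations.
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def jacobi_eigen(sym, tol=1e-24, max_sweeps=50):
    """Cyclic Jacobi eigen-decomposition of a small symmetric matrix.

    Returns (eigenvalues, v), where column k of v is the unit eigenvector
    belonging to eigenvalues[k].
    """
    p = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j)
        if off < tol:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle (|theta| <= pi/4) that zeroes a[i][j].
                diff = a[j][j] - a[i][i]
                if diff == 0.0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[i][j] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # rotate columns i and j of A and of V
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                    vki, vkj = v[k][i], v[k][j]
                    v[k][i], v[k][j] = c * vki - s * vkj, s * vki + c * vkj
                for k in range(p):  # rotate rows i and j of A
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return [a[k][k] for k in range(p)], v


def collinearity_diagnostics(columns):
    """Return (vif, cond_index, vdp); vdp[k][h] is the VDP of variable h on component k."""
    lam, v = jacobi_eigen(correlation_matrix(columns))
    p = len(lam)
    order = sorted(range(p), key=lambda k: lam[k], reverse=True)  # largest eigenvalue first
    lam = [lam[k] for k in order]
    v = [[row[k] for k in order] for row in v]
    cond_index = [math.sqrt(lam[0] / lk) for lk in lam]  # sqrt(lambda_max / lambda_k)
    # phi[h][k] = v_hk^2 / lambda_k: the part of Var(b_h) tied to component k.
    phi = [[v[h][k] ** 2 / lam[k] for k in range(p)] for h in range(p)]
    vif = [sum(row) for row in phi]  # diagonal of R^-1, equal to 1 / (1 - R_h^2)
    vdp = [[phi[h][k] / vif[h] for h in range(p)] for k in range(p)]
    return vif, cond_index, vdp


def flag_multicollinear(cond_index, vdp, ci_cut=10.0, vdp_cut=0.8):
    """Hypothetical helper: components with condition index > ci_cut on which
    two or more variables have VDP > vdp_cut."""
    flags = []
    for ci, row in zip(cond_index, vdp):
        hits = [h for h, prop in enumerate(row) if prop > vdp_cut]
        if ci > ci_cut and len(hits) >= 2:
            flags.append((ci, hits))
    return flags


# Hypothetical data: x2 is nearly a copy of x1; x3 is almost unrelated to both.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2]
x3 = [1, -1, -1, 1, 1, -1]
vif, ci, vdp = collinearity_diagnostics([x1, x2, x3])
print("VIF:", [round(x, 1) for x in vif])
print("condition indices:", [round(x, 1) for x in ci])
print("flagged:", flag_multicollinear(ci, vdp))
```

Because x2 is built as a near copy of x1, the component with the largest condition index should carry high VDPs for x1 and x2 (indices 0 and 1) while the VDP of x3 stays low, which is the pattern the abstract uses to single out multicollinear variables. Sorting the components by decreasing eigenvalue simply makes the last row of `vdp` correspond to the condition number.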

Keywords

Biomedical research; Biostatistics; Multivariable analysis; Regression; Statistical bias; Statistical data analysis

MeSH Terms

Bias (Epidemiology)
Biostatistics
Data Interpretation, Statistical
Inflation, Economic