subsetByVIF selects subsets of the covariates such that each covariate in a given subset has a VIF that is less than or equal to a value specified by the user. This program has been posted on the Statistical Software Components (SSC) archive.

We are frequently faced with analyzing data sets in which the ratio of covariates to patients is high. There are several approaches to analyzing such data including penalized regression methods, k-fold cross-validation techniques, and bagging. A problem with any of these approaches is that, even after the elimination of variables causing multicollinearity, the variance-covariance matrix of the remaining covariates is often highly ill-conditioned. The subsetByVIF program reduces the number of covariates to the largest subsample such that the maximum VIF for each variable in the subsample is less than some value specified by the user. These variables are selected without regard to the dependent variable of interest, which should mitigate problems due to overfitting. The use of this program should improve the convergence properties of many methods of exploratory data analysis.

Please see the subsetByVIF help file in the SSC archive for further details.