  • Variable selection

    Dear Statalist users,
    I am interested in exploring several two-way interactions in multiply imputed datasets. I am not sure about the best way to do this, but one approach I have tried follows a suggestion found here: https://rss.onlinelibrary.wiley.com/...8.2010.00740.x , using cvlasso (Ahrens, A., Hansen, C.B., Schaffer, M.E. 2018. cvlasso: Program for cross-validation using lasso, square-root lasso, elastic net, adaptive lasso and post-OLS estimators.). I want to loop over the imputed datasets, apply cvlasso to bootstrapped samples, and then post the results to a file. In the cvlasso call, I have specified that all main effect variables are not to be penalized; only the interaction terms are. The code looks as follows:
    Code:
     while _mi_m < 11 {
     bsample
     cvlasso M_5mC_pct logAs logHg logCd logMn logPb Se_std MORS_ALDER Maternal_edu Parity KJONN SMOKING_X logJod logFOLAT c.logAs#(c.logCd c.logHg c.logMn c.logPb c.Se_std) /*
      */ c.logCd#(c.logHg c.logMn c.logPb c.Se_std) c.logHg#(c.logMn c.logPb c.Se_std) c.logMn#(c.logPb c.Se_std) c.logPb#c.Se_std /*
      */ i.KJONN#(c.logAs c.logCd c.logHg c.logMn c.logPb c.Se_std) c.logJod#(c.logAs c.logCd c.logHg c.logMn c.logPb c.Se_std), /*
      */ notpen(logAs logHg logCd logMn logPb Se_std KJONN SMOKING_X logJod logFOLAT) /*
      */ lopt postest tolzero(1e-8) /*alphacount(5)*/
      mat allecoef=e(betaAll)
      local `b1' = allecoef[1,1]
      local `b2' = allecoef[1,2]
      local `b3' = allecoef[1,3]
      local `b4' = allecoef[1,4]
      local `b5' = allecoef[1,5]
      local `b6' = allecoef[1,6]
      local `b7' = allecoef[1,7]
      local `b8' = allecoef[1,8]
      local `b9' = allecoef[1,9]
      local `b10' = allecoef[1,10]
      local `b11' = allecoef[1,11]
      local `b12' = allecoef[1,12]
      local `b13' = allecoef[1,13]
      local `b14' = allecoef[1,14]
      local `b15' = allecoef[1,15]
      local `b16' = allecoef[1,16]
      local `b17' = allecoef[1,17]
      local `b18' = allecoef[1,18]
      local `b19' = allecoef[1,19]
      local `b20' = allecoef[1,20]
      local `b21' = allecoef[1,21]
      local `b22' = allecoef[1,22]
      local `b23' = allecoef[1,23]
      local `b24' = allecoef[1,24]
      local `b25' = allecoef[1,25]
      local `b26' = allecoef[1,26]
      local `b27' = allecoef[1,27]
      local `b28' = allecoef[1,28]
      local `b29' = allecoef[1,29]
      local `b30' = allecoef[1,30]
      local `b31' = allecoef[1,31]
      local `b32' = allecoef[1,32]
      local `b33' = allecoef[1,33]
      local `b34' = allecoef[1,34]
      local `b35' = allecoef[1,35]
      local `b36' = allecoef[1,36]
      local `b37' = allecoef[1,37]
      local `b38' = allecoef[1,38]
      local `b39' = allecoef[1,39]
      local `b40' = allecoef[1,40]
      local `b41' = allecoef[1,41]
      local `b42' = allecoef[1,42]
      local `b43' = allecoef[1,43]
      local `b44' = allecoef[1,44]
      local `b45' = allecoef[1,45]
      local `b46' = allecoef[1,46]
      local `b47' = allecoef[1,47]
     
    
     post mysim4 (`b1')(`b2')(`b3')(`b4')(`b5')(`b6')(`b7')(`b8')(`b9')(`b10')(`b11')(`b12')(`b13')(`b14')(`b15')(`b16')(`b17')(`b18')(`b19')(`b20')(`b21')(`b22')(`b23')(`b24')(`b25')(`b26')(`b27')(`b28')(`b29')(`b30')(`b31')(`b32')(`b33')(`b34')(`b35')(`b36')(`b37')(`b38')(`b39')(`b40')(`b41')(`b42')(`b43')(`b44')(`b45')(`b46')(`b47')
    }
    postclose mysim4
    However, the end of the output gives me:
    Code:
    _= invalid name
    1) Are there more appropriate ways in Stata 15 to evaluate many two-way interactions?
    2) Where does "_" come from?

    Best regards,
    Kjell Weyde

  • #2
    I have never worked with *lasso type models, so I cannot give specific advice on this.

    However, if you do not account for all possible interactions during the imputation process, you cannot (validly) test for these interactions in the imputed datasets later. Moreover, as far as I know, there is not an agreed-on method for combining multiple imputations with bootstrap procedures; I remember vaguely reading that you should actually make the imputation part of the bootstrap, i.e., bootstrap a sample with missing data, then impute the missing data, then analyze the imputed data, then draw the next bootstrap sample.

    To summarize: I believe that you will need code that (1) resamples repeatedly from the original data with missing values, (2) performs the imputations, (2a) accounting for any interactions that you wish to test for, (3) runs the *lasso analysis on each imputed dataset, and (4) combines the results from the analyses in an appropriate way.
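
The steps above could be sketched roughly as follows. This is an untested outline, not a recommended implementation: it assumes the un-imputed data are in memory and not yet mi set, uses logAs and logHg as stand-ins for whichever variables actually have missing values, shortens the model from post #1, and makes up the replicate count, the seed, and the output file name lasso_boot. The imputation model shown is only a placeholder and does not settle point (2a).

```stata
* Sketch only; see the assumptions stated above.
tempfile original
save `original'

tempname sim
postfile `sim' int(rep m) str80 term double coef using lasso_boot, replace

set seed 12345                                // arbitrary seed
forvalues r = 1/200 {                         // arbitrary number of replicates
    use `original', clear
    bsample                                   // (1) resample rows, missing values included

    mi set flong                              // (2) impute within the bootstrap sample
    mi register imputed logAs logHg
    * (2a) the imputation model should be compatible with the interactions
    *      tested later; how best to achieve that is left open here
    mi impute chained (regress) logAs logHg = M_5mC_pct logCd logMn logPb Se_std, add(10)

    forvalues m = 1/10 {                      // (3) lasso on each imputed dataset
        cvlasso M_5mC_pct logAs logHg logCd logMn logPb Se_std   ///
            c.logAs#(c.logCd c.logHg c.logMn c.logPb c.Se_std)    ///
            if _mi_m == `m',                                       ///
            notpen(logAs logHg logCd logMn logPb Se_std) lopt
        matrix b = e(betaAll)                 // same stored result as used in post #1
        local names : colfullnames b
        forvalues j = 1/`=colsof(b)' {
            local nm : word `j' of `names'
            post `sim' (`r') (`m') ("`nm'") (b[1, `j'])
        }
    }
}
postclose `sim'                               // (4) combine the rows in lasso_boot.dta
```

Posting one row per coefficient, identified by replicate, imputation number, and term name, keeps the postfile declaration independent of how many terms the model has. How to combine those rows in step (4) is deliberately left out, since I do not know of an agreed-on method.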

    Best
    Daniel
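
P.S. On question 2, which does not look lasso-specific: my guess is that the message comes from the lines of the form local `b1' = allecoef[1,1]. The quotes make Stata expand the macro b1 before the local command runs; b1 is not defined at that point, so it expands to nothing and the line becomes local = allecoef[1,1]. As far as I know, Stata stores local macro names with a leading underscore internally, which would explain why it reports "_=" as the invalid name. A minimal sketch of one way around it, assuming mysim4 was declared with 47 numeric variables as the post statement in #1 suggests:

```stata
* Sketch: build the list of posted expressions in a loop instead of 47 locals.
* A macro is defined by its bare name (postlist) and dereferenced as `postlist'.
matrix allecoef = e(betaAll)
local postlist ""
forvalues j = 1/47 {                          // 47 taken from the code in post #1
    local postlist `postlist' (allecoef[1, `j'])
}
post mysim4 `postlist'
```

Posting the matrix elements directly also avoids passing the coefficients through macros, which hold them as text. If you prefer separate macros, local b1 = allecoef[1,1] (no quotes on the left-hand side) should define b1, and `b1' can then be used in the post line.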
