
Double-selection lasso with interaction terms as variables of interest


    I am using dsregress in Stata 16 to estimate a double-selection lasso linear regression.

    My variables of interest are the interactions between two categorical variables; call them x1 (levels 0, 1, 2) and x2 (levels 1 to 7). Ideally, I want to estimate the coefficients on these interaction terms while letting lasso select from a large set of controls Z, something like:
    dsregress y i.x1#i.x2 , controls(i.x2 Z i.x2#Z)
    The issue is that I would also like i.x2 to be among the controls, because I want to consider all possible interactions between x2 and the other covariates.
    If I estimate the model with only x1 as the variable of interest, it works fine, because i.x2 can then appear among the controls:
    dsregress y i.x1 , controls(i.x2 Z i.x2#Z)
    Similarly, if I estimate the model
    dsregress y i.x1#i.x2 , controls(Z)
    it also works and is close to what I need, except that i.x2 and its interactions with Z are not among the controls, so the specification is not quite right.

    My question is: how can I estimate the coefficients on the interaction terms i.x1#i.x2 while including i.x2 and the interactions i.x2#Z among the controls?
    If I try this:
    dsregress y i.x1##i.x2 , controls(i.x2 Z i.x2#Z)
    I get the following error message:
    i.x2 cannot be specified both in varsofinterest and controls
    I understand why: ## expands to the main effects i.x1 and i.x2 plus their interaction, so i.x2 becomes a variable of interest and appears on both sides.
    I do not need the coefficients on i.x2. However, if I instead estimate
    dsregress y i.x1 i.x1#i.x2 , controls(i.x2 Z i.x2#Z)
    Stata omits not only the usual base category 0bn.x1#1bn.x2 (which is expected) but also all of the 2bn.x1#i.x2 terms, which are the ones I want to see.
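One possible workaround, offered only as an untested sketch: build the x1-by-x2 cell indicators by hand, so that i.x2 never appears among the variables of interest and can sit in the controls without a name clash. Together with the constant and i.x2, the 14 indicators for x1 = 1, 2 span exactly the 21 x1-by-x2 cells, so they should not be collinear with the x2 main effects. The simulated data, the indicator names ix_a_b, and the continuous controls z1-z20 standing in for Z are all hypothetical. Note that in factor-variable notation, unprefixed variables inside an interaction are treated as categorical, hence the c. prefix on the Z interactions.

```stata
* Untested sketch on simulated data; x1, x2, z1-z20, y and ix_* are hypothetical.
clear
set seed 12345
set obs 2000
generate byte x1 = floor(3*runiform())        // levels 0, 1, 2
generate byte x2 = 1 + floor(7*runiform())    // levels 1 to 7
forvalues j = 1/20 {
    generate double z`j' = rnormal()          // stand-ins for the controls Z
}
generate double y = 0.5*(x1 == 2)*(x2 > 4) + 0.3*z1 + rnormal()

* One indicator per (x1, x2) cell with x1 != 0: 2 x 7 = 14 variables.
* Each coefficient is the contrast of that x1 level vs x1 = 0 within one x2 level.
forvalues a = 1/2 {
    forvalues b = 1/7 {
        generate byte ix_`a'_`b' = (x1 == `a') & (x2 == `b') if !missing(x1, x2)
    }
}

* i.x2 now appears only among the controls; (i.x2) in parentheses is always
* included, while Z and the x2-by-Z interactions are subject to lasso selection.
unab Z : z1-z20
dsregress y ix_*, controls((i.x2) `Z' i.x2#c.(`Z'))
```

These coefficients use a different parameterization from i.x1#i.x2: each one is a within-x2 contrast against x1 = 0, so they would need to be mapped back if the standard interaction coefficients are required. If the x2 main effects should be selectable rather than forced in, the parentheses around i.x2 can be dropped.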

    Is there a way around this?
    Many thanks in advance.