
  • Alternative specifications on interacting two binary variables

    Dear Members,

    I'm doing a pooled OLS estimation using state-level data from two different years to see how smoking prevalence changed from the earlier year to the later year. There was a tax policy event in between. Because the tax policy was applied to all states, I don't have a control group of states for a standard DiD. I would like to see how the tax reform changed smoking prevalence in states with below-average tax rates in the pre-reform period compared with states with above-average tax rates. So I created a dummy (LTstate) that takes the value 1 for all states with below-average tax rates before the tax reform and 0 for the high-tax states. I also have a dummy for the tax reform (tax), which takes the value 1 for the later year and 0 for the earlier year. However, the following specifications give me slightly different coefficient estimates.


    Code:
    1. regress prevalence tax##LTstate [controls], vce(cluster state)
    2. regress prevalence tax LTstate tax#LTstate [controls], vce(cluster state)
    In both cases, the coefficient on tax is identical. However, the estimate for LTstate differs between the two models. For the interaction term, model 1 reports the level tax#LTstate 1 1 while model 2 reports tax#LTstate 0 1. The interaction coefficient has the same magnitude in both models but the opposite sign.

    My understanding is that the coefficient on tax should give the overall effect for all states (or does it capture only the effect for high-tax states because of the interaction term?). I thought the coefficient on LTstate should give the difference in smoking prevalence between low-tax and high-tax states before the tax policy, but I am not sure why the two models return different estimates for that dummy. Can someone guide me to the correct specification, one that will let me estimate the effects separately for low-tax and high-tax states as well as a combined overall effect?

    Thanking you in advance.

    Rijo.
    Last edited by Rijo John; 28 Jul 2024, 00:02.

  • #2
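    Stata's default assumption about whether a variable is categorical or continuous differs between variables that appear with the "#" operator and those that do not: variables joined by "#" are treated as categorical unless prefixed with "c.", while variables entered on their own are treated as continuous unless prefixed with "i.". It is thus very important that you use "i." for categorical variables and "c." for continuous variables; this is explained in the help file; see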
    Code:
    h fvvarlist
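
    A worked sketch of why the two commands in the original post can report interaction coefficients of equal size and opposite sign. It assumes no additional controls and that Stata keeps the (tax = 0, LTstate = 1) cell in model 2, as the quoted output suggests. The symbols \mu_{tl} (mean prevalence in the cell tax = t, LTstate = l), \beta, \alpha and \gamma are notation introduced here for illustration only.

    ```latex
    \begin{align*}
    \text{Model 1:}\quad
      & E[y] = \beta_0 + \beta_1\,\mathit{tax} + \beta_2\,\mathit{LT}
               + \beta_3\,(\mathit{tax}\times\mathit{LT}) \\
      & \beta_1 = \mu_{10}-\mu_{00},\qquad
        \beta_2 = \mu_{01}-\mu_{00},\qquad
        \beta_3 = (\mu_{11}-\mu_{01})-(\mu_{10}-\mu_{00}) \\[4pt]
    \text{Model 2:}\quad
      & E[y] = \alpha_0 + \alpha_1\,\mathit{tax} + \alpha_2\,\mathit{LT}
               + \gamma\,\mathbf{1}[\mathit{tax}=0,\ \mathit{LT}=1] \\
      & \alpha_1 = \mu_{10}-\mu_{00} = \beta_1 \\
      & \alpha_2 = \mu_{11}-\mu_{10} = \beta_2+\beta_3 \\
      & \gamma   = (\mu_{01}-\mu_{00})-(\mu_{11}-\mu_{10}) = -\beta_3
    \end{align*}
    ```

    Under these assumptions both commands fit the same model and differ only in how the coefficients are labelled: the tax coefficient is unchanged, the LTstate coefficient becomes the post-reform gap rather than the pre-reform gap, and the interaction flips sign.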



    • #3
      Code:
      1. regress prevalence tax##LTstate [controls], vce(cluster state)
      2. regress prevalence tax LTstate c.tax#c.LTstate [controls], vce(cluster state)



      • #4
        Although it is possible to get mathematically identical results with binary-coded (only 0/1) variables whether you treat them as continuous or categorical, it's best to be explicit about which variables are categorical and which are continuous, especially in interaction terms, so as to avoid accidental mistakes that lead to nonsense models.

        As for equivalent specifications, you could go with the marginal means approach, which is more common:

        Code:
        … i.a##i.b    // a and b are your two binary variables
        i.a i.b i.a#i.b    // equivalent expansion of the above
        The other way is the so-called cell means model:

        Code:
        ibn.a#ibn.b , nocons
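
        A minimal sketch of how the two parameterisations relate, using Stata's shipped auto dataset rather than the poster's data. The variable heavy is a hypothetical second binary created only for this illustration, and price merely stands in for an outcome.

        ```stata
        sysuse auto, clear
        // hypothetical binary variable, created only for this illustration
        generate byte heavy = weight > 3000

        // factor-effects form: constant, two main effects, one interaction
        regress price i.foreign##i.heavy

        // cell means form: one coefficient per foreign x heavy cell, no constant
        regress price ibn.foreign#ibn.heavy, noconstant

        // the interaction from the first model is a contrast of the four cell means
        lincom 1.foreign#1.heavy - 1.foreign#0.heavy - 0.foreign#1.heavy + 0.foreign#0.heavy
        ```

        The lincom result should reproduce the interaction coefficient from the first regression, since both commands fit the same four cell means and only label them differently.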



        • #5
          Thank you very much, everyone. Since both tax and LTstate are binary in my example, the following returned identical results:
          Code:
          1. regress prevalence i.tax##i.LTstate [controls], vce(cluster state)
          2. regress prevalence i.tax i.LTstate i.tax#i.LTstate [controls], vce(cluster state)
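
          One possible way to obtain the reform effect separately for high-tax and low-tax states, plus an average over all states, is sketched below. The data are simulated purely for illustration (50 hypothetical states, two years each, made-up coefficients), and controls are omitted; only the variable names follow the original post.

          ```stata
          clear
          set seed 12345
          set obs 100
          // simulated, hypothetical panel: 50 states observed in two years
          generate state = ceil(_n/2)
          bysort state: generate byte tax = _n == 2    // 1 = later (post-reform) year
          generate byte LTstate = state <= 25          // 1 = low-tax state, fixed within state
          generate prevalence = 20 - 2*tax + 3*LTstate - 1.5*tax*LTstate + rnormal()

          regress prevalence i.tax##i.LTstate, vce(cluster state)

          // change from the earlier to the later year, separately by LTstate level
          margins LTstate, dydx(tax)

          // the same quantity for low-tax states, written as a linear combination
          lincom 1.tax + 1.tax#1.LTstate

          // change averaged over the sample distribution of LTstate
          margins, dydx(tax)
          ```

          In this parameterisation the coefficient on 1.tax is the change for high-tax states only (LTstate = 0), and the interaction is the additional change for low-tax states. Because every state is exposed to the reform, these are before-and-after differences and not DiD estimates against an untreated group.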
