  • Difference between interaction term and subgroup analysis when introducing control variables

    Hi everybody

    I have a question. I am looking into the difference between public and outsourced employees regarding job engagement. I usually use an interaction term for this analysis. However, a reviewer has asked me to include a subgroup analysis to test for differences in slopes. My intuition was that this difference would be identical to the interaction term. What I find is that, without control variables, the point estimates and SEs are essentially the same (apart from minor differences, which I assume arise because suest uses a z-test and the interaction analysis uses a t-test). However, when I introduce control variables, the two approaches differ not only in SEs but also, surprisingly, in point estimates. Can anybody elaborate on this difference?

    Without control variables:
    Code:
    * Interaction approach: the treatment_matched#c.workload coefficient is the slope difference
    reg engagement i.treatment_matched##c.workload [iw=cem_weights], vce(robust)

    * Subgroup approach: one regression per group, combined with suest
    foreach var1 of varlist engagement {
        foreach var2 of varlist workload {
            * vce(robust) is left off here because it is requested in suest instead
            qui reg `var1' c.`var2' [iw=cem_weights] if treatment_matched == 0
            est store treatment_matched0
            qui reg `var1' c.`var2' [iw=cem_weights] if treatment_matched == 1
            est store treatment_matched1
            di "`var1'" " & " "`var2'"
            qui suest treatment_matched0 treatment_matched1, vce(robust) coefl
            * Difference in slopes between the two groups
            lincom (_b[treatment_matched1_mean:`var2']-_b[treatment_matched0_mean:`var2'])
        }
    }
    With control variables:
    Code:
    * Interaction approach: only workload is interacted; the controls share one slope across groups
    reg engagement i.treatment_matched##c.workload c.alder_alt ib(1).kon ib(2).uddannelse [iw=cem_weights], vce(robust)

    * Subgroup approach: every coefficient, including the controls, is estimated per group
    foreach var1 of varlist engagement {
        foreach var2 of varlist workload {
            * vce(robust) is left off here because it is requested in suest instead
            qui reg `var1' c.`var2' c.alder_alt ib(1).kon ib(2).uddannelse [iw=cem_weights] if treatment_matched == 0
            est store treatment_matched0
            qui reg `var1' c.`var2' c.alder_alt ib(1).kon ib(2).uddannelse [iw=cem_weights] if treatment_matched == 1
            est store treatment_matched1
            di "`var1'" " & " "`var2'"
            qui suest treatment_matched0 treatment_matched1, vce(robust) coefl
            * Difference in slopes between the two groups
            lincom (_b[treatment_matched1_mean:`var2']-_b[treatment_matched0_mean:`var2'])
        }
    }
    Thanks!

    Best
    Gustav

  • #2
    Hi again,

    Following #2 in this thread (https://www.statalist.org/forums/for...oup-regression), I realized that I had to interact all the control variables with treatment_matched to arrive at equivalent point estimates for the interaction approach and the subgroup approach. However, the SE is slightly lower in the subgroup analysis (.1162794) than in the interaction analysis (.118738), which aligns with my reviewer's claim that the subgroup approach has more statistical power. I know that Clyde Schechter usually says "that the difference between a statistically significant finding and a non-statistically significant finding is, itself, not statistically significant", so is there any other argument for choosing one approach over the other?
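
    For anyone who wants to reproduce the equivalence, here is a minimal sketch on Stata's bundled auto dataset rather than the data from this thread. As assumptions for illustration only, price, weight, mpg, and foreign stand in for engagement, workload, a control, and treatment_matched; g0 and g1 are arbitrary names for the stored estimates; and no CEM weights are used. Interacting the group indicator with every regressor should reproduce the subgroup point estimate, while the two standard errors may still differ slightly.

```stata
* Sketch on the bundled auto data (stand-in variables, no CEM weights)
sysuse auto, clear

* Interaction approach: foreign is interacted with the focal variable AND the control
regress price i.foreign##(c.weight c.mpg), vce(robust)
lincom 1.foreign#c.weight
scalar b_inter  = r(estimate)
scalar se_inter = r(se)

* Subgroup approach: separate regressions, combined with suest
quietly regress price c.weight c.mpg if foreign == 0
estimates store g0
quietly regress price c.weight c.mpg if foreign == 1
estimates store g1
quietly suest g0 g1, vce(robust)
lincom [g1_mean]weight - [g0_mean]weight
scalar b_sub  = r(estimate)
scalar se_sub = r(se)

* Compare point estimates and standard errors side by side
display "interaction: b = " b_inter "  se = " se_inter
display "subgroup:    b = " b_sub   "  se = " se_sub
```

    Dropping c.mpg from the parentheses (so that it enters with a single pooled slope) gives the situation from #1, where the interaction model is no longer the same model as the two subgroup regressions.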

    Best
    Gustav



    • #3
      Concern yourself, always, with practicality. If you go up to a city manager and say, "Hey, I know this cool intervention that can save you 0.00004%, on average, on building roads," or whatever it is city managers do, see what their reaction is after you tell them it is statistically significant. Either way, I haven't used CEM in a while, so I can't comment on the details, but subgroup analyses and interactions are two different things. If I'm estimating the impact of raising the minimum wage on employment, I can estimate a very general effect (all restaurants in the treated/control cities), or I can run the same analysis while restricting my sample to French establishments, Chinese establishments, or establishments that make over a given amount of the median income.



      • #4
        I think you’d get some difference b/c you have one error term in the interaction model and two error terms in the sub-group model. Gujarati (1970), while…old…lays things out very clearly. Worth a quick read.

        Gujarati, D. (1970). Use of dummy variables in testing for equality between sets of coefficients in linear regressions: a generalization. American Statistician, 24(5), 18–22.
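
        One way to probe the "one error term versus two" point is to check whether the residual variance actually differs between the groups. The sketch below is an illustration, not something taken from Gujarati's paper: it uses the bundled auto dataset with stand-in variables (price, weight, mpg, foreign) and the arbitrary estimate names g0 and g1. After suest, each regress model carries a log-variance equation, so a Wald test on those constants compares the two error variances.

```stata
* Sketch on the bundled auto data (stand-in variables, illustrative names)
sysuse auto, clear

* Fit the same model separately in each group
quietly regress price c.weight c.mpg if foreign == 0
estimates store g0
quietly regress price c.weight c.mpg if foreign == 1
estimates store g1

* suest keeps a log-variance equation (name_lnvar) for each regress model
quietly suest g0 g1

* Equal residual variances would imply equal constants in the two lnvar equations
test [g0_lnvar]_cons = [g1_lnvar]_cons
display "p-value for equal residual variances: " r(p)
```

        If the variances look similar, a single pooled error term loses little; if they differ, the two approaches are weighting the groups' noise differently, which is one possible source of the small SE gap.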


        _______________________________________

        Glenn Hoetker
        Professor in Business Strategy

        Melbourne Business School, University of Melbourne
        200 Leicester Street, Carlton, Victoria 3053, Australia

        I acknowledge the Traditional Owners of the land on which I work, the Wurundjeri people of the Kulin Nations, and pay my respects to their Elders, past and present.



        • #5
          Thank you, Jared and Glenn. That makes a lot of sense. And thanks for the literature.
