
  • What are the advantages of adding an interaction variable over subsampling when drawing conclusions?

    Adding interaction variables has two main advantages over subsampling: (1) a larger sample size, which leads to higher precision, and (2) more degrees of freedom.
    Normal regression equation:

    Dependent_variables = pt + Independent_variables + fixed effects + error term (1)
    Equation with the interaction variable added:

    Dependent_variables = pt + developed_dummy*pt + Independent_variables + fixed effects + error term (2)
    I am wondering what advantage, if any, adding interaction variables has over subsampling when it comes to explaining the results?
    [Attached image: 1.PNG]


    I know how to explain the second regression result:
    [Attached image: 2.png]


    Last edited by Phuc Nguyen; 26 Jul 2021, 16:47.

  • #2
    You have already mentioned the difference in sample size between the two approaches, so I won't elaborate on that.

    The other difference is important and you need to consider which way it moves your decision-making. In the non-interaction model run separately on each subsample, all of the variables other than pt, developed, and developed#pt (which I will refer to as just covariates) get a separately estimated coefficient in each subsample. In effect, you are fitting models in which there is no necessary relationship between the effect of any covariate on the dependent variable in developing countries and the effect of that same covariate in developed countries. By contrast, when you use the interaction model in the way you show, you are constraining the effect of each covariate to be the same in both developing and developed countries.

    So which is a more realistic representation of the real world data generating process? Do the covariates affect the DV in the same way regardless of stage of development? If so, the interaction model is a better model. If not, you might be better off using the two subsample models, so that you can properly capture the different effects of these other variables. The ideal situation is actually to identify which covariates should have different effects in developing and developed countries, and which should not. Then interact the developed variable with all of the former and none of the latter.
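
    The three specifications discussed above can be written side by side. This is only a sketch: the notation is an assumption introduced here for illustration, with D_i for the developed dummy, X_it for the covariates, alpha for the fixed effects, and X^(1), X^(2) for the covariates whose effects are and are not expected to differ by stage of development.

```latex
% Separate subsample models, one for each group g in {developing, developed}:
% every coefficient is estimated separately in each group.
y_{it} = \beta^{g}\, pt_{it} + X_{it}'\gamma^{g} + \alpha^{g} + \varepsilon_{it}

% Pooled interaction model, as in equation (2):
% only the effect of pt differs by group; \gamma is constrained to be common.
y_{it} = \beta\, pt_{it} + \delta\,(D_i \times pt_{it}) + X_{it}'\gamma + \alpha + \varepsilon_{it}

% Selective interaction model:
% D_i is interacted with the covariates X^{(1)} only, not with X^{(2)}.
y_{it} = \beta\, pt_{it} + \delta\,(D_i \times pt_{it})
       + X_{it}^{(1)\prime}\gamma_1 + (D_i \times X_{it}^{(1)})'\theta
       + X_{it}^{(2)\prime}\gamma_2 + \alpha + \varepsilon_{it}
```

    In the pooled interaction model, delta is the difference in the effect of pt between developed and developing countries, so the effect in developed countries is beta + delta.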



    • #3
      I like the flexibility of using interactions. As Clyde says, when you estimate models for each subsample separately, all parameters are free to differ across groups. They could even have different signs, e.g. a variable could have a negative effect in one group and a positive effect in another.

      When you pool the samples you have more flexibility. You might let the constants differ across groups, but require all slopes to be the same. Or, you could allow one or two slopes to differ by groups, but constrain the other slopes to be the same. Or, you could toss in interactions for everything, which would be pretty much the equivalent of estimating models for each group separately.

      Often, you have good reason for thinking that the effect of a variable will differ by group, e.g. you might think that the effect of years of education is different for women than for men. You can then add an interaction of gender x education to your model. But, all the other variable effects can be constrained to be the same if you don't expect them to differ by gender. Only allowing for a few interactions produces a much more parsimonious model than does letting the effect of every variable differ by gender.
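
      The last point, that interacting the group dummy with everything reproduces the separate-group estimates, can be checked numerically. The following is a minimal sketch in Python with NumPy rather than Stata, on simulated data; the variable names and all coefficient values are assumptions made up for the illustration, and fixed effects are left out to keep it short.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Simulated data; names and true coefficient values are illustrative assumptions.
developed = rng.integers(0, 2, size=n)   # 1 = developed, 0 = developing
pt = rng.normal(size=n)                  # variable of interest
x = rng.normal(size=n)                   # one covariate
# In the simulated truth, the constant and both slopes differ by group.
y = (1.0 + 0.5 * developed
     + (2.0 - 1.5 * developed) * pt
     + (0.3 + 0.9 * developed) * x
     + rng.normal(size=n))


def ols(X, y):
    """Return the least-squares coefficients of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


ones = np.ones(n)

# Separate regressions on each subsample: constant, pt, x.
X_base = np.column_stack([ones, pt, x])
b_developing = ols(X_base[developed == 0], y[developed == 0])
b_developed = ols(X_base[developed == 1], y[developed == 1])

# Pooled regression with the dummy interacted with every term.
X_full = np.column_stack(
    [ones, pt, x, developed, developed * pt, developed * x]
)
b_full = ols(X_full, y)

# Main terms match the developing-country fit; main terms plus
# interaction terms match the developed-country fit.
print(np.allclose(b_full[:3], b_developing))
print(np.allclose(b_full[:3] + b_full[3:], b_developed))
```

      Both checks should print True. The equivalence is in the coefficients only: the pooled model assumes a single error variance for both groups, so its standard errors need not match those from the separate regressions.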
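
      As a rough illustration of the gender x education case, here is a small Python/NumPy sketch on simulated data. Everything in it is hypothetical: the variable names (female, educ, exper), the sample size, and the true coefficients were chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulated data; variable names and true values are hypothetical.
female = rng.integers(0, 2, size=n)     # 1 = woman, 0 = man
educ = rng.uniform(8, 20, size=n)       # years of education
exper = rng.uniform(0, 30, size=n)      # covariate with a common effect
# Simulated truth: the education slope is 0.8 for men and 0.5 for women,
# while the experience slope is 0.2 for everyone.
y = (5.0 + 1.0 * female + 0.8 * educ - 0.3 * female * educ
     + 0.2 * exper + rng.normal(size=n))

# Only education is interacted with gender; experience is constrained
# to have the same effect in both groups.
X = np.column_stack([np.ones(n), female, educ, female * educ, exper])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

names = ["const", "female", "educ", "female_x_educ", "exper"]
for name, b in zip(names, beta):
    print(f"{name:>14s}: {b: .3f}")
```

      This model has five parameters. Fitting constant, educ, and exper separately for men and women would use six, and the gap grows with every additional covariate whose effect is not expected to differ by gender.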

      For more, see

      https://www3.nd.edu/~rwilliam/stats2/l51.pdf
      -------------------------------------------
      Richard Williams, Notre Dame Dept of Sociology
      StataNow Version: 19.5 MP (2 processor)

      EMAIL: [email protected]
      WWW: https://www3.nd.edu/~rwilliam
