Hi there! I have a question regarding the use of interaction terms in a cross-sectional regression model. I hope you can bear with my somewhat dodgy statistical explanation:
I am currently working on a study with a sample of roughly 500 observations, which can be divided into type 1 and type 2 (similar to, e.g., male and female). I want to find out whether the outcome of the dependent variable differs significantly between the groups. A difference-in-means t-test shows a significant difference, but I want to check whether this difference still holds after controlling for 8 independent variables in total. I ran an OLS regression including all these variables and a dummy coded 1 for type 1 and 0 for type 2, but I was told to include interaction terms. My model would then look something like:
y = a + b1x1 + b2x2 + ... + b9x9 + b10x1x2 + b11x1x3 + ... + b17x1x9 + e (where x1 is the type dummy and x2 to x9 are the 8 control variables; I basically multiplied every control variable by the dummy to get the additional interaction terms)
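A minimal NumPy sketch of what "multiplying every predictor by the dummy" does to the design matrix. The data are simulated and the helper name `fully_interacted_design` is hypothetical; it only illustrates the column layout (a, b1 to b17), not my actual analysis:

```python
import numpy as np

def fully_interacted_design(d, X):
    """Hypothetical helper. Columns: intercept, dummy, the k controls,
    then dummy * each control.

    d: shape (n,) array of 0/1 group indicators (1 = type 1).
    X: shape (n, k) array of control variables.
    """
    n = len(d)
    intercept = np.ones((n, 1))
    interactions = X * d[:, None]  # multiply every control by the dummy
    return np.hstack([intercept, d[:, None], X, interactions])

rng = np.random.default_rng(0)
n, k = 500, 8                      # roughly 500 observations, 8 controls
d = rng.integers(0, 2, size=n)     # simulated group assignment
X = rng.normal(size=(n, k))        # simulated controls
y = 1.0 + 0.5 * d + X @ np.linspace(0.1, 0.8, k) + rng.normal(size=n)

Z = fully_interacted_design(d, X)
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)  # plain least squares, no p-values
print(Z.shape)  # (500, 18): 1 + 1 + 8 + 8 columns
```

So with 8 controls the fully interacted model estimates 18 coefficients instead of the 10 in the additive model.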
I am wondering:
a) If I add a dummy to a regression with multiple predictor variables, why can I not interpret the dummy's coefficient as already being controlled for these variables?
b) Is this legitimate? I have found models specifying, e.g., x1 + x2 + x1x2, but I could hardly find any good reading on adding interaction terms for multiple independent variables.
c) Using so many interaction terms seems highly problematic to me, since my goal is to find out what the effect of the dummy variable (type 1) is and whether it is significant. I am aware that once interaction terms are added, I can no longer directly interpret the dummy's coefficient and its significance, but I can hardly imagine how I can still draw any valid conclusions with this many interaction terms.
d) Just as a check of my understanding: if an interaction term, let's say dummy1 x education, comes out significant in this fully interacted model, can I correctly conclude that its coefficient is the difference in the effect of education on the dependent variable for type 1 compared to type 2?
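To make a), c) and d) concrete, here is a small simulation I put together. Everything in it is an assumption: the data and coefficients are made up, the helper `ols` is my own, and it uses plain NumPy least squares rather than a stats package (so no standard errors or p-values). If I understand it correctly, it illustrates that the fully interacted model reproduces two separate per-group regressions, that each dummy x control coefficient equals the difference in that control's slope between the groups, and that after centering the controls the dummy coefficient is the group gap at the controls' sample means:

```python
import numpy as np

def ols(Z, y):
    """Return least-squares coefficients and the residual sum of squares."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return coef, float(resid @ resid)

rng = np.random.default_rng(1)
n, k = 500, 8
d = rng.integers(0, 2, size=n)          # made-up 0/1 dummy (1 = type 1)
X = rng.normal(loc=2.0, size=(n, k))    # made-up controls with nonzero means
# Made-up truth: groups differ by 0.5, and only the slope of control 0 differs (by 0.3)
y = 1.0 + 0.5 * d + X @ np.linspace(0.1, 0.8, k) + 0.3 * d * X[:, 0] + rng.normal(size=n)

ones = np.ones((n, 1))
Xc = X - X.mean(axis=0)                 # center each control at its sample mean
additive = np.hstack([ones, d[:, None], Xc])               # a + dummy + controls
full = np.hstack([ones, d[:, None], Xc, Xc * d[:, None]])  # ... + dummy * each control

b_add, rss_add = ols(additive, y)
b_full, rss_full = ols(full, y)

# Separate regressions within each group
b_t1, _ = ols(np.hstack([ones[d == 1], Xc[d == 1]]), y[d == 1])
b_t2, _ = ols(np.hstack([ones[d == 0], Xc[d == 0]]), y[d == 0])

# (d) each interaction coefficient = type-1 slope minus type-2 slope for that control
print(np.allclose(b_full[2 + k:], b_t1[1:] - b_t2[1:]))  # True
# (a)/(c) with centered controls, the dummy coefficient = predicted group gap
# when every control is at its sample mean
print(np.isclose(b_full[1], b_t1[0] - b_t2[0]))          # True

# (c) joint F statistic for "all k interaction coefficients are zero"
df_full = n - full.shape[1]
F = ((rss_add - rss_full) / k) / (rss_full / df_full)
print(b_add[1], b_full[1], F)
```

Without centering, the dummy coefficient in the full model would instead be the gap when every control equals zero, which may lie outside the observed data.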
I hope I explained my problem clearly; if any clarification is needed, I will gladly provide it. Thanks in advance for any help.
Best, Robert