I am using a logit model to look at the effect of competitiveness on gambling, using competitive sport as a proxy to measure competitiveness. My dependent variable is a dummy variable, which takes the value 1 if someone gambled in the past 12 months and 0 otherwise. My independent variable is a categorical variable, taking the value 1 if someone took part in competitive sport, 2 if they took part in only fitness-based physical activity and 0 if they did not undertake any physical activity at all. I also have a range of control variables.
Two of my control variables are gender and age (and age squared since the age distribution was quadratic). I was interested in seeing how the effect of competitive sport on gambling varies with gender and age, so I interacted the competitive sport variable with gender, age and age squared. I then used marginsplot to see how the marginal effect of competitive sport on gambling varies across gender and age. My initial regression did not include the age and gender interactions.
When I evaluate the overall effect of competitive sport on gambling using margins, dydx(*) atmeans, should I remove the interactions from the regression or keep them in? Given that margins does not report a separate marginal effect for an interaction term, I am unsure how to proceed.
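For concreteness, here is a minimal sketch of the specification I have in mind. All variable names (gambled, sport, female, age, income, educ) and the age grid are hypothetical placeholders, not my actual data.

```stata
* Sketch only: all variable names below are hypothetical placeholders.
* gambled = 1 if gambled in the past 12 months, 0 otherwise
* sport   = 0 no activity, 1 competitive sport, 2 fitness-only activity
* female  = gender dummy; income and educ stand in for the other controls

* Logit with sport interacted with gender, age and age squared
logit gambled i.sport##i.female i.sport##c.age##c.age income i.educ

* Marginal effects of each sport category (relative to no activity),
* evaluated at the means of the covariates
margins, dydx(sport) atmeans

* How those effects vary by gender and across an assumed age grid
margins female, dydx(sport) at(age=(20(10)70))
marginsplot
```

My understanding is that, because the interactions are entered with factor-variable notation, margins takes them into account when computing dydx(sport), but I would appreciate confirmation.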
As an additional point, I also have data on the different types of gambling activities people engage in. I have grouped these activities into two broad categories: skill-based (activities where people have some element of control over the outcome) and pure-chance (activities where people have no control over the outcome). I have created a dummy variable to capture this, which takes the value 1 if someone took part in any skill-based activity and 0 if they took part in only pure-chance activities. I want to see whether people who take part in competitive sport are more likely to engage in skill-based rather than pure-chance activities. Given that non-gamblers cannot take part in either type of activity, I was thinking of using a heckprobit model to account for the potential sample selection bias. Should the 'select' part of the heckprobit model include the age and gender interaction terms or not?
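One candidate version of the selection model, again as a sketch with hypothetical variable names (skill, gambled, sport, female, age, income, educ are placeholders), would put the interactions in the selection equation only. Whether that is the right choice is exactly what I am asking.

```stata
* Sketch only: variable names are hypothetical placeholders.
* skill = 1 if any skill-based gambling, 0 if pure-chance only,
*         missing for non-gamblers
* Outcome equation without interactions; selection equation with them
heckprobit skill i.sport i.female c.age##c.age income i.educ, ///
    select(gambled = i.sport##i.female i.sport##c.age##c.age income i.educ)
```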
