  • Interpreting Interaction Term Coefficients

    Dear Stata Users,

    I intend to study how ethnicity and COVID-19 relate to income.

    I run a log-linear regression with the log of income as the dependent variable, as follows:

    Code:
    reg LOGNETINCOME MIXED INDIAN PAKISTANI BANGLADESHI CHINESE OTHER_ASIAN BLACK OTHER COVIDDEATHSBYPOP c.COVIDDEATHSBYPOP##i.MIXED c.COVIDDEATHSBYPOP##i.INDIAN c.COVIDDEATHSBYPOP##i.PAKISTANI c.COVIDDEATHSBYPOP##i.BANGLADESHI c.COVIDDEATHSBYPOP##i.CHINESE c.COVIDDEATHSBYPOP##i.OTHER_ASIAN c.COVIDDEATHSBYPOP##i.BLACK c.COVIDDEATHSBYPOP##i.OTHER if FEMALE==0, robust
    The regression also includes several other variables, in particular dummy variables for quarter, region, education, and industry.

    I am struggling to understand the effects of the interaction terms, so I would like to ask:

    1) Put simply in words, how does one interpret the interaction between COVID-19 deaths by population (COVIDDEATHSBYPOP) and Ethnic Group?

    2) Given the comparatively high (or low in the negative sense) coefficients of the interactions, does this remain possible with the log of income as the y-variable, or have I gone disastrously wrong?

    An extract of the regression output is as follows:

    LOGNETINCOME                    Coef.  Std. Err.       t   P>|t|   [95% Conf. Interval]
    INDIAN#c.COVIDDEATHSBYPOP    157.1472   58.63142    2.68   0.007     42.2305   272.0638
    INDIAN                      -.0268082   .0246202   -1.09   0.276   -.0750634   .0214471
    COVIDDEATHSBYPOP             58.12169   29.31941    1.98   0.047    .6561067   115.5873

    Apologies if any issues have been made in the posting procedure...rookie mistakes I'm sure.

    Thanks for any help!

    I appreciate your time.

    Regards,

    Guest.

    Last edited by sladmin; 01 Sep 2022, 07:07. Reason: anonymize original poster

  • #2
    The interpretation of interaction coefficients is complicated, especially when there is more than one interaction term involving the same variable. In your regression, it is made even a bit more complicated by virtue of having created separate indicator ("dummy") variables for each ethnic group. I suggest that you re-run things in the following way:

    1. Create a single variable, ethnicity, that is coded 1 for MIXED, 2 for INDIAN, 3 for PAKISTANI, 4 for BANGLADESHI, 5 for CHINESE, 6 for OTHER_ASIAN, 7 for BLACK, and 8 for OTHER. (There should be at least one ethnic group not accounted for in the list you used, so include that in the variable as well, coded as 0. If there are still others that you have not shown, add them on as well, starting with 9.) Just one variable distinguishing all the different ethnicities is what you need. I recommend putting value labels on this variable so you don't have to constantly remember which number corresponds to which group. Drop the separate variables for each ethnicity: you don't need them and they just clutter up your data set.

    2. Redo the regression as:
    Code:
    reg LOGNETINCOME i.ethnicity##c.COVIDDEATHSBYPOP if FEMALE == 0, robust
    This is exactly the same regression model, and you will get the same results as before, arranged a bit more conveniently. But the best part is that you can now leave the interpretation of the interactions to the -margins- command.

    3. Run -margins-. Before doing that you must identify a set of interesting values of the variable COVIDDEATHSBYPOP. You should choose several that span, approximately, the range of observed values of that variable in the data set, and include some that are near the mean or median. For the purposes of demonstrating code, I will assume these values are 5 10 15 20 30 50. (Use actual values based on your data--these numbers are just for demonstrating the approach to the code.)
    Code:
    margins ethnicity, at(COVIDDEATHSBYPOP = (5 10 15 20 30 50))
    marginsplot
    margins ethnicity, dydx(COVIDDEATHSBYPOP)
    The first of these -margins- commands will show you the expected value of log net income in each ethnic group (including the omitted reference group) at each of the selected values of COVIDDEATHSBYPOP. The -marginsplot- command will graph expected log net income against COVIDDEATHSBYPOP, with a separate curve for each ethnicity, all on the same graph. The final -margins- command will show you the marginal effect of COVIDDEATHSBYPOP on log net income in each ethnic group.
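
    Steps 1 to 3 above can be sketched end to end roughly as follows. This is only an illustration under stated assumptions: the eight dummies are mutually exclusive and never missing, the group with all eight dummies equal to 0 is the reference category, and the label name ethnicity_lbl and the label text "Reference group" are made up for the example. The other covariates (quarter, region, education, industry) are left out because their variable names are not shown in the thread. Instead of the placeholder values 5 10 15 20 30 50, the sketch takes percentiles of COVIDDEATHSBYPOP from the estimation sample.
    Code:
    * Step 1: build one categorical variable from the eight dummies
    * (assumes mutually exclusive, non-missing dummies; 0 = reference group)
    generate byte ethnicity = 0
    label define ethnicity_lbl 0 "Reference group"
    local i = 0
    foreach v of varlist MIXED INDIAN PAKISTANI BANGLADESHI CHINESE OTHER_ASIAN BLACK OTHER {
        local ++i
        replace ethnicity = `i' if `v' == 1
        label define ethnicity_lbl `i' "`v'", add
    }
    label values ethnicity ethnicity_lbl
    * Check the coding before dropping the original dummies
    tabulate ethnicity

    * Step 2: the regression, with the interaction in factor-variable notation
    reg LOGNETINCOME i.ethnicity##c.COVIDDEATHSBYPOP if FEMALE == 0, robust

    * Step 3: pick values spanning the observed range, then run -margins-
    summarize COVIDDEATHSBYPOP if e(sample), detail
    local vals `r(p5)' `r(p25)' `r(p50)' `r(p75)' `r(p95)'
    margins ethnicity, at(COVIDDEATHSBYPOP = (`vals'))
    marginsplot
    margins ethnicity, dydx(COVIDDEATHSBYPOP)
    The percentiles are stored in a local macro before -margins- runs, so the at() list is just a list of numbers by the time -margins- sees it. If the dummies can be missing, set ethnicity to missing for those observations rather than leaving it at 0.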

    Given the comparatively high (or low in the negative sense) coefficients of the interactions, does this remain possible with the log of income as the y-variable, or have I gone disastrously wrong?
    You have probably not gone wrong. Remember that the coefficients of the interaction terms are scaled in the inverse of the units of coviddeathsbypop. So if, for example, coviddeathsbypop is in units that make its values small numbers, then the coefficients of those terms are going to be large. Remember that (at least when there are no interactions) the coefficient represents the expected difference in outcome associated with a 1 unit difference in the variable.

    So if, say, population fatality rates in your data range from 0.1% up to 3%, and if these are recorded in the data as 0.001 and 0.03, then a unit difference in this variable corresponds to the difference between no deaths at all and everybody dying from covid. The impact of that on almost any outcome that makes any sense would be enormous. On the other hand, if these are coded as 0.1 and 3, then a unit difference corresponds to the difference between 1% and 2% dying: the effect of that should be fairly modest. So it all depends on the scaling of that variable.

    When you look at the output of the first -margins- command you will be able to tell if things are working properly or not. If they aren't, don't spend a lot of time fretting about the code: explore what is wrong with the data. I'm pretty confident that this code is correct (and that your original code was as well, although it is not suited to work with -margins-).
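
    To make the scaling point concrete with the coefficients in the extract from post #1, here is a small arithmetic sketch. It assumes, purely for illustration, that COVIDDEATHSBYPOP is stored as a proportion, so that a step of 0.001 means 0.1 percentage points; the actual units are not stated in the thread and should be checked in the data. The scalar names are made up for the example.
    Code:
    * Slope of log income on COVIDDEATHSBYPOP in the reference group
    scalar b_ref = 58.12169
    * Slope in the INDIAN group = main effect + interaction coefficient
    scalar b_ind = 58.12169 + 157.1472
    * Hypothetical step in COVIDDEATHSBYPOP (assumed units: proportion)
    scalar step = 0.001
    * Expected difference in log income for that step
    display "Reference group, log points: " b_ref*step
    display "INDIAN group, log points:    " b_ind*step
    * Convert the log-point difference to a percent difference in income
    display "Reference group, percent:    " 100*(exp(b_ref*step) - 1)
    display "INDIAN group, percent:       " 100*(exp(b_ind*step) - 1)
    Under that assumed scaling, the reference-group slope of 58.12 corresponds to about 0.058 log points, or roughly 6 percent, for a 0.001 step. The group-specific slopes computed by hand here are what the -margins- command with dydx() reports directly once the model is re-run with i.ethnicity.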
    Last edited by Clyde Schechter; 28 Aug 2022, 20:48.

    • #3
      Clyde Schechter, thank you for your much-needed input!

      Given the code you have suggested, I now understand the effects of the interaction terms.

      I appreciate it!
