Collinearity in interaction term

Ole Tomas

Join Date: May 2025

Posts: 5
#1


11 May 2025, 08:50

For my master's thesis, I want to research how tax avoidance affects the change in ESG (Environmental, Social, Governance) score. I also want to see whether culture (time-invariant) moderates this relationship.
So my model looks like this:

ESG_delta = ETR + Culture + ETR * Culture + Controls

Culture is constructed as 9 specific cultural dimensions, each with a score ranging from 0 to 100 (for example, Uncertainty avoidance = 55). The scores are assigned based on the country in which a firm has its headquarters, so they are time-invariant in my panel dataset.
When I run a linear mixed effects regression, there is high multicollinearity: ETR (effective tax rate) is highly correlated with the interaction term, and Culture is also correlated with the interaction term. This seems logical; however, even after mean-centering the variables, the collinearity is still heavily present.

What to do about this?
Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#2

11 May 2025, 12:42

Your description of the model is unclear. You refer to Culture as "9 specific cultural dimensions," but in your model equation it appears to be a single variable. Are you using 9 culture variables in the regression or are the 9 variables somehow combined into a single culture variable that is used in the regressions?

Regardless of the answer to that question, I'm also wondering why you are using a mixed effects model here. Surely this is not randomized experimental data. So it would likely be more appropriate to use a fixed effects model. That would eliminate the culture variable(s) (but preserve the interaction terms) and give you a much cleaner analysis: it answers your question about effect modification directly (in the coefficient(s) of the interaction term(s)), without extraneous coefficients (those of the Culture and ETR terms) that are probably irrelevant to your research question, or even altogether meaningless. As a side effect of using the fixed effects model and losing the culture variable(s), you will also eliminate the multicollinearity, though, as I explain in the next paragraph, this doesn't really matter.

As for the multicollinearity, it is a non-issue. Interaction terms and their constituents always exhibit some degree of multicollinearity. It is mathematically baked into the calculations. But even beyond that, multicollinearity is always either a non-problem or an unsolvable problem, so either way there is nothing to be done. I highly recommend you read the chapter on multicollinearity in Arthur Goldberger's econometrics textbook, where he entertainingly demonstrates that "multicollinearity" is just a fancy term for "sample size too small." In your circumstance, the multicollinear relationship involves the interaction term, which is the most important part of your model. So if the standard error of the interaction term(s) is too large to give your estimated effect modification enough precision to answer your research question, then you are probably just stuck: you need a (much) larger data set than you have, and it is seldom feasible to get that. (Given that the multicollinearity in this case is mathematically determined, the other solution, also seldom feasible, of using a different data design that breaks the multicollinearity is literally impossible.) If, happily, your interaction term standard error(s) are small enough that you can answer your research question with reasonable confidence, then the multicollinearity is a non-issue and you should just forget about it.
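Goldberger's point can be seen in the textbook OLS formula for the sampling variance of a single coefficient (stated here under the usual homoskedastic-error assumptions), where $R_j^2$ is the R-squared from regressing regressor $x_j$ on all the other regressors:

$$
\operatorname{Var}(\hat\beta_j) = \frac{\sigma^2}{(1 - R_j^2)\,\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2}
$$

Collinearity enters only through the factor $1/(1 - R_j^2)$, the variance inflation factor. The sum of squares in the denominator grows roughly in proportion to the sample size $n$, so the loss of precision from a high $R_j^2$ could in principle be offset by more data. That is why what ultimately matters is whether the interaction term's standard error is small enough, not the collinearity itself.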
Richard Williams

Join Date: Apr 2014

Posts: 5024
#3

11 May 2025, 15:29

I’m somewhat surprised you have high collinearity after centering the variables.

I pretty much agree with Clyde. It is possible you have high collinearity because you've done something stupid, like including income measured both in dollars and in thousands of dollars. There are also some things you can sometimes do, like creating a scale out of several variables. This handout tosses out some ideas.

https://www3.nd.edu/~rwilliam/stats2/l11.pdf
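To illustrate why centering usually helps with this particular kind of collinearity, here is a minimal Python/NumPy sketch on simulated data. The ranges and sample size are hypothetical (loosely mimicking an ETR between 0 and 0.4 and a 0-100 culture score, drawn independently), and the helper `corr` is just for illustration. With raw variables, ETR is strongly correlated with the product ETR × Culture simply because Culture's mean is far from zero; after mean-centering both, that correlation essentially vanishes.

```python
import numpy as np

# Hypothetical data: an ETR-like variable and a 0-100 culture-like score,
# drawn independently; both have means far from zero.
rng = np.random.default_rng(1)
n = 5000
etr = rng.uniform(0.0, 0.4, n)
culture = rng.uniform(0, 100, n)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


# Correlation between a constituent and the product term, uncentered.
raw = corr(etr, etr * culture)

# Same correlation after mean-centering both constituents.
etr_c = etr - etr.mean()
culture_c = culture - culture.mean()
centered = corr(etr_c, etr_c * culture_c)

# raw is large, centered is close to zero
print(f"raw: {raw:.2f}, centered: {centered:.2f}")
```

For roughly symmetric variables like these, centering removes this built-in correlation between a constituent and its product term. So if high collinearity persists after centering, it is presumably coming from somewhere other than this mechanism.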

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://academicweb.nd.edu/~rwilliam/
Bruce Weaver

Join Date: May 2014

Posts: 1139
#4

11 May 2025, 16:31

The Goldberger comments on (multi)collinearity that Clyde Schechter mentioned in #2 are summarized in this blog post by Dave Giles. See also this blog post by Paul Allison, and this answer to a question by Mandy. In the final sentence of that answer, Allison wrote, "With centering, the main effects represent the effect of each variable when the other variable is at its mean." That is true in the case of mean-centering. A more general statement would say "when the other variable is at the value it was centered on".
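Allison's statement and the more general version can be checked numerically. The following minimal Python/NumPy sketch uses simulated data with made-up coefficients and a hypothetical helper `ols`. It fits the same interaction model three ways: raw, mean-centered, and with z centered on an arbitrary value of 40. Centering leaves the interaction coefficient unchanged, and the coefficient on x becomes the slope of x at whatever value z was centered on.

```python
import numpy as np

# Hypothetical simulated data with an x*z interaction (coefficients are made up).
rng = np.random.default_rng(2)
n = 500
x = rng.normal(10, 2, n)
z = rng.normal(50, 10, n)
y = 1.0 + 0.5 * x + 0.2 * z + 0.03 * x * z + rng.normal(0, 1, n)


def ols(a, b):
    """OLS of y on [1, a, b, a*b]; returns [intercept, b_a, b_b, b_interaction]."""
    X = np.column_stack([np.ones(n), a, b, a * b])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


b_raw = ols(x, z)                          # uncentered
b_mean = ols(x - x.mean(), z - z.mean())   # mean-centered
b_40 = ols(x, z - 40.0)                    # z centered on an arbitrary value, 40

# The interaction coefficient is the same in all three fits.
print(np.isclose(b_raw[3], b_mean[3]), np.isclose(b_raw[3], b_40[3]))  # True True

# The coefficient on x is the slope of x at the value z was centered on.
print(np.isclose(b_mean[1], b_raw[1] + b_raw[3] * z.mean()))  # True
print(np.isclose(b_40[1], b_raw[1] + b_raw[3] * 40.0))        # True
```

Because centering is just a reparameterization of the same model, the fitted values and the interaction estimate do not change; only the meaning of the lower-order coefficients does.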

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Ole Tomas

Join Date: May 2025

Posts: 5
#5

12 May 2025, 02:15

To make it a bit clearer: the culture variable is actually 9 different variables, each with an individual score between 0 and 100. Each of these is entered individually in the regression analysis.
My reason for not using fixed effects was that time-invariant variables are omitted in a fixed effects model. However, if I still run a fixed effects model, are the coefficients still interpretable after the time-invariant cultural dimensions are omitted? If so, then I will run a fixed effects regression, since the interaction term is really the sole point of the regression.

Thank you for your help!
Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#6

12 May 2025, 08:13

However, if I still do a fixed effects model, are the coefficients still interpretable after the time invariant Cultural dimensions are omitted?

Yes!
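A minimal Python/NumPy sketch on simulated data shows why the answer is yes. Everything here is hypothetical (200 firms, 5 years, a single culture score, and the helper `demean_by`), and it computes only point estimates, not the clustered standard errors you would want in practice. It applies the within (firm-demeaning) transformation that a fixed effects estimator such as Stata's `xtreg, fe` uses: the time-invariant culture column becomes identically zero and drops out, while the ETR × Culture interaction survives and its coefficient is still recovered.

```python
import numpy as np

# Hypothetical simulated panel: 200 firms, each observed for 5 years.
rng = np.random.default_rng(0)
n_firms, n_years = 200, 5
firm = np.repeat(np.arange(n_firms), n_years)  # firm id for each row

culture = rng.uniform(0, 100, n_firms)[firm]  # time-invariant: one score per firm
alpha = rng.normal(0, 1, n_firms)[firm]       # unobserved firm effect
etr = rng.uniform(0.0, 0.4, firm.size)        # ETR varies within firm over time

# Assumed data-generating process; the true interaction coefficient is 0.05.
y = (alpha + 2.0 * etr + 0.01 * culture
     + 0.05 * etr * culture + rng.normal(0, 0.1, firm.size))


def demean_by(group, x):
    """Within transformation: subtract each group's mean from x."""
    means = np.bincount(group, weights=x) / np.bincount(group)
    return x - means[group]


X = np.column_stack([etr, culture, etr * culture])
X_within = np.column_stack([demean_by(firm, X[:, j]) for j in range(X.shape[1])])
y_within = demean_by(firm, y)

# The culture column is (numerically) all zeros after demeaning, so it drops out.
print(np.allclose(X_within[:, 1], 0))  # True

# OLS on the surviving columns: ETR and ETR*Culture.
coef, *_ = np.linalg.lstsq(X_within[:, [0, 2]], y_within, rcond=None)
print(coef)  # should be close to the true values 2.0 and 0.05
```

The interaction coefficient keeps its usual meaning, the change in the ETR slope per one-point difference in the culture score; only the main effect of culture itself is absorbed by the firm fixed effects.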
