Spatial shocks: when is collinearity too much?

Xavier Pedros

Join Date: Oct 2016
Posts: 37

15 Sep 2022, 13:49

Hi,

I am working with a municipality-level dataset. I am analysing the effect of immigration shocks on a political voting outcome. I want to compare responses to immigration inside the municipality (in-municipality immigration), versus immigration in the neighbouring areas.

I have calculated immigration shocks in neighbouring areas using different radii (10, 20 and 30 km), and I have distance-weighted them. My plan is to regress my outcome on one of these variables alongside the in-municipality immigration variable, in order to compare the effects.

I find that in-municipality immigration is strongly and positively correlated with immigration shocks in neighbouring areas. For example, the raw correlation between the in-municipality immigration variable and the 30km neighbouring measure is 0.66. Moreover, the descriptive statistics show nearly identical means, and the standard deviations are not too different:

Variable	N	Mean	Sd	Min	Max
In-municipality immigration	2,138	4.01	2.41	-1.06	22.68
Neighbouring 30km variable	2,138	4.01	3.55	-2.78	33.14
Neighbouring 20km variable	2,138	4.02	2.56	-1.30	24.93
Neighbouring 10km variable	2,138	4.01	2.41	-1.06	22.68

When I regress my outcome on in-municipality immigration alone, the coefficient is positive. When I regress my outcome on in-municipality immigration and, say, the 30km neighbouring measure, both variables have positive effects. Yet, the effect for the in-municipality variable reduces sharply - reflecting the positive correlation.

I also checked the variance inflation factor (VIF). I regressed the outcome on the in-municipality and the 30km neighbouring measures, and then ran the "vif" command. I get a value of 1.77, which, as I understand it, suggests no collinearity problem.
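As a sanity check on that number: when a model contains exactly two regressors, the VIF of each depends only on their pairwise correlation r, through VIF = 1 / (1 - r^2). The sketch below assumes the regression has no covariates beyond the two immigration variables; the function name is purely illustrative.

```python
def vif_two_regressors(r: float) -> float:
    """VIF of either regressor in a model with exactly two regressors
    whose pairwise correlation is r (assumes no further covariates)."""
    if abs(r) >= 1:
        raise ValueError("perfectly collinear regressors: VIF is undefined")
    return 1.0 / (1.0 - r ** 2)


vif = vif_two_regressors(0.66)
print(round(vif, 2))         # 1.77, matching the reported value
print(round(vif ** 0.5, 2))  # 1.33, the implied inflation of the standard errors
```

So a correlation of 0.66 inflates each standard error by roughly a third relative to uncorrelated regressors, which is modest.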

Do you have any suggestions on whether I should be concerned or on what to do?

Clyde Schechter

Join Date: Apr 2014

Posts: 30354
#2

15 Sep 2022, 14:21

I don't understand precisely what you want to do here. Do you ultimately want to use the in-municipality immigration plus all three of the neighboring variables as explanatory variables in a model of some outcome? Or are you only interested in the effect of in-municipality immigration itself, with the others included only as covariates to adjust for?

As I don't know where you are going with this, I can't address your specific situation. But I can give you general guidance about the assessment of collinearity. For a longer, but highly entertaining, version, see the chapter on this problem in Arthur Goldberger's textbook A Course in Econometrics. The very short version is that in most situations, even where there is very strong collinearity among variables, it is not a problem. And when it is a problem, there is usually nothing you can do about it anyway.

The longer version is this. If the variables that are involved in the near-collinearity relationship are not key explanatory variables but are included only for adjustment purposes, then there is no need to even ask whether there is collinearity. It is of no consequence to answering the research question in that setting. If, however, a key explanatory variable is among the variables that participate in the multicollinearity, then there may be a problem. And the way to tell is to look at the standard error of the involved key explanatory variable(s). If the standard error of the coefficient of that variable is small enough that the coefficient is estimated with sufficient precision that you can, for practical purposes, answer your research question, then the multicollinearity is not a problem and you should just proceed. If, however, the standard error is large enough that the coefficient estimate is too imprecise to use, then you have a collinearity problem.

Unfortunately, when there is a collinearity problem, there isn't really much you can do about it. As Goldberger points out, multicollinearity should properly be called micronumerosity. In other words, what we call multicollinearity is in fact just a side effect of having a sample that is too small for the purpose at hand. The only solution is to get a better data set: either a bigger one (usually it needs to be much bigger) or one based on a different study design that reduces or eliminates the collinearity (e.g. a matched pair design or other non-simple sampling). As a practical matter, that is usually going to be difficult or impossible.
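The equivalence between collinearity and a small sample can be seen from the textbook variance formula for an OLS slope in a two-regressor model, Var(b1) = sigma^2 / (SST_1 * (1 - r^2)): the correlation r and the sample size n enter the standard error in the same way. The sketch below is only an illustration under that formula. It uses n = 2,138 and r = 0.66 from the first post, while `sigma` and `sd_x` are placeholder values, not estimates from the data.

```python
import math


def approx_se(n: int, r: float, sigma: float = 1.0, sd_x: float = 1.0) -> float:
    """Approximate standard error of one slope in a two-regressor OLS model.

    Uses Var(b1) = sigma^2 / (SST_1 * (1 - r^2)) with SST_1 ~ n * sd_x^2.
    sigma (residual s.d.) and sd_x (regressor s.d.) are illustrative values.
    """
    return sigma / (sd_x * math.sqrt(n * (1.0 - r ** 2)))


base = approx_se(n=2138, r=0.0)                 # uncorrelated regressors
collinear = approx_se(n=2138, r=0.66)           # correlation from the thread
bigger_sample = approx_se(n=4 * 2138, r=0.66)   # same correlation, 4x the data

print(round(collinear / base, 2))           # 1.33: inflation caused by r = 0.66
print(round(bigger_sample / collinear, 2))  # 0.5: quadrupling n halves the SE
```

Under these assumptions, offsetting a correlation of 0.66 takes a sample about 1.77 times larger, whereas a correlation of 0.99 would take one about 50 times larger. That is why a bigger data set is rarely a practical remedy when collinearity is severe.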
Xavier Pedros

#3

15 Sep 2022, 14:39

Many thanks Clyde, very helpful.

Apologies for the lack of clarity. I want to use the in-municipality immigration variable plus one of the neighbouring immigration shocks, as explanatory variables.

Indeed, both variables are of interest. My regression output is in the attached image (neighbouring immigration variables are denoted with an overbar). The outcome is the change in a political voting outcome. My interpretation is that neighbouring immigration might have a larger impact on the outcome than in-municipality immigration. Yet I note that the in-municipality coefficient falls drastically when I add the neighbouring immigration variables, which reflects the strong positive correlation between them. Hence I believe that I should be cautious about drawing interpretations.

Do you think this interpretation is sensible?

Xavier Pedros

#4

16 Sep 2022, 00:47

Clyde Schechter Sorry, would you broadly agree with this interpretation?
Clyde Schechter

#5

16 Sep 2022, 08:29

I think a couple of issues are being confused here.

It is clearly the case that there is a pretty strong association among the four immigration variables. But the marked change in the estimate for the in-municipality coefficient as you add in one of the others is not a demonstration of multicollinearity. It is a demonstration of omitted variable bias! It demonstrates the necessity of incorporating more than just the in-municipality variable into your model in order to get valid conclusions.

If you were having a multicollinearity problem, what you would see in the models with both in-municipality and one of the other immigration variables is that their coefficients might be almost anything, but the standard errors for both coefficients would be large, perhaps a couple of orders of magnitude greater than what you actually have. (In the rubric of statistical significance, neither coefficient would be statistically significant, even though their coefficients might appear large.) By contrast, in your results all of the variables have precisely estimated coefficients, with nice, small standard errors.
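The omitted-variable mechanism can be illustrated with simulated data. In the sketch below everything is hypothetical: the "true" effects (0.2 for the own-municipality shock, 0.5 for the neighbouring shock), the unit variances and the seed are made up, and only n = 2,138 and the correlation of about 0.66 echo the thread. It shows that the coefficient from the short regression equals the long-regression coefficient plus the neighbouring coefficient times the slope of the neighbouring shock on the own shock, an identity that holds exactly in any sample.

```python
import random


def ols1(y, x):
    """OLS slope of y on x (with intercept)."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols2(y, x1, x2):
    """OLS slopes of y on x1 and x2 (with intercept), from the 2x2 normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


random.seed(0)
n = 2138
own = [random.gauss(0, 1) for _ in range(n)]
# Neighbouring shock built to correlate about 0.66 with the own shock.
nbr = [0.66 * a + (1 - 0.66 ** 2) ** 0.5 * random.gauss(0, 1) for a in own]
# Hypothetical data-generating process: both shocks affect the outcome.
y = [0.2 * a + 0.5 * b + random.gauss(0, 1) for a, b in zip(own, nbr)]

b_short = ols1(y, own)             # neighbouring shock omitted
b_own, b_nbr = ols2(y, own, nbr)   # both shocks included
delta = ols1(nbr, own)             # slope of neighbouring on own

print(round(b_short, 2), round(b_own, 2), round(b_nbr, 2))
# Omitted-variable identity: b_short = b_own + b_nbr * delta
print(abs(b_short - (b_own + b_nbr * delta)) < 1e-9)  # True
```

In expectation the short coefficient here is 0.2 + 0.5 * 0.66 = 0.53, so the own-municipality estimate falls sharply once the neighbouring shock is added even though both coefficients remain precisely estimated.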
Xavier Pedros

#6

16 Sep 2022, 10:15

Thanks Clyde, that helps a lot!

I just wonder whether I can rigorously conclude that the marked change in the in-municipality coefficient is a demonstration of omitted variable bias:

Hypothesis 1: The omitted variable bias story would imply that the "true" in-municipality effect is found when I control for the neighbouring variable. Hence the model is appropriate when having neighbouring variables in.

Hypothesis 2: Yet, one could argue that the neighbouring variable is actually picking up part of the true effect of the in-municipality variable, because the two are positively correlated. Hence having neighbouring variables in is not appropriate.

I guess it comes down to theories that I can use, to interpret effects. However, I wonder if I can very strongly conclude that it is not Hypothesis 2.

Clyde Schechter

#7

16 Sep 2022, 11:37

Fair questions. And I was a bit hasty in drawing the conclusion that you must include more than one immigration variable in your model. However, I was not hasty in ruling out multicollinearity as an issue here. Your standard errors are fine, so there is no multicollinearity problem.

The question still remains, however, whether a one or two variable model is the correct one. And it does in fact come down to theories you can use to interpret the effects. Until now, I hadn't paid much attention to the fact that your outcome variable is voting. So there is plenty of potential for causality going in multiple directions here. When voting goes in anti-immigration directions in an area, the population of that area may concurrently undertake actions that make it less likely that immigrants will come there. E.g., they may refuse to sell their houses to immigrants, they may pass laws that make life in their area feel unattractive to immigrants, they may simply exhibit hostility to immigrants through behaviors that scare them off from coming there. These effects can also be contagious to other nearby locations (as may the attitudes and other forces that underlie voting.)

So, aside from the possibility that wider-area immigration shocks are an important confounder that must be taken into account, it is also possible that one of these immigration shock variables mediates the effect of the other, or that one of them is a collider of the relationship of the other to voting. So I would say that you need to rely on theory to produce a directed acyclic graph (DAG) of the causal relationships that you believe credibly exist among all these variables and then decide which variables to include or exclude based on that: excluding mediators and colliders, but including confounders.
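Why a collider should be excluded can be shown with a toy simulation. This is a sketch only: the variables `x`, `y` and `c` are hypothetical stand-ins rather than the thread's data, and the structure (x and y independent, both causing c) is assumed for illustration. Adjusting for c manufactures an association between x and y that does not exist.

```python
import random


def cov(a, b):
    """Sample covariance (divisor n; the divisor cancels in the slopes below)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)


def slope_simple(y, x):
    """OLS slope of y on x alone."""
    return cov(x, y) / cov(x, x)


def slope_controlling(y, x, c):
    """Coefficient on x in an OLS regression of y on x and c."""
    det = cov(x, x) * cov(c, c) - cov(x, c) ** 2
    return (cov(c, c) * cov(x, y) - cov(x, c) * cov(c, y)) / det


random.seed(1)
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]   # exposure
y = [random.gauss(0, 1) for _ in range(n)]   # outcome, independent of x
# c is a collider: it is caused by both x and y.
c = [a + b + random.gauss(0, 1) for a, b in zip(x, y)]

b_plain = slope_simple(y, x)
b_collider = slope_controlling(y, x, c)
print(round(b_plain, 2), round(b_collider, 2))  # roughly 0 and roughly -0.5
```

With these unit variances the population coefficient on x after adjusting for c is exactly -0.5, although the true effect of x on y is zero. A confounder behaves the opposite way (adjusting removes bias), so the decision has to come from the causal graph and not from the correlations.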

It may well be that the network of associations here is so complicated that you cannot really draw a single DAG that you have any faith in and cannot reach a single conclusion. So I would say that a modified version of your Hypothesis 2 is in the running here: there may be back-door and front-door pathways among all these variables, and there is enough uncertainty about them that having neighboring variables in may not be appropriate. (But it's not just because of positive correlation. Without the positive correlation, admittedly, this problem could not exist. But the correlation by itself is not the issue.)
Xavier Pedros

#8

16 Sep 2022, 12:01

Indeed, thanks Clyde. I agree. I think I may be better off running the regressions separately (see estimates below). And then possibly provide results combining them, while raising the caveats discussed.

In terms of theory for lower in-municipality effects: my outcome is political voting by natives. Compatible theories I can think of:

- Contact theory. In-municipality immigration might give natives regular contact with immigrants (in neighbourhoods, schools, residential and shopping areas), while immigration in neighbouring places will involve more sporadic interactions.
- Neighbouring immigration measures might capture job competition in the employment pool.