
  • How to compare two different coefficients from two different multilevel equations?

    Dear Statalist,

    I am working with a three-level model (time nested in firms nested in regions) and using Stata 15.1. I would like to compare two coefficients (say, those of z2 and z3) from two different regressions. Both regressions have the same dependent variable, but there is a high (near 0.8) correlation between the two independent variables (z2 and z3), which is why I do not include them jointly.
    Someone told me to compare the distributions of the betas and look for overlap, but I am not sure how to do this.

    On top of that, I would like to ask whether there is something like suest (suest does not support meglm) that works for models like the ones I show below.

    Thanks in advance.

    Code:
    melogit y L.x1 L.x2 z1 z2 ||region: ||firm: , or vce(robust)
    melogit y L.x1 L.x2 z1 z3 ||region: ||firm: , or vce(robust)

  • #2
    Well, I don't know what the person you spoke with had in mind, but I would just look at the results for L.x1, L.x2, and z1 from both models and see if they are similar, and perhaps specifically whether or not each model's result falls within the confidence limits of the other.
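
    For example, something along these lines (untested) would put the two sets of estimates side by side:

    Code:
    * fit and store both models, then tabulate their coefficients and standard errors
    quietly melogit y L.x1 L.x2 z1 z2 || region: || firm: , or vce(robust)
    estimates store with_z2
    quietly melogit y L.x1 L.x2 z1 z3 || region: || firm: , or vce(robust)
    estimates store with_z3
    * eform reports odds ratios, matching melogit's -or- display
    estimates table with_z2 with_z3, b(%9.4f) se eform stats(N)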

    All of that said, I question the wisdom of not putting both z2 and z3 in the model just because they are highly correlated with each other. If the effects of z2 and z3 are not themselves of interest (i.e. they are just being included to adjust for their nuisance effects) then you will get better (that is, less biased) results for L.x1, L.x2 and z1 if you include both z2 and z3 in the model. The separate effect of z2 and z3 may come out poorly estimated (very wide confidence intervals) but as they are not of interest anyway, it's not a problem.

    If, on the other hand, the effects of z2 and z3 are of interest in their own right, then you have a problem because when used together in the model, their separate effects will be rather imprecisely estimated. But, unless they are also independent of y, omitting either one will give you biased estimates of the other's effects. The only solution to that problem is to get a much larger data set that will allow you to estimate the z2 and z3 effects with adequate precision when both are included.

    In general, too many people waste too much time and effort worrying about "multicollinearity" in regression. Everyone should read Chapter 23 of Arthur Goldberger's book A Course in Econometrics. The entire chapter is devoted to a takedown of the concept of multicollinearity. It is a fun read, and, on top of that, it will save you lots of angst and wasted time over the course of your career.



    • #3
      Dear Clyde, as always, thanks for your answer. Unfortunately, both z2 and z3 are important for the analysis, and it is not possible to increase the sample. Thanks for the advice about the chapter; I will look for it. I also recall that Wooldridge, in his introductory econometrics book (chapter 3, I think), discusses how several authors give multicollinearity more importance than it deserves. That said, people still give so much weight to this problem that they would advise against including the two variables jointly.

      perhaps specifically whether or not each model's result falls within the confidence limits of the other
      Is there any way to see this graphically? I mean, to plot the distributions of the parameters of z2 and z3 and check whether they overlap. Is this what you meant?

      Code:
                           z1 |   .9998783   .0002269    -0.54   0.592     .9994337    1.000323
                           z2 |   1.042456   .0083441     5.19   0.000      1.02623     1.05894
                        _cons |   .0762872   .0263032    -7.46   0.000     .0388119    .1499475
      ------------------------+----------------------------------------------------------------
      regionid                |
                    var(_cons)|   3.35e-33   4.07e-32                      1.56e-43    7.19e-23
      ------------------------+----------------------------------------------------------------
      regionid>firmid         |
                    var(_cons)|   3.331411   .2135464                      2.938093    3.777383
      -----------------------------------------------------------------------------------------
      Code:
                           z1 |   .9995764   .0002206    -1.92   0.055     .9991442    1.000009
                           z3 |   1.029843   .0052627     5.75   0.000      1.01958     1.04021
                        _cons |   .0840892   .0289356    -7.20   0.000     .0428387    .1650608
      ------------------------+----------------------------------------------------------------
      regionid                |
                    var(_cons)|   2.49e-34   1.01e-33                      9.13e-38    6.81e-31
      ------------------------+----------------------------------------------------------------
      regionid>firmid         |
                    var(_cons)|   3.327979   .2128925                      2.935816    3.772527
      -----------------------------------------------------------------------------------------



      • #4
        Here is another idea. You might consider using gsem and constraining those two coefficients to be equal. Run one gsem without them constrained to be equal and then another with the constraint. Then use a chi-square test to determine whether the model without the constraint is a better fit to the data. If the chi-square is significant, then they are different. If not, then they have equivalent effects and you can feel comfortable using one or the other. Documentation on constraints in sem and gsem can be found by typing
        Code:
        help sem_and_gsem_option_constraints
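
        Something along these lines might work (an untested sketch; the other covariates and the or/vce options are omitted, and it assumes both z2 and z3 are entered together in a single gsem):

        Code:
        * unconstrained model with nested random intercepts for region and firm-within-region
        gsem (y <- z1 z2 z3 M1[region] M2[region>firm], logit)
        estimates store unconstrained
        * refit with the z2 and z3 coefficients constrained to be equal
        constraint 1 [y]z2 = [y]z3
        gsem (y <- z1 z2 z3 M1[region] M2[region>firm], logit), constraints(1)
        estimates store constrained
        * likelihood-ratio (chi-square) test of the equality constraint
        lrtest unconstrained constrained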



        • #5
          Well, I've never been a fan of plotting regression coefficients, but Ben Jann's -coefplot-, available from SSC, does precisely that. You could plot the coefficients along with their confidence intervals using -coefplot-, and then combine the graphs.
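
          For instance, something like this (untested) would overlay both models in one plot:

          Code:
          * fit and store the two melogit models, then plot their coefficients with 95% CIs
          quietly melogit y L.x1 L.x2 z1 z2 || region: || firm: , or vce(robust)
          estimates store with_z2
          quietly melogit y L.x1 L.x2 z1 z3 || region: || firm: , or vce(robust)
          estimates store with_z3
          * keep only the fixed-effects equation, drop the constant; eform shows odds ratios
          coefplot with_z2 with_z3, keep(y:) drop(_cons) eform xline(1)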

          Just looking at the outputs you show, it is clear that each variable's coefficient in either regression does fall within the confidence interval from the other regression.



          • #6
            Thanks Clyde and Erik for your suggestions. I will have to look into gsem. I read once that you can also fit multilevel models with gsem, even though it seems more complicated than with the multilevel commands, and it may be useful for changing the random part of the model, which is something I would like to do in the future to acknowledge that regions affect neighboring regions (something multilevel models usually do not recognize).
            I will also look into coefplot as Clyde suggested. Just one last question. When you say:
            each variable's coefficient in either regression does fall within the confidence interval from the other regression.
            I see this for each confidence interval lying within the other confidence interval, but not for the coefficients themselves. I mean, the coefficient of z2 (1.042) is above the upper confidence limit of z3, and the coefficient of z3 (1.029) is very close to the lower confidence limit of z2, only barely inside the interval. Maybe I am misunderstanding something?



            • #7
              I see this for each confidence interval lying within the other confidence interval, but not for the coefficients themselves. I mean, the coefficient of z2 (1.042) is above the upper confidence limit of z3, and the coefficient of z3 (1.029) is very close to the lower confidence limit of z2, only barely inside the interval. Maybe I am misunderstanding something?
              Sorry, I meant that only to apply to the variables other than z2 and z3. There is no reason to expect the z2 and z3 coefficients to match, even though the variables are highly correlated: they could be on different scales altogether and still correlate highly. The concern, when representing the pair z2 and z3 by just one of them, is whether you are distorting the results for all the other variables in the model. And in this case you can see that the coefficients for all the other variables do agree in the sense I described.



              • #8
                Thanks again, Clyde. I see your point if they are not measured on the same scale, although in my case they measure the same thing but for two different groups. To see this more directly, I will do it for all coefficients of the models (around 35 coefficients, too long to post such tables here) with coefplot. Before doing the analysis, it is hard for me to understand how changing a single variable from z2 to z3 (even a relevant one, with a low p-value) may change the behavior of the rest.
                How many variables would you suggest should be different (outside their confidence interval in the other model) before accepting that z2 and z3 are different? I assume there is no threshold, since this is not a formal way to test for the difference between the two coefficients, but I would like to know your opinion.



                • #9
                  Before doing the analysis, it is hard for me to understand how changing a single variable from z2 to z3 (even a relevant one, with a low p-value) may change the behavior of the rest.
                  On the contrary, this happens all the time. You probably just haven't done it often enough to be aware of it. Regression models are "brittle." Tiny changes in how they are specified can result in large changes in the parts that were not changed. The typical situation like yours would be where z2 and z3 are highly correlated with each other, but have very different correlations to some other variable x in the model. Then substitution of z3 for z2 will greatly change the coefficient for x. Here's a demonstration:

                  Code:
                  . //  CREATE A DEMONSTRATION DATA SET
                  . clear*
                  
                  . matrix C = [ 1, .3, .6, .6 \ ///
                  >              .3, 1, .05, .5 \ ///
                  >              .6, .05, 1, .8 \ ///
                  >              .6, .5, .8, 1]
                  
                  .              
                  . 
                  . set obs 250
                  number of observations (_N) was 0, now 250
                  
                  . drawnorm y x z2 z3, corr(C)
                  
                  . 
                  . //  SHOW THAT Z2 AND Z3 ARE HIGHLY CORRELATED
                  . //  BUT HAVE VERY DIFFERENT CORRELATIONS WITH X
                  . corr x z2 z3
                  (obs=250)
                  
                               |        x       z2       z3
                  -------------+---------------------------
                             x |   1.0000
                            z2 |   0.0388   1.0000
                            z3 |   0.4708   0.8232   1.0000
                  
                  
                  . 
                  . //  SHOW THAT THE SUBSTITUTION OF Z3 FOR Z2
                  . //  DRASTICALLY CHANGES THE COEFFICIENT OF X
                  . //  (EVEN ITS SIGN)
                  . regress y x z2
                  
                        Source |       SS           df       MS      Number of obs   =       250
                  -------------+----------------------------------   F(2, 247)       =     68.87
                         Model |  89.0258492         2  44.5129246   Prob > F        =    0.0000
                      Residual |  159.649891       247  .646355834   R-squared       =    0.3580
                  -------------+----------------------------------   Adj R-squared   =    0.3528
                         Total |   248.67574       249  .998697752   Root MSE        =    .80396
                  
                  ------------------------------------------------------------------------------
                             y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                             x |   .2296404   .0549007     4.18   0.000     .1215072    .3377737
                            z2 |   .5583181   .0517215    10.79   0.000     .4564468    .6601895
                         _cons |  -.0221332   .0509766    -0.43   0.665    -.1225375    .0782711
                  ------------------------------------------------------------------------------
                  
                  . regress y x z3
                  
                        Source |       SS           df       MS      Number of obs   =       250
                  -------------+----------------------------------   F(2, 247)       =     51.18
                         Model |  72.8582285         2  36.4291142   Prob > F        =    0.0000
                      Residual |  175.817512       247  .711811789   R-squared       =    0.2930
                  -------------+----------------------------------   Adj R-squared   =    0.2873
                         Total |   248.67574       249  .998697752   Root MSE        =    .84369
                  
                  ------------------------------------------------------------------------------
                             y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                             x |  -.0274254   .0652555    -0.42   0.675    -.1559536    .1011028
                            z3 |   .5622083   .0616743     9.12   0.000     .4407337     .683683
                         _cons |  -.0313463   .0536065    -0.58   0.559    -.1369304    .0742378
                  ------------------------------------------------------------------------------
                  How many variables would you suggest should be different (outside their confidence interval in the other model) before accepting that z2 and z3 are different?
                  Well, I wouldn't think about it that way, unless every variable in your model is equally important. In most work, some variables are included solely to adjust for their nuisance effects (i.e. to eliminate omitted variable bias or to reduce residual variance) but are not of interest in their own right. I wouldn't care how many of those variables showed big differences in their coefficients; in fact, I wouldn't even bother looking at them. Then there are variables that are of interest, but not really the focus of the investigation. I'd be relatively lenient about those variables, and if most of the differences are small enough to ignore for practical purposes, I'd be OK with that. Then there are the variables (or single variable) that are the focus of the investigation. Here, I'd be strict. If there is a material difference in the coefficient of any focal variable, I would be unwilling to consider the two models as equivalent.

                  The decision about which variables are just there for adjustment (often, erroneously, called "control variables"), which are important but not focal, and which are focal, obviously can only be made by you (or you and your colleagues in this project). Also, only you and your colleagues can say how large a difference is small enough to ignore for practical purposes--it depends on what the intended application of the results is, among other things. I have suggested using falling within each other's confidence intervals as something of a proxy for that--but it is really better to use your judgment about differences of practical importance if you can.

                  I see your point if they are not measured on the same scale, although in my case they measure the same thing but for two different groups.
                  Wait, what? If z2 and z3 are measuring the same thing in two different groups, then they shouldn't be two different variables. Instead, you should put your data into long layout, separating the data for the two groups into different observations, and then you can use an interaction approach. The code would go something like this.

                  Code:
                  gen long obs_no = _n
                  reshape long z, i(obs_no) j(group)
                  melogit y i.group##c.(L.x1 L.x2 z) ||region: ||firm: , or vce(robust)
                  The interaction coefficients will then give you the estimated differences in each variable's effect between groups 2 and 3.
                  Note: You may already have a group of variables that uniquely identifies observations in your data. If so, there is no need to create the variable obs_no. Just use the list of names of the identifying variables in place of obs_no in the -i()- option of reshape. Also, the code assumes L.x1, L.x2, and z are continuous variables. If not, modify the code accordingly by putting appropriate c. and i. prefixes in front of each of those variables and dropping the c. that occurs immediately after ##.



                  • #10
                    Dear Clyde, I have tried the code you gave me, but I think it is too much for the dataset, since it fails with "initial values not feasible". However, I think I did not explain myself very well; sorry for this.

                    As far as I understand, this latter code is for the case where I have two groups (say, male/female or small/large firms…) and, instead of running two separate regressions on the two subsamples, you interact the group dummy with all regressors in a single model, right? However, my z2 and z3 come from another variable (say z), which is the number of collaborations; it can be divided into the number of collaborations with public agents (z2) plus the number of collaborations with private agents (z3). And the variables (z, z2, z3) are the main focus of the analysis.
                    That said, I will show you the coefplot you also suggested. Note that I have now included all the variables in both models, so the focal variables are renamed z4 and z5, with x1-x6 being firm-level "controls" and z1-z3 regional-level "controls".
                    Following your advice, you can see that almost all controls are pretty much unchanged (see the first graph). But since the focal variables z4 and z5 are measured on the same scale and their effect is on the same dependent variable, I also show the two coefficients with their CIs for comparison (see the second graph).
                    What is your impression after seeing this? I would say they (z4, z5) are not different, right?

                    [Attached image: coefplot1.png]

                    [Attached image: coefplot2.png]



                    • #11
                      Yes, based on this I would say that the two models are giving you equivalent results.



                      • #12
                        Dear Clyde, thanks for your patience and help.

