Cross level interaction - use of testparm after mixed vs manual calculation for statistical significance

Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#16

11 Aug 2018, 11:22

Quoting Emily Mann:
Because we are interested in the quintiles of NHses, which already reduce the 200 neighbourhoods to 5 categories, I thought we'd already have the issue of small cell sizes covered. Is there any reason why NHses can't go in the position of habneigh2? Or does habneigh2 have to remain at the top level in the code because it's the initial sampling frame and the most fine-grained option available?

Well, recall that I said you might be forced to regroup neighborhoods to get things working properly. What I didn't know, because I have no knowledge of what your variables actually mean, is that NHses is a ready-made grouping of neighborhoods. Indeed, you can make it the top level of your model, and I think you will find that you get much more meaningful estimates and encounter few or no computational difficulties doing that. If your understanding of things is that these NHses groupings are a meaningful way of classifying neighborhoods, it makes sense to do that. Moreover, since NHses has only 5 levels, your sample will probably be able to withstand further subdivision into the original number of income categories (although I still say you should ditch that 9=non-response category altogether).

By the way, my earlier suggestion to split income around the median was based on computational considerations only: it would minimize the number of tiny interactions between neighborhood and income. And, again, not knowing anything about your data or your research question beyond what you disclose, I could not know that doing that would obstruct the pursuit of your specific research goals. Certainly, there is no point in getting a computationally improved model that doesn't answer the research question! Were it not for the NHses (or similar) solution to the problem of cutting things too fine, it probably would have meant that your data set is incapable of addressing your research question. But it is likely that if you replace habneigh2 by NHses and return to your original income09 classification scheme (without the non-response category) you will get a usable answer.

Added: By replacing habneigh2 with NHses you will lose the ability to estimate variation down to the fine-grained neighborhood level. But it is by now apparent that your data set is not capable of providing that level of detail in any case. I raise this point simply to emphasize that you need to evaluate all modeling options relative to the specific research questions you are trying to ask. I'm getting the impression that variation at the fine-grained neighborhood level is not actually that important to you, and that you are mostly interested in socio-economic effects anyway.

Also added: With only 5 categories of NHses, it's not at all clear that you need a multi-level model for this. You can just leave NHses as a fixed effect here and include its interactions with income09 in the model. With just 5 categories, your variance estimates for NHses itself and the random slopes are going to be very imprecise: for those parameters it is more or less as if you were doing a study with a sample size of 5.
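A minimal sketch of that fixed-effect alternative, followed by a joint test of the interaction terms. It runs on Stata's nlswork example dataset purely as a stand-in: race (3 categories) plays the part of the few-category grouping variable, collgrad that of the individual-level factor, and idcode that of the finer clustering unit. These roles are illustrative assumptions, not anything from the data in this thread.

```stata
* Stand-in data shipped with Stata; not the data discussed in this thread
webuse nlswork, clear

* Few-category grouping variable (race) entered as a fixed effect and fully
* interacted with an individual-level factor (collgrad); standard errors
* are clustered on the finer unit (idcode)
regress ln_wage i.race##i.collgrad age, vce(cluster idcode)

* Joint test that all race#collgrad interaction coefficients are zero
testparm i.race#i.collgrad
```

After -regress- the joint test is reported as an F test; the same -testparm- call after -mixed- reports a chi-squared (Wald) test instead.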

Last edited by Clyde Schechter; 11 Aug 2018, 11:32.
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#17

11 Aug 2018, 13:09

Quoting Dirk Enzmann:
Thanks, Clyde, for pointing to the UCLA page explaining the difference between LR, Wald, and Score tests!

Do you (or does someone else) happen to know how to perform the score tests for omitted variables shown on the UCLA page?

Dirk Enzmann: There used to be a Stata command named -testomit- that implemented this test. You may want to contact its author and ask whether he could send you the ado-file. You can find his details at the following link:

https://ideas.repec.org/p/boc/asug01/2.2.html

For a general review of the approach, see Davidson and MacKinnon (1984).

Reference

Davidson, R., and J. G. MacKinnon. 1984. Convenient specification tests for logit and probit models. Journal of Econometrics 25(3): 241-262.

Andrew Musau

Join Date: Oct 2014
Posts: 10213

#18

11 Aug 2018, 15:13

Here is one way to do it, using the data from the UCLA page:

Code:

*GET UCLA DATASET
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear

*GEN BINARY VARIABLE
sum write
gen hiwrite= write> 52

*MODEL WITHOUT OMITTED VARIABLES
logit  hiwrite female read

*GENERALIZED RESIDUALS
*after -logit-, the equation-level score is the generalized residual y - p
predict gres, score

*MULTIPLY REGRESSORS WITH GENERALIZED RESIDUAL
*(gres itself is the score contribution of the constant)
gen g1=gres*female
gen g2=gres*read

*omitted variables
gen g3=gres*math
gen g4=gres*science

*ARTIFICIAL REGRESSION
*regress a column of ones on the score contributions without a constant;
*e(r2) is then the uncentered R-squared, and N*R2 is the LM statistic
gen cons=1
reg cons gres g1 g2 g3 g4, nocons

*LM (score) test both variables
di "chi2(2)=" e(N)*e(r2)
di "prob>chi2 = " chi2tail(2,e(N)*e(r2))

*LM (score) test Math
reg cons gres g1 g2 g3, nocons
di "chi2(1)=" e(N)*e(r2)
di "prob>chi2 = " chi2tail(1,e(N)*e(r2))

*LM (score) test Science
reg cons gres g1 g2 g4, nocons
di "chi2(1)=" e(N)*e(r2)
di "prob>chi2 = " chi2tail(1,e(N)*e(r2))

Code:

. *LM (score) test both variables
. di "chi2(2)=" e(N)*e(r2)
chi2(2)=35.683995

. di "prob>chi2 = " chi2tail(2,e(N)*e(r2))
prob>chi2 = 1.784e-08

. *LM (score) test Math
. di "chi2(1)=" e(N)*e(r2)
chi2(1)=29.082362

. di "prob>chi2 = " chi2tail(1,e(N)*e(r2))
prob>chi2 = 6.937e-08

. *LM (score) test Science
. di "chi2(1)=" e(N)*e(r2)
chi2(1)=15.469597

. di "prob>chi2 = " chi2tail(1,e(N)*e(r2))
prob>chi2 = .00008384
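For comparison with the score test, a sketch of the Wald and likelihood-ratio tests of the same two omitted variables, using built-in commands on the same data (the URL and the hiwrite cut-off are the ones used in the post above):

```stata
* Same data and binary outcome as the score-test example
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
gen hiwrite = write > 52

* Wald test: fit the model that includes the candidate variables,
* then test their coefficients jointly
logit hiwrite female read math science
test math science
estimates store full

* LR test: refit without them on the same estimation sample and compare
logit hiwrite female read if e(sample)
estimates store restricted
lrtest full restricted
```

The score, Wald, and LR statistics test the same hypothesis and are asymptotically equivalent, but they need not coincide in a finite sample.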

Last edited by Andrew Musau; 11 Aug 2018, 15:22.


Dirk Enzmann

Join Date: Apr 2014

Posts: 541
#19

13 Aug 2018, 07:21

@Andrew Musau: This is very helpful, indeed -- thanks a lot!

Emily Mann

Join Date: Jul 2018
Posts: 21

#20

13 Aug 2018, 23:18

Quoting Clyde Schechter (#16):
With only 5 categories of NHses, it's not at all clear that you need a multi-level model for this. You can just leave NHses as a fixed effect here and include its interactions with income09 in the model. With just 5 categories, your variance estimates for NHses itself and the random slopes are going to be very imprecise: for those parameters it is more or less as if you were doing a study with a sample size of 5.

Most of #16 makes sense, except that I'm now confused about what I need to consider when deciding between a multilevel model and linear regression, beyond the data having a nested structure (e.g. individuals within neighbourhoods) and, optionally, having many observations (>20) per 'cell' so that a cross-level interaction can be estimated. I can see from the results of models 1 and 2 below that they give the same coefficient estimates (only the standard errors differ). Also, if I use code 1, is the interaction still considered a cross-level interaction?

Code:

1. regress WEMWBStotal i.agecat09 NHses09##i.income09 if sex2==2, vce(cluster habneigh2)
2. mixed WEMWBStotal i.agecat09 i.income09 i.NHses09 NHses09##i.income09 if sex2==2 || NHses09:income09, cov(unstr)

Code:

. regress WEMWBStotal i.agecat09 NHses09##i.income09 if sex2==2, vce(cluster habneigh2)

Linear regression                               Number of obs     =      4,006
                                                F(33, 199)        =       8.87
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0532
                                                Root MSE          =     8.1884

                                                                     (Std. Err. adjusted for 200 clusters in habneigh2)
-----------------------------------------------------------------------------------------------------------------------
                                                      |               Robust
                                          WEMWBStotal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------+----------------------------------------------------------------
                                             agecat09 |
                                               47-51  |   .2474299   .3722026     0.66   0.507    -.4865375    .9813972
                                               52-56  |   1.540072   .3822613     4.03   0.000     .7862693    2.293875
                                               57-61  |   2.443569   .4218595     5.79   0.000     1.611681    3.275458
                                               62-70  |   4.057689    .388591    10.44   0.000     3.291405    4.823974
                                                      |
                                              NHses09 |
                                                  Q2  |    .159355   .7256744     0.22   0.826    -1.271643    1.590353
                                                  Q3  |  -1.000347   .8682481    -1.15   0.251    -2.712495    .7118005
                                                  Q4  |  -1.230736   1.177646    -1.05   0.297    -3.553004    1.091531
                                        Q5(most dis)  |   2.034009   1.358236     1.50   0.136     -.644374    4.712393
                                                      |
                                             income09 |
                                       $72800-129999  |   -.684102   .6041605    -1.13   0.259     -1.87548    .5072763
                                        $52000-72799  |  -2.845216   .8414415    -3.38   0.001    -4.504502    -1.18593
                                        $26000-51599  |  -1.778907   .8674333    -2.05   0.042    -3.489448   -.0683668
                                    Less than $25999  |  -4.492337   1.497098    -3.00   0.003    -7.444549   -1.540125
             missing/Don't want to answer/Don't know  |  -1.391715   .7217463    -1.93   0.055    -2.814967    .0315377
                                                      |
                                     NHses09#income09 |
                                    Q2#$72800-129999  |  -.8237243   .9424944    -0.87   0.383    -2.682282    1.034834
                                     Q2#$52000-72799  |  -.4761205   1.164942    -0.41   0.683    -2.773335    1.821094
                                     Q2#$26000-51599  |  -.2456128   1.258467    -0.20   0.845    -2.727254    2.236029
                                 Q2#Less than $25999  |   .5789851    1.82626     0.32   0.752    -3.022321    4.180291
          Q2#missing/Don't want to answer/Don't know  |   .1631147   1.038028     0.16   0.875    -1.883831     2.21006
                                    Q3#$72800-129999  |    .491847   1.021433     0.48   0.631    -1.522374    2.506068
                                     Q3#$52000-72799  |   .9342543   1.389861     0.67   0.502    -1.806491       3.675
                                     Q3#$26000-51599  |  -.8656569   1.386653    -0.62   0.533    -3.600077    1.868763
                                 Q3#Less than $25999  |   .9374206   1.830171     0.51   0.609    -2.671598    4.546439
          Q3#missing/Don't want to answer/Don't know  |   .7312612   1.217959     0.60   0.549    -1.670501    3.133023
                                    Q4#$72800-129999  |  -.1865176   1.302058    -0.14   0.886     -2.75412    2.381085
                                     Q4#$52000-72799  |   1.744742   1.386125     1.26   0.210    -.9886353     4.47812
                                     Q4#$26000-51599  |   .2659125   1.619981     0.16   0.870     -2.92862    3.460445
                                 Q4#Less than $25999  |   1.045605   1.992117     0.52   0.600    -2.882763    4.973974
          Q4#missing/Don't want to answer/Don't know  |  -.0786658    1.57824    -0.05   0.960    -3.190886    3.033554
                          Q5(most dis)#$72800-129999  |   -3.64477   1.733386    -2.10   0.037    -7.062932   -.2266068
                           Q5(most dis)#$52000-72799  |  -2.153529   1.734761    -1.24   0.216    -5.574401    1.267344
                           Q5(most dis)#$26000-51599  |  -4.454693   1.720964    -2.59   0.010     -7.84836   -1.061027
                       Q5(most dis)#Less than $25999  |  -4.774991    2.09317    -2.28   0.024    -8.902631   -.6473502
Q5(most dis)#missing/Don't want to answer/Don't know  |  -4.814925   1.923416    -2.50   0.013    -8.607818   -1.022032
                                                      |
                                                _cons |   51.65885   .4721093   109.42   0.000     50.72787    52.58983
-----------------------------------------------------------------------------------------------------------------------

. mixed WEMWBStotal i.agecat09 NHses09##i.income09 if sex2==2 || NHses09:income09, cov(unstr) 

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -14090.906  
Iteration 1:   log likelihood = -14090.668  
Iteration 2:   log likelihood = -14090.665  
Iteration 3:   log likelihood = -14090.665  

Computing standard errors:
standard-error calculation has failed

Mixed-effects ML regression                     Number of obs     =      4,006
Group variable: NHses09                         Number of groups  =          5

                                                Obs per group:
                                                              min =        557
                                                              avg =      801.2
                                                              max =      1,007

                                                Wald chi2(33)     =     225.03
Log likelihood = -14090.665                     Prob > chi2       =     0.0000

-----------------------------------------------------------------------------------------------------------------------
                                          WEMWBStotal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------+----------------------------------------------------------------
                                             agecat09 |
                                               47-51  |   .2474299   .4093324     0.60   0.546     -.554847    1.049707
                                               52-56  |   1.540072   .4107497     3.75   0.000     .7350173    2.345127
                                               57-61  |   2.443569    .418787     5.83   0.000     1.622762    3.264377
                                               62-70  |   4.057689   .4421159     9.18   0.000     3.191158     4.92422
                                                      |
                                              NHses09 |
                                                  Q2  |    .159355   .8101906     0.20   0.844    -1.428589    1.747299
                                                  Q3  |  -1.000347   1.006118    -0.99   0.320    -2.972303    .9716086
                                                  Q4  |  -1.230736   1.133988    -1.09   0.278    -3.453313    .9918397
                                        Q5(most dis)  |   2.034009   1.588713     1.28   0.200     -1.07981    5.147829
                                                      |
                                             income09 |
                                       $72800-129999  |   -.684102   .7161708    -0.96   0.339    -2.087771    .7195671
                                        $52000-72799  |  -2.845216   .8792069    -3.24   0.001     -4.56843   -1.122002
                                        $26000-51599  |  -1.778907   .8739095    -2.04   0.042    -3.491739   -.0660763
                                    Less than $25999  |  -4.492337   1.180519    -3.81   0.000    -6.806111   -2.178563
             missing/Don't want to answer/Don't know  |  -1.391715   .7868216    -1.77   0.077    -2.933857    .1504273
                                                      |
                                     NHses09#income09 |
                                    Q2#$72800-129999  |  -.8237243   1.108002    -0.74   0.457    -2.995369     1.34792
                                     Q2#$52000-72799  |  -.4761205    1.32526    -0.36   0.719    -3.073583    2.121342
                                     Q2#$26000-51599  |  -.2456128   1.280445    -0.19   0.848     -2.75524    2.264014
                                 Q2#Less than $25999  |   .5789851   1.664642     0.35   0.728    -2.683654    3.841624
          Q2#missing/Don't want to answer/Don't know  |   .1631147   1.218531     0.13   0.894    -2.225163    2.551392
                                    Q3#$72800-129999  |    .491847   1.270676     0.39   0.699    -1.998633    2.982327
                                     Q3#$52000-72799  |   .9342543   1.465255     0.64   0.524    -1.937592    3.806101
                                     Q3#$26000-51599  |  -.8656569   1.386942    -0.62   0.533    -3.584013    1.852699
                                 Q3#Less than $25999  |   .9374206   1.679396     0.56   0.577    -2.354136    4.228977
          Q3#missing/Don't want to answer/Don't know  |   .7312612   1.402526     0.52   0.602     -2.01764    3.480162
                                    Q4#$72800-129999  |  -.1865176   1.402392    -0.13   0.894    -2.935155    2.562119
                                     Q4#$52000-72799  |   1.744742   1.554022     1.12   0.262    -1.301085     4.79057
                                     Q4#$26000-51599  |   .2659125   1.485156     0.18   0.858    -2.644941    3.176766
                                 Q4#Less than $25999  |   1.045605   1.726001     0.61   0.545    -2.337294    4.428504
          Q4#missing/Don't want to answer/Don't know  |  -.0786658    1.45502    -0.05   0.957    -2.930453    2.773121
                          Q5(most dis)#$72800-129999  |   -3.64477   1.872417    -1.95   0.052     -7.31464    .0251012
                           Q5(most dis)#$52000-72799  |  -2.153529   1.981523    -1.09   0.277    -6.037242    1.730185
                           Q5(most dis)#$26000-51599  |  -4.454693   1.898173    -2.35   0.019    -8.175044   -.7343428
                       Q5(most dis)#Less than $25999  |  -4.774991   2.029161    -2.35   0.019    -8.752073   -.7979085
Q5(most dis)#missing/Don't want to answer/Don't know  |  -4.814925   1.916584    -2.51   0.012    -8.571361   -1.058489
                                                      |
                                                _cons |   51.65885   .5380258    96.02   0.000     50.60434    52.71336
-----------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
NHses09: Unstructured        |
               var(income09) |   1.33e-12          .             .           .
                  var(_cons) |   5.44e-12          .             .           .
         cov(income09,_cons) |  -1.57e-12          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   66.48015          .             .           .
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 0.00                  Prob > chi2 = 1.0000

Note: LR test is conservative and provided only for reference.
Warning: standard-error calculation failed

Quoting Clyde Schechter (#16):
Added: By replacing habneigh2 with NHses you will lose the ability to estimate variation down to the fine-grained neighborhood level. But it is by now apparent that your data set is not capable of providing that level of detail in any case. I raise this point simply to emphasize that you need to evaluate all modeling options relative to the specific research questions you are trying to ask. I'm getting the impression that variation at the fine-grained neighborhood level is not actually that important to you, and that you are mostly interested in socio-economic effects anyway.

As predicted, this is what we see happening in the results above.

For the sake of consistency, because I'm running several other regressions before the one for the cross-level interaction, is it still okay to run them with -mixed- (below), or should I use -regress- as in model 1 above?

Code:

*Model 1
*Is there a relationship between individual education and MWB?
mixed WEMWBStotal i.edcat09 i.agecat09 if sex2==2 || habneigh2:

*Model 4
*Is there a relationship between individual education, occupation, income and MWB?
*adjust for age
mixed WEMWBStotal i.agecat09 i.edcat09 i.occf09 i.income09 if sex2==2 || habneigh2:

*Model 5
*Is there a relationship between neighbourhood disadvantage and MWB?
*adjust for age & SE factors
mixed WEMWBStotal i.NHses09 i.agecat09 i.edcat09 i.occf09 i.income09 if sex2==2 || habneigh2:


Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#21

14 Aug 2018, 10:14

So, I agree that the output you show confirms that the random effects part of the second model is contributing nothing and is best omitted on that basis alone. (And all the more so given that there are so few levels of NHses, which makes the use of random effects at that level dubious to start with.)
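The "LR test vs. linear model" line at the foot of the -mixed- output compares the model against one with no random effects at all. To judge a random slope on its own, one option is to fit the model with and without it and compare the two by likelihood ratio. A sketch on Stata's pig example data, where id and week are stand-ins for the grouping variable and the slope variable (not variables from this thread):

```stata
* Stand-in data shipped with Stata (48 pigs weighed weekly for 9 weeks)
webuse pig, clear

* Random intercept only
mixed weight week || id:
estimates store ri

* Random intercept plus a random slope on week, with their covariance
mixed weight week || id: week, covariance(unstructured)
estimates store rs

* LR test of the two extra variance parameters; the p-value is conservative
* because the null puts var(week) on the boundary of the parameter space
lrtest rs ri
```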

In terms of describing your results, it is a bit difficult to say clearly whether you have included a "cross-level" interaction. Typically that term is reserved for describing a situation where you have a multi-level model with random slopes. In that sense, you do not have a cross-level interaction. But in another sense you do: the variable NHses is defined at a higher level than the single observation, since it denotes certain sets of neighborhoods. So although the model does not place it at a higher level analytically (for the good reasons we have already enumerated), it is still the case that NHses is a higher-level variable and you have an interaction between that and income09, which is a bottom-level variable. So in that sense, you do have a cross-level interaction. In these particular data, that is the only sense in which it is actually feasible to estimate a cross-level interaction.

As for your other models, and whether you can do them in -mixed- with a habneigh2 level, I would say try them and see. I think you will not encounter the computational problems you had with your habneigh2: income09 terms. The data are large enough that dividing them into 200 groups shouldn't cause a problem, so I suspect these models will converge and give you results. Whether the results will show effectively zero variance components (and possibly missing standard errors) remains to be seen. These models differ in which fixed effects they include, and just as in ordinary regression the results for any term will vary depending on what else is in the model, so too the magnitude of the variance component at the habneigh2: level may change depending on which fixed effects are in the model. To summarize, I think running these as a two-level model in -mixed- will probably be OK computationally. The results may or may not suggest that the habneigh2-level variance component is large enough to matter, and, in fact, that conclusion may differ among the models.
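A sketch of how one might check whether a group-level variance component is large enough to matter, on Stata's pig example data with id standing in for habneigh2 (an illustrative assumption): the random-effects table gives the variance estimate and its confidence interval, the footer gives the LR test against a one-level linear model, and -estat icc- gives the share of the total variance that sits at the group level.

```stata
* Stand-in data shipped with Stata (48 pigs weighed weekly for 9 weeks)
webuse pig, clear

* Two-level random-intercept model, fit by ML (the -mixed- default)
mixed weight week || id:

* Intraclass correlation: var(_cons) / (var(_cons) + var(Residual))
estat icc
```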

Emily Mann

Join Date: Jul 2018
Posts: 21

#22

15 Aug 2018, 05:41

Quoting Clyde Schechter (#21):
As for your other models, and whether you can do them in -mixed- with a habneigh2 level, I would say try them and see.

All these models run without any problem, including random effects.

I need to double check... in model 2 that I ran in #20, I now realise I included the income09 random slope.

Code:

mixed WEMWBStotal i.agecat09 NHses09##i.income09 if sex2==2 || NHses09:income09, cov(unstr)

Did I misinterpret what you were suggesting? Should the income09 random slope be removed, so that the code is:

Code:

mixed WEMWBStotal i.agecat09 i.NHses09##i.income09 if sex2==2 || NHses09:, cov(unstr)

If I did misunderstand, then the coefficients in each model are very close whether I use habneigh2 or NHses09 as the grouping level. See the code and output for both options below.

Code:

. mixed WEMWBStotal i.agecat09 i.NHses09##i.income09 if sex2==2 || habneigh2:, cov(unstr) 
Note: single-variable random-effects specification in habneigh2 equation; covariance structure set to identity

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -14090.141  
Iteration 1:   log likelihood = -14089.472  
Iteration 2:   log likelihood = -14089.472  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =      4,006
Group variable: habneigh2                       Number of groups  =        200

                                                Obs per group:
                                                              min =          3
                                                              avg =       20.0
                                                              max =         66

                                                Wald chi2(33)     =     216.95
Log likelihood = -14089.472                     Prob > chi2       =     0.0000

-----------------------------------------------------------------------------------------------------------------------
                                          WEMWBStotal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------+----------------------------------------------------------------
                                             agecat09 |
                                               47-51  |   .2437934   .4090223     0.60   0.551    -.5578756    1.045462
                                               52-56  |   1.539611   .4105466     3.75   0.000     .7349544    2.344268
                                               57-61  |   2.430952   .4186836     5.81   0.000     1.610347    3.251557
                                               62-70  |   4.051717   .4420312     9.17   0.000     3.185352    4.918083
                                                      |
                                              NHses09 |
                                                  Q2  |   .2059345   .8303939     0.25   0.804    -1.421608    1.833477
                                                  Q3  |  -.9121363   1.022323    -0.89   0.372    -2.915853     1.09158
                                                  Q4  |  -1.251334   1.146685    -1.09   0.275    -3.498796    .9961286
                                        Q5(most dis)  |   2.105793   1.598963     1.32   0.188    -1.028117    5.239704
                                                      |
                                             income09 |
                                       $72800-129999  |  -.6533111   .7157163    -0.91   0.361    -2.056089    .7494671
                                        $52000-72799  |  -2.814418   .8794645    -3.20   0.001    -4.538137   -1.090699
                                        $26000-51599  |   -1.72276   .8744216    -1.97   0.049    -3.436595   -.0089249
                                    Less than $25999  |  -4.429687   1.181481    -3.75   0.000    -6.745346   -2.114027
             missing/Don't want to answer/Don't know  |   -1.36387   .7855292    -1.74   0.083    -2.903479    .1757384
                                                      |
                                     NHses09#income09 |
                                    Q2#$72800-129999  |  -.8548818   1.107775    -0.77   0.440    -3.026081    1.316317
                                     Q2#$52000-72799  |  -.4970684    1.32497    -0.38   0.708    -3.093961    2.099825
                                     Q2#$26000-51599  |  -.2612173   1.280761    -0.20   0.838    -2.771463    2.249029
                                 Q2#Less than $25999  |   .5472246   1.663371     0.33   0.742    -2.712922    3.807371
          Q2#missing/Don't want to answer/Don't know  |   .1586946   1.216514     0.13   0.896    -2.225629    2.543018
                                    Q3#$72800-129999  |    .458532   1.270964     0.36   0.718    -2.032512    2.949576
                                     Q3#$52000-72799  |   .9481092   1.465768     0.65   0.518    -1.924743    3.820962
                                     Q3#$26000-51599  |  -.9404811   1.388512    -0.68   0.498    -3.661915    1.780953
                                 Q3#Less than $25999  |   .9277627   1.680005     0.55   0.581    -2.364987    4.220513
          Q3#missing/Don't want to answer/Don't know  |    .739893   1.401277     0.53   0.597    -2.006559    3.486345
                                    Q4#$72800-129999  |    -.13733   1.401733    -0.10   0.922    -2.884677    2.610017
                                     Q4#$52000-72799  |   1.783494   1.553906     1.15   0.251    -1.262107    4.829094
                                     Q4#$26000-51599  |    .320281   1.484898     0.22   0.829    -2.590066    3.230628
                                 Q4#Less than $25999  |   1.062388   1.726761     0.62   0.538       -2.322    4.446777
          Q4#missing/Don't want to answer/Don't know  |   -.014323   1.455248    -0.01   0.992    -2.866557    2.837911
                          Q5(most dis)#$72800-129999  |  -3.628508   1.873972    -1.94   0.053    -7.301426    .0444107
                           Q5(most dis)#$52000-72799  |  -2.178551    1.98263    -1.10   0.272    -6.064435    1.707333
                           Q5(most dis)#$26000-51599  |  -4.525222   1.899129    -2.38   0.017    -8.247446   -.8029972
                       Q5(most dis)#Less than $25999  |  -4.899998   2.031803    -2.41   0.016    -8.882259   -.9177379
Q5(most dis)#missing/Don't want to answer/Don't know  |    -4.8296   1.917394    -2.52   0.012    -8.587623   -1.071577
                                                      |
                                                _cons |   51.61409   .5520662    93.49   0.000     50.53206    52.69612
-----------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
habneigh2: Identity          |
                  var(_cons) |   .5087875   .3757902      .1196298    2.163881
-----------------------------+------------------------------------------------
               var(Residual) |   65.97599   1.507391      63.08674    68.99758
------------------------------------------------------------------------------
 
. mixed WEMWBStotal i.agecat09 i.NHses09##i.income09 if sex2==2 || NHses09:, cov(unstr) 
Note: single-variable random-effects specification in NHses09 equation; covariance structure set to identity

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -14090.787  
Iteration 1:   log likelihood = -14090.667  
Iteration 2:   log likelihood = -14090.665  
Iteration 3:   log likelihood = -14090.665  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =      4,006
Group variable: NHses09                         Number of groups  =          5

                                                Obs per group:
                                                              min =        557
                                                              avg =      801.2
                                                              max =      1,007

                                                Wald chi2(33)     =     225.03
Log likelihood = -14090.665                     Prob > chi2       =     0.0000

-----------------------------------------------------------------------------------------------------------------------
                                          WEMWBStotal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------------------------------+----------------------------------------------------------------
                                             agecat09 |
                                               47-51  |   .2474299   .4093324     0.60   0.546     -.554847    1.049707
                                               52-56  |   1.540072   .4107497     3.75   0.000     .7350173    2.345127
                                               57-61  |   2.443569    .418787     5.83   0.000     1.622762    3.264377
                                               62-70  |   4.057689   .4421159     9.18   0.000     3.191158     4.92422
                                                      |
                                              NHses09 |
                                                  Q2  |    .159355   .8101906     0.20   0.844    -1.428589    1.747299
                                                  Q3  |  -1.000347   1.006118    -0.99   0.320    -2.972303    .9716086
                                                  Q4  |  -1.230736   1.133988    -1.09   0.278    -3.453313    .9918397
                                        Q5(most dis)  |   2.034009   1.588713     1.28   0.200     -1.07981    5.147829
                                                      |
                                             income09 |
                                       $72800-129999  |   -.684102   .7161708    -0.96   0.339    -2.087771    .7195671
                                        $52000-72799  |  -2.845216   .8792069    -3.24   0.001     -4.56843   -1.122002
                                        $26000-51599  |  -1.778907   .8739095    -2.04   0.042    -3.491739   -.0660763
                                    Less than $25999  |  -4.492337   1.180519    -3.81   0.000    -6.806111   -2.178563
             missing/Don't want to answer/Don't know  |  -1.391715   .7868216    -1.77   0.077    -2.933857    .1504273
                                                      |
                                     NHses09#income09 |
                                    Q2#$72800-129999  |  -.8237243   1.108002    -0.74   0.457    -2.995369     1.34792
                                     Q2#$52000-72799  |  -.4761205    1.32526    -0.36   0.719    -3.073583    2.121342
                                     Q2#$26000-51599  |  -.2456128   1.280445    -0.19   0.848     -2.75524    2.264014
                                 Q2#Less than $25999  |   .5789851   1.664642     0.35   0.728    -2.683654    3.841624
          Q2#missing/Don't want to answer/Don't know  |   .1631147   1.218531     0.13   0.894    -2.225163    2.551392
                                    Q3#$72800-129999  |    .491847   1.270676     0.39   0.699    -1.998633    2.982327
                                     Q3#$52000-72799  |   .9342543   1.465255     0.64   0.524    -1.937592    3.806101
                                     Q3#$26000-51599  |  -.8656569   1.386942    -0.62   0.533    -3.584013    1.852699
                                 Q3#Less than $25999  |   .9374206   1.679396     0.56   0.577    -2.354136    4.228977
          Q3#missing/Don't want to answer/Don't know  |   .7312612   1.402526     0.52   0.602     -2.01764    3.480162
                                    Q4#$72800-129999  |  -.1865176   1.402392    -0.13   0.894    -2.935155    2.562119
                                     Q4#$52000-72799  |   1.744742   1.554022     1.12   0.262    -1.301085     4.79057
                                     Q4#$26000-51599  |   .2659125   1.485156     0.18   0.858    -2.644941    3.176766
                                 Q4#Less than $25999  |   1.045605   1.726001     0.61   0.545    -2.337294    4.428504
          Q4#missing/Don't want to answer/Don't know  |  -.0786658    1.45502    -0.05   0.957    -2.930453    2.773121
                          Q5(most dis)#$72800-129999  |   -3.64477   1.872417    -1.95   0.052     -7.31464    .0251012
                           Q5(most dis)#$52000-72799  |  -2.153529   1.981523    -1.09   0.277    -6.037242    1.730185
                           Q5(most dis)#$26000-51599  |  -4.454693   1.898173    -2.35   0.019    -8.175044   -.7343428
                       Q5(most dis)#Less than $25999  |  -4.774991   2.029161    -2.35   0.019    -8.752073   -.7979085
Q5(most dis)#missing/Don't want to answer/Don't know  |  -4.814925   1.916584    -2.51   0.012    -8.571361   -1.058489
                                                      |
                                                _cons |   51.65885   .5380258    96.02   0.000     50.60434    52.71336
-----------------------------------------------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
NHses09: Identity            |
                  var(_cons) |   1.19e-19   2.12e-18      8.16e-35    .0001735
-----------------------------+------------------------------------------------
               var(Residual) |   66.48015   1.485428      63.63159    69.45622
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 0.00          Prob >= chibar2 = 1.0000

So what is the benefit of putting NHses09 at the top level instead of habneigh2, if the random effects can't be computed and the 'cross level interaction' (in this scenario) is in the fixed effects section anyway?

Also, I feel like we've gone full circle back to my original post. Whichever code I now use, I can check the interaction with the -testparm- command because the interaction is in the bottom level, correct?

On another note, I realise I still have to deal with the missing-data issue, which is why the missing category is still in the model.


Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#23

15 Aug 2018, 10:45

Putting it concisely:

1. NHses09 and its interactions with income09 belong in the fixed effects; they are key to your hypothesis test.
2. NHses09 should not be used as a random effect, no matter what, because it takes on too few different values.
3. If you want to use a multi-level model, it is computationally feasible with habneigh2 as a random intercept, but your data are too sparse to support random slopes for income09 at that level.
4. The outputs using habneigh2 as a random intercept that you have shown so far suggest that there is negligible variation at the habneigh2 level anyway, so you might as well not bother with it. Stick with a simple -regress- model and ignore habneigh2.
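Points 2 and 4 can be checked against the variance components already posted. The Python sketch below is plain arithmetic, not Stata code, and its function names are illustrative. It computes the intraclass correlation (between-group variance divided by total variance) for the habneigh2 model whose random-effects table appears at the top of this excerpt and for the NHses09 model, and it reproduces the p-value of the chibar2(01) likelihood-ratio test, which compares the statistic with a 50:50 mixture of chi2(0) and chi2(1) because a variance cannot be negative. In Stata itself, -estat icc- after -mixed- and `display 0.5*chi2tail(1, x)` give the same quantities.

```python
import math


def icc(var_between: float, var_residual: float) -> float:
    """Intraclass correlation: share of total outcome variance that lies between groups."""
    return var_between / (var_between + var_residual)


def chibar2_01_pvalue(stat: float) -> float:
    """P(X >= stat) for a 50:50 mixture of chi2(0) and chi2(1), the reference
    distribution for a likelihood-ratio test of one variance against zero."""
    if stat <= 0:
        return 1.0  # X is never negative, so P(X >= 0) is 1
    # Upper tail of chi2(1) is erfc(sqrt(stat / 2)); only half the mixture contributes.
    return 0.5 * math.erfc(math.sqrt(stat / 2.0))


# Variance components copied from the two -mixed- outputs posted earlier.
icc_habneigh2 = icc(0.5087875, 65.97599)  # habneigh2 random intercept
icc_nhses09 = icc(1.19e-19, 66.48015)     # NHses09 random intercept

print(f"ICC with habneigh2: {icc_habneigh2:.4f}")  # 0.0077
print(f"ICC with NHses09:   {icc_nhses09:.1e}")
print(f"chibar2(01) = 0.00 -> p = {chibar2_01_pvalue(0.0):.4f}")
```

Well under 1% of the variance in WEMWBStotal lies between habneigh2 neighbourhoods, and the NHses09 variance is numerically zero, which is what the reported chibar2 p-value of 1.0000 reflects.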

Whichever code I now use, I can check the interaction with the -testparm- command because the interaction is in the bottom level, correct?

Correct.
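-testparm i.NHses09#i.income09- is a joint Wald test of all the NHses09-by-income09 coefficients at once. The Python sketch below (plain arithmetic, not Stata code; helper names are made up) does two things with the output posted earlier. First, it counts how many coefficients that test covers, assuming 5 age groups, 5 NHses09 quintiles, 6 income09 categories including the non-response group, and no empty cells, and checks that the total matches the reported Wald chi2(33). Second, it shows how the income effect within one NHses09 group is built from the income main effect plus the matching interaction term. The coefficients come from the model with NHses09 as the random intercept, whose intercept variance is essentially zero, so they should be close to, though not necessarily identical to, what -regress- would give. In Stata, -lincom- or -margins- would produce these combinations together with standard errors.

```python
def n_coefs(levels: int) -> int:
    """Coefficients estimated for i.factor: one per non-base level."""
    return levels - 1


# Level counts read off the posted output, base level included.
AGECAT09, NHSES09, INCOME09 = 5, 5, 6

df_interaction = n_coefs(NHSES09) * n_coefs(INCOME09)  # terms tested by -testparm-
df_model = n_coefs(AGECAT09) + n_coefs(NHSES09) + n_coefs(INCOME09) + df_interaction

# Coefficients copied from the NHses09-level -mixed- output. Only the income09 main
# effects and the Q5 interaction terms are copied; the non-response category is omitted.
income_main = {
    "$72800-129999": -0.684102,
    "$52000-72799": -2.845216,
    "$26000-51599": -1.778907,
    "Less than $25999": -4.492337,
}
q5_interaction = {
    "$72800-129999": -3.64477,
    "$52000-72799": -2.153529,
    "$26000-51599": -4.454693,
    "Less than $25999": -4.774991,
}


def income_effect(nhses: str, income: str) -> float:
    """Point estimate of an income band's difference from the base band in one NHses09 group."""
    if nhses == "Q1":            # base NHses09 group: no interaction term applies
        return income_main[income]
    if nhses == "Q5(most dis)":  # main effect plus the matching interaction coefficient
        return income_main[income] + q5_interaction[income]
    raise KeyError(f"interaction terms for {nhses} were not copied into this sketch")


print("testparm df:", df_interaction, "| model df:", df_model)  # 20 and 33
for band in income_main:
    q1, q5 = income_effect("Q1", band), income_effect("Q5(most dis)", band)
    print(f"{band:>18}  Q1: {q1:7.3f}  Q5: {q5:7.3f}")
```

For example, for the lowest income band the estimated difference from the base band is about -4.49 in Q1 but about -9.27 in Q5.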
Emily Mann

Join Date: Jul 2018

Posts: 21
#24

15 Aug 2018, 23:51

Points 1, 2, and 4 are clear.

For now (given that income09 is a factor variable and the models with manually generated income indicators don't work), I accept that I can use either:

Code:

1. regress WEMWBStotal i.agecat09 NHses09##i.income09 if sex2==2, vce(cluster habneigh2)
2. mixed WEMWBStotal i.agecat09 NHses09##i.income09 if sex2==2 || habneigh2:, cov(unstr)

Sorry, Clyde, I'm still struggling with random slopes! I realise this will go away with time and practice.

I think I understand that by including income09 (if we pretend it were continuous) as a random slope I am telling Stata that a person's income can randomly vary across neighbourhoods (habneigh2). And that including income09 as a random slope implies I'm running a cross-level interaction because I include i.NHses09 as a second level predictor and then present an interaction between i.NHses09#i.income09. By using NHses09 I am condensing the 200 habneigh2 neighbourhoods into 5 categories.

So I'm confused as to:
(1) why Stata runs this model if income09 is a factor variable? Maybe the lesson is that just because something runs doesn't mean it is correct, or should be interpreted.
(2) what you mean in point 3 about

... data are too sparse to support random slopes for income09 at that level.

I can run this model (3), and it produces results. The only differences between models 3 and 4 are the lack of random-effects output and slightly different coefficients. To add to my confusion, I read that in example 6 on pages 529-530 of the online -mixed- documentation they have an average of 2.9 observations per group (198/68), whereas I have an average of 20 obs (4006/200).

Code:

3. mixed WEMWBStotal i.agecat09 i.NHses09##i.income09 if sex2==2 || habneigh2: income09, cov(unstr)
4. mixed WEMWBStotal i.agecat09 i.NHses09##i.income09 if sex2==2 || habneigh2:, cov(unstr) var vce(r)
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#25

16 Aug 2018, 09:27

OK. Perhaps the most important thing you say in #24 is that just because a model will run doesn't make it meaningful or interpretable. This is true not just of computer languages like Stata but also of natural languages. As Noam Chomsky famously remarked, "Colorless green ideas sleep furiously" is a perfectly grammatical sentence in English, but it is meaningless.

Let me comment on each of models 1 through 4 in #24 in turn.

1. This is a good model. It includes an interaction between NHses09 and income09 which is, in a sense, a cross-level interaction, though not in the sense that that phrase is normally used. It is a cross-level interaction because, although NHses09 is not being used as a level in the analytic model, the variable is itself defined at a higher level than income09. In this model, you get a lot of information in your results: you get a separate effect estimate for each value of income09 in each NHses09 group.

2. This is also a good model. It differs from model 1 in that it also gives you an estimate of the variation in outcome level among the different neighborhoods. (By the way, the -cov(unstr)- option does nothing here. It doesn't hurt, but because there is only a single random intercept in the model, the covariance matrix is 1x1 and will be the same no matter which -cov()- structure you specify.)

3. This is an invalid model for two reasons. In the random effects part, it treats income09 as a continuous variable--which is not the case. And, if you actually wanted to treat income09 as a continuous variable, then you would also have to specify a continuous income09 in the fixed effects part. Anything for which a random slope is specified must also appear in the fixed effects--always! Continuous income09 does not appear in the fixed effects of this model. So, yes, it converges, but it is not meaningful or interpretable.

I think I understand that by including income09 (if we pretend it were continuous) as a random slope I am telling Stata that a person's income can randomly vary across neighbourhoods (habneigh2).

Correct.

And that including income09 as a random slope implies I'm running a cross-level interaction because I include i.NHses09 as a second level predictor and then present an interaction between i.NHses09#i.income09.

The first part is correct. But the second part is not. You could have a model

Code:

mixed WEMWBStotal i.agecat09 c.income09 if sex2 == 2 || NHses09:income09

That would be the way to model a cross-level interaction between NHses09 and continuous variable income09. In that model you should not include i.NHses09##i.income09. If you include that you are including two different interactions between NHses09 and income09, one based on the continuous version of income09 and the other based on the categorical version, and the results would be uninterpretable.

4. This is model 2 again, this time amplified by specifying that you want robust (Huber-White) variance estimates. It is also a valid model. (Again, the -cov(unstr)- option does nothing here and could be omitted without changing anything. Similarly, if you are using the current version of Stata, the -var- option does nothing because it is the default. If you are using version 13, then -var- does change the display of random effects from the older default standard-deviation metric to the variance metric. I don't remember what the default was in version 14.)
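Returning to the random-slope discussion under model 3, it can help to write out the textbook two-level model. The notation is the generic one, not anything defined in this thread: person i in group j, a continuous person-level predictor x_ij (such as income), and a group-level characteristic W_j.

```latex
\begin{aligned}
\text{Level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1j}\,x_{ij} + \varepsilon_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01}\,W_j + u_{0j} \\
 & \beta_{1j} = \gamma_{10} + \gamma_{11}\,W_j + u_{1j} \\
\text{Combined:}\quad & y_{ij} = \underbrace{\gamma_{00} + \gamma_{01}\,W_j + \gamma_{10}\,x_{ij} + \gamma_{11}\,W_j\,x_{ij}}_{\text{fixed part}}
 + \underbrace{u_{0j} + u_{1j}\,x_{ij} + \varepsilon_{ij}}_{\text{random part}}
\end{aligned}
```

The random slope is the term u_{1j} x_{ij}: it lets the slope of x differ from group to group. The fixed term γ_{10} x_{ij} is the average slope those deviations are measured from, which is why anything given a random slope must also appear in the fixed part. If a group-level predictor W_j is added to the slope equation, the resulting term γ_{11} W_j x_{ij} is a fixed-effect interaction between a group-level and a person-level variable.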

I can run this model (3), and it produces results. The only differences between models 3 and 4 are the lack of random-effects output and slightly different coefficients. To add to my confusion, I read that in example 6 on pages 529-530 of the online -mixed- documentation they have an average of 2.9 observations per group (198/68), whereas I have an average of 20 obs (4006/200).

It may indeed be true that the fixed-effects results of model 3 are similar to those of model 4, but model 3 is nevertheless a mis-specified model and should not be interpreted. Specifically, it is the random slopes output in model 3 that is invalid, and its presence also distorts the fixed-effects results: only slightly, as it turns out, but it distorts them nonetheless.

Concerning the example in the mixed documentation, it is sometimes the case that a random slopes model will converge even though the number of observations in each group is very small. This is particularly likely to happen if the variation in the slopes at that level is fairly large (so that the "signal to noise" ratio is high). But, in general, the less data per group there is, the less information there is to estimate things, and the closer the likelihood function is to flat. The flatter the likelihood, the more difficult it is to locate the maximum, and at a certain point things just break down and convergence fails. I should also point out that while you have an average of 20 obs per habneigh2, when you then try to allocate those 20 observations across 5 different levels of categorical income09, that reduces the average to only 4 per habneigh2-income09 group (which is the relevant grouping for a || habneigh2: income09_1 income09_2 income09_3 income09_4 income09_5 model). That happens to stretch things beyond the breaking point. I would also note that even if the model converged, the random slope estimates coming from such meager data would be very imprecise (as one would expect from an N of 4 study). In fact, if you look at the output from the example in the documentation, you will see that the 95% confidence interval for the random slope variance in the output shows that the result is uncertain by a factor of nearly 3. The random intercept variance estimate is similarly loose in that example.
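The arithmetic in the last paragraph, as a short Python sketch (not Stata code; names are illustrative). It uses averages only; real cells are uneven, so some habneigh2-by-income09 cells are likely to hold even fewer people. It also counts the variance/covariance parameters the random part has to estimate, for three of the -cov()- structures -mixed- accepts. With a single random intercept every structure has one parameter, which is why -cov(unstr)- changes nothing in models 2 and 4. With an intercept plus the five income09 indicator slopes written above and -cov(unstr)-, there are 21.

```python
def n_cov_params(k: int, structure: str) -> int:
    """Free variance/covariance parameters for k random effects at one level."""
    if structure == "identity":       # one variance shared by all effects
        return 1
    if structure == "independent":    # one variance per effect, covariances fixed at 0
        return k
    if structure == "unstructured":   # every variance and every pairwise covariance
        return k * (k + 1) // 2
    raise ValueError(f"structure not covered by this sketch: {structure}")


n_obs, n_habneigh2, n_income_levels = 4006, 200, 5

per_neighbourhood = n_obs / n_habneigh2         # about 20 people per habneigh2
per_cell = per_neighbourhood / n_income_levels  # about 4 per habneigh2-by-income09 cell
manual_example = 198 / 68                       # about 2.9 per group in the manual's example

# || habneigh2: income09_1 ... income09_5 -> a random intercept plus 5 random slopes.
k = 1 + 5

print(f"obs per habneigh2:               {per_neighbourhood:.1f}")
print(f"obs per habneigh2-income09 cell: {per_cell:.1f}")
print(f"obs per group, manual example:   {manual_example:.1f}")
for s in ("identity", "independent", "unstructured"):
    print(f"{s:>12}: 1 effect -> {n_cov_params(1, s)}; {k} effects -> {n_cov_params(k, s)}")
```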
Emily Mann

Join Date: Jul 2018

Posts: 21
#26

16 Aug 2018, 20:34

Thank you Clyde for your extensive answer once again! What you have written makes sense.

Your feedback and help have been immeasurable. I have learnt many, many new things where books and other people's PowerPoint slides have not been all that helpful. One day I will hopefully be fluent in Stata!

I will print all this out to keep on hand in my notebook and get on with running the models.