  • Mixed effects model and crossed random effects


    Hi,


    I am analyzing a large dataset on surgeries. We want to explore the risk of a surgical complication depending on whether the surgery was performed during the day or at night. The model will be adjusted for sex, age, comorbidity (grpci), and case urgency.
    We have missing data on the ASA score (physical status score) and are therefore using multiple imputation, using all variables in the regression for estimation.

    We believe that the coefficients vary by hospital as well as by patient. We therefore wish to use a mixed effects model after the imputation of ASA. A given patient might undergo surgery at different hospitals, so we do not want patients nested within hospitals.

    So I guess what we want is a crossed random effects model. Am I right? If so, I would like to include a random effect for “Id” and another random effect for “Hospital”. However, the code takes days to run on my new MacBook Pro.

    Our code is as follows:

    Code:
    ** Imputation part (only missingness on ASA)
    mi set mlong
    mi register imputed ASA
    mi impute mlogit ASA Complication NightTime Sex Age grpci CaseUrgency Hospital, add(20) noisily augment
    
    ** Estimation part
    mi estimate, or: meqrlogit Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency || Hospital: || Id:
    Is this the correct syntax?




    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long Id int Age byte(Sex Hospital grpci Complication ASA NightTime) float CaseUrgency
      1 88 0 58 2 0 1 1 0
      3 40 1 58 0 0 0 0 0
      4 58 0 36 0 0 0 0 0
     13 42 1 58 0 0 0 0 0
     13 42 1 58 0 0 1 0 0
     13 42 1 58 0 0 1 0 0
     17 20 0 55 0 0 0 1 0
     21 56 0 79 1 0 1 1 0
     21 57 0 79 2 0 1 1 0
     22 65 1 30 2 0 1 1 0
     22 65 1 30 2 0 1 1 1
     25 77 1 25 2 0 0 0 0
     25 77 1 25 2 0 0 1 0
     26 55 0 36 2 0 1 0 0
     27 91 0 30 2 0 2 0 1
     30 77 1 29 2 0 1 0 0
     32 62 0 57 1 0 . 1 1
     35 57 0 70 0 0 0 0 0
     41 54 1 30 0 0 2 0 1
     45 31 0 29 0 0 2 1 1
     46 50 1  1 1 0 0 0 1
     57 37 0 56 0 0 . 0 1
     62 45 0 29 0 0 0 1 0
     64 40 0 43 0 0 0 1 0
     71 71 1 58 2 0 . 1 0
     71 71 1 58 2 0 1 1 0
     72 34 0 79 0 0 1 0 0
     73 21 0  1 0 0 0 1 0
     74 33 0 29 0 0 0 0 0
     74 33 0 14 0 0 0 1 0
     77 43 0  1 0 0 0 0 0
     77 43 0  1 0 0 0 0 0
     77 43 0  1 2 0 0 0 0
     85 32 0 14 0 0 0 1 0
     86 39 0 14 0 0 0 1 1
     90 74 0  9 0 0 1 1 0
     92 43 0 58 0 0 0 0 0
     93 29 0 17 0 0 0 1 0
     98 44 1  1 0 0 . 0 0
    100 29 0 58 0 0 0 1 0
    103 70 1 41 2 0 0 1 0
    106 78 1 14 2 0 2 1 0
    108 31 1 14 0 0 0 1 0
    109 57 0 58 1 0 0 1 0
    110 75 1  9 2 0 2 0 0
    110 75 1  9 2 0 1 1 0
    112 19 0  1 0 0 0 1 0
    116 90 0  9 2 0 1 0 0
    123 76 1  9 0 0 0 1 0
    124 64 0 14 0 0 0 0 0
    125 71 0 14 1 0 0 0 0
    127 53 0 56 0 0 0 1 0
    132 25 1  7 0 0 0 1 0
    135 55 0 62 0 0 0 0 0
    136 81 1 58 0 0 0 0 0
    144 83 1 12 1 0 1 1 0
    146 48 0 53 0 0 0 0 0
    149 45 0 11 0 0 0 0 0
    158 53 1 39 0 0 0 0 0
    161 24 1 34 0 0 0 1 0
    169 34 0 65 0 0 0 0 0
    174 20 1 34 0 0 0 1 0
    174 20 1 34 0 0 0 1 0
    179 39 0  9 0 0 0 1 0
    185 34 1 55 0 0 0 0 1
    188 73 1 57 0 0 0 1 0
    188 73 1 57 0 0 0 1 0
    188 74 1 80 0 0 0 1 0
    189 29 0 61 0 0 0 1 0
    189 29 0 61 0 0 0 0 0
    189 31 0 61 0 0 0 0 0
    190 26 0 72 0 0 0 1 0
    196 49 0 14 0 0 0 1 0
    197 70 0 14 0 0 0 1 0
    198 27 1 56 0 0 0 0 0
    206 49 0 44 0 0 0 0 0
    208 72 0 30 2 0 1 1 1
    209 44 1 56 0 0 0 1 0
    210 36 1 32 1 0 0 1 0
    210 37 1 72 1 0 0 0 0
    210 38 1 58 1 0 1 0 0
    211 74 1 58 2 0 1 0 0
    213 74 0  9 1 0 0 1 0
    228 88 0  9 0 0 2 1 0
    229 23 1 65 0 0 0 0 0
    230 49 1 29 0 0 0 0 0
    232 63 0 57 1 0 . 1 1
    239 52 0 57 0 0 0 1 0
    247 28 1 45 0 0 0 0 0
    250 76 1 14 2 0 1 0 0
    250 76 1 14 2 0 1 1 0
    255 80 0 58 0 0 2 0 1
    261 64 0  1 0 0 0 1 1
    261 64 0  1 0 0 2 1 0
    267 94 0 14 2 0 1 1 0
    268 56 1 29 0 0 0 1 0
    270 70 0 30 0 0 1 1 0
    275 62 1 56 0 0 0 0 0
    277 74 0 58 2 0 1 0 0
    278 32 0 14 0 0 0 0 1
    end
    label values Sex KON
    label def KON 0 "0. Female", modify
    label def KON 1 "1. Male", modify
    label values grpci grpci
    label def grpci 0 "0. Charlson Index Score 0", modify
    label def grpci 1 "1. Charlson Index Score 1", modify
    label def grpci 2 "2. Charlson Index Score >1", modify
    label values ASA ASA_4cat
    label def ASA_4cat 0 "ASA Score 1-2", modify
    label def ASA_4cat 1 "ASA Score 3", modify
    label def ASA_4cat 2 "ASA Score 4", modify

  • #2
    You might want to try a linear model (-mixed-) as a faster alternative. In any case, this is not a fully crossed model. You need:

    Code:
    mi estimate, or: melogit Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency || _all: R.Hospital || _all: R.Id
    Best wishes

    (Stata 16.1 MP)



    • #3
      Originally posted by Felix Bittmann View Post
      You might want to try a linear model (-mixed-) as a faster alternative. In any case, this is not a fully crossed model. You need:

      Code:
      mi estimate, or: melogit Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency || _all: R.Hospital || _all: R.Id
      Thank you for the clarification. So if I understand the syntax above correctly, "_all" specifies that all variables are crossed in both "Id" and "Hospital"?

      What if only a few variables vary within "Id" and "Hospital", respectively? Would the following be a correct specification?

      Code:
      mi estimate, or: melogit Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency || Hospital: NightTime CaseUrgency || Id: Sex Age grpci ASA



      • #4
        I think you are confusing random slopes with the level at which a variable varies. By coding
        Code:
        Hospital: NightTime CaseUrgency || Id: Sex Age grpci ASA
        you are specifying random slopes at these levels. For example, you are saying that the sex variable has a different effect for each Id, and that CaseUrgency has a different effect at each Hospital. Now, perhaps the latter makes sense and is what you intend. But the former does not. In fact, estimating this model will probably fail to converge because it will likely not be possible to disentangle the residual error term from the individual level random slopes of variables which are defined and vary at the individual level (Sex and Age; I don't know what grpci and ASA are, so this may or may not apply to them.)

        In any case, the important general point here is that when you list variables in the random portion of the model at a given level, you are telling Stata to calculate random slopes for those variables at that level. This notation should not be used to try to tell Stata at what level the variables are defined (vary). There is no need to specify that information in the command at all. And, most important, it is invalid to specify a random slope for a variable at the same level at which it is defined.
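
        For example, a specification along those lines, with random intercepts for both crossed factors and a random slope only for CaseUrgency at the Hospital level, would be (a sketch; note that with very many patients the R.Id term is demanding of memory):

        Code:
        * random intercepts for the crossed factors Id and Hospital, plus a
        * random slope for CaseUrgency at the Hospital level only
        melogit Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency ///
            || _all: R.Id || Hospital: CaseUrgency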



        • #5
          In addition to Clyde's excellent advice, here are some more pointers regarding crossed effects:

          https://blog.stata.com/2010/12/22/in...ffects-models/

          https://www.stata-press.com/books/mu...odeling-stata/

          While one can get really complex with these models, you should start simple. Especially since the computation times are already very long in your case, you might want to start with a simple baseline model. Then you can add higher-level covariates and random slopes if necessary.
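
          For example, a minimal baseline with random intercepts only could look like this (a sketch):

          Code:
          mi estimate, or: melogit Complication || _all: R.Hospital || _all: R.Id
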
          Best wishes

          (Stata 16.1 MP)



          • #6
            Thank you Clyde and Felix for your answers. Much appreciated.
            OK, this was indeed complex. Moreover, I did not give a clear explanation of my concern.

            I have patients (Id) that are undergoing surgery. I want to examine the risk of complications (Complication). My concerns are:
            1. Some patients undergo several surgeries during the study period, while others undergo only one. A form of repeated measurements, one could argue.
            2. I suspect that there are different risks of complications at different hospitals.
            3. Patients undergoing several surgeries might not have all of them at the same hospital. One surgery could very well be performed at hospital A, the next at hospital B, the third at hospital G, and so on...
            I want to take these matters into account. My thought was therefore to use random effects in a crossed random effects model. I might be wrong here; please correct me if this is the wrong way to go.

            So a more correct model would look like this (?):
            Code:
            mi estimate, or: melogit Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency || _all: R.Hospital || _all: R.Id
            If I have understood the links provided and my reading on the subject correctly, this specifies that both Hospital and Id will have random effects, but that they are not nested in each other in any way. Correct?
            Does this make sense or am I still lost…?



            • #7
              The code in comment #6 is the same code Felix provided. The problem I have with it is that it takes forever to run. Is there a way to make it faster? My dataset is roughly 500k surgeries. I read about the option
              Code:
              intmethod(pclaplace)
              But I do not understand the consequences of this.
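
              For context, this is how I understand it would be specified, with the option placed after the final random-effects equation (a sketch; I am assuming my Stata version supports -intmethod(pclaplace)- with -melogit-):

              Code:
              mi estimate, or: melogit Complication i.NightTime i.ASA i.Sex Age ///
                  i.grpci i.CaseUrgency || _all: R.Hospital || _all: R.Id, ///
                  intmethod(pclaplace)
              From what I have read, Laplace-type approximations are faster than adaptive quadrature but can be less accurate, particularly for the variance-component estimates.
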
              Last edited by Jesper Eriksson; 20 Jan 2025, 11:56.



              • #8
                These are all excellent questions, and to get a competent answer you need to ask health researchers or look at published articles in the relevant journals of your field. For example, while time and hospitals can be treated as levels, one could also argue that these are fixed effects. It also depends: are these factors just nuisances or main research aspects? And, as you see, the practical questions are always relevant. Even if you have a great model, if it takes weeks to compute, is it helpful? And imputation does not make things easier or faster...

                If you have a binary outcome variable, you can try this simpler model first:

                Code:
                mi estimate, dots: mixed Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency, vce(cluster Id) || Hospital:
                Here we handle the Id aspect with clustered SEs and focus on the hospitals.
                Best wishes

                (Stata 16.1 MP)



                • #9
                  Originally posted by Felix Bittmann View Post
                  These are all excellent questions, and to get a competent answer you need to ask health researchers or look at published articles in the relevant journals of your field. For example, while time and hospitals can be treated as levels, one could also argue that these are fixed effects. It also depends: are these factors just nuisances or main research aspects? And, as you see, the practical questions are always relevant. Even if you have a great model, if it takes weeks to compute, is it helpful? And imputation does not make things easier or faster...

                  If you have a binary outcome variable, you can try this simpler model first:

                  Code:
                  mi estimate, dots: mixed Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency, vce(cluster Id) || Hospital:
                  Here we handle the Id aspect with clustered SEs and focus on the hospitals.
                  Thank you again for the advice. I will try using clustered SEs when I get to my computer tomorrow morning. Hopefully that will take a little less time. I am, as always, grateful for this forum.



                  • #10
                    How many different hospitals are in the data? I would imagine that the number of patients in your data set is very large, but the number of hospitals might just be modest. If the number is relatively small, I would keep the Id: random effect but have Hospital as a fixed effect. That way you still capture both sources of variation in the modeling. But having Hospital as a fixed effect instead of a random intercept level will be faster, and it will automatically be crossed with Id:. It will also provide more useful results if part of your research question involves contrasting various hospitals.
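
                    That suggestion would look something like this (a sketch, using the covariates from #1; untested at your data's scale):

                    Code:
                    mi estimate, or: melogit Complication i.NightTime i.ASA i.Sex Age ///
                        i.grpci i.CaseUrgency i.Hospital || Id: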

                    Another way to speed things up a bit and keep the crossed random effects:
                    Code:
                    ... mixed ... || _all: R.Hospital || Id:
                    This fits exactly the same model as the version with -|| _all: R.Hospital || _all: R.Id-, but is both faster and less demanding of memory. If the number of Ids is very large, and much greater than the number of Hospitals, the speedup here will be quite substantial.
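
                    Spelled out with your covariates, that would be (a sketch; I use -melogit- as in the earlier posts):

                    Code:
                    mi estimate, or: melogit Complication i.NightTime i.ASA i.Sex Age ///
                        i.grpci i.CaseUrgency || _all: R.Hospital || Id: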

                    Another possibility is to eliminate the repeated-patient level by changing the underlying data set: randomly select and retain only one surgery per patient. But don't do this if you think it likely that the "frequent fliers" have, systematically, a different level of complication risk from the occasionally operated upon. Or, closer to the original data set, for patients operated on at multiple hospitals, select and retain surgeries from only one hospital per patient. This approach would probably be OK if "frequent fliers" have a different level of risk, but would be problematic if patients who use multiple hospitals have a systematically different risk.
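
                    The random selection itself is straightforward (a sketch, to be run on the pre-imputation data; the seed is arbitrary):

                    Code:
                    * keep one randomly chosen surgery per patient
                    set seed 12345
                    generate double u = runiform()
                    bysort Id (u): keep if _n == 1
                    drop u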

                    Finally, I have to say that execution times measured in days, as you described in #1, are probably to be expected in a three-level model with N = 500,000 and multiple imputation on top of that! And it will be longer with crossed effects than it would be were the effects nested. I have had simpler models than this take 2 weeks to complete estimation. As Felix Bittmann says, you have to decide whether it is worth investing that much compute time in this question, or whether you need to simplify the project in some way that may be more approximate. But if the results are not needed urgently, and none of these speed-ups are appropriate or work out, consider letting the thing run till it's done.



                    • #11
                      Clyde brings up quite relevant aspects here. It would be really interesting to see: how many hospitals are in the dataset? How many observations are there per ID? You can test this with an empty model like this:

                      Code:
                      mi estimate: mixed Complication || Hospital:
                      mi estimate: mixed Complication || Id:
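
                      You can also get these counts descriptively, without fitting any model (a sketch, run on the original data before -mi set-):

                      Code:
                      codebook Hospital Id, compact    // distinct hospitals and patients
                      duplicates report Id             // distribution of surgeries per patient
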
                      Best wishes

                      (Stata 16.1 MP)



                      • #12
                        Originally posted by Clyde Schechter View Post
                        How many different hospitals are in the data? I would imagine that the number of patients in your data set is very large, but the number of hospitals might just be modest. If the number is relatively small, I would keep the Id: random effect but have Hospital as a fixed effect. That way you still capture both sources of variation in the modeling. But having Hospital as a fixed effect instead of a random intercept level will be faster, and it will automatically be crossed with Id:. It will also provide more useful results if part of your research question involves contrasting various hospitals.

                        Another way to speed things up a bit and keep the crossed random effects:
                        Code:
                        ... mixed ... || _all: R.Hospital || Id:
                        This fits exactly the same model as the version with -|| _all: R.Hospital || _all: R.Id-, but is both faster and less demanding of memory. If the number of Ids is very large, and much greater than the number of Hospitals, the speedup here will be quite substantial.

                        Another possibility is to eliminate the repeated-patient level by changing the underlying data set: randomly select and retain only one surgery per patient. But don't do this if you think it likely that the "frequent fliers" have, systematically, a different level of complication risk from the occasionally operated upon. Or, closer to the original data set, for patients operated on at multiple hospitals, select and retain surgeries from only one hospital per patient. This approach would probably be OK if "frequent fliers" have a different level of risk, but would be problematic if patients who use multiple hospitals have a systematically different risk.

                        Finally, I have to say that execution times measured in days, as you described in #1, are probably to be expected in a three-level model with N = 500,000 and multiple imputation on top of that! And it will be longer with crossed effects than it would be were the effects nested. I have had simpler models than this take 2 weeks to complete estimation. As Felix Bittmann says, you have to decide whether it is worth investing that much compute time in this question, or whether you need to simplify the project in some way that may be more approximate. But if the results are not needed urgently, and none of these speed-ups are appropriate or work out, consider letting the thing run till it's done.
                        There are 79 hospitals, 994k surgeries, and 366k patients. My plan was to use multiple imputation with 20 imputations on missing ASA (none of the other variables have missingness). I will try...

                        Code:
                        ... mixed ... || _all: R.Hospital || Id:
                        ... and see if the speedup works. Eliminating all but one surgery is not an appealing choice, since we believe that, as you correctly presumed, frequent fliers have a systematically different level of complication risk compared to patients with only one or a few surgeries. We do not think that we can capture and model this extra risk using the variables in the model. Your other suggestion, keeping only one hospital per patient, is more appealing. However, we know that some hospitals (the larger university hospitals) take on more complicated surgeries and patients. We are concerned that a patient who undergoes surgery at a small hospital and experiences a complication is then scheduled for his/her next surgery at a larger hospital (since the surgery is now viewed as more complicated).

                        I will try your code suggestion (for a few days) and see if I get a result. Thanks a lot for your help!



                        • #13
                          Originally posted by Felix Bittmann View Post
                          These are all excellent questions, and to get a competent answer you need to ask health researchers or look at published articles in the relevant journals of your field. For example, while time and hospitals can be treated as levels, one could also argue that these are fixed effects. It also depends: are these factors just nuisances or main research aspects? And, as you see, the practical questions are always relevant. Even if you have a great model, if it takes weeks to compute, is it helpful? And imputation does not make things easier or faster...

                          If you have a binary outcome variable, you can try this simpler model first:

                          Code:
                          mi estimate, dots: mixed Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency, vce(cluster Id) || Hospital:
                          Here we handle the Id aspect with clustered SEs and focus on the hospitals.
                          Thank you Felix! However, when I try your code I get the following error message:

                          Code:
                          Imputations (20):
                            
                          highest-level groups are not nested within LopNr
                          an error occurred when mi estimate executed mixed on m=1
                          r(459);
                          Is the reason for the error that the code says observations are non-independent at the Id level [vce(cluster Id)] but at the same time says they are independent at other levels, resulting in a conflict (for lack of a better word) with the random effect on Hospital? Or is the error due to something else?



                          • #14
                            Yes, I think it means that you have a crossed-effects dataset. Maybe Clyde's suggestion from #10 works better. You can first try it without covariates or with only a few imputations to get a first impression:

                            Code:
                            mi estimate, nimp(5) dots: ...
                            Best wishes

                            (Stata 16.1 MP)



                            • #15
                              Originally posted by Felix Bittmann View Post
                              Clyde brings up quite relevant aspects here. It would be really interesting to see: how many hospitals are in the dataset? How many observations are there per ID? You can test this with an empty model like this:

                              Code:
                              mi estimate: mixed Complication || Hospital:
                              mi estimate: mixed Complication || Id:
                              Sorry for the late reply, Felix. Here are the results for the above code:

                              Code:
                              . mixed Complication || Hospital:
                              
                              Performing EM optimization: 
                              
                              Performing gradient-based optimization: 
                              
                              Iteration 0:   log likelihood =  1054376.7  
                              Iteration 1:   log likelihood =  1054376.7  
                              
                              Computing standard errors:
                              
                              Mixed-effects ML regression                     Number of obs     =    442,911
                              Group variable: Hospital                        Number of groups  =         78
                              
                                                                              Obs per group:
                                                                                            min =          1
                                                                                            avg =    5,678.3
                                                                                            max =     22,361
                              
                                                                              Wald chi2(0)      =          .
                              Log likelihood =  1054376.7                     Prob > chi2       =          .
                              
                              ------------------------------------------------------------------------------
                              Complication |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              -------------+----------------------------------------------------------------
                                     _cons |   .0005243   .0000503    10.43   0.000     .0004257    .0006228
                              ------------------------------------------------------------------------------
                              
                              ------------------------------------------------------------------------------
                                Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                              -----------------------------+------------------------------------------------
                              Hospital: Identity           |
                                                var(_cons) |   6.61e-08   2.45e-08      3.19e-08    1.37e-07
                              -----------------------------+------------------------------------------------
                                             var(Residual) |   .0005009   1.06e-06      .0004988     .000503
                              ------------------------------------------------------------------------------
                              LR test vs. linear model: chibar2(01) = 22.68         Prob >= chibar2 = 0.0000
                              Code:
                              . mixed Complication || Id:
                              
                              Performing EM optimization: 
                              
                              Performing gradient-based optimization: 
                              
                              Iteration 0:   log likelihood =  1050268.8  
                              Iteration 1:   log likelihood =  1054363.5  
                              Iteration 2:   log likelihood =  1054365.4  
                              Iteration 3:   log likelihood =  1054365.4  
                              
                              Computing standard errors:
                              
                              Mixed-effects ML regression                     Number of obs     =    442,911
                              Group variable: Id                              Number of groups  =    365,960
                              
                                                                              Obs per group:
                                                                                            min =          1
                                                                                            avg =        1.2
                                                                                            max =        155
                              
                                                                              Wald chi2(0)      =          .
                              Log likelihood =  1054365.4                     Prob > chi2       =          .
                              
                              ------------------------------------------------------------------------------
                              Complication |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              -------------+----------------------------------------------------------------
                                     _cons |   .0005012   .0000336    14.90   0.000     .0004353    .0005671
                              ------------------------------------------------------------------------------
                              
                              ------------------------------------------------------------------------------
                                Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                              -----------------------------+------------------------------------------------
                              Id: Identity                 |
                                                var(_cons) |   3.19e-11   2.67e-12      2.71e-11    3.76e-11
                              -----------------------------+------------------------------------------------
                                             var(Residual) |    .000501   1.06e-06      .0004989    .0005031
                              ------------------------------------------------------------------------------
                              LR test vs. linear model: chibar2(01) = 0.00          Prob >= chibar2 = 1.0000
                              If I may ask your opinion on a way forward, I would be grateful. Is there a way to evaluate whether a random effect on Id has a significant impact on model fit? If not, maybe I can just skip using Id as a random effect, which would make the model much easier to compute. I have tried:

                              Code:
                              logit Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency, or 
                              
                              Iteration 0:   log likelihood = -1712.3685  
                              Iteration 1:   log likelihood = -1642.3642  
                              Iteration 2:   log likelihood = -1578.7786  
                              Iteration 3:   log likelihood = -1564.5523  
                              Iteration 4:   log likelihood = -1564.4664  
                              Iteration 5:   log likelihood = -1564.4664  
                              
                              Logistic regression                             Number of obs     =    415,331
                                                                              LR chi2(9)        =     295.80
                                                                              Prob > chi2       =     0.0000
                              Log likelihood = -1564.4664                     Pseudo R2         =     0.0864
                              
                              ---------------------------------------------------------------------------------------------
                                             Complication | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              ----------------------------+----------------------------------------------------------------
                                              1.NightTime |   1.118775   .1622073     0.77   0.439     .8420349    1.486468
                                                          |
                                                      ASA |
                                             ASA Score 3  |   1.315612   .2758496     1.31   0.191     .8722742    1.984279
                                             ASA Score 4  |   3.410797   .8047113     5.20   0.000     2.147986    5.416021
                                             ASA Score 5  |    9.90411   4.168505     5.45   0.000     4.340666    22.59824
                                                          |
                                                      Sex |
                                                  1. Man  |   .8238151   .1203674    -1.33   0.185     .6186726     1.09698
                                                      Age |   1.040259   .0060355     6.80   0.000     1.028496    1.052156
                                                          |
                                                    grpci |
                               1. Charlson Index Score 1  |   2.309881   .5464024     3.54   0.000     1.452907    3.672327
                              2. Charlson Index Score >1  |   2.579663   .5503176     4.44   0.000     1.698153    3.918765
                                                          |
                                            1.CaseUrgency |   2.425043   .4727564     4.54   0.000      1.65493    3.553523
                                                    _cons |   .0000109   4.82e-06   -25.72   0.000     4.54e-06    .0000259
                              ---------------------------------------------------------------------------------------------
                              Note: _cons estimates baseline odds.
                              
                              . est store ModelNonRE
                              
                              . 
                              . xtset Id 
                                     panel variable:  Id (unbalanced)
                              
                              . xtlogit Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency, re or 
                              
                              Fitting comparison model:
                              
                              Iteration 0:   log likelihood = -1712.3685  
                              Iteration 1:   log likelihood = -1642.3642  
                              Iteration 2:   log likelihood = -1578.7786  
                              Iteration 3:   log likelihood = -1564.5523  
                              Iteration 4:   log likelihood = -1564.4664  
                              Iteration 5:   log likelihood = -1564.4664  
                              
                              Fitting full model:
                              
                              tau =  0.0     log likelihood = -1564.4664
                              tau =  0.1     log likelihood = -1564.7726
                              
                              Iteration 0:   log likelihood = -1564.7726  
                              Iteration 1:   log likelihood = -1564.4655  
                              Iteration 2:   log likelihood = -1564.4651  (not concave)
                              Iteration 3:   log likelihood = -1564.4651  (not concave)
                              Iteration 4:   log likelihood =  -1564.465  
                              Iteration 5:   log likelihood =  -1564.465  
                              Iteration 6:   log likelihood = -1564.4646  
                              Iteration 7:   log likelihood = -1564.4646  
                              
                              Random-effects logistic regression              Number of obs     =    415,331
                              Group variable: Id                              Number of groups  =    347,144
                              
                              Random effects u_i ~ Gaussian                   Obs per group:
                                                                                            min =          1
                                                                                            avg =        1.2
                                                                                            max =        153
                              
                              Integration method: mvaghermite                 Integration pts.  =         12
                              
                                                                              Wald chi2(9)      =     264.90
                              Log likelihood  = -1564.4646                    Prob > chi2       =     0.0000
                              
                              ---------------------------------------------------------------------------------------------
                                             Complication | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              ----------------------------+----------------------------------------------------------------
                                              1.NightTime |   1.118822    .162268     0.77   0.439     .8419907    1.486671
                                                          |
                                                      ASA |
                                             ASA Score 3  |   1.315553   .2758664     1.31   0.191     .8721968    1.984275
                                             ASA Score 4  |   3.412183   .8052329     5.20   0.000     2.148618    5.418827
                                             ASA Score 5  |   9.915997   4.180516     5.44   0.000     4.339859    22.65672
                                                          |
                                                      Sex |
                                                  1. Man  |   .8237489   .1204135    -1.33   0.185     .6185409    1.097037
                                                      Age |   1.040273   .0060377     6.80   0.000     1.028506    1.052174
                                                          |
                                                    grpci |
                               1. Charlson Index Score 1  |   2.310142   .5465725     3.54   0.000     1.452938     3.67308
                              2. Charlson Index Score >1  |   2.580193   .5505217     4.44   0.000     1.698384     3.91984
                                                          |
                                            1.CaseUrgency |   2.426088   .4732588     4.54   0.000     1.655244    3.555913
                                                    _cons |   9.93e-06   5.60e-06   -20.42   0.000     3.29e-06      .00003
                              ----------------------------+----------------------------------------------------------------
                                                 /lnsig2u |  -1.738451   3.906298                     -9.394654    5.917753
                              ----------------------------+----------------------------------------------------------------
                                                  sigma_u |   .4192762   .8189089                      .0091196     19.2763
                                                      rho |   .0507241   .1880928                      .0000253    .9912239
                              ---------------------------------------------------------------------------------------------
                              Note: Estimates are transformed only in the first equation.
                              Note: _cons estimates baseline odds (conditional on zero random effects).
                              LR test of rho=0: chibar2(01) = 3.5e-03                Prob >= chibar2 = 0.476
                              
                              . est store ModelRE 
                              
                              . lrtest  ModelRE ModelNonRE, force 
                              
                              Likelihood-ratio test                                 LR chi2(1)  =      0.00
                              (Assumption: ModelNonRE nested in ModelRE)            Prob > chi2 =    0.9526
                              
                              . estimates table ModelRE ModelNonRE, stats(aic bic)
                              
                              ----------------------------------------
                                  Variable |  ModelRE     ModelNonRE  
                              -------------+--------------------------
                              Complication |
                                 NightTime |
                                        1  |  .11227651    .11223443  
                                           |
                                       ASA |
                              ASA Score 3  |  .27425678    .27430199  
                              ASA Score 4  |  1.2273522     1.226946  
                              ASA Score 5  |  2.2941493    2.2929499  
                                           |
                                       Sex |
                                   1. Man  | -.19388951   -.19380922  
                                           |
                                       Age |  .03948276    .03946957  
                                           |
                                     grpci |
                              1. Charls..  |  .83730909    .83719583  
                              2. Charls..  |  .94786403    .94765882  
                                           |
                               CaseUrgency |
                                        1  |   .8862801    .88584916  
                                           |
                                     _cons | -11.519956   -11.431126  
                              -------------+--------------------------
                                  /lnsig2u | -1.7384508               
                              -------------+--------------------------
                              Statistics   |                          
                                       aic |  3150.9292    3148.9327  
                                       bic |  3271.2343     3258.301  
                              ----------------------------------------
                              The estimates are very similar regardless of whether I use a random effect on Id or just a standard logistic regression.
                              But this requires me to use the "force" option with -lrtest-, since I use different estimators. I was also thinking of comparing AIC/BIC to support not including Id as a random effect, since it does not seem to improve model fit. Do the results above give evidence to simplify the procedure and use the following code instead:
                              Code:
                              . xtset Hospital 
                                     panel variable:  Hospital (unbalanced)
                              
                              . xtlogit Complication i.NightTime i.ASA i.Sex Age i.grpci i.CaseUrgency, re or
                              
                              Fitting comparison model:
                              
                              Iteration 0:   log likelihood = -1712.3685  
                              Iteration 1:   log likelihood = -1642.3642  
                              Iteration 2:   log likelihood = -1578.7786  
                              Iteration 3:   log likelihood = -1564.5523  
                              Iteration 4:   log likelihood = -1564.4664  
                              Iteration 5:   log likelihood = -1564.4664  
                              
                              Fitting full model:
                              
                              tau =  0.0     log likelihood = -1564.4664
                              tau =  0.1     log likelihood = -1559.1837
                              tau =  0.2     log likelihood = -1558.6451
                              tau =  0.3     log likelihood = -1559.9916
                              
                              Iteration 0:   log likelihood = -1558.6464  
                              Iteration 1:   log likelihood = -1558.2669  
                              Iteration 2:   log likelihood = -1558.2649  
                              Iteration 3:   log likelihood = -1558.2649  
                              
                              Random-effects logistic regression              Number of obs     =    415,331
                              Group variable: Hospital                        Number of groups  =         78
                              
                              Random effects u_i ~ Gaussian                   Obs per group:
                                                                                            min =          1
                                                                                            avg =    5,324.8
                                                                                            max =     21,118
                              
                              Integration method: mvaghermite                 Integration pts.  =         12
                              
                                                                              Wald chi2(9)      =     259.30
                              Log likelihood  = -1558.2649                    Prob > chi2       =     0.0000
                              
                              ---------------------------------------------------------------------------------------------
                                             Complication | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              ----------------------------+----------------------------------------------------------------
                                              1.NightTime |   1.124424   .1634261     0.81   0.420     .8456966    1.495015
                                                          |
                                                      ASA |
                                             ASA Score 3  |   1.322321   .2787068     1.33   0.185     .8748396     1.99869
                                             ASA Score 4  |   3.380232   .8060138     5.11   0.000     2.118254    5.394049
                                             ASA Score 5  |   9.514072   4.030539     5.32   0.000     4.147283    21.82575
                                                          |
                                                      Sex |
                                                  1. Man  |    .827965   .1210764    -1.29   0.197      .621638    1.102774
                                                      Age |   1.039377   .0060664     6.62   0.000     1.027555    1.051336
                                                          |
                                                    grpci |
                               1. Charlson Index Score 1  |   2.376717    .563484     3.65   0.000      1.49338     3.78255
                              2. Charlson Index Score >1  |   2.625058   .5615729     4.51   0.000      1.72601    3.992405
                                                          |
                                            1.CaseUrgency |   2.506501   .5006844     4.60   0.000     1.694481    3.707653
                                                    _cons |   .0000105   4.74e-06   -25.39   0.000     4.33e-06    .0000254
                              ----------------------------+----------------------------------------------------------------
                                                 /lnsig2u |  -1.500606   .4843402                     -2.449895   -.5513163
                              ----------------------------+----------------------------------------------------------------
                                                  sigma_u |   .4722235   .1143584                      .2937731    .7590724
                                                      rho |   .0634796    .028794                      .0255623    .1490383
                              ---------------------------------------------------------------------------------------------
                              Note: Estimates are transformed only in the first equation.
                              Note: _cons estimates baseline odds (conditional on zero random effects).
                              LR test of rho=0: chibar2(01) = 12.40                  Prob >= chibar2 = 0.000
                              Does this seem like a reasonable approach?

                              This does not allow me to use multiple imputation, since the likelihood varies between the imputed datasets, i.e. I do not get "one" likelihood to compare against. Thoughts on this?



