Multilevel modeling using the mixed command - incorrect group number displayed in output?

Sophie Dibbern

Join Date: Sep 2017

Posts: 23
#1

07 Nov 2017, 16:48

Dear Statalist users,

I am estimating a multilevel model with three levels using the mixed command in Stata. Level 3 is the team (TeamID), level 2 is the individual (ID), and level 1 is the observation. In total, I have 187 observations.

I checked how many teams and individuals are in my sample, using the following code:

Code:

by TeamID, sort: gen nvals = _n == 1
count if nvals
drop nvals
by ID, sort: gen nvals = _n == 1
count if nvals
drop nvals

From the output (below), I conclude that I have 57 teams and 114 individuals in my sample.

In the next step, I used the mixed command to estimate my model.

Code:

xtmixed leadTSat gcTSat Maj_owner Gender cEntExp cAgeDis cTAge2 TeamSize Sales_yes || TeamID: || ID: , mle cov(unstr) vsquish

However, the number of groups reported for the ID variable in the output is not consistent with the number of IDs in my sample.

Can anyone help me spot the mistake? I am very certain that I do not have more than 114 different IDs in my sample (I also counted them manually).

I appreciate any advice! Thank you.

Last edited by Sophie Dibbern; 07 Nov 2017, 16:51.
Tags: hierarchical model, HLM, mixed, multilevel, xtmixed
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#2

08 Nov 2017, 02:45

Sophie:
I would investigate that issue further via:
-duplicates- and -isid-.
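
A minimal sketch of those checks, assuming each observation should be identified by ID together with a time variable (here called Q, the name used later in this thread; adjust to your own data):

```stata
* Report how many observations share the same ID-Q combination
duplicates report ID Q

* Exits with an error if ID and Q do not jointly identify the observations
isid ID Q
```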

Kind regards,
Carlo
(Stata 19.0)
Sophie Dibbern

Join Date: Sep 2017

Posts: 23
#3

08 Nov 2017, 04:50

Carlo, thank you very much for your advice.
I used the commands you suggested but could not identify any duplicates in terms of ID and the time variable Q. ID and Q (and TeamID) uniquely identify the observations.
Do you know what else I could do?

Marcos Almeida

Join Date: Apr 2014
Posts: 4047

08 Nov 2017, 05:10

Just as a side note to Carlo's advice, I recommend using the updated -mixed- command instead of -xtmixed-.

That said, please take a look at the example below:

Code:

. webuse productivity
(Public Capital Productivity)

.  mixed gsp private emp hwy water other unemp || region: || state:, mle

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood =  1430.5017  
Iteration 1:   log likelihood =  1430.5017  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =        816

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
         region |          9         51       90.7        136
          state |         48         17       17.0         17
-------------------------------------------------------------

                                                Wald chi2(6)      =   18829.06
Log likelihood =  1430.5017                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
         gsp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     private |   .2671484   .0212591    12.57   0.000     .2254814    .3088154
         emp |    .754072   .0261868    28.80   0.000     .7027468    .8053973
         hwy |   .0709767    .023041     3.08   0.002     .0258172    .1161363
       water |   .0761187   .0139248     5.47   0.000     .0488266    .1034109
       other |  -.0999955   .0169366    -5.90   0.000    -.1331906   -.0668004
       unemp |  -.0058983   .0009031    -6.53   0.000    -.0076684   -.0041282
       _cons |   2.128823   .1543854    13.79   0.000     1.826233    2.431413
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
region: Identity             |
                  var(_cons) |   .0014506   .0012995      .0002506    .0083957
-----------------------------+------------------------------------------------
state: Identity              |
                  var(_cons) |   .0062757   .0014871      .0039442    .0099855
-----------------------------+------------------------------------------------
               var(Residual) |   .0013461   .0000689      .0012176    .0014882
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 1154.73               Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

. su state

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       state |        816        24.5     13.8619          1         48

. su region

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      region |        816    4.958333    2.459134          1          9

. by state, sort: gen nvals1 = _n == 1

. by region , sort: gen nvals2 = _n == 1

. count if nvals1
  48

. count if nvals2
  9

. di 48*17
816

In your case, that would give 185.6 (just 1.4 less than the 187 observations shown in the first part of the output), and I suspect the difference is due to rounding of the average.

Last edited by Marcos Almeida; 08 Nov 2017, 05:12.

Best regards,

Marcos


Sophie Dibbern

Join Date: Sep 2017
Posts: 23

08 Nov 2017, 06:45

Marcos, thank you for your help. I now used the mixed command and the post estimation commands as used in the example you posted. Here are my results:

Code:

. mixed leadTSat gcTSat Maj_owner Gender cEntExp cAgeDis cTAge2 TeamSize Sales_yes if
>  Q== 1 | Q==3 || TeamID: || ID: , mle cov(unstr) vsquish noretable
Note: single-variable random-effects specification in ID equation; covariance
      structure set to identity

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -269.90806  
Iteration 1:   log likelihood = -269.85633  
Iteration 2:   log likelihood = -269.85627  
Iteration 3:   log likelihood = -269.85627  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =        187

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
         TeamID |         57          1        3.3          6
             ID |        116          1        1.6          2
-------------------------------------------------------------

                                                Wald chi2(8)      =      14.32
Log likelihood = -269.85627                     Prob > chi2       =     0.0737

------------------------------------------------------------------------------
    leadTSat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gcTSat |   .1017746    .105635     0.96   0.335    -.1052661    .3088154
   Maj_owner |  -.5315755   .2909966    -1.83   0.068    -1.101918    .0387674
      Gender |  -1.054702   .3339076    -3.16   0.002    -1.709149   -.4002554
     cEntExp |  -.0525345   .1193247    -0.44   0.660    -.2864066    .1813377
     cAgeDis |   .0406582    .048772     0.83   0.404    -.0549331    .1362495
      cTAge2 |    .013847   .0522518     0.27   0.791    -.0885645    .1162586
    TeamSize |   .0332104   .1703208     0.19   0.845    -.3006122    .3670331
   Sales_yes |  -.0392538   .2483863    -0.16   0.874     -.526082    .4475744
       _cons |   6.151467   .4764265    12.91   0.000     5.217688    7.085246
------------------------------------------------------------------------------

. su ID

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          ID |        187    88.04278    49.47301          1        170

. su TeamID

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      TeamID |        187    33.94652     19.5431          1         66

. by ID, sort: gen nvals1 = _n ==1

. by TeamID, sort: gen nvals2 = _n ==1

. count if nvals1
  114

. count if nvals2
  57

. bys ID : gen n = _N if _n == 1 
(73 missing values generated)

. su n

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           n |        114    1.640351    .4820163          1          2

. di 114*1.640351
187.00001

I can't make sense of why the number of groups for ID in the output table after the mixed command is 116 rather than 114.

Sorry for bothering you again, but any additional advice would really help me a lot!

Thank you.


Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

08 Nov 2017, 08:35

Great that you have now worked out where the number of observations comes from.

With regard to the extra mystery, i.e., the 2 additional IDs: hazarding a guess, and considering you found no duplicates (an issue Carlo pointed out in #2), I wonder whether you could post the output of:

Code:

codebook ID

Best regards,

Marcos

Sophie Dibbern

Join Date: Sep 2017
Posts: 23

08 Nov 2017, 08:39

Thank you, Marcos. I just ran the command:

Code:

 codebook ID

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
ID                                                                                                                                                     Respondent ID
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (double)

                 range:  [1,170]                      units:  1
         unique values:  114                      missing .:  0/187

                  mean:   88.0428
              std. dev:    49.473

           percentiles:        10%       25%       50%       75%       90%
                                17        41        95       129       151


Marcos Almeida

Join Date: Apr 2014
Posts: 4047

08 Nov 2017, 11:56

Sorry for the delay, due to being exponentially busy for the last couple of hours.

I gather you must have missing data. Please see the same example, now with 2 missing values:

Code:

. webuse productivity
(Public Capital Productivity)

. replace region =. in 633
(1 real change made, 1 to missing)

. replace region =. in 634
(1 real change made, 1 to missing)

. mixed gsp private emp hwy water other unemp || region: || state:, mle vsquish nolog

Mixed-effects ML regression                     Number of obs     =        814

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
         region |          9         51       90.4        136
          state |         48         15       17.0         17
-------------------------------------------------------------

                                                Wald chi2(6)      =   18836.62
Log likelihood =  1428.7987                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
         gsp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     private |    .263416   .0212915    12.37   0.000     .2216855    .3051465
         emp |   .7579398   .0261903    28.94   0.000     .7066077    .8092719
         hwy |   .0755007    .023097     3.27   0.001     .0302314    .1207701
       water |   .0780542   .0139162     5.61   0.000      .050779    .1053295
       other |  -.1051114   .0170247    -6.17   0.000    -.1384792   -.0717437
       unemp |  -.0059328   .0009014    -6.58   0.000    -.0076996    -.004166
       _cons |   2.131285   .1544442    13.80   0.000      1.82858     2.43399
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
region: Identity             |
                  var(_cons) |    .001492   .0013204      .0002633    .0084538
-----------------------------+------------------------------------------------
state: Identity              |
                  var(_cons) |   .0062905   .0014915      .0039524    .0100117
-----------------------------+------------------------------------------------
               var(Residual) |   .0013387   .0000686      .0012107    .0014801
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 1145.76               Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

. by state, sort: gen nvals3 = _n == 1

. by region , sort: gen nvals4 = _n == 1

. count if nvals3
  48

. count if nvals4
  10

. list state region nvals3 nvals4 if region > 9

     +----------------------------------+
     | state   region   nvals3   nvals4 |
     |----------------------------------|
815. |    38        .        0        1 |
816. |    38        .        0        0 |
     +----------------------------------+

On account of these 2 missing values, you will "apparently" get a difference, if compared to the Stata output:

Code:

di 48*17
816

By the way, the Stata output under - mixed - is the correct one (Number of obs.), unsurprisingly enough!
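
As a hedged aside, one way to make the manual count comparable with the -mixed- table is to restrict it to the estimation sample. The sketch below assumes it is run right after the -mixed- call above, before any other estimation command, so that e(sample) is still available; tag_region is a hypothetical variable name.

```stata
* Tag one observation per region, using only observations that entered the model
egen byte tag_region = tag(region) if e(sample)

* Number of regions actually used by -mixed-
count if tag_region == 1
```

Unlike the _n == 1 approach, this leaves out the observations with missing region, which -mixed- also drops.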

Last edited by Marcos Almeida; 08 Nov 2017, 12:10.

Best regards,

Marcos


Sophie Dibbern

Join Date: Sep 2017

Posts: 23
#9

08 Nov 2017, 12:16

Dear Marcos,

I really appreciate it a lot that you give me advice despite being so busy.

It makes sense that I have some missing data. I just checked that, but no missing data was indicated:

Code:

mdesc TeamID ID Q

    Variable    |     Missing          Total     Percent Missing
----------------+-----------------------------------------------
         TeamID |           0            187           0.00
             ID |           0            187           0.00
              Q |           0            187           0.00
----------------+-----------------------------------------------

I just wondered whether one ID is mistakenly allocated to more than one team, and whether this could be the source of the mystery. I will check that.

Thank you so much again and I am looking forward to any comment!

Sophie Dibbern

Join Date: Sep 2017
Posts: 23

#10

08 Nov 2017, 12:44

Actually, that was the problem. At one measurement point, 2 IDs had been assigned to the wrong TeamID.

I used the following command to identify IDs that were assigned to more than one TeamID (there are probably other commands that give a more "convenient" output).

Code:

by ID: tab TeamID
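
A sketch of one possibly more convenient check, under the assumption that each ID should belong to exactly one TeamID. Because ID is specified as nested within TeamID, -mixed- counts each distinct TeamID-ID combination as a separate level-2 group, so an ID recorded under two teams is counted twice. The variable names multi_team and pair_tag are hypothetical.

```stata
* Flag every ID that appears under more than one TeamID
bysort ID (TeamID): gen byte multi_team = TeamID[1] != TeamID[_N]
list ID TeamID Q if multi_team, sepby(ID)

* Count distinct TeamID-ID combinations, which is what -mixed- treats as ID groups
egen byte pair_tag = tag(TeamID ID)
count if pair_tag == 1
```

If the last count exceeds the number of unique IDs, at least one ID is recorded under more than one team.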

Finally, the numbers of groups displayed in the mixed output are consistent with the actual numbers of groups at levels 2 and 3.

Code:

. mixed leadTSat gcTSat Maj_owner Gender cEntExp cAgeDis cTAge2 cTeamSize Sales_yes if Q== 1 | Q==3 || TeamID: || ID: , mle cov(unstr) vsquish noretable
Note: single-variable random-effects specification in ID equation; covariance structure set to identity

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -269.27562  
Iteration 1:   log likelihood = -269.23316  
Iteration 2:   log likelihood = -269.23311  
Iteration 3:   log likelihood = -269.23311  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =        187

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
         TeamID |         56          1        3.3          6
             ID |        114          1        1.6          2
-------------------------------------------------------------

                                                Wald chi2(8)      =      14.37
Log likelihood = -269.23311                     Prob > chi2       =     0.0726

------------------------------------------------------------------------------
    leadTSat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gcTSat |   .1107992   .1037334     1.07   0.285    -.0925145     .314113
   Maj_owner |  -.5244927   .2925602    -1.79   0.073      -1.0979    .0489148
      Gender |  -1.046296    .335788    -3.12   0.002    -1.704429   -.3881639
     cEntExp |  -.0490503   .1201821    -0.41   0.683    -.2846028    .1865022
     cAgeDis |    .042868   .0490445     0.87   0.382    -.0532575    .1389935
      cTAge2 |   .0166339   .0526938     0.32   0.752     -.086644    .1199118
   cTeamSize |   .0444763   .1718388     0.26   0.796    -.2923215     .381274
   Sales_yes |  -.0548911   .2506432    -0.22   0.827    -.5461428    .4363605
       _cons |   6.248978    .230861    27.07   0.000     5.796499    6.701458
------------------------------------------------------------------------------

. su ID

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          ID |        187    88.04278    49.47301          1        170

. su TeamID

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      TeamID |        187    34.25668    19.53349          1         66

. by ID, sort: gen nvals1 = _n ==1

. by TeamID, sort: gen nvals2 = _n ==1

. count if nvals1
  114

. count if nvals2
  56

. bys ID : gen n = _N if _n == 1 
(73 missing values generated)

. su n

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           n |        114    1.640351    .4820163          1          2

. di 114* 1.640351
187.00001

Thank you, Marcos, for having this conversation with me; it really helped me understand my data.
Have a nice evening!


Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#11

08 Nov 2017, 13:39

Thank you, Sophie, for sharing the information, commands, and output appropriately, and for confirming that your query has reached a satisfactory conclusion.

Best regards,

Marcos
