  • Multilevel modeling using the mixed command - incorrect group number displayed in output?

    Dear Statalist users,

    I am estimating a multilevel model with three levels using the mixed command in Stata. Level 3 is the team (TeamID), level 2 is the individual (ID), and level 1 is the observation. In total, I have 187 observations.

    I checked how many teams and individuals are in my sample, using the following code:

    Code:
    by TeamID, sort: gen nvals = _n == 1
    count if nvals
    drop nvals
    by ID, sort: gen nvals = _n == 1
    count if nvals
    drop nvals
    From the output (below), I conclude that I have 57 teams and 114 individuals in my sample.
    [Image attachment: Unbenannt2.PNG]

    In the next step, I used the xtmixed command to estimate my model.

    Code:
    xtmixed leadTSat gcTSat Maj_owner Gender cEntExp cAgeDis cTAge2 TeamSize Sales_yes || TeamID: || ID: , mle cov(unstr) vsquish
    However, the number of groups reported for the ID variable in the output is not consistent with the number of IDs in my sample.
    [Image attachment: Unbenannt.PNG]

    Can anyone help me spot the mistake? I am very certain that I do not have more than 114 different IDs in my sample (I also counted them manually).

    I appreciate any advice! Thank you.
    Last edited by Sophie Dibbern; 07 Nov 2017, 16:51.

  • #2
    Sophie:
    I would investigate that issue further via:
    -duplicates- and -isid-.
    Kind regards,
    Carlo
    (Stata 19.0)
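
    For readers following along, the checks suggested here might look roughly like the sketch below. It assumes that Q is the time variable (as stated in #3) and that ID and Q together are meant to identify one observation; neither command is taken from Carlo's post beyond its name.

    ```stata
    * Report how many observations share the same ID-by-Q combination
    duplicates report ID Q

    * isid exits with an error if ID and Q do not jointly identify the
    * observations; capture noisily shows the message without stopping a do-file
    capture noisily isid ID Q

    * Repeat the duplicates check with the team identifier included
    duplicates report TeamID ID Q
    ```

    Note that these checks only detect repeated rows. They would not flag an ID that appears under two different TeamIDs at different values of Q.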

    • #3
      Carlo, thank you very much for your advice.
      I used the commands you suggested but could not identify any duplicates in terms of ID and the time variable Q. ID and Q (and TeamID) uniquely identify the observations.
      Do you know what else I could do?


      • #4
        Just as a side note to Carlo's advice, I recommend using the updated -mixed- command instead of -xtmixed-.

        That said, please take a look at the example below:

        Code:
        . webuse productivity
        (Public Capital Productivity)
        
        .  mixed gsp private emp hwy water other unemp || region: || state:, mle
        
        Performing EM optimization:
        
        Performing gradient-based optimization:
        
        Iteration 0:   log likelihood =  1430.5017  
        Iteration 1:   log likelihood =  1430.5017  
        
        Computing standard errors:
        
        Mixed-effects ML regression                     Number of obs     =        816
        
        -------------------------------------------------------------
                        |     No. of       Observations per Group
         Group Variable |     Groups    Minimum    Average    Maximum
        ----------------+--------------------------------------------
                 region |          9         51       90.7        136
                  state |         48         17       17.0         17
        -------------------------------------------------------------
        
                                                        Wald chi2(6)      =   18829.06
        Log likelihood =  1430.5017                     Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
                 gsp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
             private |   .2671484   .0212591    12.57   0.000     .2254814    .3088154
                 emp |    .754072   .0261868    28.80   0.000     .7027468    .8053973
                 hwy |   .0709767    .023041     3.08   0.002     .0258172    .1161363
               water |   .0761187   .0139248     5.47   0.000     .0488266    .1034109
               other |  -.0999955   .0169366    -5.90   0.000    -.1331906   -.0668004
               unemp |  -.0058983   .0009031    -6.53   0.000    -.0076684   -.0041282
               _cons |   2.128823   .1543854    13.79   0.000     1.826233    2.431413
        ------------------------------------------------------------------------------
        
        ------------------------------------------------------------------------------
          Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
        -----------------------------+------------------------------------------------
        region: Identity             |
                          var(_cons) |   .0014506   .0012995      .0002506    .0083957
        -----------------------------+------------------------------------------------
        state: Identity              |
                          var(_cons) |   .0062757   .0014871      .0039442    .0099855
        -----------------------------+------------------------------------------------
                       var(Residual) |   .0013461   .0000689      .0012176    .0014882
        ------------------------------------------------------------------------------
        LR test vs. linear model: chi2(2) = 1154.73               Prob > chi2 = 0.0000
        
        Note: LR test is conservative and provided only for reference.
        
        . su state
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
               state |        816        24.5     13.8619          1         48
        
        . su region
        
            Variable |        Obs        Mean    Std. Dev.       Min        Max
        -------------+---------------------------------------------------------
              region |        816    4.958333    2.459134          1          9
        
        . by state, sort: gen nvals1 = _n == 1
        
        . by region , sort: gen nvals2 = _n == 1
        
        . count if nvals1
          48
        
        . count if nvals2
          9
        
        . di 48*17
        816
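        In your case, the same calculation (number of groups times average observations per group) would give 185.6, just 1.4 less than the 187 observations seen in the first part of the output, and I suspect the gap may be due to rounding of the average.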
        Last edited by Marcos Almeida; 08 Nov 2017, 05:12.
        Best regards,

        Marcos

        • #5
          Marcos, thank you for your help. I have now used the mixed command and the postestimation commands from the example you posted. Here are my results:

          Code:
          . mixed leadTSat gcTSat Maj_owner Gender cEntExp cAgeDis cTAge2 TeamSize Sales_yes if
          >  Q== 1 | Q==3 || TeamID: || ID: , mle cov(unstr) vsquish noretable
          Note: single-variable random-effects specification in ID equation; covariance
                structure set to identity
          
          Performing EM optimization: 
          
          Performing gradient-based optimization: 
          
          Iteration 0:   log likelihood = -269.90806  
          Iteration 1:   log likelihood = -269.85633  
          Iteration 2:   log likelihood = -269.85627  
          Iteration 3:   log likelihood = -269.85627  
          
          Computing standard errors:
          
          Mixed-effects ML regression                     Number of obs     =        187
          
          -------------------------------------------------------------
                          |     No. of       Observations per Group
           Group Variable |     Groups    Minimum    Average    Maximum
          ----------------+--------------------------------------------
                   TeamID |         57          1        3.3          6
                       ID |        116          1        1.6          2
          -------------------------------------------------------------
          
                                                          Wald chi2(8)      =      14.32
          Log likelihood = -269.85627                     Prob > chi2       =     0.0737
          
          ------------------------------------------------------------------------------
              leadTSat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                gcTSat |   .1017746    .105635     0.96   0.335    -.1052661    .3088154
             Maj_owner |  -.5315755   .2909966    -1.83   0.068    -1.101918    .0387674
                Gender |  -1.054702   .3339076    -3.16   0.002    -1.709149   -.4002554
               cEntExp |  -.0525345   .1193247    -0.44   0.660    -.2864066    .1813377
               cAgeDis |   .0406582    .048772     0.83   0.404    -.0549331    .1362495
                cTAge2 |    .013847   .0522518     0.27   0.791    -.0885645    .1162586
              TeamSize |   .0332104   .1703208     0.19   0.845    -.3006122    .3670331
             Sales_yes |  -.0392538   .2483863    -0.16   0.874     -.526082    .4475744
                 _cons |   6.151467   .4764265    12.91   0.000     5.217688    7.085246
          ------------------------------------------------------------------------------
          
          . su ID
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                    ID |        187    88.04278    49.47301          1        170
          
          . su TeamID
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                TeamID |        187    33.94652     19.5431          1         66
          
          . by ID, sort: gen nvals1 = _n ==1
          
          . by TeamID, sort: gen nvals2 = _n ==1
          
          . count if nvals1
            114
          
          . count if nvals2
            57
          
          . bys ID : gen n = _N if _n == 1 
          (73 missing values generated)
          
          . su n
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                     n |        114    1.640351    .4820163          1          2
          
          . di 114*1.640351
          187.00001
          I can't make sense of why the number of groups for ID in the output table after the mixed command is 116 rather than 114.

          Sorry for bothering you again, but any additional advice would really help me a lot!

          Thank you.

          • #6
            Great that the number of observations is now accounted for.

            With regard to the extra mystery, i.e., the 2 additional IDs, and hazarding a guess given that you ruled out duplicates (an issue Carlo pointed out in #2), I wonder whether you could post the output of:

            Code:
            codebook ID
            Best regards,

            Marcos

            • #7
              Thank you, Marcos. I just ran the command:
              Code:
               codebook ID
              
              --------------------------------------------------------------------------------------------------------------------------------------------------------------------
              ID                                                                                                                                                     Respondent ID
              --------------------------------------------------------------------------------------------------------------------------------------------------------------------
              
                                type:  numeric (double)
              
                               range:  [1,170]                      units:  1
                       unique values:  114                      missing .:  0/187
              
                                mean:   88.0428
                            std. dev:    49.473
              
                         percentiles:        10%       25%       50%       75%       90%
                                              17        41        95       129       151

              • #8
                Sorry for the delay; I have been exceptionally busy for the last couple of hours.

                I gather you must have missing data. Please see the same example, now with 2 missing values:

                Code:
                . webuse productivity
                (Public Capital Productivity)
                
                . replace region =. in 633
                (1 real change made, 1 to missing)
                
                . replace region =. in 634
                (1 real change made, 1 to missing)
                
                . mixed gsp private emp hwy water other unemp || region: || state:, mle vsquish nolog
                
                Mixed-effects ML regression                     Number of obs     =        814
                
                -------------------------------------------------------------
                                |     No. of       Observations per Group
                 Group Variable |     Groups    Minimum    Average    Maximum
                ----------------+--------------------------------------------
                         region |          9         51       90.4        136
                          state |         48         15       17.0         17
                -------------------------------------------------------------
                
                                                                Wald chi2(6)      =   18836.62
                Log likelihood =  1428.7987                     Prob > chi2       =     0.0000
                
                ------------------------------------------------------------------------------
                         gsp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     private |    .263416   .0212915    12.37   0.000     .2216855    .3051465
                         emp |   .7579398   .0261903    28.94   0.000     .7066077    .8092719
                         hwy |   .0755007    .023097     3.27   0.001     .0302314    .1207701
                       water |   .0780542   .0139162     5.61   0.000      .050779    .1053295
                       other |  -.1051114   .0170247    -6.17   0.000    -.1384792   -.0717437
                       unemp |  -.0059328   .0009014    -6.58   0.000    -.0076996    -.004166
                       _cons |   2.131285   .1544442    13.80   0.000      1.82858     2.43399
                ------------------------------------------------------------------------------
                
                ------------------------------------------------------------------------------
                  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                -----------------------------+------------------------------------------------
                region: Identity             |
                                  var(_cons) |    .001492   .0013204      .0002633    .0084538
                -----------------------------+------------------------------------------------
                state: Identity              |
                                  var(_cons) |   .0062905   .0014915      .0039524    .0100117
                -----------------------------+------------------------------------------------
                               var(Residual) |   .0013387   .0000686      .0012107    .0014801
                ------------------------------------------------------------------------------
                LR test vs. linear model: chi2(2) = 1145.76               Prob > chi2 = 0.0000
                
                Note: LR test is conservative and provided only for reference.
                
                . by state, sort: gen nvals3 = _n == 1
                
                . by region , sort: gen nvals4 = _n == 1
                
                . count if nvals3
                  48
                
                . count if nvals4
                  10
                
                . list state region nvals3 nvals4 if region > 9
                
                     +----------------------------------+
                     | state   region   nvals3   nvals4 |
                     |----------------------------------|
                815. |    38        .        0        1 |
                816. |    38        .        0        0 |
                     +----------------------------------+
                On account of these 2 missing values, a manual calculation will "apparently" differ from the Stata output:

                Code:
                di 48*17
                816
                By the way, the Stata output under - mixed - is the correct one (Number of obs.), unsurprisingly enough!
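
                One way to make a manual group count line up with the header is to count groups only within the estimation sample. The sketch below reuses the productivity example and assumes it is run right after mixed, so that e(sample) is available; the variable name region_tag is hypothetical.

                ```stata
                webuse productivity, clear
                replace region = . in 633/634
                mixed gsp private emp hwy water other unemp || region: || state:, mle nolog

                * Tag one observation per distinct region, using only the
                * observations that mixed actually kept in the estimation sample
                egen byte region_tag = tag(region) if e(sample)

                * The count should agree with the "No. of Groups" column for region
                count if region_tag
                ```

                Unlike the by region, sort: approach, this does not treat the missing values of region as an extra group.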

                Last edited by Marcos Almeida; 08 Nov 2017, 12:10.
                Best regards,

                Marcos

                • #9
                  Dear Marcos,

                  I really appreciate that you are giving me advice despite being so busy.

                  It would make sense if I had some missing data. I just checked that, but no missing data was indicated:

                  Code:
                   mdesc TeamID ID Q
                  
                      Variable    |     Missing          Total     Percent Missing
                  ----------------+-----------------------------------------------
                           TeamID |           0            187           0.00
                               ID |           0            187           0.00
                                Q |           0            187           0.00
                  ----------------+-----------------------------------------------
                  I now wonder whether an ID has mistakenly been allocated to more than one team and whether this could be the source of the mystery. I will check that.

                  Thank you so much again and I am looking forward to any comment!

                  • #10
                    Actually, that was the problem. At one measurement point, 2 IDs had been assigned to the wrong TeamID.

                    I used the following command to identify IDs that were assigned to more than one TeamID (there are probably other commands that give a more "convenient" output).

                    Code:
                    bysort ID: tab TeamID
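
                    A possibly more convenient check is sketched below, under the assumption that each ID should belong to exactly one TeamID. With || TeamID: || ID:, mixed treats ID as nested within TeamID, so the ID row of the group table counts distinct TeamID-ID combinations rather than distinct IDs. The variable names multi_team and pair_tag are hypothetical.

                    ```stata
                    * Flag IDs whose TeamID is not constant across their observations:
                    * after sorting by TeamID within ID, first and last values differ
                    bysort ID (TeamID): gen byte multi_team = TeamID[1] != TeamID[_N]
                    list ID TeamID Q if multi_team, sepby(ID)

                    * Count distinct TeamID-by-ID combinations, which is what the ID row
                    * of the mixed header reflects when ID is nested within TeamID
                    egen byte pair_tag = tag(TeamID ID)
                    count if pair_tag
                    ```

                    If the second count exceeds the number of unique IDs, at least one ID is recorded under more than one team.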

                    Finally, the numbers of groups displayed in the mixed output are consistent with the actual numbers of groups at levels 2 and 3.

                    Code:
                    . mixed leadTSat gcTSat Maj_owner Gender cEntExp cAgeDis cTAge2 cTeamSize Sales_yes if Q== 1 | Q==3 || TeamID: || ID: , mle cov(unstr) vsquish noretable
                    Note: single-variable random-effects specification in ID equation; covariance structure set to identity
                    
                    Performing EM optimization: 
                    
                    Performing gradient-based optimization: 
                    
                    Iteration 0:   log likelihood = -269.27562  
                    Iteration 1:   log likelihood = -269.23316  
                    Iteration 2:   log likelihood = -269.23311  
                    Iteration 3:   log likelihood = -269.23311  
                    
                    Computing standard errors:
                    
                    Mixed-effects ML regression                     Number of obs     =        187
                    
                    -------------------------------------------------------------
                                    |     No. of       Observations per Group
                     Group Variable |     Groups    Minimum    Average    Maximum
                    ----------------+--------------------------------------------
                             TeamID |         56          1        3.3          6
                                 ID |        114          1        1.6          2
                    -------------------------------------------------------------
                    
                                                                    Wald chi2(8)      =      14.37
                    Log likelihood = -269.23311                     Prob > chi2       =     0.0726
                    
                    ------------------------------------------------------------------------------
                        leadTSat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                          gcTSat |   .1107992   .1037334     1.07   0.285    -.0925145     .314113
                       Maj_owner |  -.5244927   .2925602    -1.79   0.073      -1.0979    .0489148
                          Gender |  -1.046296    .335788    -3.12   0.002    -1.704429   -.3881639
                         cEntExp |  -.0490503   .1201821    -0.41   0.683    -.2846028    .1865022
                         cAgeDis |    .042868   .0490445     0.87   0.382    -.0532575    .1389935
                          cTAge2 |   .0166339   .0526938     0.32   0.752     -.086644    .1199118
                       cTeamSize |   .0444763   .1718388     0.26   0.796    -.2923215     .381274
                       Sales_yes |  -.0548911   .2506432    -0.22   0.827    -.5461428    .4363605
                           _cons |   6.248978    .230861    27.07   0.000     5.796499    6.701458
                    ------------------------------------------------------------------------------
                    
                    . su ID
                    
                        Variable |        Obs        Mean    Std. Dev.       Min        Max
                    -------------+---------------------------------------------------------
                              ID |        187    88.04278    49.47301          1        170
                    
                    . su TeamID
                    
                        Variable |        Obs        Mean    Std. Dev.       Min        Max
                    -------------+---------------------------------------------------------
                          TeamID |        187    34.25668    19.53349          1         66
                    
                    . by ID, sort: gen nvals1 = _n ==1
                    
                    . by TeamID, sort: gen nvals2 = _n ==1
                    
                    . count if nvals1
                      114
                    
                    . count if nvals2
                      56
                    
                    . bys ID : gen n = _N if _n == 1 
                    (73 missing values generated)
                    
                    . su n
                    
                        Variable |        Obs        Mean    Std. Dev.       Min        Max
                    -------------+---------------------------------------------------------
                               n |        114    1.640351    .4820163          1          2
                    
                    . di 114* 1.640351
                    187.00001

                    Thank you, Marcos, for having this conversation with me. It really helped me understand my data.
                    Have a nice evening!

                    • #11
                      Thank you, Sophie, for sharing the information, commands, and output appropriately, and for confirming that your query reached a satisfactory closure.
                      Best regards,

                      Marcos
