
  • mixed miscounts ID variables in a nested random effects model.

    Hi all. I'm attaching a dataset which has up to 6 reading test scores (R_THETA) nested in each of 5995 children (CHILDID), who are nested in each of 446 schools (S2_ID). Yet the output from the mixed procedure reports that both CHILDID and S2_ID have 5931 groups, as though every child is in their own school.

    The output doesn't make sense in other ways, either. Convergence is not achieved, and the child and school variances are reported as practically equal. (In fact they wouldn't be distinguishable if every child was in their own school.)

    What is going on here? Is there a bug in mixed, or am I doing something wrong?

    I'm attaching the data and log. Here are the commands I'm running.

    Code:
    log using mixed_error, replace
    use reduced, clear
    /* This dataset has up to 6 reading test scores (R_THETA)
       nested in each of 5995 children (CHILDID)
       nested in each of 446 schools (S2_ID)
    */
    ssc install distinct
    distinct CHILDID S2_ID
    /* CHILDID has 5995 distinct values, S2_ID has 446 */
    list if missing(S2_ID) | missing(CHILDID)
    /* Neither has any missing values */
    mixed R_THETA || CHILDID: || S2_ID:, iter(5)
    /* but mixed reports that both CHILDID and S2_ID have 5931 groups, as though every child is in their own school */
    /* The CHILDID and S2_ID variances are reported as practically equal, and convergence was not achieved
        (even if I let it run for more than 5 iterations).
        In fact the variances would not be distinguishable if every child were in their own school.
     */
    /* Note that I used the iter(5) option because the model doesn't converge (not concave). */
    log close
    Last edited by paulvonhippel; 18 Aug 2023, 05:56.

  • #2
    Deleted
    Last edited by Andrew Musau; 18 Aug 2023, 07:49.



    • #3
      This is just a difficult maximization. Newton-Raphson fails quite often with "badly-behaved" integrands. Below, adaptive quadrature appears to do the trick (after some rescaling of your variable - you can use raw values), but you need to be patient.

      "Convergence is not achieved, and the child and school variances are reported as practically equal."
      If convergence is not achieved, do not read anything into the results.

      Code:
      ssc install gllamm, replace
      use "reduced.dta", clear
      drop if missing(R_THETA )
      replace R_THETA= int( R_THETA*1000)
      gllamm R_THETA, i(CHILDID S2_ID) adapt
      Results:

      Code:
      . gllamm R_THETA, i( CHILDID S2_ID ) adapt
      
      Running adaptive quadrature
      Iteration 0:    log likelihood = -336208.14
      Iteration 1:    log likelihood = -335554.92
      Iteration 2:    log likelihood = -335321.64
      Iteration 3:    log likelihood = -335141.64
      Iteration 4:    log likelihood = -335118.27
      Iteration 5:    log likelihood = -335118.07
      
      
      Adaptive quadrature has converged, running Newton-Raphson
      Iteration 0:   log likelihood = -335118.07  (not concave)
      Iteration 1:   log likelihood = -335118.07  (not concave)
      Iteration 2:   log likelihood = -335030.54  
      Iteration 3:   log likelihood = -335026.66  
      Iteration 4:   log likelihood = -335026.51  
      Iteration 5:   log likelihood = -335026.51  
       
      number of level 1 units = 40638
      number of level 2 units = 5931
      number of level 3 units = 444
       
      Condition Number = 3410.5986
       
      gllamm model 
       
      log likelihood = -335026.51
       
      ------------------------------------------------------------------------------
           R_THETA | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
             _cons |    276.155   12.41826    22.24   0.000     251.8157    300.4944
      ------------------------------------------------------------------------------
       
      Variance at level 1
      ------------------------------------------------------------------------------
      
        765837.49 (5873.7663)
       
      Variances and covariances of random effects
      ------------------------------------------------------------------------------
      
       
      ***level 2 (CHILDID)
       
          var(1): 89764.834 (4262.5619)
       
      ***level 3 (S2_ID)
       
          var(1): 92618.48 (6900.0865)
      ------------------------------------------------------------------------------



      • #4
        Code:
        mixed R_THETA || CHILDID: || S2_ID:, iter(5)
        is a model that has schools nested within children. O.P. wants it the other way around. It should be:
        Code:
        mixed R_THETA || S2_ID: || CHILDID:
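
        A minimal sketch for checking the nesting direction before fitting, assuming (as the original post describes) that each child belongs to exactly one school. The assert line is an added sanity check, not something taken from the posts above; it stops with an error if any CHILDID appears under more than one S2_ID.

        ```stata
        use reduced, clear
        * Sort by child, then school within child, and check that the first and
        * last school ID for each child agree (i.e. S2_ID is constant within CHILDID).
        bysort CHILDID (S2_ID): assert S2_ID[1] == S2_ID[_N]
        * Random-effects equations run from the highest level to the lowest:
        * schools first, then children within schools.
        mixed R_THETA || S2_ID: || CHILDID:
        ```

        In -mixed-, the random-effects equations are listed from the highest level down to the lowest, so the school equation comes before the child equation.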



        • #5
          I don't think it's a question of model specification or maximization method. Let me clarify the problem that I meant to highlight. The -mixed- procedure seems to think that the number of CHILDIDs and S2_IDs is equal -- in fact, that the CHILDIDs and S2_IDs are the same. Here is the key part of the output from -mixed-. The statistics for CHILDID and S2_ID are identical. They should not be.

          -------------------------------------------------------------
                          |     No. of       Observations per Group
           Group Variable |     Groups    Minimum    Average    Maximum
          ----------------+--------------------------------------------
                  CHILDID |      5,931          1        6.9          8
                    S2_ID |      5,931          1        6.9          8
          -------------------------------------------------------------



          If in fact the CHILDID and S2_ID were the same, then it would be impossible to distinguish the child-level and school-level variance. I think that's why the model is not converging.

          But in fact, the CHILDID and S2_ID are not the same. For example, if you type distinct CHILDID S2_ID, you get this:


                   |        Observations
                   |      total   distinct
          ---------+----------------------
           CHILDID |      47960       5995
             S2_ID |      47960        446



          Why isn't mixed getting the count of the S2_ID right? I think this must be the source of the problem.
          Last edited by paulvonhippel; 18 Aug 2023, 10:39.



          • #6
            Wait, hold it, I think Clyde Schechter has nailed it. I just needed to reverse the order of CHILDID and S2_ID in the -mixed- statement. Then the command converges quickly....
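
            A possible explanation for the matching group counts, offered as a sketch rather than a confirmed diagnosis: with || CHILDID: || S2_ID:, -mixed- treats S2_ID as nested within CHILDID, so the S2_ID row would count distinct child-school combinations. If every child attends one school, that number equals the number of children. The variable names child and pair below are hypothetical helpers, and dropping missing R_THETA only approximates the estimation sample.

            ```stata
            use reduced, clear
            * Approximate the estimation sample by dropping missing outcomes.
            drop if missing(R_THETA)
            * Number the distinct children, and the distinct child-school combinations.
            egen long child = group(CHILDID)
            egen long pair  = group(CHILDID S2_ID)
            * The largest group number equals the number of distinct groups.
            summarize child, meanonly
            display "children:           " r(max)
            summarize pair, meanonly
            display "child-school pairs: " r(max)
            ```

            If the two numbers agree, the identical rows in the group table would reflect the nesting order in the command rather than a miscount of S2_ID.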

