  • Why can't we cluster on anything we like?

    I often get error messages indicating that Stata doesn't like the variable that I have chosen to cluster on. Consider the following model, which has student random effects and tries to cluster standard errors at the teacher level:
    Code:
    mixed absent ib(2).classtype ib("k").gradenum i.schid || stdntid:, cluster(tchid)
    It produces the following error message:
    highest-level groups are not nested within tchid
    which is true, but so what? Why does the cluster option care whether students are clustered within teachers? Theoretically, it seems to me I should be able to cluster on teachers whether they nest students or not.

    My best guess is that this is a computational issue -- some constraint used by Stata to keep the matrices involved in clustered standard errors manageable in size. But I don't know. Your expertise most appreciated.


  • #2
    I am not an expert on -mixed-, but I know at least two occasions on which Stata's refusal to calculate clustered errors does not make sense. And by "does not make sense" I mean that I can override it with other methods, and I can calculate them manually.

    1) Back in the day, -areg- refused to calculate clustered standard errors if you were not clustering on the fixed effects. I think they have fixed this now.

    2) -xtreg, re- refuses to calculate clustered standard errors when the panels are not nested within the clusters, and that does not make sense.

    Code:
    .  webuse nlswork
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . xtset idcode
           panel variable:  idcode (unbalanced)
    
    . xtreg ln_w grade i.race , re cluster(age)
    panels are not nested within clusters
    r(498);



    • #3
      Yes, this is a great example. To return to the model I am trying to fit, I can specify it using either mixed or xtreg, re:

      Code:
       mixed absent ib(2).classtype ib("k").gradenum i.schid || stdntid:, cluster(tchid)
      
      xtset stdntid
      xtreg absent ib(2).classtype ib("k").gradenum i.schid, re cluster(tchid)
      But neither of these commands will run, and for the same reason: the random effects associated with stdntid aren't nested inside the clusters specified by tchid.
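
      A quick way to confirm the nesting failure is to count the panels that span more than one cluster. This is only a sketch, run on the nlswork example from #2 rather than on the study data; the variable names varies and tag are made up for illustration.

      ```stata
      webuse nlswork, clear

      * Within each panel, sort by the would-be cluster variable and flag
      * panels whose first and last values differ, i.e. panels that span
      * more than one cluster. A missing value counts as its own value here.
      bysort idcode (age): generate byte varies = age[1] != age[_N]

      * Tag one observation per panel so that panels, not rows, are counted.
      egen byte tag = tag(idcode)

      * A nonzero count means the panels are not nested within the clusters.
      count if tag & varies
      ```

      For the model above, replacing idcode with stdntid and age with tchid would give the number of students who appear under more than one teacher.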

      Correct me if I'm wrong, but I believe there's nothing wrong with the models themselves. It's just that Stata won't run them, for reasons I don't understand.
      Last edited by paulvonhippel; 01 Aug 2020, 06:54.



      • #4
        Joro Kolev : You said something about being able to calculate the clustered standard errors by "other methods." Can you be more specific? Is there a user-written command I can use to get around Stata's limitations here?



        • #5
          Here's a sort of solution using the bootstrap:
          Code:
          bootstrap, cluster(tchid) reps(50): xtreg absent ib(2).classtype ib("k").gradenum i.schid, re
          Given the way the study was designed, I don't think the bootstrap is exactly what I want, but maybe it's progress. I'm still vexed about why I can't just cluster the standard errors.



          • #6
            I meant manual fixes, Paul.

            E.g., when I wrote the first version of this paper

            Kolev, Gueorgui I. "Underperformance by female CEOs: A more powerful test." Economics Letters 117, no. 2 (2012): 436-440.

            Stata was not able to do robust or clustered standard errors at all after -xtreg, re-, because it was "not making sense" to the person who programmed -xtreg, re- to have both GLS and robust standard errors on top.

            -xtreg, re- can be implemented as a simple quasi-time demeaning: you quasi-time demean your data, and then you can reproduce the -xtreg, re- estimates with -regress-. Professor Wooldridge explains the procedure on page 287 of the 2002 edition of his graduate textbook.

            So for the first version of my paper I just quasi-time demeaned my data, and then used -regress- to do whatever clustering and robust variances I wanted.
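
            A minimal sketch of that procedure on the nlswork example from #2, not the code used for the paper. It assumes the usual weight theta_i = 1 - sqrt(sigma_e^2 / (T_i*sigma_u^2 + sigma_e^2)), built from the variance components that -xtreg, re- leaves in e(); all new variable names (black, other, touse, Ti, theta, m_*, q_*) are made up for illustration.

            ```stata
            webuse nlswork, clear
            xtset idcode

            * Explicit dummies stand in for i.race (2 = black, 3 = other).
            generate byte black = race == 2
            generate byte other = race == 3

            * Step 1: fit the RE model once to obtain the variance components.
            * Rows with a missing cluster variable are excluded up front so that
            * both regressions use the same sample.
            xtreg ln_wage grade black other if !missing(age), re
            scalar s_u = e(sigma_u)
            scalar s_e = e(sigma_e)
            generate byte touse = e(sample)

            * Step 2: panel-specific weight theta_i (panels are unbalanced).
            bysort idcode: egen Ti = total(touse)
            generate double theta = 1 - sqrt(scalar(s_e)^2 / (Ti*scalar(s_u)^2 + scalar(s_e)^2))

            * Step 3: quasi-demean every variable, and the constant as well.
            foreach v of varlist ln_wage grade black other {
                bysort idcode: egen double m_`v' = mean(cond(touse, `v', .))
                generate double q_`v' = `v' - theta*m_`v'
            }
            generate double q_cons = 1 - theta

            * Step 4: OLS on the transformed data, clustering on any variable.
            regress q_ln_wage q_grade q_black q_other q_cons if touse, noconstant vce(cluster age)
            ```

            The coefficients should match those from -xtreg, re-, with q_cons playing the role of _cons. The clustered standard errors may differ somewhat from what Stata reports, depending on the degrees-of-freedom adjustments applied.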






            • #7
              paulvonhippel: I haven't used -mixed- before. But for -xtreg, re-, you can try:

              xtreg absent ib(2).classtype ib("k").gradenum i.schid, re cluster(tchid) nonest



              • #8
                This totally works. But how do you know of this secret option, and why have I been excluded from the memo where this option was communicated ;-) ?

                Code:
                . webuse nlswork, clear
                (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
                
                .       xtset idcode
                       panel variable:  idcode (unbalanced)
                
                . xtreg ln_w grade i.race , re cluster(age) nonest
                
                Random-effects GLS regression                   Number of obs     =     28,508
                Group variable: idcode                          Number of groups  =      4,709
                
                R-sq:                                           Obs per group:
                     within  = 0.0000                                         min =          1
                     between = 0.3170                                         avg =        6.1
                     overall = 0.1970                                         max =         15
                
                                                                Wald chi2(3)      =     377.47
                corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                
                                                   (Std. Err. adjusted for 33 clusters in age)
                ------------------------------------------------------------------------------
                             |               Robust
                     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                       grade |   .0909144   .0062341    14.58   0.000     .0786958    .1031331
                             |
                        race |
                      black  |  -.0456496   .0122942    -3.71   0.000    -.0697458   -.0215535
                      other  |   .1041432   .0372747     2.79   0.005     .0310862    .1772002
                             |
                       _cons |   .5147636   .0939851     5.48   0.000     .3305561    .6989711
                -------------+----------------------------------------------------------------
                     sigma_u |  .30393641
                     sigma_e |  .32028665
                         rho |    .473825   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------




                • #9
                  Fabulous Hong Il Yoo , thanks! It took me a minute to realize nonest stood for "no nest" rather than "non est".

                  Is the nonest option documented? If not, why not?



                  • #10
                    Don't kill the messenger! I don't think that Hong Il Yoo is from Stata. The right question is to ask how he discovered it.



                    • #11
                      LOL. You guys have not overlooked anything; it's one of the undocumented options in Stata. I don't remember exactly how and when I came to know about it. At some stage between August 2007 and now, I saw it in one of the threads on this forum!



                      • #12
                        Here's a wrinkle, though: It appears that the nonest option does not work with the mle option. So this doesn't work:
                        Code:
                        xtreg absent ib(2).classtype ib("k").gradenum i.schid, vce(cluster tchid) nonest mle
                        Any notion why, or what might work instead (besides the bootstrap)?



                        • #13
                          We might not be so clever in using the -nonest- option. This is what the manual for -xtreg, re- says: "The panel variable must be nested within the cluster variable because of the within-panel correlation that is generally induced by the random-effects transform when there is heteroskedasticity or within-panel serial correlation in the idiosyncratic errors."

                          Also, it is not that -xtreg, mle- fails to work with the nonest option specifically; the mle estimator does not allow robust or clustered standard errors at all.

                          There is no practical problem here, because -xtreg, re- and -xtreg, mle- give practically the same estimates. If you still insist on mle, you can use -_robust- to robustify the variance. However, this is nontrivial; you would need to look at the formulas.



                          • #14
                            Documentation aside, there are studies where you want to cluster on a variable that does not nest the random-effects groups. StataCorp was thinking about a particular data structure when they wrote the documentation, but the right way to cluster depends on the design of the study. In the study I'm working with, it's clear that having student random effects and clustering by teacher is desirable. So I was delighted to see the nonest option.

                            I'm not sure if -xtreg, re- and -xtreg, re mle- give similar estimates in all data. They do in the nlswork data that you've used as an example, but they might give pretty different estimates when there are a lot of missing values.
                            Last edited by paulvonhippel; 02 Aug 2020, 08:47.



                            • #15
                              From the Statalist archives ( https://www.stata.com/statalist/arch.../msg00335.html ):
                              From -help whatsnew9-: --update 15sep2005-- (30)B. xtreg, fe and xtreg, re produced cluster-robust VCEs when the panels were not nested within the clusters. In some cases this VCE is consistent, and in others it is not. You must now specify the new nonest option to get a cluster-robust VCE when the panels are not nested within the clusters.
