  • Mixed effects (multilevel) model vs. cluster command

    Hi all
    I want to test the association between childhood behaviour at age 6 years and earnings at age 36 (n=1000). The behavioural assessments were obtained from teachers when the children were aged 6. I want to control for clustering in the behavioural assessments at the school and classroom levels, although I’m not testing predictors at those levels. From what I understand, the mixed model is better, although I would only report the fixed-effects estimates, not the random effects, which seems to amount to a regular regression (with adjusted SE estimates). What are the pros and cons of using a mixed-effects model vs. the cluster command? The advantages of the cluster command are its simplicity and that I can still report standardised betas. Any suggestions welcome. See the examples below; they produce similar results.
    Many thanks

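    * Random intercepts for schools and for classes nested within schools
    mixed OUTCOME behav1 behav2 /* other predictors */ || school: || class:, vce(robust)

    * OLS with cluster-robust SEs, one cluster per school-class combination
    * (-robust- is dropped: it cannot be combined with vce(cluster ...))
    egen double_cluster = group(school class)
    regress OUTCOME behav1 behav2 /* other predictors */, vce(cluster double_cluster)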

  • #2
    What you are calling "the cluster command" is not that. It is simply the use of cluster robust standard errors with -regress-. The distinction is important because Stata does, in fact, have a -cluster- command and what it does is unrelated to the problem you are working with.

    I would strongly prefer the use of the -mixed- model here. Yes, it is, in a sense, a regular regression with adjustments made to the standard errors, but the adjustments are better than those provided by -vce(cluster ...)- when you really have hierarchical data. The -regress- approach, even with -vce(cluster ...)-, does not adjust for potential confounding due to systematic differences among classes or schools. The -mixed- model does so.

    The only circumstance where I would take -regress- over -mixed- is if the intraclass correlations at the school and class levels are very close to zero. In that case, -mixed- is telling you that there isn't really any systematic effect of class or school on the outcome (at least conditional on behav* etc.), so -regress- would be fine, and the results would be essentially indistinguishable.
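    One way to look at those intraclass correlations is -estat icc- after -mixed-. The sketch below runs on simulated data; the variable names (outcome, behav1, school, class, u_school, u_class) and the effect sizes are hypothetical stand-ins, not the original poster's data. The -mixed- output header also reports a likelihood-ratio test against the ordinary linear model, which speaks to the same question.

```stata
* Hypothetical simulated data: 20 schools x 3 classes x 25 children
clear
set seed 12345
set obs 20
generate school = _n
generate u_school = rnormal(0, 0.6)      // school-level random effect
expand 3
bysort school: generate class = _n       // classes numbered within school
generate u_class = rnormal(0, 0.4)       // class-level random effect
expand 25
generate behav1 = rnormal()
generate outcome = 1 + 0.5*behav1 + u_school + u_class + rnormal()

* Random intercepts for schools and for classes nested in schools
mixed outcome behav1 || school: || class:

* Intraclass correlations at the school and class-within-school levels
estat icc
```

    If both ICCs come out near zero, -regress- and -mixed- should give essentially the same fixed-effect estimates.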

    As for standardized betas, even assuming that this is one of those unusual situations where using them with -regress- would actually make sense (which I question), they make no sense at all with hierarchical data. It isn't even clear what standardization means in the context of hierarchical data. What standard deviation should be used: that of the overall estimation sample? that within-class, calculated separately for each class? the pooled within-class one? that within-school, calculated separately for each school? the pooled within-school one? How would you explain or justify whichever choice you made? How would anybody go about using or interpreting the results obtained with any of these choices?



    • #3
      In addition to what Clyde said, I have some minor points.

      1) The sandwich estimator of the variance requested with the -vce(cluster clustervar)- option is robust to violations of independence caused by clustering. The -vce(robust)- option is similar but not the same: it is robust to heteroskedasticity but still assumes independent observations. I believe the cluster-robust estimator relies on having a large number of clusters to achieve its goals (I hope someone will correct me if this is wrong). If you have few clusters, it won't work as well, and it might be better to model the clustering explicitly.

      2) The original post alludes to two levels of clustering - classes, which are nested in schools. That is a situation where you'd default to -mixed-.

      3) With -mixed-, you can explicitly estimate how much of the variance is attributable to between-cluster versus within-cluster variation. Often, this is of substantive interest.

      4) Another option to be aware of is -xtreg, fe-, which uses fixed effects for the clusters. However, it only handles one level of clustering. Economists tend to prefer fixed effects, arguing that they provide unbiased estimates of the coefficients. Other disciplines are not as concerned about this. I mention it for completeness. Seeing as you have two levels of clustering, this won't be a perfect fit for your purposes.

      5) The correct term is the -vce(cluster ...)- option. There is a separate set of cluster analysis commands, which do something very different.
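      Points 1 and 4 can be sketched in Stata as follows, on simulated data with a single level of clustering (school) so that -xtreg, fe- applies. The variable names and parameters are hypothetical. -estimates table ..., se- puts the heteroskedasticity-robust and cluster-robust results side by side; the coefficients are identical and only the standard errors differ.

```stata
* Hypothetical simulated data: 40 schools x 30 children
clear
set seed 2024
set obs 40
generate school = _n
generate u = rnormal(0, 0.7)             // school-level effect
expand 30
generate behav1 = rnormal()
generate outcome = 1 + 0.4*behav1 + u + rnormal()

* Point 1: heteroskedasticity-robust vs cluster-robust standard errors
regress outcome behav1, vce(robust)
estimates store het_robust
regress outcome behav1, vce(cluster school)
estimates store clus_robust
estimates table het_robust clus_robust, se

* Point 4: school fixed effects (one level of clustering only)
xtset school
xtreg outcome behav1, fe vce(cluster school)
```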



      • #4
        Thank you for these clear and detailed responses.

        One reason for my question is that I want to apply the above model to a categorical outcome with 3 levels (i.e. multinomial logistic regression/mlogit), but from what I’ve read, Stata doesn’t have a dedicated command for this and it can only be done using the gsem command. Also, to further complicate things, I need to do this within a multiple imputations framework, which I’ve read does not work for sem in Stata.

        The model I want to run is this: mi estimate: ?command 3_level_outcome pred1 pred2 etc || school: || class:

        What options do I have? Are there alternatives to gsem for this kind of problem?



        • #5
          Originally posted by Clyde Schechter (#2)
          Hi Clyde,

          Thank you for the detailed post. When you go with a -mixed- model, do you additionally recommend using vce(robust)? And does it make sense to additionally cluster the standard errors when using a mixed model? Or is that redundant given the explicit multi-level modeling in -mixed-? Thanks.

          Jason



          • #6
            The use of random intercepts in the model deals with non-independence of observations due to the nested structure, but it does not deal with model misspecification or heteroskedasticity. So if you have no worries about the latter two, there is no need for cluster-robust standard errors. But if you have concerns about those issues, then you should still use vce(cluster whatever). (In epidemiologic work, we often are not worried about this; in finance, however, those problems are almost always thought to be present.)
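            A minimal comparison of model-based and cluster-robust standard errors from the same random-intercept model. The data are simulated, with an error variance that deliberately grows with |x|, i.e. heteroskedasticity that the random intercept does not absorb; the variable names (y, x, school) and parameters are hypothetical.

```stata
* Hypothetical simulated data: 50 schools x 20 students
clear
set seed 99
set obs 50
generate school = _n
generate u = rnormal(0, 0.5)             // school random intercept
expand 20
generate x = rnormal()
* Error SD depends on x: heteroskedasticity not handled by the random intercept
generate y = 2 + 0.3*x + u + rnormal(0, 1 + abs(x))

* Model-based standard errors
mixed y x || school:
estimates store model_based

* Same model, cluster-robust standard errors
mixed y x || school:, vce(cluster school)
estimates store clus_robust

estimates table model_based clus_robust, se
```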



            • #7
              Originally posted by Clyde Schechter (#6)
              Thank you for the response. I'm working with a mixed model where the dependent variable is binary (student goes to college or not), the two key independent variables are the number of AP courses completed and standardized test scores, and each of 500 high schools is a cluster. It sounds like the random intercept (at the school level) will take care of the non-independence of students within a school; I also have a random slope on each independent variable. But since, for example, the probability of attending college can't truly be linear in test score, it also sounds like vce(cluster school) is appropriate here to help with model misspecification. Does that sound right?
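              The model described above might be written along these lines. This is a sketch on simulated data: college, ap_courses, test_score and school are hypothetical names, the outcome is generated from a logistic model (so the linear probability model is deliberately misspecified), and the number of schools is cut from 500 to 100 to keep it quick. The random slopes use -mixed-'s default independent covariance structure.

```stata
* Hypothetical simulated data: 100 schools x 40 students
clear
set seed 7
set obs 100
generate school = _n
generate u = rnormal(0, 0.5)             // school intercept (logit scale)
generate v_ap = rnormal(0, 0.1)          // school deviation in AP-course slope
generate v_test = rnormal(0, 0.2)        // school deviation in test-score slope
expand 40
generate ap_courses = rpoisson(2)
generate test_score = rnormal()
generate college = runiform() < ///
    invlogit(-0.5 + (0.3 + v_ap)*ap_courses + (0.8 + v_test)*test_score + u)

* Linear probability model: random intercept and random slopes by school,
* with cluster-robust SEs to guard against misspecification
mixed college ap_courses test_score || school: ap_courses test_score, ///
    vce(cluster school)
```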



              • #8
                Yes, I would agree. Moreover, with a linear probability model for a dichotomous outcome, heteroscedasticity is virtually guaranteed.
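                  The reason is a one-line calculation. For a 0/1 outcome y with P(y = 1 | x) = p(x), a linear probability model sets p(x) = x'β, so

```latex
\operatorname{Var}(y \mid x) = p(x)\,\bigl[1 - p(x)\bigr] = x'\beta\,\bigl(1 - x'\beta\bigr)
```

                  The error variance therefore changes with x whenever any slope in β is nonzero: it is largest (0.25) where p(x) = 0.5 and shrinks toward 0 as p(x) approaches 0 or 1.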



                • #9
                  Thanks again, Clyde. So, is there ever a situation in which you wouldn’t want to use vce(cluster whatever), regardless of whether you are using reg, mixed, melogit, etc.?



                  • #10
                    In situations where you are satisfied that heteroscedasticity and model misspecification are minimal or non-existent, and where observations are independent or any dependence is fully accounted for by random intercepts, there is no need for vce(cluster). Also, bear in mind that cluster-robust standard errors are only asymptotically correct: when the number of clusters is small, they can actually be worse than the unclustered standard errors. There is no consensus about how small is small, but I myself would never use vce(cluster) with fewer than 15 clusters, and I might even require a larger number in some circumstances.
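                    A quick way to see how many clusters a -vce(cluster)- estimate rests on is the stored result e(N_clust). The sketch below uses simulated data with a deliberately small number of hypothetical schools and applies the 15-cluster rule of thumb from this post; that threshold is a personal choice, not a Stata default, and the variable names are hypothetical.

```stata
* Hypothetical simulated data: only 12 schools x 50 students
clear
set seed 31
set obs 12
generate school = _n
generate u = rnormal(0, 0.5)
expand 50
generate x = rnormal()
generate y = 1 + 0.2*x + u + rnormal()

regress y x, vce(cluster school)
local nclust = e(N_clust)                // number of clusters used
display "Number of clusters: `nclust'"
if `nclust' < 15 {
    display as error "Only `nclust' clusters: cluster-robust SEs may be unreliable"
}

* Conventional (unclustered) standard errors for comparison
regress y x
```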
