
  • Choice between multilevel model and clustered standard errors

    Dear All,
    I have sample data measuring food intake at the individual level. The data were collected in stages: divisions at the first stage, followed by districts, villages, households and individuals.
    Since the data are hierarchical, a multilevel model (Stata's mixed command) is one possible choice.
    I have read posts on the “Choice between multilevel model and clustered standard errors”, including those by Clyde Schechter.

    However, I am still not clear about the following points:

    What factors would justify choosing a multilevel model over clustered standard errors?
    Given that the data have several levels, isn't it the case that clustering standard errors would not account for all of them?
    Can/should clustering be included in the multilevel model?

  • #2
    There are different issues to consider. The first is how the data were sampled. Was it truly multi-stage sampling, or are you assigning households to villages ex post? Probably the former, in which case the number of divisions plays a key role. The second concerns the exogeneity of your key explanatory variables, and at what level they vary. If the key variables vary at the individual level, then you have to decide whether to use fixed effects estimation at the household level, the village level, and so on. Such analyses are usually more convincing for determining causality.

    If you just use a multilevel model without including group-level averages, then you are making strong exogeneity assumptions; if you include them, you are basically doing fixed effects at the level of that group. If you set aside endogeneity concerns about your x, then the multilevel analysis is typically more efficient than the pooled analysis with clustering. In any case, to put them on an equal footing, you should cluster your standard errors at the same level for the multilevel and pooled analyses. Stata now allows clustering with mixed estimation. If you provide more information about the sampling scheme, group sizes and the nature of the explanatory variables, then I can probably say more.
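To make the point about group-level averages concrete, here is a minimal Python sketch using only the standard library and made-up data (the household ids, the `x` and `y` values, and the helper names `group_means`, `ols` and `within_slope` are all hypothetical). It does not fit a multilevel model. It shows the pooled OLS version of the idea, often called the Mundlak device or correlated random effects: once the household mean of x is added as a regressor, the coefficient on x equals the within (fixed effects) slope.

```python
"""Sketch: adding group means of x (Mundlak device) reproduces the within/FE slope.

Hypothetical toy data; the household ids and x, y values are invented for illustration.
"""
from collections import defaultdict


def group_means(values, groups):
    """For each observation, return the mean of `values` within its group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [sums[g] / counts[g] for g in groups]


def solve(a, b):
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def ols(columns, y):
    """Pooled OLS via the normal equations X'X beta = X'y (regressors given as columns)."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


def within_slope(x, y, groups):
    """Fixed effects (within) slope: regress group-demeaned y on group-demeaned x."""
    xd = [xi - mi for xi, mi in zip(x, group_means(x, groups))]
    yd = [yi - mi for yi, mi in zip(y, group_means(y, groups))]
    return sum(p * q for p, q in zip(xd, yd)) / sum(p * p for p in xd)


# Hypothetical data: individuals nested in three households.
hh = [1, 1, 1, 2, 2, 3, 3, 3, 3]
x = [1.0, 2.0, 4.0, 3.0, 5.0, 2.0, 6.0, 7.0, 9.0]
y = [2.0, 3.5, 6.0, 7.0, 9.5, 1.0, 4.0, 6.5, 8.0]

xbar = group_means(x, hh)
const = [1.0] * len(x)
b_const, b_x, b_xbar = ols([const, x, xbar], y)

print("slope on x with household means:", round(b_x, 6))
print("within (FE) slope:              ", round(within_slope(x, y, hh), 6))
```

The equality is algebraic: x splits into its household mean plus the deviation from that mean, and the deviation is orthogonal to the constant and to any household-level variable, so its coefficient is exactly the within slope (up to floating-point error).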

    If the data come from a multi-stage sampling scheme, then you'll have to cluster at the highest level of cluster sampling. This is true even if you use a multilevel model, because there is always the possibility that the model misses some heterogeneity. Was stratification used first, followed by cluster sampling?
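On the question of whether clustering covers all the levels: a cluster-robust variance allows arbitrary correlation among all observations in the same cluster, so clustering at a higher level also absorbs correlation at the levels nested inside it. The Python sketch below (standard library only) illustrates this for the simplest case, the standard error of a sample mean. The village, household and individual ids and the intake values are invented, and the G/(G-1) factor is one common small-sample adjustment, assumed here rather than taken from any particular Stata command.

```python
"""Sketch: cluster-robust standard error of a sample mean at different levels.

Hypothetical data: made-up intakes for individuals nested in households nested
in villages; they only illustrate the mechanics.
"""
import math
from collections import defaultdict


def cluster_se_of_mean(y, clusters):
    """Cluster-robust SE of mean(y), using an assumed G/(G-1) small-sample factor."""
    n = len(y)
    ybar = sum(y) / n
    score = defaultdict(float)  # sum of residuals within each cluster
    for yi, c in zip(y, clusters):
        score[c] += yi - ybar
    g = len(score)
    var = (g / (g - 1)) * sum(s * s for s in score.values()) / n**2
    return math.sqrt(var)


# Hypothetical nested ids: village -> household -> individual.
village = ["A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"]
household = ["A1", "A1", "A2", "A2", "B1", "B1", "B2", "B2", "C1", "C1", "C2", "C2"]
person = list(range(len(village)))  # each individual is its own "cluster"
intake = [110, 120, 105, 115, 60, 70, 65, 75, 90, 95, 85, 100]

for label, ids in [("individual", person), ("household", household), ("village", village)]:
    print(f"{label:>10}: SE = {cluster_se_of_mean(intake, ids):.3f}")
```

With these toy numbers the standard error grows as the clustering level moves up from individual to household to village, because the residuals are positively correlated within each household and village. Cluster-robust standard errors can be unreliable with few clusters, and three villages is far too few for real inference.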


    • #3
      Dear Jeff Wooldridge, thank you for your response.

      Data - The data were collected following a stratified sampling design. There are 7 divisions, 81 districts, 325 villages and 5503 households. There are three rounds of data: 2011-12, 2015-16 and 2018-19. The same households were sampled in all three rounds, so it is a panel. The data record food consumption at the intra-household level and the characteristics of the individual family members, as well as some household-level characteristics.

      Variables - My dependent variable is food intake in grams per person per day, so it is measured at the individual level. My explanatory variables consist of individual-level characteristics (such as age, a gender dummy and an employment dummy) and household-level characteristics (a remittance dummy and covariate and idiosyncratic income shock dummies).

      Objective - My objective is to identify the determinants of intra-household consumption of animal-source foods (such as meat and fish). In addition, I want to see how much of the variation in food consumption occurs within households.
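For the second objective (how much of the variation lies within households), a first descriptive look is to split the total sum of squares of intake into within-household and between-household parts. This Python sketch uses invented data and a hypothetical helper `within_between_shares`; it ignores villages, survey rounds and sampling weights, and it is not the variance-component estimate a multilevel model would produce.

```python
"""Sketch: share of variation in intake within vs between households.

Descriptive sum-of-squares decomposition on hypothetical data; not the
variance-component estimate a multilevel model (e.g. Stata's mixed) would give.
"""
from collections import defaultdict


def within_between_shares(y, groups):
    """Split the total sum of squares of y into within-group and between-group shares."""
    n = len(y)
    grand = sum(y) / n
    sums, counts = defaultdict(float), defaultdict(int)
    for yi, g in zip(y, groups):
        sums[g] += yi
        counts[g] += 1
    means = {g: sums[g] / counts[g] for g in sums}
    total = sum((yi - grand) ** 2 for yi in y)
    within = sum((yi - means[g]) ** 2 for yi, g in zip(y, groups))
    between = sum(counts[g] * (means[g] - grand) ** 2 for g in means)
    return within / total, between / total


# Hypothetical grams/day of animal-source food for members of three households.
household = [1, 1, 1, 2, 2, 2, 2, 3, 3]
asf_grams = [80.0, 40.0, 30.0, 60.0, 55.0, 20.0, 25.0, 100.0, 70.0]

within_share, between_share = within_between_shares(asf_grams, household)
print(f"within households:  {within_share:.1%}")
print(f"between households: {between_share:.1%}")
```

The two shares always sum to one. After fitting a random-intercept model with Stata's mixed, `estat icc` reports intraclass correlations, which serve as a model-based analogue of the between-household share.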