The best model for a multilevel dataset with a small number of clusters

Maria Snow

Join Date: Sep 2019

Posts: 9
#1

07 Oct 2019, 18:59

Dear all,

I have three-level cross-country survey data that is not a panel. The levels are country, survey, and individual respondent, with ten countries and six years.
My dependent variable is binary. I am interested in the interaction between two variables, one at the individual level and the other at the country level.

Given the relatively small number of clusters, which model would be the best choice?

1. A multilevel model with random effects for the two upper levels, country and year. Perhaps there is a correction for a small number of clusters that I could include (something like the Kenward-Roger correction; I was unable to find a version of it for non-linear models)?

meprobit y x1##x2 i.gender age [pweight = dweight] || country: || election:

2. A fixed effects model with clustered errors. If so, should the errors be clustered at the country level only?

probit y x1##x2 i.gender age i.year i.country [pweight = dweight], cluster(country)

3. A fixed effects model without clustered errors, since with only 10 countries the appropriateness of using cluster(country) is questionable as well.

probit y x1##x2 i.gender age i.year i.country [pweight = dweight], robust

I am really looking forward to any advice and literature suggestions.

Maria
Tags: fixed effects, interaction, multilevel
Erik Ruzek

Join Date: Oct 2017

Posts: 430
#2

08 Oct 2019, 15:26

Hi Maria,

You probably want to start with this article by Dan McNeish, in which he runs simulations to determine the best course of action when you have a small number of clusters and a binary outcome variable. See also the appendix, which has code for running the models in Stata, R, SAS, and Mplus. The bad news is that the approach that performs best with few clusters and a larger number of observations per cluster is not available in Stata. You can certainly run the other approaches in Stata, just not the one that comes out best in the simulations. You might also consider a Bayesian (MCMC) approach; McNeish has also done work on this for small samples.

If Stata is your only option, then you are probably best off using the fixed effects approach. The clustered standard error correction generally works best when you have more clusters, so if I were you, I'd probably just use dummy variables for country (your third option).
Maria Snow

Join Date: Sep 2019

Posts: 9
#3

09 Oct 2019, 10:31

Dear Erik, many thanks for your response. It's a bummer. However, I have certainly seen the multilevel approach applied to as few as 14 country clusters in articles published in reputable journals.
On the upside, it looks like the Bayesian approach is available in Stata (the bayes prefix), so I can probably run it without switching to another program?
Erik Ruzek

Join Date: Oct 2017

Posts: 430
#4

12 Oct 2019, 12:12

Yes, you can apply multilevel modeling with smaller numbers of clusters, but McNeish's work suggests you have to specify the model carefully, using the particular estimation approaches his simulation results support. For linear multilevel models with continuous outcomes, Stata has all the estimation options you need when working with small samples, but it does not have them for generalized MLMs. Moving to a Bayesian paradigm could make sense as long as you understand what the model is doing and whether the priors for the variances are good choices for small numbers of clusters.