clustering at two levels for multinomial logistic regression

Sarah Shah

Join Date: Sep 2018

Posts: 3
#1

22 Sep 2018, 06:48

Good morning,

This is my first time posting, and I’m hoping someone can recommend a solution for the issue below. Briefly, I am modelling a categorical outcome with mlogit, but my data are a panel of respondents nested within households. I need to account for both attrition and household effects, and was attempting to run:

mlogit depvar indepvar control1 control2 control3 [pweight = panelweight], vce(cluster household panelvar)

However, Stata will not allow two cluster variables in vce(). I’ve also tried:

svyset panelvar [pweight=panelweight]

svy: mlogit depvar indepvar control1 control2 control3, vce(cluster household)

However, Stata will not allow the vce() option with the svy prefix.

How can I use vce to cluster at two levels? Is there an alternative command I should use instead?

Many thanks,
Sarah

Paolo Velasquez

Join Date: Apr 2016
Posts: 36
#2

22 Sep 2018, 08:17

Code:

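* locate the user-written vce2way package
search vce2way

* install it (assuming the SSC archive; net install alone needs a from() location)
ssc install vce2way

* two-way clustering on household and panelvar
vce2way mlogit depvar indepvar control1 control2 control3 [pweight = panelweight], cluster(household panelvar)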

This might work.

Best,


Steve Samuels

Join Date: Mar 2014

Posts: 1786
#3

22 Sep 2018, 11:41

Welcome to Statalist, Sarah! Please describe the sampling design, linking to the survey documentation if possible. Before responding again, please read the FAQ, particularly FAQ 12, which asks that you post all code and results between [CODE] and [/CODE] delimiters. This is just what Paolo did to produce his code excerpt.

Steve Samuels
Statistical Consulting

Stata 14.2
Sarah Shah

Join Date: Sep 2018

Posts: 3
#4

24 Sep 2018, 12:03

Thanks, Paolo and Steve, for your generous responses.

The sampling design is described more fully here: http://www.erfdataportal.com/index.p...og/45/sampling. Briefly, the 1998 ELMS was carried out on a nationally representative sample of households and collected information about all residents of the sampled households. This 1998 sample was a two-stage stratified random sample selected from a master sample, with urban areas over-sampled. In 1998, the primary sampling units (PSUs), from which the households were sampled, were selected according to the probability proportional to size (PPS) method. There were two additional waves, in 2006 and 2012, and these are the two waves I am using. While attrition between 1998 and 2006 was random, attrition between 2006 and 2012 was not. Thus, I need to account for both household clustering and attrition while weighting the sample.

The code Paolo shared was very helpful; however, vce2way will not allow me to include a weight:

Code:

. vce2way mlogit depvar indepvar control1 control2 control3 [pweight = panelweight], cluster(household panelvar)
weights not allowed
r(101);

Any thoughts?

Thanks again,
Sarah
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#5

25 Sep 2018, 16:29

Thanks for the link, Sarah. It led to this document, which contains the following in section 1.3:

The public use micro data for the 1998 and 2006 surveys, as well as a harmonized dataset including the 1988 special LFS, are available from ERF at http://www.erfdataportal.com. The public use micro data and documentation for the 2012 round will become available on November 1, 2013 from ERF, also at http://www.erfdataportal.com. The public use micro data will include the full 2012 cross-section, as well as [1] harmonized pooled cross-section dataset covering 1988–2012, and [2] a panel for all the individuals included in the 1998–2012 rounds. Documentation on the creation and definition of variables will also be made available. Additionally, with the release of the 2012 data, we will release the STATA do files used to generate and harmonize the 2012 datasets; interested users will be able to recreate all variables and weights from the raw data in 2012.

You don't say whether you are using the longitudinal data set or the pooled cross-section data set (or both). However, my advice is the same. Primary sampling units (PSUs) for the survey were geographical areas. Each of the data sets should contain the PSU variable, a STRATUM variable, and the appropriate WEIGHT, whether longitudinal or cross-sectional. With survey data, standard errors are based primarily on between-PSU variation. That takes into account all variation within a PSU, including household-to-household variation and person-to-person variation within household. So you don't need a vce(cluster household) or vce(cluster panelvar) option anywhere; the svyset statement carries the design. For either data set:

Code:

svyset PSU [pw = WEIGHT], strata(STRATUM)

substituting the appropriate PSU, stratum, and weight variable names. The mlogit command will be:

Code:

svy: mlogit depvar indepvar control1 control2 control3

However, if your research question requires a multilevel model, with household and individual stages and random effects for each, the code gets much more complicated. So let us know.
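For readers following along, the design-based workflow above can be sketched end to end. This is only a sketch: PSU, STRATUM, and WEIGHT are placeholders for whatever the design variables are called in the file being used, and the rrr option is added here purely for reporting.

```stata
* Sketch only: PSU, STRATUM, and WEIGHT are placeholder names for the
* design variables in the data set actually in use.
svyset PSU [pw = WEIGHT], strata(STRATUM)

* Summarize strata and PSU counts to check that the design was declared as intended
svydescribe

* Design-based multinomial logit; rrr reports relative-risk ratios instead of coefficients
svy: mlogit depvar indepvar control1 control2 control3, rrr
```

Running svydescribe before estimation is a cheap way to spot strata with a single PSU, which would otherwise leave the standard errors missing.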

Steve Samuels
Statistical Consulting

Stata 14.2
Lara Ingram

Join Date: Aug 2019

Posts: 36
#6

28 Jun 2020, 09:37

I happen to be using the same data set described by Sarah, and I was wondering what the code would look like if one were to use a multilevel mixed-effects model with a nominal outcome variable (same example of "mlogit depvar indepvar control1 control2").
Hong Il Yoo

Join Date: Jan 2015

Posts: 292
#7

28 Jun 2020, 10:28

Some time ago I generalised -vce2way- to -vcemway-, and fixed the bug that had caused the "weights not allowed" error along the way. Please click [here] for the background paper in the Stata Journal.
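As a rough illustration, the weighted call from earlier in the thread might look like this under -vcemway-. Two assumptions are made here: that the package is installed from SSC, and that it keeps the prefix-style syntax and cluster() option of -vce2way-. Check -help vcemway- after installing before relying on either.

```stata
* Assumption: vcemway is distributed through the SSC archive
ssc install vcemway

* Assumption: same prefix-style syntax as vce2way, with pweights now accepted
vcemway mlogit depvar indepvar control1 control2 control3 [pweight = panelweight], cluster(household panelvar)
```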
Larissa Zhu

Join Date: Aug 2023

Posts: 3
#8

03 Aug 2023, 04:02

Hi, I have a follow-up question. I am now using vce2way for regression and need to get Cohen's d to check the effect size. However, when I tried "estat esize", an error popped up saying "estat esize only works with vce(ols)". Could you please assist me with that?

Thank you very much.
Hong Il Yoo

Join Date: Jan 2015

Posts: 292
#9

04 Aug 2023, 04:44

Larissa Zhu: I'm afraid that there's not much I can do. The postestimation command -estat esize- only works with the default OLS standard errors (i.e., one that assumes homoskedasticity and zero serial correlation), and is not compatible with other types of standard errors including Stata's native -vce(robust)- and -vce(hc3)-, as well as more robust standard errors generated by my -vce2way- and -vcemway- commands. On a separate note, I recommend that you use -vcemway- instead of -vce2way-: -vcemway- can do everything that -vce2way- does and also incorporates additional features & bug fixes.