clustering in panel data when panels are not nested within clusters

Christina Chara

Join Date: Oct 2014

Posts: 59
#1

26 Dec 2014, 05:55

Hi,
I am running a random-effects probit model (xtprobit), and all my variables except one are at the micro level. My specification also includes a macro-level variable (unemployment by age group and sex). I therefore created a new variable at the intersection of age group and sex, and I would like to cluster on it. However, when I run my model

Code:

xtprobit depvar microindependentvariables unemploymentbyagegroupandsex, vce(cluster newvar)

I get the error that "panels are not nested within clusters".

Is there a way to cluster on this variable in a panel setup even if my panels are not nested within clusters? Are there any other options, given that my macro variable was merged into my data on more than one group characteristic (i.e. age and sex)?

Thanks a lot in advance
Tags: categorical, interaction, panel, panel data
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#2

26 Dec 2014, 07:26

Christina:
no, you cannot. The methodological reasons are explained in the -xtprobit- entry of the Stata 13.1 .pdf manual, under the heading -xtoprobit- and the robust VCE estimator (page 282):
"The panel variable must be nested within the cluster variable because of the within-panel correlation that is generally induced by the random-effects transform when there is heteroskedasticity or within-panel serial correlation in the idiosyncratic errors."
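A quick way to see the problem in your own data is to check whether any person changes cluster over time. The sketch below assumes a person identifier called id (a hypothetical name; newvar is the cluster variable from #1). Anyone who moves from one age group to the next during the survey belongs to more than one cluster, which is what the error message is complaining about.

Code:

* id is a hypothetical name for the panel (person) identifier
bysort id (newvar): gen byte moves = newvar[1] != newvar[_N]
egen byte firstobs = tag(id)
* number of persons who appear in more than one cluster
count if firstobs & moves

If the count is zero, panels are nested within clusters; any positive count means they are not.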

Kind regards,
Carlo
(Stata 19.0)
Christina Chara

Join Date: Oct 2014

Posts: 59
#3

26 Dec 2014, 08:20

Thank you very much, Carlo, for your swift response. So if I wanted to correct for the fact that observations sharing the same value of the unemployment (macro) variable will cluster together by age group and sex, I would not be able to do so, and my only option would be to cluster on the personal identifier?
Thanks again
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#4

26 Dec 2014, 08:43

Christina:
yes, your only option is to cluster the standard errors on the personal identifier.
If I am not mistaken, you have already reported to the list that the literature you selected on this topic did not help you that much.
However, at first glance, and knowing neither your research goals nor your data, I would be cautious with your so-called "macro level variable". I would consider entering each of its components (unemployment, age_group and sex) as a separate predictor and looking for interactions among these terms instead.
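As a rough sketch of that suggestion, and assuming hypothetical variable names id, year, unemployment, age_group and sex (with microindependentvariables standing in for the remaining predictors, as in #1), factor-variable notation would enter the components and all of their interactions in one go:

Code:

xtset id year
* ## adds main effects plus all two- and three-way interactions
xtprobit depvar microindependentvariables c.unemployment##i.age_group##i.sex, re vce(cluster id)
* joint test of the three-way interaction terms
testparm c.unemployment#i.age_group#i.sex

If age group and sex are already part of microindependentvariables, they should be dropped from that list so that they are not entered twice.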

Kind regards,
Carlo
(Stata 19.0)
Christina Chara

Join Date: Oct 2014

Posts: 59
#5

26 Dec 2014, 08:55

Thanks a lot, Carlo.
Yes, you are right. However, the reason I chose to break up the unemployment rate by age and sex was to get some variation in this variable and to avoid having the same value (i.e. the overall unemployment rate) for every person in my data set. I already have sex and age as predictors in my equation, but I wanted to correct for the fact that the unemployment variable, and hence the standard errors, are likely to form clusters around age group and sex. For example, people belonging to the same age group and of the same sex will have the same unemployment rate assigned to them in any given year of the survey.
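For reference, the kind of merge described here can be sketched as follows. The file name unemp_rates and the variable names agegroup, sex, year and unemp are assumptions for illustration, not the actual names in the data.

Code:

* attach one unemployment rate per age group, sex and year to each person-year
merge m:1 agegroup sex year using unemp_rates, keep(master match)
* cluster variable at the intersection of age group and sex
egen newvar = group(agegroup sex)
* confirm the rate is constant within each age group, sex and year cell
bysort agegroup sex year (unemp): assert unemp[1] == unemp[_N]

The final assert stops with an error if two people in the same cell and year carry different rates, which would point to a problem with the merge.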
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#6

26 Dec 2014, 10:18

Christina:
under the random-effects specification you should get an estimate of the coefficients even if the unemployment rate does not change. That said, it is not clear to me whether the unemployment rate does not vary across idunits or across years. I find it odd that the unemployment rate does not vary over time for different professional clusters or for different age groups, and I would be even more surprised if an interaction between age groups and professional clusters did not turn out to be significant. A three-way interaction of sex, age groups and professional clusters would probably also be worth trying, although it is not that easy to explain.
I would cluster your SEs on your idunit and take a look at the LR test at the foot of the results table after -xtprobit-, just to be sure that -xtprobit- makes more sense than -probit- in your instance.
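A minimal sketch of that comparison, assuming id and year as hypothetical names for the panel and time variables. As far as I know, the LR test of rho=0 is only shown when the model is fit with the default (non-robust) VCE, so the sketch fits the model once without vce(cluster) to read the test and then refits it with clustered standard errors.

Code:

xtset id year
* default VCE: the footer should report the LR test of rho=0
xtprobit depvar microindependentvariables unemploymentbyagegroupandsex, re
display "rho = " e(rho) "   LR chi2 = " e(chi2_c)
* same model with standard errors clustered on the person identifier
xtprobit depvar microindependentvariables unemploymentbyagegroupandsex, re vce(cluster id)

A rho close to zero, together with an LR test that does not reject rho=0, would suggest that pooled -probit- is adequate.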

Kind regards,
Carlo
(Stata 19.0)
Christina Chara

Join Date: Oct 2014

Posts: 59
#7

30 Dec 2014, 04:52

Thank you very much, Carlo. Your advice and suggestions are greatly appreciated.

The model I am using (my preferred specification) is a dynamic random-effects probit model with two corrections. The first is the Mundlak (1978) correction, applied by including on the right-hand side of the regression equation the individual means of each time-varying variable that is assumed to be correlated with the unobserved heterogeneity. The second is an initial conditions correction, where the independent variable at the first instance each respondent was observed is entered as an independent variable in the regression.

My regression includes individual-level variables such as demographics and job and work characteristics, the lag of my dependent variable, and a number of macro (economy-wide) independent variables such as the unemployment rate by age group and sex.
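To make the specification concrete, here is a minimal sketch of how the Mundlak means and an initial-condition term are often constructed. All names are hypothetical (id, year, and x1 for a time-varying regressor), and the sketch assumes the initial-condition term is the first observed value of the dependent variable, which is the common Wooldridge (2005) type of choice and may differ from what was actually done.

Code:

xtset id year
* Mundlak term: within-person mean of a time-varying regressor
bysort id: egen mean_x1 = mean(x1)
* initial condition: first observed value of depvar for each person (assumed)
bysort id (year): gen depvar0 = depvar[1]
* dynamic random-effects probit with standard errors clustered on the person
xtprobit depvar L.depvar x1 mean_x1 depvar0 unemploymentbyagegroupandsex, re vce(cluster id)

One mean_x1-type variable would be needed for each time-varying regressor that is assumed to be correlated with the unobserved heterogeneity.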

The reason I wanted to cluster at the intersection of age group and sex was to correct for the fact that one of my variables is aggregated at a higher level than the rest (at the country level rather than at the individual level). Clusters were therefore likely to form around these combinations, given that people within these groups have the same unemployment rates.

As for the LR at the bottom of the xtprobit output (when I cluster only on personal id), I get a very large figure (-13.48892), which I am unsure how to interpret.

Any help will be greatly appreciated.
