cross classified multilevel structure

Lenka Drazanova

Join Date: Apr 2019

Posts: 3
#1

30 Apr 2019, 03:40

Dear Stata Forum,

I am using mixed to estimate a cross-classified multilevel model with three levels. I have approximately 100,000 observations divided into 12 cohorts cross-classified with 8 periods, both nested in 9 countries. When I run the code:

Code:

mixed DV IV1 i.IV2 || _all: R.cohort || period: || country: , variance

------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           _all |          1    114,788  114,788.0    114,788
       essround |          8     13,509   14,348.5     14,887
        country |         72      1,164    1,594.3      2,289

it seems that Stata also cross-classifies country with period (8 periods * 9 countries = 72 groups), instead of simply estimating country as a third level above the second (cohort-period cross-classified) level.
What would be the right syntax so that Stata does not cross-classify country with period?

Thank you very much in advance for your answers!
Tags: cross-classified, mixed, multilevel regression, syntax
Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#2

30 Apr 2019, 04:56

I'm not exactly sure what you mean by cohort, but if as you say there are only 12 of them in total and they are all nested under a total of nine countries, then most countries have only a single cohort. Did I get that right? If so, then maybe you'd be better off forgetting about country and go with just a cross-classified random effects model with cohorts and waves (periods). With only 12 of one and eight of the other cross-classified, I'm guessing that it will be difficult enough to tease out their variances, without having to deal with singleton cases of country-cohort combinations.
Lenka Drazanova

Join Date: Apr 2019

Posts: 3
#3

30 Apr 2019, 05:15

Dear Joseph,

thank you very much for your reply.
I am sorry I was not clearer in my previous post: each country has 12 cohorts (so there are 108 country-cohorts in total). What I mean by cohort: each cohort is a 5-year span of birth years, covering people born from 1931 to 1990. Thus, the oldest cohort is people born in 1931-1935 and the youngest cohort is those born in 1986-1990. These cohorts are present in each country.

I also tried to run the model with country fixed effects (which I think goes towards your suggestion), with code looking something like:

Code:

mixed DV IV1 i.IV2 i.country || _all: R.cohort || period: , variance

but the model unfortunately does not converge. (Why would that be?)
William Peterson

Join Date: Dec 2018

Posts: 8
#4

30 Apr 2019, 13:49

When thinking about cross-classification, it can help to picture the data structure as a spreadsheet, organized with rows and columns. In the horizontal direction, you've got your observations sorted into cohorts which are grouped together by country, giving you a total of 108 country-cohort rows. In the vertical direction, those same observations are also sorted into 8 periods or columns. So all of your 100,000 observations are sorted into 864 boxes (108 country-cohorts x 8 periods).

Do you really need another level of nesting in the vertical direction? In the spreadsheet I've described, every observation is already assigned to a specific country by the row. Dividing each column up into 9 countries would only add a lot of empty boxes without changing the way things are sorted.

So I think you want something like this:

Code:

mixed DV IV1 IV2 || _all:R.period || country: || cohort:

Does that work?
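
The spreadsheet picture above can be checked against the data before fitting anything. This is only a rough sketch: it assumes the variables are literally named country, cohort, and period, and cc_id and box_id are hypothetical names made up for the check. If every country-cohort is observed in every period, the second count would be 108 x 8 = 864; a smaller number means some boxes are empty.

```stata
* Sketch only: assumes the variables are named country, cohort, and period.
* cc_id and box_id are hypothetical names introduced for this check.
egen long cc_id  = group(country cohort)          // one ID per country-cohort row
egen long box_id = group(country cohort period)   // one ID per occupied box

quietly summarize cc_id
display "country-cohort rows: " r(max)

quietly summarize box_id
display "occupied boxes:      " r(max)

* Cross-tabulate rows by columns to see which boxes are empty.
tabulate cc_id period
```

Because egen's group() numbers the combinations 1, 2, 3, ..., the maximum of each ID equals the number of distinct combinations present in the data.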

Lenka Drazanova

Join Date: Apr 2019

Posts: 3
#5

30 Apr 2019, 16:31

Dear William,

thank you very much for your answer. You describe very nicely how to think about the data. When I run the code as you suggest, I get:

Code:

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           _all |          1    114,788  114,788.0    114,788
        country |          9     11,333   12,754.2     15,120
         cohort |        108        412    1,062.9      1,642
-------------------------------------------------------------

I still have a few questions.
1.) When I run the code as you suggest, am I cross-classifying periods with countries and cohorts, while nesting cohorts within countries? Is my understanding correct? Why does Stata report the number of groups multiplied by the number of nesting units? For instance, if I had run a simple three-level model, it would report the number of groups nested within another level as the actual number of groups (i.e. 126 regions nested in 9 countries would be recognized as 126, not as 1,134)

Code:

*THIS IS JUST AN EXAMPLE TO MAKE MY POINT ABOUT NUMBER OF GROUPS
mixed DV || country: || region: , variance
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
        country |          9      4,932    6,312.4      8,701
         region |        126          1      450.9      1,965


In my first post, I made the mistake of putting country last, when actually the cross-classified period and cohort levels should be nested in country. Wouldn't the code:

Code:

mixed DV IV1 IV2 || country: || _all:R.cohort || period: , variance

which gives this output:

Code:

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
        country |          9     11,333   12,754.2     15,120
           _all |          9     11,333   12,754.2     15,120
       essround |         72      1,164    1,594.3      2,289
-------------------------------------------------------------

also make sense? Am I nesting one artificial cross-classified (period) level into country as well as 12 cohorts in country?

2.) To further complicate things:
A) One cohort is missing in one period
B) my IVs are actually at the country-cohort level and country-period level. What I mean is that IV1 has different values for cohort 1 in country A and cohort 1 in country B. Same goes for period.

My possibly overcomplicated solution to this would be to do something like this:

Code:

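* One identifier per country-cohort and per country-period combination.
* punct("_") keeps e.g. country 1 + cohort 12 distinct from country 11 + cohort 2.
egen countrycohort = concat(country cohort), punct("_")
egen countryperiod = concat(country period), punct("_")

* Combine the two identifiers (single "=", not "==")
egen cntrcohcntper = concat(countrycohort countryperiod), punct("_")

mixed DV IV1 IV2 || country: || cntrcohcntper: , variance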

Would that also be correct?


William Peterson

Join Date: Dec 2018

Posts: 8
#6

01 May 2019, 09:07

"I still have a few questions.
1.) When I run the code as you suggest, am I cross-classifying periods with countries and cohorts, while nesting cohorts within countries? Is my understanding correct? Why does Stata report the number of groups multiplied by the number of nesting units? For instance, if I had run a simple three-level model, it would report the number of groups nested within another level as the actual number of groups (i.e. 126 regions nested in 9 countries would be recognized as 126, not as 1,134)"

Without seeing your data, I can't say for sure, but my first guess would be that this is a numbering problem.

In your first output, you've probably numbered the cohorts consistently in each country using the numbers 1 through 12. Stata is correctly nesting those groups within the 9 countries, giving you a total of 108 country-cohorts which is (I think) what you want.

But in your second bit of output you've got 9 countries again and 126 regions. I'm guessing that you don't really mean 126 regions PER COUNTRY. You've probably got a total of 126 regions and you numbered them individually, giving each region a unique number. You would need to number the regions within each country (starting at 1) to get the "countries x regions = total groups" output.

Am I right about all that? If not, maybe you should post a sample from your dataset so we can understand the numbering system.
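
One way to check the numbering guess without posting data is to count distinct codes and distinct country-code pairs. This is a sketch under assumptions: it uses the variable names country and cohort from this thread (swap in region for the other example), and one_per_pair and one_per_code are hypothetical names. If the two counts are equal, each code belongs to exactly one country; if the second count is smaller, the same codes are reused across countries.

```stata
* Sketch only: assumes variables named country and cohort, as in the thread.
* one_per_pair and one_per_code are hypothetical names.
egen byte one_per_pair = tag(country cohort)   // flags one observation per country-cohort pair
egen byte one_per_code = tag(cohort)           // flags one observation per distinct cohort code

count if one_per_pair    // number of distinct country-cohort pairs
count if one_per_code    // number of distinct cohort codes
```

The group count in the output tables above appears to correspond to the first of these counts, the number of distinct pairs, which is why reused codes and unique codes can produce tables that look inconsistent at first glance.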