  • cross classified multilevel structure

    Dear Stata Forum,

    I am using mixed to estimate a cross-classified multilevel model with three levels. I have approximately 100,000 observations divided into 12 cohorts cross-classified with 8 periods, both nested in 9 countries. When I run the code:
    Code:
    mixed DV IV1 i.IV2 || _all: R.cohort || period: || country: , variance
    
    ------------------------------------------------------------
                    |     No. of       Observations per Group
     Group Variable |     Groups    Minimum    Average    Maximum
    ----------------+--------------------------------------------
               _all |          1    114,788  114,788.0    114,788
           essround |          8     13,509   14,348.5     14,887
            country |         72      1,164    1,594.3      2,289
    It seems Stata also cross-classifies the country and period variables (8 periods * 9 countries = 72 groups), instead of simply estimating country as a third level above the second (cohort-period cross-classified) level.
    What would be the right way to write the syntax so that Stata would not cross-classify country with period?

    Thank you very much in advance for your answers!

  • #2
    I'm not exactly sure what you mean by cohort, but if as you say there are only 12 of them in total and they are all nested under a total of nine countries, then most countries have only a single cohort. Did I get that right? If so, then maybe you'd be better off forgetting about country and go with just a cross-classified random effects model with cohorts and waves (periods). With only 12 of one and eight of the other cross-classified, I'm guessing that it will be difficult enough to tease out their variances, without having to deal with singleton cases of country-cohort combinations.



    • #3
      Dear Joseph,

      thank you very much for your reply.
      I am sorry I was not clearer in my previous post: each country has 12 cohorts (so there are 108 country-cohorts in total). By cohort I mean a five-year span of birth years between 1931 and 1990. Thus, the oldest cohort are people born in 1931-1935 and the youngest cohort are those born in 1986-1990. These cohorts are present in each country.

      I also tried to run the model with country fixed effects (which I think goes towards your suggestion), with code looking something like:
      Code:
        
       mixed DV IV1 i.IV2 i.country || _all: R.cohort || period: , variance
      but the model unfortunately does not converge. (Why would that be?)



      • #4
        When thinking about cross-classification, it can help to picture the data structure as a spreadsheet, organized with rows and columns. In the horizontal direction, you've got your observations sorted into cohorts which are grouped together by country, giving you a total of 108 country-cohort rows. In the vertical direction, those same observations are also sorted into 8 periods or columns. So all of your 100,000 observations are sorted into 864 boxes (108 country-cohorts x 8 periods).

        Do you really need another level of nesting in the vertical direction? In the spreadsheet I've described, every observation is already assigned to a specific country by the row. Dividing each column up into 9 countries would only add a lot of empty boxes without changing the way things are sorted.
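        To check the spreadsheet picture against the data, a minimal sketch follows. It assumes numeric identifier variables named country, cohort, and period (placeholder names, as in the commands in this thread); the new variables cc, cell, tag_cc, and tag_cell are made up for illustration.

        ```stata
        * Assumption: country, cohort, and period are the identifier variables.
        * One id per country-cohort "row" of the spreadsheet.
        egen long cc = group(country cohort)

        * One id per non-empty "box" (country-cohort row by period column).
        egen long cell = group(cc period)

        * Flag exactly one observation per row and per box, then count them.
        egen byte tag_cc   = tag(cc)
        egen byte tag_cell = tag(cell)
        count if tag_cc      // number of country-cohort rows
        count if tag_cell    // number of non-empty boxes
        ```

        If every country-cohort appears in every period, the second count equals the first count times the number of periods; a smaller number points to empty boxes.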

        So I think you want something like this:

        Code:
        mixed DV IV1 IV2 || _all:R.period || country: || cohort:
        Does that work?



        • #5
          Dear William,

          thank you very much for your answer. I think you describe very nicely how to think about the data. When I run the code as you suggest, I get:

          Code:
          -------------------------------------------------------------
                          |     No. of       Observations per Group
           Group Variable |     Groups    Minimum    Average    Maximum
          ----------------+--------------------------------------------
                     _all |          1    114,788  114,788.0    114,788
                  country |          9     11,333   12,754.2     15,120
                   cohort |        108        412    1,062.9      1,642
          -------------------------------------------------------------
          I still have a few questions.
          1.) When I run the code as you suggest, am I cross-classifying periods with countries and cohorts, while nesting cohorts within countries? Is my understanding correct? Why is Stata giving me the number of groups multiplied by the nesting variable? For instance, if I had run a simple three-level model, it would report the number of groups nested within another level as the actual number of groups (i.e., 126 regions nested in 9 countries would be reported as 126, not as 1,134).
          Code:
          *THIS IS JUST AN EXAMPLE TO MAKE MY POINT ABOUT NUMBER OF GROUPS
          mixed DV || country: || region: , variance
                          |     No. of       Observations per Group
           Group Variable |     Groups    Minimum    Average    Maximum
          ----------------+--------------------------------------------
                  country |          9      4,932    6,312.4      8,701
                   region |        126          1      450.9      1,965

          In my first post, I made the mistake of putting country last, when in fact the cross-classified period and cohort levels should be nested in country. Wouldn't the code:
          Code:
          mixed DV IV1 IV2 || country: || _all:R.cohort || period: , variance
          which gives this output:

          Code:
          -------------------------------------------------------------
                          |     No. of       Observations per Group
           Group Variable |     Groups    Minimum    Average    Maximum
          ----------------+--------------------------------------------
                  country |          9     11,333   12,754.2     15,120
                     _all |          9     11,333   12,754.2     15,120
                 essround |         72      1,164    1,594.3      2,289
          -------------------------------------------------------------
          also make sense? Am I nesting one artificial cross-classified (period) level within country, as well as 12 cohorts within country?


          2.) To further complicate things:
          A) One cohort is missing in one period.
          B) My IVs are actually at the country-cohort level and the country-period level. What I mean is that IV1 has different values for cohort 1 in country A and cohort 1 in country B. The same goes for period.

          My possibly overcomplicated solution to this would be to do something like this:

          Code:
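          * Build one identifier per country-cohort and per country-period;
          * punct("_") keeps codes such as 1/12 and 11/2 from colliding.
          egen countrycohort = concat(country cohort), punct("_")
          egen countryperiod = concat(country period), punct("_")

          * Combine the two identifiers into a single grouping variable.
          egen cntrcohcntper = concat(countrycohort countryperiod), punct("_")

          mixed DV IV1 IV2 || country: || cntrcohcntper: , variance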
          Would that also be correct?



          • #6
            I still have a few questions.
            1.) When I run the code as you suggest, am I cross-classifying periods with countries and cohorts, while nesting cohorts within countries? Is my understanding correct? Why is Stata giving me the number of groups multiplied by the nesting variable? For instance, if I had run a simple three-level model, it would report the number of groups nested within another level as the actual number of groups (i.e., 126 regions nested in 9 countries would be reported as 126, not as 1,134).
            Without seeing your data, I can't say for sure, but my first guess would be that this is a numbering problem.

            In your first output, you've probably numbered the cohorts consistently in each country using the numbers 1 through 12. Stata is correctly nesting those groups within the 9 countries, giving you a total of 108 country-cohorts which is (I think) what you want.

            But in your second bit of output you've got 9 countries again and 126 regions. I'm guessing that you don't really mean 126 regions PER COUNTRY. You've probably got a total of 126 regions and you numbered them individually, giving each region a unique number. You would need to number the regions within each country (starting at 1) to get the "countries x regions = total groups" output.

            Am I right about all that? If not, maybe you should post a sample from your dataset so we can understand the numbering system.
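            One way to check the numbering without posting the data is sketched below. It assumes identifier variables named country and region, as in the example above; tag_region, tag_pair, pair_id, and region_in_country are hypothetical names introduced only for this illustration.

            ```stata
            * Assumption: country and region are the identifier variables.
            * Flag one observation per distinct region code and per distinct
            * country-region combination, then count the flags.
            egen byte tag_region = tag(region)
            egen byte tag_pair   = tag(country region)
            count if tag_region   // distinct region codes in the data
            count if tag_pair     // distinct country-region combinations

            * Optional: renumber regions 1, 2, ... within each country.
            egen long pair_id = group(country region)
            bysort country (pair_id): gen long region_in_country = sum(pair_id != pair_id[_n-1])
            ```

            If the two counts are equal, each region carries its own unique code; if the first count is much smaller, the same codes are being reused across countries, as seems to be the case for the cohorts.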
