  • Appropriate Model for Panel Data with an Ordinal Dependent Variable and Non-Proportional Odds

    I am using Stata/MP 18.5 on Mac to examine the socioeconomic determinants of health using a longitudinal dataset spanning 13 waves (approximately 80,000 individuals per wave before data cleaning).

    Dependent Variable: self-reported health (excellent; very good; good; fair; poor) – ordinal
    Key Independent Variables: Educational Attainment (Degree; A-Levels; GCSEs; No Qualifications) and Neighbourhood Deprivation (IMD quintiles 1-5) – both are categorical
    Other Variables: demographic covariates (e.g. age, sex, ethnicity, marital status) and socio-economic covariates (e.g. income, occupation, etc.)
    Panel Structure: individual-level repeated observations over time (13 waves of data per individual)

    I am looking for a suitable model that meets the following criteria:
    1. Models an ordinal dependent variable
    2. Supports panel data structures (i.e. incorporates random intercepts to adjust for individual-level variation)
    3. Does not assume proportional odds, as my data violates this assumption (confirmed via the Brant test)
    Given criterion (3), standard mixed-effects ordered logistic regression models (e.g. meologit) are unsuitable because they assume proportional odds.
    I have considered using a partial proportional odds model (e.g. gologit2), but I understand that this model does not natively support panel data, thereby violating criterion (2).
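    For reference, the proportional-odds random-intercept model ruled out above can be set up as below, which is useful as a benchmark for any non-proportional alternative. This is a minimal sketch on simulated data: the names srh, educ, imd and wave are placeholders assumed for illustration (only pidp appears in the post), and the coefficients in the simulated latent index are arbitrary.

```stata
* Minimal sketch on SIMULATED data; srh, educ, imd, wave are placeholder names.
clear
set seed 12345
set obs 500                                  // 500 individuals
generate long pidp = _n
generate double u = rnormal()                // individual-level random intercept
generate byte educ = runiformint(1, 4)       // 1=Degree ... 4=No qualifications
expand 13                                    // 13 waves per individual
bysort pidp: generate byte wave = _n
generate byte imd = runiformint(1, 5)        // deprivation quintile
* Latent health index (higher = worse); arbitrary illustrative coefficients
generate double ystar = 0.3*educ + 0.2*imd + u + rlogistic()
egen byte srh = cut(ystar), group(5)         // equal-frequency groups coded 0..4
replace srh = srh + 1                        // 1=excellent ... 5=poor
xtset pidp wave
* Random-intercept ordered logit (assumes proportional odds)
meologit srh i.educ i.imd || pidp:
```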

    Potential Solutions Considered:
    1. Using a Generalised Ordered Logit Model (gsem) to relax the proportional odds assumption whilst allowing for logit with mixed effects, e.g. “gsem (y <- x1 x2 x3 M1[pidp]), family(ordinal) link(logit)”.
    2. Dichotomising the outcome variable: recoding self-reported health into a binary outcome (“Good health” = excellent, very good, good; “Poor health” = fair, poor) to enable the use of a standard panel logistic regression model such as ‘xtlogit, re’. Because the outcome would no longer be ordinal, proportional odds would not need to be assumed.
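    Option 2 amounts to a one-line recode followed by a random-effects logit. The sketch below uses simulated data with placeholder names (srh coded 1=excellent ... 5=poor, educ, wave); note that the recode discards the distinctions within each of the two groups.

```stata
* Minimal sketch on SIMULATED data; srh, educ, wave are placeholder names.
clear
set seed 2024
set obs 500
generate long pidp = _n
generate double u = rnormal()                // individual effect
generate byte educ = runiformint(1, 4)
expand 13
bysort pidp: generate byte wave = _n
generate double ystar = 0.3*educ + u + rlogistic()
egen byte srh = cut(ystar), group(5)
replace srh = srh + 1                        // 1=excellent ... 5=poor
* Fair (4) or poor (5) -> 1; excellent/very good/good (1-3) -> 0
generate byte poorhealth = srh >= 4 if !missing(srh)
xtset pidp wave
xtlogit poorhealth i.educ, re
```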
    Questions for the Community:
    • Is there an appropriate alternative model in Stata that can handle ordinal dependent variables in panel data while relaxing the proportional odds assumption?
    • Is the GSEM approach a viable solution, or are there better implementations?
    • Would dichotomisation be a reasonable compromise, or are there preferable ways to handle non-proportional odds in this context?
    I would appreciate any insights or suggestions you may have!

  • #2
    I don't have a good solution to offer, but my guess is that meologit would be preferable to dichotomizing the outcome, even if the proportional odds assumption is violated.

    I would also think that with a sample of 80,000, the Brant test will be significant even for extremely small differences that are likely not meaningful. The question then becomes to what degree the assumption is violated, but again, I don't have a great answer for how to check that in this situation.
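    One informal way to look at the size of the violation, rather than only its significance, is to fit a separate binary logit for each cumulative split of the outcome and compare the coefficients side by side, which is the comparison the Brant test formalizes. This is a sketch on simulated data in which proportional odds holds by construction; x, srh and wave are placeholder names, and clustering on pidp is an assumption made for the panel structure.

```stata
* Minimal sketch on SIMULATED data; proportional odds is true here by design.
clear
set seed 777
set obs 2000
generate long pidp = _n
generate double u = rnormal()                // individual effect
expand 13
bysort pidp: generate byte wave = _n
generate double x = rnormal()
generate double ystar = 0.5*x + u + rlogistic()
egen byte srh = cut(ystar), group(5)
replace srh = srh + 1                        // 1..5
* One binary logit per cumulative split srh >= k, k = 2..5.
* Under proportional odds, b[x] should be roughly constant across k.
forvalues k = 2/5 {
    generate byte ge`k' = srh >= `k'
    quietly logit ge`k' x, vce(cluster pidp)
    display "srh >= `k': b[x] = " %6.3f _b[x] "  (se = " %5.3f _se[x] ")"
}
```

    How large a spread in b[x] across splits counts as "meaningful" is a substantive judgement, not something this comparison settles.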



    • #3
      I agree with Erik. Additionally, if you want a non-Stata solution, MLwiN, which can be called from Stata using the excellent runmlwin command, can fit multilevel models with ordinal outcomes and non-proportional odds coefficients. An MLwiN license is a lifetime license with updates included. In my experience, it is incredibly efficient at estimating multilevel models with hundreds of thousands of data points, and it fits many models that Stata's mixed or me suite would choke on. Buying an MLwiN license is one of the best investments I've made in my career.



      • #4
        It is valid to use oprobit (and even ologit) and cluster your standard errors at the unit level. Moreover, to approximate unit fixed effects you can apply the Mundlak device. This approach is the most robust in that it is consistent with arbitrary serial correlation, but you have to use vce(cluster id) for inference. See Section 16.3.4 in my 2010 MIT Press book.
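        A minimal sketch of a pooled oprobit with the Mundlak device and unit-clustered standard errors, on simulated data. The time-varying covariate income, the outcome srh, the inclusion of wave dummies and all simulation values are illustrative assumptions; in practice each time-varying covariate would get its own individual mean.

```stata
* Minimal sketch on SIMULATED data; income, srh, wave are placeholder names.
clear
set seed 99
set obs 1000
generate long pidp = _n
generate double c = rnormal()                // unobserved individual heterogeneity
expand 13
bysort pidp: generate byte wave = _n
generate double income = c + rnormal()       // time-varying, correlated with c
generate double ystar = -0.4*income + c + rnormal()
egen byte srh = cut(ystar), group(5)
replace srh = srh + 1                        // 1..5
* Mundlak device: individual mean of each time-varying covariate
bysort pidp: egen double income_bar = mean(income)
* Pooled ordered probit; clustering on the individual allows
* arbitrary serial correlation within pidp for inference
oprobit srh income income_bar i.wave, vce(cluster pidp)
```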

        I'm curious how you know proportional odds is violated in a relevant sense. What matters is whether it holds once you condition on both observed and unobserved heterogeneity, not what is true in the raw data. In any case, I think oprobit does what you want.
