  • Fixed-effects model with unbalanced data

    Hello everyone!

    We're studying the mental health trajectories of immigrants and natives in Germany over time, using unbalanced panel data. We estimate within-person changes with two separate fixed-effects models, one for immigrants and one for natives, with mental health (mh) as the outcome and time dummies (2020, 2018, 2016, etc.; the survey runs every two years) as the exposure, controlling for age. We want to visualize the results using coefplot.

    Our main question is: to what degree does our model
    xtreg mh i.time_dummies c.age#c.age if immigrant==1, fe vce(robust)

    take into account the values of those who didn't participate in every wave? We have quite large standard errors, which was to some extent expected, especially for the immigrant group. But those standard errors rise quickly (for all time points) when we include more years in our time dummy variable. We are struggling to find an explanation for that. At the same time, the regression output tells us that all immigrant observations are included ("Number of obs").

    How does Stata work with unbalanced data here?

    We would be very happy to find out.

    Thanks in advance,
    Henning

  • #2
    Henning:
    welcome to this forum.
    First off, I fail to see why you run two different -xtreg, fe- models when you can plug a two-level categorical variable -i.immigrant- into the right-hand side of your regression equation.
    Another issue worth mentioning is: are you dealing with a panel dataset (i.e., allowing for a bit of panel attrition, the same sample of respondents is measured on the very same variables every two years) or a repeated cross-sectional dataset (basically, the sample is not the same across years)?
    In addition:
    1) Stata can handle both balanced and unbalanced panel datasets in the very same way, using whatever observations are available for each panel;
    2) the way you interacted -age- with itself (searching for possible turning points, I presume), should have been:
    Code:
    c.age##c.age
    ;
    3) "weird" standard errors may have different causes, which interested listers cannot comment on unless you share with them what you typed and what Stata gave you back (via CODE delimiters, please), as the FAQ recommends. Thanks.
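
To make points 1) to 3) concrete, here is a minimal sketch on simulated data, not the poster's dataset. The variable names (id, year, birthyear, n_waves) and all numbers are assumptions made up for illustration. It shows one way to inspect participation patterns and to fit a single pooled model with -i.immigrant- interacted with the wave dummies. As far as I understand -xtreg, fe-, persons observed only once still count toward "Number of obs" but add no within-person variation to the estimates.

```stata
* Simulated unbalanced panel; every name and number below is hypothetical
clear
set seed 12345
set obs 300                              // 300 made-up persons
gen id = _n
gen immigrant = runiform() < 0.3         // time-invariant group indicator
gen birthyear = 1950 + floor(40*runiform())
expand 5                                 // five biennial waves per person
bysort id: gen year = 2010 + 2*_n        // 2012, 2014, ..., 2020
gen age = year - birthyear
gen mh = 50 + rnormal(0, 10)             // placeholder outcome, no true effects
drop if runiform() < 0.35                // random attrition -> unbalanced panel

xtset id year, delta(2)                  // waves are two years apart
xtdescribe                               // participation patterns across waves

* Waves with a non-missing outcome per person, by group
egen n_waves = count(mh), by(id)
tabulate n_waves immigrant

* One pooled model instead of two separate ones; the main effect of the
* time-invariant immigrant indicator is absorbed by the person fixed effects
xtreg mh i.immigrant##i.year c.age#c.age, fe vce(robust)
```

The -tabulate- step shows how many person-waves come from people with few waves, which may help explain why standard errors grow for sparsely covered years.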
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      When you use two-way FE, a variable such as age cannot appear by itself. The unit FE accounts for different starting ages and then the time FE accounts for the fact that age increases by one for all units every year. Henning seems to know that and that is why age only appears as a quadratic. That can be estimated because different starting ages have different rates of change when the quadratic is included. But there is no guarantee that those coefficients can be precisely estimated. If the coefficient on c.age#c.age is small and insignificant it could mean there is no nonlinearity in the age variable.

      If you really meant to include age by itself then you'll have to use random effects.

      Fixed effects is more resilient to unbalanced panels because it allows the reason for unbalancedness to be correlated with the individual heterogeneity. But that isn't going to allow you to include age.
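
The collinearity argument above can be checked with a small simulation. This is only a sketch under stated assumptions: the data are made up, and id, birthyear and the coefficient 0.002 are hypothetical. Because age equals year minus a person-specific constant, the first model should show one term flagged as omitted, while the quadratic-only model should estimate without that note.

```stata
* Simulated data; all names and numbers are hypothetical
clear
set seed 2024
set obs 200
gen id = _n
gen birthyear = 1950 + floor(40*runiform())
expand 5
bysort id: gen year = 2010 + 2*_n        // biennial waves 2012-2020
gen age = year - birthyear
gen mh = 50 + 0.002*age^2 + rnormal(0, 5)
drop if runiform() < 0.3                 // make the panel unbalanced
xtset id year, delta(2)

* Linear age plus person FE plus wave dummies: exact collinearity,
* so expect an "omitted because of collinearity" note
xtreg mh i.year c.age, fe vce(robust)

* Quadratic term only: identified because people who start at different
* ages have different changes in age^2 between waves
xtreg mh i.year c.age#c.age, fe vce(robust)
```

Which term Stata omits in the first model can depend on the order of the variables, so the sketch makes no claim about that.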


      • #4
        Thanks, Jeff.
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Would the community-contributed
          Code:
          xthybrid
          command help at all in this context?


          • #6
            Originally posted by Jeff Wooldridge
            When you use two-way FE, a variable such as age cannot appear by itself. The unit FE accounts for different starting ages and then the time FE accounts for the fact that age increases by one for all units every year. Henning seems to know that and that is why age only appears as a quadratic. That can be estimated because different starting ages have different rates of change when the quadratic is included. But there is no guarantee that those coefficients can be precisely estimated. If the coefficient on c.age#c.age is small and insignificant it could mean there is no nonlinearity in the age variable.

            If you really meant to include age by itself then you'll have to use random effects.

            Fixed effects is more resilient to unbalanced panels because it allows the reason for unbalancedness to be correlated with the individual heterogeneity. But that isn't going to allow you to include age.

            Regarding the question of age in fixed effects: is it possible to include age groups in a fixed-effects regression as a categorical variable?

            In the answers to this query it is suggested that a binary indicator (above a certain age threshold) could be included, since it will capture within effects of those people that switch group:
            https://www.statalist.org/forums/for...on#post1486821
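
A minimal sketch of what such a specification might look like, on simulated data. The cut points 29, 44 and 59 and the names agegrp, switched, ever_switch and tag are assumptions chosen for illustration, not a recommendation. The last lines count how many persons ever change age group, since only those persons would identify the age-group coefficients.

```stata
* Simulated data; all names, cut points and numbers are hypothetical
clear
set seed 777
set obs 200
gen id = _n
gen birthyear = 1950 + floor(40*runiform())
expand 5
bysort id: gen year = 2010 + 2*_n        // biennial waves 2012-2020
gen age = year - birthyear
gen mh = 50 + rnormal(0, 5)              // placeholder outcome
drop if runiform() < 0.3                 // make the panel unbalanced
xtset id year, delta(2)

* Age groups: 0 = up to 29, 1 = 30-44, 2 = 45-59, 3 = 60 and older
gen byte agegrp = irecode(age, 29, 44, 59)

xtreg mh i.year i.agegrp, fe vce(robust)

* Persons who cross a threshold at least once during observation
bysort id (year): gen byte switched = agegrp != agegrp[1]
egen ever_switch = max(switched), by(id)
egen tag = tag(id)                       // marks one row per person
tabulate ever_switch if tag
```

If few persons switch groups, the age-group coefficients would rest on a small subsample and may be imprecise.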
