Fixed-effects model with unbalanced data

Henning Hinkers

Join Date: Jul 2022

Posts: 6
#1

02 Jul 2022, 07:54

Hello everyone!

We're researching the mental health trajectories of immigrants and natives in Germany over time, using unbalanced panel data. We estimate within-person changes with two separate fixed-effects models, one for immigrants and one for natives, with mental health (mh) as the outcome and time dummies (2020, 2018, 2016 etc. – survey every two years) as the exposure, controlling for age. We want to visualize the results using coefplot.
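For readers following along, here is a minimal sketch of the workflow described above: fit the two models, store them, and plot only the year coefficients. The person identifier `pid` is a hypothetical name, and `time_dummies` is assumed to hold the survey year; `coefplot` is community-contributed and has to be installed first (`ssc install coefplot`).

```stata
* Assumptions: pid is a hypothetical person identifier;
* time_dummies is assumed to hold the survey year (2016, 2018, 2020, ...)
xtset pid time_dummies

* Fixed-effects model for immigrants
xtreg mh i.time_dummies c.age#c.age if immigrant == 1, fe vce(robust)
estimates store fe_imm

* Fixed-effects model for natives
xtreg mh i.time_dummies c.age#c.age if immigrant == 0, fe vce(robust)
estimates store fe_nat

* Plot only the year coefficients of both models with their confidence intervals
coefplot (fe_imm, label("Immigrants")) (fe_nat, label("Natives")), ///
    keep(*.time_dummies) vertical yline(0) ///
    xtitle("Survey year") ytitle("Change in mh relative to base year")
```

Each plotted coefficient is a contrast with the omitted base year, so the choice of base year affects every point estimate and every confidence interval in the plot.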

Our main question is: to what degree does our model
xtreg mh i.time_dummies c.age#c.age if immigrant==1, fe vce(robust)

take into account the observations of those who didn't participate every time? We have quite large standard errors, which was somewhat expected, especially for the immigrant group. But those standard errors rise quickly (for all time points) when we include more years in our time dummy variable. We struggle a bit to find an explanation for that. At the same time, the regression output tells us that all immigrant observations are included ("Number of obs").

How does Stata work with unbalanced data here?

We would be very happy to find out.

Thanks in advance,
Henning
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17700
#2

02 Jul 2022, 08:12

Henning:
welcome to this forum.
First off, I fail to see why you run two separate -xtreg, fe- models when you can plug a two-level categorical variable -i.immigrant- into the right-hand side of your regression equation.
Another issue worth mentioning is: are you dealing with a panel dataset (i.e., allowing for a bit of panel attrition, the same sample of patients is measured on the very same variables every two years) or a repeated cross-sectional dataset (basically, the sample is not the same across years)?
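A minimal sketch of such a pooled specification follows. Because immigrant status does not change within a person, its main effect is absorbed by the fixed effects, so the group difference has to enter through an interaction with the time dummies. The identifier `pid` is a hypothetical name, and `time_dummies` is assumed to hold the survey year.

```stata
* Assumption: pid is a hypothetical person identifier
xtset pid time_dummies

* Pooled FE model: the year effects are allowed to differ by immigrant status.
* The time-invariant main effect 1.immigrant will be reported as omitted.
xtreg mh i.time_dummies##i.immigrant c.age#c.age, fe vce(robust)

* Joint test of whether the year effects differ between the two groups
testparm i.time_dummies#i.immigrant
```

Unlike two separate regressions, this version constrains the age quadratic to be the same in both groups unless it is interacted with `i.immigrant` as well.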
In addition:
1) Stata handles balanced and unbalanced panel datasets in the very same way, using whatever observations are available for each panel;
2) the way you interacted -age- with itself (searching for possible turning points, I presume), should have been:

Code:

c.age##c.age

;
3) "weird" standard errors may have different causes, on which interested listers cannot comment unless you share what you typed and what Stata gave you back (via CODE delimiters, please), as the FAQ recommends. Thanks.
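As a starting point for that kind of diagnosis, the sketch below checks how unbalanced the panel is and how many waves each person contributes to the estimation sample. People observed only once are counted in "Number of obs" but carry no within-person variation. One possible explanation for standard errors that grow at every time point, offered here only as a hypothesis, is a sparsely populated base year, since every year coefficient is a contrast with that year. The identifier `pid` is a hypothetical name, and `time_dummies` is assumed to hold the survey year.

```stata
* Assumptions: pid is a hypothetical person identifier;
* time_dummies is assumed to hold the survey year, observed every two years
xtset pid time_dummies, delta(2)

* Participation patterns among immigrants
xtdescribe if immigrant == 1

xtreg mh i.time_dummies c.age#c.age if immigrant == 1, fe vce(robust)

* Number of waves each person contributes to the estimation sample
generate byte insample = e(sample)
bysort pid: egen n_waves = total(insample)

* Person-wave counts by number of waves contributed
tabulate n_waves if insample

* Observations per survey year: a thin base year inflates all year contrasts
tabulate time_dummies if insample

* Refit with an explicitly chosen, well-populated reference year (2020 here)
xtreg mh ib2020.time_dummies c.age#c.age if immigrant == 1, fe vce(robust)
```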

Kind regards,
Carlo
(Stata 19.0)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2148
#3

03 Jul 2022, 01:38

When you use two-way FE, a variable such as age cannot appear by itself. The unit FE accounts for different starting ages and then the time FE accounts for the fact that age increases by one for all units every year. Henning seems to know that and that is why age only appears as a quadratic. That can be estimated because different starting ages have different rates of change when the quadratic is included. But there is no guarantee that those coefficients can be precisely estimated. If the coefficient on c.age#c.age is small and insignificant it could mean there is no nonlinearity in the age variable.

If you really meant to include age by itself then you'll have to use random effects.

Fixed effects is more resilient to unbalanced panels because it allows the reason for unbalancedness to be correlated with the individual heterogeneity. But that isn't going to allow you to include age.
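The argument can be written out in a few lines. This is a sketch under the assumption that age is measured exactly as calendar year minus birth year, with $b_i$ denoting person $i$'s birth year (a symbol introduced here for illustration):

```latex
\begin{align*}
\text{age}_{it} &= t - b_i \\
mh_{it} &= \alpha_i + \lambda_t + \beta_1\,\text{age}_{it} + \beta_2\,\text{age}_{it}^2 + u_{it} \\
        &= \underbrace{\left(\alpha_i - \beta_1 b_i + \beta_2 b_i^2\right)}_{\text{absorbed by unit FE}}
         + \underbrace{\left(\lambda_t + \beta_1 t + \beta_2 t^2\right)}_{\text{absorbed by time FE}}
         - 2\beta_2\, b_i\, t + u_{it}
\end{align*}
```

The linear term $\beta_1$ cannot be separated from the two sets of fixed effects, while $\beta_2$ is identified only through the interaction $b_i \cdot t$, that is, through differences in birth year combined with the passage of time. With little variation in that interaction, $\beta_2$ will be imprecisely estimated.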
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17700
#4

03 Jul 2022, 02:42

Thanks, Jeff.

Kind regards,
Carlo
(Stata 19.0)
Maxence Morlet

Join Date: Mar 2021

Posts: 653
#5

03 Jul 2022, 08:07

Would the community-contributed

Code:

xthybrid

command help at all in this context?
Jonas Haas

Join Date: Aug 2022

Posts: 4
#6

09 Aug 2022, 03:41

Originally posted by Jeff Wooldridge:

When you use two-way FE, a variable such as age cannot appear by itself. The unit FE accounts for different starting ages and then the time FE accounts for the fact that age increases by one for all units every year. Henning seems to know that and that is why age only appears as a quadratic. That can be estimated because different starting ages have different rates of change when the quadratic is included. But there is no guarantee that those coefficients can be precisely estimated. If the coefficient on c.age#c.age is small and insignificant it could mean there is no nonlinearity in the age variable.

If you really meant to include age by itself then you'll have to use random effects.

Fixed effects is more resilient to unbalanced panels because it allows the reason for unbalancedness to be correlated with the individual heterogeneity. But that isn't going to allow you to include age.

Regarding the question of age in fixed effects: is it possible to include age groups in a fixed-effects regression as a categorical variable?

In the answers to the following thread it is suggested that a binary indicator (above a certain age threshold) could be included, since it will capture the within effects of those people who switch groups:
https://www.statalist.org/forums/for...on#post1486821
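A minimal sketch of that idea is below. The threshold of 65 and the names `age65` and `pid` are purely illustrative assumptions; `time_dummies` is assumed to hold the survey year. The indicator's coefficient is identified only from people who cross the threshold while in the panel, so it is worth checking how many do.

```stata
* Assumption: pid is a hypothetical person identifier
xtset pid time_dummies

* Illustrative binary indicator for being at or above an assumed threshold of 65
generate byte age65 = age >= 65 if !missing(age)

* Between- and within-person variation of the indicator:
* only people who switch groups identify its coefficient
xttab age65

* Fixed-effects model with the age-group indicator in place of the age quadratic
xtreg mh i.time_dummies i.age65, fe vce(robust)
```

The same approach extends to several age groups, for example via `egen agegrp = cut(age), at(...)` followed by `i.agegrp`, with the same caveat that each group contrast relies on switchers only.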