Question regarding Panel data

Brian Kipkoech

Join Date: Nov 2024

Posts: 2
#1


19 Jun 2025, 02:59

I have two questions. 1) I am analysing health outcomes (dependent variables) against an independent variable (diet diversity). Should I include i.year, i.e., xtreg X Y i.year, fe, or is just xtreg X Y, fe okay?
2) Does converting this longitudinal data into a cross-section affect the significance of the variables? Thank you.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2592
#2

19 Jun 2025, 04:25

It is usually a good idea to control for year-fixed effects with i.year, at least if there is reason to suspect that the external environment for the subjects might have been different across years; e.g., different economic conditions might have affected both health outcomes and individual behavior differently across years.

I am not sure what you mean by "converting" longitudinal data into a cross section. Do you just want to focus on a single year, or do you want to pool all of the data, simply ignoring the panel nature? If you are restricting your data to a single year, the sample size will be smaller, generally leading to larger standard errors. However, you will also no longer be able to use the fixed-effects estimator. A simple OLS estimation on the cross-sectional data would then require stronger assumptions: no correlation of the independent variables with any omitted variables (which include the fixed effects). If those stronger assumptions are satisfied, standard errors can be smaller again, but if they are violated, the estimator would be biased. The same tradeoff between higher precision due to stronger assumptions and potential bias due to a violation of these assumptions generally also applies to pooling of the data.
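A minimal Stata sketch of the comparisons above, using the NLS Women panel from Stata's example datasets (webuse nlswork) purely as a stand-in. The roles are assumptions for illustration: ln_wage plays the health outcome and tenure plays diet diversity. testparm i.year gives a joint test of the year effects, and the pooled and single-year regressions show how the coefficient and its standard error move once the panel structure is ignored or discarded.

```stata
* Sketch only: nlswork stands in for the poster's data
* (assumed roles: ln_wage ~ health outcome, tenure ~ diet diversity)
webuse nlswork, clear
xtset idcode year

* Fixed effects without year dummies
xtreg ln_wage tenure, fe vce(cluster idcode)
estimates store fe_noyear

* Fixed effects with year dummies, then a joint test of all year effects
xtreg ln_wage tenure i.year, fe vce(cluster idcode)
estimates store fe_year
testparm i.year

* Pooled OLS ignores the panel structure, so it needs tenure to be
* uncorrelated with the omitted individual effects
regress ln_wage tenure i.year, vce(cluster idcode)
estimates store pooled

* A single cross-section (1988): smaller sample, no within estimator possible
regress ln_wage tenure if year == 88, vce(robust)
estimates store cs88

* Compare the tenure coefficient and its standard error across approaches
estimates table fe_noyear fe_year pooled cs88, keep(tenure) b se
```

If the year effects are jointly significant, that is one sign that leaving out i.year would push common shocks into the error term.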

https://www.kripfganz.de/stata/
Mukesh Punia

Join Date: May 2020

Posts: 92
#3

19 Jun 2025, 04:47

I need clarification on #2 regarding fixed effects in the cross-sectional setting. In cross-sectional survey data, I see people using the terms state fixed effects or district fixed effects. I recently discussed this with a friend, who said: if you just use i.state in

Code:

reg y age i.education i.state

or

Code:

reghdfe y age i.education, absorb(state)

, then why are you not also calling i.education an education fixed effect?

My simple understanding is that age and education here are individual characteristics, whereas state is a unit (a grouping variable).

I would appreciate further clarification on this.

Thank you

Last edited by Mukesh Punia; 19 Jun 2025, 04:50.

Best regards,
Mukesh
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#4

19 Jun 2025, 06:11

Originally posted by Mukesh Punia

In cross-sectional survey data, I see people using the term state fixed or district fixed.

This is an abuse of language. Strictly speaking, the term fixed effects is appropriate when referring to panel data settings where the unobserved heterogeneity is constant over time for a given unit (e.g., a state), and the panel identifier is the unit across which those effects are absorbed. In such a context, state fixed effects would refer to controlling for time-invariant state-level characteristics in a panel of repeated state-level observations.

In cross-sectional data, where you have only one observation per unit (e.g., each individual is observed once), including i.state simply adds state dummies, not fixed effects in the econometric sense. It is better and more precise to call them state indicators or state dummies. People often invoke the terminology of fixed effects because of the desirable properties associated with them in panel settings, such as controlling for unobserved, time-invariant heterogeneity, but doing so in a pure cross-sectional context is technically incorrect. By that logic, one could absurdly call i.education "education fixed effects," which, as your example shows, exposes the misuse.

So, your friend's suggestion highlights exactly why such terminology should be used carefully. In your cross-sectional regression:

Code:

reg y age i.education i.state

you are controlling for categorical variables, not invoking fixed effects in the proper panel-data sense.
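To see that i.state and absorb(state) are the same dummy-variable control, here is a sketch with Stata's nlsw88 example data, a single cross-section with one observation per woman. That dataset has no state variable, so occupation is assumed to play the grouping role, and the built-in areg is used in place of the user-written reghdfe.

```stata
* Cross-sectional sketch: occupation stands in for state (assumption)
sysuse nlsw88, clear

* Grouping variable entered as explicit indicators
regress wage age i.race i.occupation
scalar b_dummies = _b[age]

* The same indicators, absorbed rather than reported
areg wage age i.race, absorb(occupation)
scalar b_absorb = _b[age]

* The age coefficients should agree up to numerical precision
display "dummies: " b_dummies "   absorbed: " b_absorb
```

Absorbing only hides the indicator coefficients; the model being estimated is identical, which is why calling one version "fixed effects" and the other "dummies" is a matter of labeling.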
Mukesh Punia

Join Date: May 2020

Posts: 92
#5

19 Jun 2025, 07:20

Thank you, Andrew Musau, for the further clarification. I agree with you on 'abuse of language.' I do not know why people (editors & reviewers) do not take the terminology seriously. I have seen papers published in PNAS, World Development, & Economics Letters with cross-sectional data using the terms mothers fixed effect, cluster fixed effect, district fixed effect, or region fixed effect.

May I invite Jeff Wooldridge & other members to add more to it, technically or conceptually?

Last edited by Mukesh Punia; 19 Jun 2025, 07:24.

Best regards,
Mukesh
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2156
#6

19 Jun 2025, 15:01

As usual, I largely agree with Andrew. Now, when I teach panel data methods, I provide a warning about the abuses of the language. As Andrew points out, "fixed effects" typically has positive connotations (unlike "random effects" in most settings). So, in a panel data setting, people will often write "I included industry fixed effects" when they have firm-level data. This is cheating because it assumes that the relevant heterogeneity varies only by industry, not by firm. Or, "I included school fixed effects" when the data are at the child level. Often these approaches are used because the estimates from what we really mean by fixed effects, namely the within estimator that removes the heterogeneity at the unit level, are imprecise. Of course, this is the same as putting in unit-level dummy variables. It's natural to try to "rescue" the analysis by putting in dummies at a higher level and argue that this is sufficient to handle endogeneity caused by individual-specific heterogeneity.
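A sketch of both points with the nlswork example panel (assumed purely for illustration): the within estimator and one dummy per unit give the same slope, while dummies at a higher level, here industry via ind_code as an analogue of "industry fixed effects" with firm data, only control for heterogeneity that varies by industry.

```stata
* Sketch with the nlswork panel; ind_code plays the "higher-level" group
webuse nlswork, clear
xtset idcode year

* Within estimator: removes woman-level heterogeneity
xtreg ln_wage tenure, fe
scalar b_within = _b[tenure]

* One dummy per woman (absorbed): same point estimate as the within estimator
areg ln_wage tenure, absorb(idcode)
scalar b_lsdv = _b[tenure]

* Industry dummies only: controls for heterogeneity that varies by industry,
* not by woman (also drops observations with missing ind_code)
regress ln_wage tenure i.ind_code, vce(cluster idcode)
scalar b_industry = _b[tenure]

display "within: " b_within "  LSDV: " b_lsdv "  industry dummies: " b_industry
```

The first two slopes coincide; the third is generally different because the woman-specific heterogeneity is left in the error term.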

With cross-sectional data, you obviously cannot include a fixed effect for each unit. So, you see state or county dummy variables included for individual-level data, or for establishment-level data, or census tract data. In the end, it's just language, and probably a useful shorthand. BTW, I often do see language such as "education fixed effects" used. I've seen "race fixed effects" and "gender fixed effects." I find these objectionable because they're trying to make it sound like the analysis has a level of robustness that it likely does not have. As a shorthand for putting in an exhaustive set of categorical variables, I guess they're okay. It seems to be everywhere, and I don't think there's any turning back.
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#7

19 Jun 2025, 15:10

I wonder if the misuse of the term "fixed effects" may arise partly from its having a meaning other than as a grouping variable. When doing a random effects analysis, the non-random effects are often called the fixed effects of the model, fixed being intended to distinguish them from random, with no implication that this has anything to do with panel data or grouping variables.

In any case, it is an overloaded and oft-misused term (even by me!). Add my name to the list of those who would like to see its use improved, but hold little hope that that will happen.
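For contrast, a sketch of the mixed-model sense of the term with Stata's mixed command, again using the nlswork example data as an assumed illustration. Here the coefficients on tenure and the constant form the fixed part of the model, and the woman-level intercept is a random effect described by its variance; "fixed" carries no implication of a within transformation.

```stata
* Sketch: nlswork used purely for illustration (assumed example data)
webuse nlswork, clear

* Fixed part: the tenure coefficient and the constant.
* Random part (after ||): an intercept for each woman, summarized by
* its estimated variance rather than by one dummy per woman
mixed ln_wage tenure || idcode:
```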