ERROR Fixed effect estimation (reghdfe) : treatment variable is collinear with the fixed effects

Shadi Sadie

Join Date: May 2023

Posts: 2
#1


17 May 2023, 18:30

Hi all,

I am conducting a study to estimate the effect of Medicaid expansion on the uninsured rate using a classic difference-in-differences (DID) design with a two-way fixed effects (TWFE) model. My model is as follows:

$\text{UNINS}_{ist} = \alpha_s + \delta_t + \beta\,\text{EXPANSION}_{ist} + \varepsilon_{ist}$

In this model:

$\text{UNINS}_{ist}$ is a binary variable indicating whether individual $i$ in the survey is uninsured (1) or insured (0) in state $s$ and year $t$.
$\alpha_s$ represents state fixed effects, capturing time-invariant differences across states.
$\delta_t$ represents time fixed effects, capturing common time trends across all states.
$\beta$ is the parameter of interest, representing the causal effect of Medicaid expansion on the uninsured rate.
$\text{EXPANSION}_{ist}$ is a binary treatment variable that equals 1 for states that adopted Medicaid expansion and 0 for states that did not.
$\varepsilon_{ist}$ is the error term accounting for unobserved factors and random variation.

I have data from the American Community Survey (ACS) for 2011 to 2019, which are repeated cross-sections. Here are the first 15 observations of my dataset:

To estimate this model, I am using the reghdfe command:

Code:

reghdfe UNINS expansion , absorb(ST YEAR) cluster(ST)

Even though the regression ran, I got the following message:

Code:

note: expansion is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-09)
(MWFE estimator converged in 4 iterations)
note: expansion omitted because of collinearity
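One way to see why the coefficient is dropped: as EXPANSION is defined above, it is 1 in every year for a state that ever expanded and 0 in every year otherwise, so it is constant within each state and equals a sum of state dummies. The minimal Python/NumPy sketch below (an illustration, not Stata) uses a small hypothetical layout of 4 states, 3 years and 2 respondents per state-year, not the data from this post, to show the rank deficiency and the all-zero partialled-out values that the note reports.

```python
import numpy as np

# Hypothetical toy layout: 4 states observed in 3 years, 2 respondents per
# state-year cell (repeated cross-sections). States 0 and 1 are "expansion"
# states, so EXPANSION is 1 for every one of their rows regardless of year.
states = np.repeat(np.arange(4), 6)                 # 4 states x 6 rows each
years = np.tile(np.repeat(np.arange(3), 2), 4)      # years 0,0,1,1,2,2 per state
expansion_ever = (states < 2).astype(float)         # time-invariant within state

def design(treat):
    """Columns: all 4 state dummies, year dummies for years 1 and 2, treatment."""
    state_d = (states[:, None] == np.arange(4)).astype(float)
    year_d = (years[:, None] == np.arange(1, 3)).astype(float)
    return np.column_stack([state_d, year_d, treat])

X = design(expansion_ever)
print(X.shape[1], np.linalg.matrix_rank(X))  # 7 6  (7 columns, rank only 6)

# Subtracting state means (partialling out the state fixed effects) leaves
# nothing of EXPANSION, which is why the estimator has to omit it.
state_means = np.array([expansion_ever[states == s].mean() for s in range(4)])
residual = expansion_ever - state_means[states]
print(np.abs(residual).max())  # 0.0
```

Because the problem comes from how the treatment variable is defined, changing the data layout or the estimation command alone would not be expected to remove it.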

I tried using the xtreg command in Stata instead but ran into a problem. Because my data are repeated cross-sections, the xtreg command requires me to define the panel structure using xtset ST YEAR.

To proceed with xtreg, I would need to aggregate the individual observations by computing the mean uninsured rate for each state and year. This would turn my repeated cross-sections into a state-year panel. However, I have a few concerns about this approach.

Firstly, my dataset includes several demographic variables such as sex, race, and education level, which are categorical. Taking the mean of categorical variables may not be appropriate and could discard valuable information. I am unsure how to handle these variables effectively while converting the data to a panel structure.

Secondly, my dataset also includes survey weights. Considering that the survey weights are specific to each individual, taking the mean uninsured rate for each state and year may not accurately account for the survey design and could potentially introduce biases into the analysis.
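If the data are aggregated, the survey weights can be carried into the state-year outcome by taking weighted rather than simple means, sum(w*y)/sum(w) within each state-year cell. Below is a minimal Python sketch of that computation, assuming each record is a hypothetical (state, year, uninsured, weight) tuple. Stata's -collapse- accepts weights and computes the equivalent weighted (mean). This sketch does not address the categorical-covariate concern.

```python
from collections import defaultdict

def weighted_cell_means(rows):
    """rows: iterable of (state, year, uninsured 0/1, weight).
    Returns {(state, year): weighted share uninsured}, i.e. sum(w*y) / sum(w)."""
    num = defaultdict(float)  # running sum of w * y per cell
    den = defaultdict(float)  # running sum of w per cell
    for state, year, y, w in rows:
        num[(state, year)] += w * y
        den[(state, year)] += w
    return {cell: num[cell] / den[cell] for cell in num}

# Hypothetical microdata: two respondents in one state-year cell.
rows = [("AL", 2011, 1, 300.0), ("AL", 2011, 0, 100.0)]
print(weighted_cell_means(rows))  # {('AL', 2011): 0.75}
```

An unweighted mean of the same two records would give 0.5, which illustrates how ignoring the weights can shift the cell-level outcome.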

Given these concerns, I am uncertain whether averaging over individuals to obtain one observation per state per year is a suitable approach for my analysis. I also don't know whether this approach would resolve the collinearity between my treatment variable and the fixed effects.

I am seeking guidance on how to address this issue and estimate the classic DID TWFE model.

Thank you for your assistance!

I'm using Stata 17
Tags: difference-in-differences, fixed effects, panel data, two-way fixed effects, xtreg
Clyde Schechter

Join Date: Apr 2014

Posts: 30169
#2

23 May 2023, 17:43

Your regression command does not correctly implement the model you are trying to estimate. Your data set can be made suitable for generalized DID estimation, but not for classical DID estimation, because the latter requires that there be a single time point at which all the "treated" entities begin treatment. But Medicaid expansion was undertaken at different times by different states. So you must use generalized DID. That approach does call for a TWFE model (which, by the way, classical DID does not), but the right hand side variable is not the treatment variable you are using.

The first thing you need to do is to determine in which year each state expanded Medicaid, and add that as a new variable in your data. Then you can use that to calculate a new variable: expanded = 1 for all observations where Medicaid expansion has already taken place in that state by that year, and 0 for all other observations (including observations for states that chose not to expand Medicaid at all.) Then you can get your DID estimate from -reghdfe UNINS expanded, absorb(ST YEAR) vce(cluster ST)-.
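Below is a minimal sketch of this recipe in Python/NumPy rather than Stata, on simulated state-year data. The adoption years in adopt_year are hypothetical placeholders, not the actual Medicaid expansion dates, and the outcome is generated with a known effect of -0.05 and no noise so the estimate can be checked. Regressing on explicit state and year dummies gives the same coefficient on expanded as absorbing those fixed effects, which -reghdfe- does more efficiently; clustered standard errors are not computed here. In Stata the indicator itself could be built with something like -gen expanded = YEAR >= expand_year & !missing(expand_year)-, where expand_year is a hypothetical variable holding each state's expansion year (missing for states that never expanded).

```python
import numpy as np

# Hypothetical adoption years (placeholders); None = state never expanded.
adopt_year = {0: 2014, 1: 2016, 2: None, 3: None}

years_all = np.arange(2011, 2020)                   # ACS years 2011-2019
cells = [(s, t) for s in adopt_year for t in years_all]
state = np.array([s for s, _ in cells])
year = np.array([t for _, t in cells])

# expanded = 1 once the state has expanded by that year, 0 otherwise
# (always 0 for states that never expanded)
expanded = np.array([
    1.0 if adopt_year[s] is not None and t >= adopt_year[s] else 0.0
    for s, t in cells
])

# Simulated outcome with a known effect and no noise, for checking only.
true_beta = -0.05
alpha = {0: 0.20, 1: 0.15, 2: 0.25, 3: 0.18}        # hypothetical state effects
delta = 0.01 * (year - 2011)                        # hypothetical common trend
y = np.array([alpha[s] for s in state]) + delta + true_beta * expanded

# TWFE by least squares: state dummies, year dummies (first year dropped), expanded
state_d = (state[:, None] == np.arange(4)).astype(float)
year_d = (year[:, None] == years_all[1:]).astype(float)
X = np.column_stack([state_d, year_d, expanded])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.linalg.matrix_rank(X) == X.shape[1], round(coef[-1], 6))  # True -0.05
```

Unlike an ever-expanded indicator, expanded varies within expansion states over time while staying 0 in non-expansion states, so it is not a combination of the state and year dummies and the design matrix has full column rank.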

For future reference, please note that screenshots are discouraged on this forum, and they are not helpful as data examples if responding to the question requires developing code of any significant complexity. In this particular case, the code in question was a single line, so no harm done. But, in the future, when asking for help with code, please use the -dataex- command and show example data. Although sometimes, as here, it is possible to give an answer that has a reasonable probability of being correct, this is usually not the case. Moreover, such answers are necessarily based on experience-based guesses or intuitions about the nature of your data. When those guesses are wrong, both you and the person trying to help you have wasted their time as you end up with useless code. To avoid this, a -dataex- based example provides all of the information needed to develop and test a solution.

As you are running version 17, -dataex- is already part of your official Stata installation. Run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Added: -reghdfe- does support pweights, so the sampling weights can be accounted for by specifying them in your -reghdfe- command.
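With sampling weights, the point estimates become weighted least squares, minimising sum_i w_i (y_i - x_i'b)^2. The minimal Python/NumPy sketch below shows that computation by scaling each row by sqrt(w_i). It is an illustration only, not -reghdfe-'s implementation, and it leaves out the robust/clustered standard errors that pweights call for. The two-respondent data are hypothetical.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimises sum_i w_i * (y_i - x_i'b)^2 by
    rescaling each row of X and y by sqrt(w_i) and running ordinary least squares.
    Returns point estimates only; no standard errors are computed."""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Hypothetical microdata: intercept-only model, so the weighted coefficient
# is simply the weighted mean of y.
y = np.array([1.0, 0.0])
w = np.array([300.0, 100.0])
X = np.ones((2, 1))
print(wls(X, y, w))  # [0.75]
```

With state dummies, year dummies and expanded as the columns of X, the same function would return the weighted TWFE point estimate.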

Last edited by Clyde Schechter; 23 May 2023, 17:46.