Difference in Difference with state and year fixed effects

grant prinsen

Join Date: May 2018
Posts: 4


03 May 2018, 21:48

I have data on volunteer hours from 2010-2015 and I am looking at how the Medicaid expansion that took effect in 2014 has affected volunteer hours. I would like to know what code to use in Stata to run a difference-in-differences regression comparing the states that expanded in 2014 with the ones that didn't, with population as a control variable. My teacher told me to use a state and year fixed-effects regression, which I don't know how to do and can't figure out at the moment. This is just a sample of what my data looks like; I have it for all of the states. Any guidance on how to proceed would be helpful.

State	Year	Volunteer Hrs	Population	Treat=1
Alabama	2010	106.88593	4785579	0
Alabama	2011	100.810044	4798649	0
Alabama	2012	102.936671	4813946	0
Alabama	2013	116.417864	4827660	0
Alabama	2014	125.985802	4840037	0
Alabama	2015	98.70755	4850858	0
Florida	2010	434.844492	18846461	0
Florida	2011	458.679654	19097369	0
Florida	2012	475.948588	19341327	0
Florida	2013	438.743281	19584927	0
Florida	2014	495.717974	19897747	0
Florida	2015	444.646822	20268567	0
Arkansas	2010	53.0621463	2921737	0
Arkansas	2011	59.2638898	2938640	0
Arkansas	2012	45.6552647	2949208	0
Arkansas	2013	55.5923406	2956780	0
Arkansas	2014	59.0693401	2964800	1
Arkansas	2015	48.5332943	2975626	1
Colorado	2010	163.956951	5048029	0
Colorado	2011	144.91652	5116411	0
Colorado	2012	160.310765	5186330	0
Colorado	2013	122.830502	5262556	0
Colorado	2014	159.511402	5342311	1
Colorado	2015	148.322003	5440445	1


Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

03 May 2018, 22:23

If I have my facts straight, not all states that undertook Medicaid expansion did so in the same year. If that is correct, then you do not have a situation that lends itself to a classical difference-in-differences analysis. You can, instead, do generalized difference in differences.

You already have a variable, treat, that takes on the value 1 in a state that expanded in those years when the expansion had occurred. So this is your interaction term. As you will be using both state and year fixed effects, you do not need the usual "main effects" of treatment-group and pre-post, because they would be collinear with the state and year fixed effects anyway.

I don't know how you want to define your outcome variable. Perhaps you will calculate something like volunteer hours per 100,000 population. Anyway, you will have to decide on that before you can proceed.

That's the conceptual part. In terms of technical details of implementation in Stata, you will need to create a numeric encoded variable for the state, as you cannot use a string variable for a fixed effect. The rest is straightforward. So, it will look more or less like this:

Code:

gen outcome = expression to calculate the outcome variable here
encode state, gen(n_state)
xtset n_state year
xtreg outcome i.treat1 i.year, fe

But you really need to do some reading in your textbooks, or perhaps get a tutor, to help you understand what all of this means. You won't learn much just from marking up my code. Fixed effects regression is a basic procedure in the analysis of economic and financial data, and has many applications outside those disciplines as well. You'll need to learn it.

Also, in the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
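For example (a sketch only, assuming the variables are named State, Year, VolunteerHrs, and Population; adjust to the names in your data set), the following would produce a postable example from the first 12 observations:

```stata
* Produce a copy-and-paste-ready listing of the first 12 observations
dataex State Year VolunteerHrs Population in 1/12
```

Copy the resulting output from the Results window, including the [CODE] delimiters, and paste it into your post.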
grant prinsen

Join Date: May 2018

Posts: 4
#3

04 May 2018, 10:26

Thank you for the timely response. I tried installing the dataex command but it appears the University I'm at won't allow me to download it.
ssc install dataex
checking dataex consistency and verifying not already installed...
cannot write in directory \\cla-utility.ad.umn.edu\Profiles\All Users\ado\plus\d
r(603);

I calculated the outcome variable as,
-gen outcome = (VolunteerHrs/Population)*100000- for each state to get volunteer hours per 100,000 population.

I am confused about the Numeric Encoded Variable. Using the code -encode State, gen(n_State)- gave me another variable that was n_State which are the state names and then using -xtset n_State Year- it gave me an error that says -repeated time values within panel- and I am wondering how to get around that.

Also just to be clear the code that my teacher gave me was
-xtreg VolunteerHrs DID i.state i.year, fe-
and you're saying that I wouldn't need the DID variable because it would be collinear with the state and year fixed effects?
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#4

04 May 2018, 11:14

I am confused about the Numeric Encoded Variable. Using the code -encode State, gen(n_State)- gave me another variable that was n_State which are the state names and then using -xtset n_State Year- it gave me an error that says -repeated time values within panel- and I am wondering how to get around that.

No, the new variable n_State is not another variable which contains the state names. It contains numbers for the states, but they are labeled with the names so that when you -list- or -browse- they look like they are the state names. But they are actually numbers. And that is what -xtset- requires.
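A quick way to see this for yourself (a minimal sketch, assuming the variable names State and n_State used above; by default -encode- stores the names in a value label that has the same name as the new variable):

```stata
* n_State is stored as a number, with a value label attached
describe State n_State

* Show the underlying numbers instead of the labels
list State n_State in 1/5, nolabel

* Show the number-to-name mapping that -encode- created
label list n_State
```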

The message is self-explanatory: there is some state (or perhaps more than one) for which you have more than one observation in the same year. This would probably be an error in your data and you need to fix the data error. To find the offending observations run:

Code:

duplicates tag n_State Year, gen(flag)
browse if flag

If the observations are duplicates in all respects on all variables, then you can just drop all but one from each set of duplicates. But if they differ in other respects then you will have to figure out how to reconcile the differences and settle ultimately on a single observation for each. (At least I believe that is the case: you should only have one observation on each state in each year, right?) So fix that.
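If the surplus observations do turn out to be exact copies, the cleanup could look something like this (a sketch, assuming the variable names n_State and Year from above; it does nothing to resolve duplicates that differ on other variables):

```stata
* How many surplus observations are there per state-year?
duplicates report n_State Year

* Drop observations that are exact copies on every variable
duplicates drop

* Stops with an error if state-year still does not uniquely identify observations
isid n_State Year

xtset n_State Year
```

If -isid- still complains, the remaining duplicates disagree on at least one variable and have to be reconciled by hand.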

Also just to be clear the code that my teacher gave me was
-xtreg VolunteerHrs DID i.state i.year, fe-
and you're saying that I wouldn't need the DID variable because it would be collinear with the state and year fixed effects?

No, that's not what I'm saying. The variable you are calling DID here is identical to the variable you called treat1 earlier, and you will see that it is, indeed, included in the code I suggested. You don't need the i.state variable, however, because that is automatically taken care of with -fe-. As for the variables that I said you don't need back in #2, since you don't seem to fully understand the modeling here, just don't worry about them for now--they do not exist in your data set as you showed it, and you won't need to create them (whereas you would have needed to create them in a more classical difference-in-differences model.)

Last edited by Clyde Schechter; 04 May 2018, 11:17.
grant prinsen

Join Date: May 2018

Posts: 4
#5

04 May 2018, 11:51

Thank you again, I am clearly not very good at using Stata but I got it to work. I am unsure about the interpretation of this regression. Does this mean that in the treatment group volunteer hours went up by .14 million hours per 100,000 people in the population, and then for each individual year it went down by the coefficient next to each year? If you have any other inferences about this regression it would be helpful too.

Attached Files
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#6

04 May 2018, 13:15

I'll assume, though you don't say as much, that the unit of measurement in the volunteer-hours variable is millions of hours. I'll then assume that you calculated the outcome according to what you wrote in #3.

The interpretation would be that your difference-in-differences estimate of the effect of Medicaid expansion is an increase of 0.14 million volunteer hours per 100,000 population (95% CI, decrease of 0.07 to increase of 0.35).

As for the Year coefficients, each of those represents the expected difference in your outcome between the year shown in the output table and the base year of your analysis, which is 2010. So, irrespective of Medicaid expansion status, your model predicts an average decrease of 0.20 million volunteer hours per 100,000 population in 2011 compared to 2010. In 2012, there was an across-the-board decrease of 0.17 million volunteer hours per 100,000 population compared to 2010, etc.
grant prinsen

Join Date: May 2018

Posts: 4
#7

04 May 2018, 15:02

Thank you for all of these responses; they have been extremely helpful. I just have a couple of final questions for you. Could you try to explain why the state fixed effects can be dropped from the regression? Would this simply be called 'time fixed effect regression' rather than 'state and time fixed effect regression', since the form of the regression would be Y= β0 + β1*[DID] + β2*[Time fe] + ε, where Y is the outcome variable and DID is the Treat1 variable?
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#8

04 May 2018, 15:52

On Stata's Help menu, select PDF documentation and then open the [XT] volume. Read the chapters on -xtset- and on -xtreg-. I can answer your specific questions, but you need to understand what's going on.

What you will see is that the -xtreg- command assumes that you have previously -xtset- your data, and it automatically incorporates fixed effects for whatever variable was declared as the panel identifier in your -xtset- command. So, with -xtset state- followed by -xtreg whatever, fe-, you automatically have state fixed effects in the model. You will not get output for those effects, but they are meaningless and unnecessary in any case. The key thing is that they have been included and adjusted for.

Time fixed effects, however, are not automatically included by -xtreg-, which is why your command must include i.time if you want fixed effects for time as well. So the code you want (and used) is:

Code:

xtset n_State
xtreg outcome i.Treated1 i.Year, fe

and the underlying model for that is Y= β0 + β1*[DID] + β2*[Time fe] + β3*[State fe] + ε.
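One way to convince yourself that the state fixed effects really are in the model is to fit it with the state indicators written out and compare. This is only a sketch, assuming the variable names outcome, Treated1, Year, and n_State used above:

```stata
* Fixed-effects estimator: state effects absorbed, year indicators explicit
xtset n_State
xtreg outcome i.Treated1 i.Year, fe

* Same model with the state indicators written out explicitly
regress outcome i.Treated1 i.Year i.n_State

* Same model again, absorbing the state indicators without displaying them
areg outcome i.Treated1 i.Year, absorb(n_State)
```

The coefficients on Treated1 and on the Year indicators should agree across all three commands; only the reported constant and the amount of output differ.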