Difference-in-difference estimators

Yiling Xu

Join Date: Dec 2022
Posts: 9

02 Dec 2022, 19:02

Hello everyone, I am a beginner learning Stata, so I need your help. My data look like this:

id	treatment (time variable: 0 before treatment, 1 after treatment)	group (0 = New Jersey, 1 = Pennsylvania)	employment (dependent variable)
1	0	1	16
1	1	1	20
2	0	1	4
2	1	1	2
3	0	0	7
3	1	0	8
4	0	0	4
4	1	0	6

We are supposed to run a regression to estimate the effect of the policy in New Jersey, and employment is the number of people employed at each small restaurant in each state. My code is:

xtset id treatment

Stata said, "repeated time values within panel." Could you please give me some guidance on how to set up my panel data and run the regression to estimate the effect of the policy in New Jersey? Each state has many small restaurants, so I really do not know how to handle this case. Thanks in advance for the kind help, and good luck to you all today!

Clyde Schechter

Join Date: Apr 2014

Posts: 30068
#2

02 Dec 2022, 19:48

Well, Stata is telling you that somewhere in your full data set there is one or more id that has two or more observations with the same value of treatment. The problem is with your data. To find the offending observations, run:

Code:

duplicates tag id treatment, gen(flag)
browse if flag

Then you have to decide what to do about it. If those observations are actually correct (i.e., you really have more than one pre-treatment or more than one post-treatment observation on the same id), then you can't use -xtset id treatment-. You could still use -xtset id-, though; as I don't know what you intend to do beyond this point, that may or may not serve your purposes. If, however, you see that there are extra observations that shouldn't be there, then you need to get rid of them. Resist the temptation to just delete them from the data: you should rather review the data management that created this data set in the first place, find out how those spurious observations got there, and fix those coding errors. When you hunt down those coding errors, you may find others as well, and you should fix those, too. (If the data set was created by somebody else, ask them to do that.)
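
As a hedged sketch of the two routes described above: -duplicates report- summarizes how many id-treatment combinations are repeated, and if the repeats turn out to be legitimate, -xtset- can be given the panel variable alone. The variable names id and treatment are taken from post #1; whether a panel-only declaration is enough depends on the analysis planned afterwards.

```stata
* Summarize how many id-treatment combinations appear more than once
duplicates report id treatment

* If the repeated observations are legitimate, declare only the panel variable
* (no time variable is set, so time-series operators such as L. are unavailable)
xtset id
```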
Yiling Xu

Join Date: Dec 2022

Posts: 9
#3

03 Dec 2022, 05:09

Thanks for the timely and warm reply! I tried the code and found that those observations were correct, not duplicate data.

I am estimating the effect of the policy on employment in New Jersey (group = 0) compared with the other state, Pennsylvania. The dependent variable is the number of employed people, and the independent variables are the group dummy, the treatment dummy, and the interaction between the two. However, each state (New Jersey and Pennsylvania) contains many different restaurants, and we only care about restaurant employment, so the data we have are the number of employed people in each restaurant. That is what the rows in the data above represent.

In this case, if I want to estimate the effect of the policy in New Jersey, which includes many small restaurants, what can I do to see its impact? Thanks for the kind help!!! Your help means a lot to me!
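
The specification described above (employment regressed on the group dummy, the treatment dummy, and their interaction) can be written as the regression below. The subscripts (i for restaurant, t for period) and the coefficient names are notation introduced here for illustration, not taken from the post.

```latex
\[
\text{employment}_{it} = \beta_0 + \beta_1\,\text{group}_i + \beta_2\,\text{treatment}_t
  + \beta_3\,(\text{group}_i \times \text{treatment}_t) + \varepsilon_{it}
\]
```

Here \beta_3 is the difference-in-differences term: the before-to-after change in the group = 1 state minus the change in the group = 0 state. Under the coding in post #1 (group = 0 for New Jersey), that is the Pennsylvania change minus the New Jersey change, so the New Jersey effect would be -\beta_3 unless the dummy is recoded to equal 1 for New Jersey.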
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#4

03 Dec 2022, 05:26

Note that this is my opinion, and others will likely disagree with me, and that's cool. You should not xtset your data as you've specified above. Instead, you wanna do

Code:

qbys id: g time = _n
xtset id time, g

Because now you have a more natural coding of your time periods. Clyde Schechter is right, though: you'll need to figure out what to do with the duplicates. I don't know if you created this dataset from other sources or it was forced upon you, but, either way, you'll need to decide how to get rid of them. In my experience, if I made the dataset, the fault is usually my own! You can then use xtdidregress, or you can use normal OLS and specify the interaction term, if you'd like.

Code:

u "https://github.com/alopatina/Applied-Causal-Analysis/blob/master/DinD_ex.dta?raw=true", clear
cls
reg fte i.nj##i.after
bys sheet: g time = _n
g treat = cond(nj==1 & time == 1,1,0)
xtset sheet time, g
xtdidregress (fte) (treat), group(sheet) time(time)

Last edited by Jared Greathouse; 03 Dec 2022, 05:48.
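
To see what the OLS interaction approach does on data shaped like post #1, here is a minimal sketch that types in the eight example rows and runs the same kind of regression. The indicator nj is a name introduced here, built because post #1 codes New Jersey as group = 0. The eight rows are only the excerpt shown in the thread, so the numbers are illustrative and say nothing about the full data set.

```stata
* Toy data: the eight rows shown in post #1, with that post's variable names
clear
input id treatment group employment
1 0 1 16
1 1 1 20
2 0 1 4
2 1 1 2
3 0 0 7
3 1 0 8
4 0 0 4
4 1 0 6
end

* group is 0 for New Jersey, so build an indicator equal to 1 for the treated state
* (nj is a hypothetical name, not a variable in the original data)
generate nj = (group == 0)

* OLS difference-in-differences: the coefficient on 1.nj#1.treatment is the DiD estimate
regress employment i.nj##i.treatment
```

On these eight rows the coefficient on 1.nj#1.treatment is 0.5: mean employment rises by 1.5 in New Jersey (5.5 to 7) and by 1 in Pennsylvania (10 to 11). On the full data, clustering the standard errors by restaurant, for example with vce(cluster id), is a common choice.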
Yiling Xu

Join Date: Dec 2022

Posts: 9
#5

12 Dec 2022, 03:49

Thank you all! Sorry for the late reply. I have already figured out the problem.