Difference in Difference model

Malik Saendman

Join Date: Dec 2023
Posts: 14

28 Dec 2023, 13:07

Hello everyone. I am running a difference-in-differences model of job satisfaction among workers before and after COVID-19. Treatment applies to those who work from March 2020 onward. The treatment group is individuals who have jobs. I have cross-sectional waves from 2017 to 2021.
I set up the data by generating a monthly time variable (nd_monthly) from the interview month and year variables (intdatm_dv and intdaty_dv). -didregress- produces results, but when I run -estat trendplots- I get an error message (treatment assignment times vary).

Code:

gen nd_monthly = ym(intdaty_dv, intdatm_dv)
format nd_monthly %tm
gen work =  (job ==1)
gen treatment = 0
replace treatment = 1 if (work == 1)
gen post = ( nd_monthly >= ym(2020,3))
gen treatment_post = treatment * post

didregress (satis) ( treatment_post ), group(id) time( nd_monthly )
estat trendplots

I'm not sure where the error comes from, but I suspect the monthly time variable: when I use the wave variable instead of monthly time, the DiD works correctly in my full dataset (not with this sample here).

I am not sure how to deal with this. I need to use a monthly time variable. I appreciate any advice.
Thanks,

Data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id str1 wave byte(intdatd_dv intdatm_dv) int intdaty_dv float(satis job)
   22445 "j"  3  4 2018 6 1
  280165 "j"  2 12 2018 5 1
68011568 "j" 11  1 2018 6 1
68041488 "l"  4  2 2020 6 1
68044208 "l"  4  2 2020 6 1
68045568 "l" 12  1 2020 6 1
68060528 "j"  5  2 2018 6 1
68060528 "l" 10  2 2020 5 1
68097248 "l"  2  5 2020 5 1
68120368 "j" 12  2 2018 6 1
68142888 "l" 11  3 2020 7 1
68150968 "l" 10  4 2020 6 1
68150976 "j" 24  4 2018 6 1
68160488 "l" 15  1 2020 4 1
68180888 "j" 28  1 2018 7 1
68180888 "l"  9  2 2020 4 1
68180888 "j"  9  2 2018 5 1
68213528 "j" 10  4 2018 5 1
68216248 "j" 10  1 2018 6 1
68216248 "l"  5  2 2020 6 1
68216248 "l" 22  1 2020 2 1
68216248 "j" 25  1 2018 3 1
68288328 "j"  7  3 2018 5 1
68293088 "j" 10  1 2018 6 1
68293096 "j" 21  1 2018 6 1
68294448 "l" 15  1 2020 5 1
68295128 "l" 24  1 2020 3 1
68333208 "l" 10  3 2020 6 1
68333208 "j"  5  3 2018 5 1
68340072 "l"  6  2 2020 3 1
68340072 "j" 10  2 2018 6 1
68364488 "l" 16  7 2020 6 1
68395768 "j" 23  2 2018 7 1
68430448 "l" 16  1 2020 5 1
68453568 "j"  7  4 2018 5 1
68501168 "l"  7  3 2020 5 1
68501168 "j" 25  2 2018 6 1
68501168 "l" 29  2 2020 7 1
68501168 "j" 26  2 2018 6 1
68545368 "j" 23  2 2018 6 1
68565088 "l" 23  2 2020 6 1
68615408 "j" 27  2 2018 6 1
68646096 "j" 30  5 2018 5 1
68646096 "l" 12  3 2020 6 1
68710608 "l" 23  2 2020 7 1
68740600 "l" 20  4 2020 6 1
68754808 "l"  4  3 2020 6 1
68781328 "l" 20  1 2020 5 1
68785408 "j" 16  2 2018 5 1
68794256 "l" 13  3 2020 6 1
 29925 "l"  8 9 2020 5 0
   29925 "j" 29 7 2018 7 0
   76165 "j" 17 3 2018 5 0
   76165 "l"  1 4 2020 5 0
  333205 "j" 15 5 2018 6 0
  469205 "l"  5 5 2020 6 0
  469205 "j" 28 6 2018 7 0
  599765 "j" 14 1 2018 6 0
  665045 "l" 27 4 2020 4 0
  665045 "j"  1 5 2018 4 0
 4849085 "j" 28 4 2018 1 0
 4849085 "l"  3 4 2020 3 0
 4853165 "j" 30 1 2019 6 0
68008848 "j"  8 6 2018 7 0
68008848 "l"  8 3 2020 7 0
68010888 "l" 10 3 2020 6 0
68014288 "j" 20 6 2018 6 0
68021768 "j" 21 5 2018 7 0
68021784 "j" 28 5 2018 6 0
68029928 "j" 21 3 2018 7 0
68035368 "j" 30 4 2018 6 0
68035368 "l" 12 3 2020 6 0
68037408 "j" 30 1 2018 5 0
68042168 "j"  7 2 2018 3 0
68042168 "l" 11 3 2020 4 0
68042168 "j"  6 2 2018 6 0
68044208 "j" 12 2 2018 4 0
68045568 "j" 28 1 2018 6 0
68046928 "j"  6 4 2018 6 0
68046936 "j"  6 4 2018 4 0
68049648 "l"  8 1 2020 5 0
68049648 "j" 12 1 2018 4 0
68051008 "l" 13 1 2020 6 0
68056448 "j"  6 2 2018 4 0
68056448 "l" 22 1 2020 1 0
68056448 "l" 10 1 2020 7 0
68056456 "l" 15 1 2020 6 0
68060528 "l" 10 2 2020 5 0
68060528 "j"  5 2 2018 5 0
68060536 "j" 24 2 2018 5 0
68061288 "l" 28 2 2020 3 0
68061288 "j" 23 2 2018 5 0
68063248 "l" 28 1 2020 7 0
68063248 "j" 25 1 2018 7 0
68063256 "l" 19 2 2020 7 0
68063928 "l" 31 1 2020 6 0
68063928 "j" 26 1 2018 6 0
68063928 "l"  2 2 2020 3 0
68063928 "j" 27 1 2018 4 0
68063936 "l" 16 2 2020 4 0

end
label values intdatd_dv i_intdatd_dv
label values intdatm_dv i_intdatm_dv
label def i_intdatm_dv 1 "January", modify
label def i_intdatm_dv 2 "February", modify
label def i_intdatm_dv 3 "March", modify
label def i_intdatm_dv 4 "April", modify
label def i_intdatm_dv 5 "May", modify
label def i_intdatm_dv 7 "July", modify
label def i_intdatm_dv 12 "December", modify
label values intdaty_dv i_intdaty_dv
label values satis j_satis
label def j_satis 2 "mostly dissatisfied", modify
label def j_satis 3 "somewhat dissatisfied", modify
label def j_satis 4 "neither satisfied or dissatisfied", modify
label def j_satis 5 "somewhat satisfied", modify
label def j_satis 6 "mostly satisfied", modify
label def j_satis 7 "completely satisfied", modify
label values job job_1
label def job_1 1 "Yes mentioned", modify
label def job_1 0 "Not mentioned", modify

Clyde Schechter

Join Date: Apr 2014

Posts: 30141
#2

28 Dec 2023, 14:09

The problem is that your data has some problems.

The one that -estat ptrends- is picking up on is that different id's in this study begin their treatment in different months. You claim that they all start in March 2020. But that's not true in the example data. Some of them don't have a post = 1 observation until April, May, July, or even September of 2020. Now, these anomalies arise because your data has lots of gaps. For example, id 29925, whose first observation with post = 1 isn't until September 2020 has no observations between July 2018 and September 2020. But -estat ptrends- can't handle this kind of situation.
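
One way to see this in the data is to compute, for each id, the first month in which it is observed with post = 1. This is only a sketch: it assumes nd_monthly and post already exist as created in #1, and the names first_post and tag_id are made up for illustration.

```stata
* Assumes nd_monthly and post exist as created in #1.
* first_post and tag_id are hypothetical helper names.

* First month in which each id is observed with post == 1 (missing if never)
bysort id: egen first_post = min(cond(post == 1, nd_monthly, .))
format first_post %tm

* Keep one row per id and tabulate the first post-period month
egen byte tag_id = tag(id)
tabulate first_post if tag_id, missing
```

If every id really started in March 2020, the table would show a single non-missing value, 2020m3.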

You might try to fill in the gaps in the data, but I suspect that these gaps really represent intermittent participation in the survey, and filling them in will be impossible. So I think you have to give up on using -estat ptrends-. Instead, you can just manually create a parallel trends plot, the way everybody did it before -estat ptrends- came into existence.

Code:

collapse (mean) satis, by(treatment nd_monthly)
reshape wide satis, i(nd_monthly) j(treatment)
label var satis0 "Untreated"
label var satis1 "Treated"
graph twoway connect satis* nd_monthly if nd_monthly < tm(2020m3), sort

With the example data, this plot does not look like parallel trends, although the sample is so small and the data so noisy that it really isn't suitable. In your full data set, I expect this will give you a meaningful plot.

There is another problem with your data. You have numerous id nd_monthly combinations that appear more than once. And, worse, the values of many of the other variables contradict each other in these observations. So you have a data set that is rife with inconsistencies.
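
A quick way to locate the repeated id nd_monthly combinations, sketched here on the assumption that nd_monthly has been created as in #1 (the variable name dup is made up for illustration):

```stata
* Count how many id/nd_monthly combinations occur more than once
duplicates report id nd_monthly

* Flag every observation that belongs to a repeated combination
* (dup is a hypothetical variable name)
duplicates tag id nd_monthly, generate(dup)

* List the flagged observations side by side to see the conflicting values
sort id nd_monthly
list id wave nd_monthly satis job if dup > 0, sepby(id)
```

Observations with dup > 0 that disagree on satis or job are the ones that need to be resolved before the analysis can be trusted.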
Comment
Malik Saendman

Join Date: Dec 2023

Posts: 14
#3

30 Dec 2023, 08:57

Thanks for your advice

I got these graphs when I manually created a parallel trends plot, which I think violates the parallel trends assumption (non-parallel trends). Right?

Also, when I ran the basic DiD model with -reg-, I got different results from -didregress-. I tried to work out what the problem in my data is, but I could not. Could you help me find out what the problem is and how I can fix it?

Attached Files
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30141
#4

30 Dec 2023, 09:23

There are two graphs shown in #3. Of them, only the first appears to be a pre-covid graph of trends, so I'll just ignore the other. I think it falls into a "grey" area with regard to parallel trends. The data are fairly noisy. Viewed as a "big picture," both curves appear to be pretty much flat, which would support parallel trends. But there is a hint of a small upward trend in the treated group which would seem to contradict parallel trends. I'm not sure if that hint of small upward trend is real or if I'm just imagining it. Even if it exists, it is very small, much smaller than the noisiness of the graphs. I'm inclined to say that for practical purposes you are seeing parallel trends here. I have enough uncertainty about this that I would want to see a parallel-trends regression. To do that, run the same -collapse- command that you did before the graphing, but don't -reshape-. Then run -regress satis i.treatment##c.nd_monthly if nd_monthly < tm(2020m3)-. The coefficient of 1.treatment#c.nd_monthly will be the difference in slopes between the satis trends in the treated and untreated groups.
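
Written out as commands, the steps just described might look like the following sketch. It assumes treatment and nd_monthly exist as created in #1. The -preserve- and -restore- pair is an addition, so that the collapsed data do not replace the full data set.

```stata
* Assumes treatment and nd_monthly exist as created in #1
preserve

* Monthly mean satisfaction by treatment group (no -reshape- this time)
collapse (mean) satis, by(treatment nd_monthly)

* Pre-period only: group-specific linear trends in satis
regress satis i.treatment##c.nd_monthly if nd_monthly < tm(2020m3)

restore
```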

With regard to the -didregress- and -regress- results, your -regress- command is not the way to emulate what -didregress- does. You want

Code:

xtset id
xtreg satis i.treatment_post i.nd_monthly, fe vce(cluster id)
Comment
Malik Saendman

Join Date: Dec 2023

Posts: 14
#5

30 Dec 2023, 21:09

I ran the parallel-trends regression and this is what I got:
Attached Files
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30141
#6

30 Dec 2023, 23:51

So, in the untreated group the slope of the pre-treatment curve is, to 4 decimal places, -0.0009. In the treated group it is -0.0009 + 0.0028 = 0.0019. So there is a small amount of divergence here, but it is quite tiny, especially when compared to the vertical separation between the curves of 4.158. I would consider this quite acceptable as parallel trends.
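
The treated-group slope can also be obtained from Stata directly, with a standard error, rather than by adding the coefficients by hand. This is a sketch that assumes the -regress- results from the parallel-trends regression are still in memory:

```stata
* Run immediately after the parallel-trends -regress- command,
* while its results are still the active estimates.

* Slope in the untreated group is the coefficient on nd_monthly;
* slope in the treated group adds the interaction coefficient.
lincom nd_monthly + 1.treatment#c.nd_monthly

* The hand calculation from the rounded coefficients, for comparison
display -0.0009 + 0.0028
```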