  • Difference in Difference model

    Hello everyone. I am running a difference-in-differences analysis of job satisfaction among workers before and after COVID-19. The post-treatment period is March 2020 onward, and the treatment group consists of individuals who have jobs. I have cross-sectional waves covering 2017-2021.
    I set up the data by generating a monthly time variable (nd_monthly) from the interview month and year variables (intdatm_dv and intdaty_dv). -didregress- produces results, but when I run -estat trendplots- I get an error message (treatment assignment times vary).

    Code:
    * Monthly date built from interview year and month
    gen nd_monthly = ym(intdaty_dv, intdatm_dv)
    format nd_monthly %tm
    * Treatment group: respondents who report having a job
    gen work = (job == 1)
    gen treatment = 0
    replace treatment = 1 if (work == 1)
    * Post period: March 2020 onward
    gen post = (nd_monthly >= ym(2020,3))
    gen treatment_post = treatment * post
    
    didregress (satis) (treatment_post), group(id) time(nd_monthly)
    estat trendplots
    I'm not sure where the error comes from, but I guess it comes from the monthly time variable, because when I used the wave variable instead of monthly time, the DiD worked correctly in my full dataset (not with this sample here).


    I am not sure how to deal with this. I need to use a monthly time variable. I appreciate any advice.
    Thanks,


    Data:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id str1 wave byte(intdatd_dv intdatm_dv) int intdaty_dv float(satis job)
       22445 "j"  3  4 2018 6 1
      280165 "j"  2 12 2018 5 1
    68011568 "j" 11  1 2018 6 1
    68041488 "l"  4  2 2020 6 1
    68044208 "l"  4  2 2020 6 1
    68045568 "l" 12  1 2020 6 1
    68060528 "j"  5  2 2018 6 1
    68060528 "l" 10  2 2020 5 1
    68097248 "l"  2  5 2020 5 1
    68120368 "j" 12  2 2018 6 1
    68142888 "l" 11  3 2020 7 1
    68150968 "l" 10  4 2020 6 1
    68150976 "j" 24  4 2018 6 1
    68160488 "l" 15  1 2020 4 1
    68180888 "j" 28  1 2018 7 1
    68180888 "l"  9  2 2020 4 1
    68180888 "j"  9  2 2018 5 1
    68213528 "j" 10  4 2018 5 1
    68216248 "j" 10  1 2018 6 1
    68216248 "l"  5  2 2020 6 1
    68216248 "l" 22  1 2020 2 1
    68216248 "j" 25  1 2018 3 1
    68288328 "j"  7  3 2018 5 1
    68293088 "j" 10  1 2018 6 1
    68293096 "j" 21  1 2018 6 1
    68294448 "l" 15  1 2020 5 1
    68295128 "l" 24  1 2020 3 1
    68333208 "l" 10  3 2020 6 1
    68333208 "j"  5  3 2018 5 1
    68340072 "l"  6  2 2020 3 1
    68340072 "j" 10  2 2018 6 1
    68364488 "l" 16  7 2020 6 1
    68395768 "j" 23  2 2018 7 1
    68430448 "l" 16  1 2020 5 1
    68453568 "j"  7  4 2018 5 1
    68501168 "l"  7  3 2020 5 1
    68501168 "j" 25  2 2018 6 1
    68501168 "l" 29  2 2020 7 1
    68501168 "j" 26  2 2018 6 1
    68545368 "j" 23  2 2018 6 1
    68565088 "l" 23  2 2020 6 1
    68615408 "j" 27  2 2018 6 1
    68646096 "j" 30  5 2018 5 1
    68646096 "l" 12  3 2020 6 1
    68710608 "l" 23  2 2020 7 1
    68740600 "l" 20  4 2020 6 1
    68754808 "l"  4  3 2020 6 1
    68781328 "l" 20  1 2020 5 1
    68785408 "j" 16  2 2018 5 1
    68794256 "l" 13  3 2020 6 1
     29925 "l"  8 9 2020 5 0
       29925 "j" 29 7 2018 7 0
       76165 "j" 17 3 2018 5 0
       76165 "l"  1 4 2020 5 0
      333205 "j" 15 5 2018 6 0
      469205 "l"  5 5 2020 6 0
      469205 "j" 28 6 2018 7 0
      599765 "j" 14 1 2018 6 0
      665045 "l" 27 4 2020 4 0
      665045 "j"  1 5 2018 4 0
     4849085 "j" 28 4 2018 1 0
     4849085 "l"  3 4 2020 3 0
     4853165 "j" 30 1 2019 6 0
    68008848 "j"  8 6 2018 7 0
    68008848 "l"  8 3 2020 7 0
    68010888 "l" 10 3 2020 6 0
    68014288 "j" 20 6 2018 6 0
    68021768 "j" 21 5 2018 7 0
    68021784 "j" 28 5 2018 6 0
    68029928 "j" 21 3 2018 7 0
    68035368 "j" 30 4 2018 6 0
    68035368 "l" 12 3 2020 6 0
    68037408 "j" 30 1 2018 5 0
    68042168 "j"  7 2 2018 3 0
    68042168 "l" 11 3 2020 4 0
    68042168 "j"  6 2 2018 6 0
    68044208 "j" 12 2 2018 4 0
    68045568 "j" 28 1 2018 6 0
    68046928 "j"  6 4 2018 6 0
    68046936 "j"  6 4 2018 4 0
    68049648 "l"  8 1 2020 5 0
    68049648 "j" 12 1 2018 4 0
    68051008 "l" 13 1 2020 6 0
    68056448 "j"  6 2 2018 4 0
    68056448 "l" 22 1 2020 1 0
    68056448 "l" 10 1 2020 7 0
    68056456 "l" 15 1 2020 6 0
    68060528 "l" 10 2 2020 5 0
    68060528 "j"  5 2 2018 5 0
    68060536 "j" 24 2 2018 5 0
    68061288 "l" 28 2 2020 3 0
    68061288 "j" 23 2 2018 5 0
    68063248 "l" 28 1 2020 7 0
    68063248 "j" 25 1 2018 7 0
    68063256 "l" 19 2 2020 7 0
    68063928 "l" 31 1 2020 6 0
    68063928 "j" 26 1 2018 6 0
    68063928 "l"  2 2 2020 3 0
    68063928 "j" 27 1 2018 4 0
    68063936 "l" 16 2 2020 4 0
    
    end
    label values intdatd_dv i_intdatd_dv
    label values intdatm_dv i_intdatm_dv
    label def i_intdatm_dv 1 "January", modify
    label def i_intdatm_dv 2 "February", modify
    label def i_intdatm_dv 3 "March", modify
    label def i_intdatm_dv 4 "April", modify
    label def i_intdatm_dv 5 "May", modify
    label def i_intdatm_dv 7 "July", modify
    label def i_intdatm_dv 12 "December", modify
    label values intdaty_dv i_intdaty_dv
    label values satis j_satis
    label def j_satis 2 "mostly dissatisfied", modify
    label def j_satis 3 "somewhat dissatisfied", modify
    label def j_satis 4 "neither satisfied or dissatisfied", modify
    label def j_satis 5 "somewhat satisfied", modify
    label def j_satis 6 "mostly satisfied", modify
    label def j_satis 7 "completely satisfied", modify
    label values job job_1
    label def job_1 1 "Yes mentioned", modify
    label def job_1 0 "Not mentioned", modify

  • #2
    The error stems from problems in your data.

    The one that -estat ptrends- is picking up on is that different ids in this study begin their treatment in different months. You say that they all start in March 2020, but that's not true in the example data: some of them don't have a post = 1 observation until April, May, July, or even September of 2020. These anomalies arise because your data has lots of gaps. For example, id 29925, whose first observation with post = 1 isn't until September 2020, has no observations between July 2018 and September 2020. -estat ptrends- can't handle this kind of situation.
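
    One way to check this in the full data is to compute, for each id, the first month in which the treatment variable switches on, and then tabulate those months. This is only a sketch: it assumes the variables created in #1 (id, nd_monthly, treatment_post) are in memory, and first_treat and one_per_id are hypothetical names chosen for illustration. If more than one month appears in the table, treatment onset varies across ids.

    ```stata
    * Hypothetical helper variables: first_treat, one_per_id
    * First month with treatment_post == 1 for each id (missing if never treated)
    bysort id: egen first_treat = min(cond(treatment_post == 1, nd_monthly, .))
    format first_treat %tm

    * Count each id once, then tabulate the onset months
    egen one_per_id = tag(id)
    tabulate first_treat if one_per_id
    ```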

    You might try to fill in the gaps in the data, but I suspect that these gaps really represent intermittent participation in the survey, and filling them in will be impossible. So I think you have to give up on using -estat ptrends-. Instead, you can just manually create a parallel trends plot, the way everybody did it before -estat ptrends- came into existence.

    Code:
    collapse (mean) satis, by(treatment nd_monthly)
    reshape wide satis, i(nd_monthly) j(treatment)
    label var satis0 "Untreated"
    label var satis1 "Treated"
    
    graph twoway connect satis* nd_monthly if nd_monthly < tm(2020m3), sort
    With the example data, this plot does not look like parallel trends, although the sample is so small and the data so noisy that it really isn't suitable. In your full data set, I expect this will give you a meaningful plot.

    There is another problem with your data. You have numerous id nd_monthly combinations that appear more than once. And, worse, the values of many of the other variables contradict each other in these observations. So you have a data set that is rife with inconsistencies.
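
    A quick way to locate those repeated combinations is sketched below. It assumes the example data from #1 with nd_monthly already generated; dup is a hypothetical variable name.

    ```stata
    * How many (id, nd_monthly) pairs occur more than once?
    duplicates report id nd_monthly

    * Flag the repeated pairs (dup counts the extra copies) and inspect the conflicts
    duplicates tag id nd_monthly, generate(dup)
    sort id nd_monthly
    list id nd_monthly wave satis job if dup > 0, sepby(id)
    ```

    Rows listed together for the same id and month but with different satis or job values are the contradictory observations.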

    Comment


    • #3
      In reply to #2, originally posted by Clyde Schechter:
      Thanks for your advice.

      I got these graphs when I manually created a parallel trends plot. I think they violate the parallel trends assumption (non-parallel trends). Is that right?

      [Attached image: Graph.jpg]



      Also, when I ran the basic DiD model with -reg-, I got different results from -didregress-. I tried to work out what the problem in my data is, but could not. Could you help me find the problem and how to fix it?


      • #4
        There are two graphs shown in #3. Of them, only the first appears to be a pre-covid graph of trends, so I'll just ignore the other. I think it falls into a "grey" area with regard to parallel trends. The data are fairly noisy. Viewed as a "big picture," both curves appear to be pretty much flat, which would support parallel trends. But there is a hint of a small upward trend in the treated group, which would seem to contradict parallel trends. I'm not sure if that hint of a small upward trend is real or if I'm just imagining it. Even if it exists, it is very small, much smaller than the noisiness of the graphs. I'm inclined to say that for practical purposes you are seeing parallel trends here.

        I have enough uncertainty about this that I would want to see a parallel-trends regression. To do that, run the same -collapse- command that you did before the graphing, but don't -reshape-. Then run -regress satis i.treatment##c.nd_monthly if nd_monthly < tm(2020m3)-. The coefficient of 1.treatment#c.nd_monthly will be the difference in slopes between the satis trends in the treated and untreated groups.
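
        Put together, the steps just described might look like the following sketch. It assumes the individual-level data with satis, treatment, and nd_monthly are in memory; -preserve- and -restore- are added here only so the original data are not lost, and the cutoff is March 2020 as in #1.

        ```stata
        preserve
        * Group-by-month means, kept in long form (no reshape)
        collapse (mean) satis, by(treatment nd_monthly)
        * Pre-period only; 1.treatment#c.nd_monthly is the difference in slopes
        regress satis i.treatment##c.nd_monthly if nd_monthly < tm(2020m3)
        restore
        ```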

        With regard to the -didregress- and -regress- results, your -regress- command is not the way to emulate what -didregress- does. You want
        Code:
        xtset id
        xtreg satis i.treatment_post  i.nd_monthly, fe  vce(cluster id)


        • #5
          In reply to #4, originally posted by Clyde Schechter:
          I ran the parallel-trends regression; the output is in the attached file.


          • #6
            So, in the untreated group, the slope of the pre-treatment curve is -0.0009 (to 4 decimal places). In the treated group it is -0.0009 + 0.0028 = 0.0019. So there is a small amount of divergence here, but it is quite tiny, especially when compared to the vertical separation between the curves of 4.158. I would consider this quite acceptable as parallel trends.
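
            The treated-group slope can also be obtained with a standard error, rather than by adding coefficients by hand. A minimal sketch, assuming the collapsed group-by-month data from #4 are in memory:

            ```stata
            * Assumes the collapsed data (mean satis by treatment and nd_monthly) are in memory
            quietly regress satis i.treatment##c.nd_monthly if nd_monthly < tm(2020m3)

            * Untreated slope plus the interaction term = slope in the treated group
            lincom nd_monthly + 1.treatment#c.nd_monthly
            ```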
