  • Running difference-in-differences on three-period panel data

    For my study, we collected data on the same individuals at three time points: baseline, midline and endline. I am trying to run a difference-in-differences (DiD) analysis but am running into problems with most of the recommended commands. I want to see whether the number of employees in the businesses has risen over the period and whether this change is significant.

    The treatment in my study was delivered at the individual level, so there are no groups to define other than treatment and control. I have generated a dummy variable called treatment, where 1 denotes the businesses that received support and 0 the control group. The variable ID is a unique identifier for each business (in the extract below, businesses are identified by WIDU_Project_number). The variable tranche is the time variable, taking the values 1, 2 and 3 for baseline, midline and endline respectively. The variable total_employee is the outcome: the number of employees in the business at each of the three stages. Here is a snapshot of my data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double total_employee byte tranche float treatment double WIDU_Project_number
     3 1 0  507
     3 2 0  507
     4 3 0  507
     1 1 0  516
     6 2 0  516
     1 3 0  516
     2 1 0  525
     2 2 0  525
     2 3 0  525
     4 1 0  595
     4 2 0  595
     5 3 0  595
     3 1 0  657
     3 2 0  657
     3 3 0  657
     3 1 0  740
     3 2 0  740
     3 3 0  740
     1 1 0  755
     1 2 0  755
     1 3 0  755
     4 1 0  795
     2 2 0  795
     3 3 0  795
     1 1 0  820
     6 2 0  820
     6 3 0  820
     5 1 0  822
     0 2 0  822
     3 3 0  822
     4 1 0  848
     1 2 0  848
     2 3 0  848
     1 1 0  889
     1 2 0  889
     3 3 0  889
     3 1 0  899
     3 2 0  899
     3 3 0  899
     2 1 0  913
     2 2 0  913
     3 3 0  913
     2 1 0  925
     2 2 0  925
     4 3 0  925
     3 1 0  936
     3 2 0  936
     3 3 0  936
     8 1 0  939
     6 2 0  939
    12 3 0  939
     3 1 0  956
     3 2 0  956
     3 3 0  956
     3 1 0  957
     3 2 0  957
     4 3 0  957
     1 1 0  968
     3 2 0  968
     3 3 0  968
     1 1 0  973
     1 2 0  973
     1 3 0  973
     1 1 0 1008
     1 2 0 1008
     1 3 0 1008
     1 1 0 1044
     2 2 0 1044
     6 3 0 1044
     2 1 0 1060
     5 2 0 1060
     5 3 0 1060
     6 1 0 1067
     3 2 0 1067
     8 3 0 1067
     7 1 0 1311
     4 2 0 1311
     8 3 0 1311
     4 1 0 1335
     7 2 0 1335
     6 3 0 1335
     1 1 0 1342
     2 2 0 1342
     5 3 0 1342
     6 1 0 1347
     5 2 0 1347
     6 3 0 1347
     9 1 0 1366
    10 2 0 1366
     9 3 0 1366
     3 1 0 1368
     2 2 0 1368
     3 3 0 1368
     2 1 0 1416
     3 2 0 1416
     2 3 0 1416
     2 1 0 1434
     3 2 0 1434
     0 3 0 1434
     7 1 0 1514
    end
    I began with the diff command, adding dummy variables ib, im and ie for the baseline, midline and endline observations to the covariate list (as the command's help page recommends for multiple time periods). However, when I run the regression, the number of observations in the "Before" (baseline) column is zero. I am not sure what is happening here, or whether I am losing data. Here is the output:

    diff total_employee, t(treatment) period(tranche) cov(ib im ie)
    DIFFERENCE-IN-DIFFERENCES WITH COVARIATES

    DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
    Number of observations in the DIFF-IN-DIFF: 1456
                Before      After
       Control: 0           239         239
       Treated: 0           249         249
                0           488
    --------------------------------------------------------
     Outcome var.   | total~e | S. Err. |   |t|   |  P>|t|
    ----------------+---------+---------+---------+---------
    Before          |         |         |         |
       Control      | 3.425   |         |         |
       Treated      | 2.854   |         |         |
       Diff (T-C)   | -0.571  | 0.435   | -1.31   | 0.190
    After           |         |         |         |
       Control      | 3.491   |         |         |
       Treated      | 3.272   |         |         |
       Diff (T-C)   | -0.219  | 0.260   | 0.84    | 0.400
                    |         |         |         |
    Diff-in-Diff    | 0.352   | 0.202   | 1.74    | 0.081*
    --------------------------------------------------------
    R-square:    0.01
    * Means and Standard Errors are estimated by linear regression
    **Inference: *** p<0.01; ** p<0.05; * p<0.1
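A likely reason for the empty "Before" column, assuming the user-written diff command (from SSC) expects period() to be a 0/1 dummy as its help file describes: tranche takes the values 1, 2 and 3, so no observation has period 0, and only tranche 1 ends up in the "After" column. The minimal sketch below uses four businesses from the extract posted later in this thread, builds a hypothetical 0/1 variable post, and necessarily gives up the midline, since diff compares only two periods.

```stata
* Four businesses taken from the extract posted later in this thread
clear
input double WIDU_Project_number float treatment byte tranche double total_employee
4183 1 3 1
4183 1 2 1
4183 1 1 3
4214 0 3 0
4214 0 1 2
4214 0 2 2
4288 0 2 3
4288 0 3 5
4288 0 1 3
4391 1 3 6
4391 1 1 1
4391 1 2 3
end

* tranche is coded 1/2/3, so it cannot serve directly as a 0/1 period() variable
tabulate tranche treatment

* Baseline-vs-endline 2x2: drop the midline and build a 0/1 period dummy
drop if tranche == 2
generate byte post = (tranche == 3)

* diff is user-written: ssc install diff
diff total_employee, t(treatment) p(post)
```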

    Then I tried the didregress command. Since the treatment was delivered at the individual level, I specified the ID variable in the group() option. However, the command does not run and produces this error:

    didregress (total_employee) (treatment), group(ID) time(tranche)
    note: treatment omitted because of collinearity.
    model is not identified
    The treatment variable treatment was omitted because of collinearity.

    I checked for collinearity using vif, and no variable had a value above 2, so I am not sure why this message appears. The csdid command also does not run on my data. Any advice on how to proceed with this analysis would be appreciated.
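A hedged sketch of why didregress omits treatment: Stata's didregress (Stata 17 or later) expects the treatment variable to equal 1 only for treated units in treated periods. A dummy that is constant within each business is perfectly collinear with the group(ID) fixed effects, and a vif check on a regression without those business fixed effects would not reveal this. The sketch assumes support was delivered between baseline and midline, so treated businesses count as treated from tranche 2 onward; treated_post is a hypothetical variable name, and the data are four businesses from the extract posted later in the thread.

```stata
clear
input double WIDU_Project_number float treatment byte tranche double total_employee
4183 1 3 1
4183 1 2 1
4183 1 1 3
4214 0 3 0
4214 0 1 2
4214 0 2 2
4288 0 2 3
4288 0 3 5
4288 0 1 3
4391 1 3 6
4391 1 1 1
4391 1 2 3
end

* treatment is constant within each business, so the group fixed effects absorb it.
* Build an indicator that switches on only in post-treatment periods
* (assumption: support was received between baseline and midline).
generate byte treated_post = (treatment == 1 & tranche >= 2)

* All three periods enter the model; midline and endline are the treated periods
didregress (total_employee) (treated_post), group(WIDU_Project_number) time(tranche)
```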

  • #2
    search jwdid

    didregress (or xtdidregress, in your case) is a 2x2 DiD method, I think.

    You have no treated units in your dataex.



    • #3
      Thanks for the recommendation. I tried running jwdid, but the results do not look right: the output reports only a constant and no treatment effect.

      jwdid total_employee, tvar(tranche) gvar(treatment)
      WARNING: Singleton observations not dropped; statistical significance is biased (link)
      (MWFE estimator converged in 1 iterations)

      HDFE Linear regression                            Number of obs   =        720
      Absorbing 1 HDFE group                            F(   0,    717) =          .
                                                        Prob > F        =          .
                                                        R-squared       =     0.0003
                                                        Adj R-squared   =    -0.0025
                                                        Within R-sq.    =     0.0000
                                                        Root MSE        =     3.2325

      ------------------------------------------------------------------------------
      total_empl~e | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
             _cons |   3.511111   .1204669    29.15   0.000     3.274601    3.747621
      ------------------------------------------------------------------------------

      Absorbed degrees of freedom:
      -----------------------------------------------------+
       Absorbed FE | Categories  - Redundant  = Num. Coefs |
      -------------+---------------------------------------|
           tranche |         3           0           3     |
      -----------------------------------------------------+

      Here is another extract of the data, this time with a mix of treated and control businesses:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input double WIDU_Project_number float treatment byte tranche double total_employee
      4183 1 3  1
      4183 1 2  1
      4183 1 1  3
      4214 0 3  0
      4214 0 1  2
      4214 0 2  2
      4288 0 2  3
      4288 0 3  5
      4288 0 1  3
      4312 0 1  6
      4312 0 3  4
      4312 0 2  4
      4333 0 1  5
      4333 0 3  6
      4333 0 2  6
      4391 1 3  6
      4391 1 1  1
      4391 1 2  3
      4409 0 2  6
      4409 0 3  6
      4409 0 1  4
      4478 0 1  3
      4478 0 3  3
      4478 0 2  3
      4509 0 3  3
      4509 0 1  4
      4509 0 2  4
      4517 0 3 13
      4517 0 2 11
      4517 0 1 11
      4545 0 2  4
      4545 0 1  4
      4545 0 3  1
      4557 0 1  2
      4557 0 2  3
      4557 0 3  2
      4608 0 2  3
      4608 0 3  3
      4608 0 1  3
      4619 0 1  2
      4619 0 2  1
      4619 0 3  1
      4625 0 3 12
      4625 0 1 20
      4625 0 2 18
      4663 0 3  1
      4663 0 1  2
      4663 0 2  2
      4771 0 2  3
      4771 0 3  3
      4771 0 1  0
      4845 0 3  3
      4845 0 2  3
      4845 0 1  4
      4982 0 2  1
      4982 0 3  2
      4982 0 1  1
      5156 1 1  4
      5156 1 3  3
      5167 0 2  2
      5167 0 1  4
      5167 0 3  1
      5232 1 3  1
      5232 1 2  1
      5232 1 1  4
      5265 0 2  3
      5265 0 3  4
      5265 0 1  4
      5302 0 2  6
      5302 0 3  5
      5302 0 1 10
      5316 0 1  1
      5316 0 2  1
      5316 0 3  1
      5340 0 3  2
      5340 0 1  1
      5340 0 2  2
      5380 0 1  4
      5380 0 3  6
      5380 0 2  2
      5479 1 3  1
      5479 1 1  4
      5479 1 2  2
      5537 0 2  3
      5537 0 3  0
      5537 0 1  0
      5630 1 2  1
      5630 1 1  0
      5630 1 3  3
      5702 0 1  4
      5702 0 2  3
      5702 0 3  3
      5716 0 3  4
      5716 0 1  2
      5716 0 2  3
      5752 1 2  2
      5752 1 3  2
      5752 1 1  3
      end
      My main objective is to run the DiD with all three periods included. didregress with only the baseline and endline observations produces results, but I want to use the midline data as well. So far I have not found a command that lets me use all three periods without problems.
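One conventional way to use all three periods, sketched here as an alternative rather than taken from the replies in this thread, is a regression with a full treatment-by-period interaction. Assuming support started after baseline, the coefficients on 1.treatment#2.tranche and 1.treatment#3.tranche are the midline and endline DiD estimates relative to baseline, with standard errors clustered by business. The data are again four businesses from the extract above.

```stata
clear
input double WIDU_Project_number float treatment byte tranche double total_employee
4183 1 3 1
4183 1 2 1
4183 1 1 3
4214 0 3 0
4214 0 1 2
4214 0 2 2
4288 0 2 3
4288 0 3 5
4288 0 1 3
4391 1 3 6
4391 1 1 1
4391 1 2 3
end

* Full factorial: treatment and period main effects plus their interactions.
* Control (treatment 0) and baseline (tranche 1) are the base levels.
regress total_employee i.treatment##i.tranche, vce(cluster WIDU_Project_number)

* Joint test that the midline and endline DiD coefficients are both zero
testparm i.treatment#i.tranche
```

With a balanced panel, adding business fixed effects (xtreg ..., fe) gives the same interaction coefficients; the treatment main effect is then omitted, for the same reason as in didregress.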



      • #4
        Code:
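        egen pid = group(WIDU_Project_number)
        xtset pid tranche
        * note: ln(0) is missing, so observations with zero employees drop out
        g ly = ln(total_employee)
        * treattime = first period of treatment (0 = never treated); assumes treated
        * businesses first received support after baseline, i.e. at midline (tranche 2)
        g tt = tranche if treatment==1 & tranche>1
        egen treattime = min(cond(treatment==1, tt, 0)), by(pid)
        jwdid ly, ivar(pid) tvar(tranche) gvar(treattime)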
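A self-contained variant of the reply above, offered as a hedged sketch: it keeps employees in levels, so businesses with zero employees are not dropped by ln(), and assumes every treated business was first treated at midline (tranche 2). first_treat is a hypothetical name. jwdid is user-written (SSC) and, judging by the HDFE output in #3, the installed version runs reghdfe internally. estat simple is assumed to be available as jwdid's post-estimation command for averaging the cohort-by-period effects into a single ATT; check the help file of your installed version.

```stata
* requires: ssc install jwdid (plus reghdfe and ftools if jwdid asks for them)
clear
input double WIDU_Project_number float treatment byte tranche double total_employee
4183 1 3 1
4183 1 2 1
4183 1 1 3
4214 0 3 0
4214 0 1 2
4214 0 2 2
4288 0 2 3
4288 0 3 5
4288 0 1 3
4391 1 3 6
4391 1 1 1
4391 1 2 3
end

* gvar: period of first treatment, 0 for never-treated businesses
* (assumption: treated businesses received support between baseline and midline)
generate byte first_treat = cond(treatment == 1, 2, 0)

jwdid total_employee, ivar(WIDU_Project_number) tvar(tranche) gvar(first_treat)

* average treatment effect on the treated across midline and endline
estat simple
```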
