Running difference-in-difference on a three time period cross-sectional data

Ashmika Gouchwal

Join Date: Sep 2023

Posts: 2
#1

13 Sep 2023, 05:12

For my study, we have collected data on the same individuals at three points in time: baseline, midline and endline. I am trying to run a diff-in-diff but am running into issues with most of the recommended commands. I want to see whether the number of employees the businesses have has risen over the period and whether this change is significant. The treatment was delivered at the individual level, so there are no groups to define other than treatment and control.

I have generated a dummy variable called treatment, which equals 1 for businesses that received support and 0 for the control group. The variable ID is a unique identifier for each business. The variable tranche is the time variable, which takes the values 1, 2 and 3 for baseline, midline and endline respectively. The variable total_employee is the outcome: the number of employees in the business at each of the three stages. I am sharing a snapshot of my data below:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double total_employee byte tranche float treatment double WIDU_Project_number
3 1 0 507
3 2 0 507
4 3 0 507
1 1 0 516
6 2 0 516
1 3 0 516
2 1 0 525
2 2 0 525
2 3 0 525
4 1 0 595
4 2 0 595
5 3 0 595
3 1 0 657
3 2 0 657
3 3 0 657
3 1 0 740
3 2 0 740
3 3 0 740
1 1 0 755
1 2 0 755
1 3 0 755
4 1 0 795
2 2 0 795
3 3 0 795
1 1 0 820
6 2 0 820
6 3 0 820
5 1 0 822
0 2 0 822
3 3 0 822
4 1 0 848
1 2 0 848
2 3 0 848
1 1 0 889
1 2 0 889
3 3 0 889
3 1 0 899
3 2 0 899
3 3 0 899
2 1 0 913
2 2 0 913
3 3 0 913
2 1 0 925
2 2 0 925
4 3 0 925
3 1 0 936
3 2 0 936
3 3 0 936
8 1 0 939
6 2 0 939
12 3 0 939
3 1 0 956
3 2 0 956
3 3 0 956
3 1 0 957
3 2 0 957
4 3 0 957
1 1 0 968
3 2 0 968
3 3 0 968
1 1 0 973
1 2 0 973
1 3 0 973
1 1 0 1008
1 2 0 1008
1 3 0 1008
1 1 0 1044
2 2 0 1044
6 3 0 1044
2 1 0 1060
5 2 0 1060
5 3 0 1060
6 1 0 1067
3 2 0 1067
8 3 0 1067
7 1 0 1311
4 2 0 1311
8 3 0 1311
4 1 0 1335
7 2 0 1335
6 3 0 1335
1 1 0 1342
2 2 0 1342
5 3 0 1342
6 1 0 1347
5 2 0 1347
6 3 0 1347
9 1 0 1366
10 2 0 1366
9 3 0 1366
3 1 0 1368
2 2 0 1368
3 3 0 1368
2 1 0 1416
3 2 0 1416
2 3 0 1416
2 1 0 1434
3 2 0 1434
0 3 0 1434
7 1 0 1514
end

I began with the diff command and added dummy variables called ib, im and ie, for baseline, midline and endline, to the covariate list (as the command's help page recommends for multiple time periods). However, when I ran the regression, the number of observations at baseline was reported as zero. I am not sure what is happening here, or whether I am losing any data. The result is below:

diff total_employee, t(treatment) period(tranche) cov(ib im ie)
DIFFERENCE-IN-DIFFERENCES WITH COVARIATES

DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 1456
            Before    After
Control:    0         239       239
Treated:    0         249       249
            0         488
--------------------------------------------------------
Outcome var. | total~e | S. Err. | |t| | P>|t|
----------------+---------+---------+---------+---------
Before | | | |
Control | 3.425 | | |
Treated | 2.854 | | |
Diff (T-C) | -0.571 | 0.435 | -1.31 | 0.190
After | | | |
Control | 3.491 | | |
Treated | 3.272 | | |
Diff (T-C) | -0.219 | 0.260 | 0.84 | 0.400
| | | |
Diff-in-Diff | 0.352 | 0.202 | 1.74 | 0.081*
--------------------------------------------------------
R-square: 0.01
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1

Then I tried the didregress command. Since the treatment is delivered at the individual level, I used the ID variable as the group. However, the command does not run and produces this error:

didregress (total_employee) (treatment), group(ID) time(tranche)
note: treatment omitted because of collinearity.
model is not identified
The treatment variable treatment was omitted because of collinearity.

I checked collinearity using vif and none of the variables had a value above 2, so I am not sure why this message appears. The csdid command also does not run on my data. It would be great to know how I can go forward with this analysis.
George Ford

Join Date: Aug 2014

Posts: 3182
#2

13 Sep 2023, 20:09

search jwdid

didregress (or xtdidregress in your case) is a 2x2 DID method, I think.

You have no treated units in your dataex.
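On the collinearity note from didregress, here is a minimal sketch of what I suspect is going on. The rows below are made up for illustration (they are not from the study), and the variable names post and did are my own. The key assumption, which the thread does not confirm, is that support was delivered after baseline, so tranche 2 and 3 are the post-treatment periods. Because treatment never changes within a business, business fixed effects absorb it completely; vif will not show this because it does not include those fixed effects. The treatment variable passed to xtdidregress has to switch on only for treated businesses in post-treatment periods.

```stata
* Made-up illustration data: 2 treated and 2 control businesses, 3 tranches each
clear
input byte ID byte treatment byte tranche double total_employee
1 1 1 3
1 1 2 4
1 1 3 5
2 1 1 2
2 1 2 2
2 1 3 4
3 0 1 3
3 0 2 3
3 0 3 4
4 0 1 1
4 0 2 2
4 0 3 1
end

xtset ID tranche

* treatment is constant within ID, so ID fixed effects absorb it entirely
bysort ID (tranche): assert treatment == treatment[1]

* ASSUMPTION: support was delivered after baseline (tranche 1)
generate byte post = tranche >= 2

* 1 only for treated businesses in post-treatment periods
generate byte did = treatment * post

* uses all three tranches; standard errors are clustered on ID
xtdidregress (total_employee) (did), group(ID) time(tranche)
```

The same post dummy may also be relevant to the zero baseline count from diff: as far as I recall from its help file, period() expects a 0/1 variable, so a tranche coded 1, 2, 3 would leave no observations in the "before" cell. Worth checking against help diff before relying on that.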
Ashmika Gouchwal

Join Date: Sep 2023

Posts: 2
#3

14 Sep 2023, 01:03

Thanks for the recommendation. I tried running jwdid but it is not producing accurate results.

jwdid total_employee, tvar(tranche) gvar(treatment)
WARNING: Singleton observations not dropped; statistical significance is biased (link)
(MWFE estimator converged in 1 iterations)

HDFE Linear regression                            Number of obs   =        720
Absorbing 1 HDFE group                            F(   0,    717) =          .
                                                  Prob > F        =          .
                                                  R-squared       =     0.0003
                                                  Adj R-squared   =    -0.0025
                                                  Within R-sq.    =     0.0000
                                                  Root MSE        =     3.2325

------------------------------------------------------------------------------
total_empl~e | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |   3.511111   .1204669    29.15   0.000     3.274601    3.747621
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
     tranche |         3           0           3     |
-----------------------------------------------------+

Also, I am resharing a snippet of the data with a mix of treatment and control cases:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double WIDU_Project_number float treatment byte tranche double total_employee
4183 1 3 1
4183 1 2 1
4183 1 1 3
4214 0 3 0
4214 0 1 2
4214 0 2 2
4288 0 2 3
4288 0 3 5
4288 0 1 3
4312 0 1 6
4312 0 3 4
4312 0 2 4
4333 0 1 5
4333 0 3 6
4333 0 2 6
4391 1 3 6
4391 1 1 1
4391 1 2 3
4409 0 2 6
4409 0 3 6
4409 0 1 4
4478 0 1 3
4478 0 3 3
4478 0 2 3
4509 0 3 3
4509 0 1 4
4509 0 2 4
4517 0 3 13
4517 0 2 11
4517 0 1 11
4545 0 2 4
4545 0 1 4
4545 0 3 1
4557 0 1 2
4557 0 2 3
4557 0 3 2
4608 0 2 3
4608 0 3 3
4608 0 1 3
4619 0 1 2
4619 0 2 1
4619 0 3 1
4625 0 3 12
4625 0 1 20
4625 0 2 18
4663 0 3 1
4663 0 1 2
4663 0 2 2
4771 0 2 3
4771 0 3 3
4771 0 1 0
4845 0 3 3
4845 0 2 3
4845 0 1 4
4982 0 2 1
4982 0 3 2
4982 0 1 1
5156 1 1 4
5156 1 3 3
5167 0 2 2
5167 0 1 4
5167 0 3 1
5232 1 3 1
5232 1 2 1
5232 1 1 4
5265 0 2 3
5265 0 3 4
5265 0 1 4
5302 0 2 6
5302 0 3 5
5302 0 1 10
5316 0 1 1
5316 0 2 1
5316 0 3 1
5340 0 3 2
5340 0 1 1
5340 0 2 2
5380 0 1 4
5380 0 3 6
5380 0 2 2
5479 1 3 1
5479 1 1 4
5479 1 2 2
5537 0 2 3
5537 0 3 0
5537 0 1 0
5630 1 2 1
5630 1 1 0
5630 1 3 3
5702 0 1 4
5702 0 2 3
5702 0 3 3
5716 0 3 4
5716 0 1 2
5716 0 2 3
5752 1 2 2
5752 1 3 2
5752 1 1 3
end

My main objective is to run the DID with all three time periods included. didregress with just the baseline and endline data produces results, but I want to use the midline data as well. I have not yet come across a command that lets me use all three periods without issues.

George Ford

Join Date: Aug 2014
Posts: 3182

14 Sep 2023, 08:38

Code:

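egen pid = group(WIDU_Project_number)
xtset pid tranche
* log outcome; businesses with 0 employees become missing
g ly = ln(total_employee)
g tt = tranche*treatment if treatment>0
* gvar for jwdid: smallest tranche with treatment==1, 0 for controls
egen treattime = min(cond(treatment,tt,0,.)) , by(pid)
jwdid ly , ivar(pid) tvar(tranche) gvar(treattime)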
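
One thing to watch with the construction above: with treatment constant within business, the minimum of tranche*treatment is 1 for every treated business, which would mark them as treated from baseline onward. jwdid's gvar() should hold the first period in which a unit is treated, with 0 for never-treated units. Below is a hedged variant that sets it directly. It assumes (the thread does not say) that support was delivered between baseline and midline, so the first treated period is tranche 2. The rows are a handful copied from the dataex in #3, the outcome is kept in levels so zero-employee businesses are not dropped, and jwdid (with reghdfe, which it calls) is assumed to be installed from SSC.

```stata
* Four businesses copied from the dataex in #3 (two treated, two control)
clear
input double WIDU_Project_number float treatment byte tranche double total_employee
4183 1 1 3
4183 1 2 1
4183 1 3 1
4391 1 1 1
4391 1 2 3
4391 1 3 6
4214 0 1 2
4214 0 2 2
4214 0 3 0
4288 0 1 3
4288 0 2 3
4288 0 3 5
end

egen pid = group(WIDU_Project_number)
xtset pid tranche

* ASSUMPTION: treatment starts at midline (tranche 2); controls are never treated (0)
generate byte treattime = cond(treatment == 1, 2, 0)

* requires: ssc install jwdid (and reghdfe)
jwdid total_employee, ivar(pid) tvar(tranche) gvar(treattime)
```

If the support actually arrived at a different point, change the 2 accordingly. On the full sample this should report separate effects for midline and endline rather than a single constant.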
