For my study, we have collected data on the same individuals at three time periods: baseline, midline and endline. I am trying to run a difference-in-differences (diff-in-diff) analysis but am running into issues with most of the recommended commands. I want to test whether the number of employees in the businesses has risen over the period and whether this change is significant. The treatment was delivered at the individual level, so there are no groups to define other than treatment and control. I have generated a dummy variable called treatment, which equals 1 for businesses that received support and 0 for the control group. The variable ID is a unique identifier for each business. The variable tranche is the time variable, which takes the values 1, 2 and 3 for baseline, midline and endline respectively. The variable total_employee is the outcome, containing the number of employees in the business at each of the three stages. I am sharing a snapshot of my data below:
I used the diff command to begin with and added dummy variables called ib, im and ie to account for baseline, midline and endline cases as part of the covariates list (as recommended by the command page for multiple time periods). However, when I ran the regression, the values for the number of observations at baseline appear as zero. I am not sure what is happening in this case, and if I am losing any data. Posting the result below:
diff total_employee, t(treatment) period(tranche) cov(ib im ie)
DIFFERENCE-IN-DIFFERENCES WITH COVARIATES
DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 1456
             Before    After
Control:     0         239       239
Treated:     0         249       249
             0         488
--------------------------------------------------------
Outcome var. | total~e | S. Err. | |t| | P>|t|
----------------+---------+---------+---------+---------
Before | | | |
Control | 3.425 | | |
Treated | 2.854 | | |
Diff (T-C) | -0.571 | 0.435 | -1.31 | 0.190
After | | | |
Control | 3.491 | | |
Treated | 3.272 | | |
Diff (T-C) | -0.219 | 0.260 | 0.84 | 0.400
| | | |
Diff-in-Diff | 0.352 | 0.202 | 1.74 | 0.081*
--------------------------------------------------------
R-square: 0.01
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
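One possible reading of the zero "Before" counts, offered as a hedged sketch rather than a confirmed diagnosis: as far as I understand the help file for diff, period() expects a 0/1 dummy (0 for baseline, 1 for follow-up), whereas tranche takes the values 1, 2 and 3, so no observation has period equal to 0. The snippet below assumes a simple baseline versus endline comparison; the variable name endline is hypothetical and not part of the original data.

```stata
* Assumption: compare baseline (tranche 1) with endline (tranche 3) only.
* endline is a hypothetical 0/1 period dummy; it is missing for midline rows.
generate byte endline = (tranche == 3) if inlist(tranche, 1, 3)

* Same command as above, but with a binary period variable and no period dummies as covariates.
diff total_employee if inlist(tranche, 1, 3), t(treatment) period(endline)
```

If the midline is also of interest, the same pattern could be repeated with a baseline versus midline dummy.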
Then I tried using the didregress command. In my case, since the treatment is delivered at an individual level, I included the ID variable in the group category. However, the command doesn't run and produces this error.
didregress (total_employee) (treatment), group(ID) time(tranche)
note: treatment omitted because of collinearity.
model is not identified
The treatment variable treatment was omitted because of collinearity.
I checked collinearity using vif and none of the variables had a value more than 2, so I am not sure why this result is popping up. Further, the command csid also doesn't run for my data. It would be great to know how I can go forward with this analysis.
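A likely explanation for the collinearity message, sketched under stated assumptions: treatment is constant within each business, so it is perfectly collinear with the group(ID) fixed effects that didregress absorbs, which vif on the raw variables would not reveal. didregress expects a binary variable that equals 1 only for treated units in post-treatment periods. The sketch below assumes the support was delivered after baseline, so tranches 2 and 3 are both post-treatment; the variables post and treated_post are hypothetical names, and ID stands for the business identifier (WIDU_Project_number in the data example).

```stata
* Assumption: treatment starts after baseline, so tranche 2 and 3 are post-treatment.
generate byte post = (tranche >= 2)

* Time-varying treatment indicator: 1 only for treated businesses in post periods.
generate byte treated_post = treatment * post

* DiD with business and period fixed effects; standard errors clustered on ID.
didregress (total_employee) (treated_post), group(ID) time(tranche)

* Roughly equivalent manual specification with separate midline and endline effects.
xtset ID tranche
xtreg total_employee i.treatment##i.tranche, fe vce(cluster ID)
```

In the xtreg version the main effect of treatment is dropped by the fixed effects, and the interaction terms with tranche 2 and tranche 3 give the midline and endline estimates relative to baseline.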
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input double total_employee byte tranche float treatment double WIDU_Project_number
3 1 0 507
3 2 0 507
4 3 0 507
1 1 0 516
6 2 0 516
1 3 0 516
2 1 0 525
2 2 0 525
2 3 0 525
4 1 0 595
4 2 0 595
5 3 0 595
3 1 0 657
3 2 0 657
3 3 0 657
3 1 0 740
3 2 0 740
3 3 0 740
1 1 0 755
1 2 0 755
1 3 0 755
4 1 0 795
2 2 0 795
3 3 0 795
1 1 0 820
6 2 0 820
6 3 0 820
5 1 0 822
0 2 0 822
3 3 0 822
4 1 0 848
1 2 0 848
2 3 0 848
1 1 0 889
1 2 0 889
3 3 0 889
3 1 0 899
3 2 0 899
3 3 0 899
2 1 0 913
2 2 0 913
3 3 0 913
2 1 0 925
2 2 0 925
4 3 0 925
3 1 0 936
3 2 0 936
3 3 0 936
8 1 0 939
6 2 0 939
12 3 0 939
3 1 0 956
3 2 0 956
3 3 0 956
3 1 0 957
3 2 0 957
4 3 0 957
1 1 0 968
3 2 0 968
3 3 0 968
1 1 0 973
1 2 0 973
1 3 0 973
1 1 0 1008
1 2 0 1008
1 3 0 1008
1 1 0 1044
2 2 0 1044
6 3 0 1044
2 1 0 1060
5 2 0 1060
5 3 0 1060
6 1 0 1067
3 2 0 1067
8 3 0 1067
7 1 0 1311
4 2 0 1311
8 3 0 1311
4 1 0 1335
7 2 0 1335
6 3 0 1335
1 1 0 1342
2 2 0 1342
5 3 0 1342
6 1 0 1347
5 2 0 1347
6 3 0 1347
9 1 0 1366
10 2 0 1366
9 3 0 1366
3 1 0 1368
2 2 0 1368
3 3 0 1368
2 1 0 1416
3 2 0 1416
2 3 0 1416
2 1 0 1434
3 2 0 1434
0 3 0 1434
7 1 0 1514
end