I have a balanced panel dataset of 37,848 observations for 1,577 IDs.
The data are monthly (the actual delta is 1 month), with each date set to the first day of the month.
There was a policy change in 2015-08.
Code:
. xtset
       panel variable:  id (strongly balanced)
        time variable:  date, 2014-08-01 to 2016-07-01, but with gaps
                delta:  1 day
The delta is shown as 1 day because I set the date to the first day of each month; the actual delta is 1 month.
Code:
. xtdescribe

      id:  40644, 40645, ..., 100620                         n =       1577
    date:  2014-08-01, 2014-09-01, ..., 2016-07-01           T =         24
           Delta(date) = 1 day
           Span(date)  = 701 periods
           (id*date uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                        24      24      24        24        24      24      24

     Freq.  Percent    Cum. |  Pattern*
 ---------------------------+----------------------------
     1577    100.00  100.00 |  111111111111.11111111111.1
 ---------------------------+----------------------------
     1577    100.00         |  XXXXXXXXXXXX.XXXXXXXXXXX.X
 --------------------------------------------------------
 *Each column represents 28 periods.
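The "delta: 1 day" and "but with gaps" messages are what Stata reports when a monthly series is stored as a daily date. A minimal sketch of one way to declare the panel as monthly follows, assuming `date` is a numeric Stata daily date. The toy data and the name `mdate` are hypothetical and are not taken from my dataset.

```stata
* Toy panel (hypothetical): 2 ids x 24 months, dated on the first day of each month
clear
set obs 48
generate id = ceil(_n/24)
bysort id: generate date = dofm(tm(2014m8) + _n - 1)   // daily date, 1st of month
format date %td

* Convert the daily date to a monthly date; mdate is a hypothetical name
generate mdate = mofd(date)
format mdate %tm

* Declare the panel on the monthly variable: consecutive months are now 1 unit apart
xtset id mdate
```

With the monthly variable, time-series operators such as L. and D. would refer to the previous month rather than the previous day.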
I created the time and treat variables and ran a difference-in-differences analysis with the -diff- command and the -xtreg- command.
Regarding the -diff- command, I asked its author how to deal with panel data.
The author told me that if I have panel data with years 1, 2, 3, 4 and 5, and the treatment starts in year 3,
then I will have time=1 if year is greater than 3, and I must include a fixed effect (or binary variable) for each year in option cov(y1, y2, y3, y4). Therefore I inserted the date dummies into the model as follows.
--the diff command--
Code:
diff fee, t(treat) p(time) ///
    cov(s sex_female ///
    L1_age65_69 L1_age70_74 L1_age75_79 L1_age80_84 L1_age85_89 L1_age90_94 L1_ageover95 ///
    L2_age65_69 L2_age70_74 L2_age75_79 L2_age80_84 L2_age85_89 L2_age90_94 L2_ageover95 ///
    L3_age65_69 L3_age70_74 L3_age75_79 L3_age80_84 L3_age85_89 L3_age90_94 L3_ageover95 ///
    L4_age65_69 L4_age70_74 L4_age75_79 L4_age80_84 L4_age85_89 L4_age90_94 L4_ageover95 ///
    L5_age65_69 L5_age70_74 L5_age75_79 L5_age80_84 L5_age85_89 L5_age90_94 L5_ageover95 ///
    Sep2014 Oct2014 Nov2014 Dec2014 Jan2015 Feb2015 Mar2015 Apr2015 May2015 June2015 July2015 ///
    Aug2015 Sep2015 Oct2015 Nov2015 Dec2015 Jan2016 Feb2016 Mar2016 Apr2016 May2016 June2016 July2016) ///
    cluster(id) report

DIFFERENCE-IN-DIFFERENCES WITH COVARIATES

DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 37848
            Before         After
   Control: 17268          17268       34536
   Treated: 1656           1656        3312
            18924          18924

Report - Covariates and coefficients:
-------------------------------------------------------------------
 Variable(s)         |   Coeff.   | Std. Err. |    t    |  P>|t|
---------------------+------------+-----------+---------+----------
 s                   | 2906.258   | 446.874   | 6.504   | 0.000
 sex_female          | 495.006    | 146.965   | 3.368   | 0.001
 L1_age65_69         | 0.000      | 0.000     | .       | .
 L1_age70_74         | -307.995   | 670.403   | -0.459  | 0.646
 L1_age75_79         | -484.816   | 600.984   | -0.807  | 0.420
 L1_age80_84         | 101.153    | 570.206   | 0.177   | 0.859
 L1_age85_89         | 815.487    | 567.607   | 1.437   | 0.151
 L1_age90_94         | 1160.238   | 625.666   | 1.854   | 0.064
 L1_ageover95        | 1102.411   | 862.477   | 1.278   | 0.201
 L2_age65_69         | 747.135    | 667.301   | 1.120   | 0.263
 L2_age70_74         | 814.379    | 626.941   | 1.299   | 0.194
 L2_age75_79         | 725.179    | 594.531   | 1.220   | 0.223
 L2_age80_84         | 1386.381   | 564.080   | 2.458   | 0.014
 L2_age85_89         | 1536.432   | 553.949   | 2.774   | 0.006
 L2_age90_94         | 1848.268   | 598.386   | 3.089   | 0.002
 L2_ageover95        | 1590.388   | 759.812   | 2.093   | 0.036
 L3_age65_69         | 2590.957   | 996.991   | 2.599   | 0.009
 L3_age70_74         | 2781.282   | 610.705   | 4.554   | 0.000
 L3_age75_79         | 3626.208   | 594.910   | 6.095   | 0.000
 L3_age80_84         | 3528.867   | 575.977   | 6.127   | 0.000
 L3_age85_89         | 3999.916   | 547.291   | 7.309   | 0.000
 L3_age90_94         | 4650.286   | 569.773   | 8.162   | 0.000
 L3_ageover95        | 4537.149   | 652.376   | 6.955   | 0.000
 L4_age65_69         | 2832.997   | 1038.206  | 2.729   | 0.006
 L4_age70_74         | 3921.618   | 700.088   | 5.602   | 0.000
 L4_age75_79         | 4736.347   | 665.108   | 7.121   | 0.000
 L4_age80_84         | 4922.572   | 601.524   | 8.184   | 0.000
 L4_age85_89         | 5661.086   | 546.368   | 10.361  | 0.000
 L4_age90_94         | 5360.643   | 586.429   | 9.141   | 0.000
 L4_ageover95        | 5740.837   | 643.922   | 8.915   | 0.000
 L5_age65_69         | 7205.586   | 694.969   | 10.368  | 0.000
 L5_age70_74         | 4881.347   | 829.175   | 5.887   | 0.000
 L5_age75_79         | 6038.985   | 626.985   | 9.632   | 0.000
 L5_age80_84         | 6368.653   | 578.480   | 11.009  | 0.000
 L5_age85_89         | 6765.761   | 620.331   | 10.907  | 0.000
 L5_age90_94         | 6515.940   | 595.970   | 10.933  | 0.000
 L5_ageover95        | 6200.080   | 827.143   | 7.496   | 0.000
 Sep2014             | 226.769    | 26.960    | 8.411   | 0.000
 Oct2014             | 207.278    | 32.457    | 6.386   | 0.000
 Nov2014             | 155.040    | 35.309    | 4.391   | 0.000
 Dec2014             | 144.638    | 37.005    | 3.909   | 0.000
 Jan2015             | 9.289      | 39.094    | 0.238   | 0.812
 Feb2015             | 404.816    | 41.610    | 9.729   | 0.000
 Mar2015             | 270.213    | 41.479    | 6.515   | 0.000
 Apr2015             | 294.530    | 43.962    | 6.700   | 0.000
 May2015             | 152.098    | 47.719    | 3.187   | 0.001
 June2015            | 327.056    | 48.629    | 6.726   | 0.000
 July2015            | 309.303    | 46.907    | 6.594   | 0.000
 Aug2015             | 0.000      | 0.000     | .       | .
 Sep2015             | 99.994     | 23.564    | 4.243   | 0.000
 Oct2015             | 91.097     | 27.618    | 3.298   | 0.001
 Nov2015             | 96.589     | 28.947    | 3.337   | 0.001
 Dec2015             | 33.397     | 31.238    | 1.069   | 0.285
 Jan2016             | -54.779    | 31.482    | -1.740  | 0.082
 Feb2016             | 208.422    | 34.785    | 5.992   | 0.000
 Mar2016             | 173.085    | 34.398    | 5.032   | 0.000
 Apr2016             | 171.250    | 40.390    | 4.240   | 0.000
 May2016             | 120.902    | 39.475    | 3.063   | 0.002
 June2016            | 249.506    | 42.656    | 5.849   | 0.000
 July2016            | 131.812    | 45.212    | 2.915   | 0.004
-------------------------------------------------------------------

--------------------------------------------------------
 Outcome var.   | fee     | S. Err. |   |t|   |  P>|t|
----------------+---------+---------+---------+---------
Before          |         |         |         |
   Control      | 3699.097|         |         |
   Treated      | 3748.692|         |         |
   Diff (T-C)   | 49.595  | 196.453 | 0.25    | 0.801
After           |         |         |         |
   Control      | 3846.336|         |         |
   Treated      | 3795.368|         |         |
   Diff (T-C)   | -50.969 | 203.782 | 0.25    | 0.803
                |         |         |         |
Diff-in-Diff    | -100.564| 104.148 | 0.97    | 0.334
--------------------------------------------------------
R-square:    0.43
* Means and Standard Errors are estimated by linear regression
**Clustered Std. Errors
**Inference: *** p<0.01; ** p<0.05; * p<0.1
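The result is acceptable, as expected. However, since I had already dropped Aug2014 as the reference month, why was another date variable (Aug2015) automatically omitted? (I have checked the dummy variables and am very sure there is no problem with the coding.)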
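One possible explanation, offered as a sketch rather than a confirmed diagnosis of what -diff- does internally: the time variable equals 1 in exactly the twelve months from Aug2015 to July2016, so it is the sum of those twelve month dummies. Once time, a constant, and all 23 month dummies are in the model, one post-policy dummy is linearly redundant and has to be dropped. The toy data below (not my dataset; the names `m`, `y`, and `post_sum` are hypothetical) reproduce the situation with plain -regress-.

```stata
* Toy data (hypothetical): 20 ids x 24 months, policy starts in month 13
clear
set seed 2015
set obs 480
generate id = ceil(_n/24)
bysort id: generate m = _n          // 1 = Aug2014, 13 = Aug2015, 24 = July2016
generate byte time = m >= 13        // post-policy indicator
generate y = rnormal()              // placeholder outcome

* time is exactly the sum of the 12 post-policy month dummies
generate byte post_sum = inrange(m, 13, 24)
assert time == post_sum

* Constant + time + 23 month dummies (base = month 1) is one column too many,
* so regress reports one month dummy as omitted because of collinearity
regress y time i.m
display e(rank)                     // number of linearly independent columns
```

If this is what happens in my model, the omission would be a consequence of the specification rather than a coding error, and the diff-in-diff coefficient itself would not depend on which post-policy month is dropped.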
To work around the date problem, I also ran -xtreg- as follows.
(I ran only the random-effects model because the results are strongly influenced by individual characteristics.)
----------the xtreg command-----
Code:
xtreg fee i.treat##i.time s sex_female ///
    L1_age70_74 L1_age75_79 L1_age80_84 L1_age85_89 L1_age90_94 L1_ageover95 ///
    L2_age65_69 L2_age70_74 L2_age75_79 L2_age80_84 L2_age85_89 L2_age90_94 L2_ageover95 ///
    L3_age65_69 L3_age70_74 L3_age75_79 L3_age80_84 L3_age85_89 L3_age90_94 L3_ageover95 ///
    L4_age65_69 L4_age70_74 L4_age75_79 L4_age80_84 L4_age85_89 L4_age90_94 L4_ageover95 ///
    L5_age65_69 L5_age70_74 L5_age75_79 L5_age80_84 L5_age85_89 L5_age90_94 L5_ageover95, ///
    cluster(id) re

Random-effects GLS regression                   Number of obs     =     37,848
Group variable: id                              Number of groups  =      1,577

R-sq:                                           Obs per group:
     within  = 0.1923                                         min =         24
     between = 0.4322                                         avg =       24.0
     overall = 0.4001                                         max =         24

                                                Wald chi2(39)     =     908.24
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                    (Std. Err. adjusted for 1,577 clusters in id)
------------------------------------------------------------------------------------
                   |               Robust
               fee |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
             treat |
      treat group  |  -106.0227   207.0139    -0.51   0.609    -511.7625     299.717
                   |
              time |
            after  |   146.7757   27.66732     5.31   0.000     92.54878    201.0027
                   |
        treat#time |
treat group#after  |  -79.51398   96.99125    -0.82   0.412    -269.6133    110.5854
                   |
                 s |    1369.51   517.5397     2.65   0.008     355.1509    2383.869
        sex_female |   543.0636   157.8672     3.44   0.001     233.6495    852.4777
       L1_age70_74 |   1078.326   789.1379     1.37   0.172    -468.3559    2625.008
       L1_age75_79 |   939.7711   779.4054     1.21   0.228    -587.8354    2467.378
       L1_age80_84 |     1292.9   762.7049     1.70   0.090    -201.9738    2787.774
       L1_age85_89 |    1582.06   752.4416     2.10   0.036     107.3013    3056.818
       L1_age90_94 |   2159.894    779.672     2.77   0.006     631.7645    3688.023
      L1_ageover95 |   2122.604   828.7015     2.56   0.010     498.3785    3746.829
       L2_age65_69 |   1837.342   589.3703     3.12   0.002     682.1971    2992.486
       L2_age70_74 |   2050.923   771.7736     2.66   0.008     538.2742    3563.571
       L2_age75_79 |   2085.525   762.6016     2.73   0.006     590.8532    3580.196
       L2_age80_84 |   2427.072   752.2051     3.23   0.001     952.7775    3901.367
       L2_age85_89 |   2654.918   749.0674     3.54   0.000     1186.773    4123.063
       L2_age90_94 |    2812.31    767.938     3.66   0.000      1307.18    4317.441
      L2_ageover95 |   2962.812   821.8616     3.61   0.000     1351.993    4573.631
       L3_age65_69 |   2997.728   875.2229     3.43   0.001     1282.323    4713.134
       L3_age70_74 |   2806.276   781.4151     3.59   0.000     1274.731    4337.822
       L3_age75_79 |    3367.41   765.8416     4.40   0.000     1866.388    4868.432
       L3_age80_84 |   3713.104   752.7107     4.93   0.000     2237.818     5188.39
       L3_age85_89 |    4453.83   748.1278     5.95   0.000     2987.526    5920.134
       L3_age90_94 |    4616.31   756.1444     6.11   0.000     3134.294    6098.326
      L3_ageover95 |   4477.685   800.3622     5.59   0.000     2909.004    6046.366
       L4_age65_69 |   4536.098    1607.77     2.82   0.005     1384.927    7687.269
       L4_age70_74 |   3593.762   796.5672     4.51   0.000     2032.519    5155.005
       L4_age75_79 |   4564.423   778.8631     5.86   0.000      3037.88    6090.967
       L4_age80_84 |   4623.097   762.4186     6.06   0.000     3128.784     6117.41
       L4_age85_89 |   5176.192   755.8823     6.85   0.000      3694.69    6657.694
       L4_age90_94 |   5428.645   764.0001     7.11   0.000     3931.232    6926.057
      L4_ageover95 |   5553.606   798.0791     6.96   0.000       3989.4    7117.812
       L5_age65_69 |   3845.099   1059.103     3.63   0.000     1769.296    5920.902
       L5_age70_74 |   4196.181   899.9977     4.66   0.000     2432.218    5960.144
       L5_age75_79 |    5012.77   790.9583     6.34   0.000      3462.52     6563.02
       L5_age80_84 |   5510.147   802.9702     6.86   0.000     3936.354    7083.939
       L5_age85_89 |   5907.422   769.2731     7.68   0.000     4399.674     7415.17
       L5_age90_94 |   6347.972   804.2378     7.89   0.000     4771.695     7924.25
      L5_ageover95 |   6621.562   851.6299     7.78   0.000     4952.398    8290.726
             _cons |   3446.689   734.3336     4.69   0.000     2007.421    4885.956
-------------------+----------------------------------------------------------------
           sigma_u |  2383.3859
           sigma_e |  1081.7752
               rho |  .82918148   (fraction of variance due to u_i)
------------------------------------------------------------------------------------
The result is also acceptable, as expected.
Based on the -xtreg- result, I calculated the cross-table of fee by time and treat as follows.
fee | control group | treated group |
before | 3699.097 | 3748.692 |
after | 3846.336 | 3795.368 |
The before-after difference in the control group is the same as with the -diff- command.
However, the difference between the control and treated groups at baseline is different.
I calculated the unadjusted means by treat and time as follows.
Code:
. mean fee, over(treat time)

Mean estimation                   Number of obs   =     37,848

         Over: treat time
    _subpop_1: control group before
    _subpop_2: control group after
    _subpop_3: treat group before
    _subpop_4: treat group after

--------------------------------------------------------------
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
fee          |
   _subpop_1 |   7240.817   26.93523      7188.023    7293.611
   _subpop_2 |    7623.35   26.02227      7572.345    7674.354
   _subpop_3 |   6510.688   70.34496       6372.81    6648.565
   _subpop_4 |   6827.226    70.6509      6688.748    6965.703
--------------------------------------------------------------
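As a check on the arithmetic, the difference-in-differences can be recomputed by hand as (treated after - treated before) - (control after - control before). The sketch below does this for the covariate-adjusted cell means reported by -diff- and for the unadjusted means reported by -mean-. The scalar names `did_adj` and `did_raw` are hypothetical, and the numbers are copied from the output in this post.

```stata
* Covariate-adjusted cell means reported by -diff-
scalar did_adj = (3795.368 - 3748.692) - (3846.336 - 3699.097)

* Unadjusted cell means from -mean fee, over(treat time)-
scalar did_raw = (6827.226 - 6510.688) - (7623.35 - 7240.817)

display "DiD from adjusted means:   " %9.3f did_adj
display "DiD from unadjusted means: " %9.3f did_raw
```

The adjusted figure comes to about -100.56, which agrees with the -100.564 reported by -diff- up to rounding of the displayed means. The unadjusted figure is about -66.00, so the raw and adjusted estimates have the same sign but different magnitudes.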
But I still cannot find the cause of the different results between the two commands.
My questions are:
1. Why was one date variable omitted automatically by the -diff- command? Did I write the wrong code? If so, please tell me.
2. Which command is more reliable, and why?
3. Would fixed effects be more suitable for my analysis, even though I would like to know the effect of every individual-level covariate?
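To make question 3 concrete, here is a minimal sketch of what a fixed-effects version with month dummies might look like. It uses toy data rather than my dataset, and the names `m`, `female`, and `u` as well as the data-generating process (a true effect of -10) are assumptions made purely for illustration. The point it shows is the trade-off in the question: under fixed effects, time-invariant variables such as treat and a sex dummy are absorbed by the individual effects, while the treat#time interaction is still estimated.

```stata
* Toy data (hypothetical): 50 ids x 24 months, ids 1-25 treated, policy from month 13
clear
set seed 2015
set obs 1200
generate id = ceil(_n/24)
bysort id: generate m = _n
generate byte treat  = id <= 25
generate byte time   = m >= 13
generate byte female = mod(id, 2)              // time-invariant covariate
bysort id: generate u = rnormal() if _n == 1   // individual effect, drawn once per id
bysort id: replace u = u[1]
generate fee = 100 + 50*female + 20*u - 10*treat*time + rnormal()

xtset id m

* Fixed effects with month dummies: treat and female are omitted (time-invariant),
* one time/month term is omitted (collinear), the interaction is still estimated
xtreg fee i.treat##i.time i.m female, fe vce(cluster id)
display _b[1.treat#1.time]
```

In this sketch the coefficients on treat and female are reported as omitted, which is why fixed effects would not tell me the effect of individual-level covariates that do not change over time.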
Thank you for reading my post.
Any response will be appreciated and welcome.