Hello! I am seeking feedback on my approach to estimating a difference-in-differences model on an unbalanced panel.
My goal is to estimate the effect of a feature introduced on an online platform on an outcome variable. The feature became available in August 2015. The monthly data available for analysis cover the following months: 01, 03, 05, 06, 08, 09, 10, 11, and 12 of 2015, plus 01/2016:
Starting in August 2015, some platform users began to use the new feature; I call them the treated group:
While some users used the new feature every single month starting from August, others used it only once or a few times:
Since the feature became available in 08/2015, I create a time indicator equal to 1 for months after 08/2015:
I then estimate the following fixed-effects model:
Does everything seem appropriate so far?
Code:
xtset
panel variable: id (unbalanced)
time variable: month, 01/2015 to 01/2016, but with gaps
delta: 1 month
xtdescribe
      id:  105, 2515, ..., 10289394                          n =      68574
   month:  01/2015, 03/2015, ..., 01/2016                    T =         10
           Delta(month) = 1 month
           Span(month)  = 13 periods
           (id*month uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       2         4         7      10      10

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+---------------
     8446     12.32   12.32 |  1.1.11.111111
     4239      6.18   18.50 |  ............1
     4194      6.12   24.61 |  ...........11
     3875      5.65   30.27 |  1............
     3182      4.64   34.91 |  .......111111
     3087      4.50   39.41 |  1.1..........
     2541      3.71   43.11 |  ..........111
     2040      2.97   46.09 |  ........11111
     1866      2.72   48.81 |  .........1111
    35104     51.19  100.00 | (other patterns)
 ---------------------------+---------------
    68574    100.00         |  X.X.XX.XXXXXX
Code:
sum treated
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     treated |    303,038    .0658399    .2480025          0          1
tab treated month
           |                                                month
   treated |   01/2015   03/2015   05/2015   06/2015   08/2015   09/2015   10/2015   11/2015   12/2015   01/2016 |     Total
-----------+-----------------------------------------------------------------------------------------------------+----------
         0 |    27,392    27,101    27,319    27,469    27,624    27,522    27,306    28,669    30,752    31,932 |   283,086
         1 |         0         0         0         0     2,903     2,961     2,942     3,271     3,624     4,251 |    19,952
-----------+-----------------------------------------------------------------------------------------------------+----------
     Total |    27,392    27,101    27,319    27,469    30,527    30,483    30,248    31,940    34,376    36,183 |   303,038
Code:
tab feature_use_count month
feature_us |                              month
   e_count |   08/2015   09/2015   10/2015   11/2015   12/2015   01/2016 |     Total
-----------+-------------------------------------------------------------+----------
         1 |       642       256       188       296       381     1,187 |     2,950
         2 |       352       416       224       248       861       809 |     2,910
         3 |       247       333       373       572       494       474 |     2,493
         4 |       377       421       619       618       365       344 |     2,744
         5 |       250       500       503       502       488       402 |     2,645
         6 |     1,035     1,035     1,035     1,035     1,035     1,035 |     6,210
-----------+-------------------------------------------------------------+----------
     Total |     2,903     2,961     2,942     3,271     3,624     4,251 |    19,952
Code:
gen time = (month > tm(2015m8)) & !missing(month)
sum time
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        time |    303,038    .5386453    .4985051          0          1
Code:
xtreg outcome time##treated, fe vce(robust)
Fixed-effects (within) regression               Number of obs     =    275,646
Group variable: id                              Number of groups  =     64,699

R-sq:                                           Obs per group:
     within  = 0.0013                                         min =          1
     between = 0.0017                                         avg =        4.3
     overall = 0.0029                                         max =          9

                                                F(3,64698)        =      55.45
corr(u_i, Xb)  = 0.0427                         Prob > F          =     0.0000

                                 (Std. Err. adjusted for 64,699 clusters in id)
------------------------------------------------------------------------------
             |               Robust
     outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.time |  -.0536213   .0045599   -11.76   0.000    -.0625588   -.0446838
   1.treated |  -.0507019   .0223163    -2.27   0.023    -.0944419   -.0069619
             |
time#treated |
        1 1  |   .1455297   .0233969     6.22   0.000     .0996717    .1913876
             |
       _cons |   1.816555   .0029693   611.79   0.000     1.810735    1.822375
-------------+----------------------------------------------------------------
     sigma_u |  2.7997928
     sigma_e |  .75591234
         rho |  .93205863   (fraction of variance due to u_i)
------------------------------------------------------------------------------
margins time#treated
Adjusted predictions                            Number of obs     =    275,646
Model VCE    : Robust

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
time#treated |
        0 0  |   1.816555   .0029693   611.79   0.000     1.810735    1.822374
        0 1  |   1.765853   .0215303    82.02   0.000     1.723654    1.808052
        1 0  |   1.762934   .0022852   771.46   0.000     1.758455    1.767412
        1 1  |   1.857761   .0179003   103.78   0.000     1.822677    1.892845
------------------------------------------------------------------------------
marginsplot
// see screenshot attached below
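To make the specification explicit, this is how I would write the model behind the xtreg call above, followed by a quick check that the interaction coefficient can be recovered from the margins. The symbols (alpha_i, beta_1, and so on) are my own notation, not Stata output. I write treated with a time subscript because its main effect is estimated under user fixed effects, so it must vary within user. Here \hat{y}_{jk} denotes the margin for time = j and treated = k.

```latex
\begin{align*}
\text{outcome}_{it} &= \alpha_i + \beta_1\,\text{time}_t + \beta_2\,\text{treated}_{it}
  + \beta_3\,(\text{time}_t \times \text{treated}_{it}) + \varepsilon_{it} \\[4pt]
\hat\beta_3 &= (\hat y_{11} - \hat y_{01}) - (\hat y_{10} - \hat y_{00}) \\
  &= (1.857761 - 1.765853) - (1.762934 - 1.816555) \\
  &= 0.091908 + 0.053621 = 0.145529
\end{align*}
```

This agrees with the reported coefficient on time#treated (.1455297) up to rounding.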
Also, since different users started using the new feature at different times, does it make sense to create several time* variables and examine how the effect unfolds over time? E.g.:
Code:
gen time1 = (month > tm(2015m9)) & !missing(month)
gen time2 = (month > tm(2015m10)) & !missing(month)
gen time3 = (month > tm(2015m11)) & !missing(month)
// etc.
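Written out, one way to use these variables would be the specification below. This is only a sketch of what I have in mind, not something I have estimated: the notation is mine, time^{(0)} stands for the original time variable, k runs over the indicators shown above, and I am assuming each indicator enters both on its own and interacted with treated.

```latex
\begin{align*}
\text{time}^{(k)}_t &= \mathbf{1}\,[\text{month}_t > c_k], \qquad
  c_0 = \text{2015m8},\; c_1 = \text{2015m9},\; c_2 = \text{2015m10},\; c_3 = \text{2015m11} \\[4pt]
\text{outcome}_{it} &= \alpha_i + \sum_{k} \beta_k\,\text{time}^{(k)}_t + \gamma\,\text{treated}_{it}
  + \sum_{k} \delta_k\,\bigl(\text{time}^{(k)}_t \times \text{treated}_{it}\bigr) + \varepsilon_{it}
\end{align*}
```

Because the indicators are nested (each one switches on and stays on), each \delta_k would measure the additional change at cutoff c_k relative to the preceding period. The total interaction effect in a given month would be the sum of the \delta_k switched on by then.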
I would sincerely appreciate your feedback.
