Difference in Difference omitted treatment indicator

Vincent Rowold

Join Date: Jun 2020

Posts: 9
#1

21 Jul 2020, 05:47

Hello!
I want to analyse how the market entry of a specific app is associated with a change in the download size (in bytes) of competing apps. I ran a panel DiD with a control group of unaffected apps and a treatment group of all affected apps. At period >=4, the treatment indicator takes on the value 1, otherwise 0. First I inspected the data graphically to check the common trend assumption. It looked like there is a negative effect on the download size of affected apps (as expected). After running the regression in Stata, I got a significant positive effect. How is this possible? I'm also concerned that Stata omits the treatment indicator from the output. I did the same analysis with another app, where Stata didn't omit the treatment indicator.
Did I make a mistake, and can someone help me understand why the graphical trend inspection deviates so much from the estimation?
This is the graph (x-axis time, y-axis download_size)
Thank you and all the best
Vince

Graph: (image not available; x-axis: time, y-axis: download_size)

Output:
. xtreg download_size i.post##i.treatment age, cl(application_id) fe
note: 1.treatment omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =    172,875
Group variable: applicatio~d                    Number of groups  =     49,617

R-sq:                                           Obs per group:
     within  = 0.0023                                         min =          1
     between = 0.0135                                         avg =        3.5
     overall = 0.0102                                         max =          6

                                                F(3,49616)        =      40.99
corr(u_i, Xb)  = -0.1156                        Prob > F          =     0.0000

                      (Std. Err. adjusted for 49,617 clusters in application_id)
--------------------------------------------------------------------------------
               |               Robust
 download_size |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        1.post |  -305664.7   148492.5    -2.06   0.040    -596711.9    -14617.6
   1.treatment |          0  (omitted)
               |
post#treatment |
          1 1  |    9196838    1955476     4.70   0.000      5364082    1.30e+07
               |
           age |    3463.04   366.4428     9.45   0.000     2744.807    4181.272
         _cons |   9.13e+07   313649.8   291.22   0.000     9.07e+07    9.20e+07
---------------+----------------------------------------------------------------
       sigma_u |  1.542e+08
       sigma_e |   19961402
           rho |  .98351201   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

Output (log specification):
. xtreg lsize i.post##i.treatment age, cl(application_id) fe
note: 1.treatment omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =    172,875
Group variable: applicatio~d                    Number of groups  =     49,617

R-sq:                                           Obs per group:
     within  = 0.0145                                         min =          1
     between = 0.0597                                         avg =        3.5
     overall = 0.0431                                         max =          6

                                                F(3,49616)        =     131.86
corr(u_i, Xb)  = -0.2236                        Prob > F          =     0.0000

                      (Std. Err. adjusted for 49,617 clusters in application_id)
--------------------------------------------------------------------------------
               |               Robust
         lsize |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        1.post |  -.0008987   .0006355    -1.41   0.157    -.0021442    .0003469
   1.treatment |          0  (omitted)
               |
post#treatment |
          1 1  |   .1757632   .0272084     6.46   0.000     .1224344    .2290919
               |
           age |    .000027   1.55e-06    17.40   0.000     .0000239      .00003
         _cons |   17.69652   .0013271  1.3e+04   0.000     17.69392    17.69912
---------------+----------------------------------------------------------------
       sigma_u |  1.1514444
       sigma_e |  .07930105
           rho |  .99527919   (fraction of variance due to u_i)
--------------------------------------------------------------------------------

Last edited by Vincent Rowold; 21 Jul 2020, 05:50.
Chris Boudreaux

Join Date: Jul 2020

Posts: 83
#2

21 Jul 2020, 06:37

The regression results are consistent with your visual inspection. The interaction coefficient in the log specification says the effect of treatment (when post = 0) is 0.1757632. When post = 1, the effect of treatment is 0.1748645 (-0.0008987 + 0.1757632). Although the effect of treatment is positive both before and after treatment, it is lower after post treatment.

Stata omits the treatment indicator because of collinearity, as it should; this often occurs with panel DiD. In your sample, each app must be either always treated or never treated, so the indicator has no within-group variation. Anything time-invariant drops out of a fixed-effects regression.
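
A quick way to confirm this on your own data is to check whether treatment ever changes within an app. The lines below are a minimal sketch on a made-up toy panel, not the poster's data. The variable names application_id, period and treatment follow the thread; switches is a hypothetical helper name.

```stata
* Toy panel: 6 apps observed over 6 periods (hypothetical data).
clear
set obs 6
generate application_id = _n
* Apps 4 to 6 play the role of the "affected" apps.
generate byte treatment = application_id > 3
expand 6
bysort application_id: generate period = _n
xtset application_id period

* A within standard deviation of 0 means treatment never changes inside an app.
xtsum treatment

* Direct check: sort by treatment within app, then compare the smallest
* and largest value. switches = 1 would flag an app that changes status.
bysort application_id (treatment): generate byte switches = treatment[1] != treatment[_N]
tabulate switches
```

If xtsum reports a within standard deviation of 0 and switches is 0 for every app, the dummy is constant inside each panel and the fixed effects absorb it, which is what the collinearity note reports.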
Vincent Rowold

Join Date: Jun 2020

Posts: 9
#3

21 Jul 2020, 06:47

Hi Chris,
thank you for your answer!
I'm not sure what you mean by "Although the effect of treatment is positive both before and after treatment, it is lower after post treatment.". How can there be an effect of treatment before treatment?
I thought the coefficient on the interaction term (treatment x post) gives the ATT.
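
As a reference point for this question, here is a sketch of what the interaction coefficient measures in the specification from #1, ignoring age and the error term. The symbols \alpha_i, \beta and \delta are introduced here for illustration and do not appear in the thread.

```latex
% y_{it}: outcome (e.g. lsize) of app i in period t; \alpha_i: app fixed effect
\begin{align*}
\text{model:} &\quad y_{it} = \alpha_i + \beta\,\mathrm{post}_t
      + \delta\,(\mathrm{post}_t \times \mathrm{treatment}_i) + \varepsilon_{it} \\
\text{control apps:} &\quad E[y_{it} \mid \mathrm{post}_t = 1] - E[y_{it} \mid \mathrm{post}_t = 0] = \beta \\
\text{treated apps:} &\quad E[y_{it} \mid \mathrm{post}_t = 1] - E[y_{it} \mid \mathrm{post}_t = 0] = \beta + \delta \\
\text{difference in differences:} &\quad (\beta + \delta) - \beta = \delta
\end{align*}
```

Under this reading, 1.post is the before-and-after change in the control group and the interaction is the additional change in the treated group. The fixed effect \alpha_i absorbs any level difference between the groups, which is why 1.treatment gets no coefficient of its own. Reading \delta as the ATT additionally relies on the common trend assumption.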
Chris Boudreaux

Join Date: Jul 2020

Posts: 83
#4

21 Jul 2020, 06:57

I was just stating it in general terms. Your treatment indicator is the market entry of a competing app, correct? Then the interpretation is that the affected group has a positive effect on download size, but it becomes lower (though still positive) after the market entry of a competing app.
Vincent Rowold

Join Date: Jun 2020

Posts: 9
#5

21 Jul 2020, 07:20

Sorry, I'm kinda lost here. How is there any effect of treatment before treatment? Or what do you mean by a positive effect that becomes lower after treatment? I just plotted the dependent variable against time. The effect should be the deviation of the actual treatment group from the counterfactual treatment group (whose trend is shown by the control group line). The effect looks negative.
Also, if I run a fully dynamic panel DiD, it shows that the positive treatment effect gets stronger over time:
. xtreg lsize i.period##i.treatment age, cl(application_id) fe
note: 1.treatment omitted because of collinearity

Fixed-effects (within) regression                 Number of obs     =    172,875
Group variable: applicatio~d                      Number of groups  =     49,617

R-sq:                                             Obs per group:
     within  = 0.0167                                           min =          1
     between = 0.1008                                           avg =        3.5
     overall = 0.0778                                           max =          6

                                                  F(11,49616)       =      39.49
corr(u_i, Xb)  = -0.3751                          Prob > F          =     0.0000

                        (Std. Err. adjusted for 49,617 clusters in application_id)
----------------------------------------------------------------------------------
                 |               Robust
           lsize |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
          period |
              2  |  -.0266044   .0285306    -0.93   0.351    -.0825248     .029316
              3  |  -.0598818   .0622437    -0.96   0.336    -.1818801    .0621165
              4  |  -.0916205   .0959601    -0.95   0.340    -.2797035    .0964624
              5  |  -.1263464   .1335633    -0.95   0.344    -.3881321    .1354392
              6  |  -.1527309    .162132    -0.94   0.346    -.4705115    .1650498
                 |
     1.treatment |          0  (omitted)
                 |
period#treatment |
            2 1  |   .0778891   .0176108     4.42   0.000     .0433718    .1124065
            3 1  |   .1525002   .0357187     4.27   0.000     .0824911    .2225093
            4 1  |   .2226254   .0351771     6.33   0.000     .1536779    .2915728
            5 1  |   .2446681    .048336     5.06   0.000     .1499291    .3394072
            6 1  |   .2959716   .0434816     6.81   0.000     .2107471    .3811961
                 |
             age |   .0002003   .0001852     1.08   0.280    -.0001628    .0005634
           _cons |   17.60251   .1006158   174.95   0.000      17.4053    17.79972
-----------------+----------------------------------------------------------------
         sigma_u |  1.1954073
         sigma_e |  .07921556
             rho |  .99562794   (fraction of variance due to u_i)
----------------------------------------------------------------------------------
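
One thing that could be checked in this output: the interactions for periods 2 and 3 (.0778891 and .1525002) are already significant before the entry at period 4, which may point to the two groups diverging before treatment. Below is a minimal sketch of such a check on simulated data. All values are made up, and using period 3 as the reference period is an assumption, not something stated in the thread.

```stata
* Simulated panel: 200 apps, 6 periods (hypothetical values).
clear
set seed 2020
set obs 200
generate application_id = _n
generate byte treatment = application_id > 100
expand 6
bysort application_id: generate period = _n
* Treated apps drift upward by 0.05 per period in every period,
* so there is a pre-trend and no jump at period 4.
generate double lsize = 17 + 0.05*period*treatment + rnormal(0, 0.05)
xtset application_id period

* ib3. makes period 3, the last period before entry, the reference category.
xtreg lsize ib3.period##i.treatment, fe vce(cluster application_id)

* Joint test that the pre-entry interactions (periods 1 and 2) are zero.
test 1.period#1.treatment 2.period#1.treatment
```

A rejection here would suggest that the treated and control apps were not on a common trend before entry, in which case the post-entry interactions would pick up the continuing drift rather than an effect of the entry alone.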
Tom Bilach

Join Date: Sep 2018

Posts: 16
#6

30 Jul 2020, 17:01

I wonder if your variables are coded properly.

"At period >=4, the treatment indicator takes on the value 1, otherwise 0"

Not quite right. Your post-treatment indicator takes on the value 1 in all t periods greater than or equal to the implementation year (i.e., 2014) in both groups. This might have been a minor oversight on your part, but it is worth mentioning.

Treatment timing is well-defined so you can proceed with the "classical" difference-in-differences (DiD) approach. Again, your treatment dummy should be coded 1 for all apps experiencing market entry, 0 otherwise. The post-treatment variable should be coded 1 in all years from 2014 onward in both treatment and control groups. To address your other concerns, DiD performs a double-difference across groups and across times. You are comparing the before-and-after change in the treatment group with the before-and-after change in the control group. Could you show us a subset of your data with your newly created variables appended? And please provide examples from your actual dataset using the -dataex- command.
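
The coding and the double difference described above can be sketched as follows. The data are simulated with a built-in effect of -0.10 (an arbitrary illustrative value), the cutoff period >= 4 is taken from #1, and the scalar names m00 to m11 and did are hypothetical.

```stata
* Simulated balanced panel: 200 apps, 6 periods (hypothetical values).
clear
set seed 12345
set obs 200
generate application_id = _n
* treatment: 1 for affected apps in every period, 0 otherwise.
generate byte treatment = application_id > 100
expand 6
bysort application_id: generate period = _n
* post: 1 from period 4 onward in BOTH groups.
generate byte post = period >= 4
generate double lsize = 17 + 0.5*treatment + 0.02*period ///
    - 0.10*post*treatment + rnormal(0, 0.05)

* The four cell means behind the double difference.
tabulate treatment post, summarize(lsize) means
summarize lsize if treatment == 0 & post == 0, meanonly
scalar m00 = r(mean)
summarize lsize if treatment == 0 & post == 1, meanonly
scalar m01 = r(mean)
summarize lsize if treatment == 1 & post == 0, meanonly
scalar m10 = r(mean)
summarize lsize if treatment == 1 & post == 1, meanonly
scalar m11 = r(mean)
scalar did = (m11 - m10) - (m01 - m00)
display "Double difference of cell means: " did

* Same quantity from the fixed-effects regression.
xtset application_id period
xtreg lsize i.post##i.treatment, fe vce(cluster application_id)
display "Interaction coefficient: " _b[1.post#1.treatment]
```

In this balanced toy panel without covariates, the two displayed numbers coincide, and both are negative even though the treated apps are larger in every period. With an unbalanced panel or extra regressors such as age, the regression coefficient need not equal the raw double difference exactly.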