
  • Difference in Difference omitted treatment indicator

    Hello!
    I want to analyse how the market entry of a specific app is associated with a change in the download size (in bytes) of competing apps. I ran a panel DiD with a control group of unaffected apps and a treatment group of all affected apps. At period >=4, the treatment indicator takes on the value 1, otherwise 0. First I did a graphical inspection to check the common trend assumption. It looked like there is a negative effect on the download size of affected apps (as expected). After I ran the regression in Stata, I got a significant positive effect. How is this possible? I'm also concerned about the fact that Stata is omitting the treatment indicator in the output. I did the same analysis with another app, where Stata didn't omit the treatment indicator.
    Did I make a mistake, and can someone help me understand why the graphical trend inspection deviates so much from the estimation?
    This is the graph (x-axis time, y-axis download_size)
    Thank you and all the best
    Vince

    Graph: [image: statalist2.jpg]



    Output:
    . xtreg download_size i.post##i.treatment age, cl(application_id) fe
    note: 1.treatment omitted because of collinearity

    Fixed-effects (within) regression               Number of obs     =    172,875
    Group variable: applicatio~d                    Number of groups  =     49,617

    R-sq:                                           Obs per group:
         within  = 0.0023                                         min =          1
         between = 0.0135                                         avg =        3.5
         overall = 0.0102                                         max =          6

                                                    F(3,49616)        =      40.99
    corr(u_i, Xb)  = -0.1156                        Prob > F          =     0.0000

                          (Std. Err. adjusted for 49,617 clusters in application_id)
    --------------------------------------------------------------------------------
                   |               Robust
     download_size |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
            1.post |  -305664.7   148492.5    -2.06   0.040    -596711.9    -14617.6
       1.treatment |          0  (omitted)
                   |
    post#treatment |
              1 1  |    9196838    1955476     4.70   0.000      5364082    1.30e+07
                   |
               age |    3463.04   366.4428     9.45   0.000     2744.807    4181.272
             _cons |   9.13e+07   313649.8   291.22   0.000     9.07e+07    9.20e+07
    ---------------+----------------------------------------------------------------
           sigma_u |  1.542e+08
           sigma_e |   19961402
               rho |  .98351201   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------




    Output (log specification):
    . xtreg lsize i.post##i.treatment age, cl(application_id) fe
    note: 1.treatment omitted because of collinearity

    Fixed-effects (within) regression               Number of obs     =    172,875
    Group variable: applicatio~d                    Number of groups  =     49,617

    R-sq:                                           Obs per group:
         within  = 0.0145                                         min =          1
         between = 0.0597                                         avg =        3.5
         overall = 0.0431                                         max =          6

                                                    F(3,49616)        =     131.86
    corr(u_i, Xb)  = -0.2236                        Prob > F          =     0.0000

                          (Std. Err. adjusted for 49,617 clusters in application_id)
    --------------------------------------------------------------------------------
                   |               Robust
             lsize |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
            1.post |  -.0008987   .0006355    -1.41   0.157    -.0021442    .0003469
       1.treatment |          0  (omitted)
                   |
    post#treatment |
              1 1  |   .1757632   .0272084     6.46   0.000     .1224344    .2290919
                   |
               age |    .000027   1.55e-06    17.40   0.000     .0000239      .00003
             _cons |   17.69652   .0013271  1.3e+04   0.000     17.69392    17.69912
    ---------------+----------------------------------------------------------------
           sigma_u |  1.1514444
           sigma_e |  .07930105
               rho |  .99527919   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------

    Last edited by Vincent Rowold; 21 Jul 2020, 05:50.

  • #2
    The regression results are consistent with your visual inspection. The interaction coefficient in the log specification says the effect of treatment (when post = 0) is 0.1757632. When post = 1, the effect of treatment is 0.1748645 (-0.0008987 + 0.1757632). Although the effect of treatment is positive both before and after treatment, it is lower after post treatment.

    Stata omits the treatment indicator due to collinearity, as it should; this often occurs with panel DiD. In your sample, it must be the case that groups are either always treated or never treated, which gives no within-group variation. Anything time-invariant will drop out of the equation.
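
    A minimal sketch of how one might confirm that the treatment indicator has no within-app variation. It assumes application_id is the panel identifier, as in the xtreg output above; trt_min, trt_max and one_per_app are hypothetical helper names, not variables from the original dataset.

    ```stata
    * Sketch only: assumes application_id identifies the panel units
    xtset application_id

    * A within standard deviation of 0 means treatment never changes inside an app
    xtsum treatment

    * Count apps whose treatment value changes over time (helper names are made up)
    bysort application_id: egen byte trt_min = min(treatment)
    bysort application_id: egen byte trt_max = max(treatment)
    egen byte one_per_app = tag(application_id)
    count if one_per_app & trt_min != trt_max
    ```

    If the count is 0, every app is either always or never in the treatment group, so the fixed effects absorb the treatment main effect and only the interaction with post remains identified.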



    • #3
      Hi Chris,
      thank you for your answer!
      I'm not sure what you mean by "Although the effect of treatment is positive both before and after treatment, it is lower after post treatment.". How can there be an effect of treatment before treatment?
      I thought the coefficient on the interaction term (treatment x post) gives the ATT.
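
      For reference, a sketch of the fixed-effects DiD model behind the two-way specification in the original post. The symbols are assumptions introduced here for illustration, and the link between the interaction coefficient and the ATT holds only under the common trend assumption, with age held fixed.

      ```latex
      \begin{aligned}
      y_{it} &= \alpha_i + \beta\,\mathrm{post}_t
                + \delta\,(\mathrm{post}_t \times \mathrm{treatment}_i)
                + \gamma\,\mathrm{age}_{it} + \varepsilon_{it} \\[4pt]
      \text{control apps:}\quad
        & E[y \mid \mathrm{post}=1] - E[y \mid \mathrm{post}=0] = \beta \\
      \text{treated apps:}\quad
        & E[y \mid \mathrm{post}=1] - E[y \mid \mathrm{post}=0] = \beta + \delta \\[4pt]
      \text{difference in differences:}\quad
        & (\beta + \delta) - \beta = \delta
      \end{aligned}
      ```

      The treatment main effect does not appear because it is constant within an app and is absorbed by \alpha_i. With the log-specification estimates, \beta = -0.0008987 and \delta = 0.1757632, so the before-and-after change is -0.0008987 for control apps and 0.1748645 for treated apps.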



      • #4
        I was just stating it in general terms. Your treatment indicator is the market entry of a competing app, correct? Then the interpretation is the affected group has a positive effect on download size but it becomes lower (but still positive) after the market entry of a competing app.



        • #5
          Sorry, I'm kinda lost here. How is there any effect of treatment before treatment? Or, what do you mean by a positive effect that becomes lower after treatment? I just plotted the dependent variable against time. The effect should be the deviation of the actual treatment group from the counterfactual treatment group (whose trend is given by the control group line). The effect looks negative.
          Also, if I run a fully dynamic panel DiD, it shows that the positive treatment effect gets stronger over time:
          . xtreg lsize i.period##i.treatment age, cl(application_id) fe
          note: 1.treatment omitted because of collinearity

          Fixed-effects (within) regression               Number of obs     =    172,875
          Group variable: applicatio~d                    Number of groups  =     49,617

          R-sq:                                           Obs per group:
               within  = 0.0167                                         min =          1
               between = 0.1008                                         avg =        3.5
               overall = 0.0778                                         max =          6

                                                          F(11,49616)       =      39.49
          corr(u_i, Xb)  = -0.3751                        Prob > F          =     0.0000

                                  (Std. Err. adjusted for 49,617 clusters in application_id)
          ----------------------------------------------------------------------------------
                           |               Robust
                     lsize |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -----------------+----------------------------------------------------------------
                    period |
                        2  |  -.0266044   .0285306    -0.93   0.351    -.0825248     .029316
                        3  |  -.0598818   .0622437    -0.96   0.336    -.1818801    .0621165
                        4  |  -.0916205   .0959601    -0.95   0.340    -.2797035    .0964624
                        5  |  -.1263464   .1335633    -0.95   0.344    -.3881321    .1354392
                        6  |  -.1527309    .162132    -0.94   0.346    -.4705115    .1650498
                           |
               1.treatment |          0  (omitted)
                           |
          period#treatment |
                      2 1  |   .0778891   .0176108     4.42   0.000     .0433718    .1124065
                      3 1  |   .1525002   .0357187     4.27   0.000     .0824911    .2225093
                      4 1  |   .2226254   .0351771     6.33   0.000     .1536779    .2915728
                      5 1  |   .2446681    .048336     5.06   0.000     .1499291    .3394072
                      6 1  |   .2959716   .0434816     6.81   0.000     .2107471    .3811961
                           |
                       age |   .0002003   .0001852     1.08   0.280    -.0001628    .0005634
                     _cons |   17.60251   .1006158   174.95   0.000      17.4053    17.79972
          -----------------+----------------------------------------------------------------
                   sigma_u |  1.1954073
                   sigma_e |  .07921556
                       rho |  .99562794   (fraction of variance due to u_i)
          ----------------------------------------------------------------------------------




          • #6
            I wonder if your variables are coded properly.

            "At period >=4, the treatment indicator takes on the value 1, otherwise 0"
            Not quite right. Your post-treatment indicator takes on the value 1 in all t periods greater than or equal to the implementation year (i.e., 2014) in both groups. This might have been a minor oversight on your part, but it is worth mentioning.

            Treatment timing is well-defined so you can proceed with the "classical" difference-in-differences (DiD) approach. Again, your treatment dummy should be coded 1 for all apps experiencing market entry, 0 otherwise. The post-treatment variable should be coded 1 in all years from 2014 onward in both treatment and control groups. To address your other concerns, DiD performs a double-difference across groups and across times. You are comparing the before-and-after change in the treatment group with the before-and-after change in the control group. Could you show us a subset of your data with your newly created variables appended? And please provide examples from your actual dataset using the -dataex- command.
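
            A rough sketch of how the post-period coding described above could be checked and a data excerpt shared. It assumes the variable names visible in the thread's output (application_id, period, post, treatment, lsize, age) and that entry happens at period 4, as stated in the original post; post_chk is a hypothetical helper name.

            ```stata
            * Sketch only: rebuild the post-period indicator under a new, made-up name
            * It should equal 1 from period 4 onward for treated AND control apps
            generate byte post_chk = (period >= 4) if !missing(period)

            * Expect 0 mismatches if the existing post variable is coded as described
            count if post != post_chk

            * All four group-by-time cells should contain observations
            tabulate treatment post_chk

            * Share an excerpt of the actual data on the forum
            dataex application_id period post treatment lsize age in 1/20
            ```

            A nonzero mismatch count, or an empty cell in the cross-tabulation, would point to a coding problem rather than an estimation problem.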

