  • Difference in differences with changing sample

    Dear All,

    I am working on a quasi-experimental study with a large unbalanced panel dataset. This is the specification that I use.

    Y_it = β_1 Post_it + δ_i + γ_t + u_it, where Post_it is the value of treatment for individual i in week t, and δ_i and γ_t are the individual and time fixed effects to be estimated.

    Here is my concern. At t = 1, I have only 10,000 individuals in the sample, and the number gradually increases to 50,000 over time. That is, many individuals have observations only in the later periods of the sample. Is there an issue if I use all observations to estimate the equation? The value of the dependent variable decays over time, so calendar-time fixed effects might not be enough. Do you have any suggestions?
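For readers who want to see the specification above in executable form, here is a minimal sketch in plain Python. It is not the poster's code: the toy panel, the effect sizes, and the names `demean_by`, `partial_out`, and `twfe_post_coefficient` are all hypothetical. The fixed effects are removed by repeatedly demeaning within individual and within week (which works on an unbalanced panel), and β_1 is then the slope of the demeaned outcome on the demeaned Post.

```python
from collections import defaultdict

def demean_by(values, keys):
    """Subtract each group's mean (groups defined by keys) from its values."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, k in zip(values, keys):
        sums[k] += v
        counts[k] += 1
    return [v - sums[k] / counts[k] for v, k in zip(values, keys)]

def partial_out(values, key_lists, tol=1e-12, max_sweeps=10000):
    """Remove several sets of fixed effects by repeated group demeaning."""
    current = [float(v) for v in values]
    for _ in range(max_sweeps):
        updated = current
        for keys in key_lists:
            updated = demean_by(updated, keys)
        change = max(abs(a - b) for a, b in zip(updated, current))
        current = updated
        if change < tol:  # stop once a full sweep no longer moves the values
            break
    return current

def twfe_post_coefficient(rows):
    """rows: (unit, week, post, y) tuples; returns the coefficient on post."""
    units = [r[0] for r in rows]
    weeks = [r[1] for r in rows]
    x = partial_out([r[2] for r in rows], [units, weeks])
    y = partial_out([r[3] for r in rows], [units, weeks])
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical toy panel: unit -> (first observed week, first treated week).
# Units 3 and 4 enter late, so the panel is unbalanced.
design = {1: (1, 4), 2: (1, None), 3: (3, 5), 4: (4, None)}
unit_effect = {1: 5.0, 2: 3.0, 3: 1.0, 4: 0.5}
week_effect = {t: 10.0 - 1.5 * t for t in range(1, 7)}  # secular decline
TRUE_EFFECT = 2.0  # assumed treatment effect, chosen for illustration

rows = []
for unit, (entry, treat) in design.items():
    for week in range(entry, 7):
        post = 1 if treat is not None and week >= treat else 0
        y = TRUE_EFFECT * post + unit_effect[unit] + week_effect[week]
        rows.append((unit, week, post, y))

beta_hat = twfe_post_coefficient(rows)
print(round(beta_hat, 6))
```

Because the toy outcome has no noise and is exactly additive in the individual and week effects, the estimate should reproduce `TRUE_EFFECT` up to numerical tolerance even though half of the units enter late. That only illustrates the mechanics; it says nothing about whether late entry is ignorable in real data.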

  • #2
    Well, there is an issue either way. Another way of looking at this is that you have a sample of 50,000 individuals with lots of missing observations. If you use all observations you have, you are, in effect, doing a "complete cases" analysis, which may well be biased. On the other hand, if you use only the 10,000 people who have data throughout the study, you are selecting a differently biased sample. Do you know why the 40,000 people added to the data after the start of the study were not included from the beginning? An understanding of that would go a long way towards advising how to approach this.

    As for the fact that there is a secular trend on the outcome variable over the course of the study, you have a term for t in the model already and that will help. Also, your DID estimator is a within-person estimator, so the fact that later entrants may be starting from a lower baseline shouldn't matter provided the effect of the intervention/policy change/whatever is still the same for all these people as for those who were there from the start.

    So, basically, it all depends on the actual context and content of your research.


    • #3
      Dear Clyde,
      Thank you very much for your reply. Individuals are added to the sample when they launch an application on Appstore. New applications, on average, have a higher number of downloads within the first couple of months after launch, and the number of downloads then decays over time. When I standardize time around the treatment (i.e., -4, -3, -2, -1, 0, 1, 2, 3, 4), I can clearly see the effect of the treatment on the graph. However, with standard calendar-time fixed effects, there is no effect of Post. In a different specification, I added age fixed effects, where age takes the value 1 in the week the app is launched to the community and increases by 1 in each following week (I also keep the time and individual fixed effects). In this specification, I obtain a significant coefficient on the Post variable.
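The age fixed effects described above could be sketched as follows, again in plain Python with made-up data. Everything here is an assumption for illustration: the toy apps, the `12 / age` decay, the effect sizes, and the helper names. Age is defined as 1 in the launch week and increases by 1 each week, and the coefficient on Post is computed once with app and week fixed effects only and once with age fixed effects added.

```python
from collections import defaultdict

def demean_by(values, keys):
    """Subtract each group's mean (groups defined by keys) from its values."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, k in zip(values, keys):
        sums[k] += v
        counts[k] += 1
    return [v - sums[k] / counts[k] for v, k in zip(values, keys)]

def partial_out(values, key_lists, tol=1e-12, max_sweeps=10000):
    """Remove several sets of fixed effects by repeated group demeaning."""
    current = [float(v) for v in values]
    for _ in range(max_sweeps):
        updated = current
        for keys in key_lists:
            updated = demean_by(updated, keys)
        change = max(abs(a - b) for a, b in zip(updated, current))
        current = updated
        if change < tol:  # stop once a full sweep no longer moves the values
            break
    return current

def post_coefficient(rows, fe_columns):
    """Coefficient on 'post' after absorbing the listed fixed effects."""
    key_lists = [[r[c] for r in rows] for c in fe_columns]
    x = partial_out([r["post"] for r in rows], key_lists)
    y = partial_out([r["y"] for r in rows], key_lists)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical toy data: app -> (launch week, first treated week or None).
design = {1: (1, 4), 2: (1, None), 3: (3, 5), 4: (4, None)}
app_effect = {1: 5.0, 2: 3.0, 3: 1.0, 4: 0.5}
week_effect = {t: 0.3 * t for t in range(1, 7)}
TRUE_EFFECT = 2.0  # assumed treatment effect, chosen for illustration

rows = []
for app, (launch, treat) in design.items():
    for week in range(launch, 7):
        age = week - launch + 1  # 1 in the launch week, +1 every week after
        post = 1 if treat is not None and week >= treat else 0
        y = (TRUE_EFFECT * post + app_effect[app] + week_effect[week]
             + 12.0 / age)  # assumed download decay with app age
        rows.append({"app": app, "week": week, "age": age,
                     "post": post, "y": y})

beta_calendar_only = post_coefficient(rows, ["app", "week"])
beta_with_age = post_coefficient(rows, ["app", "week", "age"])
print(round(beta_calendar_only, 4), round(beta_with_age, 4))
```

In this noise-free toy setup, the specification with age fixed effects should recover `TRUE_EFFECT` up to numerical tolerance, while the calendar-only estimate need not, because the nonlinear age decay is not absorbed by app and week effects. Note that age equals week minus launch week plus 1, so only the nonlinear part of the age profile is separately identified from the app and week effects.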
