  • Difference in Difference estimate with staggered coverage and increasing treatment intensity

    Hi

    I am new to the Stata forum, so apologies if this question has been asked before. I have panel data on individual taxpayers who were enrolled in an electronic lodgment service that was introduced in 1999 as a pilot. The e-lodgment service was expanded over the years to cover some 2.6m taxpayers by 2011. Starting in 2003, the e-lodgment service also incorporated prefilled data to reduce taxpayers' compliance costs and make it easier to file tax returns. Complicating the study is the fact that the number of labels (items of information) being prefilled also increased over the years (from 1 label in 2003 to 21 labels in 2008 and ) while enrollment in the e-lodgment service was being expanded.

    I am specifically trying to isolate the impact of prefilling on the individual tax return labels and am planning to use a difference-in-differences (DiD) approach. I have data on a control group that did not use the e-lodgment system (and consequently did not use prefilling). However, I have not come across any research that uses a DiD approach to study a program where coverage increases over time and the intensity of the treatment also changes. I would really appreciate it if I could use the wisdom of the crowd to identify similar research.

    My fallback option is to study a group of taxpayers who did not use e-lodgment until 2011 (when all the available information was prefilled, meaning the treatment was no longer changing) and compare them, using a DiD approach, with a control group that did not use e-lodgment in either 2010 or 2011.
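
    The fallback comparison described above is a standard two-group, two-period DiD. Below is a minimal Python sketch of that arithmetic, not a definitive implementation. The group labels `new_user` and `never_user`, the function name `did_2x2`, and all outcome values are hypothetical and are not taken from the actual data.

```python
from statistics import mean


def did_2x2(rows, pre_year=2010, post_year=2011):
    """Two-group, two-period DiD on rows of (group, year, outcome).

    The group names "new_user" and "never_user" are hypothetical labels
    for first-time 2011 e-lodgment users and for the control group.
    """
    def cell_mean(group, year):
        return mean(y for g, t, y in rows if g == group and t == year)

    change_new = cell_mean("new_user", post_year) - cell_mean("new_user", pre_year)
    change_never = cell_mean("never_user", post_year) - cell_mean("never_user", pre_year)
    # The control group's change stands in for what would have happened
    # to the new users without prefilling.
    return change_new - change_never


# Made-up outcomes purely for illustration.
toy_rows = [
    ("new_user", 2010, 5.0), ("new_user", 2010, 7.0),
    ("new_user", 2011, 9.0), ("new_user", 2011, 11.0),
    ("never_user", 2010, 4.0), ("never_user", 2010, 6.0),
    ("never_user", 2011, 5.0), ("never_user", 2011, 7.0),
]
estimate = did_2x2(toy_rows)
print(estimate)
```

    In practice the same quantity would come from a regression with a group-by-period interaction, which also provides standard errors.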

    Many thanks in advance.

  • #2
    Would appreciate any feedback/guidance on the above post.



    • #3
      While this is not a classical DID setup, there is no reason you cannot use the overall approach. The logical structure of DID is that you have a group of observational units that are exposed to an intervention, and another group that are not. You then build a regression model that is focused on the interaction between exposure-group membership and time (before vs after intervention). In this case, the key difference is that the exposure-group membership variable will not be dichotomous but will be a quasi-continuous variable. This is not a problem at all: it is a minor variation on the theme.
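
      To illustrate the quasi-continuous variant described above, here is a minimal Python sketch under loudly stated assumptions. The exposure variable is "user times number of prefilled labels in that year", and unit and year effects are removed by two-way demeaning, which is only exact for a balanced panel. The label counts for years other than 2003 and 2008 are made-up placeholders, as are the toy outcomes and every function name. It is not code from this thread.

```python
from statistics import mean

# Only 2003 (1 label) and 2008 (21 labels) come from the thread;
# every other count here is a made-up placeholder.
LABELS = {2001: 0, 2002: 0, 2003: 1, 2004: 4,
          2005: 8, 2006: 12, 2007: 16, 2008: 21}


def make_toy_panel(true_effect=0.5):
    """Balanced, noise-free toy panel of (unit, year, dose, outcome)."""
    rows = []
    for unit in range(6):
        user = 1 if unit < 3 else 0            # first three units use e-lodgment
        unit_effect = 10.0 + 2.0 * unit        # time-invariant heterogeneity
        for year, labels in LABELS.items():
            year_effect = 0.3 * (year - 2001)  # shock common to all units
            dose = user * labels               # treatment intensity
            outcome = unit_effect + year_effect + true_effect * dose
            rows.append((unit, year, dose, outcome))
    return rows


def two_way_demean(rows, col):
    """Subtract unit and year means, add back the grand mean."""
    grand = mean(r[col] for r in rows)
    by_unit, by_year = {}, {}
    for r in rows:
        by_unit.setdefault(r[0], []).append(r[col])
        by_year.setdefault(r[1], []).append(r[col])
    unit_mean = {k: mean(v) for k, v in by_unit.items()}
    year_mean = {k: mean(v) for k, v in by_year.items()}
    return [r[col] - unit_mean[r[0]] - year_mean[r[1]] + grand for r in rows]


def intensity_did(rows):
    """Slope of the demeaned outcome on the demeaned dose."""
    x = two_way_demean(rows, 2)
    y = two_way_demean(rows, 3)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


beta_hat = intensity_did(make_toy_panel())
print(beta_hat)
```

      The slope is read as the change in the outcome per additional prefilled label among users, relative to non-users. Real data would also need standard errors clustered by taxpayer, which this sketch omits.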

      There are potentially other complications here. Does everybody who uses the e-lodgment service in your study begin to do so in 2003, or do some take it up later? Is the number of pre-filled labels the same for all e-lodgment users in any given year?

      The aspect of your situation that actually does worry me is the appropriateness of your control group. If they are simply people who chose not to use the e-lodgment system even though it was available to them, it is likely that they are different in material ways from those who chose to use it. So you will have to pay very careful attention to demonstrating parallel trends before 2003 in these groups.
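
      One simple descriptive way to look at pre-2003 trends is to compute the gap between the two groups' mean outcomes in each pre-period year and see whether that gap drifts. The Python sketch below is a rough illustration only, not a formal test. The row layout `(year, user, outcome)`, the function names, and the toy numbers are all hypothetical.

```python
from statistics import mean


def pre_period_gaps(rows, first_treated_year=2003):
    """Return {year: user mean - non-user mean} for years before treatment.

    rows are (year, user, outcome) tuples with user coded 1 or 0.
    """
    years = sorted({yr for yr, _, _ in rows if yr < first_treated_year})
    gaps = {}
    for yr in years:
        users = [y for t, u, y in rows if t == yr and u == 1]
        non_users = [y for t, u, y in rows if t == yr and u == 0]
        gaps[yr] = mean(users) - mean(non_users)
    return gaps


def gap_slope(gaps):
    """Least-squares slope of the gap on calendar year."""
    years = list(gaps)
    year_bar = mean(years)
    gap_bar = mean(list(gaps.values()))
    num = sum((yr - year_bar) * (gaps[yr] - gap_bar) for yr in years)
    den = sum((yr - year_bar) ** 2 for yr in years)
    return num / den


# Made-up data: both groups share the same trend, users sit 2 units higher.
toy_rows = []
for year in range(1999, 2005):
    trend = 1.0 * (year - 1999)
    for offset in (-0.5, 0.5):
        toy_rows.append((year, 1, 12.0 + trend + offset))
        toy_rows.append((year, 0, 10.0 + trend + offset))

gaps = pre_period_gaps(toy_rows)
slope = gap_slope(gaps)
print(gaps, slope)
```

      A slope near zero is consistent with parallel pre-trends but does not prove them. A plot of the yearly gaps, or a regression with pre-period leads of the treatment, would be the more usual evidence.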

      If you want more specific advice on how to code this, then respond to these questions, and also, using the -dataex- command, post an example of your data. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
      Last edited by Clyde Schechter; 07 Apr 2019, 21:05.



      • #4
        Hi Clyde

        Thanks for the detailed response, which I really appreciate. I will send through an example of the dataset as soon as I can get to a computer with access to Stata. Some clarifications and responses to your questions are below:

        1. The e-lodgment service was expanded over the years, so new taxpayers were added each year. The attached table shows the total population that used e-lodgment in any given year. I am able to identify those who used e-lodgment continuously since 2003 and compare them with a control group who did not use e-lodgment at all during the study period. I believe there will always be both switchers (who used e-lodgment, then went to a tax agent, and then used e-lodgment again) and those who drop out of the e-lodgment system entirely.

        2. The number of prefilled labels AVAILABLE is the same for all e-lodgment users in a given year. Obviously, some users will not have certain information prefilled because their circumstances differ (some will have dividend income while others won’t). However, I am able to match on these observable characteristics.

        3. As far as the control group is concerned, I will have access to data from before 2003 and am hoping to demonstrate parallel trends.

        Grateful for any feedback on addressing these challenges.

        Thanks
        Attached Files
        Last edited by Nitin Sri; 07 Apr 2019, 22:45.
