  • CSDID implementation with repeated cross section data and binary outcome

    Hello,
    I have a repeated cross-section, individual-level dataset for 7 years (2013 to 2020), and I would like to assess the impact of a reform on a binary outcome variable.
    I decided to use a staggered difference-in-differences design because treatment timing is set at the regional level and regions were first treated in different years: some in 2016, others in 2017, 2018, or 2019.
    I thought the best approach would be the CSDID package, which allows for different treatment timing, but I am not sure this command is the right choice because my dependent variable is binary. Would you suggest something else, such as a generalized difference-in-differences using a probit model?
    Code:
    probit Y i.treat i.year X , cluster(region)
    Here Y is the binary outcome; treat is 1 if the individual lives in a treated region after treatment begins, and 0 if the individual lives in a never-treated region or in a treated region before treatment begins; and X is a vector of covariates.
    If CSDID is the right choice, I have some doubts about how to write the command in Stata.
    The first problem was creating the gvar variable. I generated a variable called group, which takes the value 0 for individuals living in never-treated regions, 2016 for individuals living in regions first treated in 2016, and so on. I read online that this value must be constant across years, so if an individual's region is first treated in 2017, group equals 2017 in every year of the dataset. So I ran:
    Code:
    gen group = 2016 if reg == 1 | reg == 2 | reg == 7 | reg == 8 | reg == 9 | reg == 16
    replace group = 2017 if reg == 14
    replace group = 2018 if reg == 3 | reg == 4 | reg == 5 | reg == 6 | reg == 12 | reg == 17 | reg == 18 | reg == 19 | reg == 20
    replace group = 2019 if reg == 10 | reg == 11 | reg == 13
    replace group = 0 if reg == 15
    Is this the right way to generate the gvar variable in a repeated cross-section dataset?
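    For reference, the same assignment can be written more compactly with inlist(). This is only a sketch of an alternative to the block above (not something to run in addition to it), and it assumes reg is a numeric region code:

    ```stata
    * Alternative sketch: same cohort assignment, assuming reg is numeric
    gen group = .
    replace group = 0    if reg == 15
    replace group = 2016 if inlist(reg, 1, 2, 7, 8, 9, 16)
    replace group = 2017 if reg == 14
    replace group = 2018 if inlist(reg, 3, 4, 5, 6, 12, 17, 18, 19, 20)
    replace group = 2019 if inlist(reg, 10, 11, 13)

    * Check: each region should fall in exactly one cohort, with no missing values
    tabulate reg group, missing
    ```

    The cross-tabulation makes it easy to spot a region that was left out or assigned to the wrong cohort.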
    Then, when running CSDID, I decided to cluster on the region variable, use the dripw method, and use not-yet-treated observations so as to have a larger control group.
    Code:
    csdid Y X, time(year) gvar(group) method(dripw) cluster(region) notyet
    Is this the right way to proceed with a repeated cross-section dataset? I am unsure because there is little information online about using this command with repeated cross-section data, so before going further I would like to check that I am on the right track.
    Thank you in advance.
    Francesco


  • #2
    Hi Francesco,
    1. Yes, I think the way you are generating your group variable is perfectly fine.
    2. csdid will do a kind of linear probability model (LPM) approximation to estimate the treatment effects. If you prefer a probit or logit model instead, try jwdid (same syntax).
    HTH
    Fernando
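
    As a rough illustration of the jwdid suggestion, the call for this setup might look like the sketch below. It assumes jwdid is installed from SSC, reuses the variable names from the first post (Y, X, year, group, region), and uses option names as I understand them from the jwdid help file (tvar(), gvar(), method(), cluster()); please confirm them with help jwdid before relying on this. No ivar() is given because the data are repeated cross sections rather than a panel.

    ```stata
    * Sketch only: check option names against -help jwdid- before use
    * ssc install jwdid

    * Probit version of the staggered DiD; group is 0 for never-treated regions
    jwdid Y X, tvar(year) gvar(group) method(probit) cluster(region)

    * Aggregate the cohort-by-period effects
    estat simple    // overall average effect on the treated
    estat event     // effects by time relative to first treatment
    ```

    With a nonlinear method, the aggregated effects are reported on the probability scale, which makes them comparable to the LPM-style estimates from csdid.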


    • #3
      Thank you very much Fernando. The jwdid command is really useful.