  • csdid: Defining never treated individuals

    Dear all,

    I am estimating the effect of a school reform on labor market outcomes using panel data. Since the reform (which cut one year of high school education) was implemented in a staggered way across the federal states, I decided to use csdid.

    I am now having trouble defining the time() and gvar() variables so that the command runs properly.

    I ran csdid with ivar(person_id), time(grad_year) and gvar(g8_grad_year). The variable "grad_year" is the year in which a person graduated. "g8_grad_year" is defined as the first year in which students graduated with one year less schooling, marking the start of treatment in each federal state.

    My code therefore looks like this:
    Code:
    gen gym_start_year = birthdate + 10 // defining the year in which kids transition to secondary school
    
    gen g8_grad_year = .
    replace g8_grad_year = 2011 if school_fedstate == 8 & gym_start_year >= 2003
    replace g8_grad_year = 2011 if school_fedstate == 9 & gym_start_year >= 2003
    replace g8_grad_year = 2012 if school_fedstate == 11 & gym_start_year >= 2004
    replace g8_grad_year = 2012 if school_fedstate == 12 & gym_start_year >= 2006
    replace g8_grad_year = 2011 if school_fedstate == 4 & gym_start_year >= 2003
    replace g8_grad_year = 2010 if school_fedstate == 2 & gym_start_year >= 2002
    replace g8_grad_year = 2011 if school_fedstate == 13 & gym_start_year >= 2003
    replace g8_grad_year = 2011 if school_fedstate == 3 & gym_start_year >= 2003
    replace g8_grad_year = 2012 if school_fedstate == 5 & gym_start_year >= 2004
    replace g8_grad_year = 2009 if school_fedstate == 10 & gym_start_year >= 1999
    replace g8_grad_year = 2010 if school_fedstate == 15 & gym_start_year >= 2001
    replace g8_grad_year = 2014 if school_fedstate == 1 & gym_start_year >= 2006
    
    csdid income controls, ivar(person_id) time(grad_year) gvar(g8_grad_year) method(dripw)
    Stata keeps giving me the following output:
    Code:
    No never treated observations found. Using Not yet treated data
    Units always treated found. These will be ignored. Panel is not balanced.
    Will use observations with Pair balanced (observed at t0 and t1)
    I do have many never-treated individuals, since my panel includes plenty of older individuals who graduated well before the reform was implemented.

    But somehow Stata does not use them as a control group, even though they should be identified through my time variable (if individual_graduation_year < first_g8_grads_in_fed_state). Does anyone know why this is happening?

    I am very grateful for any thoughts on this issue!

    Best,
    Frida

  • #2
    From the help file for -csdid-:
    gvar(varname) Variable identifying treatment groups or cohorts. Groups that are never treated should be coded as Zero. Any positive value indicates which year a group was initially treated. And once a group is treated, the underlying assumption is that it always remains treated. [emphasis added]
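    A quick way to check this coding, sketched under the assumption that g8_grad_year from the first post is the gvar variable: csdid reads 0 as never treated, whereas observations with a missing gvar are presumably dropped from the estimation sample (as missing values in any other variable of the command would be) rather than used as controls.

    ```stata
    * Observations coded as never treated (gvar == 0)
    count if g8_grad_year == 0
    * Observations whose gvar is still missing (not read as never treated)
    count if missing(g8_grad_year)
    * Full distribution of treatment cohorts, including missing values
    tab g8_grad_year, missing
    ```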



    • #3
      Originally posted by Clyde Schechter
      From the help file for -csdid-:
      Dear Clyde,

      Thank you very much for your reply! I did code never-treated groups as zero, but I am still getting the following message:
      Code:
      Panel is not balanced
      Will use observations with Pair balanced (observed at t0 and t1)
      So, for each individual, csdid needs observations from both before and after treatment. My dataset contains multiple observations per person, but since labor market outcomes are only measured after school, the relevant observations are all post-treatment.

      I have both pre- and post-treatment data only at the state level.

      Do you know if there is a way I can still use the csdid approach? Am I missing an obvious point here?

      Thank you very much!

      Best,
      Frida
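      If grad_year is constant within person_id, each person is observed in only one time period as far as csdid is concerned, which may explain the "Pair balanced" message. A sketch to check this, assuming the variable names from the first post (grad_year_varies is a hypothetical helper variable):

      ```stata
      * Are there several rows per person sharing the same grad_year?
      duplicates report person_id grad_year
      * Flag persons whose grad_year is not constant across their rows
      * (a missing grad_year sorts last and can also produce a 1 here)
      bysort person_id (grad_year): gen byte grad_year_varies = grad_year[1] != grad_year[_N]
      tab grad_year_varies
      ```

      If grad_year_varies is 0 for everyone, the panel estimator has no within-person contrast over time to work with.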



      • #4
        Can you do two things?
        First, run a simple regression with all your variables of interest (the order doesn't matter).
        Then:
        Code:
        gen sample = e(sample)
        tab year gvar if sample==1
        Then show me the output.



        • #5
          Thank you for your help, Fernando!

          This is the result:

          Code:
                     |           g8_grad_year
           grad_year |      2010       2011       2012 |     Total
          -----------+---------------------------------+----------
                2010 |         7          0          0 |         7 
                2011 |         4          9          0 |        13 
                2012 |         4         23         29 |        56 
                2013 |         0         12          4 |        16 
                2014 |         3          5          5 |        13 
                2015 |         0          1          3 |         4 
          -----------+---------------------------------+----------
               Total |        18         50         41 |       109
          I have many missing values in my dataset, which substantially reduced the sample size here. Could this be an issue?

          Actually, I was also wondering whether my design is really staggered at all, even though the reform I am studying was implemented in a staggered way.

          Since I am looking at labor market outcomes, I dropped observations of individuals younger than 25 from my final sample. I used pre-treatment data only to find out where people went to school and to extract some socioeconomic controls.

          Consequently, all my observations are post-treatment (after individuals were or were not exposed to the school reform).

          Could a simple TWFE approach be sufficient in this case? Is this where my mistake lies?

          Best,
          Frida
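          On the question about missing values: a sketch for seeing which variables drive the drop in sample size, using Stata's misstable with the variable names from the thread (controls stands in for the actual control variables, as in the first post).

          ```stata
          * Missing-value counts per variable (variables with no missings are not listed by default)
          misstable summarize income controls grad_year g8_grad_year
          * Which combinations of variables tend to be missing together
          misstable patterns income controls grad_year g8_grad_year
          ```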



          • #6
            As you can see here, there is NO never-treated group (gvar is never zero).
            You also have zeroes before treatment happens, and that is incorrect.
            Even with repeated cross-section data, the underlying assumption for using DID in general (and CSDID in particular) is that you know exactly when a group would have been treated. So you cannot have zeroes.
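            One way to act on this advice, as a sketch only. The assumption is that gvar should hold the first G8 graduation year of the state a person went to school in, for everyone in that state, so that older graduates in reforming states supply pre-treatment observations instead of being dropped or coded 0. g8_state_year is a hypothetical variable name, controls stands for the actual control variables as in the first post, and dropping ivar() is a further assumption that the thread does not settle.

            ```stata
            * Hypothetical sketch: give everyone schooled in a state that state's first
            * G8 graduation year, whatever their own cohort (mapping copied from the
            * replace statements in the first post)
            gen g8_state_year = .
            replace g8_state_year = 2009 if school_fedstate == 10
            replace g8_state_year = 2010 if inlist(school_fedstate, 2, 15)
            replace g8_state_year = 2011 if inlist(school_fedstate, 3, 4, 8, 9, 13)
            replace g8_state_year = 2012 if inlist(school_fedstate, 5, 11, 12)
            replace g8_state_year = 2014 if school_fedstate == 1
            * States not listed above stay missing; code genuinely never-treated
            * states as 0 only after checking that they never switched

            * Without ivar(), csdid uses its repeated cross-section estimators
            csdid income controls, time(grad_year) gvar(g8_state_year) method(dripw)
            ```

            Rerunning the tabulation from #4 with g8_state_year in place of g8_grad_year should then show whether each cohort has observations before its treatment year.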

