  • Dropping group with missing waves for panel data

    Hey guys,

    I'm trying to put together a diff-in-diff panel where the treatment is a labour market program beginning in 2015. I've reduced the panel so that it only includes those over the period 2011-2018, unemployed between 2011-2014 as baseline and only engaging in labour market program after 2015 for the treated. The problem is I have some individuals who have only recorded observations past 2015. I'm not too concerned about the length or balance of years, but I just want to ensure that for both treatment and control, there is at least an entry for 2014 and 2015. My code/data so far looks like this:

    Code:
    egen id = group(xwaveid), label
    
    **refine years observed
    gen year = 0
    replace year = 2011 if wave==11
    replace year = 2012 if wave==12
    replace year = 2013 if wave==13
    replace year = 2014 if wave==14
    replace year = 2015 if wave==15
    replace year = 2016 if wave==16
    replace year = 2017 if wave==17
    replace year = 2018 if wave==18
    drop if year==0
    
    **Refine to only those not engaging with LMP prior to 2015
    bysort id (jsactrc) : drop if jsactrc==1 & year>=2011&year<=2014
    
    **Refine by only those unemployed prior to 2015
    bysort id (esbrd) : drop if esbrd!=2 & year>=2011&year<=2014
    
    sort id year
    **Treatment
    gen treat_ja = 0
    replace treat_ja = 1 if jsactrc==1 & year>=2015&year<=2018
    egen T_ja = max(treat_ja), by (id)
    
    **Drop later observations
    bysort id (year): drop if missing(year==2014 & year==2015)

    Code:
    input byte hgsex int hgage byte(esbrd esdtl jsactrc ehtujyr es esempdt) float(id year treat_ja T_ja)
    1 28 1 1 -1 -1  1  1 1 2014 0 0
    1 29 1 1 -1 -1  1  1 1 2015 0 0
    1 30 1 1 -1 -1  1  1 1 2016 0 0
    1 31 2 3  0 -1 -1 -1 1 2017 0 0
    1 32 1 1 -1 -1  1  1 1 2018 0 0
    1 27 1 1 -1 -1  1  1 2 2011 0 0
    1 28 1 1 -1 -1  1  1 2 2012 0 0
    1 29 1 1 -1 -1  1  1 2 2013 0 0
    1 30 1 1 -1 -1  1  1 2 2014 0 0
    1 31 1 1 -1 -1  1  1 2 2015 0 0
    1 32 1 1 -1 -1  1  1 2 2016 0 0
    1 33 1 1 -1 -1  1  1 2 2017 0 0
    2 34 1 2 -1 -1  1  1 2 2018 0 0
    2 22 1 1 -1 -1  1  1 3 2016 0 0
    2 23 1 1 -1 -1  1  1 3 2017 0 0
    2 24 1 1 -1 -1  1  1 3 2018 0 0
    2 57 1 1 -1 -1  1  1 4 2015 0 0
    2 58 1 1 -1 -1  1  1 4 2016 0 0
    2 59 1 1 -1 -1  1  1 4 2017 0 0
    2 60 3 6 -1 -1 -1 -1 4 2018 0 0
    end

    When I try to drop observations conditional on years 2014 and 2015 being missing, it drops 0 observations. I'm assuming this is because the year entry is missing entirely? Other than that I'm out of ideas. Any help with this would be really appreciated!

    Cheers,
    Toby
    Last edited by Toby Markham; 29 Oct 2020, 04:30.

  • #2
    Code:
    by id, sort: egen has_2014 = max(year == 2014)
    by id: egen has_2015 = max(year == 2015)
    by id: keep if has_2014 & has_2015
    As an aside, that long series of -gen- and -replace- commands creating the year variable can be reduced to:
    Code:
    gen year = 2000 + wave
    drop if !inrange(year, 2011, 2018)

    • #3
      The problem is that the missing function works like this:
      missing(x1,x2,...,xn)
      Description: 1 if any of the arguments evaluates to missing; otherwise, 0

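      What the original poster has passed to missing() is a logical expression. It is wrong in itself, because year cannot equal both 2014 and 2015 in the same observation, so the expression is identically false and hence 0. On top of this, a logical expression in Stata always evaluates to 0 or 1 and can never be missing, so missing() returns 0 in every observation and nothing is dropped.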
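
      A quick way to see this on your own data is to count how often each expression is true. This is only a sketch and assumes nothing beyond the year variable already created in #1; both counts should come back as 0.

      Code:
      * year cannot equal two different values in one observation
      count if year == 2014 & year == 2015
      * the logical expression is 0 everywhere, and 0 is not missing
      count if missing(year == 2014 & year == 2015)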

      What Clyde does might also be done like this:

      Code:
      egen minyear = min(year), by(id)
       
      drop if minyear>2015

      • #4
        Or without egen and in one line

        bysort id (year): drop if year[1]>2015

        • #5
          In writing #2, I took "there is at least an entry for 2014 and 2015" as the exact requirement, and the code there keeps a group precisely when it has both a 2014 observation and a 2015 observation.

          The code in #4 is slightly different: it keeps any group whose earliest observation is in 2015 or earlier, whether or not both 2014 and 2015 are present.

          Given the description of the way the data set was managed, these are equivalent. But if the data management description is not exactly correct, they will produce somewhat different results.
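
          If you want to check whether the two rules agree on your data, something like the following could be run before dropping anything. It is a sketch only: any14, any15, keep_2, keep_4 and first are hypothetical names introduced here, not variables from the original data.

          Code:
          * rule from #2: the group has both a 2014 and a 2015 observation
          bysort id (year): egen byte any14 = max(year == 2014)
          by id: egen byte any15 = max(year == 2015)
          gen byte keep_2 = any14 & any15
          * rule from #4: the earliest year in the group is 2015 or before
          by id: gen byte keep_4 = year[1] <= 2015
          * one row per id, then cross-tabulate the two rules
          egen byte first = tag(id)
          tabulate keep_2 keep_4 if first
          * list the groups where the rules disagree
          list id year if keep_2 != keep_4, sepby(id)

          In the excerpt posted in #1, for example, id 4 starts in 2015, so it would be kept by the rule in #4 but dropped by the rule in #2.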

          • #6
            Fantastic, #2 was exactly what I was after. Thank you both for the help and extra tips!
