Dropping group with missing waves for panel data

Toby Markham


29 Oct 2020, 03:52

Hey guys,

I'm trying to put together a diff-in-diff panel where the treatment is a labour market program (LMP) beginning in 2015. I've reduced the panel so that it only covers 2011-2018 and only includes people who were unemployed in 2011-2014 (the baseline) and, for the treated, who only engage in the labour market program from 2015 onward. The problem is that some individuals only have observations from 2015 onward. I'm not too concerned about the length or balance of the panel, but I want to ensure that, for both treatment and control, each individual has at least an entry for 2014 and 2015. My code/data so far looks like this:

Code:

egen id = group(xwaveid), label

**refine years observed
gen year = 0
replace year = 2011 if wave==11
replace year = 2012 if wave==12
replace year = 2013 if wave==13
replace year = 2014 if wave==14
replace year = 2015 if wave==15
replace year = 2016 if wave==16
replace year = 2017 if wave==17
replace year = 2018 if wave==18
drop if year==0

**Refine to only those not engaging with LMP prior to 2015
bysort id (jsactrc) : drop if jsactrc==1 & year>=2011&year<=2014

**Refine by only those unemployed prior to 2015
bysort id (esbrd) : drop if esbrd!=2 & year>=2011&year<=2014

sort id year
**Treatment
gen treat_ja = 0
replace treat_ja = 1 if jsactrc==1 & year>=2015&year<=2018
egen T_ja = max(treat_ja), by (id)

**Drop later observations
bysort id (year): drop if missing(year==2014 & year==2015)

Code:

clear
input byte hgsex int hgage byte(esbrd esdtl jsactrc ehtujyr es esempdt) float(id year treat_ja T_ja)
1 28 1 1 -1 -1  1  1 1 2014 0 0
1 29 1 1 -1 -1  1  1 1 2015 0 0
1 30 1 1 -1 -1  1  1 1 2016 0 0
1 31 2 3  0 -1 -1 -1 1 2017 0 0
1 32 1 1 -1 -1  1  1 1 2018 0 0
1 27 1 1 -1 -1  1  1 2 2011 0 0
1 28 1 1 -1 -1  1  1 2 2012 0 0
1 29 1 1 -1 -1  1  1 2 2013 0 0
1 30 1 1 -1 -1  1  1 2 2014 0 0
1 31 1 1 -1 -1  1  1 2 2015 0 0
1 32 1 1 -1 -1  1  1 2 2016 0 0
1 33 1 1 -1 -1  1  1 2 2017 0 0
2 34 1 2 -1 -1  1  1 2 2018 0 0
2 22 1 1 -1 -1  1  1 3 2016 0 0
2 23 1 1 -1 -1  1  1 3 2017 0 0
2 24 1 1 -1 -1  1  1 3 2018 0 0
2 57 1 1 -1 -1  1  1 4 2015 0 0
2 58 1 1 -1 -1  1  1 4 2016 0 0
2 59 1 1 -1 -1  1  1 4 2017 0 0
2 60 3 6 -1 -1 -1 -1 4 2018 0 0
end

When I try to drop on the condition that 2014 and 2015 are missing (the last line of the code), it drops 0 observations. I'm assuming this is because those year entries are missing entirely? Other than that I'm out of ideas. Any help with this would be really appreciated!

Cheers,
Toby

Last edited by Toby Markham; 29 Oct 2020, 04:30.


Clyde Schechter

#2

29 Oct 2020, 11:12

Code:

by id, sort: egen has_2014 = max(year == 2014)
by id: egen has_2015 = max(year == 2015)
by id: keep if has_2014 & has_2015
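A minimal sketch of how the indicator trick in that code behaves on data shaped like the example in #1. Assumption: only the id/year pattern of that example is rebuilt (id 1 observed 2014-2018, id 2 2011-2018, id 3 2016-2018, id 4 2015-2018), and first, last and nyears are hypothetical helper variables used only to build it.

```stata
* Rebuild only the id/year pattern of the -dataex- example in #1
* (first, last and nyears are hypothetical helper variables)
clear
input id first last
1 2014 2018
2 2011 2018
3 2016 2018
4 2015 2018
end
gen nyears = last - first + 1
expand nyears                         // one row per id per observed year
bysort id: gen year = first + _n - 1
drop first last nyears

* (year == 2014) is 1 on a 2014 row and 0 otherwise, so its maximum
* within id is 1 exactly when that id has at least one 2014 row
by id, sort: egen has_2014 = max(year == 2014)
by id: egen has_2015 = max(year == 2015)
by id: keep if has_2014 & has_2015

tab id
```

With this layout only ids 1 and 2 survive (13 rows): id 4 has a 2015 row but no 2014 row, and id 3 has neither.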

As an aside, that long series of -gen- and -replace- commands creating the year variable can be reduced to:

Code:

gen year = 2000 + wave
drop if !inrange(year, 2011, 2018)
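The shortcut can be checked against the original chain of -replace- commands. A sketch, assuming wave takes the hypothetical values 1-19 (the thread does not say which waves exist); the chain from #1 is condensed into a -forvalues- loop that makes the same assignments, and year_old is a hypothetical name.

```stata
* Hypothetical wave values 1-19; the real data may contain others
clear
set obs 19
gen wave = _n

* The gen/replace chain from #1, condensed into a loop:
* year_old stays 0 outside waves 11-18
gen year_old = 0
forvalues w = 11/18 {
    replace year_old = 2000 + `w' if wave == `w'
}

* The shortcut
gen year = 2000 + wave

* Both rules flag the same rows for dropping...
assert (year_old == 0) == !inrange(year, 2011, 2018)
drop if !inrange(year, 2011, 2018)
* ...and agree on the year of every row that is kept
assert year == year_old
```

One side effect worth knowing: if wave is missing, 2000 + wave is missing and inrange() returns 0, so such rows are dropped, just as the original code drops them through year==0.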
Joro Kolev

#3

29 Oct 2020, 11:30

The problem is that the missing() function works like this:
missing(x1,x2,...,xn)
Description: 1 if any of the arguments evaluates to missing; otherwise, 0

What the original poster has written inside missing() is a logical expression. That is wrong in itself, because no single observation can have year equal to both 2014 and 2015, so the expression is identically false (0); on top of that, a logical expression can never be missing.
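A minimal sketch of both points, using the rows of id 3 from the example in #1 (observed only in 2016-2018). Note that the absent 2014 and 2015 entries are not stored as missing values at all: those rows simply do not exist, so no observation-level test can detect them, which is why a group-level summary over each id is needed. The variable names both and flag are hypothetical.

```stata
* Rows for id 3 from the example in #1: no 2014 or 2015 row exists
clear
input id year
3 2016
3 2017
3 2018
end

* No single row can have year equal to both 2014 and 2015,
* so the logical expression is 0 on every row
gen byte both = (year == 2014 & year == 2015)

* A logical expression is always 0 or 1, never missing,
* so missing() of it is 0 everywhere and -drop if- removes nothing
gen byte flag = missing(year == 2014 & year == 2015)
drop if flag
```

The final -drop- deletes nothing, which is the same symptom reported in #1.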

What Clyde does might also be done like this:

Code:

egen minyear = min(year), by(id)
drop if minyear>2015
Joro Kolev

#4

29 Oct 2020, 11:33

Or, without -egen- and in one line:

Code:

bysort id (year): drop if year[1]>2015
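The versions in #3 and #4 flag the same groups, because after sorting by id and year, year[1] within each id is that id's earliest year, which is what min(year) computes. A sketch checking this on the id/year pattern of the example in #1, rebuilt with the hypothetical helpers first, last and nyears; drop3 and drop4 are also hypothetical names.

```stata
* Rebuild only the id/year pattern of the -dataex- example in #1
clear
input id first last
1 2014 2018
2 2011 2018
3 2016 2018
4 2015 2018
end
gen nyears = last - first + 1
expand nyears
bysort id: gen year = first + _n - 1
drop first last nyears

* #3: group minimum via -egen-
egen minyear = min(year), by(id)
gen byte drop3 = minyear > 2015

* #4: after sorting on year within id, year[1] is the earliest year
bysort id (year): gen byte drop4 = year[1] > 2015

* The two rules agree on every row
assert drop3 == drop4
drop if drop4
```

On this layout both rules drop only id 3 (first observed in 2016), leaving ids 1, 2 and 4.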
Clyde Schechter

#5

29 Oct 2020, 12:43

In writing #2, I took "there is at least an entry for 2014 and 2015" as the exact requirement, and the code there keeps a group precisely when it has both a 2014 observation and a 2015 observation.

The code in #4 is slightly different: it keeps any group whose earliest observation is in 2015 or before.

Given the description of the way the data set was managed, these are equivalent. But if the data management description is not exactly correct, they will produce somewhat different results.
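The difference shows up on the example data from #1 itself, where id 4 is first observed in 2015. A sketch, again rebuilding only the id/year pattern of that example with hypothetical helper variables; keep2 and keep4 are hypothetical names for each rule's decision.

```stata
* Rebuild only the id/year pattern of the -dataex- example in #1
clear
input id first last
1 2014 2018
2 2011 2018
3 2016 2018
4 2015 2018
end
gen nyears = last - first + 1
expand nyears
bysort id: gen year = first + _n - 1
drop first last nyears

* Rule in #2: keep an id only if it has both a 2014 row and a 2015 row
by id, sort: egen has_2014 = max(year == 2014)
by id: egen has_2015 = max(year == 2015)
gen byte keep2 = has_2014 & has_2015

* Rule in #4: keep an id if its earliest year is 2015 or earlier
bysort id (year): gen byte keep4 = year[1] <= 2015

* Show the rows on which the two rules disagree
list id year keep2 keep4 if keep2 != keep4, sepby(id)
```

Only id 4 is listed: it has rows for 2015 to 2018 but none for 2014, so the rule in #2 drops it while the rule in #4 keeps it.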
Toby Markham

#6

30 Oct 2020, 00:46

Fantastic, #2 was exactly what I was after. Thank you both for the help and extra tips!