Deleting observations in long data

Tracy Lam

Join Date: Jul 2014

Posts: 91
#1

26 Aug 2014, 20:17

Hi Statalist -- I have data that looks like this:

id      math class
12345   .
12345   .
12345   1
10489   .
10489   .
10489   .
10489   .

I want to drop id #10489 because math class is missing on all 4 of its rows, but keep id #12345 and all of its rows, because it has a 1 on math class in one row, and then switch its missing values to 0. I've been reading up on the collapse command, but I'm still unclear about what the appropriate procedure would be. Any help is greatly appreciated!
Richard Williams

Join Date: Apr 2014

Posts: 4987
#2

26 Aug 2014, 21:51

Do something like

Code:

webuse nlswork, clear
egen nvalid = total(!missing(union)), by(idcode)
drop if nvalid == 0
replace union = 0 if missing(union)

There are various other ways to do this using egen commands.
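As one illustration of such an alternative, here is a minimal sketch on the toy data from the first post. It assumes the variables are named id and math as in that post; nvalid is just an illustrative name. It relies on egen's count() function, which counts the nonmissing values of its argument within each by-group.

```stata
clear
input id math
12345 .
12345 .
12345 1
10489 .
10489 .
10489 .
10489 .
end

* count() gives the number of nonmissing values of math within each id
egen nvalid = count(math), by(id)

* an id with no nonmissing value at all is dropped entirely
drop if nvalid == 0
drop nvalid

list, sepby(id)
```

Because count() only tallies nonmissing values, this reads the same way as the total(!missing()) version and does not depend on sort order.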

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam
Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#3

27 Aug 2014, 16:50

Another way:

Code:

clear all
set more off

input ///
id math
12345 .
12345 .
12345 1
10489 .
10489 .
10489 .
10489 .
end

list, sepby(id)

*----- what you want -----

bysort id (math): drop if missing(math[1]) & missing(math[_N])

list, sepby(id)

With Richard's setup:

Code:

bysort idcode (union): drop if missing(union[1]) & missing(union[_N])

You should:

1. Read the FAQ carefully.

2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#4

27 Aug 2014, 18:27

Actually, at the risk of some opacity, you can make it even shorter:

Code:

by id (math), sort: drop if missing(math[1])

because numeric missing values are, in Stata, greater than all nonmissing values, so once the data are sorted within id, if the first value is missing, they all are.
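A small sketch of why the shortcut works, using the toy data from the first post (the variable names id and math are assumed from that post). The first assert checks the sort-order fact directly; the second confirms that only the id with a nonmissing value survives.

```stata
clear
input id math
12345 .
12345 .
12345 1
10489 .
10489 .
10489 .
10489 .
end

* numeric missing (.) compares greater than any nonmissing number
assert . > 1

* after sorting within id, math[1] is the smallest value in the group,
* so it is missing only when every value in the group is missing
by id (math), sort: drop if missing(math[1])

* only id 12345 should remain, with all three of its rows
assert id == 12345
list, sepby(id)
```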
Richard Williams

Join Date: Apr 2014

Posts: 4987
#5

27 Aug 2014, 19:53

I thought there were simpler solutions out there but couldn't remember what they were. Nice work.

This article by Nick Cox shows all sorts of neat tricks for working with panel data: http://www.stata-journal.com/sjpdf.h...iclenum=dm0033

As a sidelight, I would make a copy of math class and then recode either the original or the copy. Doing things like recoding missings to 0 is something you might regret later, or at least want to double-check and see if it matters. In general if I am making major changes to variables I want to have a way to get back to the original.
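A minimal sketch of that precaution, assuming the variable is called math as in the first post; math_orig is a hypothetical name for the backup copy. clonevar copies the values along with the storage type, labels, and format, and the asserts verify that the recode touched only the missing values.

```stata
clear
input id math
12345 .
12345 .
12345 1
end

* keep an untouched copy before recoding (math_orig is a hypothetical name)
clonevar math_orig = math

* recode missing to 0 in the working variable only
replace math = 0 if missing(math)

* check that nothing except the former missings changed
assert math == math_orig if !missing(math_orig)
assert math == 0 if missing(math_orig)
```

If the recode later turns out to matter, the original coding is still available in math_orig.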

Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#6

27 Aug 2014, 21:04

Originally posted by Clyde Schechter:

Actually, at the risk of some opacity, you can make it even shorter:

Code:

by id (math), sort: drop if missing(math[1])

because numeric missing values are, in Stata, greater than all nonmissing values, so once the data are sorted within id, if the first value is missing, they all are.

Clyde makes a good point. This doesn't seem to be the case for Tracy Lam, but we must be careful if the variable we're interested in is a string rather than numeric. Missing strings sort the opposite way: they come first within the group, not last. An example gone bad:

Code:

clear all
set more off

input ///
id str5 math
12345 ""
12345 ""
12345 "one"
10489 ""
10489 ""
10489 ""
10489 ""
end

list, sepby(id)

*----- what you want -----

* math[1] is the empty string for id 12345 too, so every row is dropped
by id (math), sort: drop if missing(math[1])

list, sepby(id)
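For a string variable, one possible fix is to test the last sorted value rather than the first. The sketch below assumes the same toy data, with empty strings standing in for missing values: since "" sorts before any nonempty string, math[_N] is empty only when the whole group is empty.

```stata
clear
input id str5 math
12345 ""
12345 ""
12345 "one"
10489 ""
10489 ""
10489 ""
10489 ""
end

* empty strings sort first, so check the last value within each id
by id (math), sort: drop if missing(math[_N])

* id 12345 is kept with all three rows; id 10489 is gone
assert id == 12345
list, sepby(id)
```

Checking both math[1] and math[_N], or counting nonmissing values with egen, avoids having to remember which way each type sorts.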
