Dropping in a Range

Monica White

Join Date: Jan 2017

Posts: 98
#1

Dropping in a Range

01 Apr 2017, 14:22

Hello,

I want to drop missing values in a variable, but only if the variable is missing for all years (2000-2013).

Code:

input double ccode1 float(ccode2 year) long pending_beginning 2 223 2000 . 2 223 2001 . 2 223 2002 . 2 223 2003 . 2 223 2004 . 2 223 2005 . 2 223 2006 . 2 223 2007 . 2 223 2008 . 2 223 2009 . 2 223 2010 . 2 223 2011 . 2 223 2012 . 2 223 2013 . 2 225 2000 1 2 225 2001 1 2 225 2002 1 2 225 2003 1 2 225 2004 1 2 225 2005 0 2 225 2006 0 2 225 2007 . 2 225 2008 0

In this case, I want to drop the missing values seen with 2-223 between 2000 and 2013, but what we see in 2-225 between 2000 and 2013 (or 2008 seen here) is okay.

I tried to play around with the drop in range codes, but can't seem to put my head around how to make a working code for this particular range.
Tags: None
Andy dB

Join Date: Sep 2015

Posts: 23
#2

01 Apr 2017, 14:52

Hi,

You could "reshape" to wide, then use "egen" with "rownonmiss" to create a new variable, and keep all observations where the new variable is !=0

After the reshape, the "missings" command with "dropobs" would do the same thing for you.

Just reshape back to long afterwards and you're good.

I hope this helps
Comment
Monica White

Join Date: Jan 2017

Posts: 98
#3

01 Apr 2017, 15:05

Originally posted by Andy dB View Post

Hi,

You could "reshape" to wide, then use "egen" with "rownonmiss" to create a new variable, and keep all observations where the new variable is !=0

After the reshape, the "missings" command with "dropobs" would do the same thing for you.

Just reshape back to long afterwards and you're good.

I hope this helps

Hi Andy,

Thanks. It's a bit confusing, though. I did come across this in my search:
http://www.stata.com/statalist/archi.../msg00948.html

No idea if it's something similar. Trying to see if there's a simpler way, because it's quite confusing to wrap my head around it.
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

01 Apr 2017, 15:20

Does this do what you want?

Code:

clear
input double ccode1 float(ccode2 year) long pending_beginning
2 223 2000  .
2 223 2001  .
2 223 2002  .
2 223 2003  .
2 223 2004  .
2 223 2005  .
2 223 2006  .
2 223 2007  .
2 223 2008  .
2 223 2009  .
2 223 2010  .
2 223 2011  .
2 223 2012  .
2 223 2013  .
2 225 2000  1
2 225 2001  1
2 225 2002  1
2 225 2003  1
2 225 2004  1
2 225 2005  0
2 225 2006  0
2 225 2007  .
2 225 2008  0
end
bysort ccode1 ccode2 (year): egen good = total(!missing(pending_beginning))
drop if good==0

Code:

. list, clean noobs

    ccode1   ccode2   year   pendin~g   good  
         2      225   2000          1      8  
         2      225   2001          1      8  
         2      225   2002          1      8  
         2      225   2003          1      8  
         2      225   2004          1      8  
         2      225   2005          0      8  
         2      225   2006          0      8  
         2      225   2007          .      8  
         2      225   2008          0      8

Comment

Andy dB

Join Date: Sep 2015

Posts: 23
#5

01 Apr 2017, 15:23

The post you sent suggests exactly the same - to reshape and drop those with all variables missing. How about:

Code:

reshape wide pending_begin, i(ccode1 ccode2) j(year) missings dropobs pending_beginning* reshape long
Comment

Monica White

Join Date: Jan 2017
Posts: 98

01 Apr 2017, 15:44

Originally posted by William Lisowski View Post

Does this do what you want?

Code:

clear
input double ccode1 float(ccode2 year) long pending_beginning
2 223 2000 .
2 223 2001 .
2 223 2002 .
2 223 2003 .
2 223 2004 .
2 223 2005 .
2 223 2006 .
2 223 2007 .
2 223 2008 .
2 223 2009 .
2 223 2010 .
2 223 2011 .
2 223 2012 .
2 223 2013 .
2 225 2000 1
2 225 2001 1
2 225 2002 1
2 225 2003 1
2 225 2004 1
2 225 2005 0
2 225 2006 0
2 225 2007 .
2 225 2008 0
end
bysort ccode1 ccode2 (year): egen good = total(!missing(pending_beginning))
drop if good==0

Code:

. list, clean noobs

ccode1 ccode2 year pendin~g good
2 225 2000 1 8
2 225 2001 1 8
2 225 2002 1 8
2 225 2003 1 8
2 225 2004 1 8
2 225 2005 0 8
2 225 2006 0 8
2 225 2007 . 8
2 225 2008 0 8

Hi Will,

Reviewing the dataset and it's looking like it's what I want. Thank you!

how exactly does

Code:

total(!missing(pending_beginning))

read, though?

Comment

Monica White

Join Date: Jan 2017

Posts: 98
#7

01 Apr 2017, 15:46

Originally posted by Andy dB View Post

The post you sent suggests exactly the same - to reshape and drop those with all variables missing. How about:

Code:

reshape wide pending_begin, i(ccode1 ccode2) j(year) missings dropobs pending_beginning* reshape long

Hi Andy,

in between your's and Will's reply, I was trying to fumble with the reshape command, but it wasn't working because there was a problem with the dataset that I originally didn't pickup/overlooked. I tried Will's suggestion after, and it worked. But I'm going to bookmark this one in the meantime as well. Thank you.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#8

01 Apr 2017, 15:56

Start by understanding that when Stata evaluates a logical expression or function, it evaluates true as 1 and false as 0.

Then from help missing() we see that that missing(pending_beginning) will be true (1) if pending_beginning is missing and 0 (false) otherwise.

The ! is Stata's not operator, which reverses truth and falsity, so !missing(pending_beginning) is 1 (true) if pending_beginning is not missing and 0 otherwise.

The egen total function gives the total of the values of !missing(pending_beginning), in other words will be the number of non-missing values.
2 likes
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35711
#9

02 Apr 2017, 01:07

reshape is a very helpful command for when it is needed, but a double reshape is over-elaborate here. Here is another solution:

Code:

bysort code1 code2 (pending_beginning) : drop if missing(pending_beginning[1])

If after sorting, the first value in each block is missing, then they all are.

See also http://www.stata.com/support/faqs/da...ions-in-group/
1 like
Comment
Monica White

Join Date: Jan 2017

Posts: 98
#10

02 Apr 2017, 12:08

Thank you Nick and William!
Comment

Natalia Remel

Join Date: Mar 2017
Posts: 23

#11

07 May 2017, 01:16

Hello,

I have a similar question. Namely, I want to identify those brands (true as 1 and false as 0 ) for which the variable downward_phase for the whole time period 2003q1-2016q4 was 0 . I used the following command bysort brand neu_qsif ( downward_phase ): egen good = total(downward_phase==0) but the result is not correct. In the example below the downward_phase was not 0 from 2003q1-2016q4 only during the first year. Thank you in advance.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str21 intname float(brand neu_qsif peak_quarter peak_su) byte downward_phase float good
"INS.ACTRAPID HM" 3 172 173 533.75 0 1
"INS.ACTRAPID HM" 3 173 173 533.75 0 1
"INS.ACTRAPID HM" 3 174 173 533.75 0 1
"INS.ACTRAPID HM" 3 175 173 533.75 0 1
"INS.ACTRAPID HM" 3 176 173 533.75 1 0
"INS.ACTRAPID HM" 3 177 173 533.75 1 0
"INS.ACTRAPID HM" 3 178 173 533.75 1 0
"INS.ACTRAPID HM" 3 179 173 533.75 1 0
"INS.ACTRAPID HM" 3 180 173 533.75 1 0
"INS.ACTRAPID HM" 3 181 173 533.75 1 0
"INS.ACTRAPID HM" 3 182 173 533.75 1 0
"INS.ACTRAPID HM" 3 183 173 533.75 1 0
"INS.ACTRAPID HM" 3 184 173 533.75 1 0
"INS.ACTRAPID HM" 3 185 173 533.75 1 0
"INS.ACTRAPID HM" 3 186 173 533.75 1 0
"INS.ACTRAPID HM" 3 187 173 533.75 1 0
"INS.ACTRAPID HM" 3 188 173 533.75 1 0
"INS.ACTRAPID HM" 3 189 173 533.75 1 0
"INS.ACTRAPID HM" 3 190 173 533.75 1 0
"INS.ACTRAPID HM" 3 191 173 533.75 1 0
end
format %tq neu_qsif
format %tq peak_quarter

Comment

Natalia Remel

Join Date: Mar 2017

Posts: 23
#12

07 May 2017, 05:38

I just wanted to say that I solved the problem myself.

First I replaced 0 values with missing:
foreach x of varlist downward_phase {
replace `x' = (.) if(`x' == 0)
}
Then used the solution from the previous post:
bysort brand (neu_qsif ): egen good = total (!missing( downward_phase ))
Comment

Announcement