Replace missing cases within a variable under some circumstances

Mahdie Raj

Join Date: Feb 2016

Posts: 48
#1

Replace missing cases within a variable under some circumstances

10 Dec 2017, 19:19

Hi Everyone!
I have a dataset looks like the below. I wanna replace missing values withing "locale" with the first valid value in "locale", while the data are sorted by "endyear" and "unitid". I tried the following command buy it did not work. I appreciate your advice.
Best,
Nader

bysort unitid endyear: replace locale = locale[n+1] if locale==.

Endyear-unitid-locale
2000-1-.
2001-1-.
2002-1-2
2003-1-3
2000-2-.
2001-2-.
2002-2-9
2000-3-5
2001-3-.
2002-3-6

I would like to get the following:

Endyear-unitid-locale
2000-1-2
2001-1-2
2002-1-2
2003-1-3
2000-2-9
2001-2-9
2002-2-9
2000-3-5
2001-3-6
2002-3-6
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

10 Dec 2017, 19:43

Welcome to Statalist.

Can you tell us what would you expect to do for the following example?

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int endyear byte(unitid locale) 2000 4 . 2001 4 10 2002 4 . 2003 4 30 2004 4 . end

Note also how I posted the sample data - in a way that can be copied and pasted into the do-file editor to recreate the example.

Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. In particular, please read FAQ #12 and help those whose help you seek by posting example data using the dataex command. If you are running Stata 15.1 or later, it is already installed. For earlier versions of Stata, install dataex by typing ssc install dataex. Type help dataex to read the simple instructions for using it. Using dataex will enable those who want to help you to quickly and easily create a 100% faithful replica of your situation to test their ideas and code on.

The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
1 like
Comment
Mahdie Raj

Join Date: Feb 2016

Posts: 48
#3

10 Dec 2017, 20:34

Hi William,
Thanks for your reply. I would like to replace missing values within "locale" with the first next valid value in "locale" across "unitid"s.
Following is parts of my data that shows how my data looks like and what I need. Hope my new post fits Statalist posting policies.
Best,
Nader

. *How my data looks like
. input int endyear byte(unitid locale)
endyear unitid locale
1. 2000 1 .
2. 2001 1 .
3. 2002 1 .
4. 2003 1 30
5. 2001 2 1
6. 2002 2 .
7. 2003 2 2
8. 2004 2 .
9. 2005 2 3

. *What I need
. input int _endyear byte(_unitid _locale)
endyear unitid locale
1. 2000 1 30
2. 2001 1 30
3. 2002 1 30
4. 2003 1 30
5. 2001 2 1
6. 2002 2 2
7. 2003 2 2
8. 2004 2 3
9. 2005 2 3

Last edited by Mahdie Raj; 10 Dec 2017, 20:36.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

10 Dec 2017, 20:44

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input int endyear byte(unitid locale) 2000 1 . 2001 1 . 2002 1 . 2003 1 30 2001 2 1 2002 2 . 2003 2 2 2004 2 . 2005 2 3 end gsort unitid -endyear by unitid: replace locale = locale[_n-1] if missing(locale) sort unitid endyear

Your original code was, in a sense, on the right track. But Stata processes its commands from the top of the data set to the bottom, so -replace locale = locale[_n+1] if missing(locale- would only result in a change in a single observation immediately above the first missing value of locale, it would not propagate backwards.

The key is to exploit Stata's top to bottom processing by sorting the data in reverse order on date. Then, when done, restore the original sort order.

Note: It appears that what you pasted into #3 is the output that you get from reading in the data, as opposed to the output of -dataex- itself. They look somewhat similar, but if you compare what you have to what I show here you will notice that yours has extra numbering at the left edge that corresponds to no variable. That layout is difficult to bring in to Stata. Please read -help dataex- carefully and use it effectively in the future. Thanks!
1 like
Comment
Mahdie Raj

Join Date: Feb 2016

Posts: 48
#5

10 Dec 2017, 21:07

Thanks Clyde. Your trick worked perfectly! Much appreciated!
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#6

11 Dec 2017, 01:46

You got excellent advice, but this is also an FAQ. See therefore

https://www.stata.com/support/faqs/d...issing-values/

https://www.statalist.org/forums/for...-interpolation
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

11 Dec 2017, 05:59

For completeness, here is a mipolate (SSC) solution.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int endyear byte(unitid locale)
2000 1  .
2001 1  .
2002 1  .
2003 1 30
2001 2  1
2002 2  .
2003 2  2
2004 2  .
2005 2  3
end

mipolate locale endyear, by(unitid) backward gen(locale2) 

list, sepby(unitid) 

     +-------------------------------------+
     | endyear   unitid   locale   locale2 |
     |-------------------------------------|
  1. |    2000        1        .        30 |
  2. |    2001        1        .        30 |
  3. |    2002        1        .        30 |
  4. |    2003        1       30        30 |
     |-------------------------------------|
  5. |    2001        2        1         1 |
  6. |    2002        2        .         2 |
  7. |    2003        2        2         2 |
  8. |    2004        2        .         3 |
  9. |    2005        2        3         3 |
     +-------------------------------------+

Comment

Mahdie Raj

Join Date: Feb 2016

Posts: 48
#8

13 Dec 2017, 18:25

Thanks Nick! much appreciated!
Comment

Announcement

Replace missing cases within a variable under some circumstances

Comment

Comment

Comment

Comment

Comment

Comment

Comment