Generating a new variable that equals the Nth non-missing value in a row

Chrisk Burningham

Join Date: Oct 2016

Posts: 35
#1

Generating a new variable that equals the Nth non-missing value in a row

30 Sep 2019, 07:39

Hi All,

I am essentially trying to compress a row of variables into just the non-missing entries. My dataset looks like this

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte(q5__1 q5__2 q5__3 q5__4 q5__5 q5__6 q5__7 q5__8 q5__9 q5__10) . 2 . . . . . . 0 0 1 . . . . . . . 0 0 . . . . . . . . 0 0 . . . . . . . . 0 0 . . . . . . . . 0 0 . . . . . . . . 0 0 . 2 . . . . . . 0 0 1 . . . . . . . 0 0 . . 3 . . . . . 0 0 . . . 4 . . . . 0 0 . . . . . . 7 . 0 0 end

So I was to gen var_a = the first non missing value from each column, so that would be all the observations basically, 2,1,.,.,.,.,2,1,3,4,7. If there are more, then the next variable var_b would include them, but would be missing otherwise.

Any ideas?

Many thanks!

Chris
Tags: None

Mike Lacy

Join Date: Apr 2014
Posts: 2411

30 Sep 2019, 08:31

I believe this does what you want. It was easier for me in long format, but I suppose there's a wide format approach, too.

Code:

gen int id = _n
reshape long q5__, i(id) j(j)
//
bysort id: gen byte count = sum(!missing(q5__)) if !missing(q5__)
// A standard trick with a denominator that will create missings.
by id: egen var_a = min(q5__/(count ==1))
by id: egen var_b = min(q5__/(count ==2))
// (others if you like)
//
// Back to wide if needed
drop count
reshape wide q5__, i(id) j(j)

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35589

30 Sep 2019, 08:36

Mike is correct. This can be done with a wide layout (following Clyde Schechter I find this term clearer than format, which is mightily overloaded in computing).

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(q5__1 q5__2 q5__3 q5__4 q5__5 q5__6 q5__7 q5__8 q5__9 q5__10)
. 2 . . . . . . 0 0
1 . . . . . . . 0 0
. . . . . . . . 0 0
. . . . . . . . 0 0
. . . . . . . . 0 0
. . . . . . . . 0 0
. 2 . . . . . . 0 0
1 . . . . . . . 0 0
. . 3 . . . . . 0 0
. . . 4 . . . . 0 0
. . . . . . 7 . 0 0
end

gen wanted = .

quietly forval j = 1/10 {
    replace wanted = q5__`j' if wanted == . & q5__`j' < .
}

list wanted, sep(0)

     +--------+
     | wanted |
     |--------|
  1. |      2 |
  2. |      1 |
  3. |      0 |
  4. |      0 |
  5. |      0 |
  6. |      0 |
  7. |      2 |
  8. |      1 |
  9. |      3 |
 10. |      4 |
 11. |      7 |
     +--------+

A more elaborate approach which may not work well if the values are not integers

Code:

gen wanted = . 
gen found = "" 

quietly forval j = 1/10 { 
    replace wanted = q5__`j' if wanted == . & q5__`j' < . 
    replace found = found + " " + strofreal(q5__`j') if q5__`j' < . 
} 

list wanted found, sep(0)

Although the extra variable is string, you can extract using e.g. real(word(found, 2))

Last edited by Nick Cox; 30 Sep 2019, 08:41.

Comment

Chrisk Burningham

Join Date: Oct 2016

Posts: 35
#4

01 Oct 2019, 00:04

Fantastic, thats perfect - thank you both!!
Best,

Chris
Comment

Announcement