Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating a new variable that equals the Nth non-missing value in a row

    Hi All,

    I am essentially trying to compress a row of variables into just the non-missing entries. My dataset looks like this

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(q5__1 q5__2 q5__3 q5__4 q5__5 q5__6 q5__7 q5__8 q5__9 q5__10)
    . 2 . . . . . . 0 0
    1 . . . . . . . 0 0
    . . . . . . . . 0 0
    . . . . . . . . 0 0
    . . . . . . . . 0 0
    . . . . . . . . 0 0
    . 2 . . . . . . 0 0
    1 . . . . . . . 0 0
    . . 3 . . . . . 0 0
    . . . 4 . . . . 0 0
    . . . . . . 7 . 0 0
    end
    So I was to gen var_a = the first non missing value from each column, so that would be all the observations basically, 2,1,.,.,.,.,2,1,3,4,7. If there are more, then the next variable var_b would include them, but would be missing otherwise.

    Any ideas?

    Many thanks!

    Chris


  • #2
    I believe this does what you want. It was easier for me in long format, but I suppose there's a wide format approach, too.
    Code:
    gen int id = _n
    reshape long q5__, i(id) j(j)
    //
    bysort id: gen byte count = sum(!missing(q5__)) if !missing(q5__)
    // A standard trick with a denominator that will create missings.
    by id: egen var_a = min(q5__/(count ==1))
    by id: egen var_b = min(q5__/(count ==2))
    // (others if you like)
    //
    // Back to wide if needed
    drop count
    reshape wide q5__, i(id) j(j)

    Comment


    • #3
      Mike is correct. This can be done with a wide layout (following Clyde Schechter I find this term clearer than format, which is mightily overloaded in computing).


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(q5__1 q5__2 q5__3 q5__4 q5__5 q5__6 q5__7 q5__8 q5__9 q5__10)
      . 2 . . . . . . 0 0
      1 . . . . . . . 0 0
      . . . . . . . . 0 0
      . . . . . . . . 0 0
      . . . . . . . . 0 0
      . . . . . . . . 0 0
      . 2 . . . . . . 0 0
      1 . . . . . . . 0 0
      . . 3 . . . . . 0 0
      . . . 4 . . . . 0 0
      . . . . . . 7 . 0 0
      end
      
      gen wanted = .
      
      quietly forval j = 1/10 {
          replace wanted = q5__`j' if wanted == . & q5__`j' < .
      }
      
      list wanted, sep(0)
      
           +--------+
           | wanted |
           |--------|
        1. |      2 |
        2. |      1 |
        3. |      0 |
        4. |      0 |
        5. |      0 |
        6. |      0 |
        7. |      2 |
        8. |      1 |
        9. |      3 |
       10. |      4 |
       11. |      7 |
           +--------+
      A more elaborate approach which may not work well if the values are not integers


      Code:
      gen wanted = . 
      gen found = "" 
      
      quietly forval j = 1/10 { 
          replace wanted = q5__`j' if wanted == . & q5__`j' < . 
          replace found = found + " " + strofreal(q5__`j') if q5__`j' < . 
      } 
      
      list wanted found, sep(0)
      Although the extra variable is string, you can extract using e.g. real(word(found, 2))
      Last edited by Nick Cox; 30 Sep 2019, 08:41.

      Comment


      • #4
        Fantastic, thats perfect - thank you both!!
        Best,

        Chris

        Comment

        Working...
        X