Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Find the first occurrence of variable value that follows a specific value in one of the previous variables

    Dear all,

    I am working with a dataset which has 24 float variables named temp1-temp24 (among many other variables). The variables temp1-temp24 have values of 0, 1, 2 or 3 and there is no pattern to their order, aside from that a value of 1 will always have a value of 2 or 3 as one of the subsequent variables.
    I would like help writing code that would allow me to create a new variable (newvar) that contains the value of the first occurrence of 2 or 3 that follows a 1 please (example below)? I would be grateful for all help and advice given.

    Epikey temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10 temp11 temp12 temp13 temp14 temp15 temp16 temp17 temp18 temp19 temp20 temp21 temp22 temp23 temp24 Newvar
    1 0 1 2 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
    2 3 0 0 1 0 0 2 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 2
    3 0 1 0 3 0 2 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3

  • #2
    Code:
    clear 
    input Epikey temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10 temp11 temp12 temp13 temp14 temp15 temp16 temp17 temp18 temp19 temp20 temp21 temp22 temp23 temp24 Newvar
    1 0 1 2 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
    2 3 0 0 1 0 0 2 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 2
    3 0 1 0 3 0 2 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3
    end 
    
    gen seen1 = temp1 == 1 
    gen wanted = . 
    
    quietly forval j = 2/24 { 
        replace wanted = temp`j' if seen1 == 1 & temp`j' > 1 
        replace seen1 = 1 if temp`j' == 1 & wanted == . 
        replace seen1 = 0 if inlist(wanted, 2, 3)
    }
    
    list Newvar wanted
    The small trickery here is setting a flag to 1 if we have seen 1 before now -- and setting it back to 0 when we stop looking because we found 2 or 3.

    There will be other ways to do this.

    Comment


    • #3
      Here's another. We first concatenate the history to a string variable. Then we remove any 0s as irrelevant.

      The main idea is then to look for whether 12 or 13 occurs first. We need to work around the fact that strpos() returns 0 if it doesn't find what it's looking for.


      Code:
      clear 
      input Epikey temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10 temp11 temp12 temp13 temp14 temp15 temp16 temp17 temp18 temp19 temp20 temp21 temp22 temp23 temp24 Newvar
      1 0 1 2 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
      2 3 0 0 1 0 0 2 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 2
      3 0 1 0 3 0 2 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3
      end 
      
      egen concat = concat(temp*)
      replace concat = subinstr(concat, "0", "", .)
      
      gen where12 = strpos(concat, "12") if strpos(concat, "12")
      gen where13 = strpos(concat, "13") if strpos(concat, "13")
      
      gen WANTED = cond(where12 < where13, 2, 3) if min(where12, where13) < . 
      
      l concat WANTED 
      
           +-----------------+
           | concat   WANTED |
           |-----------------|
        1. |    123        2 |
        2. |   3123        2 |
        3. |   1323        3 |
           +-----------------+

      Comment


      • #4
        That's great, thank you!

        Comment


        • #5
          In many ways this problem is easier with a long layout (to use @Clyde Schechter's term; the term format in particular is overloaded).

          So we reshape first and create a sequence (time?) variable.

          Then we find when the first 1 occurred.

          Then we find when the first 2 occurred after that first 1 -- and the first 3 occurred after that first 1.

          Whether it's guaranteed in your data that a 2 or a 3 will always follow 1, the code allows for absences too.

          See https://journals.sagepub.com/doi/pdf...867X1101100210 esp. Section 9

          and

          https://www.stata.com/support/faqs/d...t-occurrences/

          Code:
          clear
          input Epikey temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10 temp11 temp12 temp13 temp14 temp15 temp16 temp17 temp18 temp19 temp20 temp21 temp22 temp23 temp24 Newvar
          1 0 1 2 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2
          2 3 0 0 1 0 0 2 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 2
          3 0 1 0 3 0 2 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3
          end
          
          reshape long temp, i(Epikey) j(seq)
          
          egen first1 = min(cond(temp == 1, seq, .)), by(Epikey) 
          
          forval j = 2/3 {
              egen first`j' = min(cond(temp == `j' & seq > first1, seq, .)), by(Epikey)
          }
          
          * the only way they can be equal is that both are missing
          gen wanted = cond(first2 <  first3, 2, cond(first3 < first2, 3, .))
          
          list if temp , sepby(Epikey)
          
          assert wanted == Newvar
          
               +------------------------------------------------------------------+
               | Epikey   seq   temp   Newvar   first1   first2   first3   wanted |
               |------------------------------------------------------------------|
            2. |      1     2      1        2        2        3        5        2 |
            3. |      1     3      2        2        2        3        5        2 |
            5. |      1     5      3        2        2        3        5        2 |
               |------------------------------------------------------------------|
           25. |      2     1      3        2        2        7       11        2 |
           28. |      2     4      1        2        2        7       11        2 |
           31. |      2     7      2        2        2        7       11        2 |
           35. |      2    11      3        2        2        7       11        2 |
               |------------------------------------------------------------------|
           50. |      3     2      1        3        2        6        4        3 |
           52. |      3     4      3        3        2        6        4        3 |
           54. |      3     6      2        3        2        6        4        3 |
           58. |      3    10      3        3        2        6        4        3 |
               +------------------------------------------------------------------+
          Last edited by Nick Cox; 05 Sep 2024, 03:31.

          Comment

          Working...
          X