Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacing values in subsequent rows based on another variable

    I have a data set that has a variable "Injury" is an indicator of whether an individual sustained an injury in a particular month. The variable "InjuryLength" is how many months the individual was limited due to that injury. Sample below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(ID Month Injury InjuryLength)
    1  1 0 .
    1  2 0 .
    1  3 1 6
    1  4 0 .
    1  5 0 .
    1  6 0 .
    1  7 0 .
    1  8 0 .
    1  9 0 .
    1 10 0 .
    1 11 0 .
    1 12 0 .
    2  1 0 .
    2  2 0 .
    2  3 0 .
    2  4 0 .
    2  5 0 .
    2  6 1 3
    2  7 0 .
    2  8 0 .
    2  9 0 .
    2 10 0 .
    2 11 0 .
    2 12 0 .
    end
    I want to transform the variable "Injury" to equal 1 for each month the individual is limited, based on the value in the variable "InjuryLength." So the resulting dataset would look as follows:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(ID Month Injury InjuryLength)
    1  1 0 .
    1  2 0 .
    1  3 1 6
    1  4 1 .
    1  5 1 .
    1  6 1 .
    1  7 1 .
    1  8 1 .
    1  9 0 .
    1 10 0 .
    1 11 0 .
    1 12 0 .
    2  1 0 .
    2  2 0 .
    2  3 0 .
    2  4 0 .
    2  5 0 .
    2  6 1 3
    2  7 1 .
    2  8 1 .
    2  9 0 .
    2 10 0 .
    2 11 0 .
    2 12 0 .
    end
    Any thoughts would be greatly appreciated.


  • #2
    The following will work correctly if the full data set resembles your example:

    Code:
    assert (Injury == 1) == !missing(InjuryLength)
    by ID (Month), sort: gen spell_num = sum(Injury)
    by ID spell_num (Month), sort: replace Injury = Injury[1] if _n <= InjuryLength[1]
    However, it may produce incorrect results if there are any IDs who experience a new injury while still remaining limited by a previous injury. The code for handling that kind of situation would be more complicated.

    Added: Here is code that will work even if there are some IDs experiencing a new injury while still limited by a previous one.
    Code:
    assert (Injury == 1) == !missing(InjuryLength)
    
    frame put ID Month InjuryLength if Injury, into(working)
    frame working {
        gen `c(obs_t)' seq = _n
        expand InjuryLength
        by seq, sort: replace Month = Month[1] + _n - 1
        sort ID Month seq
        keep ID Month
        duplicates drop
    }
    
    frlink 1:1 ID Month, frame(working)
    replace Injury = 1 if !missing(working)
    Last edited by Clyde Schechter; 11 Jul 2025, 11:48.

    Comment


    • #3
      Here is some alternative code that should also work. I have modified the original data example to include the complications discussed in #2. I also show that Clyde's and my results are the same.

      Code:
      clear all
      input byte(ID Month Injury InjuryLength)
      1  1 0 .
      1  2 0 .
      1  3 1 6
      1  4 0 .
      1  5 0 .
      1  6 0 .
      1  7 0 .
      1  8 1 3
      1  9 0 .
      1 10 0 .
      1 11 0 .
      1 12 0 .
      2  1 0 .
      2  2 1 2
      2  3 0 .
      2  4 0 .
      2  5 0 .
      2  6 1 7
      2  7 1 4
      2  8 0 .
      2  9 0 .
      2 10 0 .
      2 11 0 .
      2 12 0 .
      end
      
      assert (Injury == 1) == !missing(InjuryLength)
      
      *** MY CODE
      
      sort ID Month
      by ID: gen start = cond(missing(InjuryLength), 0 , _n)
      by ID: gen end = cond(missing(InjuryLength), 0 , _n + InjuryLength - 1)
      
      by ID: replace start = start[_n-1] if end == 0
      by ID: replace end = end[_n-1] if end == 0
      
      by ID: replace start = start[_n-1] if start != start[_n-1] & start <= end[_n-1] & !missing(start) & !missing(start[_n-1])
      by ID: replace end = end[_n-1] if end < end[_n-1] & !missing(end) & !missing(end[_n-1])
      
      by ID: gen byte is_limited = inrange(_n, start, end) & !missing(start)
      
      drop start end
      
      *** CLYDE'S CODE
      
      frame put ID Month InjuryLength if Injury, into(working)
      frame working {
          gen `c(obs_t)' seq = _n
          expand InjuryLength
          by seq, sort: replace Month = Month[1] + _n - 1
          sort ID Month seq
          keep ID Month
          duplicates drop
      }
      
      frlink 1:1 ID Month, frame(working)
      gen wanted = !missing(working)
      drop working
      
      *** check both methods produce the same result
      assert is_limited == wanted
      The results are:
      Code:
      . list, noobs sepby(ID) abbrev(10)
      
        +--------------------------------------------------------+
        | ID   Month   Injury   InjuryLe~h   is_limited   wanted |
        |--------------------------------------------------------|
        |  1       1        0            .            0        0 |
        |  1       2        0            .            0        0 |
        |  1       3        1            6            1        1 |
        |  1       4        0            .            1        1 |
        |  1       5        0            .            1        1 |
        |  1       6        0            .            1        1 |
        |  1       7        0            .            1        1 |
        |  1       8        1            3            1        1 |
        |  1       9        0            .            1        1 |
        |  1      10        0            .            1        1 |
        |  1      11        0            .            0        0 |
        |  1      12        0            .            0        0 |
        |--------------------------------------------------------|
        |  2       1        0            .            0        0 |
        |  2       2        1            2            1        1 |
        |  2       3        0            .            1        1 |
        |  2       4        0            .            0        0 |
        |  2       5        1            7            1        1 |
        |  2       6        1            4            1        1 |
        |  2       7        0            .            1        1 |
        |  2       8        0            .            1        1 |
        |  2       9        0            .            1        1 |
        |  2      10        0            .            1        1 |
        |  2      11        0            .            1        1 |
        |  2      12        0            .            0        0 |
        +--------------------------------------------------------+
      Last edited by Hemanshu Kumar; 11 Jul 2025, 15:42.

      Comment


      • #4
        Thank you so much Clyde Schechter and Hemanshu Kumar, this worked perfectly and I learned a few new tricks today! I appreciate you both taking the time to help.

        Comment

        Working...
        X