Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • create new variable taking the last non-zero value by group

    Hi Everyone

    I have a dataset that looks like this

    id date Var
    1 1 2
    1 2 2
    1 3 0
    2 1 4
    2 2 5
    2 3 5
    2 4 0
    2 4 0
    etc

    I would like the zero values in the Var column to be replaced with the last non-zero value by id group, so my data should look like:
    id date Var
    1 1 2
    1 2 2
    1 3 2
    2 1 4
    2 2 5
    2 3 5
    2 4 5
    2 4 5

    Thank you in advance

    Costas

  • #2
    This raises various small questions, including

    1. I strongly advise creating a new variable as overwriting your data runs a risk if you mess it up or change your mind.

    2. Not zero includes negative values (which may be irrelevant) and missing values (ditto). If you have missing values, your code needs to trap them somehow.

    3. As your example includes examples of repeated observations with the same identifier and date, the question arises of what you want done if the last non-zero is observed on the same date as a zero. Indeed, what should happen if different values occur on the last date with any non-zero?

    Otherwise, this may help:

    Code:
    clear 
    input id date Var
    1 1 2
    1 2 2
    1 3 0
    2 1 4
    2 2 5
    2 3 5
    2 4 0
    2 4 0
    end 
    
    egen whenlastNZ = max(cond(Var != 0, date, .)), by(id)
    egen lastNZ = total(Var * (date == whenlastNZ)), by(id)
    
    gen wanted = cond(date > whenlastNZ, lastNZ, Var) 
    
    list, sepby(id)
    
         +----------------------------------------------+
         | id   date   Var   whenla~Z   lastNZ   wanted |
         |----------------------------------------------|
      1. |  1      1     2          2        2        2 |
      2. |  1      2     2          2        2        2 |
      3. |  1      3     0          2        2        2 |
         |----------------------------------------------|
      4. |  2      1     4          3        5        4 |
      5. |  2      2     5          3        5        5 |
      6. |  2      3     5          3        5        5 |
      7. |  2      4     0          3        5        5 |
      8. |  2      4     0          3        5        5 |
         +----------------------------------------------+

    Comment


    • #3
      Dear Nick
      Many thanks for your reply and code.
      On your questions:
      1. yes, i agree that a new variable is much better.
      2. There are no negative or missing values in Var
      3. It was a typo on my part including a repeated observations by id date--there is only one id date pair in my data.

      Again many thanks, your code works fine and does what I need.

      Costas

      Comment

      Working...
      X