Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to replace first non-missing observation?

    Hello everyone,

    I would like to replace the first non-missing value of each ID (company) in a panel dataset and change it from whatever value it has to "missing."

    This is the initial code:

    Code:
     company year value
    1 2000 .
    1 2001 .
    1 2002 10
    1 2003 11
    1 2004 15
    2 2000 .
    2 2001 7
    2 2002 9
    2 2003 6
    2 2004 12
    3 2000 20
    3 2001 18
    3 2002 17
    3 2003 15
    3 2004 19
    and this is what I need:

    Code:
     company year value
    1 2000 .
    1 2001 .
    1 2002 .
    1 2003 11
    1 2004 15
    2 2000 .
    2 2001 .
    2 2002 9
    2 2003 6
    2 2004 12
    3 2000 .
    3 2001 18
    3 2002 17
    3 2003 15
    3 2004 19
    My problem is I don't know how to identify first non-missing value. Is it possible?

    Thank you very much!
    Last edited by Jiri Havel; 04 May 2022, 07:56.

  • #2
    Summing with conditions will do the trick.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte company int year byte value
    1 2000  .
    1 2001  .
    1 2002 10
    1 2003 11
    1 2004 15
    2 2000  .
    2 2001  7
    2 2002  9
    2 2003  6
    2 2004 12
    3 2000 20
    3 2001 18
    3 2002 17
    3 2003 15
    3 2004 19
    end
    
    bys company (year): gen firstnonmissing= sum(sum(!missing(value)))==1
    Res.:

    Code:
    . l, sepby(company)
    
         +-----------------------------------+
         | company   year   value   firstn~g |
         |-----------------------------------|
      1. |       1   2000       .          0 |
      2. |       1   2001       .          0 |
      3. |       1   2002      10          1 |
      4. |       1   2003      11          0 |
      5. |       1   2004      15          0 |
         |-----------------------------------|
      6. |       2   2000       .          0 |
      7. |       2   2001       7          1 |
      8. |       2   2002       9          0 |
      9. |       2   2003       6          0 |
     10. |       2   2004      12          0 |
         |-----------------------------------|
     11. |       3   2000      20          1 |
     12. |       3   2001      18          0 |
     13. |       3   2002      17          0 |
     14. |       3   2003      15          0 |
     15. |       3   2004      19          0 |
         +-----------------------------------+

    Comment


    • #3
      See also https://www.stata.com/support/faqs/d...t-occurrences/

      Here is another way to do it. First we focus on finding the first date for each company when there was a non-missing value. That gives us which values to replace. I clone the variable because if you get this wrong it's hard to reverse.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte company int year byte value
      1 2000  .
      1 2001  .
      1 2002 10
      1 2003 11
      1 2004 15
      2 2000  .
      2 2001  7
      2 2002  9
      2 2003  6
      2 2004 12
      3 2000 20
      3 2001 18
      3 2002 17
      3 2003 15
      3 2004 19
      end
      
      egen whenfirst = min(cond(!missing(value), year, .)), by(company)
      
      gen value2 = cond(year == whenfirst, ., value)
      
      list, sepby(company) 
      
      
           +--------------------------------------------+
           | company   year   value   whenfi~t   value2 |
           |--------------------------------------------|
        1. |       1   2000       .       2002        . |
        2. |       1   2001       .       2002        . |
        3. |       1   2002      10       2002        . |
        4. |       1   2003      11       2002       11 |
        5. |       1   2004      15       2002       15 |
           |--------------------------------------------|
        6. |       2   2000       .       2001        . |
        7. |       2   2001       7       2001        . |
        8. |       2   2002       9       2001        9 |
        9. |       2   2003       6       2001        6 |
       10. |       2   2004      12       2001       12 |
           |--------------------------------------------|
       11. |       3   2000      20       2000        . |
       12. |       3   2001      18       2000       18 |
       13. |       3   2002      17       2000       17 |
       14. |       3   2003      15       2000       15 |
       15. |       3   2004      19       2000       19 |
           +--------------------------------------------+
      In turn, for how to get the first year with non-missing value, see Section 9 in https://www.stata-journal.com/articl...article=dm0055

      Comment


      • #4
        This is great.
        Thanks so much to both of you!

        Comment

        Working...
        X