Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Carry forward w/ replacement if following value is less than preceding value

    Hello,

    I am working with a large panel dataset (NLSY79 if that means anything to you) and I am trying to clean up a variable capturing the highest grade of education completed by the individual (HGC). Below is a table where I provide an example of the variable I am working with, HGC, and a variable to illustrate the problem that occurs when I carry forward missing values: HGC_Carried.

    However, the problem I am running into is highlighted in red for ID #2. When I carry forward the values, ID 2 ends up with HGC of 13 for years 2005-2009, but then switches to 12 for 2010-2015. I need a way to code this such that the highest value of the variable will carry forward and replace any lesser values that occur after it. Correctly coded would show HGC =13 for years of ID# 2.

    I hope that makes sense. Please let me know if you have any questions and thank you for your time and assistance.

    Joe

    ID Year HGC HGC_Carried
    1 2005 12 12
    1 2006 12 12
    1 2007 12 12
    1 2008 . 12
    1 2009 . 12
    1 2010 14 14
    1 2011 15 15
    1 2012 . 15
    1 2013 16 16
    1 2014 . 16
    1 2015 . 16
    2 2005 . 13
    2 2006 13 13
    2 2007 13 13
    2 2008 13 13
    2 2009 . 12
    2 2010 12 12
    2 2011 12 12
    2 2012 . 12
    2 2013 . 12
    2 2014 . 12
    2 2015 . 12
    3 2005 11 11
    3 2006 12 12
    3 2007 13 13
    3 2008 14 14
    3 2009 . 14
    3 2010 . 14
    3 2011 17 17
    3 2012 18 18
    3 2013 19 19
    3 2014 . 19
    3 2015 21 20

  • #2
    I think what you want is:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte id int year byte hgc
    1 2005 12
    1 2006 12
    1 2007 12
    1 2008  .
    1 2009  .
    1 2010 14
    1 2011 15
    1 2012  .
    1 2013 16
    1 2014  .
    1 2015  .
    2 2005  .
    2 2006 13
    2 2007 13
    2 2008 13
    2 2009  .
    2 2010 12
    2 2011 12
    2 2012  .
    2 2013  .
    2 2014  .
    2 2015  .
    3 2005 11
    3 2006 12
    3 2007 13
    3 2008 14
    3 2009  .
    3 2010  .
    3 2011 17
    3 2012 18
    3 2013 19
    3 2014  .
    3 2015 21
    end
    
    by id (year), sort: gen hgc_carried = hgc if _n == 1
    by id: replace hgc_carried = max(hgc, hgc_carried[_n-1]) if _n > 1
    Note: I do not understand where the hgc_carried = 13 for id 2 in 2005 in your example comes from. That is the very first observation for id 2, and the value of hgc itself is missing, so hgc_carried should also be missing. My code produces a missing value in that result.

    In the future, when posting example data please use the -dataex- command. Run -ssc install dataex- to get it, and then run -help dataex- to read the simple instructions for using it. By using -dataex- you will enable those who want to help you to create a complete and faithful replica of your Stata example with a simple copy/paste operation.

    Comment


    • #3
      I see no explanation here of the code you used, nor of how "carrying forward" filled in the value for identifier 2 in 2005.

      Please do read and act on FAQ Advice #12 as requested. That would lead you not only to showing code but also to using dataex (SSC) and CODE delimiters to show your data example.

      Getting the maximum so far (the "record" for whenever higher is better) is documented at
      https://www.stata.com/support/faqs/d...m-of-sequence/

      This example may help, but the code doesn't backfill. There are other ways to do that.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte id int year byte(hgc hgc_carried)
      1 2005 12 12
      1 2006 12 12
      1 2007 12 12
      1 2008  . 12
      1 2009  . 12
      1 2010 14 14
      1 2011 15 15
      1 2012  . 15
      1 2013 16 16
      1 2014  . 16
      1 2015  . 16
      2 2005  . 13
      2 2006 13 13
      2 2007 13 13
      2 2008 13 13
      2 2009  . 12
      2 2010 12 12
      2 2011 12 12
      2 2012  . 12
      2 2013  . 12
      2 2014  . 12
      2 2015  . 12
      3 2005 11 11
      3 2006 12 12
      3 2007 13 13
      3 2008 14 14
      3 2009  . 14
      3 2010  . 14
      3 2011 17 17
      3 2012 18 18
      3 2013 19 19
      3 2014  . 19
      3 2015 21 20
      end
      
      bysort id (year) : gen wanted = hgc[1]
      by id: replace wanted = max(wanted[_n-1], hgc) if _n > 2
      list, sepby(id) 
      
           +-------------------------------------+
           | id   year   hgc   hgc_ca~d   wanted |
           |-------------------------------------|
        1. |  1   2005    12         12       12 |
        2. |  1   2006    12         12       12 |
        3. |  1   2007    12         12       12 |
        4. |  1   2008     .         12       12 |
        5. |  1   2009     .         12       12 |
        6. |  1   2010    14         14       14 |
        7. |  1   2011    15         15       15 |
        8. |  1   2012     .         15       15 |
        9. |  1   2013    16         16       16 |
       10. |  1   2014     .         16       16 |
       11. |  1   2015     .         16       16 |
           |-------------------------------------|
       12. |  2   2005     .         13        . |
       13. |  2   2006    13         13        . |
       14. |  2   2007    13         13       13 |
       15. |  2   2008    13         13       13 |
       16. |  2   2009     .         12       13 |
       17. |  2   2010    12         12       13 |
       18. |  2   2011    12         12       13 |
       19. |  2   2012     .         12       13 |
       20. |  2   2013     .         12       13 |
       21. |  2   2014     .         12       13 |
       22. |  2   2015     .         12       13 |
           |-------------------------------------|
       23. |  3   2005    11         11       11 |
       24. |  3   2006    12         12       11 |
       25. |  3   2007    13         13       13 |
       26. |  3   2008    14         14       14 |
       27. |  3   2009     .         14       14 |
       28. |  3   2010     .         14       14 |
       29. |  3   2011    17         17       17 |
       30. |  3   2012    18         18       18 |
       31. |  3   2013    19         19       19 |
       32. |  3   2014     .         19       19 |
       33. |  3   2015    21         20       21 |
           +-------------------------------------+

      Comment

      Working...
      X