Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Carryforward data for a limited number of rows

    Dear readers,

    I am currently working with a data set and I would like some of the variables to be extended for a limited number of rows. The data for the moment looks like the following:

    Country Year StartingYear EndYear Program Duration Amount
    1 1990 . . . . .
    1 1991 . . . . .
    1 1992 1993 1996 A 3 5000
    1 1993 . . . . .
    1 1994 . . . . .
    1 1995 . . . . .
    1 1996 . . . . .
    2 1990 . . . . .
    2 1991 1991 1995 B 4 6500
    2 1992 . . . . .
    2 1993 . . . . .
    2 1994 . . . . .
    2 1995 . . . . .
    2 1996 . . . . .


    I would like to carry forward the variables StartingYear, EndYear, Program, Duration, Amount so that I will have the information for programs A and B spreading for the whole program duration (3 and 4 in this case), like the following:

    Country Year StartingYear EndYear Program Duration Amount
    1 1990 . . . . .
    1 1991 . . . . .
    1 1992 1993 1996 A 3 5000
    1 1993 1993 1996 A 3 5000
    1 1994 1993 1996 A 3 5000
    1 1995 . . . . .
    1 1996 . . . . .
    2 1990 . . . . .
    2 1991 1991 1995 B 4 6500
    2 1992 1991 1995 B 4 6500
    2 1993 1991 1995 B 4 6500
    2 1994 1991 1995 B 4 6500
    2 1995 . . . . .
    2 1996 . . . . .


    I tried to use the carryforward command by typing:
    by Country(Year): carryforward StartingYear EndYear Program Duration Amount

    which works but carries forward for the whole period (so until 1996) and I couldn't code it such that it carries forward conditional on the duration variable.

    Thank you in advance for your help and advice.

    Best,
    Sandra

  • #2
    -carryforward- is a user-written command from SSC, as you are asked to explain (please see FAQ Advice).
    I don't understand your example or what you want precisely. In particular why are values not copied until the end of a program? But I trust that the following technique can be adapted to your problem.

    Code:
    clear
    set obs 10
    gen id = cond(_n <= 5, 1, 2)
    gen year = cond(id == 1, 1999 + _n, 1994 + _n)
    gen known = 42 if year == 2000
    gen latest = year if known < .
    bysort id (year) : replace latest = latest[_n-1] if latest == .
    
    . l
    
         +----------------------------+
         | id   year   known   latest |
         |----------------------------|
      1. |  1   2000      42     2000 |
      2. |  1   2001       .     2000 |
      3. |  1   2002       .     2000 |
      4. |  1   2003       .     2000 |
      5. |  1   2004       .     2000 |
         |----------------------------|
      6. |  2   2000      42     2000 |
      7. |  2   2001       .     2000 |
      8. |  2   2002       .     2000 |
      9. |  2   2003       .     2000 |
     10. |  2   2004       .     2000 |
         +----------------------------+
    
    . bysort id (year) : replace known = known[_n-1] if known == . & (year - latest) <= 3
    (6 real changes made)
    
    . l
    
         +----------------------------+
         | id   year   known   latest |
         |----------------------------|
      1. |  1   2000      42     2000 |
      2. |  1   2001      42     2000 |
      3. |  1   2002      42     2000 |
      4. |  1   2003      42     2000 |
      5. |  1   2004       .     2000 |
         |----------------------------|
      6. |  2   2000      42     2000 |
      7. |  2   2001      42     2000 |
      8. |  2   2002      42     2000 |
      9. |  2   2003      42     2000 |
     10. |  2   2004       .     2000 |
         +----------------------------+
    In short, carry forward the last date on which you have values, and replace only within a desired period after that date.
    See also http://www.stata-journal.com/article...article=dm0079 in the latest issue of the Stata Journal.

    .



    Last edited by Nick Cox; 23 Apr 2015, 08:13.

    Comment


    • #3
      A two-step solution for your example data, using -carryforward- (from SSC) is:

      Code:
      clear
      set more off
      
      *----- example data -----
      
      input ///
      Country Year StartingYear EndYear str1 Program Duration Amount
      1 1990 . . "" . .
      1 1991 . . "" . .
      1 1992 . . "" . .
      1 1993 1993 1996 "A" 3 5000
      1 1994 . . "" . .
      1 1995 . . "" . .
      1 1996 . . "" . .
      2 1990 . . "" . .
      2 1991 1991 1995 "B" 4 6500
      2 1992 . . "" . .
      2 1993 . . "" . .
      2 1994 . . "" . .
      2 1995 . . "" . .
      2 1996 . . "" . .
      end
      
      list, sepby(Country)
      
      *----- what you want -----
      
      bysort Country (StartingYear) : replace StartingYear = StartingYear[1] ///
          if inrange(Year, StartingYear[1], EndYear[1])
          
      bysort Country (Year): carryforward EndYear Program Duration Amount ///
          if !missing(StartingYear), replace
       
      list, sepby(Country)
      Maybe you can make it a one-step solution using the -dynamic_condition- option, but I wouldn't know; I don't use carryforward often.

      (Like Nick, I don't understand the desired example data you posted. Rather, I tried interpreting your wording.)
      Last edited by Roberto Ferrer; 23 Apr 2015, 08:07.
      You should:

      1. Read the FAQ carefully.

      2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

      3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

      4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

      Comment

      Working...
      X