Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Finding the most recent value from previous year that satisfies a condition

    Dear Statalist users,

    I have the following data:

    id year date value
    1 2005 1-Feb-06 5
    1 2005 21-Feb-06 10
    1 2005 19-Feb-07 20
    1 2006 11-Jan-07 30
    1 2006 30-Mar-07 35
    1 2007 11-Jan-07 40
    2 2005 17-Feb-06 50
    2 2006 18-Jun-11 60
    2 2006 18-Jun-12 70
    ...
    (I have code below for getting this data into Stata)

    My problem is the following:
    I would like to add two columns to this dataset.
    For each row, I need the value for the previous year (with the same id) with the most recent date that is prior to that row's date, and the date of that value. So, for the example above, the resulting dataset would be:

    id year date value valueB dateofvalueB
    1 2005 1-Feb-06 5 . .
    1 2005 21-Feb-06 10 . .
    1 2005 19-Feb-07 20 . .
    1 2006 11-Jan-07 30 10 21-Feb-06
    1 2006 30-Mar-07 35 20 19-Feb-07
    1 2007 11-Jan-07 40 30 11-Jan-07
    2 2005 17-Feb-06 50 . .
    2 2006 18-Jun-11 60 50 17-Feb-06
    2 2006 18-Jun-12 70 50 17-Feb-06
    ...

    I an wondering if this can be done without a loop and if not what would be an efficient way of doing this.

    Thanks in advance for your help!

    Ana

    ==============

    input id year str20 date value
    1 2005 "1-Feb-06" 5
    1 2005 "21-Feb-06" 10
    1 2005 "19-Feb-07" 20
    1 2006 "11-Jan-07" 30
    1 2006 "30-Mar-07" 35
    1 2007 "11-Jan-07" 40
    2 2005 "17-Feb-06" 50
    2 2006 "18-Jun-11" 60
    2 2006 "18-Jun-12" 70
    end

  • #2
    There may be a simpler method, but haven't given it enough attention to find it.

    Code:
    clear
    input id year str20 date value
    1 2005 "1-Feb-06" 5
    1 2005 "21-Feb-06" 10
    1 2005 "19-Feb-07" 20 
    1 2006 "11-Jan-07" 30
    1 2006 "30-Mar-07" 35 
    1 2007 "11-Jan-07" 40
    2 2005 "17-Feb-06" 50
    2 2006 "18-Jun-11" 60 
    2 2006 "18-Jun-12" 70
    end
    
    gen dt = subinstr(lower(date), "-", " ", .)
    gen date2 = date(dt, "DM20Y")
    drop date dt
    format date2 %td
    tempfile data
    save `data'
    replace year= year+1
    rename (date2 value) (date3 value2)
    joinby id year using `data'
    bys id year value: gen dist= date2-date3 if date2>=date3
    bys id year value (dist): keep if _n==1
    merge 1:1 id year value date2 using `data'
    drop _merge dist
    order id year date2 value value2 date3
    sort id year date2
    list, clean
    The result

    Code:
     
    . list, clean
    
           id   year       date2   value   value2       date3  
      1.    1   2005   01feb2006       5        .           .  
      2.    1   2005   21feb2006      10        .           .  
      3.    1   2005   19feb2007      20        .           .  
      4.    1   2006   11jan2007      30       10   21feb2006  
      5.    1   2006   30mar2007      35       20   19feb2007  
      6.    1   2007   11jan2007      40       30   11jan2007  
      7.    2   2005   17feb2006      50        .           .  
      8.    2   2006   18jun2011      60       50   17feb2006  
      9.    2   2006   18jun2012      70       50   17feb2006

    Comment


    • #3
      I thought it would be of interest to work at this with loops, despite many warnings -- some from myself -- about avoiding explicit loops over observations, it is often easier to read such code.

      I can't reproduce the result for id 1 and year 2007. There is no observation for id 1 and year 2006 with a prior date. If as Andrew does guess, I think, "prior" really means for you "prior to or same as" then modify the code accordingly,

      Either way, your string dates are not fit for purpose, as Andrew's code also implies.

      Code:
      clear 
      input id year str20 date value
      1 2005 "1-Feb-06" 5
      1 2005 "21-Feb-06" 10
      1 2005 "19-Feb-07" 20 
      1 2006 "11-Jan-07" 30
      1 2006 "30-Mar-07" 35 
      1 2007 "11-Jan-07" 40
      2 2005 "17-Feb-06" 50
      2 2006 "18-Jun-11" 60 
      2 2006 "18-Jun-12" 70
      end
      
      gen wanted1 = . 
      gen wanted2 = . 
      gen ddate = daily(date, "DMY", 2050) 
      gen long obs = _n 
      
      quietly forval i = 1/`=_N' { 
          local sameandprev id == id[`i'] & year == year[`i'] - 1
          su ddate if ddate < ddate[`i'] & `sameandprev' , meanonly 
          su obs if ddate == r(max) & `sameandprev', meanonly
          replace wanted1 = ddate[r(min)] in `i' 
          replace wanted2 = value[r(min)] in `i' 
      } 
      
      list, sepby(id) 
                  
           +-----------------------------------------------------------------+
           | id   year        date   value   ddate   wanted1   wanted2   obs |
           |-----------------------------------------------------------------|
        1. |  1   2005    1-Feb-06       5   16833         .         .     1 |
        2. |  1   2005   21-Feb-06      10   16853         .         .     2 |
        3. |  1   2005   19-Feb-07      20   17216         .         .     3 |
        4. |  1   2006   11-Jan-07      30   17177     16853        10     4 |
        5. |  1   2006   30-Mar-07      35   17255     17216        20     5 |
        6. |  1   2007   11-Jan-07      40   17177         .         .     6 |
           |-----------------------------------------------------------------|
        7. |  2   2005   17-Feb-06      50   16849         .         .     7 |
        8. |  2   2006   18-Jun-11      60   18796     16849        50     8 |
        9. |  2   2006   18-Jun-12      70   19162     16849        50     9 |
           +-----------------------------------------------------------------+

      Comment


      • #4
        Thanks Andrew and Nick.
        Both solutions work although they take a while given the size of my data. That is why I was trying to avoid loops but I am not sure there is an easy way out in this case.
        Nick: you are right, what I meant was "prior to or same as".

        Comment

        Working...
        X