Finding the most recent value from previous year that satisfies a condition

Maria-Ana Vitorino

Join Date: Jun 2014

Posts: 20
#1

Finding the most recent value from previous year that satisfies a condition

08 Nov 2018, 03:00

Dear Statalist users,

I have the following data:

id year date value
1 2005 1-Feb-06 5
1 2005 21-Feb-06 10
1 2005 19-Feb-07 20
1 2006 11-Jan-07 30
1 2006 30-Mar-07 35
1 2007 11-Jan-07 40
2 2005 17-Feb-06 50
2 2006 18-Jun-11 60
2 2006 18-Jun-12 70
...
(I have code below for getting this data into Stata)

My problem is the following:
I would like to add two columns to this dataset.
For each row, I need the value for the previous year (with the same id) with the most recent date that is prior to that row's date, and the date of that value. So, for the example above, the resulting dataset would be:

id year date value valueB dateofvalueB
1 2005 1-Feb-06 5 . .
1 2005 21-Feb-06 10 . .
1 2005 19-Feb-07 20 . .
1 2006 11-Jan-07 30 10 21-Feb-06
1 2006 30-Mar-07 35 20 19-Feb-07
1 2007 11-Jan-07 40 30 11-Jan-07
2 2005 17-Feb-06 50 . .
2 2006 18-Jun-11 60 50 17-Feb-06
2 2006 18-Jun-12 70 50 17-Feb-06
...

I an wondering if this can be done without a loop and if not what would be an efficient way of doing this.

Thanks in advance for your help!

Ana

==============

input id year str20 date value
1 2005 "1-Feb-06" 5
1 2005 "21-Feb-06" 10
1 2005 "19-Feb-07" 20
1 2006 "11-Jan-07" 30
1 2006 "30-Mar-07" 35
1 2007 "11-Jan-07" 40
2 2005 "17-Feb-06" 50
2 2006 "18-Jun-11" 60
2 2006 "18-Jun-12" 70
end
Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10223

08 Nov 2018, 07:45

There may be a simpler method, but haven't given it enough attention to find it.

Code:

clear
input id year str20 date value
1 2005 "1-Feb-06" 5
1 2005 "21-Feb-06" 10
1 2005 "19-Feb-07" 20 
1 2006 "11-Jan-07" 30
1 2006 "30-Mar-07" 35 
1 2007 "11-Jan-07" 40
2 2005 "17-Feb-06" 50
2 2006 "18-Jun-11" 60 
2 2006 "18-Jun-12" 70
end

gen dt = subinstr(lower(date), "-", " ", .)
gen date2 = date(dt, "DM20Y")
drop date dt
format date2 %td
tempfile data
save `data'
replace year= year+1
rename (date2 value) (date3 value2)
joinby id year using `data'
bys id year value: gen dist= date2-date3 if date2>=date3
bys id year value (dist): keep if _n==1
merge 1:1 id year value date2 using `data'
drop _merge dist
order id year date2 value value2 date3
sort id year date2
list, clean

The result

Code:

 
. list, clean

       id   year       date2   value   value2       date3  
  1.    1   2005   01feb2006       5        .           .  
  2.    1   2005   21feb2006      10        .           .  
  3.    1   2005   19feb2007      20        .           .  
  4.    1   2006   11jan2007      30       10   21feb2006  
  5.    1   2006   30mar2007      35       20   19feb2007  
  6.    1   2007   11jan2007      40       30   11jan2007  
  7.    2   2005   17feb2006      50        .           .  
  8.    2   2006   18jun2011      60       50   17feb2006  
  9.    2   2006   18jun2012      70       50   17feb2006

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35730

08 Nov 2018, 09:53

I thought it would be of interest to work at this with loops, despite many warnings -- some from myself -- about avoiding explicit loops over observations, it is often easier to read such code.

I can't reproduce the result for id 1 and year 2007. There is no observation for id 1 and year 2006 with a prior date. If as Andrew does guess, I think, "prior" really means for you "prior to or same as" then modify the code accordingly,

Either way, your string dates are not fit for purpose, as Andrew's code also implies.

Code:

clear 
input id year str20 date value
1 2005 "1-Feb-06" 5
1 2005 "21-Feb-06" 10
1 2005 "19-Feb-07" 20 
1 2006 "11-Jan-07" 30
1 2006 "30-Mar-07" 35 
1 2007 "11-Jan-07" 40
2 2005 "17-Feb-06" 50
2 2006 "18-Jun-11" 60 
2 2006 "18-Jun-12" 70
end

gen wanted1 = . 
gen wanted2 = . 
gen ddate = daily(date, "DMY", 2050) 
gen long obs = _n 

quietly forval i = 1/`=_N' { 
    local sameandprev id == id[`i'] & year == year[`i'] - 1
    su ddate if ddate < ddate[`i'] & `sameandprev' , meanonly 
    su obs if ddate == r(max) & `sameandprev', meanonly
    replace wanted1 = ddate[r(min)] in `i' 
    replace wanted2 = value[r(min)] in `i' 
} 

list, sepby(id) 
            
     +-----------------------------------------------------------------+
     | id   year        date   value   ddate   wanted1   wanted2   obs |
     |-----------------------------------------------------------------|
  1. |  1   2005    1-Feb-06       5   16833         .         .     1 |
  2. |  1   2005   21-Feb-06      10   16853         .         .     2 |
  3. |  1   2005   19-Feb-07      20   17216         .         .     3 |
  4. |  1   2006   11-Jan-07      30   17177     16853        10     4 |
  5. |  1   2006   30-Mar-07      35   17255     17216        20     5 |
  6. |  1   2007   11-Jan-07      40   17177         .         .     6 |
     |-----------------------------------------------------------------|
  7. |  2   2005   17-Feb-06      50   16849         .         .     7 |
  8. |  2   2006   18-Jun-11      60   18796     16849        50     8 |
  9. |  2   2006   18-Jun-12      70   19162     16849        50     9 |
     +-----------------------------------------------------------------+

Comment

Maria-Ana Vitorino

Join Date: Jun 2014

Posts: 20
#4

08 Nov 2018, 14:40

Thanks Andrew and Nick.
Both solutions work although they take a while given the size of my data. That is why I was trying to avoid loops but I am not sure there is an easy way out in this case.
Nick: you are right, what I meant was "prior to or same as".
1 like
Comment

Announcement