I have a variable measured at different times and locations, and some events for which I want the most recent observation of this variable:
I want to find, for each event, the most recent measurement in its location at its time:
Event A has missing data because there is no measurement in 2011 or before; B has its value taken from the corresponding row of geo_vars; and C has the most recent measurement to 2015 (from 2014).
In R, I can do this with a "rolling join". The author of the R package with that functionality made it sound routine in time series work, so I figure maybe it is available in Stata already but I just haven't found it...? Anyway, for illustration, here's the example in R:
My backup plan is to take advantage of the fact that my measurements are in years and just `expand` the data to fill years for which I don't have measurements before merging. Annoyingly, I have multiple variables with different years for their most recent observation:
This plan for expanding the data just for merging seems kludgy and will be memory-intensive if my data is measured at the day level or more finely. Thanks for any pointers.
Code:
** geo vars clear input /*start*/ str30 loc int value int year n 3 2013 n 4 2014 m 7 2013 m 8 2014 end save "geo_vars.dta", replace ** events clear input /*start*/ str30 e_id str30 loc int year A n 2011 B m 2013 C m 2015 end save "event.dta", replace
Code:
clear input /*start*/ str30 e_id int value A . B 7 C 8 end
In R, I can do this with a "rolling join". The author of the R package with that functionality made it sound routine in time series work, so I figure maybe it is available in Stata already but I just haven't found it...? Anyway, for illustration, here's the example in R:
Code:
## note: this is R code, not Stata library(data.table) geo_vars = fread("loc value year n 3 2013 n 4 2014 m 7 2013 m 8 2014") event = fread("e_id loc year A n 2011 B m 2013 C m 2015") geo_vars[event, on=c("loc", "year"), roll=TRUE] # result: # loc value year e_id # 1: n NA 2011 A # 2: m 7 2013 B # 3: m 8 2015 C
Code:
clear input /*start*/ str30 loc str30 varname int value int year n inc 1 2011 n inc 2 2012 n pop 3 2013 n pop 4 2014 m inc 5 2011 m inc 6 2012 m pop 7 2013 m pop 8 2014 end
Comment