Hi all,
can someone confirm if my code is doing what I think it is? I've tried running it on small extracts of the data to check, and it seems to work the way I expect, but I'm a little sceptical of the results I get for my entire dataset.
I have two datasets, A and B. Each contains coordinates for most observations and a unique identifier, id_A or id_B. There is a value for start_date and end_date for most observations in data_B. In data_A I want to create:
- a dummy "active", equal to 1 if in data_B there is an observation within 50km for which start_date <= 2008 <= end_date
- a dummy "inactive", equal to 1 if in data_B there is an observation within 50km for which start_date > 2008
Further, I want to drop all observations in data_A that have a data_B observation within 50km for which end_date < 2008.
I have already been helped in this forum to construct parts of this code, but I'm unsure if I put it all together correctly.
Thank you very much in advance!
Best regards,
Marco
Code:
cd "..."

* Find all pairs of observations between A and B that are within 50km of each other
use "data_A.dta", clear
geonear id_A latitude longitude using "data_B.dta", n(id_B latitude longitude) long within(50) near(0)
save "geonear.dta", replace

* Merge datasets using geonear file
use "data_A.dta", clear
merge 1:m id_A using "geonear.dta", keep(master match) keepusing(id_B)
drop _merge
merge m:1 id_B using "data_B", keep(master match) keepusing(start_date end_date)
drop _merge

* Drop all observations with a certain id_A if one of them has end_date < 2008
gen closed = 0
replace closed = 1 if !missing(end_date) & end_date < 2008
bysort id_A: egen closed1 = max(closed)
drop closed
drop if closed1 == 1
drop closed1

* Create dummies
* (I know I could just set them both = 2008, but I will want to vary the years)
* Note: a missing end_date compares as larger than any number, so it passes 2008 <= end_date
gen active1 = 0
replace active1 = 1 if start_date <= 2008 & 2008 <= end_date & !missing(start_date)
gen inactive1 = 0
replace inactive1 = 1 if start_date > 2008 & !missing(start_date)
sort id_A

* Make active = 1 if any of active1 equals 1
by id_A: egen active = max(active1)
drop active1
by id_A: egen inactive = max(inactive1)
drop inactive1

* Keep only one of each id_A
bysort id_A (active): keep if _n == _N
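For reference, here is a minimal sketch of the checks I could run on the final dataset. It assumes the code above has just been run, that id_A has no missing values, and that the result should have exactly one row per id_A with both dummies coded 0/1. These checks are my own assumption about what "correct" looks like, not something I was given:

```stata
* Sanity checks on the final dataset (sketch only)

* Errors out if id_A does not uniquely identify the remaining rows
isid id_A

* Both dummies should only take the values 0 and 1
assert inlist(active, 0, 1)
assert inlist(inactive, 0, 1)

* Cross-tabulate the dummies to see how many id_A fall in each combination
tabulate active inactive, missing
```

If isid or one of the assert lines stops with an error, the final step did not leave the data in the shape I expect.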