Short version
I want one information and one information only: in each and every variable in the dataset, is there only one non-missing observation, yes or no? Is there a neat way to achieve that?
Long version
I have a mapping table which connects IDs (JANIS_ID and BORID) from two datasets which I want to merge and between those IDs there can be m:n relations. So I have to identify the networks of IDs that belong together. The code below does that by:
What I am ultimately aiming for is a solution in which the number of iterations depends on whether the condition on the missings is fulfilled. I.e. "run the two mergers and duplicates drop commands until there are no variables with more than one non-missing".
I want one information and one information only: in each and every variable in the dataset, is there only one non-missing observation, yes or no? Is there a neat way to achieve that?
Long version
I have a mapping table which connects IDs (JANIS_ID and BORID) from two datasets which I want to merge and between those IDs there can be m:n relations. So I have to identify the networks of IDs that belong together. The code below does that by:
- creating two datasets which show each BORID per JANIS_ID and each JANIS_ID per BORID.
- iteratively merging the datasets from step 1 until all IDs that belong to each other are in one row, so it holds that: row = network
What I am ultimately aiming for is a solution in which the number of iterations depends on whether the condition on the missings is fulfilled. I.e. "run the two mergers and duplicates drop commands until there are no variables with more than one non-missing".
Code:
cd ""
use mapping, clear
gen x = _n
// all JANIS_ID per BORID
preserve
reshape wide JANIS_ID, i(BORID) j(x)
save mapping_1, replace
describe
global n_janisid = r(k) - 1
restore
// all BORID per JANIS_ID
preserve
reshape wide BORID, i(JANIS_ID) j(x)
save mapping_2, replace
describe
global n_borid = r(k) - 1
restore
use mapping_1, clear
drop BORID
local n_janisid = ${n_janisid}
local n_borid = ${n_borid}
// ITERATION 1
forvalues n = 1(1)`n_janisid' {
rename JANIS_ID`n' JANIS_ID
display `n'
qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen
rename JANIS_ID JANIS_ID`n'
}
duplicates drop JANIS_ID* BORID*, force
forvalues n = 1(1)`n_borid' {
rename BORID`n' BORID
display `n'
qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen
rename BORID BORID`n'
}
duplicates drop JANIS_ID* BORID*, force
// ITERATION 2
forvalues n = 1(1)`n_janisid' {
rename JANIS_ID`n' JANIS_ID
display `n'
qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen
rename JANIS_ID JANIS_ID`n'
}
duplicates drop JANIS_ID* BORID*, force
forvalues n = 1(1)`n_borid' {
rename BORID`n' BORID
display `n'
qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen
rename BORID BORID`n'
}
duplicates drop JANIS_ID* BORID*, force
// ITERATION 3
forvalues n = 1(1)`n_janisid' {
rename JANIS_ID`n' JANIS_ID
display `n'
qui merge m:1 JANIS_ID using mapping_2, update replace keep(1 3 4 5) nogen
rename JANIS_ID JANIS_ID`n'
}
duplicates drop JANIS_ID* BORID*, force
forvalues n = 1(1)`n_borid' {
rename BORID`n' BORID
display `n'
qui merge m:1 BORID using mapping_1, update replace keep(1 3 4 5) nogen
rename BORID BORID`n'
}
duplicates drop JANIS_ID* BORID*, force
// check for missings
missings report
// Reshape to long format
gen NETWID = _n
reshape long JANIS_ID BORID, i(NETWID) j(n)
drop if JANIS_ID==. & BORID==.
drop n

Comment