Hi everyone,
I would love to receive your advice regarding a strategy to match two datasets.
Specifically, I have the following datasets:
- Dataset A: Contains observations and information about establishments on a yearly basis, with a variable identifier "X_id" (identifying establishments). It's a panel dataset, but there are multiple instances of "X_id" within each year (i.e., duplicates).
- Dataset B: Contains observations and information (different from Dataset A) about establishments on a yearly basis, with a variable identifier "Y_id" (identifying establishments). It's also a panel dataset with multiple instances of "Y_id" within each year (i.e., duplicates).
- Linking table C: To build it, I extracted the unique identifiers associated with establishments from each dataset and performed a fuzzy match on establishment names.
Given that my final goal is to merge A and B, I'm trying to determine the best way to leverage this linking table. If I perform a 1:m merge from C using A, I'm struggling to find a way to then merge the result with B, since the A+C dataset will contain repeated identifiers.
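To make the problem concrete, here is a minimal sketch in plain Python with made-up toy data. The columns `year`, `a_info`, and `b_info`, all the values, and the assumption that C holds exactly one `Y_id` per `X_id` are hypothetical and only for illustration; this is not my actual workflow. It shows that once `Y_id` is attached to A, joining to B on `Y_id` and year pairs every duplicate in A with every duplicate in B:

```python
from collections import defaultdict

# Toy rows; every value and the columns "year", "a_info", "b_info" are made up.
A = [
    {"X_id": "x1", "year": 2020, "a_info": "a1"},
    {"X_id": "x1", "year": 2020, "a_info": "a2"},  # duplicate X_id within a year
    {"X_id": "x2", "year": 2020, "a_info": "a3"},
]
B = [
    {"Y_id": "y1", "year": 2020, "b_info": "b1"},
    {"Y_id": "y1", "year": 2020, "b_info": "b2"},  # duplicate Y_id within a year
    {"Y_id": "y2", "year": 2020, "b_info": "b3"},
]
# Linking table C, assumed here to hold exactly one Y_id per X_id.
C = {"x1": "y1", "x2": "y2"}

# Step 1: attach Y_id to every row of A (the 1:m merge from C using A).
AC = [{**row, "Y_id": C[row["X_id"]]} for row in A if row["X_id"] in C]

# Step 2: index B by (Y_id, year), then pair each A+C row with all matching B rows.
b_by_key = defaultdict(list)
for row in B:
    b_by_key[(row["Y_id"], row["year"])].append(row)

ABC = [
    {**ac_row, **b_row}
    for ac_row in AC
    for b_row in b_by_key.get((ac_row["Y_id"], ac_row["year"]), [])
]

print(len(AC), len(ABC))  # prints: 3 5
```

Here the two `x1` rows in A each pair with the two `y1` rows in B, which is why the row count grows from 3 to 5.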
Any suggestions?
Thank you!