Hello,
I am appending and merging DHS data in the hope of creating an extensive panel dataset covering different countries and survey waves.
For an individual country and year this involves creating a unique ID in each dataset, appending individual and men's datasets, and then merging 1:1 on the ID with the HIV dataset.
The steps are the same for each of the 20 country/survey-year combinations I have. I was hoping to automate this process to save time and avoid human error.
The issue is the dataset names are inconsistent across countries and survey years. For example, one may be:
use "BUIR61FL.DTA", clear
append using "BUMR61FL.DTA"
merge 1:1 id using "BUAR61FL.DTA"
While the other:
use "CMIR4AFL.DTA", clear
append using "CMMR4AFL.DTA"
merge 1:1 id using "CMAR4AFL.DTA"
The first two letters identify the country and are predictable, so I could loop over them with foreach ff in "BU" "CM" { }.
The 5th and 6th characters (numbers or letters) identify the survey year, and they are not predictable across countries or survey years.
Would looping be possible? Is there a better command to complete this process? I could rename the files but that would be just as time-consuming.
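One possible way to loop without knowing the 5th and 6th characters in advance is to let Stata list the files that match a wildcard and then read those two characters off each file name. The following is only an untested sketch. It assumes that all the .DTA files sit in the current working directory, that every IR file has matching MR and AR files, and that the variable id has already been created in each file. The output file name is hypothetical.

```stata
* Sketch only: assumes all files are in the current working directory
* and that id already exists and is unique in every dataset.
foreach cc in "BU" "CM" {
    * List the individual-recode files for this country, e.g. BUIR61FL.DTA
    local irfiles : dir "." files "`cc'IR*FL.DTA", respectcase
    foreach f of local irfiles {
        * Characters 5-6 of the file name hold the survey code, e.g. 61 or 4A
        local svy = substr("`f'", 5, 2)
        use "`cc'IR`svy'FL.DTA", clear
        append using "`cc'MR`svy'FL.DTA"
        merge 1:1 id using "`cc'AR`svy'FL.DTA"
        * Hypothetical output name; change as needed
        save "`cc'`svy'_merged.dta", replace
    }
}
```

Because the inner loop runs over whatever IR files are found, a country with several survey years would be handled once per survey without listing the codes by hand.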
Thank you for your help, and please let me know if any clarification is needed, as I am not sure I have explained this well.