I am trying to combine multiple datasets (around 40), each with a different number of variables. I cannot use merge because the ID field contains duplicates in every dataset.
I am trying to use the append command with the code listed below.
However, each time Stata encounters a new variable it creates a new version of that variable. For example, if a variable named a is present in datasets 2 and 4, the final output contains variables a_m2 and a_m4 instead of just a.
Is there a simple way to prevent this from happening?
I tried adding all variables to all datasets and filling them with zeros, but I cannot seem to get all the variable names from the different datasets.
clear all
cd C:\hrs2002\stata
! dir *.dct /a-d /b >C:\hrs2002\stata\filelistdct.txt
file open myfile using C:\hrs2002\stata\filelistdct.txt, read
/* read each .dct dictionary in the list and save the data as a .dta file */
file read myfile line
local i= 1
while r(eof)==0 { /* while you're not at the end of the file */
display "`line'"
infile using "`line'"
save c:\hrs2002\data\H002_`i'.DTA
local a " descsave, list(name, clean noobs noheader) "
local i = `i' + 1
file read myfile line
clear
}
file close myfile
cd C:\hrs2002\data
use H002_1, clear
forvalues j = 2/`=`i'-1' { /* i ends one past the last saved file */
append using H002_`j'
}
save wave_002, replace
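For the step of collecting all variable names across the files, one possible approach is sketched below. It is only a sketch under stated assumptions: the files H002_1.dta, H002_2.dta, and so on already exist in the current directory, and the local n (a hypothetical name, set here to 40) holds the number of files. It uses describe using with the varlist option, which returns the names in r(varlist), and the macro list union operator to accumulate them.

```stata
* Sketch: build the union of variable names across the saved files.
* Assumption: H002_1.dta ... H002_`n'.dta are in the current directory.
local n = 40          // hypothetical file count; adjust to the real number
local allvars         // start with an empty list

forvalues j = 1/`n' {
    * describe using reads only the file header, not the data
    quietly describe using H002_`j', varlist
    local vars `r(varlist)'
    * | is the macro-list union operator; duplicate names are dropped
    local allvars : list allvars | vars
}

display "`: list sizeof allvars' distinct variable names found"
display "`allvars'"
```

The resulting list in allvars could then be compared against each file's own variable list to see which names differ between datasets.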