Hello,
I am working with a large dataset (15 GB, 90 million observations) that is really slow to handle. I have therefore split it into four files and run all my commands on each file separately. However, I'm having issues with the value labels when I append the four files at the end:
My files contain string data (for example, company names) that I encode in each separate file, using
Code:
encode oldvar, generate(newvar)
encode numbers the values in each file from 1 to n, so when I append the files, the 1s, 2s, ..., ns all end up under the same value label (which is logical). However, the string values in my files don't overlap, so in file1 newvar with value 1 would be company A, while in file2 newvar with value 1 would be company B.
Is there any way for append to recognize that newvar==1 in file1 is not the same as newvar==1 in file2, even though they are stored in the same variable?
What I am doing now is to recode my values in the individual files by adding the number of observations in the previous file, and then to create a new value label in the appended file. This is a little cumbersome, however. Leaving the variables as strings and encoding in the final dataset is unfortunately not an option, since it takes more than 3 hours per variable.
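For reference, here is a rough sketch of an alternative to the offset workaround that I have not tried on the full data. It is a different technique from the recoding described above: it relies on encode's label() option, which adds new strings to an existing value label instead of starting again at 1, and on label save to carry that label from one file to the next. The file names (file1_enc, complbl.do and so on) and the label name complbl are placeholders I made up for illustration.

```stata
* Sketch only: file names and the label name complbl are placeholders.

* file1: encode as before, but give the value label an explicit name
use file1, clear
encode oldvar, generate(newvar) label(complbl)
label save complbl using complbl.do, replace   // write the label out as a do-file
save file1_enc, replace

* file2: reload the label first, so encode adds to it instead of restarting at 1
use file2, clear
do complbl.do
encode oldvar, generate(newvar) label(complbl)
label save complbl using complbl.do, replace   // now holds file1 + file2 mappings
save file2_enc, replace

* ... repeat the file2 steps for file3 and file4 ...

* append, then reload the complete label into the combined dataset
use file1_enc, clear
append using file2_enc file3_enc file4_enc
do complbl.do
label values newvar complbl
```

If this behaves as I expect, each company keeps a single code across all four files, and the final do complbl.do only fills in the label text that the first file did not have.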
Thank you for your help!