I am attempting to append/merge several datasets. Each contains roughly 30 to 135 variables, many of which are various ID numbers. Frequently, a variable is stored as float or long in one dataset and as string in another. This makes the append process very tedious, as I open each dataset in a different window, type "codebook" and manually go through each shared variable to ensure it has the same storage type in both.
The best way I can think to speed this up is to "tostring" every variable that is not intended to be a date or amount. I'm not sure if there are unforeseen consequences to doing this. (And I may need to convert dates to strings anyway, as they're saved as strings in one dataset and it seems that destringing them would be more hassle than it may be worth.)
Is there a faster way to identify conflicts in data types before appending/merging, and are there issues with the idea to make all non-amount variables string that I'm not seeing? I am using Stata 18.5.
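To make the kind of check I am asking about concrete, here is a minimal sketch of how the type comparison might be automated. It assumes two files with the hypothetical names dataset1.dta and dataset2.dta, and it relies on "describe, replace", which swaps the data in memory for one row per variable (including name, type and isnumeric). I have not confirmed that this is the recommended approach.

```stata
* Sketch only: dataset1.dta and dataset2.dta are placeholder file names.
tempfile types1

* Build a one-row-per-variable description of the first dataset.
use "dataset1.dta", clear
describe, replace clear
keep name type isnumeric
rename (type isnumeric) (type1 isnumeric1)
save `types1'

* Do the same for the second dataset.
use "dataset2.dta", clear
describe, replace clear
keep name type isnumeric
rename (type isnumeric) (type2 isnumeric2)

* Keep only variables present in both files, then list the
* ones that are numeric in one file and string in the other.
merge 1:1 name using `types1', keep(match) nogenerate
list name type1 type2 if isnumeric1 != isnumeric2, noobs
```

The listing is limited to numeric/string mismatches because those are what stop append with r(106); as far as I understand, differences among numeric types (byte vs. long, for example) do not.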
To add to this, "tostring" doesn't always seem to stick. The example below is copy/pasted from my results window, aside from text in brackets.
Thanks!
Code:
. append using "[dataset2]"
variable [var1] is byte in master but str1 in using data
    You could specify append's force option to ignore this numeric/string mismatch. The using variable would then be treated as if it contained numeric missing value.
r(106);

. tostring [var1], replace
[var1] already string; no replace

. append using "[dataset2]"
variable [var1] is byte in master but str1 in using data
    You could specify append's force option to ignore this numeric/string mismatch. The using variable would then be treated as if it contained numeric missing value.
r(106);