I am quite new to Stata (17) and I have to deal with many datasets originating from Excel. The problem with these datasets is that their column orders is different, although the content of each of them is the same. The cause of this problem is that some datasets have additional variables in the middle of the "normal" order (i.e. the variables common to every dataset) which shifts the order of the Excel file to a few columns throughout the spreadsheet. I can't show my actual data, so I took the time to create a fake example resembling the data salad I have right now. If var1, var2, var3... are the variables that are supposed to be common to every dataset, and btwn1, btwn2... etc. are supposed to be variables specific to one file.
Now this is file1 after it just got imported from Excel -without the option firstrow- :
This is file2 :
I can't just append my datasets like this, because the contents of var2 in file1 would get mixed up with btwn1. And to make things worse, one variable may have several writings across Excel files ! for instance var1 could be "time", "B1. time", or just "B1" etc. This is why I can't use the -firstrow- option. Fortunately, I have a mapping table that tells me the id and the label of each variable.
Now, I am clearly not expecting someone to give me a Stata code that would solve perfectly my problem, because let's be honest, data cleaning is boring... I came on this forum with the hope of finding users more familiar with data cleaning that would give me some guidance on how THEY would approach the problem. Hence the "general advices" in my title.
I can give as many details regarding my data structure as one wants if you need, as long as it's fakeable.
Thank you for your help !
Now this is file1 after it just got imported from Excel -without the option firstrow- :
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str20(A B C D E F) "var1" "var2" "var3" "var4" "var5" "var6" "info" "info" "info" "info" "info" "info" "info" "info" "info" "info" "info" "info" end
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str20(A B C D E F) "var1" "btwn1" "var2" "btwn2" "var3" "var4" "info" "info" "info" "info" "info" "info" "info" "info" "info" "info" "info" "info" end
Now, I am clearly not expecting someone to give me a Stata code that would solve perfectly my problem, because let's be honest, data cleaning is boring... I came on this forum with the hope of finding users more familiar with data cleaning that would give me some guidance on how THEY would approach the problem. Hence the "general advices" in my title.
I can give as many details regarding my data structure as one wants if you need, as long as it's fakeable.
Thank you for your help !

Comment