Dear Statalisters,
I hope you are fine. I am new to importing datasets from Excel, and I'm sure that with a very long code I could find a solution to my problem but I'd like to learn fast ways to code which is why I am asking for your help. I basically have a dataset from Excel where each column is a day coded either in the 01/20/2011 format, or the 20Jan2011 one (I assume this is due to some mistake in the coding of the raw data). I'd like to import that dataset to Stata without changing anything in my raw dataset in the first place. Ideally, I'd like to have this format:
As you can see, there are three problems in this dataset:
1) First, dates are coded differently as you can see in the variables.
2) For the dates that are coded in the "1/1/2020" format, there are these two annoying spaces that I can't seem to remove
3) Variables names are letters because Stata cannot import variables names that start with a number.
I would be grateful if you could find an efficient way to harmonize the dates name for every label in a way that looks like "Index_(the date in whatever format)" and then to give the variables the name of its label. Thanks a lot for the help!
Regards,
Adam
I hope you are fine. I am new to importing datasets from Excel, and I'm sure that with a very long code I could find a solution to my problem but I'd like to learn fast ways to code which is why I am asking for your help. I basically have a dataset from Excel where each column is a day coded either in the 01/20/2011 format, or the 20Jan2011 one (I assume this is due to some mistake in the coding of the raw data). I'd like to import that dataset to Stata without changing anything in my raw dataset in the first place. Ideally, I'd like to have this format:
Country | Index_01Jan2020 | Index_02Jan2020 | Index_03Jan2020 |
xxxx | data | data | data |
Code:
global path "C:/Users/xxxx/Desktop/xxxx/yyyy" global inp_str "$path/Inputs/Part II/Chap I" import excel "$inp_str/file.xlsx", first clear cellrange(B2:AKL264) * please note that the cellrange option is not the source of my problem. It is simply to avoid importing a line full of "Column 1", "Column 2", etc.
Code:
* Example generated by -dataex-. For more info, type help dataex clear input str28 country_name byte(G H I J) str5 AH "Aruba" . . . . "0" "Afghanistan" . . . . "0" "Angola" . . . . "0" end label var country_name "country_name" label var G " 1/1/2020" label var H " 1/2/2020" label var I " 1/3/2020" label var J " 1/4/2020" label var AH "28Jan2020"
1) First, dates are coded differently as you can see in the variables.
2) For the dates that are coded in the "1/1/2020" format, there are these two annoying spaces that I can't seem to remove
3) Variables names are letters because Stata cannot import variables names that start with a number.
I would be grateful if you could find an efficient way to harmonize the dates name for every label in a way that looks like "Index_(the date in whatever format)" and then to give the variables the name of its label. Thanks a lot for the help!
Regards,
Adam
Comment