Hello everyone,
I am using Stata/MP 13. I have 4 datasets on firm daily stock prices for two stock exchanges (BSE & NSE, India), and market index values - all over the same time period of 5 years.
Since the data on firm prices was too large, the database I used required me to download them year-wise. Similar for one of the two market indices, due to website settings.
What I need to do, is:
1. Clean each file for each dataset (firm price, market index price) for each year, then append all years
2. Generate a date function, merge all the consolidated datasets for the dates that are commonly present across all sets
3. Run other tests
I have more or less figured out the code to do all 3 steps, thanks in no small measure to statalist posts. But the code for each year is long, so it is tiresome to do it for each year individually. And I may need to do the exercise again using other variables. Each file is named in the following format - "filepath\firm price_13-14.csv" (or .dta or .txt, based on source format, but uniform for each dataset), followed by 14-15, 15-16, etc. A sample of my code is as follows, I have chosen the cleanest dataset for brevity of illustration:
So I was wondering if there was a way to write a loop to call on each file, run the code, append it to the master dataset, and then run again for the next successive year.
Then I can merge the datasets at the end.
Thanking you in advance for your help. If I can make any part of my question clearer I would be grateful for feedback. I did try searching through statalist for existing posts on this, but couldn't find something that exactly seemed to answer my question - if there is something I have missed, I would be grateful for the reference.
I am using Stata/MP 13. I have 4 datasets on firm daily stock prices for two stock exchanges (BSE & NSE, India), and market index values - all over the same time period of 5 years.
Since the data on firm prices was too large, the database I used required me to download them year-wise. Similar for one of the two market indices, due to website settings.
What I need to do, is:
1. Clean each file for each dataset (firm price, market index price) for each year, then append all years
2. Generate a date function, merge all the consolidated datasets for the dates that are commonly present across all sets
3. Run other tests
I have more or less figured out the code to do all 3 steps, thanks in no small measure to statalist posts. But the code for each year is long, so it is tiresome to do it for each year individually. And I may need to do the exercise again using other variables. Each file is named in the following format - "filepath\firm price_13-14.csv" (or .dta or .txt, based on source format, but uniform for each dataset), followed by 14-15, 15-16, etc. A sample of my code is as follows, I have chosen the cleanest dataset for brevity of illustration:
Code:
clear all set more off use "filepath\prices 13-14.dta", clear sort cmiecode reshape long AdjCP_, i(cmiecode) j(dt) string rename AdjCP_ adjcp destring adjcp, replace gen dt2=subinstr(dt,"_","",.) gen date = date(dt2, "DMY",2050) format %td date drop dt dt2 order cmiecode date drop if adjcp==. sort cmiecode date save "filepath\prices_BSE.dta", replace clear all set more off use "filepath\prices 14-15.dta", clear sort cmiecode reshape long AdjCP_, i(cmiecode) j(dt) string rename AdjCP_ adjcp destring adjcp, replace gen dt2=subinstr(dt,"_","",.) gen date = date(dt2, "DMY",2050) format %td date drop dt dt2 order cmiecode date drop if adjcp==. sort cmiecode date append using "filepath\prices_BSE.dta"
Then I can merge the datasets at the end.
Thanking you in advance for your help. If I can make any part of my question clearer I would be grateful for feedback. I did try searching through statalist for existing posts on this, but couldn't find something that exactly seemed to answer my question - if there is something I have missed, I would be grateful for the reference.
Comment