Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help in running code on successive years stored in separate files

    Hello everyone,

    I am using Stata/MP 13. I have 4 datasets on firm daily stock prices for two stock exchanges (BSE & NSE, India), and market index values - all over the same time period of 5 years.
    Since the data on firm prices was too large, the database I used required me to download them year-wise. Similar for one of the two market indices, due to website settings.

    What I need to do, is:
    1. Clean each file for each dataset (firm price, market index price) for each year, then append all years
    2. Generate a date function, merge all the consolidated datasets for the dates that are commonly present across all sets
    3. Run other tests

    I have more or less figured out the code to do all 3 steps, thanks in no small measure to statalist posts. But the code for each year is long, so it is tiresome to do it for each year individually. And I may need to do the exercise again using other variables. Each file is named in the following format - "filepath\firm price_13-14.csv" (or .dta or .txt, based on source format, but uniform for each dataset), followed by 14-15, 15-16, etc. A sample of my code is as follows, I have chosen the cleanest dataset for brevity of illustration:

    Code:
    clear all
    set more off
    use "filepath\prices 13-14.dta", clear
    
    sort cmiecode
    reshape long AdjCP_, i(cmiecode) j(dt) string
    rename AdjCP_ adjcp
    destring adjcp, replace
    
    gen dt2=subinstr(dt,"_","",.)
    gen date = date(dt2, "DMY",2050)
    format %td date
    drop dt dt2
    order cmiecode date
    drop if adjcp==.
    sort cmiecode date
    
    save "filepath\prices_BSE.dta", replace
    
    clear all
    set more off
    use "filepath\prices 14-15.dta", clear
    
    sort cmiecode
    reshape long AdjCP_, i(cmiecode) j(dt) string
    rename AdjCP_ adjcp
    destring adjcp, replace
    
    gen dt2=subinstr(dt,"_","",.)
    gen date = date(dt2, "DMY",2050)
    format %td date
    drop dt dt2
    order cmiecode date
    drop if adjcp==.
    sort cmiecode date
    
    append using "filepath\prices_BSE.dta"
    So I was wondering if there was a way to write a loop to call on each file, run the code, append it to the master dataset, and then run again for the next successive year.

    Then I can merge the datasets at the end.

    Thanking you in advance for your help. If I can make any part of my question clearer I would be grateful for feedback. I did try searching through statalist for existing posts on this, but couldn't find something that exactly seemed to answer my question - if there is something I have missed, I would be grateful for the reference.

  • #2
    I'm not sure I understand your request. But if this isn't what you want, it should point you in the right direction. You can simply loop over the code you've written. I assume the years you are dealing with start at 13-14 and end with 20-21 in the following code. (Modify accordingly if that's not right.)

    Code:
    clear
    save "filepath/prices_BSE.dta", replace
    
    forvalues y = 13/20 {
        local yy = `y'+1
        clear all
        set more off
        use "filepath\prices `y'-`yy'.dta", clear
    
        sort cmiecode
        reshape long AdjCP_, i(cmiecode) j(dt) string
        rename AdjCP_ adjcp
        destring adjcp, replace
    
        gen dt2=subinstr(dt,"_","",.)
        gen date = date(dt2, "DMY",2050)
        format %td date
        drop dt dt2
        order cmiecode date
        drop if adjcp==.
        sort cmiecode date
        
        gen byte source = `y'
        
        append using "filepath\prices_BSE.dta"
        save "filepath\prices_BSE.dta", replace
    }
    
    sort `y', stable
    save "filepath\prices_BSE.dta", replace

    Comment


    • #3
      Thank you so much Clyde! With some modification it was exactly what I needed. I am including the modified code I ran below:

      Code:
      clear all
      set more off
      use "filepath\prices_BSE.dta"
      clear all
      save "filepath\prices_BSE.dta", replace emptyok
      
      **extra lines added above in case I have to re-run the code after creating the file already (and if not created, should return an error to alert)
      
      forvalues y = 13/20 {
          local yy = `y'+1
          clear all
          set more off
          use "filepath\prices `y'-`yy'.dta", clear
      
          sort cmiecode
          reshape long AdjCP_, i(cmiecode) j(dt) string
          rename AdjCP_ adjcp
          destring adjcp, replace
      
          gen dt2=subinstr(dt,"_","",.)
          gen date = date(dt2, "DMY",2050)
          format %td date
          drop dt dt2
          order cmiecode date
          drop if adjcp==.
          sort cmiecode date
          
          gen byte source = `y'
          
          append using "filepath\prices_BSE.dta"
          save "filepath\prices_BSE.dta", replace
          
          }
      
      sort cmiecode date
      save "filepath\prices_BSE.dta", replace
      The line
      Code:
      sort `y', stable
      was not running for some reason, so I replaced it with the variable names.

      Thank you once again! This cleared paragraphs of repetitive code.

      Comment


      • #4
        Yes, sorry about -sort `y', stable-. The local macro y is no longer defined outside the -forvalues- loop. What I meant to say is -sort source, stable-. But I see you have a preferred sort order of cmiecode date--so that's fine and just forgetting about my -sort, stable- command is fine.

        Comment


        • #5
          Understood. Many thanks once again!

          Comment

          Working...
          X