
  • #31
    Code:
    local todrop
    empties (or equivalently deletes) the local macro todrop
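    A toy illustration of why this matters when several blocks run in sequence. The names DT__0 to DT__2 are only for illustration. A loop that appends to todrop keeps adding to whatever the macro already holds, and a bare local todrop resets it.

    ```stata
    * Hypothetical illustration of how a name list like todrop behaves
    local todrop
    forvalues j = 0/2 {
        local todrop `todrop' DT__`j'
    }
    local n1 : word count `todrop'
    display "after the first block: `n1' names"

    * without emptying the macro, a second block appends to the old list
    forvalues j = 0/2 {
        local todrop `todrop' DT__`j'
    }
    local n2 : word count `todrop'
    display "after the second block: `n2' names"

    * a bare -local todrop- empties (deletes) the macro again
    local todrop
    local n3 : word count `todrop'
    display "after emptying: `n3' names"
    ```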



    • #32
      Have you used compress on your data? See https://www.stata.com/help.cgi?compress

      This mainly reduces the size of the dataset on disk, but it also reduces the amount of memory Stata needs to hold the data. Stata's import tools usually store the data in efficient types already, so depending on how your .dta file was created you might not gain much. Smaller storage types may also be faster to process, but I do not know enough about Stata's low-level implementation to say how much difference that makes.

      You do not want to run compress every time you run your do-file, as compress can be slow, but you can run the code below once in a separate do-file and never need to do it again (unless the original data are updated). At the end, compress reports how much space it saved. You could also save the result under a different name, but you might not want two copies of such a large dataset. compress never loses any information, so this should always be safe:

      Code:
      use TF_Short_10Y_5BP_US_end.dta, clear
      compress
      save TF_Short_10Y_5BP_US_end.dta, replace
      You can also run compress after you have generated the new variables. When you generate a variable without specifying a storage type, Stata uses the default type (float, unless changed with set type), which is often larger than needed: a variable that holds only small integers, for example, fits in a byte. If your code later stores values that no longer fit the compressed type, replace promotes the variable for you automatically. If that happens often, however, the repeated promotions may waste some run time.
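      A small sketch on toy data (not the thread's files) of the points above: generate without a type gives the default float, compress shrinks an integer-valued variable to byte, and a later replace promotes it again when a value no longer fits. The locals t0, t1 and t2 exist only to show the storage types.

      ```stata
      * Toy data, not the thread's dataset
      clear
      set obs 50
      * no storage type given, so -generate- uses the default type (float unless -set type- was changed)
      generate tday = _n - 1
      local t0 : type tday
      * tday holds only the integers 0-49, so -compress- can store it as a byte
      compress tday
      local t1 : type tday
      display "storage type before: `t0', after compress: `t1'"
      * 1000 does not fit in a byte, so -replace- promotes tday (here to int)
      replace tday = 1000 in 1
      local t2 : type tday
      display "storage type after replace: `t2'"
      ```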

      Finally, compress the data set before saving it. This is always a good practice before saving big data sets.

      I can't guarantee that this will speed things up much, but you should at least gain some disk space.
      Last edited by Kristoffer Bjarkefur; 29 May 2018, 02:44.



      • #33
        With the additional guidance from #30, I created a demonstration dataset with 2 trades using the following code:
        Code:
        clear all
        set seed 431
        set maxvar 10000
        
        input float(Tradenumb start_date duration)
         44 14444  444
        187 14714 1621
        end
        format %tdDayDDmonCCYY start_date
        
        expand duration
        bysort Tradenumb : gen Date = start_date + _n - 1
        format %tdDayDDmonCCYY Date
        drop if inlist(dow(Date), 0, 6)  // Sunday and Saturday    
        drop start_date duration
        
        isid Tradenumb Date, sort
        
        foreach v in Treasury Swap LIBOR LIBOR_discount Repo {
            gen `v' = runiform()
        }
        
        forvalues i = 0/3600 {
            gen DT__`i' = 1 / (1 + runiform(.055,.065)/360)^`i'
            gen DS__`i' = 1 / (1 + runiform(.065,.075)/360)^`i'
        }
        save "test_data.dta", replace
        In principle, all that is needed to compute present values are the amounts to be discounted and the discount factors. The latter seem to have been pre-computed, so I'll use these DT_* and DS_* variables, but all this would be a piece of cake if the formula used to create these discount factors were known. I ran the code from #1 with the following modifications (the offset used in #1 for the 11th coupon is 1960 when it should be 11 * 180 = 1980):
        Code:
        replace Treasury_coupon_11_date = trade_start_date + 1980 if Tradenumb == `i'
        replace Swap_coupon_11_date = trade_start_date + 1980 if Tradenumb == `i'
        and saved the results in "slow_code.dta".
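        If the rates behind the discount factors were known, each factor could be computed in one line instead of being looked up among 3,601 variables. The sketch below assumes, as in the simulated data above, that the factor for a horizon of h days is 1 / (1 + rate/360)^h; the variable rate and the toy values are hypothetical, and the real formula behind DT__* and DS__* may differ.

        ```stata
        * Hypothetical: assumes the rate is known and DF(h) = 1 / (1 + rate/360)^h,
        * as in the simulated DT__* variables; the real formula may differ
        clear
        input float(Tradenumb tday rate)
        1   0 .06
        1 179 .06
        1 180 .06
        1 200 .06
        end
        * days remaining until the first half-year coupon (180 days after trade start)
        generate horizon = 180 - tday
        * discount directly; keep 1 when the coupon date is today or already past,
        * which mirrors the default of 1 in the loop-based code
        generate double disc_1 = cond(horizon > 0, 1 / (1 + rate/360)^horizon, 1)
        list, noobs
        ```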

        Here's how I would replicate these results using much simpler and faster code. Again, this would go much faster if I knew how to compute the desired discount factors directly.

        Code:
        use "test_data.dta", clear
        
        * verify assumptions about the data
        isid Tradenumb Date, sort
        
        * the day of each observation relative to the first trade date
        by Tradenumb: gen tday = Date - Date[1]
        
        by Tradenumb: gen LIBOR_coupon_date = Date[1] + 90 * (ceil(tday/90))
        by Tradenumb: gen REPO_reset_date = Date[_n-1]
        format %td LIBOR_coupon_date REPO_reset_date
        
        * copy relevant DT_* DS_* values at half-year coupon dates
        qui forvalues i = 1/20 {
            local periods = `i' * 180
            by Tradenumb: gen target_`i' = `periods' - tday
            gen Treasury_disc_`i' = 1
            gen Swap_disc_`i'     = 1
            forvalues j = 1/`periods' {
                replace Treasury_disc_`i' = DT__`j' if target_`i' == `j'
                replace Swap_disc_`i'     = DS__`j' if target_`i' == `j'
            }
        }
        
        drop DT_* DS_*
        
        save "faster.dta", replace
        
        ds target_* tday, not
         
        cf `r(varlist)' using slow_code.dta, all
        and the results:
        Code:
        . cf `r(varlist)' using slow_code.dta, all
               Tradenumb:  match
                    Date:  match
                Treasury:  match
                    Swap:  match
                   LIBOR:  match
          LIBOR_discount:  match
                    Repo:  match
        LIBOR_coupon_d~e:  2 mismatches
         REPO_reset_date:  match
         Treasury_disc_1:  match
             Swap_disc_1:  match
         Treasury_disc_2:  match
             Swap_disc_2:  match
         Treasury_disc_3:  match
             Swap_disc_3:  match
         Treasury_disc_4:  match
             Swap_disc_4:  match
         Treasury_disc_5:  match
             Swap_disc_5:  match
         Treasury_disc_6:  match
             Swap_disc_6:  match
         Treasury_disc_7:  match
             Swap_disc_7:  match
         Treasury_disc_8:  match
             Swap_disc_8:  match
         Treasury_disc_9:  match
             Swap_disc_9:  match
        Treasury_disc_10:  match
            Swap_disc_10:  match
        Treasury_disc_11:  match
            Swap_disc_11:  match
        Treasury_disc_12:  match
            Swap_disc_12:  match
        Treasury_disc_13:  match
            Swap_disc_13:  match
        Treasury_disc_14:  match
            Swap_disc_14:  match
        Treasury_disc_15:  match
            Swap_disc_15:  match
        Treasury_disc_16:  match
            Swap_disc_16:  match
        Treasury_disc_17:  match
            Swap_disc_17:  match
        Treasury_disc_18:  match
            Swap_disc_18:  match
        Treasury_disc_19:  match
            Swap_disc_19:  match
        Treasury_disc_20:  match
            Swap_disc_20:  match
        The difference in LIBOR_coupon_date is due to the irregular binning of the first date of each trade in the #1 code, which gives one mismatch per trade. Note that I'm simply replicating the results generated by the #1 code using better coding strategies, and I express no opinion as to whether any of this makes sense.
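        A toy reproduction of the two binnings, with hypothetical dates for two trades: the #1 rule replaces a bin of 0 with 90, so only the first day of each trade differs.

        ```stata
        * Toy data: two short trades (dates are illustrative only)
        clear
        input float(Tradenumb Date)
        1 14444
        1 14445
        1 14534
        1 14535
        2 14714
        2 14715
        end
        format %td Date
        bysort Tradenumb (Date): generate tday = Date - Date[1]
        * #33 binning: the first day of a trade falls in bin 0
        by Tradenumb: generate LIBOR_33 = Date[1] + 90 * ceil(tday/90)
        * #1 binning: day 0 is moved into the first 90-day bin
        generate z = 90 * ceil(tday/90)
        replace z = 90 if z == 0
        by Tradenumb: generate LIBOR_1 = Date[1] + z
        format %td LIBOR_33 LIBOR_1
        * only the first observation of each trade differs
        count if LIBOR_33 != LIBOR_1
        ```

        If matching #1 is wanted, writing 90 * max(1, ceil(tday/90)) in the faster code should give the same dates as the #1 rule.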



        • #34
          Dear Nick!

          Thanks for the information about how to discard local macros. As far as I can see, this was already included in your initial code, so I don't really know why the list kept growing...


          Thank you too, Kristoffer, for the helpful tip about compress. I used it on my data files, and although it reduced their size only slightly, I think it contributed to the overall result described below.


          Thank you also, Robert, for the very detailed last post. Unfortunately, I had already found a working solution a few hours earlier and was running it on some of the files so I could report back. I really appreciate all of your effort and will try to use parts of it to make my approach a bit faster.

          What I ultimately did is nearly identical to #14, except that I moved a single line, which had a surprisingly large effect. I did not figure out how to solve the problem that arises when several blocks of code are run in sequence from the same do-file. Therefore, I now start each block individually rather than as a sequence. This is more time-consuming and needs constant supervision, but it yields what I was aiming for.

          Code:
           // US 10Y 10BP SHORT
          clear all
          quietly {
          use TF_Short_10Y_10BP_US_end.dta
          
          forval j = 1/19 {
              gen Treasury_coupon_`j' = Treasury 
          }
          gen Treasury_coupon_20 = Treasury + 100
          
          forval j = 1/19 {
              gen Swap_coupon_`j' = Swap
          }
          gen Swap_coupon_20 = Swap + 100
          
          gen LIBOR_coupon = LIBOR + 100
          
          forval j = 1/20 {
              gen Treasury_disc_`j' = 1
              gen Swap_disc_`j' = 1
              gen Treasury_coupon_`j'_date =.
              gen Swap_coupon_`j'_date =.
          }
          
          gen LIBOR_coupon_date = .
          
          gen REPO_reset_date =. 
          
          quietly bysort Tradenumb (Date) : gen trade_start_date = Date[1]
          quietly bysort Tradenumb (Date) : gen trade_end_date = Date[_N]
          
          forval j = 1/20 {
              local J = 180 * `j'
              replace Treasury_coupon_`j'_date = trade_start_date + `J'
              replace Swap_coupon_`j'_date = trade_start_date + `J'
          }
          
          gen y = Date - trade_start_date
          gen z= 90*ceil(y / 90)
          replace z = 90 if z == 0
          replace LIBOR_coupon_date = trade_start_date + z
          drop y z 
          replace REPO_reset_date = Date[_n-1] if Date <= trade_end_date
          replace REPO_reset_date =. if Date == trade_start_date
          
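          set tracedepth 1
          set trace on
          * empty the list first, so names left over from an earlier block are not carried into this one
          local todrop
          * for each horizon j (in days), copy DT__j/DS__j into every coupon whose date is j days ahead
          forvalues j= 0/3600 {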
              forval k = 1/20 {
                  replace Treasury_disc_`k' = DT__`j' if Treasury_coupon_`k'_date - Date == `j'
                  replace Swap_disc_`k' = DS__`j' if Swap_coupon_`k'_date - Date == `j'
              }
              local todrop `todrop' DT__`j' DS__`j'
          }
          
          drop `todrop'
          }
          save TF_Short_10Y_10BP_US_last.dta, replace
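          If the blocks are eventually to run in sequence from one do-file again, one option, assuming leftover local macros are what interferes, is to wrap each block in a program: locals defined inside a program are private to each call, so todrop starts empty every time. The program name collect_names and its toy loop are hypothetical.

          ```stata
          * Hypothetical sketch: wrap a block in a program so its locals start empty on every call
          capture program drop collect_names
          program define collect_names, rclass
              * todrop is private to this call of the program
              forvalues j = 0/2 {
                  local todrop `todrop' DT__`j'
              }
              local n : word count `todrop'
              return scalar n = `n'
          end

          * calling it twice does not accumulate names across calls
          collect_names
          display "first call: " r(n) " names"
          collect_names
          display "second call: " r(n) " names"
          ```

          In practice the program body would hold one block from the code above, with the input and output file names passed in, for example via args infile outfile.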
          Thank you once again for all the help during the last week and for all the helpful tips and solutions you proposed. I definitely couldn't have done it without all of you. Thanks so much!

