
  • Runby corr, error: store_data(): 3900 unable to allocate real <tmp>

    Hello,

    I am working on a project where I need to use -runby- to run the my_corr program below and store the correlations in generated variables.
    Code:
    program my_corr
        pwcorr resid_rm resid_x resid_r
        mat c = r(C)
        gen rm_x_corr = c[2,1]
        gen rm_r_corr = c[3,1]
        gen x_r_corr = c[3,2]
        gen rm_x_t = rm_x_corr*(count_permno-2)^0.5/ (1-rm_x_corr^2)^0.5
        gen rm_r_t = rm_r_corr*(count_permno-2)^0.5/ (1-rm_r_corr^2)^0.5
        gen x_r_t = x_r_corr*(count_permno-2)^0.5/ (1-x_r_corr^2)^0.5
    end
    runby my_corr, by(id) verbose
    -runby- executes successfully when id is less than 10,000, but when I run it with my_corr on more data, it fails with the error:
    Code:
     store_data():  3900  unable to allocate real <tmp>[2893885,1]
                runby_main():     -  function returned error
                     <istmt>:     -  function returned error
    After searching for the error message online, I believe it is caused by a memory problem: it seems that -runby- records the results of the program for every by(id) group in one big matrix.
    How can I solve this problem? Is it possible to return the values in that big matrix and then clear it for each by(id) group?

    The complete data cover 55 years, and -runby- already runs into the error when I apply it to 2 years of data. I would therefore prefer not to split the data by year and run the program 55 times.
    Any comments and advice are welcome.
    Last edited by Wen-Hung Hsu; 23 Oct 2022, 05:06.
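
    A quick way to test the memory hypothesis above is to measure the data before calling runby. This sketch uses synthetic data (the id and resid_* names mirror this post, but the values and sizes are made up) and reads r(N) and r(width) from describe. If runby does keep a copy of everything left in memory, their product is a rough guide to what it has to store; Stata's memory command gives a fuller report.

    ```stata
    * Synthetic data standing in for the real dataset (hypothetical sizes)
    clear
    set seed 2022
    set obs 1000
    * 10 hypothetical ids with 100 observations each
    gen long id = ceil(_n/100)
    gen float resid_rm = rnormal()
    gen float resid_x  = rnormal()
    gen float resid_r  = rnormal()

    * describe stores r(N) (observations) and r(width) (bytes per observation)
    quietly describe, short
    scalar bytes_per_obs = r(width)
    scalar bytes_total   = r(N)*r(width)
    display "observations: " r(N) "   bytes/obs: " scalar(bytes_per_obs) "   total bytes: " scalar(bytes_total)
    ```

    Running the same two describe lines on the real data, before and after dropping unneeded variables, shows how much a keep statement saves.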

  • #2
    Perhaps the problem is that runby treats everything in memory as results, including the original variables.

    If so, then perhaps this untested code will start you in a useful direction.
    Code:
    program my_corr
        pwcorr resid_rm resid_x resid_r
        mat c = r(C)
        gen rm_x_corr = c[2,1]
        gen rm_r_corr = c[3,1]
        gen x_r_corr = c[3,2]
        keep id rm_x_corr rm_r_corr x_r_corr
    end
    
    use mydata, clear
    runby my_corr, by(id) verbose
    
    merge 1:m id using mydata
    gen rm_x_t = rm_x_corr*(count_permno-2)^0.5/ (1-rm_x_corr^2)^0.5
    gen rm_r_t = rm_r_corr*(count_permno-2)^0.5/ (1-rm_r_corr^2)^0.5
    gen x_r_t = x_r_corr*(count_permno-2)^0.5/ (1-x_r_corr^2)^0.5
    Last edited by William Lisowski; 23 Oct 2022, 11:41.
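
    The three t lines compute the usual t-statistic for testing whether a correlation is zero. Writing r for a sample correlation and n for the number of observations behind it (assumed here to be what count_permno holds for each id), each statistic uses its own r in both the numerator and the denominator:

    ```latex
    t = \frac{r\,\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad \text{with } n-2 \text{ degrees of freedom}
    ```

    For example, r = 0.5 and n = 27 give t = 0.5 × 5 / √0.75 ≈ 2.89. Since pwcorr uses pairwise-complete observations, n can differ across pairs if some residuals are missing.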



    • #3
      Dear William,

      It's a brilliant way to store only the yearly results, since the values are constant within each id.
      However, I have another program, my_var:
      Code:
      program my_var
          xtset id mydate
          var sprtrn sign_trading ret, lags(1/5)
          mat t = r(table)
          predict resid_rm, residuals equation(sprtrn)
          predict resid_x, residuals equation(sign_trading)
          predict resid_r, residuals equation(ret)
          
          local count = 0
          foreach k in rm x r {
              forvalues m = 1/5 {        
                  gen rm_`k'_l`m'_coef = t[1,`= `m' + `count'']
                  gen rm_`k'_l`m'_t = t[3,`= `m' + `count'']            
              }
              local count = `count' + 5
          }
          local count = `count'+1
          foreach k in rm x r {
              forvalues m = 1/5 {        
                  gen x_`k'_l`m'_coef = t[1,`= `m' + `count'']
                  gen x_`k'_l`m'_t = t[3,`= `m' + `count'']                    
              }
              local count = `count' + 5
          }
          local count = `count'+1
          foreach k in rm x r {
              forvalues m = 1/5 {        
                  gen r_`k'_l`m'_coef = t[1,`= `m' + `count'']
                  gen r_`k'_l`m'_t = t[3,`= `m' + `count'']          
              }
              local count = `count' + 5
          }
      end
      my_var estimates a VAR model and returns the predicted residuals for each window in each id, so these values vary within id.
      my_var works fine on a sub-dataset with around 10 million rows, but the complete dataset has around 80 million rows. Since the solution for my_corr does not work here, how can I handle this situation?
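
      A rough count shows the scale involved. Each of the three foreach blocks in my_var creates 3 × 5 × 2 = 30 variables, 90 in all (plus the three residuals). Assuming Stata's default float storage type (4 bytes) and roughly 80 million observations, those 90 variables alone need

      ```latex
      90 \times 4\ \text{bytes} \times 80{,}000{,}000 = 2.88 \times 10^{10}\ \text{bytes} \approx 28.8\ \text{GB}
      ```

      The error message shows runby allocating Mata real vectors, which are 8-byte doubles, so its internal copy may need about twice that. Because the coefficients and t-statistics are constant within each runby group, keeping one observation per id shrinks this in proportion to the number of ids: with a hypothetical 100,000 ids it would be 90 × 4 × 100,000 = 36,000,000 bytes, about 36 MB.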



      • #4
        The code in post #2 made two points.

        The important point is to keep only those variables that were created in the program; otherwise runby will try to store copies of the input variables as well. After runby has processed all your ids, you then merge this small set of created variables back into your dataset.

        The less important point was that variables that do not need to be calculated within your program can be calculated outside the program, reducing runby space requirements.

        There is a third point I overlooked. Since the created variables are constant across all observations in the by-group, you should keep just a single observation. Indeed, for the merge I showed to work, the variable id needs to uniquely identify observations in the dataset that runby creates. And by keeping just a single observation, you will radically reduce the runby space requirements.

        So with that said, you need to add something like the following just before the end of the my_var program.
        Code:
        keep in 1
        keep id rm_* x_* r_*
        and then merge the results back to the original data.
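
        A self-contained sketch of the keep-one-observation-then-merge pattern described above. So that it runs without the community-contributed runby (available from SSC), it uses egen as a hypothetical stand-in for the per-id program, and the data and the names y and y_mean are made up. Inside a real runby program each call sees a single id, so keep in 1 there plays the role of by id: keep if _n == 1 here.

        ```stata
        * Synthetic stand-in for mydata: 3 ids with 4 observations each
        clear
        set seed 12345
        set obs 12
        gen long id = ceil(_n/4)
        gen float y = rnormal()
        tempfile mydata
        save `mydata'

        * Per-id result that is constant within id (stand-in for my_var output)
        bysort id: egen y_mean = mean(y)

        * Keep only the created variables and one observation per id
        by id: keep if _n == 1
        keep id y_mean
        * merge 1:m requires id to be unique in the master data
        isid id

        * Attach the per-id results to every original observation
        merge 1:m id using `mydata'
        ```

        In the real workflow, the egen line would be replaced by runby my_var, by(id), with the keep statements at the end of my_var.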
