Runby corr, error: store_data(): 3900 unable to allocate real <tmp>

Wen-Hung Hsu

Join Date: Mar 2021

Posts: 37
#1


23 Oct 2022, 05:03

Hello,

I am working on a project where I need to run a correlation program with -runby- and return the correlations as generated variables.

Code:

program my_corr
    pwcorr resid_rm resid_x resid_r
    mat c = r(C)
    gen rm_x_corr = c[2,1]
    gen rm_r_corr = c[3,1]
    gen x_r_corr = c[3,2]
    gen rm_x_t = rm_x_corr*(count_permno-2)^0.5/ (1-rm_r_corr^2)^0.5
    gen rm_r_t = rm_r_corr*(count_permno-2)^0.5/ (1-rm_r_corr^2)^0.5
    gen x_r_t = x_r_corr*(count_permno-2)^0.5/ (1-x_r_corr^2)^0.5
end

runby my_corr, by(id) verbose

-runby- executes successfully when id is less than 10,000. Otherwise, running -runby- with my_corr fails with the error:

Code:

store_data():  3900  unable to allocate real <tmp>[2893885,1]
runby_main():     -  function returned error
<istmt>:     -  function returned error

After googling the error message, I believe the error is due to a memory problem. It seems that -runby- stores the results of the program for each by(id) group in one big matrix.
How can I solve this problem? Is it possible to return the values and then clear the stored results after each by(id) group?

The complete data cover 55 years, and -runby- already hits the error when I apply it to 2 years of data. I would therefore prefer not to split the data by year and run it 55 times.
Any comments and advice are welcome.

Last edited by Wen-Hung Hsu; 23 Oct 2022, 05:06.

William Lisowski

Join Date: Dec 2014
Posts: 10150

23 Oct 2022, 11:38

Perhaps the problem is that runby treats everything in memory as results, including the original variables.

If so, then perhaps this untested code will start you in a useful direction.

Code:

program my_corr
    pwcorr resid_rm resid_x resid_r
    mat c = r(C)
    gen rm_x_corr = c[2,1]
    gen rm_r_corr = c[3,1]
    gen x_r_corr = c[3,2]
    keep id rm_x_corr rm_r_corr x_r_corr
end

use mydata, clear
runby my_corr, by(id) verbose

merge 1:m id using mydata
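* t statistic for a correlation r: r*sqrt(n-2)/sqrt(1-r^2), with the same r in numerator and denominator
gen rm_x_t = rm_x_corr*(count_permno-2)^0.5/ (1-rm_x_corr^2)^0.5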
gen rm_r_t = rm_r_corr*(count_permno-2)^0.5/ (1-rm_r_corr^2)^0.5
gen x_r_t = x_r_corr*(count_permno-2)^0.5/ (1-x_r_corr^2)^0.5
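
To illustrate why the keep matters, here is a toy sketch on made-up data. Everything in it is hypothetical (the simulated variables x and y and the program names toy_all and toy_keep); only runby itself, the user-written command from SSC, is real. As far as I understand runby, whatever is left in memory when the program ends is what it accumulates, so the first run should hand back every input variable and the second only the variables that were kept.

```stata
* Requires runby from SSC: ssc install runby
clear all
set seed 12345
set obs 300
gen id = ceil(_n/100)          // three hypothetical groups of 100 rows
gen x = rnormal()
gen y = 0.5*x + rnormal()
tempfile toy
save "`toy'"

program define toy_all
    corr x y
    gen xy_corr = r(rho)       // constant within the by-group
end

program define toy_keep
    corr x y
    gen xy_corr = r(rho)
    keep id xy_corr            // drop the inputs before runby stores the results
end

runby toy_all, by(id)
describe, short                // id, x, y and xy_corr all come back

use "`toy'", clear
runby toy_keep, by(id)
describe, short                // only id and xy_corr come back
```

The second run still returns one row per original observation; it only trims the number of variables that runby has to hold.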

Last edited by William Lisowski; 23 Oct 2022, 11:41.


Wen-Hung Hsu

Join Date: Mar 2021
Posts: 37

24 Oct 2022, 00:55

Dear William,

It's a brilliant idea to store only the yearly data, since the values within each id are constant.
However, I have another program, my_var:

Code:

program my_var
    xtset id mydate
    var sprtrn sign_trading ret, lags(1/5)
    mat t = r(table)
    predict resid_rm, residuals equation(sprtrn)
    predict resid_x, residuals equation(sign_trading)
    predict resid_r, residuals equation(ret)

    * Row 1 of r(table) holds the coefficients, row 3 the test statistics.
    * Each equation has 15 lag coefficients followed by a constant,
    * so count skips one column between equations.
    local count = 0
    foreach k in rm x r {
        forvalues m = 1/5 {
            gen rm_`k'_l`m'_coef = t[1,`= `m' + `count'']
            gen rm_`k'_l`m'_t = t[3,`= `m' + `count'']
        }
        local count = `count' + 5
    }
    local count = `count' + 1
    foreach k in rm x r {
        forvalues m = 1/5 {
            gen x_`k'_l`m'_coef = t[1,`= `m' + `count'']
            gen x_`k'_l`m'_t = t[3,`= `m' + `count'']
        }
        local count = `count' + 5
    }
    local count = `count' + 1
    foreach k in rm x r {
        forvalues m = 1/5 {
            gen r_`k'_l`m'_coef = t[1,`= `m' + `count'']
            gen r_`k'_l`m'_t = t[3,`= `m' + `count'']
        }
        local count = `count' + 5
    }
end

my_var estimates a VAR model and returns the predicted values for each window in each id, so the values vary within id.
my_var works fine on a sub-dataset with around 10 million rows, but the complete dataset has around 80 million rows. Since the solution for my_corr does not work here, how can I handle this situation?


William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

24 Oct 2022, 08:44

The code in post #2 made two points.

The important point is to keep only those variables that were created in the program, otherwise runby will try to store copies of the input variables as well. After runby has processed all your id's, you then merge this small set of created variables back to your dataset.

The less important point was that variables that do not need to be calculated within your program can be calculated outside the program, reducing runby space requirements.

There is a third point I overlooked. Since the created variables are constant across all observations in the by-group, you should keep just a single observation. Indeed, for the merge I showed to work, the variable id needs to uniquely identify observations in the dataset of results that runby leaves in memory. And by keeping just a single observation, you will radically reduce the runby space requirements.

So with that said, you need to add something like the following just before the end of the my_var program.

Code:

keep in 1
keep id rm_* x_* r_*

and then merge the results back to the original data.
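
Put together, the whole pattern might look like the untested toy sketch below. The data are simulated and the names (a, b, time, toy_var, vtab) are hypothetical stand-ins for your variables and program; runby must be installed from SSC. It keeps one row per id with the coefficient variables and then merges them back onto the full data. Note that observation-level results such as the residuals from predict are dropped by this pattern, so it only covers the values that are constant within id.

```stata
* Requires runby from SSC: ssc install runby
clear all
set seed 2022
set obs 400
gen id = ceil(_n/200)              // two hypothetical panels of 200 rows
bysort id: gen time = _n
gen a = rnormal()
gen b = rnormal()
tempfile full
save "`full'"

program define toy_var
    tsset time
    var a b, lags(1)
    matrix vtab = r(table)
    gen a_a_l1_coef = vtab[1,1]    // coefficient on L.a in the equation for a
    gen a_a_l1_t    = vtab[3,1]    // its test statistic
    keep in 1                      // results are constant within id
    keep id a_a_l1_coef a_a_l1_t
end

runby toy_var, by(id)
isid id                            // one row per id, so merge 1:m is valid
merge 1:m id using "`full'", nogenerate
```

After the merge, every original observation carries the coefficient and test statistic estimated for its id.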