Hi Statalists,
I cannot post a sample of the data here, because the run time of rangestat and asrol depends on the actual sample size.
I have two datasets from which I need to calculate two variables: the Roll (1984) liquidity measure and the 4-year rolling-window standard deviation of the residuals from a cross-sectional regression.
1) For Roll liquidity:
Roll liquidity = 2*sqrt(- covariance (price_change_t, price_change_t-1))
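Written out in full, with \Delta p_t = p_t - p_{t-1} denoting the daily price change (my own notation, not from the paper), the measure I am trying to compute is:

```latex
\text{Roll}_t = 2\sqrt{-\operatorname{Cov}\left(\Delta p_t,\ \Delta p_{t-1}\right)}
```

The square root is only defined when the covariance is negative, so any window with a positive covariance yields a missing value.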
I have an unbalanced panel of daily stock prices (i.e. id, date, price). I use rangestat to calculate the covariance over a 21-day window. The dataset is fairly large, with nearly 5,300,000 rows (about 800 firms over 14 years). The command has been running for a very long time, and I cannot tell when it will complete.
Code:
encode id, gen(firm)
sort firm date
format %td date
by firm: gen obs_count = _n
xtset firm obs_count
bys firm: gen change_prc = prc - L.prc
bys firm: gen lag_change_prc = L.change_prc
drop if year(date) < 2004
drop if year(date) > 2017
ssc install rangestat
rangestat (cov) lag_change_prc change_prc, by(firm) interval(obs_count -20 0)
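To get a rough idea of how long the full run should take, the same rangestat call could be timed on a small panel and scaled up. The following is only a sketch on synthetic data: the panel size (20 firms, 500 days), the seed, and the simulated prices are made up for illustration, and the linear extrapolation to 5,300,000 rows is an assumption rather than a measured result. It requires rangestat to be installed from SSC.

```stata
* Sketch: time the rolling covariance on a small synthetic panel.
* Requires rangestat (ssc install rangestat). All sizes are illustrative.
clear
set seed 2024
set obs 20                          // 20 hypothetical firms
gen firm = _n
expand 500                          // 500 trading days per firm
bysort firm: gen obs_count = _n
xtset firm obs_count
gen prc = 50 + rnormal()            // simulated prices, not real data
gen change_prc = prc - L.prc
gen lag_change_prc = L.change_prc

timer clear 1
timer on 1
rangestat (cov) lag_change_prc change_prc, by(firm) interval(obs_count -20 0)
timer off 1
timer list 1                        // elapsed seconds are returned in r(t1)

* Rough linear extrapolation to the full sample (an assumption)
display "Estimated seconds for 5,300,000 rows: " r(t1) * 5300000 / _N
```

If the estimate from a test like this is a matter of minutes while the real run takes hours, that would suggest the problem lies in my data or setup rather than in rangestat itself.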
2) For the 4-year rolling-window standard deviation of the residuals of a cross-sectional regression, on an unbalanced panel with over 35,000 firm-year observations from 2005 to 2017:
First, I run the cross-sectional regression like this: reg accruals cf_1lag cf cf_1lead rev ppe
I use runby, as suggested in some earlier posts here, to get the residuals. Then I want to calculate the standard deviation of the residuals over a rolling 4-year window.
Again, asrol takes a very long time to produce the results.
Code:
ssc install runby
capture program drop one_regression
program define one_regression
    if _N > 10 {
        capture noisily reg accruals cf_1lag cf cf_1lead rev ppe, noconstant
        if c(rc) == 0 { // REGRESSION WENT OK
            predict r, residuals
        }
        else if inlist(c(rc), 2000, 2001) { // NO OR INSUFFICIENT OBSERVATIONS
            gen r = .
        }
        else { // THERE WAS AN UNEXPECTED PROBLEM
            gen comment = "Unexpected error `c(rc)'"
        }
    }
    exit
end
runby one_regression, by(year industry) status
replace r = 0 if missing(r)
rename r residuals
* use asrol to obtain the standard deviation of the residuals rolling 4 years
sort firm year
bys firm: gen t = _n
tsset firm t
asrol residuals, w(year 4) s(sd) g(sd)
I apologize for the long post; I have combined the two problems because both are about how to speed up the computation.
Can anyone tell me whether I have done something wrong in the code? And how can I check when the commands will finish?
I really appreciate your help.
Kind regards,
Ken