statsby vs rangerun

River Huang

Join Date: Mar 2016
Posts: 1908

#16

08 Aug 2017, 02:38

Dear Robert, Following up on the previous question, suppose that I run the following regression

Code:

set rmsg on
set more off

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id float yw double(ri rm) float year
 1 2861  -.04798   .01786 2015
 1 2862  .019231  .027828 2015
 1 2863  -.06311 -.001805 2015
 1 2864 -.032639 -.034765 2015
 1 2865 -.030151 -.037525 2015
 1 2866  .032568  .044766 2015
 1 2867  .002867  .016261 2015
 1 2868        0  .018274 2015
 1 2869 -1.03931 -.013548 2015
 1 2870  .115327  .039701 2015
 1 2871  .022015   .07656 2015
 1 2872 -.015666  .027218 2015
 1 2873  .051061  .056553 2015
 1 2874  .249211  .041905 2015
 1 2875  .034848  .051511 2015
 1 2876 -.046663  .031184 2015
 1 2877  .034696  .008503 2015
 1 2878 -.050898 -.044991 2015
 1 2879 -.025237  .034662 2015
 1 2880  .053074  .093365 2015
 1 2881  -1.0583 -.008247 2015
 1 2882  .063969   .10488 2015
 1 2883  .012883  .034483 2015
 1 2884  -.11387 -.136604 2015
 1 2885 -.058783 -.074094 2015
 1 2886 -.050835 -.140998 2015
 1 2887  .136955  .042479 2015
 1 2888 -.069987  .038188 2015
 1 2889 -.031838  .038027 2015
 1 2890 -.076233 -.105651 2015
 1 2891  .020227  .028003 2015
 1 2892  .002379  .067808 2015
 1 2893  -.09019 -.124472 2015
 1 2894 -.058261 -.093368 2015
 1 2895  .093259 -.040442 2015
 1 2896 -.074324  .022072 2015
 1 2897 -.013686 -.045713 2015
 1 2898 -.024052  .000718 2015
 1 2899 -.005687 -.009631 2015
 1 2900  .039085   .05271 2015
 1 2901  .030275  .077331 2015
 1 2902  .021371   .01406 2015
 1 2903  -.00959 -.007376 2015
 1 2904  .090669  .063103 2015
 1 2905 -.012107  .003349 2015
 1 2906  .025327  .022665 2015
 1 2907 -.065339  -.05515 2015
 1 2908  .033248  .029686 2015
 1 2909 -.023927 -.024785 2015
 1 2910  .033812  .049859 2015
 1 2911  .014718  .015578 2015
 1 2913  -.07256  -.11381 2016
 1 2914 -.059353 -.096372 2016
 1 2915 -.005736  .008176 2016
 1 2916 -.038462 -.067808 2016
 1 2917    -.008  .015627 2016
 1 2919  .012097  .043275 2016
 1 2920   -.0249 -.040242 2016
 1 2921  .062308  .032311 2016
 1 2922 -.023077 -.023095 2016
 1 2923  .037402  .061767 2016
 1 2924  .004744  .013456 2016
 1 2925   .00661  .010947 2016
 1 2926 -.008443 -.004801 2016
 1 2927  .029328  .033933 2016
 1 2928 -.030331 -.043164 2016
 1 2929  .001896 -.004306 2016
 1 2930  -.00473 -.007785 2016
 1 2931  -.01616 -.034009 2016
 1 2932 -.004831  .001119 2016
 1 2933 -.002913  .000574 2016
 1 2934  .022395  .047394 2016
 1 2936 -.004857 -.011127 2016
 1 2937 -.001166 -.007345 2016
 1 2938  .016336  .030832 2016
 1 2939  .003444  .024083 2016
 1 2940  .028604   .02332 2016
 1 2941 -.005562 -.011295 2016
 1 2942  .029083 -.016186 2016
 1 2943 -.017391  .000192 2016
 1 2944  .050885  .024757 2016
 1 2945  .001053  .024473 2016
 1 2946 -.006309 -.013214 2016
 1 2947        0  -.00175 2016
 1 2948 -.007407  .007795 2016
 1 2950  .009934   .01416 2016
 1 2951 -.008743 -.007065 2016
 1 2953  .002205  .021257 2016
 1 2954    .0044  .007937 2016
 1 2955  .004381  .002376 2016
 1 2956 -.006543  .005227 2016
 1 2957  .007684   .02185 2016
 1 2958        0  .000973 2016
 1 2959   .04793  .020443 2016
 1 2960 -.007277 -.007483 2016
 1 2961  .010471 -.002223 2016
 1 2962 -.041451 -.034551 2016
 1 2963 -.018378 -.004889 2016
18 2861  -.11284   .01786 2015
18 2862  .142544  .027828 2015
18 2863   .03263 -.001805 2015
18 2864  .019703 -.034765 2015
18 2865 -.088225 -.037525 2015
18 2866 -.032387  .044766 2015
18 2867  .029339  .016261 2015
18 2868 -.018065  .018274 2015
18 2869  .066639 -.013548 2015
18 2870  .038712  .039701 2015
18 2871 -.050554   .07656 2015
18 2872 -.038865  .027218 2015
18 2873   .11201  .056553 2015
18 2874  .002909  .041905 2015
18 2875  .171139  .051511 2015
18 2876  .018576  .031184 2015
18 2877 -.068085  .008503 2015
18 2878  .079256 -.044991 2015
18 2879  .085524  .034662 2015
18 2880  .238864  .093365 2015
18 2881  .123596 -.008247 2015
18 2882    .0658   .10488 2015
18 2883 -.058735  .034483 2015
18 2884 -.132775 -.136604 2015
18 2885 -.226667 -.074094 2015
18 2886 -.206302 -.140998 2015
18 2887 -.117603  .042479 2015
18 2888  .097199  .038188 2015
18 2889  .307544  .038027 2015
18 2890 -.227515 -.105651 2015
18 2891  .221754  .028003 2015
18 2892  .009404  .067808 2015
18 2893 -.147516 -.124472 2015
18 2894 -.136612 -.093368 2015
18 2895 -.198312 -.040442 2015
18 2896  .357895  .022072 2015
18 2897  -.10155 -.045713 2015
18 2898  .091458  .000718 2015
18 2899 -.011858 -.009631 2015
18 2900    .1004   .05271 2015
18 2901  .123228  .077331 2015
18 2902 -.017476   .01406 2015
18 2903   .08498 -.007376 2015
18 2904  -.03643  .063103 2015
18 2905  .045999  .003349 2015
18 2906  .144277  .022665 2015
18 2907 -.138194  -.05515 2015
18 2908   .00336  .029686 2015
18 2909  .029833 -.024785 2015
18 2910  .155779  .049859 2015
18 2911   .19821  .015578 2015
18 2913 -.126423  -.11381 2016
18 2914 -.184737 -.096372 2016
18 2915  .131564  .008176 2016
18 2916 -.054981 -.067808 2016
18 2917  .094209  .015627 2016
18 2919  .273171  .043275 2016
18 2920 -.213027 -.040242 2016
18 2921 -.123661  .032311 2016
18 2922  .058611 -.023095 2016
18 2923  .121228  .061767 2016
18 2924  .098526  .013456 2016
18 2925  .048147  .010947 2016
18 2926  .092073 -.004801 2016
18 2927 -.051182  .033933 2016
18 2928 -.098666 -.043164 2016
18 2929 -.028292 -.004306 2016
18 2930   .06047 -.007785 2016
18 2931 -.077508 -.034009 2016
18 2932 -.051282  .001119 2016
18 2933 -.004826  .000574 2016
18 2934  .132396  .047394 2016
18 2936 -.099297 -.011127 2016
18 2937  .187317 -.007345 2016
18 2940 -.032046   .02332 2016
18 2941 -.017827 -.011295 2016
18 2942 -.069144 -.016186 2016
18 2943  .012999  .000192 2016
18 2944 -.020165  .024757 2016
18 2945  .043031  .024473 2016
18 2946 -.000897 -.013214 2016
18 2947 -.030521  -.00175 2016
18 2948  .039815  .007795 2016
18 2950  -.01107   .01416 2016
18 2951 -.071828 -.007065 2016
18 2953  .003015  .021257 2016
18 2954 -.044088  .007937 2016
18 2955 -.012579  .002376 2016
18 2956 -.006369  .005227 2016
18 2957  .073718   .02185 2016
18 2958  .059701  .000973 2016
18 2959  -0.8328  .020443 2016
18 2960  .007767 -.007483 2016
18 2961        0 -.002223 2016
18 2962  .040462 -.034551 2016
18 2963  .090741 -.004889 2016
end
format %tw yw

bys id (year): gen t = _n
xtset id t

gen F1rm = F1.rm
gen F2rm = F2.rm
gen L1rm = L1.rm
gen L2rm = L2.rm

// -rangestat-
rangestat (reg) ri F2rm F1rm rm L1rm L2rm, interval(year 0 0) by(id)
gen e = ri-(F2rm*b_F2rm + F1rm*b_F1rm + rm*b_rm + L1rm*b_L1rm + L2rm*b_L2rm + b_cons)

// gen firm-specific returns
gen W = ln(1+e)

and generate the so-called firm-specific (weekly) returns W, which is defined as the logarithm of 1 plus the residuals from the regression.
Now, for each `id' and each `year', I first calculate the normalized weekly returns as W_norm

Code:

bys id year: egen W_mean = mean(W)
bys id year: egen W_sd = sd(W)
gen W_norm = (W-W_mean)/W_sd
gen crash_week = (W_norm < -3.2) if !missing(W_norm) 
bys id year: egen crash_freq = total(crash_week)
gen crash = (crash_freq > 0) if !missing(crash_freq)

Then, I want to count the weeks (crash_week) in which the normalized return W_norm is less than -3.2, for each id and year. Does it pay to use the -rangerun- command in this situation, and if so, how?

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Comment

Nick Cox

Join Date: Mar 2014
Posts: 36053

#17

08 Aug 2017, 05:39

I have two quite different comments on this.

First, it seems that your weeks do not map on to Stata weeks, which is common: in your case the lack of observations for 2016w1 is tell-tale. In general, it is in my view better to define weeks by the daily dates that start or end them and use delta(7) in any tsset or xtset declaration.
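As a minimal sketch of that setup, assume a hypothetical daily date variable wdate holding the day on which each week ends. It is not in the posted data; it is made up here for one toy panel just to show the declaration.

```stata
* Toy panel: weeks identified by the daily date on which they end.
* wdate and x are hypothetical names, not variables from the example data.
clear
set obs 10
gen long id = 1
gen wdate = mdy(1, 2, 2015) + 7 * (_n - 1)   // consecutive Fridays
format wdate %td
gen double x = _n

* delta(7) tells Stata that consecutive periods are 7 days apart,
* so L1. and F1. refer to the previous and the next week
xtset id wdate, delta(7)
gen double L1x = L1.x
list wdate x L1x
```

With this declaration a week that is absent from the data leaves a gap, so leads and lags across it are returned as missing rather than silently taken from the next available observation.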

For much more discussion, see these in chronological order until sated (search week, sj brings up clickable links).

Code:

Search of official help files, FAQs, Examples, SJs, and STBs

SJ-12-4 dm0065_1  . . . . . Stata tip 111: More on working with weeks, erratum
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/12   SJ 12(4):765                                     (no commands)
        lists previously omitted key reference

SJ-12-3 dm0065  . . . . . . . . . .  Stata tip 111: More on working with weeks
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q3/12   SJ 12(3):565--569                                (no commands)
        discusses how to convert data presented in yearly and weekly
        form to daily dates and how to aggregate such data to months
        or longer intervals

SJ-10-4 dm0052  . . . . . . . . . . . . . . . . Stata tip 68: Week assumptions
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/10   SJ 10(4):682--685                                (no commands)
        tip on Stata's solution for weeks and on how to set up
        your own alternatives given different definitions of the
        week

But sorting on your identifier and your weekly date variable is more reliable than what you did, as sort id year is satisfied by many inappropriate orders and doesn't necessarily respect week order.

Second, and to your direct question, it seems that you have already calculated what you want. Here's the total code I used after your dataex example (for which thanks!), including code reflecting the first point.

Code:

format %tw yw

bysort id (yw) : gen t = _n 
xtset id t

gen F1rm = F1.rm
gen F2rm = F2.rm
gen L1rm = L1.rm
gen L2rm = L2.rm

// -rangestat-
rangestat (reg) ri F2rm F1rm rm L1rm L2rm, interval(year 0 0) by(id)
gen e = ri-(F2rm*b_F2rm + F1rm*b_F1rm + rm*b_rm + L1rm*b_L1rm + L2rm*b_L2rm + b_cons)

// gen firm-specific returns
gen W = ln(1+e)
bys id year: egen W_mean = mean(W)
by id year: egen W_sd = sd(W)
gen W_norm = (W - W_mean) / W_sd
by id year: egen crash_freq = total(W_norm < -3.2)

I yield to no-one in admiring rangerun but I wouldn't write a program for it in this case as the calculations following rangestat are easy enough.

Comment

River Huang

Join Date: Mar 2016
Posts: 1908

#18

08 Aug 2017, 17:38

Dear Nick, many thanks for the suggestions. When I ran the following two pieces of code (using the data above)

Code:

// ncskew0 
gen W20 = W^2
gen W30 = W^3
bys id year: egen TW20 = total(W20)
bys id year: egen TW30 = total(W30)
bys id year: egen n0 = count(W)
gen ncskew0 = -(n0*(n0-1)^1.5*TW30)/((n0-1)*(n0-2)*(TW20)^1.5)

// ncskew1
capture program drop mycr
program mycr
  sum W      
  gen n = r(N)
  gen W2 = W^2
  gen W3 = W^3
  egen TW2 = total(W2)
  egen TW3 = total(W3)
  gen ncskew = -(n*(n-1)^1.5*TW3)/((n-1)*(n-2)*TW2^1.5)   
end 

bysort id year (yw): gen high = cond(_n==1, year, -1)
rangerun mycr, interval(year 0 high) use(W) by(id)

I found that their results are mostly identical, except that W2 and W3 are missing in the year 2016. Could you explain why this is happening? (Is it related to the -interval- option?)
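For reference, the quantity that both snippets compute for each id and year, written out from the code above (with n the number of non-missing weekly returns W in that id and year):

```latex
\[
\mathit{ncskew} = -\,\frac{n\,(n-1)^{3/2}\,\sum_{t} W_{t}^{3}}
                         {(n-1)\,(n-2)\,\bigl(\sum_{t} W_{t}^{2}\bigr)^{3/2}}
\]
```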

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Comment

Nick Cox

Join Date: Mar 2014

Posts: 36053
#19

09 Aug 2017, 02:28

I don't really understand what you're trying to do. Why is the first observation in each year treated differently?

I would use double precision for calculating sums of squares and cubes.

Code:

egen TW2 = total(W2)

is a poor way to get totals in this case. summarize yields r(sum) directly.
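A minimal sketch of both points, using the auto dataset shipped with Stata rather than the data in this thread (p2 and tp2 are made-up names):

```stata
sysuse auto, clear
gen double p2 = price^2        // double precision for the squares
summarize p2, meanonly         // meanonly skips the variance but still leaves r(sum)
scalar tp2 = r(sum)            // the total, with no extra variable created
display %18.0f tp2
```

Inside a program called by rangerun the same pattern would apply to W2 and W3, assuming those variables exist in memory at that point.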
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#20

09 Aug 2017, 03:34

1. Hi, Nick, my purpose is very simple. For each id in each year, I have different numbers of weekly observations, which are used to calculate the `ncskew' indicator (I believe that you can figure out the formula as done above).
2. Why is the first observation in each year treated differently? Do you mean

Code:

bysort id year (yw): gen high = cond(_n==1, year, -1)

? If so, I guess that I just followed the advice of Robert (please see above) to consider the first observation in each year and avoid repeating the same task for the other observations in that year.
3. So, you are suggesting

Code:

capture program drop mycr
program mycr
  sum W
  gen n = r(N)
  gen W2 = W^2
  sum W2
  gen TW2 = r(sum)
  gen W3 = W^3
  sum W3
  gen TW3 = r(sum)
  gen ncskew = -(n*(n-1)^1.5*TW3)/((n-1)*(n-2)*TW2^1.5)
end

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

#21

09 Aug 2017, 10:08

The short answer to River's puzzlement is that the residuals are missing for the first two and last two observations of each panel because of the leads and lags of the variable rm. Since the W* variables are derived from these residuals, they are equally missing for these observations. And as is explained in the help file for rangerun:

Results are picked out from what is left in memory when program_name terminates without error. rangerun identifies all new numeric variables and stores results using values from the last observation in memory.

Since each panel in the test data ends in 2016, W2 and W3 are missing in the last observation.
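A toy sketch of that behavior, assuming rangerun is installed from SSC together with rangestat (which, as far as I know, it depends on). The names demo, g, x, x2, and tx2 are invented for illustration, and one observation per group is deliberately given a missing x:

```stata
* Assumes: ssc install rangestat and ssc install rangerun
clear all
set obs 6
gen long g = ceil(_n / 3)            // two groups of three observations
gen double x = _n
replace x = . if inlist(_n, 3, 6)    // one missing x per group

program demo
    gen double x2 = x^2              // missing wherever x is missing
    egen double tx2 = total(x2)      // same value in every observation in memory
end

* sort so the missing x comes last within each group, then make the
* interval valid only for the first observation of each group
bysort g (x): gen high = cond(_n == 1, g, -1)
rangerun demo, interval(g 0 high)
list, sepby(g)
```

If the observation with missing x is the last one in memory when demo finishes, x2 comes back missing for the group while tx2 does not, because tx2 holds the same value in every observation.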

In the interest of all current and future users of rangestat and rangerun, I'll recap how to approach River's problem (without any effort to understand the logic behind the calculations). I assume that the data presented in #16 has been saved in a file called "test_data.dta". The first thing to do is to develop code for a single case and I'll use the first group of id and year:

Code:

use "test_data.dta", clear

egen id_year = group(id year)
keep if id_year == 1

reg ri F2rm F1rm rm L1rm L2rm
predict double e, residuals

gen n = e(N)
gen double W = ln(1+e)
gen double W2 = W^2
gen double W3 = W^3
egen double TW2 = total(W2)
egen double TW3 = total(W3)
gen ncskew = -(n*(n-1)^1.5*TW3)/((n-1)*(n-2)*TW2^1.5)   

* 
bysort id_year (yw): gen first_last = _n == 1 | _n == _N
list yw e-ncskew if first_last

and the results that will be used to compare with those of rangestat and rangerun:

Code:

. list yw e-ncskew if first_last

     +-----------------------------------------------------------------------------------------------------+
     |      yw            e    n            W          W2           W3         TW2          TW3     ncskew |
     |-----------------------------------------------------------------------------------------------------|
  1. |  2015w2            .   49            .           .            .   9.5996286   -20.480589   4.973701 |
 51. | 2015w52   -.03120216   49   -.03169931   .00100485   -.00003185   9.5996286   -20.480589   4.973701 |
     +-----------------------------------------------------------------------------------------------------+

You can use rangestat to quickly calculate the regressions for all groups. The residuals can be manually calculated from the results and all other variables from there:

Code:

use "test_data.dta", clear

egen id_year = group(id year)
rangestat (reg) ri F2rm F1rm rm L1rm L2rm, interval(id_year 0 0)
gen double e = ri-(F2rm*b_F2rm + F1rm*b_F1rm + rm*b_rm + L1rm*b_L1rm + L2rm*b_L2rm + b_cons)

gen n = reg_nobs
gen double W = ln(1+e)
gen double W2 = W^2
gen double W3 = W^3
bys id_year: egen double TW2 = total(W2)
bys id_year: egen double TW3 = total(W3)
gen ncskew = -(n*(n-1)^1.5*TW3)/((n-1)*(n-2)*(TW2)^1.5)

bysort id_year (yw): gen first_last = _n == 1 | _n == _N
list id yw e-ncskew if first_last, sepby(id_year)

and the results

Code:

. list id yw e-ncskew if first_last, sepby(id_year)

     +-----------------------------------------------------------------------------------------------------------+
     | id        yw            e    n            W          W2           W3         TW2          TW3      ncskew |
     |-----------------------------------------------------------------------------------------------------------|
  1. |  1    2015w2            .   49            .           .            .   9.5996286   -20.480589    4.973701 |
 51. |  1   2015w52   -.03120216   49   -.03169931   .00100485   -.00003185   9.5996286   -20.480589    4.973701 |
     |-----------------------------------------------------------------------------------------------------------|
 52. |  1    2016w2   -.01394307   45   -.01404119   .00019715   -2.768e-06    .0067886    .00011867   -1.472797 |
 98. |  1   2016w52            .   45            .           .            .    .0067886    .00011867   -1.472797 |
     |-----------------------------------------------------------------------------------------------------------|
 99. | 18    2015w2            .   49            .           .            .   .48828409    .01721096   -.3643469 |
149. | 18   2015w52    .14773306   49    .13778875   .01898574    .00261602   .48828409    .01721096   -.3643469 |
     |-----------------------------------------------------------------------------------------------------------|
150. | 18    2016w2    .10108138   43    .09629277    .0092723    .00089286   3.1374713   -4.9544807    6.059513 |
194. | 18   2016w52            .   43            .           .            .   3.1374713   -4.9544807    6.059513 |
     +-----------------------------------------------------------------------------------------------------------+

You can also encapsulate the code for the single case above into a Stata program and use rangerun to perform the same calculations. As has been explained many times, you only want to perform the regressions once per group, so you need to create an invalid interval for all but one observation per group. This is explained in detail in the help file.

Code:

clear all
use "test_data.dta"
egen id_year = group(id year)

program mycr
    reg ri F2rm F1rm rm L1rm L2rm
    predict double e, residuals

    gen n = e(N)
    gen double W = ln(1+e)
    gen double W2 = W^2
    gen double W3 = W^3
    egen double TW2 = total(W2)
    egen double TW3 = total(W3)
    gen ncskew = -(n*(n-1)^1.5*TW3)/((n-1)*(n-2)*TW2^1.5)   
end 

bysort id_year (yw): gen high = cond(_n==1, id_year, -1)
rangerun mycr, interval(id_year 0 high)

list id yw e-ncskew if !mi(n)

and the results:

Code:

. list id yw e-ncskew if !mi(n)

     +----------------------------------------------------------------------------------------------------------+
     | id       yw            e    n            W          W2           W3         TW2          TW3      ncskew |
     |----------------------------------------------------------------------------------------------------------|
  1. |  1   2015w2   -.03120216   49   -.03169931   .00100485   -.00003185   9.5996286   -20.480589    4.973701 |
 52. |  1   2016w2            .   45            .           .            .    .0067886    .00011867   -1.472797 |
 99. | 18   2015w2    .14773306   49    .13778875   .01898574    .00261602   .48828409    .01721096   -.3643469 |
150. | 18   2016w2            .   43            .           .            .   3.1374713   -4.9544807    6.059513 |
     +----------------------------------------------------------------------------------------------------------+

As you can see, all results match, and the mystery of the missing W* variables is explained by the fact that the leads of rm are missing for the last two observations of each firm, which fall in its last year.

In terms of whether River should use rangerun or rangestat for this specific problem, I would go with rangestat because it will be faster.

Comment

River Huang

Join Date: Mar 2016

Posts: 1908
#22

09 Aug 2017, 19:01

Thanks again for your detailed and helpful suggestions, Robert. Another question (not really related to -rangestat- or -rangerun-) is (following the above example):
Given the firm-specific weekly returns W=ln(1+e), I first calculate the mean returns for each year (and for each id) as

Code:

bys id_year: egen W_mean = mean(W)

and then I calculate the standard deviation of the weekly returns that are less than the mean return W_mean (called down weeks), and the standard deviation of the weekly returns that are larger than or equal to W_mean (called up weeks), as

Code:

by id_year: egen Wd = sd(W) if W < W_mean
by id_year: egen Wu = sd(W) if W >= W_mean

and the final purpose is to obtain an indicator called duvol as

Code:

by id_year: gen duvol = log(Wd/Wu)

but I got missing values for all observations. I wonder if you are aware of any concise method to deal with this problem?

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#23

10 Aug 2017, 08:57

When you are puzzled by the results you get, why not simply look at the data using browse? You would see that, with the code you crafted, W is either below the mean or not for each observation, and therefore either Wd or Wu will be missing. This leads to the ratio Wd/Wu being missing for each observation.

I'm guessing this is what you are trying to do:

Code:

bys id_year: egen double W_mean = mean(W)
by id_year: egen double Wd = sd(W / (W < W_mean))
by id_year: egen double Wu = sd(W / (W >= W_mean))
by id_year: gen duvol = log(Wd/Wu)

Each of the expressions (W < W_mean) and (W >= W_mean) evaluates to 1 when true and to zero otherwise. If you divide W by 1, you get W. If you divide W by zero, you get a missing value. The egen sd() function ignores missing values so you get what you want.
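A small self-contained check of that trick on made-up numbers (g, w, and the sd_* names are only for illustration):

```stata
clear
input byte g double w
1 -2
1 -1
1  1
1  4
end

bysort g: egen double w_mean = mean(w)             // group mean
by g: egen double sd_down = sd(w / (w < w_mean))   // uses only values below the mean
by g: egen double sd_up   = sd(w / (w >= w_mean))  // uses only values at or above the mean
list
```

The group mean is 0.5, so sd_down is the standard deviation of -2 and -1 (about 0.7071) and sd_up is that of 1 and 4 (about 2.1213). Writing cond(w < w_mean, w, .) inside sd() would be an equivalent, more explicit spelling.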
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#24

10 Aug 2017, 17:37

Thanks again, Robert. Indeed, I have looked at the data and found that either Wd or Wu is missing. I was thinking of using something (a little bit stupid) like

Code:

bys id_year: egen Wdm = mean(Wd)

to fill the gaps. But, I believe that there must be an easier way, and your code is exactly what I need.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#25

28 Aug 2017, 04:07

Dear Robert, is it possible to use -rangerun- to speed up bootstrapping procedures? An example is here: https://www.statalist.org/forums/for...w-to-bootstrap.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Robert Picard

Join Date: Mar 2014

Posts: 1536
#26

28 Aug 2017, 13:21

Perhaps, but I have never used bootstrap and I'm not sure I can completely replicate the whole process. I'll post what I came up with in the other thread.
Comment
