Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • value and equal weighted averages over a window?

    Hi

    How to create daily value-weighted and equal-weighted averages of earnings ratio X over a window t-1 to t+1 relative to day t?


    For daily equal-weighted average calculated at date t
    Code:
    bys date : egen ew_X= mean (X)
    For daily value-weighted average calculated at date t
    Code:
    bys date : egen vw_X= wtmean (X), weight(market_value)

    The above codes calculate the averages for one day, date t, and I want to do so for all observations in a window spanning the period t-1 to t+1 (three days). I have several firm observations each day.

    An example of the data is below:

    Code:
    * Example generated by -dataex-. To install:    ssc    install    dataex
    clear
    input double(date day) long lpermno double X
    4386  4 48063   -.017680526788815442
    4387  5 32328    .004979764504273257
    4387  5 26710    .013923698163334514
    4387  5 45356    .030431487452540987
    4388  6 33312    .004819280099424343
    4388  6 37487    .061908231434809856
    4392 10 41734  -.0047108946665263395
    4392 10 30680    .012353034519365353
    4392 10 39386    .007323568661099492
    4393 11 21733   .0010962955461855245
    4393 11 42593    .008016466404701529
    4393 11 46754    .014612875383848886
    4394 12 31878    .047724595825440544
    4394 12 39263    .007307486919867627
    4394 12 42462   -.013253157401120177
    4395 13 47837    .018265066255042116
    4395 13 41996   .0005258543830600333
    4395 13 36775    .008686566347012507
    4395 13 19940   .0010242368276101013
    4396 14 21282    .012426076086511086
    4396 14 37073      .2394106870472846
    4399 17 46156    -.00840515136005388
    4399 17 30234  -.0006765900768512746
    4399 17 21397    -.00675601830814635
    4399 17 42104     .03558888911356758
    4399 17 45022   .0013793108362174368
    4399 17 49517    -.02818880793687696
    4399 17 28353    .005380657688987578
    4399 17 41486    -.02101226676147718
    4399 17 41523   -.012747568621689085
    4400 18 10487   -.006458697029810499
    end
    format %td date
    Thanks

  • #2
    Thanks for the data example. I don't see a market_value here, so I made up a silly one.

    rangestat from SSC will cope with the windowing here, which is to me the harder part of the calculation, although it would yield to a loop. It doesn't support weighted means directly, although that is programmable within rangestat. But it seems simpler to calculate the numerator and denominator of the weighted mean, and then divide.


    Code:
    clear
    input double(date day) long lpermno double X
    4386  4 48063   -.017680526788815442
    4387  5 32328    .004979764504273257
    4387  5 26710    .013923698163334514
    4387  5 45356    .030431487452540987
    4388  6 33312    .004819280099424343
    4388  6 37487    .061908231434809856
    4392 10 41734  -.0047108946665263395
    4392 10 30680    .012353034519365353
    4392 10 39386    .007323568661099492
    4393 11 21733   .0010962955461855245
    4393 11 42593    .008016466404701529
    4393 11 46754    .014612875383848886
    4394 12 31878    .047724595825440544
    4394 12 39263    .007307486919867627
    4394 12 42462   -.013253157401120177
    4395 13 47837    .018265066255042116
    4395 13 41996   .0005258543830600333
    4395 13 36775    .008686566347012507
    4395 13 19940   .0010242368276101013
    4396 14 21282    .012426076086511086
    4396 14 37073      .2394106870472846
    4399 17 46156    -.00840515136005388
    4399 17 30234  -.0006765900768512746
    4399 17 21397    -.00675601830814635
    4399 17 42104     .03558888911356758
    4399 17 45022   .0013793108362174368
    4399 17 49517    -.02818880793687696
    4399 17 28353    .005380657688987578
    4399 17 41486    -.02101226676147718
    4399 17 41523   -.012747568621689085
    4400 18 10487   -.006458697029810499
    end
    format %td date
    
    gen market_value = _n 
    
    gen double wm_num = X * market_value 
    
    rangestat (mean) mean=X (sum) wm_denom=market_value (sum) wm_numer=wm_num, int(date -1 1)  
    
    gen double wt_mean = wm_numer/wm_denom
    
    l date X market_value mean wt_mean , sepby(date) 
    
         +-------------------------------------------------------------+
         |      date            X   market~e         mean      wt_mean |
         |-------------------------------------------------------------|
      1. | 04jan1972   -.01768053          1    .00791361     .0155776 |
         |-------------------------------------------------------------|
      2. | 05jan1972    .00497976          2    .01639699    .02625342 |
      3. | 05jan1972     .0139237          3    .01639699    .02625342 |
      4. | 05jan1972    .03043149          4    .01639699    .02625342 |
         |-------------------------------------------------------------|
      5. | 06jan1972    .00481928          5    .02321249    .02845012 |
      6. | 06jan1972    .06190823          6    .02321249    .02845012 |
         |-------------------------------------------------------------|
      7. | 10jan1972   -.00471089          7    .00644856    .00712735 |
      8. | 10jan1972    .01235303          8    .00644856    .00712735 |
      9. | 10jan1972    .00732357          9    .00644856    .00712735 |
         |-------------------------------------------------------------|
     10. | 11jan1972     .0010963         10    .00894114    .00939582 |
     11. | 11jan1972    .00801647         11    .00894114    .00939582 |
     12. | 11jan1972    .01461288         12    .00894114    .00939582 |
         |-------------------------------------------------------------|
     13. | 12jan1972     .0477246         13    .00940063    .00879603 |
     14. | 12jan1972    .00730749         14    .00940063    .00879603 |
     15. | 12jan1972   -.01325316         15    .00940063    .00879603 |
         |-------------------------------------------------------------|
     16. | 13jan1972    .01826507         16    .03579082    .04102662 |
     17. | 13jan1972    .00052585         17    .03579082    .04102662 |
     18. | 13jan1972    .00868657         18    .03579082    .04102662 |
     19. | 13jan1972    .00102424         19    .03579082    .04102662 |
         |-------------------------------------------------------------|
     20. | 14jan1972    .01242608         20    .04672308    .05183014 |
     21. | 14jan1972    .23941069         21    .04672308    .05183014 |
         |-------------------------------------------------------------|
     22. | 17jan1972   -.00840515         22   -.00418962   -.00467727 |
     23. | 17jan1972   -.00067659         23   -.00418962   -.00467727 |
     24. | 17jan1972   -.00675602         24   -.00418962   -.00467727 |
     25. | 17jan1972    .03558889         25   -.00418962   -.00467727 |
     26. | 17jan1972    .00137931         26   -.00418962   -.00467727 |
     27. | 17jan1972   -.02818881         27   -.00418962   -.00467727 |
     28. | 17jan1972    .00538066         28   -.00418962   -.00467727 |
     29. | 17jan1972   -.02101227         29   -.00418962   -.00467727 |
     30. | 17jan1972   -.01274757         30   -.00418962   -.00467727 |
         |-------------------------------------------------------------|
     31. | 18jan1972    -.0064587         31   -.00418962   -.00467727 |
         +-------------------------------------------------------------+

    Hand-checks of simple cases:

    Code:
     
    . di (X[1] + X[2] + X[3] + X[4]) / 4
    .00791361
    
    . di (X[1] + 2 * X[2] + 3 * X[3] + 4 * X[4]) / (1 + 2 + 3 + 4)
    .0155776

    Comment


    • #3
      Thank you so much Nick. This is awesome. Indeed, rangestat is a very helpful one. I appreciate it.

      Two follow-up questions please:

      1- If I want to do the same exercise with weights based on "the rank of market-value within the 3-day window" rather than market-value, how can I do that? If, for example, there are 25 observations within the 3-day window, the observation with the largest market_value will have a weight of 25 while the smallest will have a weight of 1 (and so on depending on the number of observations within the window).



      2- Here I just want to confirm that for simple averages the code will simply be:
      Code:
       
       rangestat (mean) mean=X, int(date -1 1)
      And thank you.

      I look forward to your responses.

      Best regards
      Lisa

      Comment


      • #4
        Q2 of #3 yes. You could omit (mean) as that is the default.

        Q1 needs me to be back at a computer and not at my phone as now.

        Comment


        • #5
          Thank you. I look forward to hearing back regarding Q1 at your earliest convenience, thanks.

          Comment


          • #6
            This requires that you have installed moremata from SSC. I have assumed assigning equal ranks to tied values, as is the default with egen, rank().

            As with previous answers, this takes no special care with weekends or holidays without data.

            Code:
            clear
            input double(date day) long lpermno double X
            4386  4 48063   -.017680526788815442
            4387  5 32328    .004979764504273257
            4387  5 26710    .013923698163334514
            4387  5 45356    .030431487452540987
            4388  6 33312    .004819280099424343
            4388  6 37487    .061908231434809856
            4392 10 41734  -.0047108946665263395
            4392 10 30680    .012353034519365353
            4392 10 39386    .007323568661099492
            4393 11 21733   .0010962955461855245
            4393 11 42593    .008016466404701529
            4393 11 46754    .014612875383848886
            4394 12 31878    .047724595825440544
            4394 12 39263    .007307486919867627
            4394 12 42462   -.013253157401120177
            4395 13 47837    .018265066255042116
            4395 13 41996   .0005258543830600333
            4395 13 36775    .008686566347012507
            4395 13 19940   .0010242368276101013
            4396 14 21282    .012426076086511086
            4396 14 37073      .2394106870472846
            4399 17 46156    -.00840515136005388
            4399 17 30234  -.0006765900768512746
            4399 17 21397    -.00675601830814635
            4399 17 42104     .03558888911356758
            4399 17 45022   .0013793108362174368
            4399 17 49517    -.02818880793687696
            4399 17 28353    .005380657688987578
            4399 17 41486    -.02101226676147718
            4399 17 41523   -.012747568621689085
            4400 18 10487   -.006458697029810499
            end
            format %td date
            
            mata:  
            mata clear
            real rowvector rankwtmean(real colvector X) {
                real colvector ranks 
                ranks = mm_ranks(X, 1, 2)
                return(sum(ranks :* X)/sum(ranks))
            }
            end 
            
            rangestat (rankwtmean) X, int(date -1 1)
            
            list date X rankwtmean, sepby(date)
            
                +------------------------------------+
                 |      date            X   rankwtm~1 |
                 |------------------------------------|
              1. | 04jan1972   -.01768053    .0155776 |
                 |------------------------------------|
              2. | 05jan1972    .00497976   .02791424 |
              3. | 05jan1972     .0139237   .02791424 |
              4. | 05jan1972    .03043149   .02791424 |
                 |------------------------------------|
              5. | 06jan1972    .00481928   .03252113 |
              6. | 06jan1972    .06190823   .03252113 |
                 |------------------------------------|
              7. | 10jan1972   -.00471089   .00956956 |
              8. | 10jan1972    .01235303   .00956956 |
              9. | 10jan1972    .00732357   .00956956 |
                 |------------------------------------|
             10. | 11jan1972     .0010963   .01616569 |
             11. | 11jan1972    .00801647   .01616569 |
             12. | 11jan1972    .01461288   .01616569 |
                 |------------------------------------|
             13. | 12jan1972     .0477246   .01634969 |
             14. | 12jan1972    .00730749   .01634969 |
             15. | 12jan1972   -.01325316   .01634969 |
                 |------------------------------------|
             16. | 13jan1972    .01826507   .06227642 |
             17. | 13jan1972    .00052585   .06227642 |
             18. | 13jan1972    .00868657   .06227642 |
             19. | 13jan1972    .00102424   .06227642 |
                 |------------------------------------|
             20. | 14jan1972    .01242608   .07648228 |
             21. | 14jan1972    .23941069   .07648228 |
                 |------------------------------------|
             22. | 17jan1972   -.00840515   .00356371 |
             23. | 17jan1972   -.00067659   .00356371 |
             24. | 17jan1972   -.00675602   .00356371 |
             25. | 17jan1972    .03558889   .00356371 |
             26. | 17jan1972    .00137931   .00356371 |
             27. | 17jan1972   -.02818881   .00356371 |
             28. | 17jan1972    .00538066   .00356371 |
             29. | 17jan1972   -.02101227   .00356371 |
             30. | 17jan1972   -.01274757   .00356371 |
                 |------------------------------------|
             31. | 18jan1972    -.0064587   .00356371 |
                 +------------------------------------+

            Comment


            • #7
              Thanks Nick but the code does not have the weight variable "market_value". Am I missing something here?

              Comment


              • #8
                Correct. I was not remembering the problem correctly.


                Code:
                mata:  
                mata clear
                real rowvector rankwtmean(real matrix data) {
                    real colvector ranks 
                    ranks = mm_ranks(data[,2], 1, 2)
                    return(sum(ranks :* data[,1])/sum(ranks))
                }
                end 
                
                rangestat (rankwtmean) X market_value, int(date -1 1)
                
                list date X market_value rankwtmean, sepby(date)
                Note that the ordering of variables in the rangestat call is crucial.

                Comment


                • #9
                  Thanks a lot Nick. I am sorry but I do not really understand the mata language. Thus, will use your code as is.

                  Just to confirm what you mean by the importance of ordering in rangestat when I adapt the code to create other weighted averages in my data using the rank of the weight as described in #3 Q1: Did you mean that, after using your mata code, the first variable in rangestat must be the variable for which I need to create the weighted average and the second variable is the original weights variable in my data used for that, and then the last variable is the new 'rankweight' variable generated in the mata code?

                  Comment


                  • #10

                    Code:
                    rangestat (rankwtmean) A B, ...
                    returns the mean of A weighted by rank on B, and so
                    Code:
                    rangestat (rankwtmean) B A, ...
                    returns the mean of B weighted by rank on A.

                    In each case you minimally need an
                    interval() option too. The ellipses are not syntax, but to indicate incomplete commands.

                    You can also see that
                    rangestat needs two variables either way, which are fed into a N x 2 matrix data. data[, 1] is the first column, and so on.

                    Otherwise put, the syntax is positional, just as regress y x and regress x y are different.

                    mm_ranks() does the ranking.


                    The quirkiest bit of Mata in evidence here is :* for elementwise multiplication, * being reserved for matrix multiplication, although scalar multiplication is a special case of that.
                    Last edited by Nick Cox; 19 Aug 2022, 01:18.

                    Comment

                    Working...
                    X