Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to double-sort quartile portfolio?

    Hello Statalists,

    I've only coded for single-sorted quartile portfolios in Stata but now I need to construct a double-sorted quartile portfolio. Put it simply is, in each quarter, I wanna firstly sort all stocks into quartiles based on their institutional ownership (tmi1). Next, with each quartile in each quarter, I wanna sort them into another set of quartiles based on their monitoring attention (wtio). Does anyone possibly know how to code in this way? I will be really grateful if any help comments are given. Many thanks in advance!

    Here is a partial dataset:


    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float quarter double(tmi1 wtio)
    117                   0  .00014276566598830893
    118                   0   .0001755080414020555
    119                   0   .0001991004412036009
    120                   0  .00024617221132283473
    121                   0   .0002354729108330631
    122                   0   .0002651689932122743
    123                   0   .0002275756588937073
    124                   0  .00008437844694620305
    125                   0  .00003834384297903891
    126                   0  1.422305485063643e-09
    127                   0  7.673856332974414e-09
    128                   0  4.842226747841946e-08
    129                   0  6.098118635452773e-08
    130                   0 1.0750975821075236e-07
    131                   0  1.412647771956958e-07
    132                   0 1.9660068313479758e-07
    133                   0  2.643924890089161e-07
    134                   0 4.2989573086812293e-07
    135                   0  6.760521129931567e-07
    136                   0  7.645901953251392e-07
    137                   0  6.903121146701238e-06
    138                   0  7.332357899342033e-06
    139                   0   7.66037555616215e-06
    140                   0 1.0702806958231637e-06
    141                   0  3.356142584589346e-08
    142                   0 2.8489726219135167e-08
    143                   0  2.919635418372186e-08
    144                   0 4.2363554399705993e-07
    145                   0 3.9909727128928835e-07
    146                   0 1.0368295044984471e-07
    147                   0 1.2051755094269015e-07
    148                   0 4.1842950920091276e-07
    149                   0   5.57991055698808e-07
    end
    format %tq quarter
    ------------------ copy up to and including the previous line ------------------

    Listed 33 out of 513687 observations


  • #2
    Sounds like you want:

    Code:
    by quarter, sort: egen tmi_quartile = xtile(tmi1), nq(4)
    by quarter tmi_quartile, sort: egen double_quartile = xtile(wtio), nq(4)
    Note: Your data example (thank you for using -dataex-) is not suitable for testing this code as the posted observations exhibit no variation in variable tmi1, so no quartiles can be calculated for it.

    Additional note: to use the -egen- function -xtile()- you must install the -egenmore- package from SSC if you do not already have it.

    Since the code could not be tested, beware of typos, punctuation issues, etc.

    Comment


    • #3
      @Clyde Schechter Hi Clyde, thank you for your reply! since there exists a huge number of zero values in variable tmi1, that's why I choose to build a quartile portfolio rather decile portfolio. I will pay more attention to attach non-zero observations next time. Thank you for your comment!

      Have you heard of -xtileJ-? xtileJ is able to use -by- option when sorting the portfolio. It's not developed by SSC but another academic professor. (sorry I couldn't remember his name now)

      When I compared with their xtile results to each other, there are only 201 contradictions in 637,406 observations. Do you know what does it result in such difference?

      Also, I am curious to know that if I want to first compute the cutpoints for each quartile by using -pctile- then maybe use loops to double sort them, do you possibly know how to do that way? Do you think it's necessary to do that?

      Many thanks indeed for your help!!!!

      Comment


      • #4
        I am not familiar with -xtileJ-, so I can't comment on it.

        From a more general perspective, there is an inherent indeterminacy in defining quartiles (or deciles, or any other quantiles) in terms of things like how ties are broken. So it is not uncommon for different approaches to defining quantiles to give slightly different results, and it isn't really possible to designate any one of them as "right."

        I don't think the alternate mechanisms for doing this you propose are in any way superior to what you have already done. They might even increase the confusion by resulting in yet a third version of the quartiles, differing slightly from the two you already have. My advice is to use one of the versions you already have (it doesn't matter which one) and leave it at that. If your ultimate results change appreciably depending on the way the quartiles are resolved by different software, then the entire approach is misguided and should be abandoned. (Which, by the way, goes along with my frequent advice that splitting continuous variables into discrete groups is usually a bad idea, though I am aware that what you are doing here is frequently done in financial analysis.)

        Comment


        • #5
          @Clyde Schechter Hi Clyde, thank you for your reply! You are right that I shouldn't confuse with finding the "right" method. When I applied both methods and they generated almost the same results. Thus, my portfolio sorting has already done a good job with your codes. One more thing that might need your help with is ultimately there should have 4*4 portfolios created in total since it's double sorting. But now, I have two individual decile sorts and am not sure how to connect them with each other with respect to estimating their excess returns, alphas and t-statistics under double sorting, in order to check when one condition keeps unchanged, what relation can be found from the pattern by examing the others?

          I wanna have something like the below: P1 - P4 is first sort and D1 - D4 is the second sort.


          Code:
              D1 D2 D3 D4
          P1    X  X   X   X
          P2    X  X   X   X
          P3    X  X   X   X
          P4    X  X   X   X
          Many thanks indeed in advance!!! Hope to hear from you soon!
          Last edited by Jae Li; 17 Nov 2017, 16:16.

          Comment


          • #6
            I don't understand what you're asking. You refer now to decile sorts, but before it was quartiles, and you show a diagram with a 4 x 4 matrix filled with x's. And what are those x's supposed to be? And what do you want to do with them? Whatever it is, it's unlikely it will work out easily in that arrangement. Almost certainly you will want to put the day into long layout first. But, in any case, I simply have no clue what you have now nor what you want to get from it.

            Comment


            • #7
              @Clyde Schechter Sorry about that! It's my mistakes that it should be quartiles, not deciles. I am working late at the office today so my brain gets slow a bit now.

              What I want to have is just like this random screenshot downloaded from somewhere else. For my case, based on the variable tmi1, sort all stocks into 4 equal-weighted quartile portfolios. Within these 4 quartiles, I would like to double sort them into another set of quartile portfolios by using the variable wtio. This is because I would like to find out, in each of the quartiles for tmi1, how is the relation between the obtained excess returns and wtio? Hopefully, it's clearer to you now. Sorry for the confusion earlier! Thank you very much for your time and patience with my question!

              Click image for larger version

Name:	double sort.JPG
Views:	1
Size:	19.5 KB
ID:	1418709

              Comment


              • #8
                So assuming you have already generated the quartile indicators as in #2, you can do this:

                Code:
                capture program drop myregress
                program define myregress
                    regress excess_returns wtio
                    gen b_wtio = _b[wtio]
                    gen se_wtio = _se[wtio]
                    exit
                end
                
                runby myregress, by(tmi_quartile double_quartile)
                
                table tmi_quartile double_quartile, c(mean b_wtio mean se_wtio)
                Note: Not tested, beware of typos.

                You can get the -runby- command from SSC.

                Comment


                • #9
                  Clyde Schechter Hi Clyde, thank you very much for your reply! I can always learn a great deal of Stata from you!! These codes work very well and thank you very much for letting me learn those commands: -program-, -runby- and -table-.

                  I'm just a bit confused about one thing and may I ask for your clarification? In the above codes, since double_quartile is sorted based on tmi_quartile and wtio, when you regress excess returns with wtio and runby myregress by both quartiles again rather than individually tmi1_quartile, do you think it can be a double effect for wtio to the excess returns?

                  What if this following way, is it able to demonstrate the relationship between excess returns or alphas under double sorting quartiles with respect to levels of tmi and wtio?

                  Code:
                  capture program drop myregress
                  program define myregress
                       reg excess_returns market_factor  * this is CAPM model
                       exit
                  end
                  runby myregress, by(tmi_quartile double_quartile) verbose
                  table tmi_quartile double_quartile, c(mean excess_returns mean alpha)
                  I am not very sure about that so will be grateful if you can give me some ideas. Many thanks in advance!!

                  Comment


                  • #10
                    In the above codes, since double_quartile is sorted based on tmi_quartile and wtio, when you regress excess returns with wtio and runby myregress by both quartiles again rather than individually tmi1_quartile, do you think it can be a double effect for wtio to the excess returns?
                    I'm not sure what you mean by this. Evidently the code in #2 generates a different set of quartiles of tm1 for each quartile of wtio. So when we run the code shown in #6, wtio both influences the definition of the tm1 quartiles and also appears in the regression. Is this what you mean by double effect? If so, it appears to be precisely what you said you wanted. Perhaps I misunderstood you originally--if so you need to clarify your question.

                    Your code in #9 will not do anything, because program myregress does not create any new variables. Basically it carries out the regression and then discards the results! Also the -table- command refers to a non-existent variable alpha.

                    Comment


                    • #11
                      @Clyde Schechter Hi Clyde, sorry for the late reply!

                      After -runby myregress- in the #9, it actually generates new variables for the regression, such as reg_nobs, reg_r2, reg_adj_r2 and etc. Since I didn't generate any variables in the -program define myregress-, does it happen because of the updated version of -runby-?

                      Can I ask you one more questions? If I'd like to get the 4*4 t-stat of the alpha generated from -runby myregress-, is the following code right? If so, why I obtained a negative t-stat for a positive alpha?

                      Code:
                      gen t_stat= alpha / se_cons
                      table tmi1_quartile double_quartile, c(mean t_stat)
                      Many thanks for your help and time indeed! Hope to hear from you!
                      Last edited by Jae Li; 07 Dec 2017, 13:38.

                      Comment


                      • #12
                        After -runby myregress- in the #9, it actually generates new variables for the regression, such as reg_nobs, reg_r2, reg_adj_r2 and etc. Since I didn't generate any variables in the -program define myregress-, does it happen because of the updated version of -runby-?
                        No, and, I find it impossible to believe. -runby- will not create any variables of its own: the program it calls has sole responsibility for that. Are you sure you weren't using -rangestat- with the (reg) specifier? It does generate new variables, and with precisely the names you show.

                        If I'd like to get the 4*4 t-stat of the alpha generated from -runby myregress-,
                        I have never heard of a 4*4 t statistic and have no idea what you want here. If you want the usual t-statistic and you already have b_cons and se_cons, then -gen t = b_cons/se_cons- will do it. I don't know what "alpha" is supposed to be here.

                        Comment


                        • #13
                          @Clyde Schechter Thank you for your swift reply! Yes, you are right that I use the -rangestat- instead of -reg-. My apologies for causing you the confusions!

                          I renamed the b_cons to become "alpha" for easy demonstration, then it should be alright. Thank you very much for your help with my questions! It's a great learning! :D

                          Comment


                          • #14
                            @Clyde Schechter Hi Clyde, can I ask you a clarification? If I would like to calculate the t-stat for the return difference between the high portfolio and low portfolio which are indicated in bold font. Can I just get their t-stats individually and subtract their t-stats in order to get the t-stat for their return differences? I will be really appreciated if you can give me some ideas. Many thanks in advance!
                            double_P2
                            P1 1(L) 2 3 4(H) H-L
                            1(L) -0.0142 -0.0017207 0.005844 0.012678
                            0.012307 0.00793432 0.013025 0.021179 0.008872
                            2 -0.01537 0.0031013 0.011542 0.019934
                            0.004306 0.01732273 0.023941 0.023994 0.019689
                            3 -0.01238 0.0081871 0.017074 0.026242
                            0.009943 0.01774806 0.021783 0.027239 0.017297
                            4(H) -0.00658 0.0063933 0.014818 0.020411
                            0.003146 0.00392176 0.01055 0.017706 0.014559
                            H-L -0.00916 -0.0040126 -0.00247 -0.00347
                            Last edited by Jae Li; 08 Dec 2017, 08:01.

                            Comment


                            • #15
                              To get an answer to your question you need to show an example of the data you now have. Mybe that array of numbers at the bottom of your post is it, but a) it isn't explained what any of these things are, and b) it clearly does not come from a Stata data set so it won't serve as a basis for doing anything with it in Stata.

                              Generically, you cannot calculate the t-statistic for a difference by subtracting within-group t-statistics. In fact, the two have essentially nothing to do with each other.

                              Comment

                              Working...
                              X