  • Why does xtile nq(3) build only two groups instead of three?

    I would like to build three groups (terciles) using xtile. However, xtile with nq(3) builds only two groups instead of three. Did I do something wrong? Thanks in advance for any suggestions.
    [Attached screenshot: Screen Shot 2021-09-14 at 11.35.24 AM.png]

  • #2
    Show us the results of

    Code:
    tab share tercile
    for full information. You expected three groups and got only two. The simplest explanation is that there are only two distinct values in share. The more general rule that can overrule expectations is that observations with the same value will necessarily be assigned to the same quantile bin.

    There is more discussion in many places here and also within

    https://www.stata-journal.com/articl...article=pr0054 Section 4


    https://www.stata-journal.com/articl...article=dm0095 Section 6

    The executive summary is that quantile binning is often misunderstood, inappropriate or both, but a little thought and scrutiny of the data explains why that is.


    Quantile binning can't help you here -- that's the implication. Perhaps you should just use share directly.
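
The rule about tied values can be seen on a small made-up example. This is only a sketch: the data below are invented (10 observations, four of them tied at 1) and are not the original poster's data; only the variable names share and tercile follow the thread.

```stata
* Toy data, NOT the original poster's: six distinct values below 1,
* then four observations tied at 1
clear
set obs 10
generate double share = cond(_n > 6, 1, _n/8)

* Ask for terciles, then cross-tabulate values against bins
xtile tercile = share, nq(3)
tabulate share tercile
```

In the cross-tabulation every observation with share equal to 1 sits in the same column, so a large block of ties cannot be split across bins, however many bins are requested.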






    • #3
      Originally posted by Nick Cox (#2)

      Thanks a lot, Nick.

      [Attached screenshot: Screen Shot 2021-09-15 at 10.06.25 AM.png]

      [Attached screenshot: Screen Shot 2021-09-15 at 10.07.15 AM.png]

      Here is part of the data for the share variable, together with the result of "tab share tercile". Thanks again for your suggestions.


      • #4
        So, the next suggestion as in the dm0095 paper is to show the results of


        Code:
        quantile share, msymbol(none) mlabel(tercile) mlabposition(0) rlopts(lcolor(none))


        • #5
          Originally posted by Nick Cox (#4)
          Sorry, I cannot access the paper. Here is the result of the code. Thanks a lot.
          [Attached screenshot: Screen Shot 2021-09-15 at 2.38.16 PM.png]

          Last edited by mws macekk; 15 Sep 2021, 06:40.


          • #6
            Can you provide a sample of your data using -dataex- which replicates the problem?

            Looking at the information that you have provided, my reaction is "this cannot be"...


            • #7
              Originally posted by Joro Kolev (#6)
              Hi Joro, thanks for your reply. Unfortunately I am not allowed to share the data, which has more than 5000 obs. I tried a small sample of the data and there it works. I think the problem is that the value 1 accounts for more than 1/3 of the observations of the share variable, as you can see from the part of the data shown in the screenshot. That is probably why it is impossible to get three groups. Thanks.


              • #8
                On the face of it tertile or tercile bins should be possible with your data. I think xtile is struggling with your set-up. A solution is indicated in https://www.stata-journal.com/articl...article=pr0054 (already cited in #2): negate the variable and try again.


                Code:
                . clear
                
                . set obs 5000
                Number of observations (_N) was 0, now 5,000.
                
                . gen y = cond(runiform() < 0.4, 1, runiform())
                
                . xtile bin3=y , nq(3)
                
                . tab bin3
                
                3 quantiles |
                      of y  |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          1 |      1,667       33.34       33.34
                          2 |      3,333       66.66      100.00
                ------------+-----------------------------------
                      Total |      5,000      100.00
                
                . gen negy = -y
                
                . xtile negbin3=negy , nq(3)
                
                . tab negbin3
                
                3 quantiles |
                   of negy  |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          1 |      1,946       38.92       38.92
                          2 |      1,388       27.76       66.68
                          3 |      1,666       33.32      100.00
                ------------+-----------------------------------
                      Total |      5,000      100.00
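
One side effect of negating is that the bins come out numbered in reverse: bin 1 of negbin3 holds the largest values of y. If the usual ordering is wanted (1 = lowest), the labels can be flipped afterwards. A minimal sketch, reusing the simulated set-up above; the seed and the name bin3rev are assumptions made only for this illustration.

```stata
* Same simulated set-up as in the listing above; the seed is arbitrary
clear
set seed 2803
set obs 5000
generate y = cond(runiform() < 0.4, 1, runiform())

* Negate, bin, then flip the labels so that 1 = lowest y and 3 = highest y
generate negy = -y
xtile negbin3 = negy, nq(3)
* bin3rev is a hypothetical name chosen for this sketch
generate bin3rev = 4 - negbin3

* Check the ordering: minimum and maximum of y within each flipped bin
tabstat y, by(bin3rev) statistics(n min max)
```

With more than a third of the values tied at 1, the tied block should end up as the top bin, which is what the tabstat output lets you confirm.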


                • #9
                  By "I just used a small sample of data and it works", do you mean that when you do not use the full data the command generates three groups as it should?

                  You might be right about your explanation. I thought that it could not be, because when I eyeballed the data it seemed to me that at most 1/3 of the observations are 1s, and that below 1 the data are roughly continuous...

                  You can check exactly what fraction of the data are 1s by

                  Code:
                  count if share==1
                  dis r(N)/_N
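
Going one step further than the count, the tercile cutpoints themselves can be inspected with _pctile. As far as I understand the documented rule, xtile puts an observation in the top bin only if it exceeds the upper cutpoint, so if that cutpoint equals the maximum the top bin has to be empty. The sketch below uses invented data (not the poster's); the scalar names cut1 and cut2 are chosen only for this illustration.

```stata
* Toy data, NOT the original poster's: 4 of 10 observations tied at the maximum
clear
set obs 10
generate double share = cond(_n > 6, 1, _n/8)

* Fraction of observations equal to 1
count if share == 1
display r(N)/_N

* Tercile cutpoints as computed by _pctile (returned in r(r1) and r(r2))
_pctile share, nquantiles(3)
scalar cut1 = r(r1)
scalar cut2 = r(r2)

* Compare the upper cutpoint with the maximum of the variable
summarize share, meanonly
display "cutpoints: " scalar(cut1) ", " scalar(cut2) "   maximum: " r(max)
```

If the upper cutpoint and the maximum coincide, that is consistent with xtile returning only two groups, and negating the variable moves the block of ties to the bottom instead.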


                  • #10
                    Originally posted by Nick Cox (#8)
                    Thanks a lot. It works.


                    • #11
                      Originally posted by Joro Kolev (#9)
                      Yes, it is around 34%. Now it works. Thank you.
                      Last edited by mws macekk; 15 Sep 2021, 08:21.
