Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • creating deciles per year

    Dear StataWizard,

    I would like to create decile ranks per year. Below you can see my dataset and the code i have used:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5 company_id float year double beta
    "0013" 2014    -.8181261724728726
    "0013" 2014   -.17427367231650542
    "0013" 2014    .08928337610626988
    "0013" 2015   -.17147193710867764
    "0013" 2015   -.47435350036829466
    "0013" 2015    .28930930367750896
    "0013" 2015     -.820590757908715
    "0013" 2016    -.3577401651022578
    "0013" 2016    -.5542820687273543
    "0013" 2016  -.012939297409046369
    "0013" 2016   -.42743684790169756
    "0013" 2017    -.8150554286127736
    "002F" 2015    -.8181261724728726
    "002F" 2015   -.17427367231650542
    "002F" 2015    .08928337610626988
    "002F" 2015   -.17147193710867764
    "002F" 2016   -.47435350036829466
    "002F" 2016    .28930930367750896
    "002F" 2016     -.820590757908715
    "002F" 2016    -.3577401651022578
    "002W" 2001    -.8181261724728726
    "002W" 2001   -.17427367231650542
    "002W" 2002    .08928337610626988
    "002W" 2002   -.17147193710867764
    "002W" 2002   -.47435350036829466
    "002W" 2002    .28930930367750896
    "002W" 2003     -.820590757908715
    "002W" 2003    -.3577401651022578
    "002W" 2003    -.5542820687273543
    "002W" 2003  -.012939297409046369
    "002W" 2004   -.42743684790169756
    "002W" 2004    -.8150554286127736
    "002W" 2004    -.2319791393100137
    "002W" 2004   -.17906705270339707
    "002W" 2005    -.2896405274944829
    "002W" 2005    -.7676753415100355
    "002W" 2005 -.0045243236635026385
    "002W" 2005   -.19705930266270438
    "002W" 2007   -.12678526986697786
    "002W" 2007    .06699673632162817
    "002W" 2007    -.4628438727003854
    "002W" 2007   -.18550289506870765
    "002W" 2008    -.2738693303011448
    "002W" 2008    -.8321019617913985
    "002W" 2008    -.6671928146692207
    "002W" 2008   -1.6131799891838055
    "002W" 2009   -.29943731165699705
    "002W" 2009    .18254707346473306
    "002W" 2009    -.5401758695997162
    "002W" 2009   -.41399701514675574
    "002W" 2010    -.8577921649224735
    "002W" 2010    -.9600599772907442
    "002W" 2010   -.41119169684900947
    "002W" 2010   -.12722453763740654
    "002W" 2011     -.680052395219459
    "002W" 2011    -.5076892945904703
    "002W" 2011    -.6224289775728925
    "002W" 2011    .16849812963142574
    "002W" 2012    -.2728935370318462
    "002W" 2012    .23822229814533857
    "002W" 2012   -.31032437186417106
    "002W" 2012    -.6426758762510004
    "002W" 2013    -.5887759655030005
    "002W" 2013    -.6330764922901906
    "002W" 2013   -.13110440800620055
    "002W" 2013   -.22357250499700038
    "002W" 2014    -.6163777700358478
    "002W" 2014    -.3929825492568509
    "002W" 2014    -.8586695093276018
    "002W" 2014    -.8232270150155061
    "002W" 2015    -.7034491235678368
    "002W" 2015   -.36800327623950807
    "002W" 2015    -.3952934898868018
    "002W" 2015     -.825400356687473
    "002W" 2016      -.20182042522839
    "002W" 2016    -.9171364992471871
    "002W" 2016   -.14131675615380257
    "002W" 2016     -.870594068304925
    "002Z" 2014    -.8181261724728726
    "002Z" 2014   -.17427367231650542
    "002Z" 2015    .08928337610626988
    "002Z" 2015   -.17147193710867764
    "002Z" 2015   -.47435350036829466
    "002Z" 2015    .28930930367750896
    "002Z" 2016     -.820590757908715
    "002Z" 2016    -.3577401651022578
    "003M" 2015    -.8181261724728726
    "003M" 2015   -.17427367231650542
    "003M" 2015    .08928337610626988
    "003M" 2015   -.17147193710867764
    "003M" 2016   -.47435350036829466
    "003M" 2016    .28930930367750896
    "003M" 2016     -.820590757908715
    "003M" 2016    -.3577401651022578
    "0048" 2015    -.8181261724728726
    "0048" 2015   -.17427367231650542
    "0048" 2015    .08928337610626988
    "0048" 2016   -.17147193710867764
    "0048" 2016   -.47435350036829466
    "0048" 2016    .28930930367750896
    end

    I thought I should have used the following command:

    xtile D_beta = beta, nq(10)
    gen beta_Decile = .

    foreach i in `year' {
    xtile D_beta2= beta if yearvar==`i', nq(10)
    replace beta_Decile = D_beta2 if missing(beta_Decile)
    drop D_beta
    }


    However, when I command: by year: tab D_beta I got the following the following output(expample year = 2010)

    10 |
    quantiles |
    of beta | Freq. Percent Cum.
    ------------+-----------------------------------
    1 | 92 2.40 2.40
    2 | 84 2.19 4.59
    3 | 792 20.64 25.22
    4 | 767 19.98 45.21
    5 | 856 22.30 67.51
    6 | 42 1.09 68.60
    7 | 101 2.63 71.24
    8 | 147 3.83 75.07
    9 | 191 4.98 80.04
    10 | 766 19.96 100.00
    ------------+-----------------------------------
    Total | 3,838 100.00


    As you can see the cumulative percentage is indeed 100%, however the frequencies and the percentage of each quantiles should be the same (10%). Does anybody has any clue what is wrong with my loop?

    Many thanks in advance

    Arnout



  • #2
    Nothing is wrong with your loop that I can see. The problem is almost certainly that Stata necessarily assigns identical values to the same quantile bin. So, check your data for ties.

    Comment


    • #3
      Hi Arnout,

      you might not be able to create deciles for all the years because for some the of the years you have less than 10 observations. I assumed that the data you shared with us is just a subsample of your real data. Thus, see the code below that creates artificial data keeping the same structure as yours. It creates the deciles as you need them.

      Code:
      drop _all
      set obs 100000
      gen beta  = round(runiform(-1.61318, .2893093)) 
      gen year  = round(runiform(2001,2017)) 
      
      levelsof year, local(years)
      gen decile = .
      foreach year of local years {
          tempvar dy
          xtile `dy' = beta if year == `year', n(10)
          replace decile = `dy' if year == `year'
      }
      Best,
      Best,
      Pablo Bonilla

      Comment


      • #4
        Pablo's point could be a problem in some datasets, but the example tabulation for 2010 was based on 3838 observations.

        Comment


        • #5
          Thanks for you help Pablo and Nick. However It is still not working. All the years that are important to me (2003-2016) contain more than 10 observations.

          -> year = 2014

          10 |
          quantiles |
          of beta | Freq. Percent Cum.
          ------------+-----------------------------------
          1 | 133 3.26 3.26
          2 | 794 19.47 22.73
          3 | 876 21.48 44.21
          4 | 139 3.41 47.62
          5 | 172 4.22 51.84
          6 | 1,528 37.47 89.31
          7 | 102 2.50 91.81
          8 | 80 1.96 93.77
          9 | 145 3.56 97.33
          10 | 109 2.67 100.00
          ------------+-----------------------------------
          Total | 4,078 100.00

          As you can see in the table above, the frequencies are still very dispersed. Quantile 6 takes 1528 observations while quantile 8 takes only 80. I would like to have equal (ass closest as possible) frequenties in each quantile in each year. Is this possible?

          Many thanks for your help.

          Kind regards,

          Arnout

          Comment


          • #6
            Clearly we can't see your data and don't expect you to post it all either. But the problem was already explained in #2 and can't be solved without arbitrary decisions. There is more discussion within http://www.stata-journal.com/sjpdf.h...iclenum=pr0054 in Section 4.

            Comment


            • #7
              Originally posted by Arnout Poelstra View Post
              As you can see in the table above, the frequencies are still very dispersed. Quantile 6 takes 1528 observations while quantile 8 takes only 80. I would like to have equal (ass closest as possible) frequenties in each quantile in each year. Is this possible?

              Many thanks for your help.

              Kind regards,

              Arnout

              Hi Arnout,

              I think the reason why your deciles are of sizes is because xtile divides the sample in 10 according to distribution of your variable of interest (i.e., beta) and not according to size of your sample. you may download the command quantiles by Rafael Guerreiro Osorio from ssc (ssc install quantiles). This command creates the deciles based on the sample size and not by the cutpoints of your variable of interest. This approach has pros and cons that should take into account but at least produces what you need (also, it is 'byable'). See below


              Code:
              drop _all
              set obs 100000
              gen beta  = round(runiform(-1.61318, .2893093)) 
              gen year  = round(runiform(2001,2017)) 
              
              bysort year: quantiles beta, gen(decile) n(10)
              tab year decile

              Best,
              Pablo Bonilla

              Comment

              Working...
              X