Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating a percentile for panel data

    Hi all,

    I've got panel data for S&P500 companies from 1980-2020. I'm trying to compare the growth rates of companies each year (mean_SetGrowthRate). I've created a quartile variable using:
    Code:
     egen quartiles = xtile(mean_SetGrowthRate), nq(4) by(FiscalYear)
    However, I'm struggling to create a variable capturing the percentile. Does anyone have any advice?


    [CODE]
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 gvkey int FiscalYear double mean_SetGrowthRate float quartiles
    "010553" 1996 .5132858276367187 4
    "010553" 1997 .555506682395935 4
    "010553" 1998 .500489966571331 4
    "010553" 1999 .3374000534415245 4
    "010553" 2000 .10826473087072372 2
    "007435" 1980 . .
    "007435" 1981 . .
    "007435" 1982 . .
    "007435" 1983 . .
    "007435" 1984 . .
    "007435" 1985 .052807162143290044 2
    "007435" 1986 .05798280630260706 2
    "007435" 1987 .07435287125408649 2
    "007435" 1988 .0855174083262682 2
    "007435" 1989 .09322690851986408 3
    "007435" 1990 .10676461160182953 3
    "007435" 1991 .0923934232443571 3
    "007435" 1992 .08130626715719699 3
    "007435" 1993 .05884465053677559 3
    "007435" 1994 .04731899723410606 2
    "007435" 1995 .00864776000380516 1
    "007435" 1996 .015278443321585655 1
    "007435" 1997 .01885428763926029 1
    "007435" 1998 .016230352222919464 1
    "007435" 1999 .009618137031793594 1
    "007435" 2000 .044694111496210095 1
    "007435" 2001 .02545018568634987 1
    "007435" 2002 .016880373284220696 1
    "007435" 2003 .04079787693917751 2
    "007435" 2004 .05181824155151844 2
    "007435" 2005 .049769485369324684 2
    "007435" 2006 .07407481558620929 2
    "007435" 2007 .08435541540384292 2
    "007435" 2008 .06768619939684868 2
    "007435" 2009 .03118582144379616 2
    "007435" 2010 .0502423919737339 2
    "007435" 2011 .05577189847826958 2
    "007435" 2012 .04432333130389452 2
    "007435" 2013 .04419270474463701 3
    "007435" 2014 .06733258720487356 3
    "007435" 2015 .026999232731759547 2
    "007435" 2016 .003787817154079676 2
    "007435" 2017 .012091462220996618 2
    "007435" 2018 .012624131049960852 2
    "007435" 2019 .0026300246827304363 1
    "007435" 2020 .012651893892325461 2
    "009872" 1984 . .
    "009872" 1985 .2168377824127674 4
    "009872" 1986 .1147272415459156 3
    "009872" 1987 .06412613615393639 2
    "001078" 1980 . .
    "001078" 1981 . .
    "001078" 1982 . .
    "001078" 1983 . .
    "001078" 1984 . .
    "001078" 1985 .10561169162392617 3
    "001078" 1986 .10237123146653175 3
    "001078" 1987 .11065723821520805 3
    "001078" 1988 .11067664846777917 3
    "001078" 1989 .11658454835414886 3
    "001078" 1990 .1290273442864418 3
    "001078" 1991 .12571450173854828 3
    "001078" 1992 .1236031860113144 4
    "001078" 1993 .11273484528064728 4
    "001078" 1994 .11259527355432511 4
    "001078" 1995 .10233953446149827 3
    "001078" 1996 .09902683943510056 3
    "001078" 1997 .08645921647548675 2
    "001078" 1998 .08230235129594803 2
    "001078" 1999 .07572185173630715 2
    "001078" 2000 .0656447984278202 2
    "001078" 2001 .08259055241942406 2
    "001078" 2002 .0839779831469059 2
    "001078" 2003 .09654652848839759 3
    "001078" 2004 .08532463044284669 3
    "001078" 2005 .10320147008060303 3
    "001078" 2006 .06794641942142335 2
    "001078" 2007 .08135154873489228 2
    "001078" 2008 .08666618049501267 2
    "001078" 2009 .09505138397216797 3
    "001078" 2010 .09716675281524659 3
    "001078" 2011 .11642980575561523 4
    "001078" 2012 .0911028016358614 3
    "001078" 2013 -.027198467776179312 1
    "001078" 2014 -.05023393593728542 1
    "001078" 2015 -.07729050945490598 1
    "001078" 2016 -.09385411534458399 1
    "001078" 2017 -.036422541551291945 1
    "001078" 2018 .07727058697491884 3
    "001078" 2019 .10059929210692645 4
    "001078" 2020 .11598942019045352 4
    "016101" 2013 . .
    "016101" 2014 . .
    "016101" 2015 .07979576960206032 3
    "016101" 2016 .08100925013422966 3
    "016101" 2017 .09038794711232186 3
    "016101" 2018 .11808563321828842 4
    "016101" 2019 .1087647382169962 4
    "016101" 2020 .15497666634619237 4
    "063643" 2007 .19062627255916595 4
    end

  • #2
    Code:
    foreach p of numlist 25 50 75 {
        egen percentile_`p' = pctile(mean_SetGrowthRate), by(FiscalYear) p(`p')
    }
    Added: As noted by Nick Cox in #3, the code originally omitted the p() option, now added in bold face.
    Last edited by Clyde Schechter; 27 Apr 2023, 11:49.

    Comment


    • #3
      The code in #1 creates quartilebins and the code for egen function xtile() comes from egenmore on SSC.

      The code in #2 creates quartiles as particular percentiles and the code for egen function pctile() is official -- except that the option

      Code:
      p(`p')
      needs to be added for 25 and 75%, as otherwise you'll get three copies of the medians. Having p(50) is fine for the median.

      Hope that matches what is wanted and is also clear to interested readers.


      Comment


      • #4
        Thanks very much Nick and Clyde!

        Using Clyde's suggested code
        Code:
         
         foreach p of numlist 25 50 75 {egen percentile_`p' = pctile(mean_SetGrowthRate), by(FiscalYear) p(`p') }

        I've generated the cut-off points for the 25th, 50th, and 70th percentiles.

        I'm wondering though, is there a way to generate a variable that, rather than providing the cut-offs, shows what percentile a certain mean_SetGrowthRate is?

        (This is just a made-up example of a subsection of firms from 1990)
        Firm mean_SetGrowth Rate Quartile Percentile
        Firm A 0.299 3 0.74
        Firm C 0.0015 1 0.12
        Firm D 0.86 4 0.98
        Firm E 0.003 2 0.33
        Firm F 0.33 4 0.88

        Comment


        • #5
          Code:
          by FiscalYear (mean_SetGrowthRate), sort: egen rank = rank(mean_SetGrowthRate)
          replace rank = rank-1
          by FiscalYear (mean_SetGrowthRate): gen percentile = 100*(rank)/(rank[_N])

          Comment


          • #6
            See also the Stata FAQ on percentile ranks and plotting positions.

            Comment


            • #7
              Thanks all, this was very helpful!

              Comment

              Working...
              X