  • Using by and pctile to create a dummy percentile variable

    Hi there,
    I have a variable exp and a time variable yyyy. I am trying to create a new variable, exp_dummy, that takes a value of 0-3 according to which quartile of exp the observation falls into within each year (yyyy). At the moment, the best I can come up with is something like:
    Code:
     
     by yyyy: egen exp_dummy = pctile(exp), p(4)
    However, this is not working for me. Can someone please help? Thanks

  • #2
    -egen pctile- computes a single percentile, so p(4) asks for the 4th percentile rather than for quartiles. -egen cut, group(4)- would be what you are looking for, except that it cannot be combined with by.

    There's probably a smarter way, but I'd resort to the dumb -levelsof- + -foreach- solution if all else fails. Here's an example.

    Code:
    sysuse auto, clear
    // store the distinct values of foreign in the local macro origin
    levelsof foreign, local(origin)
    gen quartile = .
    foreach o of numlist `origin' {
        // detailed summary leaves this group's percentiles in r(p25), r(p50), r(p75)
        su price if foreign==`o', d
        replace quartile=0 if foreign==`o' & price <`r(p25)'
        replace quartile=1 if foreign==`o' & price >=`r(p25)' & price<`r(p50)'
        replace quartile=2 if foreign==`o' & price >=`r(p50)' & price<`r(p75)'
        replace quartile=3 if foreign==`o' & price >=`r(p75)'
    }
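
    The same idea can be written without a loop, because -egen pctile- (unlike -egen cut-) does work with by. What follows is only a minimal sketch applied to the variables from the original question (exp and yyyy); the helper variables p25, p50 and p75 are hypothetical names introduced for illustration, and the cutoffs use the same boundary rule as the loop above.

    ```stata
    * Within-year quartile cutoffs (p25, p50, p75 are hypothetical helper names)
    bysort yyyy: egen p25 = pctile(exp), p(25)
    bysort yyyy: egen p50 = pctile(exp), p(50)
    bysort yyyy: egen p75 = pctile(exp), p(75)

    * Assign 0-3; observations with missing exp stay missing
    gen exp_dummy = 0 if exp < p25
    replace exp_dummy = 1 if exp >= p25 & exp < p50
    replace exp_dummy = 2 if exp >= p50 & exp < p75
    replace exp_dummy = 3 if exp >= p75 & !missing(exp)

    drop p25 p50 p75
    ```

    The !missing(exp) condition matters because Stata treats missing values as larger than any number, so exp >= p75 alone would also be true for missing exp.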



    • #3
      You can also use -xtile- from the EGENMORE module (type -ssc install egenmore- to install):

      Code:
      clear
      set more off
      
      sysuse auto
      keep foreign price
      
      egen quartile = xtile(price), by(foreign) n(4)
      
      list
      
      /*
      // code to compare with Aspen Chen's results
      replace quartile = quartile - 1
      cf _all using <somefile>, verbose
      */
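      This will give you slightly different results than Aspen's code because observations that fall exactly on a quartile boundary are treated differently: here a value equal to a cutoff goes into the lower group. If you change Aspen's code to: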

      Code:
      <snip>
      replace quartile=0 if foreign==`o' & price <=`r(p25)'
      replace quartile=1 if foreign==`o' & price >`r(p25)' & price<=`r(p50)'
      replace quartile=2 if foreign==`o' & price >`r(p50)' & price<=`r(p75)'
      replace quartile=3 if foreign==`o' & price >`r(p75)'
      <snip>
      the results are the same.
      You should:

      1. Read the FAQ carefully.

      2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

      3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

      4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.



      • #4
        Thanks, Reberto. It's great to learn about -egenmore-. The ability to combine by() with -egen- makes things much easier.



        • #5
          If you're using relatively large data sets, the -egen- xtile() function from EGENMORE will be very slow.
          Try using -gquantiles- from the gtools package instead:
          Code:
           
           gquantiles exp_dummy = exp, xtile nquantiles(4) by(yyyy)
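
          For completeness, here is a sketch of the full sequence, assuming gtools is installed from SSC and that exp_dummy does not exist yet. Like -xtile-, the xtile option numbers the groups from 1, so a final step shifts them to the 0-3 coding asked for in the original question.

          ```stata
          * Skip this line if gtools is already installed
          ssc install gtools

          * Quartile group of exp within each year, coded 1-4
          gquantiles exp_dummy = exp, xtile nquantiles(4) by(yyyy)

          * Shift to the 0-3 coding requested in the question
          replace exp_dummy = exp_dummy - 1
          ```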

