  • collapse a dataset using _pctile

    Dear All,

    Please can you help me with the following?

    I want to obtain the mean and the 2.5th and 97.5th percentiles of the variable rmu_0 by values of w, x and y (PFSRATECONTROL, HAZARDRATIO and HRQOL_DIF in the code below). I can achieve this with a loop, as you can see below. However, the loop is really slow, which is particularly problematic because I want to do this for a series of variables.

    To generate the mean of a variable by subgroups, I can simply use this fast command:

    Code:
    bysort w x y: egen _mean_rmu_0=mean(rmu_0)

    However, how do I do the same for the 2.5th and 97.5th percentiles? It seems that only the _pctile command can calculate the 2.5th and 97.5th percentiles.

    This is my current slow code:

    Code:
    gen _mean_rmu_0=.
    gen _p1_rmu_0=.
    gen _p2_rmu_0=.
    foreach w in 0.3 0.5 0.70 {
    foreach x in 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4 1.5 1.7 2 {
        foreach y in  -20 -15 -10 -5 0 5 10 15 20 {
            display "`x' & `y' & `w'"
            cap sum rmu_0 if HAZARDRATIO==`x' & HRQOL_DIF==`y' & PFSRATECONTROL==`w'
            cap replace _mean_rmu_0=`r(mean)' if HAZARDRATIO==`x' & HRQOL_DIF==`y' & PFSRATECONTROL==`w'
            cap _pctile rmu_0 if HAZARDRATIO==`x' & HRQOL_DIF==`y' & PFSRATECONTROL==`w', percentiles(2.5)
            cap replace _p1_rmu_0=`r(r1)' if HAZARDRATIO==`x' & HRQOL_DIF==`y'  & PFSRATECONTROL==`w'
            cap _pctile rmu_0 if HAZARDRATIO==`x' & HRQOL_DIF==`y'  & PFSRATECONTROL==`w', percentiles(97.5)
            cap replace _p2_rmu_0=`r(r1)' if HAZARDRATIO==`x' & HRQOL_DIF==`y' & PFSRATECONTROL==`w'
        }
    }
    }
    Thanks

    A

  • #2
    See the help for centile.



    • #3
      Originally posted by Nick Cox
      See the help for centile.

      Yes, centile calculates the 2.5th and 97.5th percentiles. But the problem is that it does not generate a variable containing these percentile values within levels of other variables.

      How can I do something like the following?

      bysort x w z: centile rmu_0, centile(2.5 97.5) gen(p*)

      Thanks

      Andre



      • #4
        Does the -pctile- function for -egen- not do this?



        • #5
          Originally posted by Hemanshu Kumar
          Does the -pctile- function for -egen- not do this?

          No, it does not. The pctile() function of egen only does percentiles 1, 2, 3, 4, ..., 99; it does not do 2.5 or 97.5.



          • #6
            I just checked, and it seems to do the job. Can you double-check? I tried

            Code:
            clear
            set obs 10000
            gen x = runiform()
            gen y = runiformint(1,10)
            
            bys y: egen p2 = pctile(x), p(2)
            bys y: egen p25 = pctile(x), p(2.5)
            bys y: egen p3 = pctile(x), p(3)

            and it seems to work.
            Last edited by Hemanshu Kumar; 15 Aug 2022, 08:26.
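Applied to the setup in #1, the same function replaces the triple loop, which visits 405 cells (3 × 15 × 9) and runs several commands over the full dataset in each, with one egen call per statistic and variable. This is a minimal sketch, assuming the grouping variables HAZARDRATIO, HRQOL_DIF and PFSRATECONTROL from #1. The simulated data are hypothetical stand-ins so the block runs on its own (the value ranges are arbitrary), and the second outcome rmu_1 is hypothetical too, included only to show a loop over a series of variables.

```stata
* Hypothetical simulated data standing in for the dataset in #1
clear
set seed 12345
set obs 5000
generate PFSRATECONTROL = 0.3 + 0.2 * runiformint(0, 2)
generate HAZARDRATIO = 0.5 * runiformint(1, 4)
generate HRQOL_DIF = 5 * runiformint(-4, 4)
generate rmu_0 = rnormal()
* rmu_1 is a hypothetical second outcome variable
generate rmu_1 = rnormal(10, 2)

* One egen call per statistic and variable, instead of one -if- pass per cell
foreach v of varlist rmu_0 rmu_1 {
    bysort HAZARDRATIO HRQOL_DIF PFSRATECONTROL: egen _mean_`v' = mean(`v')
    bysort HAZARDRATIO HRQOL_DIF PFSRATECONTROL: egen _p1_`v' = pctile(`v'), p(2.5)
    bysort HAZARDRATIO HRQOL_DIF PFSRATECONTROL: egen _p2_`v' = pctile(`v'), p(97.5)
}
```

Each new variable is constant within a cell, so, as in the loop, every observation carries its own cell's mean and percentiles.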



            • #7
              All methods use some kind of approximation. Don't expect exactness here.

              Code:
              . sysuse auto, clear
              (1978 automobile data)
              
              . egen p_high = pctile(price), p(97.5)
              
              . su p_high
              
                  Variable |        Obs        Mean    Std. dev.       Min        Max
              -------------+---------------------------------------------------------
                    p_high |         74       14500           0      14500      14500
              
              . centile price, c(97.5)
              
                                                                        Binom. interp.   
                  Variable |       Obs  Percentile    Centile        [95% conf. interval]
              -------------+-------------------------------------------------------------
                     price |        74       97.5    14675.75        12516.46       15906*

              Otherwise, the answer to #3 is that centile doesn't support a generate() option. So at best the command works loosely like summarize: you have to pick up saved results after the command.
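A minimal sketch of that summarize-like pattern, using the auto data with foreign as a stand-in for the grouping variables: run centile once per group and copy the returned scalars r(c_1) and r(c_2), the first and second requested centiles, into variables. The names p_low and p_high are illustrative. It is still a loop, so it shares the speed problem described in #1, and centile's default interpolation gives slightly different values from egen pctile(), as the output above shows.

```stata
sysuse auto, clear
generate p_low = .
generate p_high = .

* centile has no generate() option, so copy its r() results after each call
levelsof foreign, local(groups)
foreach g of local groups {
    quietly centile price if foreign == `g', centile(2.5 97.5)
    quietly replace p_low  = r(c_1) if foreign == `g'
    quietly replace p_high = r(c_2) if foreign == `g'
}

* The values are constant within each group, so the mean shows them
tabstat p_low p_high, by(foreign) statistics(mean)
```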

              Note that using say

              Code:
              `r(mean)'

              offers no advantages and some disadvantages over using

              Code:
              r(mean)

              as the first asks for the local macro persona of a saved result, which sometimes loses a little precision compared with using the returned scalar, and never gains anything.
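In terms of the loop in #1, that means writing, say, replace _mean_rmu_0 = r(mean) if ... rather than wrapping r(mean) in macro quotes. A small sketch on the auto data puts the two forms side by side; m_macro, m_scalar and gap are illustrative names, and all are stored as double so that any difference comes from the macro route rather than from float storage.

```stata
sysuse auto, clear
quietly summarize price if foreign == 1

* Macro route: the result is turned into text, then parsed back into a number
generate double m_macro = `r(mean)' if foreign == 1

* Scalar route: the returned value is used directly
generate double m_scalar = r(mean) if foreign == 1

* Size of any discrepancy between the two routes
generate double gap = m_macro - m_scalar
summarize gap
```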



              • #8
                Thank you, yes! bys y: egen p25 = pctile(x), p(2.5) does work.

                Thanks a lot.



                • #9
                  "works" doesn't necessarily mean "works well". It might be salutary to show the number of observations satisfying your joint conditions of the form

                  Code:
                  if HAZARDRATIO==`x' & HRQOL_DIF==`y' & PFSRATECONTROL==`w'
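One quick way to do that is to attach each cell's size to its observations with egen count(), which counts nonmissing values, and then look at one row per cell. This is a sketch on the auto data, with foreign and rep78 standing in for the three grouping variables in #1; n_price and first are illustrative names. Cell size matters here because, under Stata's default percentile definition (used by summarize, detail and _pctile, and as far as I know also by egen pctile()), a cell with n < 40 nonmissing values has n × 2.5/100 < 1, so its 2.5th and 97.5th percentiles are simply the cell minimum and maximum.

```stata
sysuse auto, clear

* Number of nonmissing price values in each foreign x rep78 cell
bysort foreign rep78: egen n_price = count(price)

* Flag one observation per cell (missing rep78 forms its own cell here)
bysort foreign rep78: generate byte first = _n == 1

* How many cells have how many observations
tabulate n_price if first

* Cells small enough that p(2.5) and p(97.5) reduce to the min and max
list foreign rep78 n_price if first & n_price < 40, noobs
```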



                  • #10
                    Originally posted by Nick Cox
                    "works" doesn't necessarily mean "works well". It might be salutary to show the number of observations satisfying your joint conditions of the form

                    Code:
                    if HAZARDRATIO==`x' & HRQOL_DIF==`y' & PFSRATECONTROL==`w'

                    Yes, thank you, I have done that.

                    Using egen is so much faster than the loop! The loop takes around 4 minutes and egen less than 10 seconds, and I obtain exactly the same results.

                    Thanks

