  • Creating Graph with User-Defined Percentiles

    Hello!
    I'm trying to create a graph where the x-axis has user-specified percentiles of a variable, and the y-axis is the value of the variable at that percentile.
    I have recently started using STATA, so the solution may be simple. I was unable to find a way of doing this with graphs in STATA (the closest I found was the quantile graph), nor did I find previous discussion of this type of graph.
    Thank you for your guidance!

  • #2
    Your terminology is the wrong way round. Your x variable includes cumulative probabilities such as .25 .50 .75 or the corresponding percents such as 25% 50% 75%; and your y axis is to show the corresponding percentiles of some variable. So with a variable mpg the 50% percentile is a value of mpg, not the percent 50%.

    Such a graph is indeed a little unusual in Stata terms, unless you call it a boxplot. The usual line is that you might as well look at all the quantiles at once, using quantile or a more general command such as qplot (SJ).
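
    As a minimal sketch of looking at all the quantiles at once, the official quantile command plots every ordered value of a variable against its plotting position. The auto data and mpg are used here only for illustration:

    ```stata
    sysuse auto, clear

    * every ordered value of mpg against the fraction of the data
    quantile mpg
    ```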

    Here is some technique:

    Code:
     
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . _pctile mpg, p(5 10 25(25)75 90 95)
    
    . ret li
    
    scalars:
                     r(r1) =  14
                     r(r2) =  14
                     r(r3) =  18
                     r(r4) =  20
                     r(r5) =  25
                     r(r6) =  29
                     r(r7) =  34
    
    . gen quantile = .
    (74 missing values generated)
    
    . qui forval i = 1/7 {
    . replace quantile = r(r`i') in `i'
    . }
    
    . mat percent = (5,10,25,50,75,90,95)
    
    . mat li percent
    
    percent[1,7]
        c1  c2  c3  c4  c5  c6  c7
    r1   5  10  25  50  75  90  95
    
    . gen percent = percent[1,_n]
    (67 missing values generated)
    
    . scatter quantile percent, xla(5 10 25 50 75 90 95)
    For "STATA" read "Stata" passim: please do read the FAQ Advice all the way to Section 18.



    • #3
      Thanks for the response Nick! I did not make it to the end of the FAQ, apologies for that! I have reread it, and will keep it in mind going forwards.

      I went through the code Nick wrote, and believe I understand it. I will have to spend some time familiarizing myself with Mata, as it clearly is a powerful tool for generating Stata variables.

      I ended up using a graph bar with the variables created, which worked perfectly! Thanks again for the help!



      • #4
        Thanks for this. I agree with your statement on Mata; nevertheless note that there is no Mata in my sample code; the matrix commands are from Stata's matrix language.
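
        To make the distinction concrete, here is a small sketch (not part of the code in #2) that builds the same row vector twice: once with Stata's matrix language and once in Mata, which is a separate language entered with the mata command.

        ```stata
        * Stata's matrix language, as used in #2
        matrix percent = (5, 10, 25, 50, 75, 90, 95)
        matrix list percent

        * Mata, for comparison; this percent is a Mata object,
        * distinct from the Stata matrix of the same name
        mata:
        percent = (5, 10, 25, 50, 75, 90, 95)
        percent
        end
        ```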

        Your comment reminds me that you can do things like this:

        Code:
        sysuse auto
        graph bar (p5) mpg (p10) mpg (p25) mpg (p50) mpg (p75) mpg (p90) mpg (p95) mpg , legend(col(1) pos(3))
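
        If the numbers behind the bars are also wanted, one possible companion (a sketch, not something the graph command needs) is tabstat, which reports the same percentiles as a table:

        ```stata
        sysuse auto, clear

        * the same seven percentiles of mpg, shown as a table
        tabstat mpg, statistics(p5 p10 p25 p50 p75 p90 p95)
        ```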
        Last edited by Nick Cox; 17 Jul 2014, 14:32.



        • #5
          Isn't there a way to do this using qplot? I assume the idea is to graph only from the 5th to the 95th percentile. If I wanted to recast my qplot as a line plot, what would the procedure be?



          • #6
            #5 In short, no. qplot (Stata Journal, as explained in #2) doesn't allow selection of quantiles, which is a different problem. See also https://journals.sagepub.com/doi/pdf...867X1601600413

            Code:
            qplot mpg, recast(line)
            exemplifies recasting to line. You may want to choose a connect() option other than the default.
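
            As one hedged illustration of a different connect() choice, the sketch below asks for a stairstep connection. It assumes qplot is installed from the Stata Journal and that it passes connect() through to twoway, as the remark above implies; whether stairstep is the best choice for your data is a judgment call.

            ```stata
            sysuse auto, clear

            * requires the user-written qplot command (Stata Journal)
            * connect(J) requests stairstep: flat, then vertical
            qplot mpg, recast(line) connect(J)
            ```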

            Depending on what "this" is in #5, it will be programmable. Here is some technique.

            Code:
            sysuse auto, clear 
            
            numlist "5 10 25(25)75 90 95"
            local numlist `r(numlist)'
            local nargs : word count `numlist'
             
            _pctile mpg, p(`numlist')
            
            gen quantile = . 
            gen percent = . 
            tokenize `numlist'
            
            quietly forval j = 1/`nargs' { 
                replace quantile = r(r`j') in `j'
                replace percent = ``j'' in `j'
            }
            
            set scheme s1color 
            local opts yla(, ang(h)) subtitle("`: var label mpg'")
            scatter quantile percent, xla(`numlist') `opts' name(G1, replace)
            
            twoway connected quantile percent, xla(`numlist') `opts' name(G2, replace)
            
            gen qnorm = invnormal(percent/100)
            
            twoway connected quantile qnorm, xla(-2/2) xtitle(standardized normal score) `opts'  name(G3, replace) 
            
            graph combine G1 G2 G3


            [Figure: qplot_selected.png]
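
            A note on the qnorm step in the code above: invnormal() takes a cumulative probability between 0 and 1 and returns the corresponding standard normal quantile, which is why percent is divided by 100. A minimal illustration, with probabilities chosen only as examples:

            ```stata
            * cumulative probabilities mapped to standard normal scores
            display invnormal(0.05)
            display invnormal(0.50)
            display invnormal(0.95)
            ```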



            • #7
              Thank you Nick, it was indeed a very useful technique.



              • #8
                I would like to create a bar graph (with error bars) that displays the values (finr) for two categories of a multi-category variable (vtype) on the y-axis by percentile (5, 10, 25, 50, 75, 90, 95) along the x-axis.
                Code:
                graph bar (p5) finr (p10) finr (p25) finr (p50) finr (p75) finr (p90) finr (p95) finr ///
                if inlist(faith2, 2, 3), over(at3) over(faith2) ytitle("") yla() title(Financial assets, size(small)) ///
                legend(lab(1 "5th") lab(2 "10th") lab(3 "25th") lab(4 "50th") ///
                        lab(5 "75th") lab(6 "90th") lab(7 "95th") col(7))
                I tried to replace the (busy) legend with x-axis labels
                Code:
                xlabel(1 "5th" 2 "10th" 3 "25th" 4 "50th" 5 "75th" 6 "90th" 7 "95th") xti("Percentiles")
                but Stata outputs
                Code:
                xlabels(1 5th" 2 "10th" 3 "25th" 4 "50th" 5 "75th" 6 "90th" 7 "95th") not allowed, " invalid na
                > me
                r(198);
                I still need to add the error bars - suggestions appreciated.

                I came across -cibar- (ssc install cibar), which provides confidence intervals. However, how can I include the percentiles along the x-axis using this program?
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float finr byte(at3 vtype)
                117951.12 0 2
                 235799.2 1 3
                 208530.8 1 3
                305687.22 0 3
                253412.14 0 3
                 413083.6 0 3
                280980.72 0 3
                28540.285 0 3
                  323.646 0 3
                223375.38 1 3
                147438.67 1 3
                 379603.6 0 3
                 715984.1 0 3
                 515115.4 0 3
                581753.56 0 3
                 458876.3 0 3
                   534214 1 3
                 367631.5 1 3
                 490251.7 1 3
                111402.04 1 3
                118890.35 1 3
                 95747.05 1 3
                 611670.6 1 3
                   772532 1 2
                1085729.4 1 2
                2158285.5 1 2
                 82337.68 0 2
                105099.09 0 2
                 118543.2 0 2
                239144.95 1 2
                   253273 1 2
                 208199.4 1 2
                 125484.6 1 2
                 365277.2 1 2
                 64402.76 1 2
                 11397.52 1 2
                 204755.6 1 2
                 194343.6 1 2
                   179749 1 3
                end
                Last edited by Chris Boulis; 02 Aug 2021, 22:09.



                • #9
                  Correction: I mistyped my code in #8, it should be:
                  Code:
                  graph bar (p5) finr (p10) finr (p25) finr (p50) finr (p75) finr (p90) finr (p95) finr ///
                  if inlist(vtype, 2, 3), over(at3) over(vtype) yti("") yla() ti(Financial assets, size(small)) ///
                  legend(lab(1 "5th") lab(2 "10th") lab(3 "25th") lab(4 "50th") lab(5 "75th") lab(6 "90th") lab(7 "95th") col(7))
                  With -cibar-, I have confidence intervals, but I am not yet able to add percentiles to the x-axis:
                  Code:
                  cibar finr if inlist(vtype, 2, 3), over1(vtype) over2(at3) ciopts(lcolor(red))  ///
                  graphopts(title("Financial assets", size(medsmall)) name(graph1, replace))

                  Help appreciated on either:

                  (1) adding confidence intervals using -graph bar-
                  (2) including percentiles on x-axis using -cibar-
                  Last edited by Chris Boulis; 03 Aug 2021, 06:00.



                  • #10
                    Chris Boulis, I think your question might be better suited for a new thread. Also, your data example (thanks for using dataex) doesn't include all the variables used in your graph bar call.



                    • #11
                      Thank you for your advice Justin Niakamal. I will repost separately.
