  • Catplot or graph hbar - how to specify appropriate percentages

    Hello All,

    I am trying to produce a horizontal bar graph using 3 categorical variables
    1. akps (measure of a patient's function): 10 categories, 0-100
    2. phase3 (phase of palliative illness): 4 cats, stable, unstable, deteriorating and dying
    3. setting: 3 cats, hospital, hospice, community

    I want to display the percentage distribution for the akps categories in each setting, with a separate graph for each phase. The point is to see whether the distribution for akps in each phase is similar across settings. I want to display the proportions because there is an uneven number of cases in each setting which makes patterns using the counts less clear, and stretches the graph.

    Using catplot from SCC in Stata 13, I have produced this:
    catplot setting, over(akps) by(phase3) asyvars

    [Image: catplot.png]

    And using graph hbar, I have produced exactly the same graph:
    gen count=1
    replace count=. if akps==.
    graph hbar (sum)count, over(setting) over(akps) by(phase3) asyvars

    I would be grateful for advice on how to achieve the desired display of percentages - I have been trying various combinations and re-structures for some time now and unfortunately cannot resolve it.

    Your help is much appreciated.

    Many thanks,
    Joanna




  • #2
    Dataset example?

    With catplot (from SSC, not SCC!) the percent() option is to be thought of as specifying predictors that define separate conditional distributions.

    The difference between

    Code:
    sysuse auto, clear
    
    catplot rep78 foreign, percent(foreign)
    
    catplot foreign rep78, percent(rep78)
    should make that clear, or clearer.
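
    As a numerical cross-check (a sketch added for illustration, not taken from the catplot help), the official tabulate command with its column option should report the same conditional distributions, because column percentages sum to 100 within each category of the second variable:

    Code:
    sysuse auto, clear
    
    * percentages of rep78 within each category of foreign:
    * the analogue of catplot rep78 foreign, percent(foreign)
    tabulate rep78 foreign, column nofreq
    
    * percentages of foreign within each category of rep78:
    * the analogue of catplot foreign rep78, percent(rep78)
    tabulate foreign rep78, column nofreq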

    By the way, you are mixing graph hbar and catplot syntax here; that's allowed because catplot is here just a wrapper for graph hbar, but the syntax of catplot was designed so that people could think

    catplot response predictor [predictor]

    just as they would for a model fitting command.

    I can't see that akps is a predictor here; it sounds like the outcome or response of interest. It's on an ordinal scale, so other graph forms might work better here.

    Naturally I don't have your data, but for a graded response that is 0(10)100 (11 categories, not 10!) and 4 phases and 3 settings, I did this as a graph sketch using tabplot (SSC; Stata Journal in press):

    Code:
    clear
    set scheme s1color
    set seed 2803
    set obs 1200
    egen setting = seq(), to(3)
    label def setting 1 community 2 hospital 3 hospice
    label val setting setting
    egen phase3 = seq(), to(4) block(100)
    label def phase3 1 stable 2 unstable 3 deteriorating 4 dying
    label val phase3 phase3
    gen akps = 10 * floor(11 * runiform())
    
    tabplot akps setting, by(phase3, note("")) percent(setting phase3) showval(format("%2.0f") offset(7))   yla(0(20)100) bfcolor(none) horizontal barw(10) yasis

    [Image: joanna6.png]
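
    For comparison, the same kind of simulated data could also be drawn with catplot in the response-first form described above. This is only a sketch: it assumes catplot is installed from SSC, the data are simulated and not the real data, and the option choices are illustrative.

    Code:
    clear
    set seed 2803
    set obs 1200
    * simulated predictors: 3 settings cycling within 4 phases
    egen setting = seq(), to(3)
    egen phase3 = seq(), to(4) block(100)
    * simulated graded response taking values 0(10)100
    gen akps = 10 * floor(11 * runiform())
    
    * response first, predictor second; percentages are calculated
    * within each combination of setting and phase3
    catplot akps setting, by(phase3) percent(setting phase3)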

    Last edited by Nick Cox; 08 Jun 2016, 03:06.



    • #3
      Hi Nick,

      Thank you - this is really helpful.

      I like the tabplot and I will use it. But I still see value in a catplot for being able to grasp, at a glance, similarities and differences in the distribution of akps across settings according to phase.

      Re what is the predictor: akps (functional status; 0 = dead, 100 = perfect health) is associated with phase of illness, in that we expect to see lower function in patients who are dying or deteriorating. Phase is a new measure for us. I want to see, in a very descriptive, preliminary way, whether phase is being applied in a similar way across settings. In other words, does the association between akps and phase look similar or vastly different across settings?

      So I think that in the catplot I am trying to predict akps based on setting and phase. Thank you for the explanatory example code re catplot - I think I understand where I was going wrong.

      Using catplot from SSC (!!sorry)
      catplot setting, over(akps) by(phase3) percent(setting phase3) asyvars

      [Image: catplot 2.png]


      I think this gets me what I'm after - unless I'm misunderstanding the use of percent()?

      I realise it's not easy to see the detail on this graph, but I think the overall patterns are still useful. Any further thoughts or comments you have are much appreciated.

      Best,
      Joanna


      • #4
        Good.

        But you don't give us the equivalent tabplot for comparison. I see the three settings mushed up together inside each panel -- in principle they are separated, but in practice grasping each pattern is hard, so I can't make comparisons easily.



        • #5
          Ah, I see your point re providing (and actually doing!!) the tabplot for comparison. I take back everything I said about the merit of catplot in this case. Tabplot is a much better option for this data. Using your code above, I produced this:

          tabplot akps setting, by(phase3, note("")) percent(setting phase3) showval(format("%2.0f") offset(7)) yla(0(20)100) bfcolor(none) horizontal barw(10) yasis

          Thank you Nick - this is much better.
          Joanna

          Thread closed

          [Image: tabplot.png]



          • #6
            Hello Nick,

            I have a further question - as it is about tabplot I am keeping it on this thread (?).

            I'm having trouble formatting a tabplot (SSC; Stata Journal in press) that I'm trying to produce. The problem is that I'm not getting the usual bars; instead I'm getting a marker. I prefer the bars, but I can't work out how to change this (after many attempts and reading the help files).

            I have built a dataset of summary statistics showing the proportion of patients who had an improvement in symptom scores at t2. Each symptom has a separate denominator because not all patients are assessed for every symptom, and I need to display the denominators in the tabplot. This is why I have produced this summary data rather than working with the individual-level data.

            Dataset example:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float ipos3 str19 ipos_end float prop_improved_atend
             1 "pain (n=69)"          55.07246
             2 "sob (n=47)"           46.80851
             3 "weakness (n=119)"      30.2521
             4 "nausea (n=13)"        76.92308
             5 "vomit (n=14)"         71.42857
             6 "appetite (n=75)"            40
             7 "constipation (n=34)"  64.70588
             8 "mouth (n=25)"               64
             9 "drowsiness (n=66)"    43.93939
            10 "mobility (n=108)"     22.22222
            11 "anxiety (n=60)"       48.33333
            12 "family (n=98)"       1.9525802
            13 "depressed (n=36)"     52.77778
            14 "peace (n=47)"         38.29787
            15 "feelings (n=35)"            40
            16 "information (n=20)"         55
            17 "practical (n=28)"     78.57143
            end
            label values ipos3 ipos3
            label def ipos3 1 "pain (n=69)", modify
            label def ipos3 2 "sob (n=47)", modify
            label def ipos3 3 "weakness (n=119)", modify
            label def ipos3 4 "nausea (n=13)", modify
            label def ipos3 5 "vomit (n=14)", modify
            label def ipos3 6 "appetite (n=75)", modify
            label def ipos3 7 "constipation (n=34)", modify
            label def ipos3 8 "mouth (n=25)", modify
            label def ipos3 9 "drowsiness (n=66)", modify
            label def ipos3 10 "mobility (n=108)", modify
            label def ipos3 11 "anxiety (n=60)", modify
            label def ipos3 12 "family (n=98)", modify
            label def ipos3 13 "depressed (n=36)", modify
            label def ipos3 14 "peace (n=47)", modify
            label def ipos3 15 "feelings (n=35)", modify
            label def ipos3 16 "information (n=20)", modify
            label def ipos3 17 "practical (n=28)", modify


            Using the following syntax:
            labmask ipos3, values(ipos_end)
            format prop_improved_atend %2.0f
            tabplot ipos3 prop_improved_atend, xasis showval(prop_improved_atend) horizontal barw(0.9) ///
            yla(, noticks) ytitle("") subtitle("percent") xtitle("") xlabel(,labsize(vsmall))


            I have produced this....
            [Image: tabplot markers not bars.png]

            Is it possible to have the bars instead of these markers? Sorry if there is an obvious resolution to this, but I can't find it.

            Thank you.
            Joanna

            p.s. I realise that in this dataset the denominators are really too small to produce proportions. This is just an example dataset I'm using to develop the code for reporting; the real data is larger, so the proportions will be more appropriate.




            • #7
              This additional post is in error - I can't see how to delete it.
              Last edited by Joanna Davies; 17 Jun 2016, 12:42.



              • #8
                tabplot is doing exactly as you ask. The problem is that prop_improved_atend is not a categorical variable and makes no sense as a column identifier. It needs to be supplied as a weight.

                Further, what tabplot does by default is to count occurrences. In your case there is precisely one occurrence of each cross-combination, so you get bars all of length 1. They really are bars, not markers. Clearly they look small on your scale, which is a side-effect of your specifying xasis.
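
                The difference between counting and weighting can be seen on a toy dataset. This is a minimal sketch: the variable names item and pct and the three values are made up for illustration, and tabplot is assumed to be installed from SSC.

                Code:
                clear
                input byte item float pct
                1 55
                2 47
                3 30
                end
                
                * one observation per category, so every frequency is 1
                tabulate item
                
                * unweighted: tabplot counts occurrences, so every bar has length 1
                tabplot item, horizontal
                
                * weighted: each bar's length is the value of pct
                tabplot item [iw=pct], horizontal showval(format(%2.0f))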

                I guess this is closer to what you want. I would re-order the bars unless there is a psychological/psychiatric/clinical rationale for the order you use.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                set scheme s1color
                input float ipos3 str19 ipos_end float prop_improved_atend
                 1 "pain (n=69)"          55.07246
                 2 "sob (n=47)"           46.80851
                 3 "weakness (n=119)"      30.2521
                 4 "nausea (n=13)"        76.92308
                 5 "vomit (n=14)"         71.42857
                 6 "appetite (n=75)"            40
                 7 "constipation (n=34)"  64.70588
                 8 "mouth (n=25)"               64
                 9 "drowsiness (n=66)"    43.93939
                10 "mobility (n=108)"     22.22222
                11 "anxiety (n=60)"       48.33333
                12 "family (n=98)"       1.9525802
                13 "depressed (n=36)"     52.77778
                14 "peace (n=47)"         38.29787
                15 "feelings (n=35)"            40
                16 "information (n=20)"         55
                17 "practical (n=28)"     78.57143
                end
                label values ipos3 ipos3
                label def ipos3 1 "pain (n=69)", modify
                label def ipos3 2 "sob (n=47)", modify
                label def ipos3 3 "weakness (n=119)", modify
                label def ipos3 4 "nausea (n=13)", modify
                label def ipos3 5 "vomit (n=14)", modify
                label def ipos3 6 "appetite (n=75)", modify
                label def ipos3 7 "constipation (n=34)", modify
                label def ipos3 8 "mouth (n=25)", modify
                label def ipos3 9 "drowsiness (n=66)", modify
                label def ipos3 10 "mobility (n=108)", modify
                label def ipos3 11 "anxiety (n=60)", modify
                label def ipos3 12 "family (n=98)", modify
                label def ipos3 13 "depressed (n=36)", modify
                label def ipos3 14 "peace (n=47)", modify
                label def ipos3 15 "feelings (n=35)", modify
                label def ipos3 16 "information (n=20)", modify
                label def ipos3 17 "practical (n=28)", modify
                tabplot ipos3 [iw=prop_improved_atend] , showval(format(%2.0f) offset(0.45)) horizontal subtitle("        percent") bfcolor(green*0.2)  ytitle("")

                [Image: joanna.png]

                Last edited by Nick Cox; 17 Jun 2016, 12:54.



                • #9
                  Thanks Nick, I see where I was going wrong. This is exactly what I am after.

                  There is a reason for the order of the bars - it follows the order they appear on the measure so makes sense for the clinicians.

                  Thank you!
                  j

                  thread closed.



                  • #10
                    It should be pointed out that graph hbar will work fine here.

                    Code:
                    graph hbar (asis) prop_improved_atend, over(ipos3) blabel(total, format(%2.0f)) subtitle("percent") bar(1, bfcolor(green*0.2)) ysc(off)

                    [Image: joanna2.png]



                    I sometimes shift the bars away from the y axis with an extra option such as ysc(r(-2 .))



                    • #11
                      When I say the y axis here I mean the left-hand axis. Stata describes the response axis, here horizontal, as the y axis with graph hbar.
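
                      A small sketch of that option on a toy dataset (the variable names item and pct and the values are made up for illustration):

                      Code:
                      clear
                      input byte item float pct
                      1 55
                      2 47
                      3 30
                      end
                      
                      * ysc(r(-2 .)) extends the range of the response (y) axis below zero,
                      * so the bars, which start at 0, sit slightly away from the left-hand axis
                      graph hbar (asis) pct, over(item) blabel(total, format(%2.0f)) ysc(r(-2 .))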
