
  • How can I produce multiple bar graphs? A matrix of bar graphs? Can it be done with catplot? or tabplot?

    Using Stata SE12

    Data: Patient-level health data; patient characteristics and responses to a quality of life questionnaire

    I want to produce multiple bar charts displaying the distribution of categorical responses (the proportion of patients in each category) for each item on my questionnaire, shown separately for patients in 4 different settings. The aim is to provide a visual summary of patient responses to the questions, comparing differences between settings of care, so I want to show a lot of information on one page. NB: this is not for an academic paper; I'm reporting to health care teams on the data they have collected.

    I'm attaching the graph I have produced using catplot (SSC), code: catplot iposq2pain2_3, by(setting) percent(setting) blabel(bar, format(%4.1f) pos(top))

    This is how I want each bar graph to look, but I want multiple items/questions included, by setting (4 bar charts per question), all within the same graph. Is this possible?

    I'm also attaching a crude mock-up of the graph I ideally want, made by cutting and pasting the graphs I have produced with catplot: example of ideal chart.docx

    One final point - there is an example of a 'matrix of bar graphs' here: does anyone know how this was produced?

    Thank you


  • #2
    Extending a previous private email exchange with Joanna (who approached me as the author of catplot (SSC)):

    You can get a two-way array of bar charts with a combination of by() and over() or just something like this.

    I suspect that the blog graph you report was produced using a graph hbar equivalent of catplot. Here is code you can run:

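    clear
    set obs 1000
    * fake data: 20 questions, 50 answers each
    egen question = seq(), to(20) block(50)
    set seed 2803
    * Mata: draw answers 1-5 with the given probabilities
    mata:
    st_addvar("int", "answer")
    st_store(., "answer", rdiscrete(1000, 1, (0.1, 0.2, 0.4, 0.2, 0.1)'))
    end
    catplot answer, by(question)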

    If your questions are separate variables, you need a reshape long first.
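    A minimal sketch of that reshape, assuming (hypothetically) one row per patient with the answers stored in variables q1-q10. The variable names and the fake data are illustrative only, not Joanna's; after the reshape the structure is id, setting, question, answer.

    ```stata
    * Hypothetical wide layout: one row per patient, answers in q1-q10
    clear
    set obs 200
    set seed 2803
    generate id = _n
    generate setting = 1 + floor(4 * runiform())
    forvalues j = 1/10 {
        generate q`j' = 1 + floor(7 * runiform())
    }

    * Long layout: one row per patient per question
    reshape long q, i(id) j(question)
    rename q answer

    * catplot (SSC) can now show all questions at once
    catplot answer, by(question) percent(question)
    ```

    Variables that are constant within id, such as setting here, are carried along unchanged by reshape.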


    • #3
      Thanks Nick. I have made some progress after reshaping to long. I produced catplot by question long_reshape.docx using this code: catplot ipos_answer, by(question) percent(question) ylabel(none) blabel(bar, format(%4.1f) pos(top))

      I ran it separately for each setting, but this still doesn't get me quite what I want. Do you know a way to add setting successfully using catplot or another command? Everything I have tried is eligible. Ideally I would have four graphs for each question (one for each setting), with each question on a separate row, as in the 'example of ideal chart' attached to my first post.

      I can't post my actual data, but after reshaping the structure is 4 variables: id, setting, question, answer.

      Thank you for your help.



      • #4
        Please note that Word documents are deprecated here: many people can't (or won't) open them. For example, on the machine I am currently using I can't open that document at all. The point is covered at

        Does "eligible" mean "illegible"?

        I'll add more later when I can read that file. But posting graphs as .png (use attachment icon, not photo icon) would help.


        • #5
          Sorry! I will repost as .png. And yes, I mean illegible. I'm in a meeting for the next hour.
          Thank you.


          • #6
            As I understand it, you have 10 questions × 4 settings × 7 possible answers. That's 40 bar charts with 7 bars each. In principle it can be done; in practice it is hard work to make this at all legible without putting it on a poster that few might want to read. If anything, this demo understates the problem, as the real case needs extra text.

            To document what is elsewhere stated: catplot and tabplot are from SSC and must be installed first.
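            The installation is a one-off; the replace option just overwrites any older copies already installed:

            ```stata
            * Install the community-contributed commands from SSC
            ssc install catplot, replace
            ssc install tabplot, replace
            ```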

            I haven't much explored the scope for stacked (divided, segmented) bar charts, which some might prefer here.

            My first attempt is a disaster.

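            clear
            set obs 4000
            * fake data: 10 questions x 4 settings, 100 answers per cell
            egen question = seq(), to(10) block(100)
            egen setting = seq(), to(4) block(1000)
            set seed 2803
            * Mata: draw answers 1-7 with the given probabilities
            mata:
            st_addvar("int", "answer")
            st_store(., "answer", rdiscrete(4000, 1, (0.05, 0.1, 0.2, 0.3, 0.2, 0.1, 0.05)'))
            end
            catplot answer question, by(setting)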
            [Image: joanna1.png]

            My second attempt is not so bad.

            tabplot question answer, by(setting, compact note("")) showval(mlabsize(*.5)) ysize(7)
            [Image: image_3535.png]

            Last edited by Nick Cox; 25 Nov 2015, 05:58.


            • #7
              Joanna Davies, it may be easier to first collapse the data (or use the table command to do the equivalent) to create frequencies for the aggregations of interest, and then visualize the summarized data. There's almost certainly a way to do it from the raw data as well, but if you need to turn something around quickly, it may be easier to aggregate the data first and then follow the advice already given about reshaping/structuring the data before graphing.
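              A rough sketch of the aggregate-first route, using the official contract command on invented data with the long structure described in #3 (setting, question, answer). It produces one row per setting × question × answer cell with a count and a within-question percentage; pct could then be graphed directly, for example with graph hbar (asis).

              ```stata
              * Invented long data: setting question answer
              clear
              set obs 4000
              set seed 2803
              generate setting = 1 + floor(4 * runiform())
              generate question = 1 + floor(10 * runiform())
              generate answer = 1 + floor(7 * runiform())

              * Count patients in each setting x question x answer cell
              contract setting question answer, freq(n)

              * Convert counts to percentages within each setting x question
              bysort setting question: egen total = total(n)
              generate double pct = 100 * n / total
              format pct %4.1f
              ```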


              • #8
                Billy: I think the data structure Joanna has is fine. The problems lie elsewhere in the amount of detail that is of interest.


                • #9
                  Nick Cox, the tabplot example is really well put together. I can't see a larger version of the image, but the extra, strategically placed white space appears to make it much easier to read.


                  • #10
                    The code is self-contained in producing a fake dataset with the same structure, so you can reproduce the graph with your Stata.

                    The problem of adding informative text remains for the real data.


                    • #11
                      Thanks for the tabplot code example. I've produced something similar to your second attempt:
                      [Image: Graph.png]

                      However, I think the earlier catplot is actually easier to read: catplot ipos_answer, by(question) percent(question) ylabel(none) blabel(bar, format(%4.1f) pos(top))

                      [Image: catplot.png]

                      Do you think any other commands might be useful? Or should I give up and look outside Stata, maybe at Excel or Tableau?

                      Thank you again for your help.


                      • #12
                        Naturally the catplot for one setting is easier to read than the tabplot for all four. That's not a fair comparison.

                        In the tabplot the ytitle and xtitle could both be deleted without loss.
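                        A sketch of the earlier tabplot call with both axis titles blanked out. To keep it short, the fake answers are drawn uniformly with runiform() rather than with the Mata probabilities used earlier, so the bar heights will differ.

                        ```stata
                        * Fake data: 10 questions x 4 settings x 7 possible answers
                        clear
                        set obs 4000
                        set seed 2803
                        egen question = seq(), to(10) block(100)
                        egen setting = seq(), to(4) block(1000)
                        generate answer = 1 + floor(7 * runiform())

                        * Same tabplot call as before, with ytitle and xtitle suppressed
                        tabplot question answer, by(setting, compact note("")) showval(mlabsize(*.5)) ysize(7) ytitle("") xtitle("")
                        ```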

                        In the catplot you don't need all the question labels to be repeated so much. That's probably for the Graph Editor.

                        I have never used Tableau. I once read two awful books on it, but they are no doubt not to be taken as indicting the program. I don't advise using Excel for graphics. Clearly I can't and won't rule out the possibility of something better in other software, but the root difficulty here is trying to squeeze a lot of information on to a single display while remaining legible and intelligible. There's no magic bullet for that.

                        If this were my problem, I would fall back on one graph for each setting and expect to tell people in text what is interesting or surprising. At some point you have to change viewpoint and focus on what the reader will find an easy and effective display.
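                        One way to produce one graph per setting without any cutting and pasting is to loop over the levels of setting and export each graph as .png. The fake data, graph names and file names below are hypothetical, and the catplot options simply mirror those in #11.

                        ```stata
                        * Fake long data: setting question answer
                        clear
                        set obs 4000
                        set seed 2803
                        egen question = seq(), to(10) block(100)
                        egen setting = seq(), to(4) block(1000)
                        generate answer = 1 + floor(7 * runiform())

                        * One catplot (SSC) per setting, each exported to its own file
                        levelsof setting, local(settings)
                        foreach s of local settings {
                            catplot answer if setting == `s', by(question) percent(question) blabel(bar, format(%4.1f) pos(top)) name(setting`s', replace)
                            graph export "setting`s'.png", replace
                        }
                        ```

                        Run from a do-file, this can be repeated unchanged on each new data extract.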

                        Unless all your graphics are produced that way, the default blue background for s2color will be a distraction. I switched to s1color some time ago. It doesn't match all my preferences, but whenever I write my own graph scheme I forget what it's called.
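                        Switching scheme is a one-liner; adding the option , permanently to set scheme also makes it the default for future sessions.

                        ```stata
                        * White-background scheme for all graphs in this session
                        set scheme s1color
                        ```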


                        • #13
                          I should also say that the reason I'm pursuing this is to develop a template for national reporting, to be repeated across services several times per year. That is why I'm keen for it to be automated (no cut and paste). I'd like the cleanest, simplest solution, and would like to use Stata if I can.


                          • #14
                            Thanks, Nick. Good point re reducing axis titles and changing colour. I appreciate your help. I'll endeavour to post the final graph here at some point.


                            • #15
                              We agree: this is Statalist!