  • Reshape data to create a stacked horizontal bar graph, such as catplot from SSC, for several variables

    Dear all,

    First of all, I am really sorry this post is so long-winded. I was not sure how much information to give, since I am asking both about - reshape - and about how to create a stacked bar graph for several (3 to 12) variables. My questions are at the end of this post, so if everything else seems clear to you, please jump to the end.

    I am running Stata 16 IC on Windows and am currently working on survey data with ordinal variables, a number of which relate to the same overall question. For example, a question from the survey could be "To what extent are these factors of importance to your choice of teaching methods?" This is followed by a number of claims/questions (i.e. the variables), such as "Available resources (equipment, financial resources, support for the development of new teaching methods, etc.)", "Available time for planning / development", "Feedback/expectations from the students", etc., which the respondent is asked to "grade" on a scale from 1 to 5 (where 1 might be "Not at all" and 5 might be "Always").

    My data contain several such batteries of questions, and I would like to create a graph for each battery so that the reader of the end result (a report) can compare the respondents' ratings across the related questions.

    I have already made bar graphs of the means for the batteries of questions. Each of these graphs therefore shows information for several variables (from 3 to 12, depending on how many questions the battery contains).

    What I need to do now is create similar graphs for each battery, only this time showing the percentage for each value of each variable.

    I have been using Prof. Nick Cox's program - catplot -
    Code:
     ssc install catplot
    However, I have only used it for one variable. Creating stacked bars that show the percentage of each value for several variables, with a common legend and a percentage axis running from 0 to 100, is not as easy (or perhaps I am just not smart enough).

    After reading a number of related posts on this forum, I noticed that people asking for help were often advised to use either - tabplot -
    Code:
     ssc install tabplot
    or - statplot -
    Code:
     ssc install statplot
    ...or it was suggested that they reshape their data from wide into long form.

    My superior wants - catplot -type graphs, so unless - tabplot - and/or - statplot - can create the kind of stacked percentage bars I am looking for, I think I am limited to what I can do with - catplot -.

    I then started experimenting with a small working example of my data, which currently looks like this:

    Code:
    clear
    input lpenr b1_var1 b1_var2 b1_var3 nivaa
     1  1  4  3  1
     2  4  .  5  1
     3  5  3  4  2
     4 .a  2  1  3
     5  5  1  1  2
     6  2  1  2  1
     7  3  4  3  1
     8  4  3  5  2
     9  2  5  4  3
    10  1 .a  .  2
    end
    ...and then I manually (!) reshaped it into this form:

    Code:
    clear
    input float lpenr battcount b1vars nivaa
     1 1  1 1
     2 1  4 1
     3 1  5 2
     4 1 .a 3
     5 1  5 2
     6 1  2 1
     7 1  3 1
     8 1  4 2
     9 1  2 3
    10 1  1 2
     1 2  4 1
     2 2  . 1
     3 2  3 2
     4 2  2 3
     5 2  1 2
     6 2  1 1
     7 2  4 1
     8 2  3 2
     9 2  5 3
    10 2 .a 2
     1 3  3 1
     2 3  5 1
     3 3  4 2
     4 3  1 3
     5 3  1 2
     6 3  2 1
     7 3  3 1
     8 3  5 2
     9 3  4 3
    10 3  . 2
    end

    Note that I did try to use the - reshape - command, but I wasn't able to make it work. Stata either told me that my data were already in long form, or told me that the variable battcount, which I was trying to create with the command
    Code:
     reshape long b1, i(lpenr) j(battcount)
    was empty. Either way, my data are still in the first form listed above.

    When I reshaped the data manually and then called - catplot - to create the graph I wanted, it seemed to work. The command I wrote, which seems to do what I am looking for (though I am not sure of this, since I am sadly not very skilled at Stata), was:

    Code:
    la def battcount 1 "Variable 1" 2 "Variable 2" 3 "Variable 3"
    la val battcount battcount
    
    catplot b1vars, over(battcount) percent(battcount) asyvars stack ///
        blabel(bar, pos(center) format(%2.1f)) ///
        ytitle(Percent) yla(0(20)100) ///
        legend(row(1) keygap(0.5) symxsize(4) size(small) region(lcolor(gs8))) ///
        scheme(s1mono) ///
        bar(1, fcolor(eggshell)) bar(2, fcolor(ltkhaki)) ///
        bar(3, fcolor(olive_teal)) bar(4, fcolor(bluishgray)) bar(5, fcolor(ltblue)) ///
        ysize(3.5) plotregion(lcolor(none))
    The resulting graph is this one:


    Click image for larger version

Name:	catplot__three variables-five-values-stacked-percentage-bars.png
Views:	1
Size:	19.2 KB
ID:	1614875


    My questions are:

    1) How can I reshape my data so that it gets from the "wide" form that I currently have, to the long form which I would like? (remember, the long format I posted above was created manually, and I need to do it in code, since my data set has over 7000 observations)

    2) Is what I have done in this example a meaningful way of getting to where I would like to go (given that I cannot use - statplot - or - tabplot -) or are there any other graph commands that may more easily create the same/very similar result?

    3) The variable labels listed to the left of the bars in the graph are short, but a number of my variable labels are long (i.e. they are questions). I came across a previous post where Nick Cox posted some code creating a program called - splitvarlabels - based on the program - splitvallabels -
    Code:
     ssc install splitvallabels
    and I have used it with some success, but not for several variables in the same graph and not for stacked bars. Any help on this matter is much appreciated, as I can see this becoming a problem when I create these graphs.

    Thank you so much for reading this. I hope someone can help me out.

    Kind regards,
    Hilde

  • #2
    1) How can I reshape my data so that it gets from the "wide" form that I currently have, to the long form which I would like? (remember, the long format I posted above was created manually, and I need to do it in code, since my data set has over 7000 observations)
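    The stub is the part of the names

    b1_var1 b1_var2 b1_var3

    that comes before the numeric suffix, i.e. b1_var, and reshape takes the suffixes 1, 2, 3 as the values of j(). With b1 as the stub, what follows the stub (_var1, _var2, _var3) is not a number, so no valid values for battcount can be found, which is presumably why Stata reported it as empty. So the reshape command should be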

    Code:
    reshape long b1_var, i(lpenr) j(battcount)
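Because the full data set contains several batteries, the same reshape can be run once per battery. The sketch below is an illustration, not something tested on the real data: it assumes every battery follows the b1_var1, b1_var2, ... naming pattern, so that each battery has its own stub. Only b1_var exists in the example data; any further stubs (such as the hypothetical b2_var mentioned in the comment) must match the real variable names. - preserve - and - restore - bring back the wide data after each graph, and the - tabulate - line prints the row percentages that the stacked bars should show, which is a quick check on the graph.

Code:
```stata
* Sketch: one stacked percentage graph per battery of questions
* Requires: ssc install catplot
* Example data from this thread (only battery 1 exists here)
clear
input float(lpenr b1_var1 b1_var2 b1_var3 nivaa)
 1  1  4 3 1
 2  4  . 5 1
 3  5  3 4 2
 4 .a  2 1 3
 5  5  1 1 2
 6  2  1 2 1
 7  3  4 3 1
 8  4  3 5 2
 9  2  5 4 3
10  1 .a . 2
end

* One stub per battery. Assumption: other batteries use the same
* pattern (e.g. a hypothetical b2_var1, b2_var2, ...) and their
* stubs would be appended to this list
local stubs b1_var

foreach stub of local stubs {
    preserve
    * keep the identifier and this battery's items only
    keep lpenr `stub'*
    reshape long `stub', i(lpenr) j(item)
    * row percentages: the segments of each stacked bar
    tabulate item `stub', row nofreq
    catplot `stub', over(item) percent(item) asyvars stack ///
        ytitle(Percent) yla(0(20)100) name(`stub', replace)
    restore
}
```

Looping with - preserve -/- restore - keeps the original wide data intact, so batteries with different numbers of items (3 to 12) can each be reshaped on their own.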

    3) The variable labels listed to the left of the bars in the graph are short, but a number of my variable labels are long (i.e. they are questions). I came across a previous post where Nick Cox posted some code creating a program called - splitvarlabels - based on the program - splitvallabels -
    Code:
    ssc install splitvallabels
    and I have used it with some success, but not for several variables in the same graph and not for stacked bars. Any help on this matter is much appreciated, as I can see this becoming a problem when I create these graphs.
    It is possible to split a label over several lines, e.g., with the relabel() suboption of over(), giving each line its own pair of double quotes inside compound double quotes. I am not advanced in automating this process, as I never have to do it, so perhaps Nick or someone else can advise you further.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(lpenr b1_var1 b1_var2 b1_var3 nivaa)
     1  1  4 3 1
     2  4  . 5 1
     3  5  3 4 2
     4 .a  2 1 3
     5  5  1 1 2
     6  2  1 2 1
     7  3  4 3 1
     8  4  3 5 2
     9  2  5 4 3
    10  1 .a . 2
    end
    
    
    reshape long b1_var, i(lpenr) j(battcount)
    rename b1_var b1vars
    la def battcount 1 "Variable 1" 2 "Variable 2" 3 "Variable 3"
    la val battcount battcount
    
    catplot b1vars, over(battcount, relabel(1 `""A very very long label" "split over two lines""')) ///
        percent(battcount) asyvars stack ///
        blabel(bar, pos(center) format(%2.1f)) ///
        ytitle(Percent) yla(0(20)100) ///
        legend(row(1) keygap(0.5) symxsize(4) size(small) region(lcolor(gs8))) ///
        scheme(s1mono) ///
        bar(1, fcolor(eggshell)) bar(2, fcolor(ltkhaki)) ///
        bar(3, fcolor(olive_teal)) bar(4, fcolor(bluishgray)) bar(5, fcolor(ltblue)) ///
        ysize(3.5) plotregion(lcolor(none))
    [Graph: output of the code above]
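The relabel() approach can be automated by taking the labels from the data instead of typing them. This is a sketch under stated assumptions: it assumes the question texts are stored as variable labels on b1_var1, b1_var2, b1_var3 (the three texts below are the example items from the first post, attached here purely for illustration). A loop copies each variable label into a value label for battcount before the reshape, and - splitvallabels - (ssc install splitvallabels) then wraps the long labels. That - splitvallabels - leaves a ready-made relabel() specification in r(relabel), and the length() value of 30 used here, are assumptions based on its help file; please check - help splitvallabels - before relying on them.

Code:
```stata
* Sketch: axis labels taken from variable labels and wrapped automatically
* Requires: ssc install catplot, ssc install splitvallabels
clear
input float(lpenr b1_var1 b1_var2 b1_var3 nivaa)
 1  1  4 3 1
 2  4  . 5 1
 3  5  3 4 2
 4 .a  2 1 3
 5  5  1 1 2
 6  2  1 2 1
 7  3  4 3 1
 8  4  3 5 2
 9  2  5 4 3
10  1 .a . 2
end

* Illustration only: attach the example items from the first post as
* variable labels (in the real data they would already be there)
label variable b1_var1 "Available resources (equipment, financial resources, support for the development of new teaching methods, etc.)"
label variable b1_var2 "Available time for planning / development"
label variable b1_var3 "Feedback/expectations from the students"

* Copy each variable label into a value label for the future j() variable
capture label drop battcount
forvalues j = 1/3 {
    local lab : variable label b1_var`j'
    label define battcount `j' `"`lab'"', add
}

reshape long b1_var, i(lpenr) j(battcount)
label values battcount battcount

* Assumed behaviour: splitvallabels stores a relabel() spec in r(relabel);
* length(30) is an assumed maximum line length in characters
splitvallabels battcount, length(30)
catplot b1_var, over(battcount, relabel(`r(relabel)')) percent(battcount) ///
    asyvars stack ytitle(Percent) yla(0(20)100)
```

For a battery with 12 items, the loop would run over 1/12 instead of 1/3; the labels then travel with the data, so nothing has to be typed by hand.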


    • #3
      Thank you so much, Andrew Musau, that was really helpful and of course your code works like a charm!

      I had actually been trying to do what you have now shown me (i.e. splitting the labels manually using compound double quotes) as a stopgap while I figure out how to automate it, but I hadn't managed to. Clearly I need to get a better grip on compound double quotes, and on reshape too. Back to reading the Stata programming manual.

      Thank you for helping me with this, I really appreciate it!

      All the best,
      Hilde
