  • Creating graphs showing N for number of respondents using - graph hbar (mean) - and - catplot -

    Dear Stata-listers

    This is my first time posting to this list, so I hope I'm able to do this "the right way".

    What I would like to do is create a number of graphs using - graph hbar - and have these graphs show the number of respondents (N) for each variable. Some graphs contain up to 7 different variables, for which my current code calculates the mean values. I would like the graph to show N (respondents) for each variable to the left in the graph and the mean value of the variable to the right (outside the particular bar) (see example graph provided below).

    So, as the example graph illustrates, I am currently able to have my graphs show N when I use the user-written program - catplot - ( - catplot - has been written by Kit Baum and can be found on SSC ( - findit catplot - )), which plots categorical data as frequencies, fractions or percents in Stata. I hope that the code further down in this post lets you generate the same graph using auto.dta, provided by Stata. This code gives me what I need for graphs showing percentages for each value of a variable, separately for a number of groups, along with N for each group of respondents that answered the particular question. (My own data are student survey data, so the variables have Likert scales and the categories are typically master's and bachelor's students, or students attending different study programmes, such as nursing, engineering, law etc.)

    However, I'm not able to do this for mean values of the variables when using the command - graph hbar - .
    (By the way, these particular graphs are sometimes drawn separately for different groups and sometimes not)

    Could anyone please advise me on how I can create a graph showing mean values (using - graph hbar - ) instead of percentages, with N included in the same manner as in the graphs I have made using - catplot - ? (See the example graph, which should be attached below.)


    ************************************************************************************
    // An example using auto.dta and - catplot - :


    //First: making variable(s) containing origin of cars

    sysuse auto, clear


    ta foreign
    ta foreign, nol

    set more off
    foreach var of varlist rep78 { //My own loop contains several variables, but for simplicity there's only 1 variable now
    ge N3`var' = 0 if foreign==0 & `var' <=5 //Domestic
    replace N3`var' = 1 if foreign==1 & `var' <=5 //Foreign
    }

    capture la drop origin2

    la def origin2 0"Domestic" 1"Foreign"

    la val N3rep78 origin2

    label list origin2

    capture la drop N3rep78

    foreach var of varlist N3rep78 {
    la var `var' "Prep for N for origin in graphs"
    }


    ta rep78
    ta N3rep78

    ta rep78 foreign

    //Second: drawing the - catplot - graph:

    capture drop clone
    capture drop origin2

    foreach val of varlist N3rep78 {
    clonevar clone = `val'
    decode `val', gen(origin2)
    bysort `val': replace origin2=origin2 + " (n="+string(_N)+")"
    labmask clone, values(origin2)
    #delimit ;
    catplot rep78 clone, stack asyvars percent(clone)
    bar(1, fcolor(gs15)) bar(2, fcolor(gs14)) bar(3, fcolor(gs13)) bar(4, fcolor(gs12)) bar(5, fcolor(gs11))
    blabel(bar, pos(center) format(%2.0f)) legend (pos(bottom) col(5))
    ysize(3) yla(0(20)100)
    plotregion(lcolor(none))
    scheme(s1mono)
    title("Repairs for foreign and domestic cars")
    ytitle("(Percentages calculated for each value of rep78)")
    note(" " " Note: Mean of rep78 for foreign =__ and for domestic=__", span) //<--this I would ideally have liked to automate (i.e. include mean values), but I have not been able to
    legend(keygap(0.5) symxsize(9))
    name(rep78_origin_N1 , replace) ;
    drop clone origin2 ;
    #delimit cr
    }


    ************************************************************************************

    // Here follows my code for the graphs I would like to include N, but have not been able to:

    // graph hbar over two categories

    //here I would like to have N to say 48 and 21, since this is the number of domestic and foreign cars in rep78:

    ta rep78 foreign


    #delimit ;
    graph hbar (mean)rep78 ,
    over(foreign, relabel(1"DOMESTIC" 2"FOREIGN") gap(*2.5) label(labcolor(gs1)))
    showyvars
    yvaroptions(relabel(1 "Repair records")
    gap(*1.5) label(labcolor(black) labsize(small)))
    bar(1, fcolor(gs9))
    blabel(bar, pos(outside) format(%12.1f))
    ysize(3) yla(1(1)5)
    exclude0
    legend(off)
    plotregion(lcolor(none))
    scheme(s1mono)
    title("Mean repair record for domestic and foreign cars" " ", size(large) span)
    ytitle(" ""(Scale: Number of repairs)" "", size(small))
    name(rep78_origin_N2 , replace) ;
    graph save rep78_origin_N2, replace;
    #delimit cr


    // graph hbar - not separately for any categories

    //here I would like N to show that N = 69:

    ta rep78

    #delimit ;
    graph hbar (mean)rep78,
    asyvars
    showyvars
    yvaroptions(relabel(1 "Repair record")
    gap(*1.5) label(labcolor(black) labsize(small)))
    bar(1, fcolor(gs9))
    blabel(bar, pos(outside) format(%12.1f))
    ysize(3) yla(1(1)5)
    exclude0
    legend(off)
    plotregion(lcolor(none))
    scheme(s1mono)
    title("Mean repair record for cars, regardless of origin" " ", size(large) span)
    ytitle(" ""(Scale: Number of repairs)" "", size(small))
    name(rep78_origin_N3 , replace) ;
    graph save rep78_origin_N3, replace;
    #delimit cr



    I guess some of the problem is caused by me using the - relabel - option in graph hbar.
    However, I would not like to have the graphs show simply the variable names, since these are not descriptive enough for my audience



    Any help on this matter is greatly appreciated. Thank you all so much in advance.

    Best wishes,
    Hilde Johanne



  • #2
    I read this twice and don't really understand what your question is, partly because you seem to answer it yourself. If your main graph shows means, the main way to show sample sizes too is by added text, somehow, say by modifying value labels to include that information.

    For the record, I don't think my great friend Kit Baum is the author of catplot (SSC). Last I heard, that was me.
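
    The value-label idea in this reply can be sketched as follows. This is a minimal illustration on auto.dta, not code posted in the thread: the label name origin_n is hypothetical, and n is assumed to mean the number of nonmissing rep78 values in each group.

    ```stata
    sysuse auto, clear

    * Build a new value label (hypothetical name: origin_n) whose text
    * is the existing label of foreign plus the group size
    capture label drop origin_n
    forvalues i = 0/1 {
        quietly count if foreign == `i' & !missing(rep78)
        local n = r(N)
        local lab : label (foreign) `i'
        label define origin_n `i' "`lab' (n=`n')", modify
    }
    label values foreign origin_n

    * over() now picks up the modified labels without any relabel()
    graph hbar (mean) rep78, over(foreign) ///
        blabel(bar, pos(outside) format(%12.1f))
    ```

    Because the counts live in the value label, the same labels are reused by any later graph that has over(foreign).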



    • #3
      Something like this?

      Code:
      // graph hbar over two categories
      
      //here I would like to have N to say 48 and 21, since this is the number of domestic and foreign cars in rep78:
      
      ta rep78 foreign
      
      
      levelsof foreign, loc(f)
      foreach n in `f' {
          qui su foreign if foreign == `n'
          loc f`n' `"`r(N)'"'
          }
          
          #delimit ;
      
      graph hbar (mean)rep78 ,
      over(foreign, relabel(1 "DOMESTIC (n=`f1')" 2 "FOREIGN(n=`f2')") gap(*2.5) label(labcolor(gs1)))
      showyvars
      yvaroptions(relabel(1 "Repair records")
      gap(*1.5) label(labcolor(black) labsize(small)))
      bar(1, fcolor(gs9))
      blabel(bar, pos(outside) format(%12.1f))
      ysize(3) yla(1(1)5)
      exclude0
      legend(off)
      plotregion(lcolor(none))
      scheme(s1mono)
      title("Mean repair record for domestic and foreign cars" " ", size(large) span)
      ytitle(" ""(Scale: Number of repairs)" "", size(small))
      name(rep78_origin_N2 , replace) ;
      graph save rep78_origin_N2, replace;
      #delimit cr
      
      
      // graph hbar - not separately for any categories
      
      //here I would like N to show that N = 69:
      
      ta rep78
      
      qui su rep78
       loc rn  "`r(N)'"
       #delimit ;
      
      graph hbar (mean)rep78,
      asyvars
      showyvars
      yvaroptions(relabel(1 "Repair record (n=`rn')")
      gap(*1.5) label(labcolor(black) labsize(small)))
      bar(1, fcolor(gs9))
      blabel(bar, pos(outside) format(%12.1f))
      ysize(3) yla(1(1)5)
      exclude0
      legend(off)
      plotregion(lcolor(none))
      scheme(s1mono)
      title("Mean repair record for cars, regardless of origin" " ", size(large) span)
      ytitle(" ""(Scale: Number of repairs)" "", size(small))
      name(rep78_origin_N3 , replace) ;
      graph save rep78_origin_N3, replace;
      #delimit cr
      Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX



      • #4
        Johanne Karlsen, and new members in general: just to expand on Nick's comment about SSC (in #2).

        There are many "resources for adding features to Stata", as we may check here (http://www.stata.com/links/resources...ding-features/).

        The most cited and used one is the SSC (Statistical Software Components) and it is maintained by Kit Baum (among other remarkable deeds, the author of the excellent "An Introduction to Stata Programming").

        Hence the ubiquitous "thanks to Kit Baum" seen in announcements here whenever a new SSC package (or an update) is released acknowledges this fact, not necessarily authorship.
        Last edited by Marcos Almeida; 17 Jan 2017, 06:28.
        Best regards,

        Marcos



        • #5
          First of all, I am so sorry, Nick Cox , I totally misunderstood the information given in an older post here on Statalist. Like Marcos Almeida suggests, I wrongly assumed that "thanks to Kit Baum" meant Kit Baum is the author of - catplot -. Now I had to learn this the hard way, but at least I will not forget it. - catplot - is a really helpful program which I have used a lot lately, so thank you so much for taking your time making it!

          I understand that my questions were not very clear, and I apologise. I have been seriously ill for four years and my brain is simply not working as well as it used to (sorry, this may be too much information, but it might explain some things). Luckily, though, my problem has almost been solved by eric_a_booth. I was in fact trying something like this myself, but I was unable to figure out how to make it work. Just one small thing, which I would be so thankful if you could help me with: when I run your code, the graph I get has correctly inserted n=22 for domestic cars, but the n for foreign cars is missing. All I get is: "FOREIGN (n= ) Repair records". Do you get the same result (graph) when you run your code, or might my problem be caused by me not doing it correctly?

          By the way, please excuse my poor English. I'm Norwegian myself and have never lived abroad, so my vocabulary for formulating intelligent questions is a bit limited.

          Also, thank you both for commenting and for providing this excellent forum for us Stata-lovers!

          PS: I edited this post after noticing that I was thanking the wrong person for the code suggested above.
          Last edited by Johanne Karlsen; 18 Jan 2017, 01:58.



          • #6
            So sorry! It was eric_a_booth 's code that helped me! Please forgive me for initially crediting the wrong person, and thank you so much for helping me. Do you think you can help me solve the small issue that I mentioned in my last post?
            Last edited by Johanne Karlsen; 18 Jan 2017, 01:59.



            • #7
              You are probably running Eric's code line by line, or chunk by chunk, from a do-file editor window. If you do that, the local macros are not visible to the commands issued later. That's what "local" means: visible only locally, in the space within which it is defined. You need to run the code all at once, as a block.
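
              A minimal sketch of that scoping behaviour, assuming auto.dta and a do-file editor (the macro name rn is only illustrative):

              ```stata
              sysuse auto, clear

              * Run these lines together, as one block
              quietly summarize rep78
              local rn = r(N)
              display "n = `rn'"
              ```

              Run as a single block, the display line shows n = 69. If the first three commands are selected and run on their own, and the display line is run separately afterwards, the macro no longer exists and the line shows only "n = ". An empty n in a graph label is the same symptom.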



              • #8
                Thank you for commenting, Nick Cox .

                What I did was I ran this code as a block:


                sysuse auto, clear


                ta rep78 foreign


                levelsof foreign, loc(f)
                foreach n in `f' {
                qui su foreign if foreign == `n'
                loc f`n' `"`r(N)'"'
                }
                #delimit ;
                graph hbar (mean)rep78 ,
                over(foreign, relabel(1 "DOMESTIC (n=`f1')" 2 "FOREIGN (n=`f2')") gap(*2.5) label(labcolor(gs1)))
                showyvars
                yvaroptions(relabel(1 "Repair records")
                gap(*1.5) label(labcolor(black) labsize(small)))
                bar(1, fcolor(gs9))
                blabel(bar, pos(outside) format(%12.1f))
                ysize(3) yla(1(1)5)
                exclude0
                legend(off)
                plotregion(lcolor(none))
                scheme(s1mono)
                title("Mean repair record for domestic and foreign cars" " ", size(large) span)
                ytitle(" ""(Scale: Number of repairs)" "", size(small))
                name(rep78_origin_N2 , replace) ;
                graph save rep78_origin_N2, replace;
                #delimit cr

                I cannot see anything wrong with the code eric_a_booth sent me, but then again I'm not very familiar with using local macros.



                • #9
                  There is a bug in Eric's code. The offending part should be

                  Code:
                  graph hbar (mean)rep78 ,
                  over(foreign, relabel(1 "DOMESTIC (n=`f0')" 2 "FOREIGN (n=`f1')")
                  The distinct values of foreign are 0 and 1 so the local macros will be named accordingly.
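
                  Putting the fix together, a pared-down sketch of the whole block might look like the following. It is an illustration rather than the code as posted: it counts nonmissing values of rep78 within each level of foreign, on the assumption that the labels should show the 48 and 21 from the cross-tabulation. Summarizing foreign itself reports 52 and 22, the number of cars in each group.

                  ```stata
                  sysuse auto, clear

                  * One local macro per level of foreign: f0 and f1
                  levelsof foreign, local(levels)
                  foreach n of local levels {
                      quietly count if foreign == `n' & !missing(rep78)
                      local f`n' = r(N)
                  }

                  * relabel() refers to bar positions 1 and 2,
                  * the macros to the values 0 and 1
                  graph hbar (mean) rep78, ///
                      over(foreign, relabel(1 "DOMESTIC (n=`f0')" 2 "FOREIGN (n=`f1')")) ///
                      blabel(bar, pos(outside) format(%12.1f)) ///
                      name(rep78_origin_N2, replace)
                  ```

                  The positions in relabel() and the values of the variable are separate numbering schemes, which is why the two sets of suffixes need not match.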



                  • #10
                    Sorry about that Johanne & thanks Nick for the fix!
                    I wasn't at a machine with Stata to test that code & I forgot that the levels in foreign are "0" "1", not "1" "2".
                    Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX



                    • #11
                      Thank you so much Nick and Eric! Your help has been wonderful and I have learned something about how I can use local macros now.

                      I would like to learn more, in particular (due to my current work tasks) how to use local macros to store estimates, variable labels and value labels, and then use these in different types of graphs. Do you have any suggestions on literature I can find online (and please consider me a "macro"-novice)?
                      For example, in addition to the obvious need to delve into the Stata manual, are there any Stata journal articles on this particular matter? I guess what I'm looking for is more knowledge not only on local macro functions in general, but extended local macro functions as well?

                      And finally, one last thing: If I have follow-up questions related to this post (that is, questions about adding additional lines of code to Eric's code) should I continue this thread or start a new topic in which I simply refer to this thread?



                      • #12
                        Johanne Karlsen
                        Do you have any suggestions on literature I can find online

                        The Stata Manual is really a "must read". Also, there is a reference mentioned in #4. You may find it (and many other books) here: http://www.stata.com/bookstore/books-on-stata/
                        Best regards,

                        Marcos



                        • #13
                          Chapter 18 of the User's Manual is for me the best starting point. Kit Baum's book on Stata programming is excellent.


                          http://www.stata-journal.com/sjpdf.h...iclenum=pr0005 springs to mind as a tutorial for various reasons.



                          • #14
                            Thank you for your advice, Marcos and Nick! I'll look into these resources, then.



                            • #15
                              I feel bad for already asking more questions on this topic, but I've tried building on eric_a_booth 's code myself and I see that I might need at least one more loop incorporated into his code, and I really don't get how to do it. (I feel the need to say that I worked on my "N in graphs" problem for many days (and weeks in fact, because for some graphs I have figured out a solution on my own) before contacting Statalist in the first place. I just don't want you to think that I have not tried for myself or that I'm asking you to do my job because I can't be bothered myself. It really is not like that.)

                              Anyway, my problem is this: as I mentioned in my initial post, I have to run these types of graphs on several variables at once. So, for example (still using auto.dta for simplicity), in the code below I have added more variables than rep78, but then my code gives me the same N for both variables, which I know is not correct, since rep78 has N = 69 and turn has N = 74. I guess what is happening is that the loop remembers N for the last variable read, but not the first variable, since the loop is "read twice", i.e. once for each variable, and then of course replaces r(N) for the previous variable with r(N) for the current variable in the loop. I feel I am missing at least two essential parts of the code, which I don't know how to write. In some way, I need to specify separate r(N)'s for rep78 and turn. Can anyone please help me?

                              sysuse auto, clear

                              foreach var of varlist rep78 turn {
                              qui su `var'
                              loc rn "`r(N)'"
                              }
                              #delimit ;
                              graph hbar (mean)rep78 turn,
                              asyvars
                              showyvars
                              yvaroptions(relabel(1 "rep78 (n=`rn')" 2 " turn (n=`rn')")
                              gap(*1.5) label(labcolor(black) labsize(small)))
                              bar(1, fcolor(gs9))
                              blabel(bar, pos(outside) format(%12.1f))
                              ysize(3) yla(1(1)5)
                              exclude0
                              legend(off)
                              plotregion(lcolor(none))
                              scheme(s1mono)
                              title("Some title" " ", size(large) span)
                              ytitle(" ""Some ytitle" "", size(small))
                              name(rep78_origin_N3 , replace) ;
                              graph save rep78_origin_N3, replace;
                              #delimit cr

                              Of course, I have the same problem with the other code too (which is run separately for foreign), but that is even more complex to me. There I have to specify four different N's, i.e. separately for the two levels of foreign for several variables.
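
                              One way to keep the counts apart is to put the variable name (and, for the by-group case, the level of foreign) into the macro name, so that each pass through the loop writes to its own macro. The following is a sketch under that assumption; the macro names n_rep78, n_turn, n_rep78_0 and so on are illustrative, and the graph options are cut down to the essentials.

                              ```stata
                              sysuse auto, clear

                              foreach var of varlist rep78 turn {
                                  * Overall N for this variable, e.g. n_rep78
                                  quietly count if !missing(`var')
                                  local n_`var' = r(N)

                                  * N within each level of foreign, e.g. n_rep78_0
                                  forvalues i = 0/1 {
                                      quietly count if foreign == `i' & !missing(`var')
                                      local n_`var'_`i' = r(N)
                                  }
                              }

                              graph hbar (mean) rep78 turn, ///
                                  showyvars legend(off) ///
                                  yvaroptions(relabel(1 "rep78 (n=`n_rep78')" 2 "turn (n=`n_turn')")) ///
                                  blabel(bar, pos(outside) format(%12.1f))

                              * The four by-group counts are available in the same way
                              display "rep78: domestic n=`n_rep78_0', foreign n=`n_rep78_1'"
                              display "turn:  domestic n=`n_turn_0', foreign n=`n_turn_1'"
                              ```

                              As with the earlier code, all of this has to be run as one block so that the local macros are still defined when the graph command is reached.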

                              I understand if you cannot help me, but I just really need to ask for help.

                              Best wishes,
                              Johanne

