  • Clustered bar chart with N values below x-axis in each cluster

    Can someone please help me, as soon as possible, to make the clustered graph below in Stata? I am struggling to make it look this nice.
    Also, how do I insert the N values under each cluster? Thanks in advance!
    Attached Files

  • #2
    This is not really up my alley, and I rarely respond to graph questions unless they are pretty simple, but there are many others on this forum who can help you. I think your chances of getting a timely and helpful response from one of them would be greatly increased if you provide example data for them to work with.

    The most helpful way to give example data is by using the -dataex- command. If you are running version 16 or later, or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
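
    As a rough illustration only, and assuming the treatment-line variables are named line1, line2 and line3 (names chosen for this sketch, not confirmed at this point in the thread), a -dataex- call could look like this:

    ```stata
    * Assumption: the three treatment-line variables are called line1, line2, line3.
    * List the first 20 observations as a ready-to-paste -input- block.
    dataex line1 line2 line3 in 1/20
    ```

    The output can be copied from the Results window and pasted into a post, so that others can rebuild the example exactly.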



    • #3
      Line 1 Line 2 Line 3
      A C C
      B A A
      D B
      A B
      C D D
      Fictive dataset (I am not allowed to share real data): I have 5 patients receiving different types of medicine, namely A, B, C and D. They get 3 lines, and in each line they get one medicine. I want to plot it exactly as in the chart above, with the three lines (line1, line2, line3) on the x-axis and the percent of patients on the y-axis. I also want to include text under the x-axis showing N = 5 for line1, N = 3 for line2 and N = 4 for line3.
      Last edited by Kim Vaarts; 18 May 2025, 15:57.



      • #4
        Your graph in #1 shows 5 categories for one variable and 5 categories for another. Now you're referring to 5 and 4, which isn't a big difference in principle.

        More puzzlingly your example graph shows several thousand patients but here you seem to be saying that you have 5 patients only.

        So I am lost here, or at least puzzled. I can't think of a graph that improves on the table with some extra numbers that you can calculate by hand.



        • #5
          I used a fictive dataset because I cannot share the real data (I have no permission to share it). In the actual dataset I have a similar number of subjects and lines of medicine to the picture I posted first. There are five types of medicine; subjects are given A, B, C or D. I left some cells empty because I also have missing values in the real dataset. I don't know the command in Stata to make such a clustered graph. Please, someone help.



          • #6
            It's fine to invent datasets when the real data are confidential. You don't need to explain or apologise there, as we urge doing exactly that in the FAQ Advice.
            You don't seem to have read that FAQ Advice yet, or at least you stopped before the final section.

            My problem remains that I still don't follow clearly what you have.

            Here are some guesses. You may need to make several small changes to match your set-up.

            Code:
            clear 
            set seed 314159
            set obs 1200 
            gen line = ceil(_n/400)
            gen drug = runiformint(1, 4) if runiform() < 0.8 
            label def drug 1 "A" 2 "B" 3 "C" 4 "D" 
            label val drug drug 
            
            capture set scheme stcolor 
            
            quietly forval j = 1/3 {
                count if drug < . & line == `j'
                local which = word("first second third", `j')
                label def line `j' `" "`which'" "{it:n} = `r(N)'" "', add 
            }
            
            label val line line 
            
            * you must install this
            ssc inst catplot 
            
            catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars name(D1, replace)
            
            catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars recast(bar) legend(row(1) pos(12)) name(D2, replace)
            [Attached images: D1.png, D2.png]



            • #7
              This is exactly what I need! I just don't understand the code. I already have the Line variables in my dataset: Line 1 through Line 5. Why do I need to generate a new line variable with
              gen line = ceil(_n/400)? Each line is a separate variable, just as in the fictive example: Line1 is one variable, Line2 is another, Line3 is another. The medicines are recorded within each of these variables, so Line1 contains medicine A, B, C, etc., and Line2 likewise contains medicine A, B, C. Why do I need to create a new line variable?

              P.S. I have string variables only.
              Last edited by Kim Vaarts; 18 May 2025, 17:09.



              • #8
                You don't need to generate a new line variable. I do need to do that because you didn't give a very good data example.

                You will need to reshape your data to use my code. String variables should not be a great problem.

                I guess we're in different time zones, and it is late where I am, so either others may be able and willing to answer any other questions or you will have to wait until I can answer them.



                • #9
                  Thank you very much, Nick. Please help me when you're awake. It is also late here, but I have a presentation next week, so I am still working at 1:30 am. I will wait for your response or one from anybody else. I am not strong with Stata but I need to learn it. Sleep well; I hope to hear from you as soon as possible. I appreciate your help, and apologies for my many questions. Goodnight.



                  • #10
                    I tried after reshaping, but it did not work. I cannot write the part of the code where the loop over j produces the N. I need the whole code. I will wait for you and keep trying in the meantime.

                    Last edited by Kim Vaarts; 18 May 2025, 17:40.



                    • #11
                      The first 7 lines of Nick's code in #6 are just there to create a toy data set that demonstrates the approach. The tableau you set out in #3 to illustrate your data was helpful to show the kind of data you had and the way it was arrayed, but it was not something that could actually be directly used in Stata to develop and test the code to solve your problem. So Nick wrote 7 lines of code that would create a data set similar to yours that he could work with.

                      So you don't need to run those first 7 lines: you would replace all of those lines just by a command to -use- your data set. The rest of the code, starting from -capture set scheme stcolor- actually provides the solution to your graphing problem.

                      Your remark that you have string variables only, however, suggests that you will have to modify the code, because the subsequent commands involving the variable line assume it is numeric. I will assume here that the values of the variable line are "first line", "second line", and "third line". Then the solution to your problem would look like this:
                      Code:
                      capture set scheme stcolor
                      
                      levelsof line, local(lines)
                      foreach l of local lines {
                          count if !missing(drug) & line == `"`l'"'
                          replace line = line + " {it:n} = `r(N)'" if line == `"`l'"'
                      }
                      
                      ssc inst catplot
                      
                      catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars name(D1, replace)
                      catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars recast(bar) legend(row(1) pos(12)) name(D2, replace)
                      Added: Crossed with #8, 9, 10.
                      Last edited by Clyde Schechter; 18 May 2025, 17:45.



                      • #12
                        Dear Clyde (and Nick Cox), I don't have two variables, Lines and Drugs. I just have the Line variables: Line1, Line2 and Line3. The drug categories A, B, C are recorded within these variables. In your code you are using two separate variables, Lines and Drugs. Furthermore, the Line variables are string. Please help. I think the answer is in your code and Nick's, but I cannot see it.
                        Last edited by Kim Vaarts; 18 May 2025, 18:34.



                        • #13
                          So, I think I understand what your data looks like. Run this and take a look in the data browser to see if this resembles your data set in the relevant respects:
                          Code:
                          * Example generated by -dataex-. For more info, type help dataex
                          clear
                          input float id str1(line1 line2 line3)
                           1 ""  "D" "C"
                           2 "C" "B" ""
                           3 "C" "B" "D"
                           4 "D" ""  "C"
                           5 "A" ""  "C"
                           6 ""  ""  "C"
                           7 ""  "D" "B"
                           8 "B" "A" "D"
                           9 ""  "A" "D"
                          10 "B" "A" ""
                          11 "D" "A" "B"
                          12 "A" "C" "D"
                          13 "A" "D" ""
                          14 "A" ""  "C"
                          15 "C" ""  "D"
                          16 "C" ""  "B"
                          17 "C" "B" ""
                          18 "A" "C" "B"
                          19 "C" ""  "D"
                          20 "C" ""  "B"
                          21 ""  "C" "D"
                          22 ""  "B" "A"
                          23 "C" "D" "B"
                          24 "C" "B" "A"
                          25 ""  "C" "D"
                          26 "B" "C" "D"
                          27 "D" "A" ""
                          28 "A" "D" "C"
                          29 "A" "C" ""
                          30 "A" ""  "D"
                          31 "D" "C" ""
                          32 ""  "D" "B"
                          33 "D" "C" "A"
                          34 "D" "C" "B"
                          35 "C" "B" "A"
                          36 "C" ""  "A"
                          37 "A" "D" "C"
                          38 "D" "C" "A"
                          39 ""  "C" "A"
                          40 "A" ""  "B"
                          41 "D" ""  ""
                          42 "C" "A" "B"
                          43 "C" "A" "D"
                          44 "D" "C" "B"
                          45 "D" "B" ""
                          46 "D" "B" ""
                          47 "C" "B" "D"
                          48 ""  ""  "A"
                          49 "D" "C" "A"
                          50 "A" "D" "C"
                          end
                          Note that I am assuming that your data set includes some kind of id variable, perhaps a patient MRN, and that that variable uniquely identifies observations in your data set. If you do not have such a variable, and have only the line1, line2, and line3 variables, then you need to create one, which you can easily do just with:
                          Code:
                          gen `c(obs_t)' id = _n
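                          If an identifier already exists, it may be worth confirming that it really is unique before reshaping. A minimal sketch, assuming the identifier is named id as in the example above:
                          ```stata
                          * Assumption: the identifier variable is named id.
                          * -isid- exits with an error if id has missing values
                          * or does not uniquely identify the observations.
                          isid id
                          ```
                          If -isid- returns silently, the -reshape long- step that uses i(id) should not fail on duplicate identifiers.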
                          Assuming that we are now on the same page about what your data looks like, the best solution is to transform your data set so that it looks like what Nick created in #6. Then we can apply Nick's original solution to that:
                          Code:
                          capture set scheme stcolor
                          
                          //  Nick already suggested -reshape-; I'm just giving explicit code here.
                          rename line* _drug*
                          reshape long _drug, i(id) j(line)
                          encode _drug, gen(drug)
                          drop _drug
                          
                          // From here down it's Nick's original code with just one tiny tweak.
                          quietly forval j = 1/3 {
                              count if drug < . & line == `j'
                              local which = word("first second third", `j')
                              label def line `j' `" "`which' line" "{it:n} = `r(N)'" "', add
                          }
                          
                          label val line line
                          
                          catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars name(D1, replace)
                          
                          catplot , over(drug) over(line) percent(line) blabel(bar, format(%2.1f)) asyvars recast(bar) legend(row(1) pos(12)) name(D2, replace)
                          I have eliminated the -ssc install catplot- command because you have clearly already done that, and there is no reason to do it again.

                          I will add that the original organization of your data, with three line variables that, I suppose, are drug names, is not conducive to analysis in Stata. It is not just a matter of this particular graphing problem. It is a matter of Stata working better with data in long layout rather than wide for almost everything. It is likely that whatever other analysis of this data you plan, it will be facilitated by using this revised data organization. To avoid having to re-create it each time, I suggest you actually -save- it as a new data set after you have used it for this purpose.
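
                          As a hedged sketch of that last suggestion, the long data set could be saved and then checked against the graph. The file name drug_lines_long is purely illustrative, and the code assumes it is run after the -reshape- and -encode- steps above:
                          ```stata
                          * Assumption: drug_lines_long is a hypothetical file name; choose your own.
                          save drug_lines_long, replace

                          * In long layout one command gives the percent of each drug within
                          * each line, which should match the bar labels (missing drug excluded).
                          tabulate line drug, row
                          ```
                          The row percentages from -tabulate- offer a quick way to verify the numbers shown on the bars.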



                          • #14
                            Thank you very much! It works! I am so happy! I have been struggling with this for two weeks straight, and in one night you have helped me. You don't know how much this means to me. THANK YOU! One last question: I get the following graph, see below. The layout needs some work. Is there code somewhere that I can copy and change myself? I am not good with Stata. I need to see the code, then copy it and make changes step by step so that I can see visually what each change does; otherwise I cannot do it. The help function in Stata does not help me. This is a programming language and I am bad at it. P.S. I copied and pasted just a part of the graph for privacy reasons. They are very strict with the data. Thanks in advance and God bless!

                            Attached Files



                            • #15
                              The problem is that with the number of lines and drugs you have, the number of bars is large enough that you can't really accommodate all those percentages at the bar ends without them overlapping. All this requires is that you specify a smaller text size for the bar labels. You want to find one that fits gracefully in the available space, but is still large enough to read. Try this:

                              Code:
                              catplot , over(drug) over(line) percent(line) blabel(bar, size(vsmall) format(%2.1f)) asyvars name(D1, replace)
                              
                              catplot , over(drug) over(line) percent(line) blabel(bar, size(vsmall) format(%2.1f)) asyvars recast(bar) legend(row(1) pos(12)) name(D2, replace)
                              If that's not a good size, you can go larger or smaller by choosing from among the sizes you will find by running -help textsizestyle-.
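                              If none of the named sizes fits, a relative size is another option. The following is only a sketch; the factor 0.8 is an arbitrary starting point to adjust by eye:
                              ```stata
                              * size(*0.8) scales the bar-label text to 80% of its default size.
                              * The factor 0.8 is an assumption for illustration; try other values.
                              catplot , over(drug) over(line) percent(line) blabel(bar, size(*0.8) format(%2.1f)) asyvars recast(bar) legend(row(1) pos(12)) name(D2, replace)
                              ```
                              Changing one number at a time and redrawing makes it easy to see what each change does.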

