  • Bar plot comparing multiple variables among two groups

    Hi all,

    I am trying to draw a bar plot comparing multiple variables between two groups. var1 is the group indicator (1 or 2). The figure should have four bars. From left to right: the mean of var2 in group 1, the mean of var2 in group 2, the mean of var3 in group 1, and the mean of var3 in group 2. The first and second bars are side by side, as are the third and fourth, with a gap between the second and third bars. The first and third bars use one color (red), while the second and fourth use another (blue).

    Thank you in advance.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(var1 var2 var3)
    1 1 1
    1 0 0
    1 1 0
    1 1 0
    2 0 1
    2 0 1
    2 0 1
    2 1 0
    end

  • #2
    You can do this with statplot from SSC. statplot is a good search term to find several discussions here on Statalist.


    Code:
    clear
    input float(var1 var2 var3)
    1 1 1
    1 0 0
    1 1 0
    1 1 0
    2 0 1
    2 0 1
    2 0 1
    2 1 0
    end
    
    * a fairly standard plot, which is not what you want 
    graph bar var2 var3, over(var1) bar(1, color(red)) bar(2, color(blue))
    
    ssc inst statplot 
    
    statplot var2 var3, over(var1)  asyvars bar(1, color(red)) bar(2, color(blue)) recast(bar)
    Note that "Thanks in advance" divides the world. See e.g. https://www.businesswritingblog.com/...n-advance.html
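
    If installing statplot is not an option, a similar layout can probably be obtained with official commands by reshaping to long form first, in the spirit of the reshape approach used later in this thread for box plots. This is only a sketch: the names id, yvar2, yvar3 and which are introduced here for illustration and are not part of the original example.

    ```stata
    clear
    input float(var1 var2 var3)
    1 1 1
    1 0 0
    1 1 0
    1 1 0
    2 0 1
    2 0 1
    2 0 1
    2 1 0
    end

    * hypothetical observation identifier, needed by reshape
    gen long id = _n

    * give both outcomes a common stub so that reshape can stack them
    rename (var2 var3) (yvar2 yvar3)
    reshape long y, i(id) j(which) string

    * the first over() becomes the coloured bars, the second over() forms the groups with a gap between them
    graph bar (mean) y, over(var1) over(which) asyvars bar(1, color(red)) bar(2, color(blue))
    ```

    After the reshape, which holds the strings "var2" and "var3", so the groups on the axis are labelled with the original variable names.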



    • #3
      Thank you, Nick.



      • #4
        Hey Nick!

        My situation is similar to Chris Liu's. The ssc stataplot worked fine for me (see screenshot below), but I would like to use box plots instead of bars. However, I checked
        Code:
        help statplot
        and it seems it doesn't support box plots. Is there a way to show this group comparison with box plots?

        Here is a sample of my dataset:
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input float(Id status) byte task float(Idstrong efficacy AItrust) byte(Hilfreich owntrust)
         1 1 1   6 6.4 2.3333333 1 4
         1 1 0 1.5 3.2 4.3333335 6 5
         2 0 1 1.5   4  5.666667 7 2
         2 0 0 2.5 3.8  6.333333 6 5
         3 1 1   7   7         2 1 6
         3 1 0 5.5 4.6  2.666667 2 7
         4 1 1   1   5         2 1 2
         4 1 0   4 2.8 2.3333333 6 5
         5 0 1 1.5 6.6  2.666667 5 3
         5 0 0   2 5.8         5 6 6
         6 0 1   4 6.2         5 5 5
         6 0 0   3 4.6         5 6 5
         7 0 1 1.5   6  5.333333 5 6
         7 0 0   2 3.4  2.666667 7 5
         8 0 1   2 5.8         6 2 2
         8 0 0   6 5.4  5.666667 2 5
         9 0 1   2 5.8 4.6666665 6 1
         9 0 0 4.5 4.6         6 5 6
        10 0 1 1.5 4.6         5 3 2
        10 0 0 6.5 4.2  3.666667 5 6
        end
        label values status statuslbl
        label def statuslbl 0 "Non-experts", modify
        label def statuslbl 1 "Experts", modify
        label values task tasklbl
        label def tasklbl 0 "Information", modify
        label def tasklbl 1 "Poetry", modify
        [Image: Screenshot 2025-08-16 171825.png]


        I would appreciate your help with my situation!



        • #5
          statplot indeed has no bearing on box plots. It seems that you need the groupyvars option as introduced in Stata 19.



          • #6
            To flesh this out

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float(Id status) byte task float(Idstrong efficacy AItrust) byte(Hilfreich owntrust)
             1 1 1   6 6.4 2.3333333 1 4
             1 1 0 1.5 3.2 4.3333335 6 5
             2 0 1 1.5   4  5.666667 7 2
             2 0 0 2.5 3.8  6.333333 6 5
             3 1 1   7   7         2 1 6
             3 1 0 5.5 4.6  2.666667 2 7
             4 1 1   1   5         2 1 2
             4 1 0   4 2.8 2.3333333 6 5
             5 0 1 1.5 6.6  2.666667 5 3
             5 0 0   2 5.8         5 6 6
             6 0 1   4 6.2         5 5 5
             6 0 0   3 4.6         5 6 5
             7 0 1 1.5   6  5.333333 5 6
             7 0 0   2 3.4  2.666667 7 5
             8 0 1   2 5.8         6 2 2
             8 0 0   6 5.4  5.666667 2 5
             9 0 1   2 5.8 4.6666665 6 1
             9 0 0 4.5 4.6         6 5 6
            10 0 1 1.5 4.6         5 3 2
            10 0 0 6.5 4.2  3.666667 5 6
            end
            label values status statuslbl
            label def statuslbl 0 "Non-experts", modify
            label def statuslbl 1 "Experts", modify
            label values task tasklbl
            label def tasklbl 0 "Information", modify
            label def tasklbl 1 "Poetry", modify
            
            * this will work in Stata 19 and up
            graph box Idstrong-owntrust, over(status) groupyvars name(B1, replace)
            
            * otherwise 
            rename (Idstrong-owntrust) (y=)
            
            reshape long y, i(Id task) j(which) string 
            
            graph box y, over(status) over(which) name(B2, replace) asyvars 
            
            label def Which 1 Idstrong 2 efficacy 3 AItrust 4 Hilfreich 5 owntrust
            
            encode which, label(Which) gen(Which)
            
            * if the original order of variables is crucial 
            graph box y, over(status) over(Which) name(B3, replace) asyvars
            This is the last graph.

            [Image: box_b3.png]



            • #7
              Hey Nick,

              thank you so much for your prompt answer, it worked perfectly! You are the best!



              • #8
                Hey Nick!

                Just curious: is there a way to show the standard errors of these variables in the bar chart (see my first attached graph) when using stataplot?

                Thank you again for your help!



                • #9
                  Thanks. But it's not obvious that box plots are really helpful here.

                  Following that code with (qplot is from the Stata Journal)

                  Code:
                  qplot y, ms(O) by(Which, compact row(1) note("")) xla(0 0.5 "0.5" 1) yla(1/7, grid glc(black) glp(solid) glw(vthin))
                  I get this:

                  [Image: notabox.png]


                  If variables take only integer scores, box plots aren't good at showing minor modes (or even major modes) and are distorted by the fact that medians and quartiles must be integers or half-integers. Some other variables evidently allow fractions that are halves or thirds and similar quirks apply.

                  I have ignored the status variable here.



                  • #10
                    Hey Nick,

                    Sorry, I didn't express myself clearly enough. What I am trying to do is compare experts and non-experts on each of those variables. I ran a Welch t-test (regardless of whether the data are normally distributed) and want to use graphs to illustrate the differences more directly. Since the t-test compares means, I thought a bar chart (using stataplot), which shows the mean, would be more intuitive than a box plot, which shows the median and the distribution of the data within each group.

                    However, to be more precise, I think I should also include error bars (standard deviation, SD) in the bar chart, something like this:
                    [Image: Bar Chart with Error Bars.png]


                    Is there an extension of stataplot to achieve this? Or even to show the p-value?
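
                    For reference, the Welch comparison described above can be sketched as a loop over the five variables. This assumes the data example from #4 is in memory and treats every row as an independent observation, which may not hold given that each Id appears twice (once per task).

                    ```stata
                    * assumes the data example from #4 has been read in
                    foreach v of varlist Idstrong-owntrust {
                        * welch implies unequal variances and uses Welch's degrees of freedom
                        quietly ttest `v', by(status) welch
                        * r(p) is the two-sided p-value left behind by ttest
                        display "`v': p = " %6.4f r(p)
                    }
                    ```

                    The displayed p-values could then be added to a graph by hand, for example in a note() or text() option.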



                    • #11
                      Sorry; you were asking about box plots back in #4.

                      statplot (not stataplot) allows options of graph bar and so in Stata 19 may well allow confidence intervals to be added.

                      Otherwise search this forum for mentions of dynamite, detonator or plunger plots, which are widely deprecated in statistical graphics, even though still all too popular.

                      See also posts here on
                      cisets from SSC.
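
                      As a rough sketch of what such interval summaries involve, means and t-based 95% confidence limits per group can also be computed with official commands only. This is not the cisets implementation; it assumes the data example from #4 is in memory, and the names ymean, ysd, yn, se, lb and ub are chosen here for illustration.

                      ```stata
                      * assumes the data example from #4 has been read in
                      preserve

                      * stack the five variables under a common stub
                      rename (Idstrong efficacy AItrust Hilfreich owntrust) (yIdstrong yefficacy yAItrust yHilfreich yowntrust)
                      reshape long y, i(Id task) j(which) string

                      * one row per variable and status group
                      collapse (mean) ymean=y (sd) ysd=y (count) yn=y, by(status which)

                      * t-based 95% limits for each mean
                      gen se = ysd / sqrt(yn)
                      gen lb = ymean - invttail(yn - 1, 0.025) * se
                      gen ub = ymean + invttail(yn - 1, 0.025) * se

                      list, sepby(which)
                      restore
                      ```

                      The resulting lb and ub could be passed to twoway rcap or rbar alongside the means.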



                      • #12
                        Code:
                        * this will work in Stata 19 and up
                        graph bar (meanci) Idstrong-owntrust, groupyvars  ///
                        over(status, relabel(1 "Idstrong" 2 "efficacy" 3 "AItrust" 4 "Hilfreich" 5 "owntrust")) ///
                        ciline(lcolor(black))  bar(1, fcolor(stc1*0.2)) bar(2, fcolor(stc2*0.2))



                        • #13
                          Reverting to #1 this shows a different style. At first sight, it's much more code. And you need to install cisets from SSC.

                          https://www.statalist.org/forums/for...-interval-sets

                          But then the philosophy is plot what you like exactly how you like: it's all just a matter of exploiting twoway.

                          (#11 was written on my phone enjoying the sun in the garden. This is done on a computer.)

                          There is much still to be done, say by standardizing on English or German, but not mixing them.

                          Code:
                          * Example generated by -dataex-. For more info, type help dataex
                          clear
                          input float(Id status) byte task float(Idstrong efficacy AItrust) byte(Hilfreich owntrust)
                           1 1 1   6 6.4 2.3333333 1 4
                           1 1 0 1.5 3.2 4.3333335 6 5
                           2 0 1 1.5   4  5.666667 7 2
                           2 0 0 2.5 3.8  6.333333 6 5
                           3 1 1   7   7         2 1 6
                           3 1 0 5.5 4.6  2.666667 2 7
                           4 1 1   1   5         2 1 2
                           4 1 0   4 2.8 2.3333333 6 5
                           5 0 1 1.5 6.6  2.666667 5 3
                           5 0 0   2 5.8         5 6 6
                           6 0 1   4 6.2         5 5 5
                           6 0 0   3 4.6         5 6 5
                           7 0 1 1.5   6  5.333333 5 6
                           7 0 0   2 3.4  2.666667 7 5
                           8 0 1   2 5.8         6 2 2
                           8 0 0   6 5.4  5.666667 2 5
                           9 0 1   2 5.8 4.6666665 6 1
                           9 0 0 4.5 4.6         6 5 6
                          10 0 1 1.5 4.6         5 3 2
                          10 0 0 6.5 4.2  3.666667 5 6
                          end
                          label values status statuslbl
                          label def statuslbl 0 "Non-experts", modify
                          label def statuslbl 1 "Experts", modify
                          label values task tasklbl
                          label def tasklbl 0 "Information", modify
                          label def tasklbl 1 "Poetry", modify
                          
                          
                          cisets mean Idstrong-owntrust if status==0, saving(results0, replace)
                          cisets mean Idstrong-owntrust if status==1, saving(results1, replace)
                          use results0, clear
                          gen status = 0 
                          append using results1 
                          replace status = 1 if missing(status)
                          
                          list 
                          
                          gen x = _n - 0.2 if status == 0
                          replace x = _n - 4.8 if status == 1
                          
                          gen where = -0.2 
                          gen nshow = "{it:n = }" + strofreal(n)
                          
                          scatter point x if status==0, ms(D) msize(medlarge) pstyle(p1) ///
                          || rbar ub lb x if status==0, fcolor(none) barw(0.3) pstyle(p1) ///
                          || scatter point x if status==1, ms(T) msize(medlarge) pstyle(p2) ///
                          || rbar ub lb x if status==1, fcolor(none) barw(0.3) pstyle(p2) ///
                          xline(1.5/4.5, lp(solid) lw(thin) lc(gs8)) ///
                          xla(1 "`= varname[1]'" 2 "`= varname[2]'" 3 "`=varname[3]'" 4 "`= varname[4]'" 5 "`= varname[5]'", tlc(none)) ///
                          xtitle("") ytitle(Means and 95% confidence intervals) legend(order(1 "Non-experts" 3 "Experts") row(1) pos(12)) ///
                          || scatter where  x, ms(none) mla(nshow) mlabpos(0) mlabsize(medlarge)
                          [Image: experts.png]

