
  • Bar Graph: Categorical variables

    Hi!

    I want to make a bar graph comparing educational level between males and females in a specific dataset. Educational level is a categorical variable that takes the values 0 to 6, and sex is a dummy variable that takes the values 0 or 1. I want the x axis to show the educational levels for each sex and the y axis to show the values, or the percentages, in each category of educational level.

    I would really appreciate your help.

  • #2
    I guess "values" means counts or frequencies. There is no data example here, but the auto dataset shipped with Stata serves as a sandbox to show technique you can adapt: rep78 stands in for education and foreign for female. I use catplot from SSC, which here is a wrapper for graph hbar, recast to graph bar.

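    Copy the script to your do-file editor, run it to see some possibilities, and then decide what to do differently, such as specifying one colour for each of your education levels (the bar() options below set five, one per level of rep78).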

    Code:
    sysuse auto, clear 
    rename rep78 Education
    rename foreign Female 
    label def female 0 Male 1 Female 
    label val Female female 
    
    * omit if installed 
    ssc install catplot 
    
    catplot Education Female , recast(bar) name(G1, replace)
    
    catplot Education Female , percent(Female) recast(bar) name(G2, replace)
    
    local opts bar(1, color(red*0.6)) bar(2, col(red*0.2)) bar(3, col(blue*0.2)) bar(4, col(blue*0.6))  bar(5, col(blue))
    
    catplot Education Female , percent(Female) recast(bar) asyvars `opts' name(G3, replace)
    
    catplot Female Education, percent(Female) recast(bar) asyvars name(G4, replace)
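    If installing from SSC is not an option, one base-Stata sketch of the same idea is to reduce the data to one observation per cell with contract, compute the percentage within each sex yourself, and plot with graph bar. This is a sketch under the same assumptions as the script above (auto's rep78 and foreign standing in for the real variables); the names n_sex and percent are illustrative choices, not anything from catplot.

    ```stata
    * Base-Stata sketch (no SSC install); rep78 and foreign stand in for the real variables
    sysuse auto, clear
    rename rep78 Education
    rename foreign Female
    label define female 0 Male 1 Female
    label values Female female

    * Replace the data in memory with one row per Education x Female cell; count goes in _freq
    * (nomiss drops observations with missing Education or Female)
    contract Education Female, nomiss

    * Percentage of each sex in each education level (n_sex and percent are illustrative names)
    bysort Female: egen n_sex = total(_freq)
    generate percent = 100 * _freq / n_sex

    * One observation per cell, so the default statistic (mean) just reproduces the stored value
    graph bar _freq, over(Education) over(Female) asyvars ytitle("Count") name(G7, replace)
    graph bar percent, over(Education) over(Female) asyvars blabel(bar, format(%4.1f)) ytitle("% within sex") name(G8, replace)
    ```

    Computing the percentages explicitly makes it unambiguous that they are within sex, which is what percent(Female) does in catplot.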



    • #3
      I highly recommend looking into -tabplot- from the Stata Journal for your case.
      Depending on the message you want to send, I would also look at waffle charts (search in Stata and have a look at https://medium.com/the-stata-guide/s...s-32afc7d6f6dd) or mosaic plots (search marimekko or look at https://medium.com/the-stata-guide/s...s-49caa27c5554).



      • #4
        If you're tempted by waffle charts, first read https://www.perceptualedge.com/artic...e_for_kids.pdf and see if it changes your mind.

        tabplot is a familiar command which I too am happy to endorse. Here are some examples. See also the yreverse option.

        Code:
        sysuse auto, clear 
        rename rep78 Education
        rename foreign Female 
        label def female 0 Male 1 Female 
        label val Female female 
        
        * remove the variable labels so they are not shown as axis titles
        label var Education 
        label var Female 
        
        tabplot Education Female , showval horizontal xtitle("") name(G5, replace)
        
        tabplot Education Female , percent(Female) horizontal showval xtitle("") name(G6, replace)



        • #5
          Originally posted by Nick Cox
          If you're tempted by waffle charts, first read https://www.perceptualedge.com/artic...e_for_kids.pdf and see if it changes your mind.
          Indeed, the author makes convincing arguments against unit charts / waffle plots. Nonetheless, when displaying the percentages of a binary variable (female/male) across the categories of a multi-category variable, I think a waffle plot can be an illustrative way to show large differences between the categories. More detailed information (such as percentages and N) then needs to be placed within the plot.
          Code:
          ssc install waffle_plot
          waffle_plot Female, by(Education) name(waffle_1, replace)

          I took a closer look at my second recommendation, mosaic plots. Such a plot would most likely be produced by -spineplot- from the Stata Journal (authored by Nick Cox).
          Code:
          spineplot Education Female, percent name(spineplot, replace)
          In my opinion this gives the best overview of the whole sample and its composition, but it is less convincing when the graph is used to compare the categories.



          • #6
            My guess is that being female is a predictor here, not an outcome. So the main focus would be the composition of education levels by sex.

            We can't easily discuss further what works well, or best, with Fernando Bastidas's data without seeing those data. The 2 x 7 table of counts (sex by education levels 0 to 6) would naturally be enough to calculate percentages too.
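            To illustrate why the table of counts is enough: once the data are reduced to one row per cell, frequency weights reproduce the full cross-tabulation, and the column option gives the distribution of education within each sex. This is a sketch using the same auto stand-ins as earlier in the thread; _freq is simply the default name contract gives the count variable.

            ```stata
            sysuse auto, clear
            rename rep78 Education
            rename foreign Female
            label define female 0 Male 1 Female
            label values Female female

            * Keep only the two-way table of counts: one row per cell, count stored in _freq
            contract Education Female, nomiss
            list, noobs

            * Weighting by the counts recovers the cross-tabulation;
            * column shows % within each sex, nofreq hides the raw counts
            tabulate Education Female [fweight=_freq], column nofreq
            ```

            The same weighted data could then be fed to a graph command that accepts frequency weights.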

