
  • Creating bar graph using several dummy variables

    Dear Stata community,

    I want to create a bar graph where the x-axis shows each dummy variable and the y-axis shows its frequency as a percentage. I cannot combine the dummy variables into one categorical variable because they are not mutually exclusive.

    Code:
    input byte(math english chem econ)
    1 0 1 1
    0 1 0 1
    1 1 0 0
    1 1 0 0
    0 1 0 1 
    1 0 0 1
    end
    Is there any way to make this bar chart?

  • #2
    Code:
    clear
    input byte(math english chem econ)
    1 0 1 1
    0 1 0 1
    1 1 0 0
    1 1 0 0
    0 1 0 1 
    1 0 0 1
    end
    rename (math english chem econ) (subject#), addnumber(1)
    gen id=_n
    reshape long subject, i(id) j(which)
    lab def which 1 "math" 2 "english" 3 "chem" 4 "econ"
    lab values which which
    graph bar subject, over(which) scheme(s1mono) ///
    ylab(0 "0" 0.2 "20" 0.4 "40" 0.6 "60" 0.8 "80") ytitle("Percent")
    [Image: Graph.png]


    • #3
      Hi Andrew,

      Thank you for your fast reply. I'm running into a problem: my actual dataset is huge, and I'm not able to run this line:

      reshape long subject, i(id) j(which)

      Is there a way to handle this issue?


      • #4
        Without seeing the error message that you get, I cannot help much. You do not necessarily need to reshape the data; it is just easier to work with your data in long layout. Here is a way to do it with a wide layout. If this does not help, post exactly what you typed and the output from Stata.

        Code:
        clear
        input byte(math english chem econ)
        1 0 1 1
        0 1 0 1
        1 1 0 0
        1 1 0 0
        0 1 0 1
        1 0 0 1
        end
        
        graph bar math english chem econ, asyvars showyvars ///
        leg(off) scheme(s1mono) ylab(0 "0" 0.2 "20" 0.4 "40" 0.6 "60" 0.8 "80") ///
        ytitle("Percent") bargap(10) yvaroptions( relabel(1 "math" ///
        2 "english" 3 "chem" 4 "econ"))


        • #5
          Hi Andrew,

          That worked perfectly, thank you. I also tried producing a horizontal plot split over another categorical variable. Can the groups be compared side by side rather than in two separate chunks?


          For example: Graph number 18 versus graph 14

          https://www.ssc.wisc.edu/sscc/pubs/stata_bar_graphs.htm




          • #6
            If you want anything like that, you will need to reshape. There are limits to what you can do with your data in wide layout. Building on the code in #2:

            Code:
            clear
            input byte(math english chem econ)
            1 0 1 1
            0 1 0 1
            1 1 0 0
            1 1 0 0
            0 1 0 1
            1 0 0 1
            end
            
            rename (math english chem econ) (subject#), addnumber(1)
            gen id=_n
            reshape long subject, i(id) j(which)
            lab def which 1 "math" 2 "english" 3 "chem" 4 "econ"
            lab values which which
            set seed 03142020
            gen female= runiformint(0,1)
            lab def female 0 "male" 1 "female"
            lab values female female
            graph hbar subject, over(female) over(which) asyvars ///
             scheme(s1mono) ylab(0 "0" 0.2 "20" 0.4 "40" 0.6 "60" 0.8 "80" 1.0 "100") ///
            ytitle("Percent") bargap(10)
            [Image: Graph.png]

            Last edited by Andrew Musau; 14 Mar 2020, 08:00.



            • #7
              Hi Andrew. I hope you are having a good day!

              I was following your code because I have multiple non-exclusive categorical variables, which are: First Generation, Second Generation, Third Generation, and Fourth Generation.

              I want to analyze the percentage of individuals that agree that religious extremists should be allowed to speak. However, when I try to run the code, this is the output that I get (attached):

              This is my code:

              Code:
              clear
              
              cd "D:\Julian Salazar\Cato Institute - Internship\"
              
              import delimited "D:\Julian Salazar\Cato Institute - Internship\Alex Nowrasteh\Data.txt"
              
              *Migrant Generations
              
              gen AllForeignBorn = ""
              replace AllForeignBorn = "1" if born == 2
              replace AllForeignBorn = "0" if born == 1
              destring AllForeignBorn, replace
              
              gen Second_Generation = ""
              replace Second_Generation = "1" if paborn == 1 | maborn == 1
              replace Second_Generation = "0" if paborn == 2 & maborn == 2
              destring Second_Generation, replace
              
              drop paborn maborn
              
              gen Third_Generation = "0"
              replace Third_Generation = "1" if granborn == 1 | granborn == 2 | granborn == 3 | granborn == 4 
              destring Third_Generation, replace
              
              gen Fourth_Generation = "0"
              replace Fourth_Generation = "1" if granborn == 0 
              destring Fourth_Generation, replace
              
              gen string_sex=""
              replace string_sex="Male" if sex==1
              replace string_sex="Female" if sex==2
              
              *Sex 
              
              recode sex (1=0) (2=1), generate(new_sex)
              
              drop sex
              
              rename new_sex sex
              
              *born
              
              recode born (1=0) (2=1), generate(new_born)
              
              drop born
              
              rename new_born born
              
              *Dependent variable: Religious Extremists
              
              recode spkmslmy (1=0) (2=1), generate(new_spkmslmy)
              drop spkmslmy
              rename new_spkmslmy spkmslmy
              
              *Rename Migrant Generations
              
              rename (AllForeignBorn Second_Generation Third_Generation Fourth_Generation) (MigrantGeneration#), addnumber(1)
              
              gen id = _n
              reshape long MigrantGeneration, i(id) j(which)
              lab def which 1 "A) First Generation" 2 "B) Second Generation" 3 "C) Third Generation" 4 "D) Fourth Generation"
              lab values which which
              
              graph set window fontface "Candara Light"
              
              graph hbar spkmslmy, over(string_sex, label(labsize(small))) ///
              over(which, label(labsize(small))) ///
              title("{fontface Merriweather Bold:Should Religious Extremeists be allowed to speak?}", pos(11) span) ///
              ytitle("Percent of Agreeable Respondents", size(small)) ///
              ylabel(, angle(horizontal)) ///
              subtitle("{fontface Merriweather Italic: By Migrant Generation & Sex}", size(small) pos(11) span) ///
              blabel(bar,  format(%9.1f)) ///
              bargap(-30) ///
              asyvars ///
              note("{fontface Merriweather Light: Source: U.S. General Social Survey 2022}", span size(*0.7) margin(medium)) ///
              scheme(cblind1)
              I'm not sure what I'm doing wrong; according to the data, the bars should show different proportions.

              I uploaded the data just in case.

              If you could give me a little help with this, I would greatly appreciate it.







              • #8
                I have comments on various levels.

                Your .txt file is certainly readable, but it's better to give a data example as requested. I used contract followed by dataex; expand then reverses the contract.
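
                A minimal sketch of that round trip, using a made-up toy dataset (the variables a and b are hypothetical and not from the posted data): contract reduces the data to one row per distinct combination with a count in _freq, and expand _freq restores one row per original observation.

                ```stata
                clear
                input byte(a b)
                1 0
                1 0
                0 1
                1 1
                1 0
                end
                local n_before = _N

                * one observation per distinct combination of a and b, counts in _freq
                contract a b
                list

                * (the compact dataset at this point is what would be passed to dataex)

                * expand reverses contract: each row is replicated _freq times
                expand _freq
                drop _freq
                assert _N == `n_before'
                ```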

                Several of your data management operations can be done more directly. Creating a string variable first when you want a numeric one, and recoding when you can just subtract, are unnecessary detours.

                If you want to see percents, as you do, you should take the means over a variable with values 0 and 100, not a variable with values 0 and 1.
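
                As a small illustration of the 0/100 point, assuming a made-up indicator named agree (hypothetical, not from the posted data): the mean of a 0/1 variable is a proportion, while the mean of the same variable multiplied by 100 is directly a percent, which is what graph bar or graph hbar then plots by default.

                ```stata
                clear
                input byte agree
                1
                0
                1
                1
                end

                * same information on a 0/100 scale
                gen agree100 = 100 * agree

                * mean of the 0/1 variable is a proportion
                summarize agree, meanonly
                display "mean of 0/1 variable:   " r(mean)

                * mean of the 0/100 variable is a percent
                summarize agree100, meanonly
                display "mean of 0/100 variable: " r(mean)
                ```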

                As a matter of taste, the text A) B) C) D) and repeating Generation seem unnecessary to me.

                I have corrected an objective spelling error and made minor subjective style changes to the text.

                The main error here is more subtle. Your reshape is a clever way to deal with overlapping categories, but after it everyone appears in every group. Unless you drop the observations with zeros in the indicator variable, the results are the same for every group.
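
                The effect can be seen on a toy dataset. This is only a sketch: the outcome y and the overlapping indicators g1 and g2 are made up for illustration. Before the drop, every group has the same mean; after it, the means differ.

                ```stata
                clear
                input byte(y g1 g2)
                1 1 0
                0 1 1
                1 0 1
                1 0 1
                end
                gen id = _n
                reshape long g, i(id) j(which)

                * every id appears under every value of which,
                * so each group mean equals the overall mean of y
                tabstat y, by(which) statistics(mean n)

                * keep only the observations that actually belong to each group
                drop if g == 0
                tabstat y, by(which) statistics(mean n)
                ```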

                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input byte(spkmslmy spkcomy born paborn maborn granborn sex) int _freq
                1 1 1 1 1 0 1 201
                1 1 1 1 1 0 2 172
                1 1 1 1 1 1 1  11
                1 1 1 1 1 1 2  17
                1 1 1 1 1 2 1  16
                1 1 1 1 1 2 2  22
                1 1 1 1 1 3 1   4
                1 1 1 1 1 3 2   6
                1 1 1 1 1 4 1  13
                1 1 1 1 1 4 2   8
                1 1 1 1 2 0 1   2
                1 1 1 1 2 0 2   1
                1 1 1 1 2 2 1   3
                1 1 1 1 2 2 2   5
                1 1 1 1 2 4 1   4
                1 1 1 2 1 0 1   1
                1 1 1 2 1 0 2   2
                1 1 1 2 1 1 2   1
                1 1 1 2 1 2 1   4
                1 1 1 2 1 2 2   1
                1 1 1 2 1 3 1   1
                1 1 1 2 1 4 1   1
                1 1 1 2 2 2 1   1
                1 1 1 2 2 4 1   6
                1 1 1 2 2 4 2  13
                1 1 2 1 1 0 1   1
                1 1 2 1 1 0 2   1
                1 1 2 1 1 2 1   1
                1 1 2 1 1 2 2   1
                1 1 2 1 2 2 1   2
                1 1 2 1 2 4 1   1
                1 1 2 2 2 1 2   2
                1 1 2 2 2 2 1   1
                1 1 2 2 2 4 1  16
                1 1 2 2 2 4 2   6
                1 2 1 1 1 0 1  10
                1 2 1 1 1 0 2  13
                1 2 1 1 1 1 2   2
                1 2 1 1 1 2 1   2
                1 2 1 1 1 4 1   1
                1 2 1 2 1 0 1   1
                1 2 1 2 1 2 1   1
                1 2 2 2 2 1 2   1
                1 2 2 2 2 4 2   5
                2 1 1 1 1 0 1  68
                2 1 1 1 1 0 2 103
                2 1 1 1 1 1 1   7
                2 1 1 1 1 1 2  14
                2 1 1 1 1 2 1  15
                2 1 1 1 1 2 2  14
                2 1 1 1 1 3 2   3
                2 1 1 1 1 4 1   5
                2 1 1 1 1 4 2   4
                2 1 1 1 2 2 1   4
                2 1 1 1 2 2 2   1
                2 1 1 1 2 4 2   2
                2 1 1 2 1 0 2   1
                2 1 1 2 1 1 1   2
                2 1 1 2 1 2 1   2
                2 1 1 2 1 2 2   1
                2 1 1 2 1 3 2   2
                2 1 1 2 1 4 2   1
                2 1 1 2 2 1 2   1
                2 1 1 2 2 4 1   4
                2 1 1 2 2 4 2   4
                2 1 2 1 1 0 1   1
                2 1 2 1 1 0 2   1
                2 1 2 2 2 2 1   2
                2 1 2 2 2 3 1   1
                2 1 2 2 2 3 2   1
                2 1 2 2 2 4 1  21
                2 1 2 2 2 4 2  23
                2 2 1 1 1 0 1  57
                2 2 1 1 1 0 2 136
                2 2 1 1 1 1 1   1
                2 2 1 1 1 1 2   8
                2 2 1 1 1 2 1   4
                2 2 1 1 1 2 2   8
                2 2 1 1 1 3 1   2
                2 2 1 1 1 3 2   2
                2 2 1 1 1 4 1   1
                2 2 1 1 1 4 2   5
                2 2 1 1 2 2 1   1
                2 2 1 1 2 2 2   3
                2 2 1 1 2 3 1   1
                2 2 1 1 2 3 2   1
                2 2 1 1 2 4 2   1
                2 2 1 2 1 0 1   1
                2 2 1 2 1 0 2   1
                2 2 1 2 1 3 2   2
                2 2 1 2 1 4 1   2
                2 2 1 2 1 4 2   2
                2 2 1 2 2 3 2   1
                2 2 1 2 2 4 1   4
                2 2 1 2 2 4 2   6
                2 2 2 1 1 1 2   1
                2 2 2 1 2 0 2   1
                2 2 2 2 1 0 2   1
                2 2 2 2 2 2 1   1
                2 2 2 2 2 2 2   1
                2 2 2 2 2 4 1  18
                2 2 2 2 2 4 2  16
                end
                
                expand _freq 
                
                * you start here 
                gen MigrantGeneration1 = born - 1 
                gen MigrantGeneration2 = paborn == 1 | maborn == 1
                gen MigrantGeneration3 = inlist(granborn, 1, 2, 3, 4) 
                gen MigrantGeneration4 = granborn == 0 
                
                gen string_sex = word("Male Female", sex)
                replace sex = sex - 1 
                
                replace born = born - 1 
                replace spkmslmy = spkmslmy - 1 
                
                gen id = _n
                reshape long MigrantGeneration, i(id) j(which)
                drop if MigrantGeneration == 0 
                
                lab def which 1 "First" 2 "Second" 3 "Third" 4 "Fourth"
                lab values which which
                
                graph set window fontface "Candara Light"
                
                replace spkmslmy = 100 * spkmslmy
                
                graph hbar spkmslmy, over(string_sex, label(labsize(small))) ///
                over(which, label(labsize(small))) ///
                title("{fontface Merriweather Bold:Should religious extremists be allowed to speak?}", pos(11) span) ///
                ytitle("Percent of Respondents Agreeing", size(small)) ///
                ylabel(, angle(horizontal)) ///
                subtitle("{fontface Merriweather Italic: By migrant generation & sex}", size(small) pos(11) span) ///
                blabel(bar,  format(%9.1f)) ///
                bargap(-30) ///
                asyvars ///
                note("{fontface Merriweather Light: Source: U.S. General Social Survey 2022}", span size(*0.7) margin(medium)) 
                
                * scheme(cblind1)
                [Image: salazar.png]


                • #9
                  Here, as often, bar charts are conventional for many groups, but dot charts can convey the same information. Here are some different ideas:

                  Code:
                  separate spkmslmy, by(string_sex) veryshortlabel
                  
                  graph dot spkmslmy?,  ///
                  over(which, label(labsize(small))) ///
                  legend(order(1 "Female" 2 "Male") col(1) pos(3)) ///
                  title("{fontface Merriweather Bold:Should religious extremists be allowed to speak?}", pos(11) span) ///
                  ytitle("Percent of Respondents Agreeing", size(small)) ///
                  marker(2, ms(T)) ylabel(, angle(horizontal)) vertical ///
                  linetype(line) lines(lw(vthin) lc(gs12)) ///
                  subtitle("{fontface Merriweather Italic: By migrant generation & sex}", size(small) pos(11) span) ///
                  blabel(bar,  format(%9.1f) pos(outside) size(medium)) exclude0 yla(30(10)80) ///
                  note("{fontface Merriweather Light: Source: U.S. General Social Survey 2022}", span size(*0.7) margin(medium))

                  Details to be accepted or rejected:

                  1. A dot chart reduces ink and allows direct comparison even more effectively.

                  2. There is no need to start the scale at zero. Most of the interest lies in comparing values for generations and sexes with each other, not with zero.

                  3. Shorter value labels allow vertical alignment.
                  [Image: salazar2.png]

                  Last edited by Nick Cox; 25 Sep 2023, 08:57.



                  • #10
                    Nick, as a student I want to express my gratitude. You have been extremely helpful. I'm relatively new-intermediate using Stata, and I feel very passionate about data visualization and analysis. I wish you a very excellent day!



                    • #11
                      You're welcome. #9 is close to what you could get with scatter anyway. The main difference is flexibility over positioning marker labels.

                      Code:
                      * Example generated by -dataex-. For more info, type help dataex
                      clear
                      input byte(spkmslmy spkcomy born paborn maborn granborn sex) int _freq
                      1 1 1 1 1 0 1 201
                      1 1 1 1 1 0 2 172
                      1 1 1 1 1 1 1  11
                      1 1 1 1 1 1 2  17
                      1 1 1 1 1 2 1  16
                      1 1 1 1 1 2 2  22
                      1 1 1 1 1 3 1   4
                      1 1 1 1 1 3 2   6
                      1 1 1 1 1 4 1  13
                      1 1 1 1 1 4 2   8
                      1 1 1 1 2 0 1   2
                      1 1 1 1 2 0 2   1
                      1 1 1 1 2 2 1   3
                      1 1 1 1 2 2 2   5
                      1 1 1 1 2 4 1   4
                      1 1 1 2 1 0 1   1
                      1 1 1 2 1 0 2   2
                      1 1 1 2 1 1 2   1
                      1 1 1 2 1 2 1   4
                      1 1 1 2 1 2 2   1
                      1 1 1 2 1 3 1   1
                      1 1 1 2 1 4 1   1
                      1 1 1 2 2 2 1   1
                      1 1 1 2 2 4 1   6
                      1 1 1 2 2 4 2  13
                      1 1 2 1 1 0 1   1
                      1 1 2 1 1 0 2   1
                      1 1 2 1 1 2 1   1
                      1 1 2 1 1 2 2   1
                      1 1 2 1 2 2 1   2
                      1 1 2 1 2 4 1   1
                      1 1 2 2 2 1 2   2
                      1 1 2 2 2 2 1   1
                      1 1 2 2 2 4 1  16
                      1 1 2 2 2 4 2   6
                      1 2 1 1 1 0 1  10
                      1 2 1 1 1 0 2  13
                      1 2 1 1 1 1 2   2
                      1 2 1 1 1 2 1   2
                      1 2 1 1 1 4 1   1
                      1 2 1 2 1 0 1   1
                      1 2 1 2 1 2 1   1
                      1 2 2 2 2 1 2   1
                      1 2 2 2 2 4 2   5
                      2 1 1 1 1 0 1  68
                      2 1 1 1 1 0 2 103
                      2 1 1 1 1 1 1   7
                      2 1 1 1 1 1 2  14
                      2 1 1 1 1 2 1  15
                      2 1 1 1 1 2 2  14
                      2 1 1 1 1 3 2   3
                      2 1 1 1 1 4 1   5
                      2 1 1 1 1 4 2   4
                      2 1 1 1 2 2 1   4
                      2 1 1 1 2 2 2   1
                      2 1 1 1 2 4 2   2
                      2 1 1 2 1 0 2   1
                      2 1 1 2 1 1 1   2
                      2 1 1 2 1 2 1   2
                      2 1 1 2 1 2 2   1
                      2 1 1 2 1 3 2   2
                      2 1 1 2 1 4 2   1
                      2 1 1 2 2 1 2   1
                      2 1 1 2 2 4 1   4
                      2 1 1 2 2 4 2   4
                      2 1 2 1 1 0 1   1
                      2 1 2 1 1 0 2   1
                      2 1 2 2 2 2 1   2
                      2 1 2 2 2 3 1   1
                      2 1 2 2 2 3 2   1
                      2 1 2 2 2 4 1  21
                      2 1 2 2 2 4 2  23
                      2 2 1 1 1 0 1  57
                      2 2 1 1 1 0 2 136
                      2 2 1 1 1 1 1   1
                      2 2 1 1 1 1 2   8
                      2 2 1 1 1 2 1   4
                      2 2 1 1 1 2 2   8
                      2 2 1 1 1 3 1   2
                      2 2 1 1 1 3 2   2
                      2 2 1 1 1 4 1   1
                      2 2 1 1 1 4 2   5
                      2 2 1 1 2 2 1   1
                      2 2 1 1 2 2 2   3
                      2 2 1 1 2 3 1   1
                      2 2 1 1 2 3 2   1
                      2 2 1 1 2 4 2   1
                      2 2 1 2 1 0 1   1
                      2 2 1 2 1 0 2   1
                      2 2 1 2 1 3 2   2
                      2 2 1 2 1 4 1   2
                      2 2 1 2 1 4 2   2
                      2 2 1 2 2 3 2   1
                      2 2 1 2 2 4 1   4
                      2 2 1 2 2 4 2   6
                      2 2 2 1 1 1 2   1
                      2 2 2 1 2 0 2   1
                      2 2 2 2 1 0 2   1
                      2 2 2 2 2 2 1   1
                      2 2 2 2 2 2 2   1
                      2 2 2 2 2 4 1  18
                      2 2 2 2 2 4 2  16
                      end
                      
                      expand _freq 
                      
                      * you start here 
                      gen MigrantGeneration1 = born - 1 
                      gen MigrantGeneration2 = paborn == 1 | maborn == 1
                      gen MigrantGeneration3 = inlist(granborn, 1, 2, 3, 4) 
                      gen MigrantGeneration4 = granborn == 0 
                      
                      gen string_sex = word("Male Female", sex)
                      replace sex = sex - 1 
                      
                      replace born = born - 1 
                      replace spkmslmy = spkmslmy - 1 
                      
                      gen id = _n
                      reshape long MigrantGeneration, i(id) j(which)
                      drop if MigrantGeneration == 0 
                      
                      lab def which 1 "First" 2 "Second" 3 "Third" 4 "Fourth"
                      lab values which which
                      
                      graph set window fontface "Candara Light"
                      
                      replace spkmslmy = 100 * spkmslmy
                      
                      preserve 
                      
                      collapse spkmslmy, by(string_sex which)  
                      gen toshow = strofreal(spkmslmy, "%2.1f")
                      
                      scatter spkmslmy which if string_sex == "Female", mla(toshow) mlabcolor(stc1) || ///
                      scatter spkmslmy which if string_sex == "Male", ms(T) mla(toshow) mlabcolor(stc2) ///
                      legend(order(1 "Female" 2 "Male") col(1) pos(1) ring(0)) ///
                      title("{fontface Merriweather Bold:Should religious extremists be allowed to speak?}", pos(11) span) ///
                      ytitle("Percent of Respondents Agreeing") ylabel(, angle(horizontal)) ///
                      subtitle("{fontface Merriweather Italic: By migrant generation & sex}", size(small) pos(11) span) ///
                      yla(30(10)80) xla(, valuelabel grid) xtitle(Generation) xsc(r(0.8 4.2)) ///
                      note("{fontface Merriweather Light: Source: U.S. General Social Survey 2022}", span margin(medium))
                      
                      restore
                      [Image: salazar3.png]


                      • #12
                        Some would be queasy about showing line charts here, but I don't think they're outrageous. Another idea is to ditch the legend in favour of direct labelling.


                        Code:
                        preserve 
                        
                        collapse spkmslmy, by(string_sex which)  
                        gen toshow = strofreal(spkmslmy, "%2.1f")
                        
                        gen spkmslmy4 = spkmslmy + 4
                        scatter spkmslmy which if string_sex == "Female", c(L) mlabpos(6) mla(toshow) mlabcolor(stc1) || ///
                        scatter spkmslmy which if string_sex == "Male", c(L) ms(T) mlabpos(6) mla(toshow) mlabcolor(stc2) || ///
                        scatter spkmslmy4 which if string_sex == "Female" & which == 4, mla(string_sex) mlabsize(medium) ms(none) mlabc(stc1) mlabpos(11) || ///
                        scatter spkmslmy4 which if string_sex == "Male" & which == 4, mla(string_sex) mlabsize(medium) ms(none) mlabc(stc2) mlabpos(11) ///
                        legend(off) ///
                        title("{fontface Merriweather Bold:Should religious extremists be allowed to speak?}", pos(11) span) ///
                        ytitle("Percent of Respondents Agreeing") ylabel(, angle(horizontal)) ///
                        subtitle("{fontface Merriweather Italic: By migrant generation & sex}", size(small) pos(11) span) ///
                        yla(30(10)80) xla(, valuelabel grid) xtitle(Generation) xsc(r(0.8 4.2)) ///
                        note("{fontface Merriweather Light: Source: U.S. General Social Survey 2022}", span margin(medium))
                        
                        restore
                        [Image: salazar4.png]