Creating bar graph using several dummy variables

John Kh

Join Date: Jul 2019

Posts: 12
#1

13 Mar 2020, 17:50

Dear Stata community,

I want to create a bar graph where the x-axis shows the dummy variables and the y-axis shows the frequency as a percentage. I am not able to combine the dummy variables into one categorical variable because they are not mutually exclusive.

Code:

input byte(math english chem econ)
1 0 1 1
0 1 0 1
1 1 0 0
1 1 0 0
0 1 0 1
1 0 0 1
end

Is there any way to make this bar chart?

Andrew Musau

Join Date: Oct 2014
Posts: 10213
#2

13 Mar 2020, 18:20

Code:

clear
input byte(math english chem econ)
1 0 1 1
0 1 0 1
1 1 0 0
1 1 0 0
0 1 0 1 
1 0 0 1
end
rename (math english chem econ) (subject#), addnumber(1)
gen id=_n
reshape long subject, i(id) j(which)
lab def which 1 "math" 2 "english" 3 "chem" 4 "econ"
lab values which which
graph bar subject, over(which) scheme(s1mono) ///
ylab(0 "0" 0.2 "20" 0.4 "40" 0.6 "60" 0.8 "80") ytitle("Percent")

[Attached image: Graph.png]

John Kh

Join Date: Jul 2019

Posts: 12
#3

13 Mar 2020, 19:18

Hi Andrew,

Thank you for your fast reply. I'm running into a problem because my actual dataset is huge and I'm not able to run this code:

reshape long subject, i(id) j(which)

Is there a way to handle this issue?
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#4

13 Mar 2020, 19:44

Without seeing the error message that you get, I cannot help much. You do not necessarily need to reshape the data; it is just easier to work with your data in long layout. Here is a way with a wide layout. If this does not help, post exactly what you typed and the output from Stata.

Code:

clear
input byte(math english chem econ)
1 0 1 1
0 1 0 1
1 1 0 0
1 1 0 0
0 1 0 1
1 0 0 1
end
graph bar math english chem econ, asyvars showyvars ///
leg(off) scheme(s1mono) ylab(0 "0" 0.2 "20" 0.4 "40" 0.6 "60" 0.8 "80") ///
ytitle("Percent") bargap(10) yvaroptions( relabel(1 "math" ///
2 "english" 3 "chem" 4 "econ"))
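If reshape itself is the bottleneck on a very large dataset, one further possibility is to collapse to the means first, so that reshape only has to handle a single row. This is a minimal sketch on the toy data from this thread, not something tested on the real dataset; the rescaling to 0-100 and the use of (asis) are assumptions about what should be plotted.

```stata
clear
input byte(math english chem econ)
1 0 1 1
0 1 0 1
1 1 0 0
1 1 0 0
0 1 0 1
1 0 0 1
end

* reduce the data to one observation holding the mean of each indicator
collapse (mean) math english chem econ

* same renaming and reshaping as before, but now on a single row
rename (math english chem econ) (subject#), addnumber(1)
gen id = 1
reshape long subject, i(id) j(which)
lab def which 1 "math" 2 "english" 3 "chem" 4 "econ"
lab values which which

* express the proportions as percents and plot them as they are
replace subject = 100 * subject
graph bar (asis) subject, over(which) scheme(s1mono) ytitle("Percent")
```

Because the means are computed before reshaping, the long dataset has only four observations, whatever the size of the original data.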

John Kh

Join Date: Jul 2019

Posts: 12
#5

13 Mar 2020, 21:39

Hi Andrew,

That worked perfectly, thank you. I also tried producing a horizontal plot split over another categorical variable. Can the bars be compared side by side rather than in two separate chunks?

For example: Graph number 18 versus graph 14

https://www.ssc.wisc.edu/sscc/pubs/stata_bar_graphs.htm


Andrew Musau

Join Date: Oct 2014
Posts: 10213
#6

14 Mar 2020, 07:55

If you want anything like that, you will need to reshape. There are limits to what you can do with your data in wide layout. Building on the code in #2:

Code:

clear
input byte(math english chem econ)
1 0 1 1
0 1 0 1
1 1 0 0
1 1 0 0
0 1 0 1
1 0 0 1
end

rename (math english chem econ) (subject#), addnumber(1)
gen id=_n
reshape long subject, i(id) j(which)
lab def which 1 "math" 2 "english" 3 "chem" 4 "econ"
lab values which which
set seed 03142020
gen female= runiformint(0,1)
lab def female 0 "male" 1 "female"
lab values female female
graph hbar subject, over(female) over(which) asyvars ///
 scheme(s1mono) ylab(0 "0" 0.2 "20" 0.4 "40" 0.6 "60" 0.8 "80" 1.0 "100") ///
ytitle("Percent") bargap(10)

[Attached image: Graph.png]

Last edited by Andrew Musau; 14 Mar 2020, 08:00.


Julian Salazar

Join Date: Sep 2023
Posts: 2
#7

24 Sep 2023, 11:11

Hi Andrew. I hope you are having a good day!

I was following your code because I have multiple non-exclusive categorical variables, which are: First Generation, Second Generation, Third Generation, and Fourth Generation.

I want to analyze the percentage of individuals that agree that religious extremists should be allowed to speak. However, when I try to run the code, this is the output that I get (attached):

This is my code:

Code:

clear

cd "D:\Julian Salazar\Cato Institute - Internship\"

import delimited "D:\Julian Salazar\Cato Institute - Internship\Alex Nowrasteh\Data.txt"

*Migrant Generations

gen AllForeignBorn = ""
replace AllForeignBorn = "1" if born == 2
replace AllForeignBorn = "0" if born == 1
destring AllForeignBorn, replace

gen Second_Generation = ""
replace Second_Generation = "1" if paborn == 1 | maborn == 1
replace Second_Generation = "0" if paborn == 2 & maborn == 2
destring Second_Generation, replace

drop paborn maborn

gen Third_Generation = "0"
replace Third_Generation = "1" if granborn == 1 | granborn == 2 | granborn == 3 | granborn == 4 
destring Third_Generation, replace

gen Fourth_Generation = "0"
replace Fourth_Generation = "1" if granborn == 0 
destring Fourth_Generation, replace

gen string_sex=""
replace string_sex="Male" if sex==1
replace string_sex="Female" if sex==2

*Sex 

recode sex (1=0) (2=1), generate(new_sex)

drop sex

rename new_sex sex

*born

recode born (1=0) (2=1), generate(new_born)

drop born

rename new_born born

*Dependent variable: Religious Extremists

recode spkmslmy (1=0) (2=1), generate(new_spkmslmy)
drop spkmslmy
rename new_spkmslmy spkmslmy

*Rename Migrant Generations

rename (AllForeignBorn Second_Generation Third_Generation Fourth_Generation) (MigrantGeneration#), addnumber(1)

gen id = _n
reshape long MigrantGeneration, i(id) j(which)
lab def which 1 "A) First Generation" 2 "B) Second Generation" 3 "C) Third Generation" 4 "D) Fourth Generatio "
lab values which which

graph set window fontface "Candara Light"

graph hbar spkmslmy, over(string_sex, label(labsize(small))) ///
over(which, label(labsize(small))) ///
title("{fontface Merriweather Bold:Should Religious Extremeists be allowed to speak?}", pos(11) span) ///
ytitle("Percent of Agreeable Respondents", size(small)) ///
ylabel(, angle(horizontal)) ///
subtitle("{fontface Merriweather Italic: By Migrant Generation & Sex}", size(small) pos(11) span) ///
blabel(bar,  format(%9.1f)) ///
bargap(-30) ///
asyvars ///
note("{fontface Merriweather Light: Source: U.S. General Social Survey 2022}", span size(*0.7) margin(medium)) ///
scheme(cblind1)

I'm not sure what I'm doing wrong; according to the data, the bars should show different proportions.

I uploaded the data just in case.

If you could give me a little help with this, I would greatly appreciate it.

Attached Files

Data.txt (25.4 KB, 1 view)


Nick Cox

Join Date: Mar 2014
Posts: 35720
#8

25 Sep 2023, 06:14

I have comments on various levels.

Your .txt file is certainly readable, but it's better to give a data example as requested. I used contract followed by dataex; expand then reverses the contract.

Several of your data management operations can be done more directly. Creating a string variable first when you want a numeric one, and recoding when you can just subtract, are unnecessary detours.
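A small made-up example of the more direct route, assuming the same 1/2 coding as in the posted data. The variable names foreign, second and second_strict are hypothetical. One caveat worth checking against your own data: a logical expression returns 0 or 1 even when an argument is missing, whereas the string-first approach leaves such cases missing.

```stata
clear
input byte(born paborn maborn)
1 1 1
2 2 2
1 1 2
1 . 1
end

* born is coded 1/2, so subtracting 1 gives a 0/1 indicator directly
gen byte foreign = born - 1

* a logical expression evaluates to 1 (true) or 0 (false)
gen byte second = paborn == 1 | maborn == 1

* the expression is 0 or 1 even if an argument is missing;
* an if qualifier leaves such observations missing instead
gen byte second_strict = (paborn == 1 | maborn == 1) if !missing(paborn, maborn)

list
```

In the fourth observation, second is 1 while second_strict is missing; which behaviour is wanted depends on how missing values should be treated.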

If you want to see percents, as you do, you should take the means over a variable with values 0 and 100, not a variable with values 0 and 1.
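A minimal sketch of that point with four made-up responses: the mean of a 0/1 variable is a proportion, and the mean of the same variable multiplied by 100 is a percent. The names agree and agree100 are hypothetical.

```stata
clear
input byte agree
1
0
1
1
end

* rescale the indicator so that its mean is a percent
gen agree100 = 100 * agree

* the mean of agree is 0.75; the mean of agree100 is 75
summarize agree agree100
```

With the rescaled variable, graph bar and graph hbar show percents on the axis and in bar labels without any relabelling of axis ticks.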

As a matter of taste, the text A) B) C) D) and repeating Generation seem unnecessary to me.

I have corrected an objective spelling error and made minor subjective style changes to the text.

The main error here is more subtle. Your reshape is a clever way to deal with overlapping categories, but everyone appears in every group unless you drop the observations with zeros on the indicator variable. If you don't, the results are the same for every group.
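The effect can be seen on a tiny made-up dataset with an outcome y and two overlapping indicators g1 and g2 (all names hypothetical). After reshape long, every id has one observation per group, so group means of y are identical until the zeros are dropped.

```stata
clear
input byte(y g1 g2)
1 1 0
1 1 0
0 0 1
0 1 1
end

gen id = _n
reshape long g, i(id) j(which)

* every id appears under both values of which,
* so the mean of y is 0.5 in each group
tabstat y, by(which) statistics(mean n)

* keep only the group memberships that actually apply
drop if g == 0

* now group 1 holds ids 1, 2 and 4 and group 2 holds ids 3 and 4,
* so the means differ
tabstat y, by(which) statistics(mean n)
```

Note that id 4 belongs to both groups and so is counted in both after the drop, which is what overlapping categories require.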

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(spkmslmy spkcomy born paborn maborn granborn sex) int _freq
1 1 1 1 1 0 1 201
1 1 1 1 1 0 2 172
1 1 1 1 1 1 1  11
1 1 1 1 1 1 2  17
1 1 1 1 1 2 1  16
1 1 1 1 1 2 2  22
1 1 1 1 1 3 1   4
1 1 1 1 1 3 2   6
1 1 1 1 1 4 1  13
1 1 1 1 1 4 2   8
1 1 1 1 2 0 1   2
1 1 1 1 2 0 2   1
1 1 1 1 2 2 1   3
1 1 1 1 2 2 2   5
1 1 1 1 2 4 1   4
1 1 1 2 1 0 1   1
1 1 1 2 1 0 2   2
1 1 1 2 1 1 2   1
1 1 1 2 1 2 1   4
1 1 1 2 1 2 2   1
1 1 1 2 1 3 1   1
1 1 1 2 1 4 1   1
1 1 1 2 2 2 1   1
1 1 1 2 2 4 1   6
1 1 1 2 2 4 2  13
1 1 2 1 1 0 1   1
1 1 2 1 1 0 2   1
1 1 2 1 1 2 1   1
1 1 2 1 1 2 2   1
1 1 2 1 2 2 1   2
1 1 2 1 2 4 1   1
1 1 2 2 2 1 2   2
1 1 2 2 2 2 1   1
1 1 2 2 2 4 1  16
1 1 2 2 2 4 2   6
1 2 1 1 1 0 1  10
1 2 1 1 1 0 2  13
1 2 1 1 1 1 2   2
1 2 1 1 1 2 1   2
1 2 1 1 1 4 1   1
1 2 1 2 1 0 1   1
1 2 1 2 1 2 1   1
1 2 2 2 2 1 2   1
1 2 2 2 2 4 2   5
2 1 1 1 1 0 1  68
2 1 1 1 1 0 2 103
2 1 1 1 1 1 1   7
2 1 1 1 1 1 2  14
2 1 1 1 1 2 1  15
2 1 1 1 1 2 2  14
2 1 1 1 1 3 2   3
2 1 1 1 1 4 1   5
2 1 1 1 1 4 2   4
2 1 1 1 2 2 1   4
2 1 1 1 2 2 2   1
2 1 1 1 2 4 2   2
2 1 1 2 1 0 2   1
2 1 1 2 1 1 1   2
2 1 1 2 1 2 1   2
2 1 1 2 1 2 2   1
2 1 1 2 1 3 2   2
2 1 1 2 1 4 2   1
2 1 1 2 2 1 2   1
2 1 1 2 2 4 1   4
2 1 1 2 2 4 2   4
2 1 2 1 1 0 1   1
2 1 2 1 1 0 2   1
2 1 2 2 2 2 1   2
2 1 2 2 2 3 1   1
2 1 2 2 2 3 2   1
2 1 2 2 2 4 1  21
2 1 2 2 2 4 2  23
2 2 1 1 1 0 1  57
2 2 1 1 1 0 2 136
2 2 1 1 1 1 1   1
2 2 1 1 1 1 2   8
2 2 1 1 1 2 1   4
2 2 1 1 1 2 2   8
2 2 1 1 1 3 1   2
2 2 1 1 1 3 2   2
2 2 1 1 1 4 1   1
2 2 1 1 1 4 2   5
2 2 1 1 2 2 1   1
2 2 1 1 2 2 2   3
2 2 1 1 2 3 1   1
2 2 1 1 2 3 2   1
2 2 1 1 2 4 2   1
2 2 1 2 1 0 1   1
2 2 1 2 1 0 2   1
2 2 1 2 1 3 2   2
2 2 1 2 1 4 1   2
2 2 1 2 1 4 2   2
2 2 1 2 2 3 2   1
2 2 1 2 2 4 1   4
2 2 1 2 2 4 2   6
2 2 2 1 1 1 2   1
2 2 2 1 2 0 2   1
2 2 2 2 1 0 2   1
2 2 2 2 2 2 1   1
2 2 2 2 2 2 2   1
2 2 2 2 2 4 1  18
2 2 2 2 2 4 2  16
end

expand _freq 

* you start here 
gen MigrantGeneration1 = born - 1 
gen MigrantGeneration2 = paborn == 1 | maborn == 1
gen MigrantGeneration3 = inlist(granborn, 1, 2, 3, 4) 
gen MigrantGeneration4 = granborn == 0 

gen string_sex = word("Male Female", sex)
replace sex = sex - 1 

replace born = born - 1 
replace spkmslmy = spkmslmy - 1 

gen id = _n
reshape long MigrantGeneration, i(id) j(which)
drop if MigrantGeneration == 0 

lab def which 1 "First" 2 "Second" 3 "Third" 4 "Fourth"
lab values which which

graph set window fontface "Candara Light"

replace spkmslmy = 100 * spkmslmy

graph hbar spkmslmy, over(string_sex, label(labsize(small))) ///
over(which, label(labsize(small))) ///
title("{fontface Merriweather Bold:Should religious extremists be allowed to speak?}", pos(11) span) ///
ytitle("Percent of Respondents Agreeing", size(small)) ///
ylabel(, angle(horizontal)) ///
subtitle("{fontface Merriweather Italic: By migrant generation & sex}", size(small) pos(11) span) ///
blabel(bar,  format(%9.1f)) ///
bargap(-30) ///
asyvars ///
note("{fontface Merriweather Light: Source: U.S. General Social Survey 2022}", span size(*0.7) margin(medium)) 

* scheme(cblind1)

[Attached image: salazar.png]

Nick Cox

Join Date: Mar 2014
Posts: 35720
#9

25 Sep 2023, 08:39

Here, as often, bar charts are conventional for many groups, but dot charts can be used to convey the same information. Here are some different ideas:

Code:

separate spkmslmy, by(string_sex) veryshortlabel

graph dot spkmslmy?,  ///
over(which, label(labsize(small))) ///
legend(order(1 "Female" 2 "Male") col(1) pos(3)) ///
title("{fontface Merriweather Bold:Should religious extremists be allowed to speak?}", pos(11) span) ///
ytitle("Percent of Respondents Agreeing", size(small)) ///
marker(2, ms(T)) ylabel(, angle(horizontal)) vertical ///
linetype(line) lines(lw(vthin) lc(gs12)) ///
subtitle("{fontface Merriweather Italic: By migrant generation & sex}", size(small) pos(11) span) ///
blabel(bar,  format(%9.1f) pos(outside) size(medium)) exclude0 yla(30(10)80) ///
note("{fontface Merriweather Light: Source: U.S. General Social Survey 2022}", span size(*0.7) margin(medium))

Details to be accepted or rejected:

1. A dot chart reduces ink and allows direct comparison even more effectively.

2. There is no need to start the scale at zero. Most of the interest lies in comparing values for generations and sexes with each other, not with zero.

3. Shorter value labels allow vertical alignment.
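For anyone unfamiliar with separate, here is a minimal sketch on made-up numbers of what it does to the outcome variable before graph dot is called. The data values are invented; only the command and its options come from the code above.

```stata
clear
input float y str6 sex
60 "Male"
40 "Female"
55 "Male"
45 "Female"
end

* one new variable per distinct value of sex, in sorted order:
* y1 holds the values for Female, y2 the values for Male
separate y, by(sex) veryshortlabel

* with veryshortlabel the variable labels are just "Female" and "Male"
describe y1 y2
list
```

Because the new variables are created in sorted order of the by() variable, Female comes first, which is why the legend above is ordered Female then Male.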

[Attached image: salazar2.png]

Last edited by Nick Cox; 25 Sep 2023, 08:57.


Julian Salazar

Join Date: Sep 2023

Posts: 2
#10

25 Sep 2023, 19:32

Nick, as a student I want to express my gratitude. You have been extremely helpful. I'm relatively new-intermediate using Stata, and I feel very passionate about data visualization and analysis. I wish you a very excellent day!

Nick Cox

Join Date: Mar 2014
Posts: 35720

#11

26 Sep 2023, 03:06

You're welcome. #9 is close to what you could get with scatter anyway. The main difference is the flexibility scatter gives over positioning marker labels.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(spkmslmy spkcomy born paborn maborn granborn sex) int _freq
1 1 1 1 1 0 1 201
1 1 1 1 1 0 2 172
1 1 1 1 1 1 1  11
1 1 1 1 1 1 2  17
1 1 1 1 1 2 1  16
1 1 1 1 1 2 2  22
1 1 1 1 1 3 1   4
1 1 1 1 1 3 2   6
1 1 1 1 1 4 1  13
1 1 1 1 1 4 2   8
1 1 1 1 2 0 1   2
1 1 1 1 2 0 2   1
1 1 1 1 2 2 1   3
1 1 1 1 2 2 2   5
1 1 1 1 2 4 1   4
1 1 1 2 1 0 1   1
1 1 1 2 1 0 2   2
1 1 1 2 1 1 2   1
1 1 1 2 1 2 1   4
1 1 1 2 1 2 2   1
1 1 1 2 1 3 1   1
1 1 1 2 1 4 1   1
1 1 1 2 2 2 1   1
1 1 1 2 2 4 1   6
1 1 1 2 2 4 2  13
1 1 2 1 1 0 1   1
1 1 2 1 1 0 2   1
1 1 2 1 1 2 1   1
1 1 2 1 1 2 2   1
1 1 2 1 2 2 1   2
1 1 2 1 2 4 1   1
1 1 2 2 2 1 2   2
1 1 2 2 2 2 1   1
1 1 2 2 2 4 1  16
1 1 2 2 2 4 2   6
1 2 1 1 1 0 1  10
1 2 1 1 1 0 2  13
1 2 1 1 1 1 2   2
1 2 1 1 1 2 1   2
1 2 1 1 1 4 1   1
1 2 1 2 1 0 1   1
1 2 1 2 1 2 1   1
1 2 2 2 2 1 2   1
1 2 2 2 2 4 2   5
2 1 1 1 1 0 1  68
2 1 1 1 1 0 2 103
2 1 1 1 1 1 1   7
2 1 1 1 1 1 2  14
2 1 1 1 1 2 1  15
2 1 1 1 1 2 2  14
2 1 1 1 1 3 2   3
2 1 1 1 1 4 1   5
2 1 1 1 1 4 2   4
2 1 1 1 2 2 1   4
2 1 1 1 2 2 2   1
2 1 1 1 2 4 2   2
2 1 1 2 1 0 2   1
2 1 1 2 1 1 1   2
2 1 1 2 1 2 1   2
2 1 1 2 1 2 2   1
2 1 1 2 1 3 2   2
2 1 1 2 1 4 2   1
2 1 1 2 2 1 2   1
2 1 1 2 2 4 1   4
2 1 1 2 2 4 2   4
2 1 2 1 1 0 1   1
2 1 2 1 1 0 2   1
2 1 2 2 2 2 1   2
2 1 2 2 2 3 1   1
2 1 2 2 2 3 2   1
2 1 2 2 2 4 1  21
2 1 2 2 2 4 2  23
2 2 1 1 1 0 1  57
2 2 1 1 1 0 2 136
2 2 1 1 1 1 1   1
2 2 1 1 1 1 2   8
2 2 1 1 1 2 1   4
2 2 1 1 1 2 2   8
2 2 1 1 1 3 1   2
2 2 1 1 1 3 2   2
2 2 1 1 1 4 1   1
2 2 1 1 1 4 2   5
2 2 1 1 2 2 1   1
2 2 1 1 2 2 2   3
2 2 1 1 2 3 1   1
2 2 1 1 2 3 2   1
2 2 1 1 2 4 2   1
2 2 1 2 1 0 1   1
2 2 1 2 1 0 2   1
2 2 1 2 1 3 2   2
2 2 1 2 1 4 1   2
2 2 1 2 1 4 2   2
2 2 1 2 2 3 2   1
2 2 1 2 2 4 1   4
2 2 1 2 2 4 2   6
2 2 2 1 1 1 2   1
2 2 2 1 2 0 2   1
2 2 2 2 1 0 2   1
2 2 2 2 2 2 1   1
2 2 2 2 2 2 2   1
2 2 2 2 2 4 1  18
2 2 2 2 2 4 2  16
end

expand _freq 

* you start here 
gen MigrantGeneration1 = born - 1 
gen MigrantGeneration2 = paborn == 1 | maborn == 1
gen MigrantGeneration3 = inlist(granborn, 1, 2, 3, 4) 
gen MigrantGeneration4 = granborn == 0 

gen string_sex = word("Male Female", sex)
replace sex = sex - 1 

replace born = born - 1 
replace spkmslmy = spkmslmy - 1 

gen id = _n
reshape long MigrantGeneration, i(id) j(which)
drop if MigrantGeneration == 0 

lab def which 1 "First" 2 "Second" 3 "Third" 4 "Fourth"
lab values which which

graph set window fontface "Candara Light"

replace spkmslmy = 100 * spkmslmy

preserve 

collapse spkmslmy, by(string_sex which)  
gen toshow = strofreal(spkmslmy, "%2.1f")

scatter spkmslmy which if string_sex == "Female", mla(toshow) mlabcolor(stc1) || ///
scatter spkmslmy which if string_sex == "Male", ms(T) mla(toshow) mlabcolor(stc2) ///
legend(order(1 "Female" 2 "Male") col(1) pos(1) ring(0)) ///
title("{fontface Merriweather Bold:Should religious extremists be allowed to speak?}", pos(11) span) ///
ytitle("Percent of Respondents Agreeing") ylabel(, angle(horizontal)) ///
subtitle("{fontface Merriweather Italic: By migrant generation & sex}", size(small) pos(11) span) ///
yla(30(10)80) xla(, valuelabel grid) xtitle(Generation) xsc(r(0.8 4.2)) ///
note("{fontface Merriweather Light: Source: U.S. General Social Survey 2022}", span margin(medium))

restore

[Attached image: salazar3.png]

Nick Cox

Join Date: Mar 2014
Posts: 35720

#12

26 Sep 2023, 03:57

Some would be queasy about showing line charts here, but I don't think they're outrageous. Another idea is to ditch the legend in favour of direct labelling.

Code:

preserve 

collapse spkmslmy, by(string_sex which)  
gen toshow = strofreal(spkmslmy, "%2.1f")

gen spkmslmy4 = spkmslmy + 4
scatter spkmslmy which if string_sex == "Female", c(L) mlabpos(6) mla(toshow) mlabcolor(stc1) || ///
scatter spkmslmy which if string_sex == "Male", c(L) ms(T) mlabpos(6) mla(toshow) mlabcolor(stc2) || ///
scatter spkmslmy4 which if string_sex == "Female" & which == 4, mla(string_sex) mlabsize(medium) ms(none) mlabc(stc1) mlabpos(11) || ///
scatter spkmslmy4 which if string_sex == "Male" & which == 4, mla(string_sex) mlabsize(medium) ms(none) mlabc(stc2) mlabpos(11) ///
legend(off) ///
title("{fontface Merriweather Bold:Should religious extremists be allowed to speak?}", pos(11) span) ///
ytitle("Percent of Respondents Agreeing") ylabel(, angle(horizontal)) ///
subtitle("{fontface Merriweather Italic: By migrant generation & sex}", size(small) pos(11) span) ///
yla(30(10)80) xla(, valuelabel grid) xtitle(Generation) xsc(r(0.8 4.2)) ///
note("{fontface Merriweather Light: Source: U.S. General Social Survey 2022}", span margin(medium))

restore
