Graphing, Labels, and Missing Data

Erika Kociolek

Join Date: Apr 2014

Posts: 83
#1

18 Jun 2014, 14:10

I am trying to create a graph using the code below but am running into an issue. The graph combines several graphs, one for each year between 1989 and 1995. The first two graphs (1989 and 1990) show the category "Below 30" in one color (on my screen, blue). The remaining graphs (1991-1995) don't have any observations falling into the "Below 30" category, so the first category for these years ("30-50") shows up in the same blue that was used for "Below 30" in the 1989 and 1990 graphs. This makes it difficult to look at trends over time. I'm wondering if there's a way to get all graphs to use the same colors and have the same legend.

I would greatly appreciate any advice Stata experts have about this.

Thank you,
Erika

/*Code*/
*Using the cigconsump dataset that can be found here http://www.stata-press.com/data/imeus.html.
cd C:/ado/plus/imeus/

use ../imeus/cigconsump, clear
generate region = "N. Centr" if inlist(state, "OH", "MI", "KS", "IA", "WI")
replace region = "N. East" if inlist(state, "NH", "CT", "RI", "VT", "MA")
replace region = "South" if inlist(state, "AL", "GA", "LA", "TX", "FL")
replace region = "West" if inlist(state, "CA", "WA", "OR", "UT", "CO")
drop if (region == "")

generate tax_tier = 1 if taxs < 30
replace tax_tier = 2 if taxs >= 30 & taxs < 50
replace tax_tier = 3 if taxs >= 50 & taxs < 60
replace tax_tier = 4 if taxs >= 60 & taxs < 70
replace tax_tier = 5 if taxs >= 70 & !missing(taxs)

label define tax_tier_label 1 "Below 30" 2 "30-50" 3 "50-60" 4 "60-70" 5 "Over 70"
label values tax_tier tax_tier_label

generate ones = 1
keep if year > 1988

save ../imeus/cigconsump_cleaned, replace

forvalues x = 1989/1995 {
    use ../imeus/cigconsump_cleaned, clear
    keep if year == `x'
    graph bar (count) ones, over(tax_tier, label(labsize(vsmall))) over(region, label(labsize(vsmall))) stack asyvars title(`x')
    graph save ../imeus/1_`x'.gph, replace
    collapse (sum) ones, by(region tax_tier)
    bys region : egen total_region = total(ones)
    generate p_total_region = round(ones/total_region, 0.01)*100
    graph bar (asis) p_total_region, over(tax_tier, label(labsize(vsmall))) over(region, label(labsize(vsmall))) stack asyvars title(`x')
    graph save ../imeus/2_`x'.gph, replace
}

graph combine ../imeus/1_1989.gph ///
../imeus/1_1990.gph ///
../imeus/1_1991.gph ///
../imeus/1_1992.gph ///
../imeus/1_1993.gph ///
../imeus/1_1994.gph ///
../imeus/1_1995.gph
Rich Goldstein

Join Date: Mar 2014

Posts: 4485
#2

18 Jun 2014, 14:27

You can set the color of each bar to whatever you want; see -h barlook_options-.
Erika Kociolek

Join Date: Apr 2014

Posts: 83
#3

18 Jun 2014, 14:54

Thank you for the quick response, Rich. When I add the following line of code after the first graph bar command, nothing changes.

bar(1, color(gs7)) bar(2, color(stone)) bar(3, color(teal)) bar(4, color(ltblue)) bar(5, color(gray))

I think the issue is that the first bar category is different depending on the year, and I'm not sure how to address that. Apologies if I am missing something in the help file.

Best,
Erika

Nick Cox

Join Date: Mar 2014
Posts: 35754

19 Jun 2014, 03:20

The essence of your problem is that bar charts produced separately don't know anything about others' internals. So, don't do that...

catplot (SSC) can do what you want without obliging the user to invent the machinery of counting 1s separately, combining graphs, and so forth. Assuming that catplot is installed, this is a self-contained script showing some technique.
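
If catplot is not yet installed, it can be fetched from SSC first. This is a one-off step and needs an internet connection:

```stata
* Install the user-written catplot command from SSC (one-off)
ssc install catplot
```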

Code:

 
clear 
copy http://www.stata-press.com/data/imeus/cigconsump.dta cigconsump.dta, replace 
u cigconsump 

generate region = "N. Centr" if inlist(state, "OH", "MI", "KS", "IA", "WI")
replace region = "N. East" if inlist(state, "NH", "CT", "RI", "VT", "MA")
replace region = "South" if inlist(state, "AL", "GA", "LA", "TX", "FL")
replace region = "West" if inlist(state, "CA", "WA", "OR", "UT", "CO")
drop if (region == "")

generate tax_tier = 1 if taxs < 30
replace tax_tier = 2 if taxs >= 30 & taxs < 50
replace tax_tier = 3 if taxs >= 50 & taxs < 60
replace tax_tier = 4 if taxs >= 60 & taxs < 70
replace tax_tier = 5 if taxs >= 70 & !missing(taxs)

label define tax_tier_label 1 "Below 30" 2 "30-50" 3 "50-60" 4 "60-70" 5 "Over 70" 
label values tax_tier tax_tier_label

catplot tax_tier region if inrange(year, 1989, 1994), by(year) stack asyvars
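
The percentage version asked about in #1 (share of each tax tier within region) might be sketched in the same way. This assumes the data prepared by the script above are still in memory and that the installed catplot supports percent(varlist), as recalled from its help file; check -help catplot- before relying on it. Listing year inside percent() is an assumption made so that percentages are computed within each region in each year.

```stata
* Sketch only: percent of observations in each tax tier,
* computed within region and year (assumes catplot's percent(varlist) option)
catplot tax_tier region if inrange(year, 1989, 1994), ///
    percent(region year) by(year) stack asyvars
```

Because everything is drawn by one command, no collapse or egen step is needed to get the percentages.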

At a guess, the dataset you used is just a toy and not central to your real problem, so I've held off suggesting better colours and so forth, but I note that here, as indeed often, horizontal bar charts are friendlier.
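
The underlying principle, one graph command with by() instead of separately built graphs, can also be tried with official graph bar. The following is a minimal sketch on made-up toy data (the values, the names tier, tierlbl and ones, and the colours are all hypothetical), in which "Below 30" occurs in 1989 only. With by(), the panels should share one legend, so that bar(1, ...) refers to "Below 30" throughout.

```stata
* Toy data (hypothetical values): tier 1 occurs in 1989 only
clear
input int year byte tier str5 region
1989 1 "East"
1989 2 "East"
1989 2 "West"
1990 2 "East"
1990 3 "East"
1990 3 "West"
end
label define tierlbl 1 "Below 30" 2 "30-50" 3 "50-60"
label values tier tierlbl
generate byte ones = 1

* One command, one panel per year; colours are arbitrary choices
graph bar (count) ones, over(tier) over(region) asyvars stack ///
    bar(1, color(navy)) bar(2, color(orange)) bar(3, color(teal)) ///
    by(year)
```

Replacing graph bar with graph hbar gives the horizontal layout.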


Erika Kociolek

Join Date: Apr 2014

Posts: 83
#5

19 Jun 2014, 07:26

Thank you for the advice; catplot works wonderfully.
