Wrapping text in a chart?

Taylor Walter

Join Date: Mar 2018

Posts: 80
#1

04 Apr 2018, 16:46

For any folks who are particularly good at data visualization in Stata:

I'm producing bar graphs for the number of times an online course has been completed, retaken, started and quit, etc., and want to show just the top 10 courses for each measure.

The issue is that the course titles are a bit long, so when I produce the graph, it's just a big blob of impossible-to-discern text along the bottom of the chart.

Other than the obvious but clunky method of renaming each course (which would take quite a while, as I'd have to rename 10 courses for each of the variables, probably 60-70 renames in all), is there a way to "wrap" the text like you can in Word, or something similar?

If it's helpful, here's my code for one graph as is:

Code:

graph bar (rawsum) course_starts if coursecount == 1, over(course) blabel(bar) ytitle(Courses started) title(Top 10 courses started)

Thanks much.
Nick Cox

Join Date: Mar 2014

Posts: 35486
#2

05 Apr 2018, 04:51

Why not use graph hbar?
Taylor Walter

Join Date: Mar 2018

Posts: 80
#3

05 Apr 2018, 06:12

I tried hbar first, actually, but the course names are long enough that they take up 3/4 of the space, so the bars, and the graph itself, look tall and skinny and not quite right. Plus, the tick labels on the x axis then start to overlap. If there were a way to "wrap" text or something similar, though, hbar would also be effective (and possibly even more effective, as there might be a little more space).

Actually, another thought related to hbar: an unfortunate way the data was sent to me has each course start with "Course:" ... is there a way to systematically do some type of "find" and "delete all" for a single word within a variable? If so, that might shorten the names so with hbar, it may look a little more normal.

Last edited by Taylor Walter; 05 Apr 2018, 06:15.
Nick Cox

Join Date: Mar 2014

Posts: 35486
#4

05 Apr 2018, 06:19

I think we will make faster progress if you give an example of your data, as is the permanent request here. https://www.statalist.org/forums/help#stata explains.

Code:

gen Course = subinstr(course, "Course:", "", .)

is a technique for deleting text. I recommend creating a new variable, in case the old one is still needed for other purposes.

Taylor Walter

Join Date: Mar 2018
Posts: 80
#5

05 Apr 2018, 09:18

Yes, sorry, I was hoping there would be a quick fix that didn't require presenting data, but I should have included it anyway. Here is an example:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str51 course double(avghours medianhours)
"Course - Political Science 232: American Government" 63.9869047619047 51.4416
"Course - Economic Literature 101"                    36.3290364583333 21.6166
"Course - Series of Lessons"                          35.2166666666666 35.2166
"Course - Introduction to Western Political Thought"  27.0430303030303 23.3833
"Course - History of Dance 101: Aboriginal Dance"     25.5866666666666 21.7166
"Course - Beginning Algebra "                         24.3146198830409   19.65
"Course - Introduction to Cisco Networking "          20.3813465783664 19.0833
"Course - Business Law and Ethics"                    17.6800995024875  8.4833
"Course - Peace Education Program"                    16.6132411481248   12.65
"Course - Real World Math 101"                        15.8733660130718   12.25
"Course - Principles of Management "                  15.7005144032921   8.675
"Course - Economics of Thought and Rationality"       14.7414015904572 13.5666
"Course - Philosophy of Education Studies"            13.6906229290921   13.05
end


Nick Cox

Join Date: Mar 2014
Posts: 35486
#6

05 Apr 2018, 10:06

This is now about different data -- and I just focus on what you have given without promising that it will be ideal, or even a good idea, for any other data.

I would now not dream of a bar chart. I naturally agree that the long names are a problem and show some of the things you can do. We should all be clear that names that are vertical or at a steep angle would just be a nightmare.

This code will run as is. I use scheme s1color by default.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str51 course double(avghours medianhours)
"Course - Political Science 232: American Government" 63.9869047619047 51.4416
"Course - Economic Literature 101"                    36.3290364583333 21.6166
"Course - Series of Lessons"                          35.2166666666666 35.2166
"Course - Introduction to Western Political Thought"  27.0430303030303 23.3833
"Course - History of Dance 101: Aboriginal Dance"     25.5866666666666 21.7166
"Course - Beginning Algebra "                         24.3146198830409   19.65
"Course - Introduction to Cisco Networking "          20.3813465783664 19.0833
"Course - Business Law and Ethics"                    17.6800995024875  8.4833
"Course - Peace Education Program"                    16.6132411481248   12.65
"Course - Real World Math 101"                        15.8733660130718   12.25
"Course - Principles of Management "                  15.7005144032921   8.675
"Course - Economics of Thought and Rationality"       14.7414015904572 13.5666
"Course - Philosophy of Education Studies"            13.6906229290921   13.05
end

gen Course = trim(subinstr(course, "Course - ", "", .)) 

graph dot (asis) med avg, ///
over(Course, sort(2) descending) marker(2, ms(Oh)) marker(1, ms(+)) ///
legend(order(1 "median" 2 "average")) ytitle(Hours per course) ysc(alt) ///
linetype(line) lines(lc(gs12) lw(vthin)) name(G1, replace) 

gen Course2 = subinstr(Course, "Introduction", "Intro", .) 
replace Course2 = subinstr(Course2, "Political Science", "Pol Sci", .) 

graph dot (asis) med avg, ///
over(Course2, sort(2) descending label(labsize(small))) marker(2, ms(Oh)) marker(1, ms(+)) ///
legend(order(1 "median" 2 "average")) ytitle(Hours per course) ysc(alt) ///
linetype(line) lines(lc(gs12) lw(vthin)) name(G2, replace)

[Image: hourspercourse1.png]

[Image: hourspercourse2.png]

Small points:

1. Since medians are typically less than averages, I mention them first, so they automatically go first in the legend. Not essential, but a useful detail.

2. I find that the default grid of dotted lines can degrade when the graphs are copied to other software, whereas thin grey continuous lines work well in many settings.

3. Whenever marker symbols may occlude each other, it is a good idea to use combinations such as O and + that can be distinguished even when that happens.

4. Horizontal axis stuff at the top matches well whenever graphs have a kind of table flavour, but that may be just personal taste. Much more on that at https://www.stata-journal.com/sjpdf....iclenum=gr0053

On the meta front: Except for StataCorp personnel, we are all volunteers here, eating into our spare time. You really shouldn't want us to guess at your data, or data like yours, or your variable names, or names like yours, when you can just show us those with a few seconds' work, and answers are then likely to be much closer to what you could use. Besides, a real example showing the underlying problems can lead to other suggestions, as here.


Taylor Walter

Join Date: Mar 2018

Posts: 80
#7

05 Apr 2018, 10:20

Incredibly useful, Nick, thank you. A bar chart is what was requested, but I think I have enough ammunition here to argue to switch the type of graph.

I wasn't aware of the subinstr code that seems to change names, useful for common words like Introduction, as used here. I'll play around with this a bit more.

And I guess for my original question, then, this means the answer is no, there is not any way of wrapping text on a graph. Also very useful to know moving forward.

Thanks again.
Nick Cox

Join Date: Mar 2014

Posts: 35486
#8

05 Apr 2018, 10:31

You can break names. It's just a pain and solves one problem by creating another. Documented within

Code:

help graph bar

Search for multiple-line labels.
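
As a rough sketch of what that looks like in practice, assuming the Course variable and example data created earlier in this thread: each label given to relabel() can be written in compound double quotes holding several quoted pieces, one piece per line. The break points below are chosen by hand, and the category numbers (5 and 10) assume the categories are numbered in the alphabetical order of Course, so check them against the graph before relying on them.

```stata
* Sketch only: break two long category labels onto two lines by hand.
* Assumes Course and avghours from the example data earlier in the thread.
* The numbers in relabel() assume categories numbered in the alphabetical
* order of Course (5 = History of Dance..., 10 = Political Science...).
graph hbar (asis) avghours, ///
    over(Course, ///
        relabel(5 `" "History of Dance 101:" "Aboriginal Dance" "' ///
                10 `" "Political Science 232:" "American Government" "') ///
        label(labsize(small))) ///
    ytitle(Average hours per course)
```

No sort() suboption is used here, to keep the category numbering easy to check. Every label to be broken needs its own entry, which is where the pain comes in.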
wbuchanan

Join Date: Mar 2014

Posts: 1361
#9

30 Apr 2018, 04:28

Taylor Walter
I cannot remember the name of the function within eda, but I have a function specifically to split labels based on a user specified number of characters. As Nick mentioned, it definitely is a pain and there are likely to be some bugs that I’ve not yet encountered with it, but it might provide some other ideas of how you can handle things.
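
To give an idea of the general approach, here is a minimal sketch, which is not the eda function itself. It assumes the Course variable and avghours from the example data earlier in the thread, uses an arbitrary width of 25 characters, breaks only at spaces, and assumes that levelsof returns the categories in the same order that graph hbar numbers them. The local macro names are made up for illustration.

```stata
* Sketch only: wrap each label of Course at roughly `width' characters,
* breaking at spaces, and build the argument for relabel().
local width 25
levelsof Course, local(levels)
local relabel
local i = 0
foreach lev of local levels {
    local ++i
    local line
    local pieces
    foreach word of local lev {
        if "`line'" == "" {
            local line "`word'"
        }
        else if strlen("`line' `word'") <= `width' {
            local line "`line' `word'"
        }
        else {
            * current line is full: store it and start a new one
            local pieces `"`pieces' "`line'""'
            local line "`word'"
        }
    }
    * store the last (or only) line of this label
    local pieces `"`pieces' "`line'""'
    local relabel `"`relabel' `i' `"`pieces'"'"'
}
graph hbar (asis) avghours, ///
    over(Course, relabel(`relabel') label(labsize(small)))
```

Labels containing double quotes, or single words longer than the chosen width, are not handled by this sketch.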
