Wishlist for Stata 18

Chen Samulsion

Join Date: Jan 2018

Posts: 926
#136

07 Oct 2021, 22:42

I think it's time to redesign Stata's schemes. The s2color scheme (the factory setting) is basically fine, but the light bluish gray it uses for the outer region (graphregion) has drawn complaints for many years. Maybe the default fcolor could simply be set to white or another plain color. Also, the axis labels are vertical by default, and the suboptions of orientation() seem to be undocumented, which results in graphs that are difficult and peculiar for an audience to read. The economist scheme is a white elephant (useless and ugly) to me, and I wonder why Stata has retained it as an exception for so many years. I have used my own scheme since 2019. There are more and more user-written schemes that mimic ggplot (which is not necessarily the best for statistical graphs, as the review I quote below comments) or other publication styles. For example, lean (@Svend Juul), rbn (@Roger Newson), scientific (@Ariel Linden), tufte (@Ulrich Atz), burd (@François Briatte), cgd (@Mead Over), cleanplots (@Trenton D Mize), mrc (@Tim Morris), tfl (@Tim Morris), yale (@Aaron Wolf), gg538, ggtig, ggplain (Daniel Bischof). There are also commands to generate customized scheme files, such as -brewscheme- (@wbuchanan), and commands to customize the overall look of graphs, such as -grstyle- (@Ben Jann). StataCorp has reinforced Stata's functionality with every new release, so maybe it's time for them to redesign and reinforce the schemes. Below I quote Nick Cox's book review, which he published at https://www.amazon.com/gp/customer-r...22MWD7RJ6QAFP/ . Nick seems to prefer the s1color scheme, and in the review he talks a lot about the "ABC" of statistical graphs and a little about aesthetics.

Although the title does not spell it out (for marketing reasons?), this book is by a scientist -- Claus Wilke is a physicist-turned-biologist, and so experienced across a range of sciences -- and primarily for scientists. That could easily include engineers, social scientists, medical and health people, and so forth: the examples here cover a wide range, as more crucially do the principles. Nor does that target readership necessarily exclude people in journalism, graphic design, or business, for whom most recent books on data visualization seem to be written anyway.

I disagree slightly with the author. It's a good idea to read, or at least skim, the entire book quickly, rather than just to sample chapters piecemeal. Some of the tips and tastes of the author make fullest sense in the light of discussions given late in the book. Either way, Wilke strikes a welcome balance, firm but modest, in giving arguments both for and against specific graphic choices. The flavor is very much "This is what I suggest, but do something different if your circumstances make it a better idea or you have a good argument for another decision".

In data visualization, the devil is usually in the details. It can be a small lapse of design that dooms a graphic to uselessness or makes it unnecessarily difficult to follow. It can be a small twist of ingenuity or style that makes a graphic outstanding. At the same time, scientists should be easily persuaded that a graphic must be designed for clear and simple presentation of their data or results. All else is secondary at best. The outcome should be a reader's Aha! not Wow! or Huh? Other cultures march to different tunes: graphic designers are encouraged to be innovative, but that way can lead to data art, or difference for difference's sake. There are good reasons for the main graphic designs, in science principally bar and line charts and scatter plots, all well in place a century or more.

What is in particular excellent here?

Emphasis on static figures. Interactive and dynamic graphics can be spectacular, but most readers don't have time to play, and two-dimensional graphics are still the norm.

Treatment of color. Many texts now do a good job on color, but Wilke is excellent. His default palette is well chosen (see pp.28, 33). Wilke perhaps underestimates how many people are still geared to publishing in black and white, but then again we are getting closer to a time when all figures can make use of color.

State units on the axes (p.270). But there is no need to explain, say, 2014 to 2018 on a time axis as "year". These points should have been evident in high school, but are still often ignored even by experienced researchers.

Logarithmic and root scales. The need for, and value of, logarithmic scales is widely appreciated in scientific and statistical graphics, but Wilke's account is especially good, bringing out specific points such as ratios often needing them (p.18) and 1 being a special value on such scales (pp.20, 215). Powers of 2 can be good axis labels (p.215). Mentioning square root scales is less usual but welcome (pp.20-22).

Welcome warnings. Don't rotate axis labels, but keep them horizontal (pp.46-47). Avoid alphabetical or other arbitrary orders of bars or similar elements (pp.48-50). Frequency and probability distributions are better shown as areas (many examples). Dashed and dotted lines often do not work well (pp.247, 299).

Enhance your plots. It can be fine to add a few numbers to a plot (pp.53, 110). Direct labeling of graphic elements within a plot can allow you to remove an awkward legend (pp.69, 235, 251-252).

ggplot2. This R package is currently extremely popular. Wilke's comments are worth quoting at length, as well judged and as an example of his generous style: "With apologies to the ggplot2 author Hadley Wickham, for whom I have the utmost respect, I don't find the white-on-gray background grid particularly attractive. To my eye, the gray background can detract from the actual data, and a grid with major and minor lines can be too dense. I also find the gray squares in the legend confusing." (p.282)

Naturally there are always small disagreements, as a matter of taste or even principle. Wilke rightly warns against densities being smoothed into areas where they do not belong, but does not explain alternatives beyond truncating the display (pp.63-64). Other solutions not discussed here include estimating densities for a transformed scale such as logarithmic or logit and then back-transforming.

On p.209 Wilke states that bars on a linear scale should always start at zero. This advice is a good starting point, but there are defensible exceptions. Examples I have seen include bars for temperatures on a Fahrenheit scale starting at freezing; sex ratios (number of females/number of males) with bars starting at 1 for parity. More nuanced advice could then be that bars should always start at a natural reference level, often but not necessarily zero. Bar height then encodes deviation or distance from that level.

I don't agree that different and open point symbols, such as open circles and plus marks, add unwelcome visual noise (pp.301-2). The boxplot idea can often be combined with more detail on the data without compromising either box or details (p.302).

Wilke does explain well that there are usually much better choices than pie charts. Given that, why are there so many here?

Jittering (defined on p.84): Shaking points apart by adding random noise remains a brilliant idea, but just stacking them neatly is often less disconcerting.

History of ideas is always tricky. There is always scope for a neglected precursor in some other literature. Despite many refutations, the meme that Tukey invented box plots is echoed here. He suggested the name, and many new details, but geographers were there with dispersion diagrams in the 1930s and Kenneth W. Haemer wrote on range-bar plots in 1948 before Tukey (and before Mary Eleanor Spear too). Similarly, Tufte's name "slopegraph" is good, but he really didn't invent them.
4 likes
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35726
#137

08 Oct 2021, 04:34

s2color is what it is: it's just the company house style and the default default (*) scheme. The situation now is exactly what it was in 2003 when the new graphics appeared in Stata 8. If you don't like the default default, feel free to switch immediately or even permanently to another scheme that is available, say by declaring your own default. Tastes and rules differ, and your choices can vary accordingly.

(*) Not a typo.

I doubt that it's quite the purpose but in some ways it is as if s2color is intended to provoke people to think what they want instead.

I agree that there are many schemes to choose from -- but that's the same point again. Notice how all these people don't quite agree with anyone else and that is why there are so many different schemes.

I don't fight them but intermittently I see ignorant and even idiotic comments on social media from people who don't understand this. It's sad as well as irritating that people haven't done the few minutes of reading around that makes this clear.

I've tried many of these community-contributed schemes and typically I think "Sure, that's better than s2color, but I don't like X, Y or Z, so I am not switching". I never got round to writing my own scheme for serious, and tend to start with s1color and then tweak ad hoc. But no one should care about that.

Detail: Usually yla(, ang(h)) is a good idea but for some purposes it can work badly as a default. Better to think: Oh, yla(, ang(h)) will improve readability and in my case there is enough space to do it.
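
A minimal sketch of switching schemes and of horizontal y-axis labels, assuming the auto dataset shipped with Stata and taking s1color purely as an example of an alternative scheme:

```stata
* Load an example dataset that ships with Stata (illustrative choice)
sysuse auto, clear

* List the schemes installed on this machine
graph query, schemes

* Use a different scheme for one graph only
scatter mpg weight, scheme(s1color)

* Use it for the rest of the session
set scheme s1color

* Make it the default across sessions
set scheme s1color, permanently

* Horizontal y-axis labels, decided graph by graph;
* yla(, ang(h)) is the abbreviated form of this option
scatter mpg weight, ylabel(, angle(horizontal))
```

Whether angle(horizontal) helps depends on how long the labels are and how much room the graph has, so it is applied here per graph rather than baked into a default.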
3 likes
Comment
Chen Samulsion

Join Date: Jan 2018

Posts: 926
#138

08 Oct 2021, 08:02

Yes, I quite agree. However, just changing the light bluish gray of the outer region may gain more than it loses. So many users have complained about this color, and newcomers are at a loss when they want to change it, whether to satisfy their reviewers and editors or just for aesthetic reasons. And yes, this is still a matter of taste: one man's meat is another man's poison.
As to the axis labels: making ylabel horizontal by default seems a natural choice. When I have to plot, I usually use -splitvallabels- (SSC) or -splitvarlabels- (which I wrote myself) to split value labels and variable labels. The ytitle(, orientation(horizontal)) option is important because in some languages, especially Chinese, one character corresponds to one word. That is quite unlike English or Latin, where a single character (letter) carries no meaning and a word is composed of three or more characters (letters). So for English and Latin, vertical labels are natural, whereas horizontal labels waste space and look peculiar. But for Chinese, horizontal labels (with one character per line) are easier to read.
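
For newcomers who want to change these two things on a single graph, a minimal sketch follows. It assumes the auto dataset shipped with Stata; the variable names and the title text are illustrative only.

```stata
* Load an example dataset that ships with Stata (illustrative choice)
sysuse auto, clear

* Replace the fill color of the outer region with white
scatter mpg weight, graphregion(fcolor(white))

* Set both fill and outline of the outer region to white
scatter mpg weight, graphregion(color(white))

* Horizontal y-axis title, with each quoted string on its own line
scatter mpg weight, graphregion(color(white))       ///
    ytitle("Miles" "per" "gallon", orientation(horizontal)) ///
    ylabel(, angle(horizontal))
```

The last command continues across lines with ///, so it is meant to be run from a do-file rather than typed interactively.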
2 likes
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35726
#139

08 Oct 2021, 09:51

What works best in Chinese is exactly for people who know Chinese to say, and unsurprisingly that's not me. That's the same major point: some defaults can be terrible choices for some circumstances; so don't do that then!

I think the complaints I see -- typically outside Statalist -- are mostly about people just using s2color and not realising that there are other choices that are usually better. Some of those complaints are not complaints so much as slightly cruel or malicious mockery, but then again if people make lazy or clueless choices where does the blame lie?

Many years ago someone who started with Stata very early said to me "I don't know how people learn Stata. I started with Stata when you could know about everything that was on offer. As each new version comes out, I just look at what has been added and decide what to add to my personal toolkit. But now Stata is so big, I don't know how anyone starts." That was a great point, that gets more sting each time Stata adds new stuff. But a fair fraction of what this forum is about is the more experienced guiding the less experienced (also, the more experienced pick up plenty too!). And, to the point: shucks, graph schemes seem pretty easy in principle to learn about. Stata has a default default scheme. If you don't like it, use another. Why is this thought to be difficult to understand or to explain to others? I explain it to my students in two minutes.

I don't mind one bit if StataCorp changes s2color (they won't do that for other reasons) or drops it for another default in their examples (more likely), but equally I will not join any lobby that they should do it. Many more things are more important to add or to fix.
4 likes
Comment
Chen Samulsion

Join Date: Jan 2018

Posts: 926
#140

08 Oct 2021, 10:26

Stata has a default default scheme. If you don't like it, use another.

Stata enthusiasts and more experienced users can certainly and easily switch to another scheme or tweak ad hoc, but I doubt that newcomers can do that; they may be frightened at the initial stage and choose to escape. Svend Juul's description still holds:

This was the user's first impression:
• It looks ugly (especially the Viewer window font).
• It behaves confusingly.
• The documentation [GSW] is misleading.
• “I give up using Stata. It's unprofessional.”
• “I give up using Stata. I'm not bright enough.”
2 likes
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35726
#141

08 Oct 2021, 10:53

Svend Juul is great at Scandi noir. My first impressions were love at first sight. I like this. Still true 30 years later.
1 like
Comment
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#142

13 Oct 2021, 20:27

Nick Cox Couldn't agree more. Me personally, I think Stata's default default graphics are really ugly. Others think differently, and that's okay (even if I think they're wrong.)

But, as you say (and this is the main point), if you don't like the default scheme..... use a different scheme. Make it permanent on your machine. Nobody from StataCorp will email you asking why you've decided to forgo or besmirch the default scheme. Reviewers won't desk reject your paper for using the economist scheme. Your professor won't fail you for finding a useful scheme. That's the grand thing here, you know, we have variety to choose from should one not suit us well.

Also, in the very first place, even my default schemes (my default schemes), I always, always edit my graphs. I'll get rid of the legend and use in-plot text, colored with the same color that data points represent. I'll make the markers just the right size. I'll look up what colors contrast best with the background, and so on. So to me, even once you've found a scheme you like, your job (if you're doing real professional graphical analysis) isn't close to being over.

On a related point....... This is what always annoys me about people saying R is better than Stata. Yes it's true... R is free. R has more statistical techniques to play with (especially if you're like me, a causal inference researcher). R has more machine learning and other web-scraping properties (unless you're savvy enough like I am to use Python and Stata in one do-file). I imagine most people find this uncontroversial.

But too often, people say R has better graphics than Stata. By what metric? Maps? Okay, R beats Stata, but frankly not by much. Yeah, R has certain, specific plots that Stata lacks (though this is likely totally inconsequential in most cases). Everything else? I just don't see it. Ggplot2 is marketed as a selling point, but most of those graphs can be replicated exactly in Stata with only a little work. To me, the folks who make these arguments (really points, not arguments) are the same people who haven't sat down with the two way options (and the options to those options), looked up color customization, read Edward Tufte (or whoever else you like), or devoted much time to learning the software. I began using Stata at 18. I'm 24 now. I suspect that even if I started using R at that age, I'd STILL consistently customize my graphics from whatever my default settings are. So, the points some people make about "Stata's default graphics aren't good" always rings hollow to me.
3 likes
Comment
Chen Samulsion

Join Date: Jan 2018

Posts: 926
#143

13 Oct 2021, 22:37

I besmirched the default scheme? Jesus! I never said that there will be no need to tweak ad hoc if we had a desirable scheme. And I never said that R's graphic system is better than Stata. So your discussion here is nonsense.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35726
#144

14 Oct 2021, 02:46

Chen Samulsion I don't think that is a helpful tone to adopt.
1 like
Comment
Chen Samulsion

Join Date: Jan 2018

Posts: 926
#145

14 Oct 2021, 05:10

Sorry for displeasing anyone. I know this is a forum for discussing Stata rather than a platform for quarrels. I think Stata and its users ought to welcome advice and even criticism, rather than rejecting them. When we give advice to Stata, it does not mean we dislike her; no, we love her. I was attracted to Stata when my tutor introduced it to us, I became attached to Stata when I read and studied various versions of Introduction to Stata and related advanced textbooks, and I became addicted to Stata when I used her for data management and academic work. Stata has achieved so much success because she is open and humble, and makes progress toward perfection in every released version. I cannot imagine what the world would be like if Stata had come to a halt at version 6 or version 7 https://www.stata.com/support/faqs/resources/history-of-stata https://www.stata.com/stata-news/.

Nobody from StataCorp will email you asking why you've decided to forgo or besmirch the default scheme.

Does anyone who uses Stata have such a worry? I think it just sets up a scarecrow and attacks it, thus introducing a large element of pseudo-debate into the occasion. Those who are over-confident or over-diffident about Stata can read an analysis posted here: http://r4stats.com/articles/popularity/.

Last edited by Chen Samulsion; 14 Oct 2021, 05:14.
1 like
Comment
Hua Peng (StataCorp)

StataCorp Employee

Join Date: Jun 2014

Posts: 346
#146

14 Oct 2021, 21:20

Usually the inaction is not because we do not listen. For issues like graphics or any other Stata commands, the fact that no action is taken does not mean that we do not take users' comments and suggestions seriously or that we haven't debated them intensively in house.
5 likes
Comment
Chen Samulsion

Join Date: Jan 2018

Posts: 926
#147

15 Oct 2021, 01:19

Thank you very much, Hua Peng (StataCorp). I know that Stata will, and I can even imagine the scene that occurs every day in house as you and your staff debate how to make Stata better. When we give our advice, we know full well that it speaks only for (perhaps a small) part of Stata users. This advice may be right or wrong, or wholly useless. But nobody has the right to treat it with irony or insult.

Last edited by Chen Samulsion; 15 Oct 2021, 01:21.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35726
#148

15 Oct 2021, 04:00

Chen Samulsion

I am a little lost here. I take it that we don't need to go further back than #136 to try to understand a discussion.

Not to prolong this unnecessarily, but who is being insulted? In #143 you said

So your discussion here is nonsense

and the context is there for anyone to see. I think that's the strongest statement since #136 -- except for statements in abstraction and generality directed at ideas and assertions that can be found outside Statalist.

I have no idea precisely who or what you're referring to there. My biggest objection to that statement is that it doesn't push anything forward.

I am as capable of (indeed, inclined to) vigorous style when I think an idea is wrong, exaggerated, misguided, whatever, and should be called out too for any such statements I make myself or if my tone goes too far. But I was not intending to insult you or anybody else. Jared Greathouse can speak for himself, but I am clear that his contribution is in a similar spirit.

I have an unfair advantage in that I know Svend Juul personally and was there when he gave a presentation in Berlin and again when he repeated the presentation in College Station. So the context of the unsourced quotation in #140 is thoroughly familiar to me, and it's clear that Svend was exaggerating a bit, partly for comic effect and partly for emphasis.

As for irony, it is a thing we have, and it's not out of order here. It can be misunderstood, which is different.
1 like
Comment
Chen Samulsion

Join Date: Jan 2018

Posts: 926
#149

15 Oct 2021, 06:28

Dear Nick Cox, I am sorry for mixing things up here. I have the greatest respect for you and Svend Juul, and none of my comments since #136 are directed against you. I am also somewhat familiar with your style on Statalist, so I really won't be upset by your replies, comments, or warnings even if they sound a little ironic.
What I objected to are the comments posted by Jared Greathouse in #142, when he said that

Nobody from StataCorp will email you asking why you've decided to forgo or besmirch the default scheme.

and subsequently

This is what always annoys me about people saying R is better than Stata

They sound very ironic to me. So I said: Does anyone who uses Stata have such a worry? And why would someone conclude that we love R more than Stata when we say that Stata perhaps needs to improve? When we suggest that Stata improve or modify something, we do not mean that other statistical packages are beyond reproach and better than Stata.
When you reminded me in #144 that

I don't think that is a helpful tone to adopt.

I immediately realized that I should not damage the free and moderate atmosphere of this forum. But I have to explain what I really meant when I used the word "nonsense". I think Jared made ironic remarks that were really offensive. Maybe his comments are in a similar spirit to yours in #137, but I doubt it.
In any event, I apologize for bringing all of this about.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35726
#150

15 Oct 2021, 07:02

Chen Samulsion Thanks for #149, which clarifies what you're objecting to.

I found the post by Jared Greathouse vigorously expressed and well argued and for that matter I think I agree with his stance, even if I wouldn't choose the same wording always. I am not active in standard social media (Twitter etc.) but I browse a lot and in such places and also on sites where I am active outside Statalist I see many comments about other software and/or Stata that are (a) wrong or exaggerated about Stata (b) a little uncritical of people's alternative software. So, what else is new? I regard Jared as a kindred spirit in making a detailed argument with good points.

You should blame me too as I digressed in referring to discussions outside Statalist and I think Jared was riffing on those and a bundle of related matters.

Every member of the community has a right to disagree with what is being said or how it is being said. Statalist is not quite a flat democracy -- if someone joins and thinks it's a good place to outsource their homework, that's unlikely to go far -- but there is in general no hierarchy or pecking order.
1 like
Comment
