How to make a Tukey mean-difference plot

Soren Nielsen

Join Date: Oct 2018

Posts: 24
#1


02 Oct 2023, 09:51

Dear Statalisters,

Here is a do-file relevant to my problem:

//test of Stata tip 47: Quantile-quantile plots without programming
version 15.1
//checked and revised 02-10-2023 by Søren Nielsen
//Purpose: to create a qqplot as in Cleveland, Visualizing Data (1993), Fig. 2.3, p. 22

use http://www.stata-press.com/data/r9/auto, clear
gen gpm = 100 / mpg
label var gpm "gallons / 100 miles"
by foreign, sort: egen rank = rank(gpm)
by foreign: egen n = count(gpm)
gen pp = (rank - 0.5) / n

gen gpmd = gpm if !foreign
gen gpmf = gpm if foreign
qqplot gpmd gpmf
// creates the qqplot as planned: the plot seems to indicate a fairly straightforward relationship, with domestic cars on average using 1 gallon more per 100 miles than foreign cars
end of do-file

I am unable to get to the next step: creating a figure like Fig. 2.4 in Cleveland (1993).
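The rank, n and pp lines in the do-file compute plotting positions (rank - 0.5)/n separately for domestic and foreign cars. As a language-neutral illustration (Python rather than Stata, and not part of the original post), here is a minimal sketch of that calculation. It assumes there are no missing values and gives tied values the average of the ranks they span, which is what egen rank() does by default. The function name plotting_positions is hypothetical.

```python
from collections import defaultdict


def plotting_positions(values, groups):
    """Return (rank - 0.5) / n for each observation, ranking within groups.

    Tied values get the average of the ranks they span. Assumes no missing
    values, so n is simply the group size.
    """
    by_group = defaultdict(list)
    for idx, (v, g) in enumerate(zip(values, groups)):
        by_group[g].append((v, idx))

    pp = [None] * len(values)
    for members in by_group.values():
        members.sort()  # sort by value; ties keep original order via idx
        n = len(members)
        j = 0
        while j < n:
            # find the last position k of the run of values tied with position j
            k = j
            while k + 1 < n and members[k + 1][0] == members[j][0]:
                k += 1
            avg_rank = (j + 1 + k + 1) / 2  # average of 1-based ranks j+1 .. k+1
            for _, idx in members[j:k + 1]:
                pp[idx] = (avg_rank - 0.5) / n
            j = k + 1
    return pp


# Toy data: two groups, with a tie in group "b".
print(plotting_positions([3, 1, 2, 2], ["a", "a", "b", "b"]))
# prints [0.75, 0.25, 0.5, 0.5]
```

With equal group sizes these positions pair up directly; with unequal sizes, as here, one group's quantiles must be interpolated, which is the step discussed in the replies.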

Any help will be greatly appreciated.

Søren Nielsen

Nick Cox

Join Date: Mar 2014
Posts: 35774

02 Oct 2023, 10:50

You're lucky that a copy of Cleveland's 1993 book sits 1 m from my desk at home and that I have the dataset in question already typed in. It was used in https://journals.sagepub.com/doi/pdf...867X0700700308

To compare the quantiles of the first tenors and the second basses, note that there are fewer tenors, so their sorted values define the quantiles for the plot, and the corresponding quantiles for the basses must be interpolated within the larger subset. For that, cquantile, lurking on SSC since 2005, is one helper command.

This is almost the same graph as mentioned.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(height spart) str7 part byte group
69 3 "Tenor" 1
72 3 "Tenor" 1
71 3 "Tenor" 1
66 3 "Tenor" 1
76 3 "Tenor" 1
74 3 "Tenor" 1
71 3 "Tenor" 1
66 3 "Tenor" 1
68 3 "Tenor" 1
67 3 "Tenor" 1
70 3 "Tenor" 1
65 3 "Tenor" 1
72 3 "Tenor" 1
70 3 "Tenor" 1
68 3 "Tenor" 1
64 3 "Tenor" 1
73 3 "Tenor" 1
66 3 "Tenor" 1
68 3 "Tenor" 1
67 3 "Tenor" 1
64 3 "Tenor" 1
72 4 "Bass"  2
75 4 "Bass"  2
67 4 "Bass"  2
75 4 "Bass"  2
74 4 "Bass"  2
72 4 "Bass"  2
72 4 "Bass"  2
74 4 "Bass"  2
72 4 "Bass"  2
72 4 "Bass"  2
74 4 "Bass"  2
70 4 "Bass"  2
66 4 "Bass"  2
68 4 "Bass"  2
75 4 "Bass"  2
68 4 "Bass"  2
70 4 "Bass"  2
72 4 "Bass"  2
67 4 "Bass"  2
70 4 "Bass"  2
70 4 "Bass"  2
69 4 "Bass"  2
72 4 "Bass"  2
71 4 "Bass"  2
74 4 "Bass"  2
75 4 "Bass"  2
end
label values spart part
label def part 3 "Tenor", modify
label def part 4 "Bass", modify

cquantile height , by(part) gen(h_tenor h_bass)

gen diff = h_tenor - h_bass
label var diff "tenor height {&minus} bass height (inches)"
gen mean = (h_tenor + h_bass) / 2
label var mean "(tenor height + bass height)/2 (inches)"

scatter diff mean, yli(0) xla(66(3)75) subtitle(corresponding quantiles) note(72 inches = 6 feet)

[Graph: tenorbass.png, tenor minus bass height against mean of corresponding quantiles]
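The interpolation described above can be sketched outside Stata. The Python below is an illustration only, not the algorithm inside cquantile: it treats the i-th smallest of n values as the quantile at (i - 0.5)/n, linearly interpolates the larger sample between those positions (clamping at the ends), and then forms the mean and difference of each pair of corresponding quantiles, which are the two coordinates of a Tukey mean-difference plot. The function names quantile_at and mean_difference are hypothetical; the heights are the tenor and bass values from the dataex listing.

```python
def quantile_at(sorted_vals, p):
    """Quantile of sorted_vals at probability p, by linear interpolation.

    The i-th smallest value (1-based) is taken as the quantile at (i - 0.5) / n;
    probabilities outside the first and last plotting positions are clamped.
    """
    n = len(sorted_vals)
    h = p * n + 0.5  # fractional 1-based rank whose plotting position is p
    if h <= 1:
        return float(sorted_vals[0])
    if h >= n:
        return float(sorted_vals[-1])
    lo = int(h)  # floor of h, safe because 1 < h < n here
    frac = h - lo
    return sorted_vals[lo - 1] + frac * (sorted_vals[lo] - sorted_vals[lo - 1])


def mean_difference(small, large):
    """Return (mean, difference) pairs of corresponding quantiles.

    The sorted values of the smaller sample define the quantiles; the
    corresponding quantiles of the larger sample are interpolated.
    Differences are small minus large.
    """
    xs = sorted(small)
    ys = sorted(large)
    m = len(xs)
    pairs = []
    for i, x in enumerate(xs, start=1):
        y = quantile_at(ys, (i - 0.5) / m)
        pairs.append(((x + y) / 2, x - y))
    return pairs


# Heights (inches) from the dataex listing above.
tenor = [69, 72, 71, 66, 76, 74, 71, 66, 68, 67, 70,
         65, 72, 70, 68, 64, 73, 66, 68, 67, 64]
bass = [72, 75, 67, 75, 74, 72, 72, 74, 72, 72, 74, 70, 66,
        68, 75, 68, 70, 72, 67, 70, 70, 69, 72, 71, 74, 75]

pairs = mean_difference(tenor, bass)
print(pairs[10])  # median pair: (70.0, -4.0)
```

Plotting the second element of each pair against the first gives the same kind of display as the scatter command above; small differences from the cquantile version are possible if its interpolation rule differs from this one.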


ericmelse

Join Date: May 2014

Posts: 436
#3

02 Oct 2023, 10:54

Dear Soren,

Are you referring to this figure?

http://publicationslist.org/eric.melse
Nick Cox

Join Date: Mar 2014

Posts: 35774
#4

02 Oct 2023, 11:06

ericmelse

No; that is from Chambers, J.M., Cleveland, W.S., Kleiner, B. and Tukey, P.A. 1983. Graphical Methods for Data Analysis. Belmont, CA: Wadsworth.

It has been variously reprinted and is still in print, and its examples are a lot cleaner and crisper than typical R output today!
ericmelse

Join Date: May 2014

Posts: 436
#5

02 Oct 2023, 21:31

Right, Nick, my apologies about that. So, here is the correct Figure 2.4:

http://publicationslist.org/eric.melse
Soren Nielsen

Join Date: Oct 2018

Posts: 24
#6

03 Oct 2023, 01:22

It is a great thing to be lucky. Thanks a lot to both respondents. Cleveland has been an important inspiration ever since the appearance of his 'The Elements of Graphing Data'.
Sincerely Søren Nielsen
Nick Cox

Join Date: Mar 2014

Posts: 35774
#7

03 Oct 2023, 05:57

Absolutely. If people want recommendations for the best books on statistical graphics (visualization), my ranking would still be:

W.S. Cleveland, The Elements of Graphing Data
W.S. Cleveland, Visualizing Data
J.M. Chambers and friends, Graphical Methods for Data Analysis
E.R. Tufte, The Visual Display of Quantitative Information
C.O. Wilke, Fundamentals of Data Visualization
J.W. Tukey, Exploratory Data Analysis

Graphs from those top authors (Cleveland, Chambers and others, at the time all working at Bell Labs) are like Mozart's music: smart, simple, stylish and subtle where needed.

Too much in contemporary graphics looks as if it was designed by a heavy metal fan who turns the volume up to 11: stark, in-your-face colours, oversized blobs as point symbols, coarse-binned histograms and clunky, over-prominent grids.
Soren Nielsen

Join Date: Oct 2018

Posts: 24
#8

03 Oct 2023, 07:22

Maybe I can add a more recent visual tool, ineqord, developed and described by Stephen P. Jenkins in the Stata Journal article "Comparing distributions of ordinal data" (DOI: 10.1177/1536867X20953565). I have used it in a recent open-access publication: Nielsen & Vilmar (2023), "Educational attainment in eating disorders: What can we learn from visualising data?" (DOI: 10.1002/erv3015).
Nick Cox

Join Date: Mar 2014

Posts: 35774
#9

08 Oct 2023, 11:12

See also https://www.statalist.org/forums/for...lable-from-ssc
