  • How to make a Tukey mean-difference plot

    Dear Statalisters

    Here is a do-file relevant to my problem:

    // test of Stata tip 47: Quantile-quantile plots without programming
    version 15.1
    // checked and revised 02-10-2023 by Søren Nielsen
    // Purpose: to create a qqplot as in Cleveland, Visualizing Data (1993), fig 2.3, p 22

    use http://www.stata-press.com/data/r9/auto, clear
    gen gpm = 100 / mpg
    label var gpm "gallons / 100 miles"
    by foreign, sort: egen rank = rank(gpm)
    by foreign: egen n = count(gpm)
    gen pp = (rank - 0.5) / n

    gen gpmd = gpm if !foreign
    gen gpmf = gpm if foreign
    qqplot gpmd gpmf
    // end of do-file

    This creates the qqplot as planned. The plot seems to indicate a fairly straightforward relationship: domestic cars on average use 1 gallon more per 100 miles than foreign cars. I am unable to get to the next step, which is to create a figure like fig 2.4 in Cleveland (1993).

    Any help will be greatly appreciated

    Søren Nielsen
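
    The plotting positions pp computed in the do-file are never used by qqplot itself. As a minimal sketch (not part of the original post), they could be put to work in a quantile plot of each group, in the spirit of Stata tip 47. It assumes the auto data shipped with Stata (sysuse auto) in place of the web copy; the marker symbols and axis titles are illustrative choices only.

```stata
* Sketch only: quantile plots of gpm by group, using plotting positions
* Assumption: -sysuse auto- stands in for the web copy used in the post
sysuse auto, clear
gen gpm = 100 / mpg
label var gpm "gallons / 100 miles"

* rank within group, group size, and plotting position (rank - 0.5) / n
by foreign, sort: egen rank = rank(gpm)
by foreign: egen n = count(gpm)
gen pp = (rank - 0.5) / n

* each group's values against their plotting positions, superimposed
twoway (scatter gpm pp if !foreign, msymbol(Oh)) ///
       (scatter gpm pp if foreign, msymbol(X)), ///
       legend(order(1 "Domestic" 2 "Foreign")) ///
       ytitle("gallons / 100 miles") xtitle("fraction of the data")
```

    The /// line continuations work in a do-file but not when typed in the Command window.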

  • #2
    You're lucky that a copy of Cleveland's 1993 book sits 1 m from my desk at home and that I have the dataset in question already typed in. It was used in https://journals.sagepub.com/doi/pdf...867X0700700308

    To compare the quantiles of the first tenors and the second basses, note that there are fewer tenors, so their sorted values define their quantiles for a plot, and the corresponding quantiles for the basses must be interpolated within the larger subset. For that, cquantile (lurking on SSC since 2005) is one helper command.
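
    One common way to do that interpolation is sketched here as an assumption; cquantile's own rule is not stated in this thread and may differ in detail. It uses plotting positions (i - 0.5)/n, with n sorted values x_(1) <= ... <= x_(n) in the smaller group and m >= n sorted values y_(1) <= ... <= y_(m) in the larger group:

```latex
\begin{align*}
f_i &= \frac{i - 0.5}{n}, \qquad i = 1, \dots, n \\
v_i &= m f_i + 0.5, \qquad k_i = \lfloor v_i \rfloor, \qquad p_i = v_i - k_i \\
q_i &= (1 - p_i)\, y_{(k_i)} + p_i\, y_{(k_i + 1)}
\end{align*}
```

    Each pair (x_(i), q_i) is then one point on the quantile-quantile plot. Because m >= n, every v_i lies between 1 and m, so no extrapolation is needed, and when p_i = 0 the second term drops out. With 21 tenors and 26 basses, for example, i = 1 gives f_1 = 0.5/21 and v_1 = 13/21 + 0.5, about 1.119, so q_1 is about 0.881 times the smallest bass height plus 0.119 times the second smallest.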

    The code below gives almost the same graph as the one mentioned.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(height spart) str7 part byte group
    69 3 "Tenor" 1
    72 3 "Tenor" 1
    71 3 "Tenor" 1
    66 3 "Tenor" 1
    76 3 "Tenor" 1
    74 3 "Tenor" 1
    71 3 "Tenor" 1
    66 3 "Tenor" 1
    68 3 "Tenor" 1
    67 3 "Tenor" 1
    70 3 "Tenor" 1
    65 3 "Tenor" 1
    72 3 "Tenor" 1
    70 3 "Tenor" 1
    68 3 "Tenor" 1
    64 3 "Tenor" 1
    73 3 "Tenor" 1
    66 3 "Tenor" 1
    68 3 "Tenor" 1
    67 3 "Tenor" 1
    64 3 "Tenor" 1
    72 4 "Bass"  2
    75 4 "Bass"  2
    67 4 "Bass"  2
    75 4 "Bass"  2
    74 4 "Bass"  2
    72 4 "Bass"  2
    72 4 "Bass"  2
    74 4 "Bass"  2
    72 4 "Bass"  2
    72 4 "Bass"  2
    74 4 "Bass"  2
    70 4 "Bass"  2
    66 4 "Bass"  2
    68 4 "Bass"  2
    75 4 "Bass"  2
    68 4 "Bass"  2
    70 4 "Bass"  2
    72 4 "Bass"  2
    67 4 "Bass"  2
    70 4 "Bass"  2
    70 4 "Bass"  2
    69 4 "Bass"  2
    72 4 "Bass"  2
    71 4 "Bass"  2
    74 4 "Bass"  2
    75 4 "Bass"  2
    end
    label values spart part
    label def part 3 "Tenor", modify
    label def part 4 "Bass", modify
    
    * cquantile is from SSC: ssc install cquantile
    cquantile height, by(part) gen(h_tenor h_bass)
    
    * difference and mean of corresponding quantiles
    gen diff = h_tenor - h_bass
    label var diff "tenor height {&minus} bass height (inches)"
    gen mean = (h_tenor + h_bass) / 2
    label var mean "(tenor height + bass height)/2 (inches)"
    
    * mean-difference plot with a reference line at zero difference
    scatter diff mean, yli(0) xla(66(3)75) subtitle(corresponding quantiles) note(72 inches = 6 feet)
    [Image: tenorbass.png]
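
    As a sketch only (not part of the original reply), the same steps might be applied to the auto data from post #1. It assumes that cquantile has been installed from SSC and that it fills the two new variables in sorted order of the by() variable, so that foreign == 0 (Domestic) comes first. That ordering is an assumption, which is why the summaries are included as a check. The names gpm_dom and gpm_for are illustrative.

```stata
* Sketch only: mean-difference plot of gpm, domestic versus foreign cars
* Requires cquantile from SSC: ssc install cquantile
sysuse auto, clear
gen gpm = 100 / mpg
label var gpm "gallons / 100 miles"

* Assumption: the new variables follow the sorted order of foreign,
* so the first holds Domestic (0) and the second Foreign (1) quantiles
cquantile gpm, by(foreign) gen(gpm_dom gpm_for)

* Check which variable holds which group's quantiles before plotting
summarize gpm_dom gpm_for
summarize gpm if !foreign
summarize gpm if foreign

* difference and mean of corresponding quantiles
gen diff = gpm_dom - gpm_for
label var diff "domestic {&minus} foreign (gallons / 100 miles)"
gen mean = (gpm_dom + gpm_for) / 2
label var mean "(domestic + foreign)/2 (gallons / 100 miles)"

scatter diff mean, yline(0) subtitle(corresponding quantiles)
```

    If the summaries show the two variables the other way round, swap the names in gen().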



    • #3
      Dear Soren,

      Do you mean this figure?
      [Image: CRC 1998 JM Chambers WS Cleveland B Kleiner PA Tukey Graphical methods for data analysis Fig 2.4.jpg]
      http://publicationslist.org/eric.melse



      • #4

        ericmelse

        No; that is from Chambers, J.M., Cleveland, W.S., Kleiner, B. and Tukey, P.A. 1983. Graphical Methods for Data Analysis. Belmont, CA: Wadsworth.

        It has been variously reprinted and is still in print, and its examples are a lot cleaner and crisper than typical R output today!



        • #5
          Right, Nick, my apologies about that. So, here is the correct Figure 2.4
          [Image: Cleveland 1993 Figure 2.4.jpg]
          http://publicationslist.org/eric.melse



          • #6
            It is a great thing to be lucky. Thanks a lot to both respondents. Cleveland has been an important inspiration ever since the appearance of his 'The Elements of Graphing Data'.
            Sincerely Søren Nielsen



            • #7
              Absolutely. If people want recommendations for the best books on statistical graphics (visualization), my ranking would still be:


              W.S. Cleveland, The Elements of Graphing Data
              W.S. Cleveland, Visualizing Data
              J.M. Chambers and friends, Graphical Methods for Data Analysis
              E.R. Tufte, The Visual Display of Quantitative Information
              C. O. Wilke, Fundamentals of Data Visualization
              J.W. Tukey, Exploratory Data Analysis

              Graphs from those top authors (Cleveland, Chambers and others, at the time all working at Bell Labs) are like Mozart's music: smart, simple, stylish, and subtle where needed.

              Too much in contemporary graphics looks as if it was designed by a heavy metal fan who turns up the volume to 11: stark in-your-face colours, over-sized blobs as point symbols, coarse-binned histograms, and clunky over-prominent grids.



              • #8
                Maybe I can add a more recent visual tool, ineqord, developed and described by Stephen P. Jenkins in the Stata Journal: Comparing distributions of ordinal data, DOI: 10.1177/1536867X20953565. I have used this in a recent open source publication: Nielsen & Vilmar (2023), Educational attainment in eating disorders: What can we learn from visualising data? DOI: 10.1002/erv.3015



                • #9
                  See also https://www.statalist.org/forums/for...lable-from-ssc
