Horizontal boxplot with specific overlays

Nathan Ross

Join Date: Oct 2015

Posts: 3
#1


09 Oct 2015, 16:34

I am new to this forum and fairly new to Stata, so I have a question:

I want to produce a horizontal box plot (hbox) of a cohort's MCQ exam results with a specific student's score overlaid on the graph.
I can produce the box plot with the usual features and display the whole cohort's range, median, etc., but I want each student to see where they sit in the range.
Even better (if possible), I would like to show an individual student's quartile overlaid on the cohort range.

I have 130 students and 100 MCQs (100 stems, 5 distractors). The data are numeric, and I have used egen to create the variables needed to align the marking key, etc.

Any help would be fantastic!

Nathan

Scott Merryman

Join Date: Mar 2014
Posts: 895

09 Oct 2015, 17:10

Take a look at Nick Cox's Stata Journal article (9:3), "Speaking Stata: Creating and varying box plots":

http://www.stata-journal.com/sjpdf.h...iclenum=gr0039

Here is an example taken from the article:

Code:

sysuse lifeexp,clear
egen median = median(lexp), by(region)
egen upq = pctile(lexp), p(75) by(region)
egen loq = pctile(lexp), p(25) by(region)

egen iqr = iqr(lexp), by(region)
* whiskers end at the most extreme data values within 1.5 IQR of the quartiles
egen upper = max(lexp / (lexp <= upq + 1.5 * iqr)), by(region)
egen lower = min(lexp / (lexp >= loq - 1.5 * iqr)), by(region)

twoway rbar med upq region, horiz pstyle(p1) blc(gs15) bfc(gs8) barw(0.35) /// 
    ||  rbar med loq region,  horiz pstyle(p1) blc(gs15) bfc(gs8) barw(0.35)  /// 
    ||  rspike upq upper region, horiz pstyle(p1) /// 
    ||  rspike loq lower region, horiz pstyle(p1) /// 
    ||  rcap upper upper region, horiz pstyle(p1) msize(*2) /// 
    ||  rcap lower lower region, horiz pstyle(p1) msize(*2) /// 
    ||  scatter region lexp  if !inrange(lexp, lower, upper), /// 
    ms(Oh) mla(country)  legend(off) mlabpos(12) mlabgap(1.5) /// 
    xsc(r(53, .))  yla(1 `" "Europe and" "Central Asia" "'  /// 
    2 "North America"  3 "South America", noticks)  /// 
    yla(, ang(h)) ytitle(Life expectancy (years)) xtitle("")  /// 
    ||  dot lexp region, ndot(0)  pstyle(p1)  hori ds(Oh) ms(Oh) mc(black)

[Attached image: Graph.png, the graph produced by the code above]


Nathan Ross

Join Date: Oct 2015

Posts: 3
#3

11 Oct 2015, 23:52

Hi Scott,

Thanks for taking the time to respond. That may be a little over my head, but I will certainly give it a shot.
I will also have a good read of Nick Cox's Stata Journal article.

Thanks again.
Nick Cox

Join Date: Mar 2014

Posts: 35696
#4

12 Oct 2015, 02:40

The 2009 paper cited by Scott (thanks for the publicity) should be read in conjunction with a detailed correction published at http://www.stata-journal.com/article...ticle=gr0039_1

The problem was in fact pointed out on Statalist: see thread starting http://www.stata.com/statalist/archi.../msg00906.html

That said, I wouldn't start from there in this case. It's a little unclear, but I presume that using 100+ MCQs (multiple choice questions? one person's known abbreviation is another person's puzzling jargon) is not central here. Rather, the main focus is on students' overall scores, and there is interest in generating personalized reports from which each student sees where they are in the distribution.

I would use stripplot (SSC) and show more detail. An analogue is a report on individual cars in the auto dataset. With mpg, as with most grading conventions, high is better than low.

There is enormous scope for variations in detail. Here I show hybrid quantile-box plots (search the forum for other examples if you wish).

I don't understand the reference to quartile: is this a reference to quarters of the distribution defined by quartiles, or a typo for quantile, or ...? Whatever is intended, the example below shows rank in the distribution; percentile ranks are equally possible.

Code:

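* stripplot is community-contributed: ssc install stripplot
set scheme s1color
sysuse auto, clear
* rank 1 = highest mpg; tied cars share the smallest rank
egen rank = rank(-mpg), track
count if mpg < .
local N = r(N)
* one graph per car: its mpg is marked by a blue line and reported in the caption
forval i = 1/5 {
    stripplot mpg, vertical box cumul centre subtitle("`= make[`i']'", place(w)) ///
        caption(score `=mpg[`i']' rank `=rank[`i']'/`N', color(blue))          ///
        yline(`=mpg[`i']', lc(blue)) yla(, ang(h)) aspect(1)
    more
}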
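Since percentile ranks are mentioned as an alternative to rank, here is a minimal sketch using the official cumul command on the same auto data. The variable names cmpg and pcrank are illustrative choices only, and rounding to whole percentages is an assumption about how the result would be reported to each person.

```stata
sysuse auto, clear
* cumulative proportion (0 to 1) at each car's mpg; -equal- gives tied values the same result
cumul mpg, generate(cmpg) equal
* express as a percentile rank; high mpg (like a high exam score) gives a high rank
generate pcrank = round(100 * cmpg)
list make mpg pcrank in 1/5, noobs
```

The same idea carries over to exam scores: replace mpg with each student's total.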

Last edited by Nick Cox; 12 Oct 2015, 03:02.
Nathan Ross

Join Date: Oct 2015

Posts: 3
#5

18 Oct 2015, 18:34

Hi there, sorry for the delayed response, and thanks again for your time thus far.

Yes, by MCQ I mean multiple choice question (exam). The reference to quartiles, again possibly my misuse of terminology, refers to our med students being grouped post hoc into quartiles of 1-5 based on their academic performance.

I have 100 questions (var1-var100, renamed q1-q100) and 130 rows (one per student ID). I have also generated new variables that identify correct responses, total score, etc., and I have grouped the questions by type: questions 1-10 are anatomy, questions 11-20 are pathology, ..., questions 91-100 are pharmacology.
So I now need to generate horizontal box plots that show each student's average score for each question type. For the 10 anatomy questions, for example, the horizontal box plot needs to show the IQR and other summaries for all 130 students and, importantly, a given student's average score overlaid on the graph, so they know where they sit in this range.
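A minimal sketch of the per-subject averages described above, under stated assumptions: the data are simulated, c1-c100 are hypothetical 1/0 indicators of a correct response (standing in for whatever the real correct-response variables are called), and only the three subject names given in the post are attached as labels. It computes each student's mean score within each block of 10 questions with egen rowmean() and then reshapes to one observation per student and subject, which is a convenient layout for graphing.

```stata
* simulated data (hypothetical): 130 students, c1-c100 = 1 if answer correct, else 0
clear
set seed 2015
set obs 130
generate id = _n
forvalues j = 1/100 {
    generate byte c`j' = runiform() < 0.65
}
* mean score (proportion correct) per block of 10 questions: 1-10, 11-20, ..., 91-100
forvalues k = 1/10 {
    local first = 10 * (`k' - 1) + 1
    local last  = 10 * `k'
    egen score`k' = rowmean(c`first'-c`last')
}
* one observation per student and subject
keep id score*
reshape long score, i(id) j(subject)
* only the subjects named in the thread are labelled; others show as numbers
label define subject 1 "Anatomy" 2 "Pathology" 10 "Pharmacology"
label values subject subject
* cohort box plots, one per subject (no individual overlay yet)
graph hbox score, over(subject)
```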

Once this is achieved, I would need to provide this graph to every student, as shown diagrammatically in the attachment:

Thanks again; if the advice already provided is still applicable, I can stick with it.

Much appreciated

[Attached file: concept.pdf]
Nick Cox

Join Date: Mar 2014

Posts: 35696
#6

19 Oct 2015, 06:15

Quartiles can't possibly be 1-5. Sounds as if you have 5 groups, which some people would call quintiles, or in a fussier terminology quintile-based classes. "Fifths" of the data, with an explanation, might be acceptable, just as "quarters" appeals to some for the classes defined by quartiles. These days it is stretching plausibility to suppose that everyone knows enough Latin to be comfortable with tertiles, quartiles, quintiles, sextiles, octiles, deciles, etc. to mention only some of the terms used in the past. Better, we don't need that many terms for variations on the same idea, so increasingly people use the general term quantile (although it's still likely that you need to explain it).
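As a sketch of the quintile-based classes ("fifths") discussed above, the official xtile command creates them in one line. The variable total is a hypothetical stand-in for each student's overall score, and the data are simulated.

```stata
* simulated overall scores out of 100 (hypothetical)
clear
set seed 2015
set obs 130
generate total = rbinomial(100, 0.65)
* five classes bounded by the quintiles; 1 = lowest scores
* (use 6 - fifth if 1 should mean the top group)
* with tied scores the classes will not all hold exactly 26 students
xtile fifth = total, nquantiles(5)
tabulate fifth
```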

More crucially, on the box plots:

I've answered part of this already. To get individual graphs for each student, you need to loop over students and customise. Note that graph box and graph hbox don't make it easy to add marker symbols.

The rest is similar. To get distinct graphs for each subject (anatomy etc.) you need to loop over subjects too, and the loops are nested.

Sounds like 1300 graphs to me!
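One possible way to do the looping without graph hbox is to build the box plots with twoway, as in the Stata Journal code quoted earlier in the thread, and overlay the student's scores with scatter. The sketch below is a variation rather than exactly the nested loops described: it draws one graph per student with all ten subjects, giving 130 graphs instead of 1300. Adding an inner forvalues loop over subjects, with if subject == `k' on each plot, would give separate per-subject graphs. The simulated data, the graph names, and the choice of only the first three students are all assumptions for illustration.

```stata
* simulated data (hypothetical): 130 students x 10 subject mean scores
clear
set seed 2015
set obs 130
generate id = _n
forvalues k = 1/10 {
    * mean of 10 simulated 1/0 responses per subject
    generate score`k' = rbinomial(10, 0.65) / 10
}
reshape long score, i(id) j(subject)
label define subject 1 "Anatomy" 2 "Pathology" 10 "Pharmacology"
label values subject subject

* box plot ingredients per subject; whiskers end at data values
egen median = median(score), by(subject)
egen upq = pctile(score), p(75) by(subject)
egen loq = pctile(score), p(25) by(subject)
egen iqr = iqr(score), by(subject)
egen upper = max(score / (score <= upq + 1.5 * iqr)), by(subject)
egen lower = min(score / (score >= loq - 1.5 * iqr)), by(subject)
* draw each box once rather than once per student
egen tag = tag(subject)

* first 3 students only; use 1/130 for the whole cohort
forvalues i = 1/3 {
    twoway (rbar median upq subject if tag, horizontal barwidth(0.5) fcolor(gs13) lcolor(gs8)) ///
           (rbar median loq subject if tag, horizontal barwidth(0.5) fcolor(gs13) lcolor(gs8)) ///
           (rspike upq upper subject if tag, horizontal lcolor(gs8))                        ///
           (rspike loq lower subject if tag, horizontal lcolor(gs8))                        ///
           (scatter subject score if id == `i', msymbol(D) mcolor(red)),                    ///
           yscale(reverse) ylabel(1/10, valuelabel angle(horizontal))                       ///
           legend(off) ytitle("") xtitle("Mean score (proportion correct)")                 ///
           title("Student `i'") name(student`i', replace)
}
```

Each graph could then be exported with graph export inside the same loop, for example to one file per student.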