  • Horizontal boxplot with specific overlays

    I am new to this forum and fairly new to Stata, so I have a question:

    I want to produce a horizontal box plot (graph hbox) for a cohort's MCQ exam results, with a specific student's score overlaid on the graph.
    I can produce the box plot with the usual features, showing the whole cohort's range, median, etc., but I want each student to see where they sit in the range.
    Even better (if possible), I would like to show an individual student's quartile overlaid on the cohort range.

    I have 130 students and 100 MCQs (100 stems, 5 distractors). The data are numerical, and I have used egen to create the variables needed to align the marking key etc.

    Any help would be fantastic!

    Nathan

  • #2
    Take a look at Nick Cox's Stata Journal article (9:3) Speaking Stata: Creating and varying box plots :

    http://www.stata-journal.com/sjpdf.h...iclenum=gr0039

    Here is an example taken from the article:
    Code:
    sysuse lifeexp, clear
    egen median = median(lexp), by(region)
    egen upq = pctile(lexp), p(75) by(region)
    egen loq = pctile(lexp), p(25) by(region)
    
    egen iqr = iqr(lexp), by(region)
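    * whiskers end at the most extreme values within 1.5 IQR of the quartiles
    * (dividing by a false condition gives missing, which max() and min() ignore)
    egen upper = max(lexp / (lexp <= upq + 1.5 * iqr)), by(region)
    egen lower = min(lexp / (lexp >= loq - 1.5 * iqr)), by(region)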
    
    twoway rbar med upq region, horiz pstyle(p1) blc(gs15) bfc(gs8) barw(0.35) /// 
        ||  rbar med loq region,  horiz pstyle(p1) blc(gs15) bfc(gs8) barw(0.35)  /// 
        ||  rspike upq upper region, horiz pstyle(p1) /// 
        ||  rspike loq lower region, horiz pstyle(p1) /// 
        ||  rcap upper upper region, horiz pstyle(p1) msize(*2) /// 
        ||  rcap lower lower region, horiz pstyle(p1) msize(*2) /// 
        ||  scatter region lexp  if !inrange(lexp, lower, upper), /// 
        ms(Oh) mla(country)  legend(off) mlabpos(12) mlabgap(1.5) /// 
        xsc(r(53, .))  yla(1 `" "Europe and" "Central Asia" "'  /// 
        2 "North America"  3 "South America", noticks)  /// 
        yla(, ang(h)) ytitle(Life expectancy (years)) xtitle("")  /// 
        ||  dot lexp region, ndot(0)  pstyle(p1)  hori ds(Oh) ms(Oh) mc(black)
    [Attached image: Graph.png]


    • #3
      Hi Scott,

      Thanks for taking the time to respond. That may be a little over my head but I will certainly give it a shot.
      I will also have a good read of Nick Cox's Stata Journal article.

      Thanks again.


      • #4
        The 2009 paper cited by Scott (thanks for the publicity) should be read in conjunction with a detailed correction published at http://www.stata-journal.com/article...ticle=gr0039_1

        The problem was in fact pointed out on Statalist: see thread starting http://www.stata.com/statalist/archi.../msg00906.html

        That said, I wouldn't start from there in this case. It's a little unclear but I presume that using 100+ MCQs (multiple choice questions? one person's known abbreviation is another person's puzzling jargon) is not central here. Rather, the main focus is on students' overall scores and there is interest in generating personalized reports from which each student sees where they are in the distribution.

        I would use stripplot (from SSC; install with ssc install stripplot) and show more detail. An analogue is a report on individual cars in the auto dataset. With mpg, as with most grading conventions, high is better than low.

        There is enormous scope for variations in detail. Here I show hybrid quantile-box plots (search the forum for other examples if you wish).

        I don't understand the reference to quartile: is this a reference to quarters of the distribution defined by quartiles or a typo for quantile or ...? Whatever is intended, the example below shows rank in distribution and percentile ranks are equally possible.

        Code:
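        set scheme s1color
        sysuse auto, clear
        * rank 1 = highest mpg; with track, tied values share the same rank
        egen rank = rank(-mpg), track
        * number of nonmissing values, used as the denominator in the caption
        count if mpg < .
        local N = r(N)
        
        * one graph per car (first five only); the blue line marks that car's mpg
        forval i = 1/5 {
           stripplot mpg, vertical box cumul centre subtitle("`= make[`i']'", place(w)) ///
           caption(score `=mpg[`i']' rank `=rank[`i']'/`N', color(blue)) yline(`=mpg[`i']', lc(blue)) yla(, ang(h)) aspect(1)
           more
        }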
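
        Percentile ranks could be sketched along the same lines. The variable names rk, n and pcrank are made up for illustration, and 100 * (rank - 0.5) / n is just one common convention for a percentile rank, not the only one.

```stata
sysuse auto, clear
* rank 1 = lowest mpg; tied values share their mean rank (egen's default)
egen rk = rank(mpg)
* number of nonmissing values of mpg
egen n = count(mpg)
* percentile rank on a 0-100 scale
gen pcrank = 100 * (rk - 0.5) / n
list make mpg rk pcrank in 1/5
```

        Here high percentile rank goes with high mpg, which matches the "high is better" reading of exam scores.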
        [Attached image: indivbox.png]

        Last edited by Nick Cox; 12 Oct 2015, 03:02.


        • #5
          Hi there, sorry for the delayed response and thanks again for your time thus far.

          Yes, MCQ means multiple choice question (exam). My reference to quartiles, possibly a misuse of terminology, refers to our med students being grouped post hoc into quartiles numbered 1 to 5 based on their academic performance.

          I have 100 questions (var1-var100, renamed q1-q100) and 130 rows, one per student id. I have also generated new variables that identify correct responses, total score, etc., and I have grouped questions from the exam by question type: questions 1-10 are anatomy based, questions 11-20 are pathology based, ..., questions 91-100 are pharmacology based.
          So I now need to generate horizontal box plots that show each student's average score for each question type. For the 10 questions on anatomy, for example, the horizontal box plot needs to show the overall distribution (IQR etc.) for the 130 students and, importantly, the average score for a given student overlaid on the graph, so they know where they sit in this range.

          Once this is achieved, I would need to provide this graph to all students, as sketched in the attached diagram:

          Thanks again, if the advice already provided is still applicable I can stick to this.

          Much appreciated

          Attached file: concept.pdf


          • #6
            Quartiles can't possibly be 1-5. Sounds as if you have 5 groups, which some people would call quintiles, or in a fussier terminology quintile-based classes. "Fifths" of the data, with an explanation, might be acceptable, just as "quarters" appeals to some for the classes defined by quartiles. These days it is stretching plausibility to suppose that everyone knows enough Latin to be comfortable with tertiles, quartiles, quintiles, sextiles, octiles, deciles, etc. to mention only some of the terms used in the past. Better, we don't need that many terms for variations on the same idea, so increasingly people use the general term quantile (although it's still likely that you need to explain it).
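
            As a rough sketch of how five such classes can be produced in Stata, here is xtile applied to mpg in the auto data, standing in for a total score. The variable name fifth is made up for illustration.

```stata
sysuse auto, clear
* fifth = 1 for the lowest fifth of mpg, 5 for the highest
xtile fifth = mpg, nquantiles(5)
tabulate fifth
```

            With ties in the variable being classified, the five groups need not be equal in size.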

            More crucially, on the box plots:

            I've answered part of this already. To get individual graphs for each student, you need to loop over students and customise. graph box and graph hbox don't make it easy to add marker symbols.

            The rest is similar. To get distinct graphs for each subject (anatomy etc.) you need to loop over subjects too, and the loops are nested.

            Sounds like 1300 graphs to me!
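
            A minimal sketch of such nested loops, on simulated data. Everything here is an assumption for illustration: the variable names id, anatomy, pathology and pharmacology, scores out of 10 per subject, and the file names. It needs stripplot from SSC (ssc install stripplot) and only loops over the first three students.

```stata
* simulated stand-in for the real data: 130 students, 3 subject scores out of 10
clear
set seed 2015
set obs 130
gen id = _n
foreach s in anatomy pathology pharmacology {
    gen `s' = rbinomial(10, 0.7)
}

* outer loop over students, inner loop over subjects
* (change 1/3 to 1/`=_N' to cover every student)
forval i = 1/3 {
    foreach s in anatomy pathology pharmacology {
        * horizontal strip plot with box for the cohort;
        * the blue vertical line marks student i's own score
        stripplot `s', box centre cumul xline(`=`s'[`i']', lc(blue)) ///
            subtitle("student `=id[`i']', `s'", place(w)) ///
            caption("score `=`s'[`i']' out of 10", color(blue))
        graph export "student`=id[`i']'_`s'.png", replace
    }
}
```

            Exporting inside the loop means one file per student and subject, so the full run writes as many files as there are student-subject combinations.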
