  • combine boxplots over more than one variable

    Code:
    . sysuse nlsw88.dta
    (NLSW, 1988 extract)
    
    . graph box tenure wage, over(south)
    gives the following output. It lets you compare the effect of "south" on job tenure and on hourly wage.

    [Image: Capture.PNG]

    I want to achieve something different. I want such a box plot of one variable, but over two different variables, such as
    Code:
    . sysuse nlsw88.dta
    (NLSW, 1988 extract)
    
    . graph box wage, over(south, c_city)
    If this code didn't give an error, it would let me compare not the effect of one categorical variable on two different continuous variables, but the effects of two categorical variables on one continuous variable.

    In my own dataset, I want to do this for the following reason: I have many different variables (psycho_1, psycho_2, psycho_3, psycho_4, ...) that attempt to explain facets of an overlying "construct" (psychopathy), and I want to quantify this construct by combining these variables into a one-dimensional thing (psychopathy score).
    I would define the psychopathy score in slightly varying ways, such as
    Code:
    generate psychopathy_1 = psycho_1 + psycho_2 + psycho_3 + psycho_4
    generate psychopathy_2 = 2*psycho_1 + psycho_2 + 3*psycho_3 + psycho_4
    and then form categories as in
    Code:
    xtile psychquant_1 = psychopathy_1, nq(3)
    and then visualize robustness of these one-dimensional psychopathy constructs by showing that their effects on some other variable (in the above example, wage) are similar.

    The only way I can do that now would be
    Code:
    . graph box wage, over(psychquant_1)
    . graph box wage, over(psychquant_2)
    or, using the above example
    Code:
    . graph box wage, over(south)
    . graph box wage, over(c_city)
    but the comparison I want to stress would force the reader to compare the first box of the first graph to the first box in the second graph, and the second box from each graph, etc., which is not suitable, is it?
    Last edited by Max Piper; 15 Sep 2016, 02:59.

  • #2
    I got lost half-way through this, absent an example of, or like, your own data. But I think your other examples help. See

    SJ-14-4 gr0062 . . . . . . . . . . . . Stata tip 121: Box plots side by side
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
    Q4/14 SJ 14(4):991--996 (no commands)
    tip on how to and which data should be plotted side by
    side with box plots

    That requires payment or a subscription until 2017q4, so in the meantime this technique may help.

    Code:
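    * Approach 1: one box plot per grouping variable, combined side by side
    sysuse nlsw88.dta, clear
    graph box wage, over(south) name(g1, replace) l1title(south)
    graph box wage, over(c_city) name(g2, replace) l1title(c_city)
    graph combine g1 g2, name(G1)

    * Approach 2: stack wage against each grouping variable in turn
    stack wage south wage c_city, into(wage whatever) clear
    * whatever holds the 0/1 values of south and c_city
    label define whatever 0 zero 1 one
    label val whatever whatever
    * _stack records which grouping variable each observation came from
    label define _stack 1 south 2 c_city
    label val _stack _stack

    graph box wage, over(_stack) by(whatever) name(G2)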
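
    The same stack idea could presumably be carried over to the tertile variables described in #1. What follows is only a sketch under assumptions: psychopathy_1 and psychopathy_2 are the scores defined in #1 and are taken to exist in the dataset in memory alongside wage; the names tertile and G3 are made up here for illustration.

    ```stata
    * Assumes psychopathy_1, psychopathy_2 and wage already exist in memory
    xtile psychquant_1 = psychopathy_1, nq(3)
    xtile psychquant_2 = psychopathy_2, nq(3)

    * Stack wage twice, once paired with each tertile variable
    * (tertile is a hypothetical name for the stacked category variable)
    stack wage psychquant_1 wage psychquant_2, into(wage tertile) clear

    * _stack records which score definition each observation came from
    label define _stack 1 "psychopathy_1" 2 "psychopathy_2"
    label values _stack _stack

    * One panel per tertile, with the two score definitions adjacent in each
    graph box wage, over(_stack) by(tertile, rows(1)) name(G3, replace)
    ```

    With this layout the first box under one definition sits next to the first box under the other, so the reader no longer has to compare across separate graphs. Note that stack with clear replaces the data in memory, so save or preserve first if the original dataset is still needed.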
