Scatterplot (or similar) for contrasting 2 and 2 variables

Guest
#1


16 Aug 2016, 09:46

It may not seem so, but I have tried to find a solution...

I want a graph showing means (averages) and confidence intervals for a set of variables.

So far, so good. The problem is that I want to do two such plots in the same graph, with several sets of two related variables. As an illustration:

Below there are four variables, but these are 2 sets of 2 related variables (e.g., status of boys vs. status of girls). I am not sure about the best way forward to develop such a graph (with many more variables than the 2X2 used as an illustration here).

[Text sketch, flattened in extraction: four vertical interval bars with point markers (X and O), labelled left to right: Girls' status, Boys' status, Girls' power, Boys' power.]

The "graph" above would show four variables, which I need to put in a specific order within the graph, and then let the reader easily compare "status" of girls and boys (again, these variable names are just illustrations).
In addition to -scatter- I was considering -coefplot-, but have not been able to develop a good solution.

Nick Cox

Join Date: Mar 2014
Posts: 35721

16 Aug 2016, 10:47

coefplot is from the Stata Journal. Please remember to explain the locations of user-written programs you cite (FAQ Advice #12).

I've played with various ways of doing things like this (e.g. ciplot on SSC) but I tend to incline to using statsby as an engine. See e.g.
http://www.stata-journal.com/sjpdf.h...iclenum=gr0045 for an exposition.

Here's a token example. Lots of details can be changed: that's the main point.

Code:

set scheme s1mono 

sysuse auto, clear
local lab1 : var label trunk 
statsby mean=r(mean) ub=r(ub) lb=r(lb), by(foreign) : ci means trunk 
save ciresults, replace 

sysuse auto, clear 
local lab2 : var label turn 
statsby mean=r(mean) ub=r(ub) lb=r(lb), by(foreign) : ci means turn 

append using ciresults 
gen which = ceil(_n/2) 
gen x = cond(mod(_n, 2), 1, 2)   

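* After append, rows 1-2 hold turn (already in memory) and rows 3-4 hold trunk
* (appended), so which == 1 is turn and which == 2 is trunk.
label define which 1 "`lab2'" 2 "`lab1'" 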
label val which which 

twoway scatter mean x || rcap ub lb x, by(which, note("") legend(off)) ///
xla(1 "Domestic" 2 "Foreign", tlength(0)) xtitle("") xsc(r(0.5 2.5))

[Image: ciusingstatsby.png]


Guest
#3

16 Aug 2016, 11:29

Thanks for both recommendations (also concerning the FAQ)! Okay, so that would be the best starting point. (Girls' status and Boys' status are two different variables; I assume the example shows trunk space as one variable, shown for two different groups.)
Nick Cox

#4

16 Aug 2016, 11:32

My example shows two sets (domestic, foreign) for two variables (trunk space, turn circle), not one. If you have trouble adapting to your case, you will need to tell us more about your data.
Guest
#5

16 Aug 2016, 11:47

PS. The solution suggested is the one I started with, but it still seems to me that it requires me to develop a new data file, since "boys' status" and "girls' status" are two variables within each of several clusters.

The approach below seems easier, if I could work out code for two different variable sets: run a simple descriptive analysis to get the means and CIs, use a text editor to clean up the output, and then enter code similar to this:

Code:

clear
* the second and third numeric columns hold the lower and upper limits
input str2 varname mean lower upper
x1 30 25 35
x2 50 20 80
x3 60 50 70
x4 60 55 65
x5 65 55 75
end
encode varname, gen(varname1)
scatter mean varname1, xlabel(, valuelabel) || rcap upper lower varname1

Source: http://stats.stackexchange.com/quest...th-mean-and-ci

But I haven't been able to adapt this method to my need for two sets of variables (see first post).
Guest
#6

16 Aug 2016, 11:52

Nick, I didn't see your answer while writing.
The data file would look similar to this:

Variables: GirlsStatus BoysStatus GirlsPower BoysPower
id1
id2

Ideally, I would want means and CIs for each variable, but organised in such a manner that GirlsStatus and BoysStatus are close to each other, and GirlsPower and BoysPower are close to each other, all in one graph.
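
For a layout like this (four separate Stata variables, no grouping variable), one possible route is to collect each variable's mean and confidence limits into a small results dataset and then plot them at hand-picked x positions, so that the pairs sit together. The following is only a sketch under stated assumptions: the data are simulated stand-ins, the variable names are taken from the layout above, and the x positions 1 2 4 5 are an arbitrary choice to leave a gap between the pairs.

```stata
* Sketch only: simulated stand-in data, not the poster's data.
clear
set seed 2016
set obs 200
gen GirlsStatus = rnormal(50, 10)
gen BoysStatus  = rnormal(55, 10)
gen GirlsPower  = rnormal(40, 10)
gen BoysPower   = rnormal(45, 10)

* Collect mean and 95% limits for each variable into a results file.
tempname memhold
tempfile results
postfile `memhold' str20 varname x mean lb ub using `results'

* Assumed x positions: 1 2 for the first pair, 4 5 for the second.
local xpos 1 2 4 5
local i = 0
foreach v of varlist GirlsStatus BoysStatus GirlsPower BoysPower {
    local ++i
    local x : word `i' of `xpos'
    quietly ci means `v'      // Stata 14.1 and later; older versions: ci `v'
    post `memhold' ("`v'") (`x') (r(mean)) (r(lb)) (r(ub))
}
postclose `memhold'

* One observation per variable; plot intervals and means together.
use `results', clear
twoway rcap ub lb x || scatter mean x,                                   ///
    xla(1 "Girls' status" 2 "Boys' status" 4 "Girls' power" 5 "Boys' power", ///
    tlength(0)) xsc(r(0.5 5.5)) xtitle("") ytitle("Mean and 95% CI") legend(off)
```

Adding more pairs would mean extending the variable list, the position list and the axis labels together.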
Nick Cox

#7

16 Aug 2016, 11:57

I am now confused on whether you already have the confidence limits. I didn't get that impression from #1, but my post assumes that you need to calculate them.

I don't understand whether you're asking a new question, but I (believe I) have offered a code solution that is complete and works.

If you can find a simpler one, that's fine by me.

Note that to plot the confidence intervals you do need to tell twoway what they are, either as data or as scalars or macros you provide.

Girls and boys next to each other: OK, that's like domestic and foreign.

I can't see what you are seeing as difficult here.

Last edited by Nick Cox; 16 Aug 2016, 12:00.
Guest
#8

16 Aug 2016, 12:24

I'm trying to find a convenient way to develop a graph, and I cannot find a solution online or in Mitchell's Guide to Stata Graphics. I can use the data to develop a graph in the standard manner (as indicated in my first post), or get Stata to give me the estimates and then develop a graph (as indicated as an alternative solution for the same problem in post #5; so these are not meant as two different questions).
In your example (Trunk Space, Turn Circle) you have two variables (and two groups). I have four (or many) variables, no grouping to report.

I would ideally want a graph that shows all variables (four in my example) in one, single graph, as I tried to describe in post #1.
Moreover, I would ideally want those variables "paired" graphically, as indicated below.
Nick Cox

#9

16 Aug 2016, 12:45

To me in #1 the implication was of 3 variables, one a grouping variable.

On Statalist, a variable usually means a Stata variable. Girls and boys would normally be distinct values of an indicator variable, just like domestic and foreign in the auto dataset.

If you really have four variables, then indeed you need different code.

ciplot (SSC) remains a candidate.
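
As a rough illustration of that suggestion, and assuming ciplot accepts a list of several variables (please check its help file after installing), a call on the auto data might look like the sketch below. Here trunk and turn are only stand-ins for the four variables in this thread.

```stata
* Sketch only: auto data stand in for the variables discussed in the thread.
* ciplot is user-written; this installs it from SSC.
ssc install ciplot
sysuse auto, clear
* Assumed usage: one confidence interval per variable named.
ciplot trunk turn
```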

A clear real or realistic data example in #1 would have made this thread go faster. We do ask that you give one.
Guest
#10

16 Aug 2016, 12:47

I give up. "Below there are four variables"
Yes, there were four variables, and four Stata variables.
Guest
#11

16 Aug 2016, 12:47

Thanks for your time and effort. I appreciate the good intention.
Nick Cox

#12

16 Aug 2016, 12:52

Sorry for any misunderstanding on my part, but if #2 was a misreading of your data structure then that should have been clear as soon as you read it.

No need to "give up", as I have made a positive suggestion of ciplot.

Independently of all that, a data structure

observation 1: girl#1 boy#1
observation 2: girl#2 boy#2
...

is inappropriate for most analyses unless the observational or experimental design pairs girls and boys. I think that was definitely a reason why I didn't take you literally: I found the implied data structure implausible.

That, however, won't bite for plotting confidence intervals of variables taken separately.

Last edited by Nick Cox; 16 Aug 2016, 12:59.
Guest
#13

17 Aug 2016, 08:57

For list members interested in the question raised in this thread:

The solution I ended up with is to use Stata's -mean- with the level(95) option,
then let -coefplot- draw a plot of the means and their 95% CIs.

For the package -coefplot- and the many options within this package, please see:
ftp://ftp.repec.org/opt/ReDIF/RePEc/...3-coefplot.pdf
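
A minimal sketch of that approach, using the auto data as a stand-in (the variables here are illustrative, not the ones from this thread). It assumes coefplot has been installed, for example with -ssc install coefplot-. The means appear in the order in which the variables are listed after -mean-, so related variables can be placed next to each other there.

```stata
* Sketch only: auto data stand in for the poster's variables.
sysuse auto, clear

* Estimate several means at once; level(95) is the default, shown for clarity.
mean headroom trunk turn mpg, level(95)

* Plot the active estimation results: one marker and interval per mean.
coefplot, vertical xtitle("") ytitle("Mean and 95% CI")
```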