  • Scatterplot (or similar) for contrasting 2 and 2 variables

    It may not seem so, but I have tried to find a solution...

    I want a graph showing means (averages) and confidence intervals for a set of variables.

    So far, so good. The problem is that I want to do two such plots in the same graph, with several sets of two related variables. As an illustration:

    Below there are four variables, but these are 2 sets of 2 related variables (e.g., status of boys vs. status of girls). I am not sure about the best way forward to develop such a graph (with many more variables than the 2X2 used as an illustration here).

    HTML Code:
        ---                              ---     
         |         ---         ---        |      
         |          |           |         |        
         |          |           |         |        
         |          |           |         |        
         X          |           |         O        
         |          O           X         |        
         |          |           |         |        
         |          |           |         |        
         |          |           |         |        
        ---         |           |        ---
                   ---         ---       
    
       Girls'     Boys'       Girls'    Boys'   
       status     status      power     power    
    The "graph" above would show four variables, which I need to put in a specific order within the graph, and then let the reader easily compare "status" of girls and boys (again, these variable names are just illustrations).
    In addition to -scatter- I was considering -coefplot-, but have not been able to develop a good solution.

  • #2
    coefplot is from the Stata Journal. Please remember to explain the locations of user-written programs you cite (FAQ Advice #12).

    I've played with various ways of doing things like this (e.g. ciplot on SSC) but I tend to incline to using statsby as an engine. See e.g.
    http://www.stata-journal.com/sjpdf.h...iclenum=gr0045 for an exposition.

    Here's a token example. Lots of details can be changed: that's the main point.

    Code:
    set scheme s1mono

    * means and confidence limits of trunk, by foreign
    sysuse auto, clear
    local lab1 : var label trunk
    statsby mean=r(mean) ub=r(ub) lb=r(lb), by(foreign) : ci means trunk
    * tag the results explicitly so labels cannot depend on append order
    gen which = 1
    save ciresults, replace

    * the same for turn
    sysuse auto, clear
    local lab2 : var label turn
    statsby mean=r(mean) ub=r(ub) lb=r(lb), by(foreign) : ci means turn
    gen which = 2

    append using ciresults
    * x position from the group itself: 1 = Domestic, 2 = Foreign
    gen x = foreign + 1

    label define which 1 "`lab1'" 2 "`lab2'"
    label val which which

    twoway scatter mean x || rcap ub lb x, by(which, note("") legend(off)) ///
    xla(1 "Domestic" 2 "Foreign", tlength(0)) xtitle("") xsc(r(0.5 2.5))
    [Image: ciusingstatsby.png]



    • #3
      Thanks for both recommendations (also concerning the FAQ)! Okay, so that would be the best starting point. (Girls' status and Boys' status are two different variables; I assume the example shows trunk space as one variable, shown for two different groups.)


      • #4
        My example shows two sets (domestic, foreign) for two variables (trunk space, turn circle), not one. If you have trouble adapting to your case, you will need to tell us more about your data.


        • #5
          PS. The solution suggested is the one I started with, but it still seems to me that it requires me to develop a new data file, since "boys' status" and "girls' status" are two variables within each of several clusters.

          The solution below seems easier, if I could write code that handles two different sets of variables: run a simple descriptive analysis to get the means and CIs, use a text editor to clean up the output, and then enter code similar to this:

          Code:
          clear
          input str2 varname mean lower upper
          x1 30 25  35
          x2 50 20  80
          x3 60 50  70
          x4 60 55  65
          x5 65 55  75
          end
          
          encode varname, gen(varname1)
          scatter mean varname1, xlabel(, valuelabel) || rcap upper lower varname1
          Source: http://stats.stackexchange.com/quest...th-mean-and-ci

          But I haven't been able to adapt this method to my need for two sets of variables (see first post).


          • #6
            Nick, I didn't see your answer while writing.
            The data file would look similar to this:

            Variables: GirlsStatus BoysStatus GirlsPower BoysPower
            id1
            id2

            Ideally, I would want means and CIs for each variable, but organised in such a manner that GirlsStatus and BoysStatus are close to each other, and GirlsPower and BoysPower are close to each other, all in one graph.


            • #7
              I am now confused on whether you already have the confidence limits. I didn't get that impression from #1, but my post assumes that you need to calculate them.

              I don't understand whether you're asking a new question, but I (believe I) have offered a code solution that is complete and works.

              If you can find a simpler one, that's fine by me.

              Note that to plot the confidence intervals you do need to tell twoway what they are, either as data or as scalars or macros you provide.

              Girls and boys next to each other: OK, that's like domestic and foreign.

              I can't see what you are seeing as difficult here.
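
A minimal sketch of supplying the limits as data when the means belong to four separate variables rather than to groups of one variable. It uses official Stata only (ci means, postfile, twoway). The four auto variables are stand-ins for GirlsStatus, BoysStatus, GirlsPower and BoysPower, and the gap between x positions 2 and 4 is just one assumed way of pairing them visually.

```stata
sysuse auto, clear

* stand-ins for four separate variables, listed pair by pair
local vars trunk turn headroom mpg

tempname memhold
tempfile results
postfile `memhold' pos str32 name mean lb ub using `results'

local i = 0
foreach v of local vars {
    local ++i
    quietly ci means `v'
    * x positions 1 2 4 5: skip one position after each pair
    local pos = `i' + floor((`i' - 1)/2)
    post `memhold' (`pos') ("`v'") (r(mean)) (r(lb)) (r(ub))
}
postclose `memhold'

* one observation per variable: position, name, mean, limits
use `results', clear

twoway rcap ub lb pos || scatter mean pos, legend(off) xtitle("") ///
    xla(1 "trunk" 2 "turn" 4 "headroom" 5 "mpg", tlength(0)) xsc(r(0.5 5.5))
```

The axis labels are typed by hand here; with many variables they could be built in the same loop instead.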
              Last edited by Nick Cox; 16 Aug 2016, 12:00.


              • #8
                I'm trying to find a convenient way to develop a graph, and I cannot find a solution online or in Mitchell's Guide to Stata Graphics. I can use the data to develop a graph in the standard manner (as indicated in my first post), or get Stata to give me the estimates and then develop a graph (as indicated as an alternative solution for the same problem in post #5; so these are not meant as two different questions).
                In your example (Trunk Space, Turn Circle) you have two variables (and two groups). I have four (or many) variables, no grouping to report.

                I would ideally want a graph that shows all variables (four in my example) in one, single graph, as I tried to describe in post #1.
                Moreover, I would ideally want those variables "paired" graphically, as indicated below.

                [Image: Screenshot 2016-08-16 20.16.30.png]


                • #9
                  To me in #1 the implication was of 3 variables, one a grouping variable.

                  On Statalist, a variable usually means a Stata variable. Girls and boys would normally be distinct values of an indicator variable, just like domestic and foreign in the auto dataset.

                  If you really have four variables, then indeed you need different code.

                  ciplot (SSC) remains a candidate.
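
For readers who want to try this, a minimal sketch, assuming ciplot takes a list of variables as its help file describes. The auto variables are placeholders for the four variables in the question, and the installation line is only needed once.

```stata
* ciplot is user-written; install it from SSC first:
* ssc install ciplot

sysuse auto, clear

* one confidence interval per variable, drawn in the order given
ciplot trunk turn headroom mpg
```

Listing the variables pair by pair puts the members of each pair next to each other.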

                  A clear real or realistic data example in #1 would have made this thread go faster. We do ask that you give one.


                  • #10
                    I give up. "Below there are four variables"
                    Yes, there were four variables, and four Stata variables.


                    • #11
                      Thanks for your time and effort. I appreciate the good intention.


                      • #12
                        Sorry for any misunderstanding on my part, but if #2 was a misreading of your data structure then that should have been clear as soon as you read it.

                        No need to "give up", as I have made a positive suggestion of ciplot.

                        Independently of all that, a data structure

                        observation 1: girl#1 boy#1
                        observation 2: girl#2 boy#2
                        ...

                        is inappropriate for most analyses unless the observational or experimental design pairs girls and boys. I think that was definitely a reason why I didn't take you literally: I found the implied data structure implausible.

                        That, however, won't bite for plotting confidence intervals of variables taken separately.
                        Last edited by Nick Cox; 16 Aug 2016, 12:59.


                        • #13
                          For list members interested in the question raised in this thread:

                          The solution I ended up with is to use Stata's -mean- with the option level(95) after the comma,
                          then let -coefplot- draw a plot using the many means and 95% CIs.

                          For the package -coefplot- and the many options within this package, please see:
                          ftp://ftp.repec.org/opt/ReDIF/RePEc/...3-coefplot.pdf
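
A minimal sketch of that approach, using the auto dataset because the real data were not posted. The variable names are placeholders, and the vertical option is an assumption about the desired layout rather than part of the solution described above.

```stata
* coefplot is user-written; install it from SSC first:
* ssc install coefplot

sysuse auto, clear

* means and 95% confidence intervals for several variables at once
mean trunk turn headroom mpg, level(95)

* plot the stored means and intervals; the variable order on -mean-
* sets the plotting order, so list related variables next to each other
coefplot, vertical
```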
