  • Trying to re-create a scatterplot

    Hi all,
    I have two continuous variables: HAL (mean = 1.51, SD = 0.41) and PAL (mean = 16.08, SD = 3.8).
    I have run a correlation and obtained a Pearson's r = 0.17.

    I would also like to present two lines of best fit: one for males and one for females.

    I am trying to re-create this exact type of chart, but I have no idea how they would have done it... (grey = female, black = male)


    [Image: Screen Shot 2018-07-01 at 2.26.49 AM.png]


    I have a scatterplot with an overlaid line of best fit and its confidence band, which I generated with the command:
    graph twoway (lfitci hal pal) (scatter hal pal)

    But my graph doesn't look anything like what I'd like to see. It looks like this:
    [Image: Screen Shot 2018-07-01 at 2.32.53 AM.png]



    Does anybody have any idea how to make it look more like the first graph? In particular, what would the code be to create the nice black boxes seen in the first graph, rather than just this scattered mess?

    Does it have anything to do with geometric means?

    Thanks for any and all perspective!!
    Al









  • #2
    Well, you should probably refer back to the source of the original graph to see if they explain their methods in depth. My guess is that the points plotted in the graph you want to emulate are not (hal, pal) data points but rather are (mean(hal | pal), pal) pairs or something similar to that as there is only one value of hal corresponding to each value of pal. So you could probably get something like what you want with:

    Code:
    by pal sex, sort: egen mhal = mean(hal)
    separate mhal, by(sex)
    graph twoway scatter mhal? pal || lfit mhal0 pal || lfit mhal1 pal, sort
    I haven't tested this, as you did not provide any example data to work with. So it may not be quite right, but it should set you in the right direction. You can, of course, add -graph twoway- options to customize the appearance. Also, the references to mhal0 and mhal1 presume that your sex variable is coded 0/1.
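
    As one possible way to customize the appearance, the sketch below draws the per-sex means as squares in grey and black and labels the legend. It is untested on the poster's data and assumes variables named hal, pal and sex, with sex coded 0 = female and 1 = male (that coding is an assumption, not something stated in the thread). The /// line continuations work in a do-file, not in the Command window.

    Code:
    * Sketch only: assumes hal, pal, and sex coded 0 = female, 1 = male.
    * Skip the next two lines if mhal0 and mhal1 already exist.
    bysort pal sex: egen mhal = mean(hal)
    separate mhal, by(sex)

    * Squares for the cell means, one fitted line per sex, grey = female, black = male
    twoway (scatter mhal0 pal, msymbol(S) mcolor(gs8))   ///
           (scatter mhal1 pal, msymbol(S) mcolor(black)) ///
           (lfit mhal0 pal, lcolor(gs8))                 ///
           (lfit mhal1 pal, lcolor(black)),              ///
           legend(order(1 "Female" 2 "Male"))            ///
           ytitle("Mean HAL") xtitle("PAL")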

    Of course, there is no guarantee this is what was done for your source graph. I can't read the minds of that graph's authors any better than you can. They might have used the median, or perhaps some other summary statistic (including, possibly, the geometric mean). It's really anybody's guess. What I'm pretty confident of is that the statistic being plotted on the vertical axis is some summary statistic of hal, not individual observation values of hal. (Another explanation would be that their data set contains only a single observation for any given value of pal.)
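
    If one of those other summary statistics is wanted, a minimal sketch might look like the following. It again assumes variables hal, pal and sex, and the new variable names (medhal, lnhal, mlnhal, gmhal) are made up for illustration. The geometric mean is only defined when hal is strictly positive. The duplicates report line is one way to check the single-observation explanation.

    Code:
    * Sketch only: assumes hal, pal and sex exist; new names are illustrative.
    * How many observations share each (pal, sex) combination?
    duplicates report pal sex

    * Median of hal within each (pal, sex) cell
    bysort pal sex: egen medhal = median(hal)

    * Geometric mean within each cell: exp of the mean of ln(hal); needs hal > 0
    generate double lnhal = ln(hal)
    bysort pal sex: egen mlnhal = mean(lnhal)
    generate double gmhal = exp(mlnhal)

    Either medhal or gmhal could then be passed to -separate- and plotted in place of mhal.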
    Last edited by Clyde Schechter; 30 Jun 2018, 20:13.



    • #3
      your first graph is very hard to read, at least for me; please see the FAQ on how to show pictures

      try the following:
      Code:
      sysuse auto, clear
      twoway (scatter weight mpg if fore==0, lc(gs8) sort c(l) ms(D)) (scatter weight mpg if fore==1, lc(black) sort c(l) msy(+)), legend(off)
      I turned the legend off here but you may want one

      clearly I did not use your variable names since you did not tell us what they are



      • #4
        Originally posted by Clyde Schechter in #2 (quoted in full above)
        This is exceptionally helpful, thank you so much!

        I used Clyde's code to generate my graph, but thanks to both of you.

        Is there a way to include the original (overall) line of best fit on these graphs, in addition to the lines that show the relationship by sex?
        Also, can we add a Pearson's r (for males and females combined)?
        Furthermore, is there a way to turn off the scatter and just show the lines of best fit, with data points and their standard errors?
        Last edited by Alan Jeddi; 04 Jul 2018, 09:11.
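
        One way these three requests might be combined is sketched below. It is untested on the real data and starts from the raw variables, assuming they are named hal, pal and sex with sex coded 0 = female and 1 = male. The names cmhal, csdhal, cnhal, sehal, lohal and hihal are made up for illustration. The bars show the cell mean plus or minus one standard error, which is missing for cells with a single observation.

        Code:
        * Sketch only: assumes hal, pal, and sex coded 0 = female, 1 = male.
        * Pearson's r for both sexes combined, formatted for the graph note
        quietly correlate hal pal
        local r : display %4.2f r(rho)

        * Cell mean and its standard error within each (pal, sex) combination
        bysort pal sex: egen cmhal  = mean(hal)
        bysort pal sex: egen csdhal = sd(hal)
        bysort pal sex: egen cnhal  = count(hal)
        generate double sehal = csdhal / sqrt(cnhal)
        generate double lohal = cmhal - sehal
        generate double hihal = cmhal + sehal

        * Error bars, cell means, per-sex fits, and a dashed overall fit
        twoway (rcap lohal hihal pal if sex == 0, lcolor(gs8))             ///
               (rcap lohal hihal pal if sex == 1, lcolor(black))           ///
               (scatter cmhal pal if sex == 0, msymbol(S) mcolor(gs8))     ///
               (scatter cmhal pal if sex == 1, msymbol(S) mcolor(black))   ///
               (lfit hal pal if sex == 0, lcolor(gs8))                     ///
               (lfit hal pal if sex == 1, lcolor(black))                   ///
               (lfit hal pal, lcolor(black) lpattern(dash)),               ///
               legend(order(3 "Female" 4 "Male" 7 "Overall fit"))          ///
               note("Pearson r, both sexes combined = `r'")

        Dropping the two scatter plots from the command would leave only the fitted lines and the error bars.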
