
  • Plot 4 lines of means, together with "confidence intervals" based on the corresponding standard deviations?

    Dear smartest listers,

    I would like to plot the mean of y against each value of x, separately for the 4 cases (g=0,h=0), (g=0,h=1), (g=1,h=0), and (g=1,h=1).

    So I started with the following code:

    Code:
    collapse (mean) y, by(g h x)
    sort g h x
    save graphdata, replace
    
    use graphdata, clear
    gen y1 = y if g==0 & h==0
    gen y2 = y if g==0 & h==1
    gen y3 = y if g==1 & h==0
    gen y4 = y if g==1 & h==1
    
    twoway (line y1 x) (line y2 x) (line y3 x) (line y4 x)
    Now I would like to also add a sort of "confidence interval" (although I am plotting means rather than regression coefficients) that would let me see whether the lines differ significantly or whether their confidence intervals overlap. If I remember my statistics correctly, I should get 95% confidence intervals by adding to each curve a shaded area extending two standard deviations on each side of the line. Or would you do this differently?

    For that purpose, I ran another, separate -collapse- with (sd) instead of (mean) to save the standard deviation of y for each combination of g, h and x.
    But now I am unsure how to extend the above code to include those 4 shaded areas as well.
    Would anyone know?

    Thank you so much!
    PM


  • #2
    Your memory about using standard deviations here is incorrect. You need to calculate the standard errors of the means for your purpose. You can do this within -collapse- with the -(semean)- statistic. So, go back to the beginning.
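    In symbols: for a cell with n observations and sample standard deviation s, the standard deviation describes the spread of individual values of y and does not shrink as n grows, whereas the standard error of the mean (what -(semean)- computes) does. A normal-based 95% interval for the cell mean is then:

```latex
\mathrm{SE}(\bar y) = \frac{s}{\sqrt{n}}, \qquad
\bar y \pm z_{0.975}\,\mathrm{SE}(\bar y), \qquad
z_{0.975} = \texttt{invnormal(0.975)} \approx 1.96
```

    For example, with s = 10 and n = 25 the standard error is 10/5 = 2, so the interval is roughly the mean ± 3.92, whereas ±2 standard deviations would be the mean ± 20.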

    Code:
    collapse (mean) y (semean) se = y, by(x g h)
    
    egen gh = group(g h), label
    gen lb = y - invnormal(0.975)*se
    gen ub = y + invnormal(0.975)*se
    drop se g h
    
    reshape wide y lb ub, i(x) j(gh)
    graph twoway (line y1 x) (rcap lb1 ub1 x) (line y2 x) (rcap lb2 ub2 x) ///
        (line y3 x) (rcap lb3 ub3 x) (line y4 x) (rcap lb4 ub4 x)
    You will probably want to add some options to the -graph- command, for example to give the line and the caps for each combination of g and h the same color, and probably to modify the appearance of the legend.

    As you did not provide example data, this code is untested and may contain typos or other errors.
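    A self-contained version of the code above, run on simulated data, might look like the sketch below. The data-generating lines, the seed and the color choices are illustrative assumptions, not PM's data; the colors follow the suggestion of matching each line with its caps. Run it from a do-file, since the /// continuations do not work interactively.

```stata
* Sketch only: hypothetical simulated data standing in for y, x, g and h
clear
set seed 12345
set obs 800
generate byte g = runiform() < 0.5
generate byte h = runiform() < 0.5
generate byte x = ceil(5*runiform())
generate y = x + g + 0.5*h + rnormal()

* Mean and standard error of the mean of y for each x within each (g, h) cell
collapse (mean) y (semean) se = y, by(x g h)
egen gh = group(g h), label          // 1=(0,0) 2=(0,1) 3=(1,0) 4=(1,1)
generate lb = y - invnormal(0.975)*se
generate ub = y + invnormal(0.975)*se
drop se g h
reshape wide y lb ub, i(x) j(gh)
sort x

* One color per (g, h) combination, shared by the line and its caps
twoway (line y1 x, lcolor(navy))         (rcap lb1 ub1 x, lcolor(navy))         ///
       (line y2 x, lcolor(maroon))       (rcap lb2 ub2 x, lcolor(maroon))       ///
       (line y3 x, lcolor(forest_green)) (rcap lb3 ub3 x, lcolor(forest_green)) ///
       (line y4 x, lcolor(dkorange))     (rcap lb4 ub4 x, lcolor(dkorange)),    ///
       legend(order(1 "g=0, h=0" 3 "g=0, h=1" 5 "g=1, h=0" 7 "g=1, h=1"))
```

    Listing only the line plots (1, 3, 5, 7) in -legend(order())- keeps the caps out of the legend.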




    • #3
      Clyde Schechter's approach from first principles is a little optimistic about your group sample sizes, namely in assuming that they are large enough for normal-based confidence intervals to work well.

      Otherwise, see https://www.statalist.org/forums/for...-interval-sets for the community-contributed command -cisets-, which can produce t-based confidence intervals for several groups at once,

      or, equivalently, https://journals.sagepub.com/doi/pdf...867X1001000112 for how to use -statsby- directly to the same end.

      For either approach, Clyde's device of calling up egen first may help.
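      As a rough sketch of the t-based idea (not the -cisets- or -statsby- code from the links), the -collapse- in #2 can also keep each cell's count, so that every interval uses its own t quantile with n - 1 degrees of freedom. The simulated data are hypothetical and deliberately small, so that the difference from the normal multiplier is visible:

```stata
* Sketch: t-based intervals via -collapse-, a variant of the code in #2
* Hypothetical simulated data with deliberately small cells
clear
set seed 2718
set obs 120
generate byte g = runiform() < 0.5
generate byte h = runiform() < 0.5
generate byte x = ceil(5*runiform())
generate y = x + g + 0.5*h + rnormal()

* Keep the cell size n so each interval can use n - 1 degrees of freedom
collapse (mean) y (semean) se = y (count) n = y, by(x g h)
generate tcrit = invttail(n - 1, 0.025)   // upper 2.5% point of t(n - 1)
generate lb = y - tcrit*se
generate ub = y + tcrit*se

* Compare with the normal-based multiplier used in #2
generate zcrit = invnormal(0.975)
list x g h n tcrit zcrit, noobs sepby(x)
```

      Cells with a single observation get a missing standard error and therefore a missing interval.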



      • #4
        Two small comments on #2:
        • if you are averse to reshaping the data, you can also create the requisite y#, lb# and ub# variables using the separate command
        • if you are on Stata 19, you can also explore the newly introduced graph twoway rpcap
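        For the first point, a sketch of #2 with -separate- in place of -reshape- might look like this (simulated data again, values hypothetical). -separate- names the new variables y1 to y4 after the values of gh, and each one is nonmissing only in rows belonging to that group:

```stata
* Sketch: the #2 pipeline with -separate- instead of -reshape-
* Hypothetical simulated data standing in for y, x, g and h
clear
set seed 12345
set obs 800
generate byte g = runiform() < 0.5
generate byte h = runiform() < 0.5
generate byte x = ceil(5*runiform())
generate y = x + g + 0.5*h + rnormal()

collapse (mean) y (semean) se = y, by(x g h)
egen gh = group(g h), label
generate lb = y - invnormal(0.975)*se
generate ub = y + invnormal(0.975)*se

* Creates y1-y4, lb1-lb4 and ub1-ub4, one variable per value of gh
separate y, by(gh)
separate lb, by(gh)
separate ub, by(gh)

* Data stay in long form; -line- ignores the missing values by default
sort x
twoway (line y1 x) (rcap lb1 ub1 x) (line y2 x) (rcap lb2 ub2 x) ///
       (line y3 x) (rcap lb3 ub3 x) (line y4 x) (rcap lb4 ub4 x)
```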
        Last edited by Hemanshu Kumar; Today, 09:59.



        • #5
          I would not use the term "confidence interval" but rather error bars or error lines (you can plot lines showing the means ± 1 standard error, or ± the standard error multiplied by any other number). Clyde Schechter already pointed out that you need the standard error of the means, not the standard deviation of the data, and Nick Cox added the important comment that ± 2 · standard error only yields the 95% CI limits for large samples.
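          Written out, such error lines are the mean plus or minus k standard errors. The multiplier that gives 95% coverage depends on the cell size through the t distribution and only approaches 1.96 (roughly 2) as n grows; k = 1 gives about 68% coverage in large samples:

```latex
\bar y \pm k\,\mathrm{SE}(\bar y), \qquad
k_{95\%} = t_{n-1,\,0.975}: \quad
t_{4,\,0.975} \approx 2.776,\;\;
t_{9,\,0.975} \approx 2.262,\;\;
t_{29,\,0.975} \approx 2.045,\;\;
z_{0.975} \approx 1.960
```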

          I would add a caution about interpreting overlapping CIs as indicating that two independently estimated means do not differ significantly at p < .05. This is a common misconception. For n > 10, a better rule of thumb is that the difference between two independent means is statistically significant at p < .05 as long as the arms of their 95% confidence intervals overlap by no more than 0.5 of the average arm length (this does not hold for multiple comparisons or correlated data); see Cumming & Finch (2005).
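          A large-sample sketch of why overlap is too strict a criterion, assuming two independent means with equal standard errors SE and normal-based 95% intervals whose arms are 1.96 SE long:

```latex
\begin{aligned}
\text{significant at } p<.05:&\quad |\bar y_1-\bar y_2| > 1.96\sqrt{\mathrm{SE}^2+\mathrm{SE}^2} = 1.96\sqrt{2}\,\mathrm{SE} \approx 2.77\,\mathrm{SE}\\
\text{95\% CIs do not overlap:}&\quad |\bar y_1-\bar y_2| > 1.96\,(\mathrm{SE}+\mathrm{SE}) = 3.92\,\mathrm{SE}\\
\text{overlap at the significance boundary:}&\quad 3.92\,\mathrm{SE} - 2.77\,\mathrm{SE} \approx 1.15\,\mathrm{SE} \approx 0.59 \times (1.96\,\mathrm{SE})
\end{aligned}
```

          So, under these assumptions, two means whose intervals overlap by about half an arm still differ significantly, which makes the 0.5 rule of thumb slightly conservative.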



          • #6
            Thanks to Nick Cox and Dirk Enzmann for their advice polishing the rough edges off of the approach I offered in #2.

            And many thanks to Hemanshu Kumar for drawing my attention to -graph twoway rpcap-. I was not aware of its existence.

