  • Creating a Caterpillar Plot/Bar Plot by Group

    Hello,

    I am using Stata 16.1. I am trying to create a caterpillar plot (or something like it) to show point estimates and SEs/SDs for each of 283 groups (groupID). My dataset has 3,501 observations, and my primary variable, outcome, is binary (yes/no). I first estimate the mean prevalence and SD by groupID with the following commands:

    bysort groupID: egen Mean=mean(outcome)
    bysort groupID: egen SD=sd(outcome)
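    One detail about these commands is worth noting: egen with bysort copies each group's mean and SD onto every observation in that group, so the dataset still has 3,501 rows rather than one row per group. The minimal sketch below uses simulated stand-in data (20 groups of 10 binary outcomes, an assumption, not the data in the question) and a hypothetical indicator variable named first. It uses egen's tag() function to mark one observation per group so that each group can be listed or plotted once.

```stata
* Simulated stand-in data (assumption): 20 groups of 10 binary outcomes each
clear
set obs 200
set seed 2803
egen groupID = seq(), block(10)
gen outcome = runiformint(0, 1)

* As in the question: each group's mean and SD are copied onto every observation
bysort groupID: egen Mean = mean(outcome)
bysort groupID: egen SD = sd(outcome)

* Mark exactly one observation per group (hypothetical variable name: first)
egen first = tag(groupID)

* One tagged row per group, so each group appears once
count if first
list groupID Mean SD if first, noobs
```

    Restricting a later graph command with `if first` then draws one point per group instead of one per observation.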


    Then I tried to create a caterpillar plot using:

    ssc install caterpillar
    caterpillar Mean SD groupID, graph


    Below is an example of what I am trying to replicate, followed by my Stata output:


    [Screenshot: example of the caterpillar plot I am trying to replicate]

    [Screenshot: my Stata output]




    I also tried using statsby to create a standard bar chart with 95% CIs for each group, with the following commands:

    statsby mean=r(mean) sd=r(sd) size=r(N), by(GroupID): summarize outcome
    serrbar mean sd GroupID


    This is what I wanted:
    [Screenshot: the desired chart of group estimates with 95% CIs]

    This is my output graph:


    [Screenshot: my serrbar output graph]


    I have 283 distinct groups that I'm trying to show, so I'm confused by this output. Any help creating these graphs, or something similar that shows point estimates and SDs by group, would be greatly appreciated.

    Thank you!

  • #2
    I don't know exactly what is going on, except that:

    1. serrbar is not the tool of choice here. Supposedly I wrote it! (*) But it was never really intended for confidence interval display as such.

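    2. Your calculation will not create 95% confidence intervals. It will at best show mean +/- SD, which is a different interval. A 95% confidence interval for a mean has a half-width of about t * SD / sqrt(n), so how the two compare depends on the sample size; for groups of more than about half a dozen observations, mean +/- SD is the wider of the two.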

    3. I am confused by your graphs, some of which seem to show something like 283 different results, and some of which show many fewer. Perhaps you have different identifier variables in your dataset and are moving back and forth between them: your code uses both groupID and GroupID, and Stata variable names are case-sensitive.
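    Point 2 can be made concrete. A t-based 95% CI for a group mean is mean +/- t(n-1, 0.975) * SD / sqrt(n), so it is built from the standard error SD / sqrt(n), not from the SD itself. The sketch below assumes simulated stand-in data (283 groups of 12 binary outcomes, roughly the group sizes in the question) and uses illustrative variable names (n, se, lb, ub, ratio) that are not from the thread. It collapses to one row per group and computes the interval with invttail(). The last two lines report how the CI half-width compares with the SD for this group size. For a binary outcome this t interval is only an approximation and can extend below 0 or above 1; ci proportions reports an exact binomial interval by default.

```stata
* Simulated stand-in data (assumption): 283 groups of 12 binary outcomes
clear
set obs 3396
set seed 2803
egen groupID = seq(), block(12)
gen outcome = runiformint(0, 1)

* One row per group: mean, SD and number of non-missing outcomes
* (n, se, lb, ub and ratio are illustrative names)
collapse (mean) mean=outcome (sd) sd=outcome (count) n=outcome, by(groupID)

* t-based 95% CI: mean +/- t(n-1, 0.975) * sd / sqrt(n)
* invttail(df, 0.025) returns the t value with 2.5% in the upper tail
gen se = sd / sqrt(n)
gen lb = mean - invttail(n - 1, 0.025) * se
gen ub = mean + invttail(n - 1, 0.025) * se

* Half-width of the CI relative to the SD that serrbar would have used
* (missing for any group whose outcomes are all 0 or all 1, where sd = 0)
gen ratio = (ub - mean) / sd
summarize ratio
```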

    I don't know caterpillar; it seems to do much more than draw the graph, which is excellent. But for the graph alone, something like this will suffice.

    Code:
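    * simulated data: 100 ids (think interviewers) with 50 binary outcomes each
    clear
    set obs 5000
    set seed 2803
    egen id = seq(), block(50)
    gen outcome = runiformint(0, 1)
    
    * you start here with your variable names for id and outcome
    * statsby replaces the data with one row per id holding the results of ci
    statsby, by(id) clear : ci prop outcome
    
    * order the ids by their estimate and number them 1, 2, ... for the x axis
    sort mean id
    gen newid = _n
    
    * capped spikes show the intervals, markers show the estimates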
    twoway rcap lb ub newid || scatter mean newid, ///
    xtitle(interviewers in outcome order) ///
    legend(order(2 "mean" 1 "95% confidence interval") pos(11) ring(0)) ///
    yla(0.2(0.2)0.8, format("%02.1f")) ytitle(fraction reporting X)
    [Graph: interviewer.png, produced by the code above]


    (*) I did write it in this sense. Stata had an official command serrbar. I suggested some changes, which were implemented in Stata 6. StataCorp have a generous tradition of crediting users who make major contributions to official commands. The command has been rewritten since Stata 6, notably in Stata 8, but the note in the pdf documentation is very sticky.



    • #3
      Nick, thank you very much for your prompt response. This is exactly the help I needed. I tested the command on my end and it works perfectly.
