Bar Chart with standard errors (move from two to three bars)

Roger More

Join Date: Jul 2017

Posts: 59
#1


11 Dec 2018, 15:16

Dear all,

I hope all is well with you. I wanted to create a bar chart with three bars along with their standard errors. Here is a sample of my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte StateWins int yeardecision str5 bench byte AfterReformJudge
1 2012 "abohc" 0
1 2014 "abohc" 1
0 2013 "abohc" 1
0 2016 "abohc" 1
1 1999 "banhc" 0
1 2013 "banhc" 1
0 2004 "banhc" 0
1 1986 "banhc" 0
0 1990 "banhc" 0
1 2010 "banhc" 0
0 2001 "banhc" 0
end

Previously, when I had an identifier for before and after the reform, I was able to make two bars with the following code, where the first bar was before the reform and the second was after it:

Code:

preserve
collapse (mean) meanStateWins= StateWins (sd) sdStateWins=StateWins (count) n=StateWins, by(AfterReformJudge)
generate hiStateWins = meanStateWins + invttail(n-1,0.025)*(sdStateWins / sqrt(n))
generate loStateWins = meanStateWins - invttail(n-1,0.025)*(sdStateWins / sqrt(n))
graph twoway (bar meanStateWins AfterReformJudge) (rcap hiStateWins loStateWins AfterReformJudge)
restore

However, now I need to make three bars: one for state wins with yeardecision from 1986 to 1998, one from 1999 to 2009, and one from 2010 to 2016, each with its standard error.

I have tried using if qualifiers to construct the yeardecision ranges, but I can't seem to make the code work. How can I construct the three bars with their standard errors within these three time ranges?

Any help here will be really appreciated.

Cheers,
Roger

Last edited by Roger More; 11 Dec 2018, 15:35.

Roger More

Join Date: Jul 2017
Posts: 59
#2

12 Dec 2018, 02:33

Any leads, even on whether I should be using the twoway bar and rcap commands at all, would be great. I am having a hard time getting the standard errors and the means into a bar chart. I have also tried the following:

Code:

preserve 
collapse (mean) meanStateWins1= StateWins if yeardecision<1999 (sd) sdStateWins1=StateWins if yeardecision >=1999 (count) n=StateWins if yeardecision<1999 (mean) meanStateWins2= StateWins if yeardecision>=1999 & yeardecision <2010 yeardecision>1999(sd) sdStateWins2=StateWins if yeardecision>=1999 & yeardecision <2010 (count) n=StateWins if yeardecision>=1999 & yeardecision <2010 (mean) meanStateWins3= StateWins if yeardecision>2009 (sd) sdStateWins1=StateWins if yeardecision>2009 (count) n=StateWins if yeardecision>2009 
generate hiStateWins = meanStateWins + invttail(n-1,0.025)*(sdStateWins / sqrt(n))
generate loStateWins = meanStateWins - invttail(n-1,0.025)*(sdStateWins / sqrt(n))
graph twoway (bar meanStateWins1 meanStateWins2  meanStateWins3) (rcap hiStateWins loStateWins)
restore

However, I continue to get the error "invalid '('".

Again any help here will be really appreciated. Thank you.


Nick Cox

Join Date: Mar 2014

Posts: 35563
#3

12 Dec 2018, 03:27

That's almost unreadable, but thanks for posting actual code that can be copied and pasted. Taking your collapse command I see (with some editing)

Code:

(mean) meanStateWins1= StateWins if yeardecision<1999
(sd) sdStateWins1=StateWins if yeardecision >=1999
(count) n=StateWins if yeardecision<1999
(mean) meanStateWins2= StateWins if yeardecision>=1999 & yeardecision <2010
yeardecision>1999
(sd) sdStateWins2=StateWins if yeardecision>=1999 & yeardecision <2010
(count) n=StateWins if yeardecision>=1999 & yeardecision <2010
(mean) meanStateWins3= StateWins if yeardecision>2009
(sd) sdStateWins1=StateWins if yeardecision>2009
(count) n=StateWins if yeardecision>2009

That's a real mess.

0. The over-arching error is trying a very complicated command and then getting lost in the mess. Try simple commands, get them working, and then complicate.

1. The line

Code:

yeardecision>1999

looks like stray garbage, so out it goes.

2. You can't use multiple if conditions, and they are certainly not to be placed within the command as you did. See the help for collapse.

3. You have an interest in three time periods

Code:

yeardecision<1999
yeardecision>=1999 & yeardecision <2010
yeardecision>2009

Instead of multiple if conditions, use a new variable:

Code:

gen period = cond(yeardecision < 1999, 1, cond(yeardecision <2010, 2, 3))
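A quick way to confirm that the nested cond() puts the boundary years where intended is to try it on a few made-up years. This is only a sketch: the years below are invented for the check, and the replace line is an added precaution that is not in the code above. A missing yeardecision fails both comparisons, so cond() alone would code it as period 3.

```stata
* made-up boundary years, plus one missing value
clear
input int yeardecision
1986
1998
1999
2009
2010
2016
.
end

* same nested cond() as above
gen period = cond(yeardecision < 1999, 1, cond(yeardecision < 2010, 2, 3))

* added precaution: without this, the missing year ends up in period 3
replace period = . if missing(yeardecision)

* inspect where each year landed
list yeardecision period, sepby(period)
```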

We could use that variable in the collapse command, but let's see what else we have:

Code:

(mean) meanStateWins1= StateWins if yeardecision<1999
(sd) sdStateWins1=StateWins if yeardecision >=1999
(count) n=StateWins if yeardecision<1999
(mean) meanStateWins2= StateWins if yeardecision>=1999 & yeardecision <2010
(sd) sdStateWins2=StateWins if yeardecision>=1999 & yeardecision <2010
(count) n=StateWins if yeardecision>=1999 & yeardecision <2010
(mean) meanStateWins3= StateWins if yeardecision>2009
(sd) sdStateWins1=StateWins if yeardecision>2009
(count) n=StateWins if yeardecision>2009

4. The if condition on the second line is clearly legal, but not, I think, what you want.

5. You have got the idea that (with your syntax) the means and SDs for different periods need different variable names, but you missed that point for the counts. You are, or would be, trying to pack three different variables under the same variable name. That alone would be fatal.

6. Your collapse command could perhaps just be

Code:

collapse (mean) mean= StateWins (sd) sd=StateWins (count) n=StateWins, by(period)

That's what you were seeking. You do need to create the period variable first, as above.
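Put together with the data example from #1, the collapse route in point 6 might look like the sketch below. The names hi and lo and the barwidth() setting are illustrative choices, and the t-based limits are simply carried over from the code in #1, not endorsed for a binary outcome.

```stata
* data example from #1 (bench and AfterReformJudge left out)
clear
input byte StateWins int yeardecision
1 2012
1 2014
0 2013
0 2016
1 1999
1 2013
0 2004
1 1986
0 1990
1 2010
0 2001
end

gen period = cond(yeardecision < 1999, 1, cond(yeardecision < 2010, 2, 3))

* one observation per period: mean, SD and count of StateWins
collapse (mean) mean=StateWins (sd) sd=StateWins (count) n=StateWins, by(period)

* t-based 95% limits, computed as in #1
gen hi = mean + invttail(n - 1, 0.025) * sd / sqrt(n)
gen lo = mean - invttail(n - 1, 0.025) * sd / sqrt(n)

* three bars with capped spikes; the labels are typed in because
* no value labels were defined in this sketch
twoway (bar mean period, barwidth(0.6)) (rcap hi lo period), ///
    legend(off) xlabel(1 "1986-1998" 2 "1999-2009" 3 "2010-2016")
```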

I wanted to show some of the ways that someone with more Stata experience (I guess) would think about your code. But your do-it-yourself approach isn't needed at all.

Let's start again.

Code:

gen period = cond(yeardecision < 1999, 1, cond(yeardecision <2010, 2, 3))
label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
label val period period
statsby, by(period) : ci proportions StateWins , jeffreys

New points emerge here.

A. You want confidence intervals for a binary outcome. Use the right statistical machinery! Here I use jeffreys as a personal choice, but use a defensible procedure. The standard t-based procedure is often lousy for binary outcomes; many textbooks are decades out of date on this.

B. I have already made this point, but it's so important that I'll repeat it. Stata provides the framework you want. You don't have to invent your own.
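As a rough illustration of point A, the immediate command cii proportions (assuming Stata 14.1 or later) shows intervals from different methods for the same counts. The counts used here, 4 wins in 6 cases, are those of the 2010-2016 cases in the data example in #1; the choice of methods to compare is only for illustration.

```stata
* 4 successes in 6 trials, four interval methods

* default: exact (Clopper-Pearson)
cii proportions 6 4

* normal approximation
cii proportions 6 4, wald

* Wilson
cii proportions 6 4, wilson

* Jeffreys
cii proportions 6 4, jeffreys
```

With so few cases the methods can disagree noticeably, which is a practical reason for choosing one deliberately.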

Here's one I did separately as a complete self-contained example. I won't adopt the horrible (detonator, dynamite, plunger) plot of bars with error bars, but you could if you really want it.

Code:

sysuse auto, clear
statsby, by(rep78) : ci proportions foreign , jeffreys
twoway scatter mean rep78 || rcap lb ub rep78 , legend(off) ///
ytitle(Proportion foreign) scheme(s1mono) yla(0 "0" 1 "1" 0.2(0.2)0.8, format("%02.1f") ang(h))

PS: It is unlikely that successive years are independent, and none of this takes account of any dependence structure in the data. So watch out.

Last edited by Nick Cox; 12 Dec 2018, 03:30.
Roger More

Join Date: Jul 2017

Posts: 59
#4

12 Dec 2018, 04:19

Dear Dr. Nick,

Thank you so much, not just for the time but for walking me through how one would approach the problem. I have learned a lot from this post.

However, I am still not able to get the standard errors tacked onto the bar chart. I am now able to construct a bar chart with time periods as you suggested (which is very intuitive):

Code:

preserve
gen period = cond(yeardecision < 1999, 1, cond(yeardecision <2010, 2, 3))
label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
label val period period
collapse (mean) mean= StateWins (sd) sd=StateWins (count) n=StateWins, by(period)
graph twoway (bar mean period)
restore

Nevertheless, I am unable to 'collect' the confidence interval/standard errors and put on the bar chart by using

Code:

statsby, by(period) : ci proportions StateWins , jeffreys clear

How can I tack on the standard errors by tweaking the code above? I tried to sandwich the statsby command in so that I could create the confidence interval variables, but I get "no; data in memory would be lost". I understand from the help file that I need clear or replace, and that statsby is a bit like collapse, but this reintroduces the problem that I lose my mean StateWins, on which I want to tack the confidence interval.

Thanks again very much for your help on this!

Last edited by Roger More; 12 Dec 2018, 04:24.

Nick Cox

Join Date: Mar 2014
Posts: 35563
#5

12 Dec 2018, 04:45

Thanks very much for the thanks, but it seems that you don't quite get that the whole collapse approach is completely unnecessary.

The one data example you give in #1 can be used to make the needed points.

clear is placed in the wrong position in your command. It's not an option of ci!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte StateWins int yeardecision str5 bench byte AfterReformJudge
1 2012 "abohc" 0
1 2014 "abohc" 1
0 2013 "abohc" 1
0 2016 "abohc" 1
1 1999 "banhc" 0
1 2013 "banhc" 1
0 2004 "banhc" 0
1 1986 "banhc" 0
0 1990 "banhc" 0
1 2010 "banhc" 0
0 2001 "banhc" 0
end

gen period = cond(year < 1999, 1, cond(year < 2010, 2, 3)) 
label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
label val period period 

statsby, by(period) clear : ci proportions StateWins

twoway scatter mean period || rcap lb ub period, legend(off) ///
ytitle(Proportion of wins) scheme(s1mono) yla(0 "0" 1 "1" 0.2(0.2)0.8, format("%02.1f") ang(h)) ///
xla(1/3, valuelabel) xsc(r(0.8 3.2))

You need clear because the period variable has been created and statsby won't let you abandon that change to the dataset. Otherwise you could experiment with preserve and restore.

In your full dataset it's most unlikely that you need the entire range from 0 to 1 on the y axis. That's another reason for not using bars, which I didn't recommend at all for this problem.
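For anyone who does want bars with error bars despite the advice above, a minimal sketch on the data example from #1 could look like this. It assumes Stata 14.1 or later for ci proportions. Naming the collected results explicitly (mean, lb, ub) is a choice made here so that the variable names do not depend on statsby's defaults, and barwidth(0.6) is arbitrary.

```stata
* data example from #1 (bench and AfterReformJudge left out)
clear
input byte StateWins int yeardecision
1 2012
1 2014
0 2013
0 2016
1 1999
1 2013
0 2004
1 1986
0 1990
1 2010
0 2001
end

gen period = cond(yeardecision < 1999, 1, cond(yeardecision < 2010, 2, 3))
label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
label val period period

* one observation per period, with explicitly named results from ci proportions
statsby mean=r(proportion) lb=r(lb) ub=r(ub), by(period) clear : ///
    ci proportions StateWins, jeffreys

* bars are drawn from a base of 0, so the y axis always includes 0
twoway bar mean period, barwidth(0.6) || rcap lb ub period, ///
    legend(off) ytitle(Proportion of wins) xla(1/3, valuelabel)
```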


Roger More

Join Date: Jul 2017

Posts: 59
#6

12 Dec 2018, 04:55

Thanks again very much. Just final points to close the thread.

I do have to take out proportions (since we never created such a variable), and I cannot apply the jeffreys option for binary variables because it is unclear where to put it in the following code. As it is an option, I put it after the comma, but it does not work this way, nor in a few other variations I tried:

Code:

statsby, jeffreys by(period) clear : ci proportions StateWins

Thanks again very much.

Cheers!

Last edited by Roger More; 12 Dec 2018, 05:03.
Nick Cox

Join Date: Mar 2014

Posts: 35563
#7

12 Dec 2018, 05:52

Please do read the FAQ Advice as we ask. We explain there that "does not work" is not a good error report.

Nevertheless I think I can work out what you're doing wrong.

jeffreys is an option of ci. It was in the correct position in the statsby call in #3.

There is no need to move the option call jeffreys and indeed doing so made your command illegal. jeffreys is not an option of statsby. The error message you would have got -- which you didn't show us -- would probably have signalled that.

The command in #5 remains correct for you on the information you've given.

Code:

statsby, by(period) clear : ci proportions StateWins

As said, I personally would usually go

Code:

statsby, by(period) clear : ci proportions StateWins, jeffreys

But what lies downstream of this? A thesis, paper, book -- I don't know your career stage -- any will, I guess, carry a need to explain why you made particular choices in analysis. I recommend that you read https://projecteuclid.org/euclid.ss/1009213286 and make your own informed decision on a good method for a confidence interval.

Last edited by Nick Cox; 12 Dec 2018, 05:58.
Roger More

Join Date: Jul 2017

Posts: 59
#8

12 Dec 2018, 06:07

Thank you very much. I will read the link carefully. The bar chart is a motivation for a paper. I have just started my PhD and am getting the hang of Stata and, thanks to Statalist, learning a lot! OK, this is the final post; sorry again for this long thread.

Regarding proportions and reporting error message, the error message I got is

variable proportions not found
an error occurred when statsby executed ci

The code I ran is as follows:

Code:

cd "F:\Religion and Courts"
use ".\Input\CaseYearDataWithJudgeWithShrines.dta", replace
gen period = cond(year < 1999, 1, cond(year < 2010, 2, 3))
label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
label val period period
statsby, by(period) clear : ci proportions StateWins, jeffreys
twoway (bar mean period) ( rcap lb ub period), legend(off)
*twoway scatter mean period || rcap lb ub period, legend(off) ///
ytitle(Proportion of wins) scheme(s1mono) yla(0 "0" 1 "1" 0.2(0.2)0.8, format("%02.1f") ang(h)) ///
xla(1/3, valuelabel) xsc(r(0.8 3.2))

P.S. If I omit the jeffreys option AND remove proportions from the statsby command, I do get the bar chart, which is why I was wondering about the use of proportions in that line of code. Side note: the x axis is not using the labels we specified above, i.e. the time periods (1 "1986-1998" 2 "1999-2009" 3 "2010-2016").

Thank you again.

Last edited by Roger More; 12 Dec 2018, 06:30.
Nick Cox

Join Date: Mar 2014

Posts: 35563
#9

12 Dec 2018, 08:04

You must be using an old version of Stata. That's why

Code:

ci proportions

doesn't work for you. It was introduced in Stata 14.1.

Again, the message is: please read the FAQ Advice and act on it.

11. What should I say about the version of Stata I use?

The current version of Stata is 15.1. Please specify if you are using an earlier version; otherwise, the answer to your question may refer to commands or features unavailable to you. Moreover, as bug fixes and new features are issued frequently by StataCorp, make sure that you update your Stata before posting a query, as your problem may already have been solved.

My guess is that you need

Code:

statsby, by(period) clear : ci StateWins, binomial jeffreys
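If the same do-file has to run under both syntaxes, one possible sketch is to branch on c(version), the version currently in effect. The branching and the explicit result names (mean, lb, ub) are additions for illustration, not something the thread prescribes; the data are the example from #1.

```stata
* data example from #1 (bench and AfterReformJudge left out)
clear
input byte StateWins int yeardecision
1 2012
1 2014
0 2013
0 2016
1 1999
1 2013
0 2004
1 1986
0 1990
1 2010
0 2001
end

gen period = cond(yeardecision < 1999, 1, cond(yeardecision < 2010, 2, 3))

if c(version) >= 14.1 {
    * newer syntax: ci proportions leaves the estimate in r(proportion)
    statsby mean=r(proportion) lb=r(lb) ub=r(ub), by(period) clear : ///
        ci proportions StateWins, jeffreys
}
else {
    * older syntax: ci with the binomial option leaves the estimate in r(mean)
    statsby mean=r(mean) lb=r(lb) ub=r(ub), by(period) clear : ///
        ci StateWins, binomial jeffreys
}

list period mean lb ub
```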
Nick Cox

Join Date: Mar 2014

Posts: 35563
#10

12 Dec 2018, 12:46

#8 had a comment on value labels not being used. But the code for that was given in #5 and repeated in #8, yet commented out.