  • Bar Chart with standard errors (move from two to three bars)

    Dear all,

    I hope all is well with you. I wanted to create a bar chart with three bars along with their standard errors. Here is a sample of my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte StateWins int yeardecision str5 bench byte AfterReformJudge
    1 2012 "abohc" 0
    1 2014 "abohc" 1
    0 2013 "abohc" 1
    0 2016 "abohc" 1
    1 1999 "banhc" 0
    1 2013 "banhc" 1
    0 2004 "banhc" 0
    1 1986 "banhc" 0
    0 1990 "banhc" 0
    1 2010 "banhc" 0
    0 2001 "banhc" 0
    end

    Previously, when I had an indicator for before and after the reform, I was able to make two bars with the following code, where the first bar was before the reform and the second after it:

    Code:
    preserve
    collapse (mean) meanStateWins= StateWins (sd) sdStateWins=StateWins (count) n=StateWins, by(AfterReformJudge)
    generate hiStateWins = meanStateWins + invttail(n-1,0.025)*(sdStateWins / sqrt(n))
    generate loStateWins = meanStateWins - invttail(n-1,0.025)*(sdStateWins / sqrt(n))
    graph twoway (bar meanStateWins AfterReformJudge) (rcap hiStateWins loStateWins AfterReformJudge)
    restore
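
    For reference, the two generate lines compute t-based 95% confidence limits rather than standard errors. Writing \bar{y} for meanStateWins, s for sdStateWins and n for the count in each group, they are

    $$\text{loStateWins},\ \text{hiStateWins} = \bar{y} \mp t_{n-1,\,0.975}\,\frac{s}{\sqrt{n}}$$

    where invttail(n-1, 0.025) returns the critical value t_{n-1, 0.975}, and s/\sqrt{n} on its own is the standard error.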

    However, I now need a chart with three bars: one for StateWins with yeardecision from 1986 to 1998, one for 1999 to 2009 and one for 2010 to 2016, each with its standard error.

    I have tried using if qualifiers to construct yeardecision ranges, but I can't seem to make the code work. How can I construct the three bars, with their standard errors, for these three time ranges?

    Any help here will be really appreciated.

    Cheers,
    Roger
    Last edited by Roger More; 11 Dec 2018, 15:35.

  • #2
    Any leads, even on whether I should use the twoway bar and rcap commands at all, would be great. I am having a hard time getting the standard errors or the mean into a bar. I have also tried the following:

    Code:
    preserve 
    collapse (mean) meanStateWins1= StateWins if yeardecision<1999 (sd) sdStateWins1=StateWins if yeardecision >=1999 (count) n=StateWins if yeardecision<1999 (mean) meanStateWins2= StateWins if yeardecision>=1999 & yeardecision <2010 yeardecision>1999(sd) sdStateWins2=StateWins if yeardecision>=1999 & yeardecision <2010 (count) n=StateWins if yeardecision>=1999 & yeardecision <2010 (mean) meanStateWins3= StateWins if yeardecision>2009 (sd) sdStateWins1=StateWins if yeardecision>2009 (count) n=StateWins if yeardecision>2009 
    generate hiStateWins = meanStateWins + invttail(n-1,0.025)*(sdStateWins / sqrt(n))
    generate loStateWins = meanStateWins - invttail(n-1,0.025)*(sdStateWins / sqrt(n))
    graph twoway (bar meanStateWins1 meanStateWins2  meanStateWins3) (rcap hiStateWins loStateWins)
    restore
    However, I keep getting the error "invalid '('".

    Again any help here will be really appreciated. Thank you.



    • #3
      That's almost unreadable, but thanks for posting actual code that can be copied and pasted. Taking your collapse command apart, I see (with some editing):

      Code:
      (mean) meanStateWins1= StateWins if yeardecision<1999
      (sd) sdStateWins1=StateWins if yeardecision >=1999
      (count) n=StateWins if yeardecision<1999  
      
      (mean) meanStateWins2= StateWins if yeardecision>=1999 & yeardecision <2010
      yeardecision>1999
      (sd) sdStateWins2=StateWins if yeardecision>=1999 & yeardecision <2010
      (count) n=StateWins if yeardecision>=1999 & yeardecision <2010
      
      (mean) meanStateWins3= StateWins if yeardecision>2009
      (sd) sdStateWins1=StateWins if yeardecision>2009
      (count) n=StateWins if yeardecision>2009
      That's a real mess.

      0. The over-arching error is trying a very complicated command and then getting lost in the mess. Try simple commands, get them working, and then complicate.

      1. The line

      Code:
      yeardecision>1999
      looks like stray garbage, so out it goes.

      2. You can't use multiple if conditions, and they are certainly not to be placed inside the list of statistics as you did: collapse accepts a single if qualifier, which applies to the whole command. See the help for collapse.

      3. You have an interest in three time periods

      Code:
      yeardecision<1999
      yeardecision>=1999 & yeardecision <2010
      yeardecision>2009
      Instead of multiple if conditions, use a new variable:

      Code:
      gen period = cond(yeardecision < 1999, 1, cond(yeardecision <2010, 2, 3))
      We could use that variable in the collapse command, but let's see what else we have:

      Code:
      (mean) meanStateWins1= StateWins if yeardecision<1999
      (sd) sdStateWins1=StateWins if yeardecision >=1999
      (count) n=StateWins if yeardecision<1999  
      
      (mean) meanStateWins2= StateWins if yeardecision>=1999 & yeardecision <2010
      (sd) sdStateWins2=StateWins if yeardecision>=1999 & yeardecision <2010
      (count) n=StateWins if yeardecision>=1999 & yeardecision <2010
      
      (mean) meanStateWins3= StateWins if yeardecision>2009
      (sd) sdStateWins1=StateWins if yeardecision>2009
      (count) n=StateWins if yeardecision>2009
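      4. The if condition on the second line is a legal expression in itself, but it is not, I think, what you want: it selects 1999 onward, whereas the mean and count on the neighbouring lines use the years before 1999.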

      5. You have got the idea that (with your syntax) the means and SDs for different periods need different variable names, but you missed that point for the counts, and sdStateWins1 also appears twice. You are, or would be, trying to pack three different variables under the same variable name. That alone would be fatal.

      6. Your collapse command could perhaps just be

      Code:
      collapse (mean) mean= StateWins (sd) sd=StateWins (count) n=StateWins,  by(period)
      That's what you were seeking. You do need to create the period variable first, as above.
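
      Once period exists, the do-it-yourself route can be finished along the lines first tried in #1. What follows is a minimal sketch, assuming the example data from #1; the names hi and lo are illustrative choices, and the limits are the t-based 95% limits from the code in #1, not standard errors (the standard error is sd/sqrt(n)). In real use, wrap the collapse and graph in preserve and restore.

      Code:
      ```stata
      * Example data from #1
      clear
      input byte StateWins int yeardecision str5 bench byte AfterReformJudge
      1 2012 "abohc" 0
      1 2014 "abohc" 1
      0 2013 "abohc" 1
      0 2016 "abohc" 1
      1 1999 "banhc" 0
      1 2013 "banhc" 1
      0 2004 "banhc" 0
      1 1986 "banhc" 0
      0 1990 "banhc" 0
      1 2010 "banhc" 0
      0 2001 "banhc" 0
      end

      * Define the three periods once, with value labels for the axis
      gen period = cond(yeardecision < 1999, 1, cond(yeardecision < 2010, 2, 3))
      label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
      label val period period

      * One observation per period: mean, SD and count of StateWins
      collapse (mean) mean=StateWins (sd) sd=StateWins (count) n=StateWins, by(period)

      * t-based 95% limits, as in #1 (hi and lo are illustrative names)
      gen hi = mean + invttail(n - 1, 0.025) * sd / sqrt(n)
      gen lo = mean - invttail(n - 1, 0.025) * sd / sqrt(n)

      * Bars for the means, capped spikes for the limits
      twoway (bar mean period, barwidth(0.6)) (rcap hi lo period), ///
          xlabel(1/3, valuelabel) legend(off) ytitle(Mean of StateWins)
      ```

      With the example data the first period has only two observations, so its t-based upper limit lands far above 1, which is exactly the weakness raised in point A below.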

      I wanted to show some of the ways that someone with more Stata experience (I guess) would think about your code. But your do-it-yourself approach isn't needed at all.

      Let's start again.

      Code:
      gen period = cond(yeardecision < 1999, 1, cond(yeardecision <2010, 2, 3))
      label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
      label val period period
      
      statsby, by(period) : ci proportions StateWins , jeffreys
      New points emerge here.

      A. You want confidence intervals for a binary outcome. Use the right statistical machinery! Here I use jeffreys as a personal choice, but use a defensible procedure. The standard t-based procedure is often lousy for binary outcomes; many textbooks are decades out of date on this.

      B. I have already made this point, but it's so important that I'll repeat it. Stata provides the framework you want. You don't have to invent your own.
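
      To make point A concrete: for x wins in n cases, with \hat{p} = x/n and confidence level 1 - \alpha, the textbook Wald interval is the first formula below (the t-based version in #1 is a close relative), while the Jeffreys interval takes quantiles of a Beta distribution, as in the second, where B^{-1}(q; a, b) denotes the q quantile of Beta(a, b).

      $$\hat{p} \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

      $$\left[\, B^{-1}\!\left(\tfrac{\alpha}{2};\ x+\tfrac{1}{2},\ n-x+\tfrac{1}{2}\right),\ \ B^{-1}\!\left(1-\tfrac{\alpha}{2};\ x+\tfrac{1}{2},\ n-x+\tfrac{1}{2}\right) \right]$$

      The Jeffreys limits always lie inside [0, 1], whereas the Wald interval can spill outside that range and shrinks to a single point when x = 0 or x = n. This is the basic form of the interval; implementations may adjust the limits at x = 0 or x = n.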

      Here's one I did separately as a complete self-contained example. I won't adopt the horrible (detonator, dynamite, plunger) plot of bars with error bars, but you could if you really want it.

      Code:
      sysuse auto, clear
      statsby, by(rep78) : ci proportions foreign , jeffreys
      twoway scatter mean rep78 || rcap lb ub rep78 , legend(off) ytitle(Proportion foreign) scheme(s1mono) yla(0 "0" 1 "1" 0.2(0.2)0.8, format("%02.1f") ang(h))
      [Image: graph from the auto example above (roger_more.png)]

      PS: It is unlikely that successive years are independent, and none of this takes account of any dependence structure in the data. So watch out.
      Last edited by Nick Cox; 12 Dec 2018, 03:30.



      • #4
        Dear Dr. Nick,

        Thank you so much, not just for your time but for walking me through how one would approach the problem. I have learned a lot from this post.

        I can now construct a bar chart by time period as you suggested (which is very intuitive), but I am still not able to add the standard errors to the bars:

        Code:
        preserve
        gen period = cond(yeardecision < 1999, 1, cond(yeardecision <2010, 2, 3))
        label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
        label val period period
        collapse (mean) mean= StateWins (sd) sd=StateWins (count) n=StateWins,  by(period)
        graph twoway (bar mean period)
        restore
        However, I am unable to 'collect' the confidence intervals/standard errors and put them on the bar chart by using

        Code:
          statsby, by(period) : ci proportions StateWins , jeffreys clear
        How can I add the standard errors by tweaking the code above? I tried to sandwich the statsby command in so that I can create the confidence-interval variables, but I get "no; data in memory would be lost". From the help file I understand that I need clear or replace, and that statsby is a bit like the collapse command, but this reintroduces the problem that I lose my mean of StateWins, onto which I want to add the confidence interval.

        Thanks again very much for your help on this!
        Last edited by Roger More; 12 Dec 2018, 04:24.



        • #5
          Thanks very much for the thanks, but it seems that you don't quite get that the whole collapse approach is completely unnecessary.

          The one data example you give in #1 can be used to make the needed points.

          clear is placed in the wrong position in your command. It's not an option of ci!

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte StateWins int yeardecision str5 bench byte AfterReformJudge
          1 2012 "abohc" 0
          1 2014 "abohc" 1
          0 2013 "abohc" 1
          0 2016 "abohc" 1
          1 1999 "banhc" 0
          1 2013 "banhc" 1
          0 2004 "banhc" 0
          1 1986 "banhc" 0
          0 1990 "banhc" 0
          1 2010 "banhc" 0
          0 2001 "banhc" 0
          end
          
          gen period = cond(year < 1999, 1, cond(year < 2010, 2, 3)) 
          label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
          label val period period 
          
          statsby, by(period) clear : ci proportions StateWins
          
          twoway scatter mean period || rcap lb ub period, legend(off) ///
          ytitle(Proportion of wins) scheme(s1mono) yla(0 "0" 1 "1" 0.2(0.2)0.8, format("%02.1f") ang(h)) ///
          xla(1/3, valuelabel) xsc(r(0.8 3.2))
          You need clear because the period variable has been created and statsby won't let you abandon that change to the dataset. Otherwise you could experiment with preserve and restore.
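
          For example, if you want to keep working with the original data after drawing the graph, a sketch along these lines should do it, assuming the example data above and Stata 14.1 or later for ci proportions. As far as I know, clear is still needed inside preserve: preserve keeps a copy to restore later, but it does not mark the modified data as saved, so statsby would still refuse without clear.

          Code:
          ```stata
          * Example data from #1
          clear
          input byte StateWins int yeardecision str5 bench byte AfterReformJudge
          1 2012 "abohc" 0
          1 2014 "abohc" 1
          0 2013 "abohc" 1
          0 2016 "abohc" 1
          1 1999 "banhc" 0
          1 2013 "banhc" 1
          0 2004 "banhc" 0
          1 1986 "banhc" 0
          0 1990 "banhc" 0
          1 2010 "banhc" 0
          0 2001 "banhc" 0
          end

          gen period = cond(yeardecision < 1999, 1, cond(yeardecision < 2010, 2, 3))
          label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
          label val period period

          preserve
              * Temporarily replace the data with one observation per period
              statsby, by(period) clear : ci proportions StateWins, jeffreys
              twoway scatter mean period || rcap lb ub period, legend(off) ///
                  ytitle(Proportion of wins) xla(1/3, valuelabel) xsc(r(0.8 3.2))
          restore

          * Back to the original observations, with period still defined
          tabulate period
          ```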

          In your full dataset it's most unlikely that you need the entire range from 0 to 1 on the y axis. That's another reason for not using bars, which I didn't recommend at all for this problem.



          • #6
            Thanks again very much. Just a final point to close the thread.

            I do have to take out the proportions variable (since we never created it), and I cannot get the jeffreys correction for binary variables to work, because it is unclear where to put it in the following code. As it is an option, I put it after the comma, but it does not work this way, nor in a few other arrangements I tried:

            Code:
             statsby, jeffreys by(period) clear : ci proportions StateWins
            Thanks again very much.

            Cheers!
            Last edited by Roger More; 12 Dec 2018, 05:03.



            • #7
              Please do read the FAQ Advice as we ask. We explain there that "does not work" is not a good error report.

              Nevertheless I think I can work out what you're doing wrong.


              jeffreys is an option of ci. It was in the correct position in the statsby call in #3.

              There is no need to move the option jeffreys, and indeed doing so made your command illegal. jeffreys is not an option of statsby. The error message you would have got -- which you didn't show us -- would probably have signalled that.

              The command in #5 remains correct for you on the information you've given.

              Code:
               statsby, by(period) clear : ci proportions StateWins

              As said, I personally would usually go

              Code:
               statsby, by(period) clear : ci proportions StateWins, jeffreys 
              But what lies downstream of this? A thesis, paper, book -- I don't know your career stage -- any will, I guess, carry a need to explain why you made particular choices in analysis. I recommend that you read https://projecteuclid.org/euclid.ss/1009213286 and make your own informed decision on a good method for a confidence interval.
              Last edited by Nick Cox; 12 Dec 2018, 05:58.



              • #8
                Thank you very much. I will read the link carefully. The bar chart is the motivation for a paper. I have just started my PhD and am getting the hang of Stata, and thanks to Statalist I am learning a lot! OK, this is the final post; sorry again for this long thread.



                Regarding proportions and reporting the error message: the message I got is
                variable proportions not found
                an error occurred when statsby executed ci
                The code I ran is as follows:

                Code:
                cd "F:\Religion and Courts"
                use ".\Input\CaseYearDataWithJudgeWithShrines.dta", replace
                
                gen period = cond(year < 1999, 1, cond(year < 2010, 2, 3))
                label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
                label val period period
                
                statsby, by(period) clear : ci proportions StateWins, jeffreys
                
                twoway (bar mean period) ( rcap lb ub period), legend(off)
                
                *twoway scatter mean period || rcap lb ub period, legend(off) ///
                ytitle(Proportion of wins) scheme(s1mono) yla(0 "0" 1 "1" 0.2(0.2)0.8, format("%02.1f") ang(h)) ///
                xla(1/3, valuelabel) xsc(r(0.8 3.2))

                PS: If I omit the jeffreys option AND remove proportions from the statsby command, I do get the bar chart; that is why I was wondering about the use of proportions in the statsby line. Side note: the x axis is not using the value labels we specified above, i.e. the time periods (1 "1986-1998" 2 "1999-2009" 3 "2010-2016").

                Thank you again.
                Last edited by Roger More; 12 Dec 2018, 06:30.



                • #9
                  You must be using an old version of Stata. That's why

                  Code:
                  ci proportions
                  doesn't work for you. It was introduced in Stata 14.1.

                  Again, the message is: please read the FAQ Advice and act on it.

                  11. What should I say about the version of Stata I use?

                  The current version of Stata is 15.1. Please specify if you are using an earlier version; otherwise, the answer to your question may refer to commands or features unavailable to you. Moreover, as bug fixes and new features are issued frequently by StataCorp, make sure that you update your Stata before posting a query, as your problem may already have been solved.
                  My guess is that you need

                  Code:
                   
                   statsby, by(period) clear : ci StateWins, binomial jeffreys
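
                  If the same do-file has to run both before and after that change, one option is to branch on c(stata_version). This is a sketch, assuming the example data from #1 and that the older binomial syntax guessed at above is the right one for your release.

                  Code:
                  ```stata
                  * Example data from #1
                  clear
                  input byte StateWins int yeardecision str5 bench byte AfterReformJudge
                  1 2012 "abohc" 0
                  1 2014 "abohc" 1
                  0 2013 "abohc" 1
                  0 2016 "abohc" 1
                  1 1999 "banhc" 0
                  1 2013 "banhc" 1
                  0 2004 "banhc" 0
                  1 1986 "banhc" 0
                  0 1990 "banhc" 0
                  1 2010 "banhc" 0
                  0 2001 "banhc" 0
                  end

                  gen period = cond(yeardecision < 1999, 1, cond(yeardecision < 2010, 2, 3))

                  * ci proportions arrived in Stata 14.1; earlier releases use ci, binomial
                  if c(stata_version) >= 14.1 {
                      statsby, by(period) clear : ci proportions StateWins, jeffreys
                  }
                  else {
                      statsby, by(period) clear : ci StateWins, binomial jeffreys
                  }
                  list period mean lb ub
                  ```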



                  • #10
                    #8 had a comment on value labels not being used. But the code for that was given in #5 and repeated in #8, yet commented out.
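
                    As far as I can tell, the cause is that the line in #8 beginning with * ends in ///, so the two lines after it, including xla(1/3, valuelabel), are read as part of the same comment. A sketch of the #8 bar version with those options active, assuming the example data from #1 and Stata 14.1 or later:

                    Code:
                    ```stata
                    * Example data from #1
                    clear
                    input byte StateWins int yeardecision str5 bench byte AfterReformJudge
                    1 2012 "abohc" 0
                    1 2014 "abohc" 1
                    0 2013 "abohc" 1
                    0 2016 "abohc" 1
                    1 1999 "banhc" 0
                    1 2013 "banhc" 1
                    0 2004 "banhc" 0
                    1 1986 "banhc" 0
                    0 1990 "banhc" 0
                    1 2010 "banhc" 0
                    0 2001 "banhc" 0
                    end

                    gen period = cond(yeardecision < 1999, 1, cond(yeardecision < 2010, 2, 3))
                    label def period 1 "1986-1998" 2 "1999-2009" 3 "2010-2016"
                    label val period period

                    statsby, by(period) clear : ci proportions StateWins, jeffreys

                    * Bars for the proportions, capped spikes for the Jeffreys limits;
                    * valuelabel asks for the period labels on the x axis
                    twoway (bar mean period, barwidth(0.6)) (rcap lb ub period), legend(off) ///
                        ytitle(Proportion of wins) xla(1/3, valuelabel) xsc(r(0.4 3.6))
                    ```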
