
  • Keep the same order of dataset categories in a bar graph instead of sorting them

    Hello Stata people,

    I hope someone can clarify this one. I'm using Stata 13.1 and working with these data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str5 Deciles byte shareoffinancialassetsintotalass
    ">D1"   31
    "D1-D2" 34
    "D2-D3" 42
    "D3-D4" 43
    "D4-D5" 20
    "D5-D6" 14
    "D6-D7" 15
    "D7-D8" 16
    "D8-D9" 20
    "<D9"   23
    end
    The data show the share of financial assets in total assets for each social class, or decile. The idea is to draw a bar graph showing that share for each decile. As you can see, the deciles (and the other variable) are sorted from the poorest decile to the richest, and I want to keep that order in the bar graph.
    Yet when I use the command graph bar (asis) shareoffinancialassetsintotalass, over(Deciles), and even the same command with the sort option, I get a bar graph with the categories in a different order from the one in the dataset.
    I guess the solution might involve changing something in the sort option of the command. My goal is to have the categories in the graph in the same order as in the dataset.

    Any help please?

    With many thanks!

  • #2
    You can't suppress -graph bar-'s sorting, as far as I know. What you can do is create a labeled numeric variable that is sorted in the order you actually want and use that instead of the original Deciles variable. To wit:

    Code:
    label define decile 1    ">D1"    10    "<D9"
    forvalues i = 2/9 {
        label define decile `i'    "D`=`i'-1'-D`i'", add
    }
    encode Deciles, gen(decile) label(decile)
    
    graph bar (asis) shareoffinancialassetsintotalass, over(decile)
    By the way, do you really mean ">D1" and "<D9"? I think "<D1" and ">D9" would be more sensible.



    • #3
      Clyde Schechter Thanks for the help! The code worked and got me the result I wanted. As for your last remark, you are right; it was just a typo on my part.

      But back to my original question: why doesn't Stata keep the order of the dataset in the graph if I didn't specify a "sort" option? Why does the command automatically change the order of the categories when I haven't asked for that anywhere in my code? I want to understand this, because it is easy to create a labeled numeric variable from a deciles variable, but what if I were working with categories such as age groups? It would be more complicated to keep the categories in the same order as in the dataset.

      Thanks.



      • #4
        I don't know why they designed it that way. You would have to ask somebody at StataCorp. Actually, the -graph- command goes back, I think, to version 7, which is a very long time ago. There might not be anybody still there who remembers how they decided on that. I will say that since bar graphs are typically used for unordered categorical variables, sorting the categories alphabetically is often what is desired. And you can override it when that isn't what you want, as shown in #2.
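For categories such as the age groups mentioned in #3, the labeled-numeric idea in #2 can be generalized so that the labels are taken from the data rather than typed in by hand. This is a minimal sketch, assuming one observation per category (as in the data in #1) and that the observations are already in the order wanted on the graph; the names order and orderlbl are hypothetical.

```stata
* Example data from #1
clear
input str5 Deciles byte shareoffinancialassetsintotalass
">D1"   31
"D1-D2" 34
"D2-D3" 42
"D3-D4" 43
"D4-D5" 20
"D5-D6" 14
"D6-D7" 15
"D7-D8" 16
"D8-D9" 20
"<D9"   23
end

* Hypothetical names: order (variable) and orderlbl (value label).
* Number the observations in their current order ...
gen order = _n

* ... and attach each observation's string category as the label of its number.
* Assumes one observation per category and no double quotes inside the strings.
forvalues i = 1/`=_N' {
    label define orderlbl `i' "`=Deciles[`i']'", add
}
label values order orderlbl

* The numeric codes 1..10 follow dataset order; the bars show the original text.
graph bar (asis) shareoffinancialassetsintotalass, over(order)
```

Because the order lives in the numeric codes, it also carries over to commands such as -tabulate-, which list values in numeric order.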



        • #5
          If you want to retain the sort order of the observations, you can just create a variable that stores the observation number, and then ask -graph bar- to sort the categories on that variable:

          Code:
          gen x = _n
          graph bar (asis) shareoffinancialassetsintotalass, over(Deciles, sort(x)) scheme(s2color)
          which produces:
          [Graph attached: Screenshot 2025-07-26 at 1.32.16 AM.png]


          If you don't specify some kind of sort() suboption, the default is to sort the categories alphabetically, which is what is happening in #1.
          Last edited by Hemanshu Kumar; 25 Jul 2025, 14:09.



          • #6
            Some of the puzzlement here seems self-inflicted. These are results for decile bins -- and a simple conventional ordering is surely then 1 to 10. As Clyde Schechter pointed out the labels ">D1" "<D9" seem to have the inequalities the wrong way round, but why stop there?

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str5 Deciles byte shareoffinancialassetsintotalass
            ">D1"   31
            "D1-D2" 34
            "D2-D3" 42
            "D3-D4" 43
            "D4-D5" 20
            "D5-D6" 14
            "D6-D7" 15
            "D7-D8" 16
            "D8-D9" 20
            "<D9"   23
            end
            
            gen BetterDeciles = _n 
            
            label var BetterDeciles "Deciles"
            
            graph bar shareoffinancialassetsintotalass, over(BetterDeciles) barw(0.9)
            With a simple natural ordering, graph bar does automatically what you presumably expect. The default spacing seems too large to me, but that's another story and itself easily overridden.



            • #7

              Originally posted by Nick Cox
              graph bar shareoffinancialassetsintotalass, over(BetterDeciles) barw(0.9)
              There is a small typo in the code in #6. -barwidth()- is a twoway bar option. To change the spacing between bars in graph bar with the -over()- option, you need to use the -gap()- suboption within -over()-.

              Code:
               
               graph bar shareoffinancialassetsintotalass, over(BetterDeciles, gap(2))



              • #8
                Andrew Musau is entirely correct. Sorry about that. Here's the job a bit better done.


                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str5 Deciles byte shareoffinancialassetsintotalass
                ">D1"   31
                "D1-D2" 34
                "D2-D3" 42
                "D3-D4" 43
                "D4-D5" 20
                "D5-D6" 14
                "D6-D7" 15
                "D7-D8" 16
                "D8-D9" 20
                "<D9"   23
                end
                
                gen BetterDeciles = _n 
                
                label var BetterDeciles "Deciles"
                 
                twoway bar shareoffinancialassetsintotalass BetterDeciles, base(0) barw(0.9) xla(1/10, noticks) yla(0(5)45)
                [Graph attached: deciles.png]



                • #9
                  For completeness, here is another go; the new graph in this post is made with graph bar.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str5 Deciles byte shareoffinancialassetsintotalass
                  ">D1"   31
                  "D1-D2" 34
                  "D2-D3" 42
                  "D3-D4" 43
                  "D4-D5" 20
                  "D5-D6" 14
                  "D6-D7" 15
                  "D7-D8" 16
                  "D8-D9" 20
                  "<D9"   23
                  end
                  
                  gen BetterDeciles = _n 
                  
                  label var BetterDeciles "Deciles"
                   
                  twoway bar shareoffinancialassetsintotalass BetterDeciles, base(0) barw(0.9) xla(1/10, noticks) yla(0(5)45)
                  
                  graph bar (asis) shareoffinancialassetsintotalass, over(BetterDeciles, gap(*0.5)) yla(0(5)45) ///
                  ytitle(Share of financial assets (%)) b1title(Deciles)
                  [Graph attached: deciles2.png]


                  I haven't tried to replicate the graph in #8 directly.

                  Some small details, varying from well documented to slightly more elusive:

                  graph bar, over() treats the over() variable as categorical and automatically displays all possible labels and suppresses all ticks. If you don't want that, move to twoway bar or look for work-arounds. Much more discussion at https://journals.sagepub.com/doi/pdf...6867X211000032

                  graph bar does not regard the categorical axis as an x axis. That is obvious once understood, but it follows that you need to reach for b1title() or b2title() to get the horizontal axis title you want. You don't need that work-around with twoway bar. The other way round, with graph hbar, ytitle() is defined but applies to the horizontal axis, so to get the vertical axis title you want you may need l1title() or l2title().
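As an illustration of the hbar case just described, here is a minimal sketch with the same data, using l1title() for the vertical (categorical) axis; the gap(*0.5) setting simply mirrors the graph bar code above. graph hbar lists the first category at the top.

```stata
* Example data from #1
clear
input str5 Deciles byte shareoffinancialassetsintotalass
">D1"   31
"D1-D2" 34
"D2-D3" 42
"D3-D4" 43
"D4-D5" 20
"D5-D6" 14
"D6-D7" 15
"D7-D8" 16
"D8-D9" 20
"<D9"   23
end

gen BetterDeciles = _n
label var BetterDeciles "Deciles"

* With graph hbar the categorical axis is vertical: ytitle() labels the
* horizontal value axis, and l1title() supplies the vertical axis title.
graph hbar (asis) shareoffinancialassetsintotalass, over(BetterDeciles, gap(*0.5)) ///
    ytitle("Share of financial assets (%)") l1title(Deciles)
```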

                  Logic, convention and psychology are all tangled up in this territory and may contradict each other.

                  For a histogram of a continuous variable, it is standard among statistically literate people that histogram bars should touch because the bins touch. For a histogram of a discrete (e.g. counted) variable, some people don't mind touching bars; others prefer bars that don't touch (or are reduced to spikes) to emphasise discreteness. People might even make different decisions depending on the data, the readership, or what looks good (or according to instructions from on high). As already hinted in #6, for data of this kind I like to see small gaps between bars, but that is psychology, not logic: the gaps make it easy to see where the bins lie.

                  (I can't test this with Stata 13.1. You won't get the same default bar colour, but otherwise I don't think the syntax here has changed much recently.)

