  • overlaying histograms produce "empty" part on x-axis

    Dear all,

    I am making a plot consisting of overlaying histograms. My data ranges from -9 to 14. Even after rescaling the x-axis (xscale) from -14 to 15 AND declaring all xlabels to be within this bound (xlabel and xtick) I obtain a graph with a large empty patch in the lower x-area. As neither data nor labels should be restricting the x-axis to be larger than I specified, I am out of ideas how to get rid of the empty patch in my graph. My code is

    Code:
    twoway (histogram LR_open, frac width(14) start(-14) yaxis(1) yscale(axis(1) range(1.3)) xscale(range(-14 15)) xlabel(-10 -5 -2 0 2 5 10 15) fin(inten50) fcol(gs10) lcol(gs10) lwidth(vthin) xlabel(-10 -5 0 5 10 15) xtick(-10(1)15)) ///
    (histogram LR_open if LR_open < -5, fcol(dknavy) lcol(dknavy) fin(inten90) freq width(0.5) start(-10) xline(0) yaxis(2))  ///
    (histogram LR_open if LR_open >= -5 & LR_open < -2, fcol(navy) lcol(navy) fin(inten90) freq width(0.5) start(-10) yaxis(2))  ///
    (histogram LR_open if LR_open >= -2 & LR_open < -1, fcol(blue) lcol(blue) fin(inten80) freq width(0.5) start(-10) yaxis(2))  ///
    (histogram LR_open if LR_open >= -1 & LR_open < -0.5, fcol(midblue) lcol(midblue) freq width(0.5) start(-10) yaxis(2))  ///
    (histogram LR_open if LR_open >= -0.5 & LR_open < 0, fcol(eltblue) lcol(eltblue) freq width(0.5) start(-10) yaxis(2))  ///
    (histogram LR_open if LR_open >= 0 & LR_open < 0.5, fcol(sandb) lcol(sandb) freq width(0.5) start(-10) yaxis(2))  ///
    (histogram LR_open if LR_open >= 0.5 & LR_open < 1, fcol(orange) lcol(orange) freq width(0.5) start(-10) yaxis(2))  ///
    (histogram LR_open if LR_open >= 1 & LR_open < 2, fcol(dkorange) lcol(dkorange) fin(inten90) freq width(0.5) start(-10) yaxis(2))  ///
    (histogram LR_open if LR_open >= 2 & LR_open < 5, fcol(red) lcol(red) fin(inten80) freq width(0.5) start(-10) yaxis(2))  ///
    (histogram LR_open if LR_open >= 5, fcol(maroon) lcol(maroon) fin(inten90) freq width(0.5) start(-10) yaxis(2) xscale(range(-14 15)) xlabel(-10 -5 0 5 10 15) xtick(-10(1)15) legend(off))
    and it produces
    [Image: LR-coefficient_hist.png]


    I have tried including

    Code:
    xscale(range(-14 15)) xlabel(-10 -5 0 5 10 15) xtick(-10(1)15)
    in every line (for every histogram) but to no avail.

    Thank you kindly for your ideas!
    Last edited by Daniel Prosi; 20 Oct 2020, 08:38.

  • #2
    Hard for me to say. Sometimes even with an if qualifier twoway wants to take account of values that exist otherwise in the dataset. What is the result of

    Code:
    su LR_open


    • #3
      Code:
      sum LR_open
      returns

      Code:
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
           LR_open |        141    .5499366    3.344713  -8.345554   13.70668
      Furthermore, adding
      Code:
      if LR_open > -10
      to those histograms whose condition is not already more restrictive on the lower tail doesn't resolve the issue either.
      Last edited by Daniel Prosi; 20 Oct 2020, 10:27.


      • #4
        Still puzzling. Can you post all 141 values?

        Code:
        dataex LR_open, count(141)
        should do it, unless there are missing values.


        • #5
          There are missing observations in my data. I obtained the values using
          Code:
          preserve
          drop if LR_open == .
          dataex LR_open, count(141)
          restore
          Wrapping the histogram in the same procedure did not solve the problem, as in:
          Code:
          preserve
          drop if LR_open == .
          XXX histogram code
          restore
          (erratum, see correct value in post below)


          Last edited by Daniel Prosi; 20 Oct 2020, 12:36.


          • #6
            I have noticed that there are some accidental duplicate entries in the data. With the correct 143 inputs and using LR_open as the only variable in the dataset, the problem nonetheless persists.

            Using
            Code:
            preserve
            keep if LR_open != .
            duplicates drop importer, force
            sum LR_open
            dataex LR_open, count(143)
            restore
            I obtain

            Code:
            clear
            input float LR_open
             -5.018064
              1.731577
              2.875983
             .22685067
             -3.758065
             2.1413996
              1.873726
              7.858955
             1.9677265
              5.697448
             .47337115
            -4.5013423
              4.043397
             -.9938433
              3.831953
             -2.476054
              4.903228
            -.56771266
              .4761753
              .9893744
            -1.1800859
               .615934
             1.8173267
             -.5917706
             -7.250248
              .5238573
             -1.332375
             4.0381665
             -.7855218
             -.8017771
             -3.723938
            -1.6362536
             -1.669914
            -4.1188197
               2.91452
             -5.511377
             -3.362836
             -2.754868
            -1.5873542
              2.713035
              4.084998
              .6288217
             3.7536786
              .8504635
             -.3404498
             -.5338348
             .27037364
              .7942923
              -.778944
            -4.5899754
            -2.0637581
             1.5226095
            -1.1585642
            -2.0336874
             -3.156846
               .078358
            -1.7394385
              3.805475
            -3.9659274
             -3.981655
              5.577051
            -2.1175406
              2.442175
             .24953373
              .8357137
             -.1947574
              7.750432
             10.148218
             2.0734582
            -2.3150759
              2.712012
            -1.6332544
              6.071053
              .4947964
              2.201214
             2.2334108
              4.972853
             -5.309225
             1.2603792
             -.6072004
             .19985756
            -.54897416
            -4.2286453
              .9150898
             4.4935775
            -1.4719853
             1.8433787
              6.997094
              4.802326
             13.706676
            -1.2157594
             -.0487306
             .10869528
             -2.456617
            -2.2348704
              .7763441
              3.035811
             3.5110536
             -8.345554
             4.2044997
             -.9926308
              -.561468
             3.1150095
             -.9493916
              .9194046
             1.1215992
             -2.752052
             .54867053
              5.062253
              .6798565
              2.502055
              1.231164
            -.14546892
             -.8640493
             .06634203
              6.952297
             1.7854975
             -3.500589
             1.1572671
              2.823348
             -.7288429
               -3.0176
            -1.7901953
             -4.920794
             -.3342973
             3.4437716
               4.07915
              .9103919
               .904939
              5.018089
              3.866268
               .975197
             3.8249345
              3.686186
             -4.422942
              6.447074
             1.5751936
             -2.973553
              4.825603
             -6.406952
              3.624205
             2.1513088
              2.632967
            end


            • #7
              histogram LR_open, frac width(14)
              Your problem lies with specifying a width of 14 for the first set of histograms. By doing this, Stata controls the axis range, i.e., you cannot override its minimum with xscale(range()).
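
              One way to see what the width(14) layer contributes is to generate its bars as variables and look at them directly. This is only a minimal sketch, assuming the data from #6 are in memory; h14 and x14 are made-up variable names, and the call uses twoway__histogram_gen (the helper documented under help twoway__histogram_gen) as I understand its syntax.

```stata
* Assumes LR_open from #6 is in memory; h14 and x14 are hypothetical names
twoway__histogram_gen LR_open, fraction width(14) start(-14) generate(h14 x14)

* Bar heights and bar midpoints for the background layer
list h14 x14 if !missing(h14)

* Draw only these bars to check which x-axis range they request on their own
twoway bar h14 x14, barwidth(14)
```

              With start(-14) and width(14) the bin edges fall at -14, 0 and 14, so the listed midpoints show where the wide bars are centred before any xscale() option is applied.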

              ADDED IN EDIT: See the following for a similar problem.

              https://www.statalist.org/forums/for...in-on-the-left


              Last edited by Andrew Musau; 20 Oct 2020, 13:08.


              • #8
                Originally posted by Andrew Musau View Post

                Your problem lies with specifying a width of 14 for the first set of histograms. By doing this, Stata controls the axis range, i.e., you cannot override its minimum with xscale(range()).
                I have tried to incorporate your suggestion by replacing the width with the bin() option. I need to ensure however that all histograms make a split precisely at 0. My solution is

                Code:
                preserve
                keep if LR_open != .
                duplicates drop importer, force
                sum LR_open
                local a = - r(max)
                twoway (histogram LR_open, frac bin(2) start(`a') yaxis(1) yscale(axis(1) range(1.3)) xscale(range(-14 15)) xlabel(-10 -5 -2 0 2 5 10 15) fin(inten50) fcol(gs10) lcol(gs10) lwidth(vthin) xlabel(-10 -5 0 5 10 15) xtick(-10(1)15)) ///
                (histogram LR_open if LR_open < -5 & LR_open > -10,  freq bin(10) start(-10) xline(0) yaxis(2))  ///
                (histogram LR_open if LR_open >= -5 & LR_open < -2,  freq bin(6) start(-5) yaxis(2))  ///
                (histogram LR_open if LR_open >= -2 & LR_open < -1,  freq bin(2) start(-2) yaxis(2))  ///
                (histogram LR_open if LR_open >= -1 & LR_open < -0.5,  freq bin(1) start(-1) yaxis(2))  ///
                (histogram LR_open if LR_open >= -0.5 & LR_open < 0,  freq bin(1) start(-.5) yaxis(2))  ///
                (histogram LR_open if LR_open >= 0 & LR_open < 0.5,  freq bin(1) start(0) yaxis(2))  ///
                (histogram LR_open if LR_open >= 0.5 & LR_open < 1,  freq bin(1) start(.5) yaxis(2))  ///
                (histogram LR_open if LR_open >= 1 & LR_open < 2,  freq bin(2) start(1) yaxis(2))  ///
                (histogram LR_open if LR_open >= 2 & LR_open < 5, freq bin(6) start(2) yaxis(2))  ///
                (histogram LR_open if LR_open >= 5 & LR_open < 14, freq bin(18) start(5) yaxis(2) xscale(range(-14 15)) xlabel(-10 -5 0 5 10 15) xtick(-10(1)15))
                restore
                The result is as before.

                Last edited by Daniel Prosi; 20 Oct 2020, 13:23.


                • #9
                  Based on the link in #7, the problem should not occur with twoway.

                  The problem lies in the options left out. This works great.

                  Code:
                  tw (histogram LR_open if LR_open < -5, fcol(dknavy) lcol(dknavy) fin(inten90) freq width(0.5) start(-10) xline(0) yaxis(2))  ///
                  (histogram LR_open if LR_open >= -5 & LR_open < -2, fcol(navy) lcol(navy) fin(inten90) freq width(0.5) start(-10) yaxis(2))  ///
                  (histogram LR_open if LR_open >= -2 & LR_open < -1, fcol(blue) lcol(blue) fin(inten80) freq width(0.5) start(-10) yaxis(2))  ///
                  (histogram LR_open if LR_open >= -1 & LR_open < -0.5, fcol(midblue) lcol(midblue) freq width(0.5) start(-10) yaxis(2))  ///
                  (histogram LR_open if LR_open >= -0.5 & LR_open < 0, fcol(eltblue) lcol(eltblue) freq width(0.5) start(-10) yaxis(2))  ///
                  (histogram LR_open if LR_open >= 0 & LR_open < 0.5, fcol(sandb) lcol(sandb) freq width(0.5) start(-10) yaxis(2))  ///
                  (histogram LR_open if LR_open >= 0.5 & LR_open < 1, fcol(orange) lcol(orange) freq width(0.5) start(-10) yaxis(2))  ///
                  (histogram LR_open if LR_open >= 1 & LR_open < 2, fcol(dkorange) lcol(dkorange) fin(inten90) freq width(0.5) start(-10) yaxis(2))  ///
                  (histogram LR_open if LR_open >= 2 & LR_open < 5, fcol(red) lcol(red) fin(inten80) freq width(0.5) start(-10) yaxis(2))  ///
                  (histogram LR_open, frac width(14) yaxis(1) yscale(axis(1) range(1.3))) ///
                  (histogram LR_open if LR_open >= 5, fcol(maroon) lcol(maroon) fin(inten90) freq width(0.5) start(-10) yaxis(2) xscale(range(-10 15)) xlabel(-10 -5 0 5 10 15) xtick(-10(1)15) legend(off))
                  I am away for a while, if you can't figure it out, I will look at it afterwards.
                  Last edited by Andrew Musau; 20 Oct 2020, 13:20.


                  • #10
                    I am going to trust that Andrew Musau is on top of the histogram question. But what are you doing here? Here is one of several other ways to look at your data, a normal quantile plot.


                    Code:
                    set scheme s1color 
                    
                    qnorm LR_open, ms(Oh)
                    [Image: nqplot.png]


                    • #11
                      Originally posted by Nick Cox View Post
                      I am going to trust that Andrew Musau is on top of the histogram question. But what are you doing here? Here is one of several other ways to look at your data, a normal quantile plot.


                      The data are unit-specific estimates in a random-coefficients setting. I am mostly interested in showing their empirical distribution and particularly that a large fraction of units has negative coefficients. This is why splitting the bars at 0 is important. The colours are equivalent to the fill colour of a map (so it also serves as a sort of legend for the map).

                      The problem with the above solution by Andrew Musau is that it no longer lets me force the large histogram in the background to split precisely at 0. I find that I can achieve that split
                      i) by setting start() to the negative of my variable's maximum and using bin(2), or
                      ii) by setting start() and width() to the same magnitude (with start() negative), at least as large as my variable's maximum.

                      As both approaches involve the use of start() and start() is also the source of the large free patch to the left of my graph I fear that there is no solution to my problem.
                      Last edited by Daniel Prosi; 20 Oct 2020, 14:39.


                      • #12
                        I fear that there is no solution to my problem.
                        Don't despair just yet! Better to use Stata's logic here. Just redefine your variable so that the minimum point is -14.

                        Code:
                        *DATA FROM #6
                        qui sum LR_open
                        gen LR_open0=LR_open+abs(`r(min)') -14
                        
                        twoway (histogram LR_open0, frac width(14) yaxis(1) yscale(axis(1) range(1.3)) ///
                        xscale(range(-14 15)) xlabel(-10 -5 -2 0 2 5 10 15) fin(inten50) fcol(gs10) lcol(gs10) ///
                        lwidth(vthin) xlabel(-10 -5 0 5 10 15) xtick(-10(1)15)) (histogram LR_open if ///
                        LR_open < -5, fcol(dknavy) lcol(dknavy) fin(inten90) freq width(0.5) start(-10) xline(0)  ///
                        yaxis(2)) (histogram LR_open if LR_open >= -5 & LR_open < -2, fcol(navy) lcol(navy) fin(inten90) ///
                        freq width(0.5) start(-10) yaxis(2))  (histogram LR_open if LR_open >= -2 & LR_open < -1, ///
                        fcol(blue) lcol(blue) fin(inten80) freq width(0.5) start(-10) yaxis(2))  (histogram LR_open if LR_open >= -1 & ///
                         LR_open < -0.5, fcol(midblue) lcol(midblue) freq width(0.5) start(-10) yaxis(2))  (histogram LR_open if ///
                        LR_open >= -0.5 & LR_open < 0, fcol(eltblue) lcol(eltblue) freq width(0.5) start(-10) yaxis(2))  ///
                        (histogram LR_open if LR_open >= 0 & LR_open < 0.5, fcol(sandb) lcol(sandb) freq width(0.5) ///
                        start(-10) yaxis(2))  (histogram LR_open if LR_open >= 0.5 & LR_open < 1, fcol(orange) ///
                        lcol(orange) freq width(0.5) start(-10) yaxis(2))  (histogram LR_open if LR_open >= 1 & ///
                        LR_open < 2, fcol(dkorange) lcol(dkorange) fin(inten90) freq width(0.5) start(-10) yaxis(2))  ///
                        (histogram LR_open if LR_open >= 2 & LR_open < 5, fcol(red) lcol(red) fin(inten80) freq ///
                        width(0.5) start(-10) yaxis(2))  (histogram LR_open if LR_open >= 5, fcol(maroon) ///
                        lcol(maroon) fin(inten90) freq width(0.5) start(-10) yaxis(2) xscale(range(-14 15)) ///
                        xlabel(-14 -5 0 5 10 14) xtick(-14(1)14) legend(off))
                        Res.:

                        [Image: Graph.png]


                        • #13
                          I am mostly interested in showing their empirical distribution and particularly that a large fraction of units has negative coefficients. This is why splitting the bars at 0 is important.
                          Adding yline(0) to the plot in #10 emphasises that too.
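
                          For concreteness, the suggestion might look like the following. This is a sketch that assumes the data from #6 are in memory and simply adds yline(0) to the qnorm call from #10; the observed values are on the vertical axis, so the line marks where they cross zero.

```stata
* Assumes LR_open from #6 is in memory
set scheme s1color

* Normal quantile plot as in #10, with a horizontal reference line at zero
qnorm LR_open, ms(Oh) yline(0)
```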


                          • #14
                            Originally posted by Andrew Musau View Post

                            Don't despair just yet! Better to use Stata's logic here. Just redefine your variable so that the minimum point is -14.

                            Sure, that aligns the box perfectly fine. But the rescaling displays different (and misleading) information about the data. Note that the large histogram in the background serves to show the fraction of LR_open values that lie below 0. In your solution it no longer does that.
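
                            The fraction in question can also be computed directly, which gives a number to check any version of the background bar against. A minimal sketch, assuming the data from #6 are in memory; the local macro name nneg is made up for illustration.

```stata
* Assumes LR_open from #6 is in memory; nneg is a hypothetical macro name
quietly count if LR_open < 0
local nneg = r(N)

* Denominator: nonmissing observations only
quietly count if !missing(LR_open)
display "Share of LR_open below 0: " %5.3f (`nneg' / r(N))
```

                            The left background bar should reach this share on the fraction axis if the split sits exactly at 0.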

                            Your comment, Nick Cox, is very true, and I will consider using this different approach.


                            • #15
                              You should be able to figure that out.

                              Code:
                              * Map each value to -14 or 14; exclude missings, which cond() would otherwise send to 14
                              gen LR_open0 = cond(LR_open < 0, -14, 14) if !missing(LR_open)
                              [Image: Graph.png]
