  • Editing combined histogram plots

    Hello everybody,
    I am not used to editing Stata graphs and am struggling with them, so I am here for help. I have plotted a variable (share of individuals working beyond statutory retirement age) by country, and I get the graph below. My questions are the following:
    1/ Is there a way to add the x-axis to all plots? With so many countries at once, I find it hard to read the peak values for each of them when the x-axis is far down.
    2/ How can I edit the format of the x-axis labels to display integers instead of numbers with one decimal?
    3/ Can I change the number of columns and rows to avoid the last row having only 2 graphs? For instance, I wanted to plot 4 rows and 7 columns. I tried the following, but it did not work:

    Code:
    hist age if work_stat_empl==1 & wave!=1 & age>ret_age2, by(country) color(navy) graphregion(fcolor(white)) xtitle("Age of respondent") column(7)
    Thanks for your help and if you have any other suggestions, they are very much welcome!

    [Image: Picture1.png]


    Last edited by Giovanna Ortolani; 30 Jun 2023, 04:47.

  • #2
    You need

    Code:
    by(country, column(7))  xla(, format(%2.0f))
    which takes care of 2) and 3).

    The shared density scale has to stretch to accommodate high densities for countries with just a few distinct values, and that dominates the display.

    A grid of lines xli(60(10)90) may ease the comparison problem. You can tune line pattern, width and colour.
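
    A minimal sketch of how these pieces might fit together, using the auto dataset shipped with Stata rather than the data in #1. It uses the by() suboptions cols() and ixaxes as documented in the help for by(). Note that ixaxes is not mentioned elsewhere in this thread; it is offered here only as one possible way to address 1), by drawing an x axis under every panel. The line positions and styles are arbitrary illustrations.

```stata
* Sketch on a built-in dataset; variable names differ from those in #1
sysuse auto, clear

* cols(5) puts the five panels in one row; ixaxes draws an x axis on each panel
* xlabel(, format(%2.0f)) shows integer labels; xline() adds reference lines
histogram mpg, by(rep78, cols(5) ixaxes) ///
    xlabel(, format(%2.0f)) ///
    xline(10(10)40, lcolor(gs12) lpattern(dash) lwidth(thin))
```

    Because of the /// line continuations, this needs to be run from a do-file or the do-file editor, not typed line by line in the Command window.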



    • #3
      Nick Cox, thank you for your help. However, Stata does not recognize the option column inside by() and returns an error. The full command is the following:

      Code:
      hist age if work_stat_empl==1 & wave!=1 & age>ret_age2, by(country) xtitle("Age of respondent") xla(, format(%2.0f) column(7)) xli(60(10)90, lc(bluishgray)) yscale(range(0 0.7)) color(navy) graphregion(fcolor(white))
      The error message says:
      option column() not allowed
      r(198);



      • #4
        You put column(7) in the wrong place -- inside xla(). That indeed will not work. As said, you need

        Code:
         
         by(country, column(7))



        • #5
          That said, I fear that your histograms are not going to work very well.

          Here is a loosely similar problem which (modulo being behind a fearsome firewall) is accessible to any Stata user. Using frames can be avoided. The colours shown will need changing trivially by those on Stata < 18.

          Here there are 19 groups with unequal sample sizes, and (I suggest) a need to show sample size as well as the main features of each distribution. In #1 it seems that odd-looking distributions are just side-effects of small sample size and not in themselves interesting or helpful.

          So, I fall back on box plots but with means too, and put sample size information in value labels.

          For your problem, there may be scope for a better ordering of countries. For some discussion, see https://journals.sagepub.com/doi/pdf...6867X211045582

          Code:
          webuse nlswork, clear
          tab grade
          
          forval j = 0/18 {
          count if age < . & grade == `j'
          label def toshow `j' "`j' ({it:n} = `r(N)')", add
          }
          label li toshow
          
          label val grade toshow
          
          capture frame drop sandbox
          
          frame put age grade, into(sandbox)
          
          frame sandbox {
              collapse (min) min=age (p25) p25=age (p50) p50=age (p75) p75=age (max) max=age (mean) mean=age, by(grade)
              
              twoway rbar p25 p50 grade, barw(0.8) lc(stc1) fc(none) horizontal ///
                  || rbar p50 p75 grade, barw(0.8) lc(stc1) fc(none) horizontal ///
                  || scatter grade min, ms(+) mc(stc1) ///
                  || scatter grade max, ms(+) mc(stc1) ///
                  || scatter grade mean, ms(Dh) mc(stc2) ///
                     yla(0/18, noticks valuelabel) legend(off) ///
                     xla(14 20 25 30 35 40 46) xtitle(Age (years))
              
          }
          [Image: ortolani.png]

          Last edited by Nick Cox; 03 Jul 2023, 04:14.



          • #6
            Here is a cosmetic variant on the previous. No promises from me on how fragile this fudge is to font choice or operating system or phase of the moon.

            Code:
            webuse nlswork, clear
            tab grade
            
            forval j = 0/18 {
            count if age < . & grade == `j'
            
            local length = strlen("`r(N)'")
            local nspaces = 2 * (6 - `length')
            local spaces = `nspaces' * " " 
            label def toshow `j' "`j' `spaces'({it:n} = `r(N)')", add
            }
            label li toshow
            
            label val grade toshow
            
            capture frame drop sandbox 
            
            frame put age grade, into(sandbox)
            
            frame sandbox { 
                collapse (min) min=age (p25) p25=age (p50) p50=age (p75) p75=age (max) max=age (mean) mean=age, by(grade)
                
                twoway rbar p25 p50 grade, barw(0.8) lc(stc1) fc(none) horizontal ///
                    || rbar p50 p75 grade, barw(0.8) lc(stc1) fc(none) horizontal ///
                    || scatter grade min, ms(+) mc(stc1) ///
                    || scatter grade max, ms(+) mc(stc1) ///
                    || scatter grade mean, ms(Dh) mc(stc2) ///
                       yla(0/18, noticks valuelabel) legend(off) ///
                       xla(14 20 25 30 35 40 46) xtitle(Age (years))
                
            }
            [Image: ortolani2.png]



            • #7
              Sorry for the mistake: I pasted the wrong code here. I did put the column() option inside by(), as in
              Code:
              hist age if work_stat_empl==1 & wave!=1 & age>ret_age2, by(country, column(7)) xtitle("Age of respondent") xla(, format(%2.0f)) xli(60(10)90, lc(bluishgray)) yscale(range(0 0.7)) color(navy)
              but it did not work. Again, Stata says that this option is not allowed.

              Your proposed graph does look like something that would work best for what I need, but I am struggling to adapt it to my dataset, because the country variable that I would like to display on the y-axis is a str2. I quickly tried an hbox to see what your graph could look like for my data, but it seems that I have many outliers:

              Code:
              graph hbox age if work_stat_empl==1 & wave!=1 & age>ret_age2, over(country2)
              [Image: Screenshot 2023-07-03 at 16.56.43.png]


              As this is a longitudinal dataset, I am wondering whether I should specify something else in the graph code, or whether this is truly the number of outliers I have in the dataset.

              Thanks once again.
              Last edited by Giovanna Ortolani; 03 Jul 2023, 09:55.



              • #8
                hist works for me with those options. Try

                Code:
                . sysuse auto, clear 
                
                . hist mpg, by(rep78)
                
                . hist mpg, by(rep78, col(5))
                
                . hist mpg, by(rep78, col(1))
                The only guess I have is that you are using an outdated version of Stata in which this option is not supported. See https://www.statalist.org/forums/help#version for our longstanding request to tell us about the version you use if it is not the present version.
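
                A quick check along these lines, as a sketch only: c(stata_version) reports the version that is running. Note also that the help for by() gives the suboption as cols(), of which col() in the code above is an abbreviation. column() is not a truncation of cols(), so that spelling might be an alternative explanation for the error reported in #7, although that is untested here.

```stata
* Report the Stata version that is running
display c(stata_version)

sysuse auto, clear

* cols() is the spelling given in the help for by(); col() abbreviates it
histogram mpg, by(rep78, cols(5))
```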

                Otherwise your question is already answered in the paper linked in #5. You can push a string variable through myaxis from the Stata Journal asking for a particular sort order.
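
                If the order of countries does not matter, official Stata's encode is one alternative to myaxis for turning a str2 variable into a numeric variable with value labels. A minimal sketch with made-up data follows; country_id is a hypothetical name, and the resulting order is simply alphabetical.

```stata
* Made-up data for illustration only
clear
input str2 country
"BE"
"AT"
"CY"
"AT"
end

* encode maps the distinct strings, in alphabetical order, to 1, 2, 3, ...
* and attaches the original strings as value labels
encode country, generate(country_id)
label list country_id
```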

                Here is a silly example. Clearly you already have the data. The number of countries you have is more like 26 than 5. You need extra code for your condition

                Code:
                if work_stat_empl==1 & wave!=1 & age>ret_age2
                Code:
                clear 
                set obs 82 
                set seed 2803 
                gen country = word("AT BE BG CX CY", ceil(_n/20))
                gen age = 60 + exp(rnormal(0, 1.5))
                tab country, su(age)
                
                * you start here: use country2 if appropriate 
                myaxis wanted=country, sort(mean age)
                
                graph hbox age, over(wanted)
                
                forval j = 1/5 {
                count if age < . & wanted == `j'
                
                local length = strlen("`r(N)'")
                local nspaces = 2 * (6 - `length')
                local spaces = `nspaces' * " " 
                label def toshow `j' "`: label (wanted) `j'' `spaces'({it:n} = `r(N)')", add
                }
                label li toshow
                
                label val wanted toshow
                
                capture frame drop sandbox 
                
                frame put age wanted, into(sandbox)
                
                frame sandbox { 
                    collapse (min) min=age (p25) p25=age (p50) p50=age (p75) p75=age (max) max=age (mean) mean=age, by(wanted)
                    
                    twoway rbar p25 p50 wanted, barw(0.8) lc(stc1) fc(none) horizontal ///
                        || rbar p50 p75 wanted, barw(0.8) lc(stc1) fc(none) horizontal ///
                        || scatter wanted min, ms(+) mc(stc1) ///
                        || scatter wanted max, ms(+) mc(stc1) ///
                        || scatter wanted mean, ms(Dh) mc(stc2) ///
                           yla(1/5, noticks valuelabel) legend(off) xtitle(Age (years))
                    
                }
                I wouldn't say that you have many outliers. The rule for showing data points individually is whenever a value is more than 1.5 IQR from the nearer quartile, which implies showing many points with any variable with appreciable skewness.
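
                The rule can be sketched directly with simulated skewed data of the same kind as in the example above. The sample size of 1000 is an arbitrary choice for illustration, and the local macro names are hypothetical.

```stata
clear
set obs 1000
set seed 2803
gen age = 60 + exp(rnormal(0, 1.5))

* Quartiles and interquartile range, saved before r() is overwritten
quietly summarize age, detail
local p25 = r(p25)
local p75 = r(p75)
local iqr = `p75' - `p25'

* Count values more than 1.5 IQR beyond the nearer quartile,
* which are the points a box plot would show individually
count if age < . & (age < `p25' - 1.5 * `iqr' | age > `p75' + 1.5 * `iqr')
```

                With a strongly right-skewed variable like this one, a sizeable share of the values lies beyond the upper limit, without any of them being wrong or surprising.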

                If your Stata doesn't support frames either, there will be other solutions.
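
                One such solution, sketched here under the assumption that preserve and restore are acceptable, wraps the collapse and the graph so that the original data come back afterwards. The example uses the auto dataset, and named colours (navy, maroon) replace stc1 and stc2 so that it does not depend on Stata 18.

```stata
sysuse auto, clear

preserve
    * Leave out observations with missing group values
    keep if rep78 < .

    * Reduce to one row of summary statistics per group
    collapse (min) min=mpg (p25) p25=mpg (p50) p50=mpg (p75) p75=mpg ///
        (max) max=mpg (mean) mean=mpg, by(rep78)

    * Boxes from quartiles, + for extremes, diamonds for means
    twoway rbar p25 p50 rep78, barw(0.8) lc(navy) fc(none) horizontal ///
        || rbar p50 p75 rep78, barw(0.8) lc(navy) fc(none) horizontal ///
        || scatter rep78 min, ms(+) mc(navy) ///
        || scatter rep78 max, ms(+) mc(navy) ///
        || scatter rep78 mean, ms(Dh) mc(maroon) ///
           yla(1/5, noticks) legend(off) xtitle("Mileage (mpg)")
restore
```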

