  • Deleting number of observations below 3 in histogram?

    Hi,

    How can I delete the bars in a histogram that contain 3 or fewer observations?
    The histogram should show the age of mothers and fathers at childbirth, but I cannot plot an age that has too few observations, as this is individual-specific data. I have a panel dataset.

    My code is as follows:

    twoway (hist age if id!=L.id & gender==1, discrete) ///
    (hist age if id!=L.id & gender==2, discrete) ///
    xtitle("Age") title("Panel A. Age of mothers and fathers at childbirth") ///
    legend(rows(1) order(1 "Male" 2 "Female")) ///
    ylabel( ,grid) ///
    noout ///


    This gives me the error: "option noout not allowed".

    Many thanks!
    Best,
    Natasha

  • #2
    You could create a variable that identifies age groups with sufficient observations and condition on it in your code:
    Code:
    bys gender age: gen sufficient = _N > 3
    twoway ///
      (hist age if id!=L.id & gender==1 & sufficient, discrete) ///
      (hist age if id!=L.id & gender==2 & sufficient, discrete)
    To get rid of the error, remove "noout". Also, do not end the command with "///".
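Putting both fixes together, a complete command might look like the minimal sketch below. It runs on simulated data: the values of `gender` and `age`, the coding 1 = male and 2 = female, and the age range are assumptions for illustration only. The simulated data has one row per parent, so the `id!=L.id` panel filter is left out. The twoway-level options (titles, legend, grid) follow a comma after the last plot, and the last line has no trailing `///`.

```stata
* Hypothetical example data: one row per parent (values are simulated)
clear
set seed 12345
set obs 500
gen gender = 1 + (runiform() > 0.5)      // assumed coding: 1 = male, 2 = female
gen age = 18 + floor(25 * runiform())    // illustrative ages 18 to 42

* Flag gender-age cells with more than 3 observations
bysort gender age: gen sufficient = _N > 3

* Two overlaid discrete histograms; global options come after the comma
twoway (histogram age if gender==1 & sufficient, discrete) ///
       (histogram age if gender==2 & sufficient, discrete), ///
       xtitle("Age") title("Panel A. Age of mothers and fathers at childbirth") ///
       legend(rows(1) order(1 "Male" 2 "Female")) ///
       ylabel(, grid)
```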


    • #3
      Many thanks. This solved my problem.


      • #4
        Note that the histogram commands will calculate everything relative to the data included, paying no attention to what has been excluded. So fractions, percents and densities will be overstated, by an amount that depends on how much has been excluded.
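To see how large the overstatement is, suppose (with purely illustrative numbers) that one gender has N observations in total, that bin j holds n_j of them, and that the excluded small bins hold k observations. A histogram drawn on the remaining N − k observations divides by N − k instead of N:

```latex
\text{reported percent}_j = 100\,\frac{n_j}{N-k}
\qquad\text{vs.}\qquad
\text{correct percent}_j = 100\,\frac{n_j}{N},
\qquad
\frac{\text{reported}}{\text{correct}} = \frac{N}{N-k}.

\text{E.g. } N = 200,\; k = 6,\; n_j = 20:\quad
100\cdot\frac{20}{194} \approx 10.31\%\ \text{instead of}\ 100\cdot\frac{20}{200} = 10\%,
\ \text{a factor of}\ \frac{200}{194} \approx 1.031.
```

Densities and fractions are inflated by the same factor N/(N − k), since the bin width cancels in the ratio.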


        • #5
          Oh shoot, my bad!

          A possible solution might be to generate histogram bin variables using all data but plot only bins with sufficient observations.

          First, generate histogram bin variables using all data:
          Code:
          twoway__histogram_gen age if gender==1, discrete gen(h1 x1)
          twoway__histogram_gen age if gender==2, discrete gen(h2 x2)
          Then, identify age groups with insufficient observations:
          Code:
          bys gender age: gen sufficient = _N>3
          tab age if !sufficient & gender==1
          tab age if !sufficient & gender==2
          Finally, plot the histogram bin variables generated using all data, but exclude the insufficient age groups (here assumed to be 20 for gender==1 and 22 for gender==2):
          Code:
          twoway ///
          (bar h1 x1 if !inlist(x1,20)) ///
          (bar h2 x2 if !inlist(x2,22))
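A variant of the same idea that avoids hard-coding the excluded ages is sketched below. Instead of `twoway__histogram_gen`, it uses `contract` to count observations per gender-age cell from all data, computes percentages with each gender's full total as the denominator, and then draws only cells with more than 3 observations. It runs on simulated data; the coding (1 = male, 2 = female), the new variable names `n`, `grp_total` and `pct`, and the threshold of 3 are assumptions. `contract` replaces the data in memory, so on real data run it inside `preserve`/`restore` or on a copy.

```stata
* Hypothetical example data (values are simulated)
clear
set seed 12345
set obs 500
gen gender = 1 + (runiform() > 0.5)      // assumed coding: 1 = male, 2 = female
gen age = 18 + floor(25 * runiform())    // illustrative ages 18 to 42

* One row per gender-age cell; n holds the number of observations in the cell
contract gender age, freq(n)

* Denominator: each gender's FULL total, including cells that will be hidden
bysort gender: egen grp_total = total(n)
gen double pct = 100 * n / grp_total

* Draw only cells with more than 3 observations; bar heights are not rescaled
twoway (bar pct age if gender==1 & n > 3, barwidth(1)) ///
       (bar pct age if gender==2 & n > 3, barwidth(1)), ///
       xtitle("Age") ytitle("Percent") ///
       legend(rows(1) order(1 "Male" 2 "Female"))
```

Because the percentages are computed before any cells are dropped, the bars that are shown keep the heights they would have in the full histogram; the small cells are simply not drawn.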
          Last edited by Øyvind Snilsberg; 01 Nov 2021, 13:19.
