  • Simple question - histogram

    First, this forum has been invaluable to me.

    I have a large dataset and as part of simple descriptive analysis am creating a histogram of the # of observations at each age (var: ageatproced).

    Code:
    histogram ageatproced if ageatproced<=100, percent
    The histogram displays as such:
    [Image: hist.png]

    Why are there these spikes? It's probably due to binning, but why is Stata binning some ages so that their bars are much taller? (This happens even when I set the number of bins very large, e.g. 100.) As you can see from my data, there are no ages at which the frequency is that high:

    Code:
    . tab ageatproced
    
    ageatproced |      Freq.     Percent        Cum.
    ------------+-----------------------------------
             18 |     10,811        0.25        0.25
             19 |     19,362        0.46        0.71
             20 |     20,496        0.48        1.19
             21 |     20,548        0.48        1.67
             22 |     20,329        0.48        2.15
             23 |     20,535        0.48        2.64
             24 |     20,910        0.49        3.13
             25 |     21,528        0.51        3.63
             26 |     22,077        0.52        4.15
             27 |     22,985        0.54        4.69
             28 |     24,031        0.56        5.26
             29 |     25,551        0.60        5.86
             30 |     26,851        0.63        6.49
             31 |     28,603        0.67        7.16
             32 |     30,118        0.71        7.87
             33 |     31,817        0.75        8.62
             34 |     33,430        0.79        9.40
             35 |     35,555        0.84       10.24
             36 |     37,344        0.88       11.12
             37 |     39,089        0.92       12.04
             38 |     41,101        0.97       13.00
             39 |     43,274        1.02       14.02
             40 |     45,546        1.07       15.09
             41 |     47,657        1.12       16.21
             42 |     49,303        1.16       17.37
             43 |     51,668        1.21       18.59
             44 |     53,930        1.27       19.85
             45 |     56,864        1.34       21.19
             46 |     59,404        1.40       22.59
             47 |     62,218        1.46       24.05
             48 |     64,331        1.51       25.56
             49 |     67,043        1.58       27.14
             50 |     69,131        1.63       28.76
             51 |     71,273        1.68       30.44
             52 |     72,563        1.71       32.15
             53 |     73,074        1.72       33.86
             54 |     73,843        1.74       35.60
             55 |     75,701        1.78       37.38
             56 |     76,458        1.80       39.18
             57 |     77,226        1.82       40.99
             58 |     78,451        1.84       42.84
             59 |     81,300        1.91       44.75
             60 |     83,702        1.97       46.72
             61 |     86,255        2.03       48.75
             62 |     88,390        2.08       50.82
             63 |     90,342        2.12       52.95
             64 |     93,344        2.19       55.14
             65 |     95,670        2.25       57.39
             66 |     98,644        2.32       59.71
             67 |     99,341        2.34       62.05
             68 |     99,159        2.33       64.38
             69 |     97,931        2.30       66.68
             70 |     96,552        2.27       68.95
             71 |     96,419        2.27       71.22
             72 |     96,706        2.27       73.49
             73 |     96,340        2.27       75.76
             74 |     95,597        2.25       78.00
             75 |     93,872        2.21       80.21
             76 |     92,108        2.17       82.38
             77 |     89,961        2.12       84.49
             78 |     86,421        2.03       86.52
             79 |     83,040        1.95       88.48
             80 |     77,225        1.82       90.29
             81 |     68,321        1.61       91.90
             82 |     62,408        1.47       93.36
             83 |     55,859        1.31       94.68
             84 |     48,954        1.15       95.83
             85 |     41,276        0.97       96.80
             86 |     34,293        0.81       97.61
             87 |     27,507        0.65       98.25
             88 |     21,400        0.50       98.76
             89 |     16,452        0.39       99.14
             90 |     11,870        0.28       99.42
             91 |      7,805        0.18       99.60
             92 |      5,703        0.13       99.74
             93 |      3,915        0.09       99.83
             94 |      2,686        0.06       99.89
             95 |      1,717        0.04       99.93
             96 |      1,180        0.03       99.96
             97 |        624        0.01       99.98
             98 |        425        0.01       99.99
             99 |        266        0.01       99.99
            100 |        130        0.00      100.00
            101 |         73        0.00      100.00
            102 |         33        0.00      100.00
            103 |         21        0.00      100.00
            104 |          9        0.00      100.00
            105 |          5        0.00      100.00
            106 |          3        0.00      100.00
            108 |          3        0.00      100.00
            109 |          2        0.00      100.00
            112 |          1        0.00      100.00
            113 |          1        0.00      100.00
            122 |          1        0.00      100.00
            130 |          1        0.00      100.00
            136 |          2        0.00      100.00
            137 |          1        0.00      100.00
            145 |          1        0.00      100.00
            152 |          2        0.00      100.00
            153 |          1        0.00      100.00
            159 |          1        0.00      100.00
            162 |          1        0.00      100.00
            163 |          2        0.00      100.00
            165 |          2        0.00      100.00
    ------------+-----------------------------------
          Total |  4,253,305      100.00
    Probably a simple fix, but I haven't been able to find anything similar on the forum.

    Thanks for your help.

  • #2
    Well, you can control the binning yourself with -histogram-'s -start()-, -bin()-, and -width()- options. Perhaps better still with this data is to use the -discrete- option as well. (With -discrete-, you cannot specify -bin()-, but you still can control -start()- and -width()-). See -help histogram-.
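
    A minimal sketch of those options, using the variable name from #1. The start(17.5) and width(1) values are illustrative assumptions, chosen so that each one-year bin is centred on an integer age:

    ```stata
    * Option 1: keep the continuous histogram, but fix the bins explicitly.
    * start(17.5) width(1) gives bins [17.5,18.5), [18.5,19.5), ...
    histogram ageatproced if ageatproced<=100, percent start(17.5) width(1)

    * Option 2: declare the variable discrete, so each distinct value gets its own bar.
    histogram ageatproced if ageatproced<=100, percent discrete
    ```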

    That said, I think there is something seriously wrong with your data. How can you possibly have people who are more than 130 years old?

    • #3
      first, please read the FAQ; if you had provided data via -dataex-, someone could have supplied exact code

      as is, please look at the help file:
      Code:
      help histogram
      and especially at the "width" option (you probably want "width(1)"). Note that since your values cover a range greater than 100, you probably want more than 100 "bins". You might want to investigate other options also (e.g., "discrete").
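
      To see where the spikes are likely coming from, here is a rough sketch of the arithmetic. It assumes the default bin rule documented in -help histogram-, k = min(sqrt(N), 10*ln(N)/ln(10)), and the 18 to 100 range implied by the -if- condition in #1. The exact rounding Stata applies to k is not checked here.

      ```stata
      * Count the observations the histogram actually uses
      quietly count if ageatproced<=100
      local N = r(N)

      * Default number of bins and the bin width it implies over ages 18 to 100
      * (18 and 100 are assumed to be the plotted minimum and maximum)
      local k = min(sqrt(`N'), 10*ln(`N')/ln(10))
      display "approx. default bins: " `k'
      display "approx. bin width:    " (100 - 18)/`k'
      ```

      With roughly 4.25 million observations this works out to about 66 bins, each around 1.24 years wide. A bin that wide holds two integer ages in some places and only one in others, which would give taller bars at irregular intervals even though no single age is that frequent.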

      • #4
        Just adding the discrete option seems to do the trick.

        Code:
        clear *
        input ageatproced Freq     Percent        Cum         
                 18      10811        0.25        0.25
                 19      19362        0.46        0.71
                 20      20496        0.48        1.19
                 21      20548        0.48        1.67
                 22      20329        0.48        2.15
                 23      20535        0.48        2.64
                 24      20910        0.49        3.13
                 25      21528        0.51        3.63
                 26      22077        0.52        4.15
                 27      22985        0.54        4.69
                 28      24031        0.56        5.26
                 29      25551        0.60        5.86
                 30      26851        0.63        6.49
                 31      28603        0.67        7.16
                 32      30118        0.71        7.87
                 33      31817        0.75        8.62
                 34      33430        0.79        9.40
                 35      35555        0.84       10.24
                 36      37344        0.88       11.12
                 37      39089        0.92       12.04
                 38      41101        0.97       13.00
                 39      43274        1.02       14.02
                 40      45546        1.07       15.09
                 41      47657        1.12       16.21
                 42      49303        1.16       17.37
                 43      51668        1.21       18.59
                 44      53930        1.27       19.85
                 45      56864        1.34       21.19
                 46      59404        1.40       22.59
                 47      62218        1.46       24.05
                 48      64331        1.51       25.56
                 49      67043        1.58       27.14
                 50      69131        1.63       28.76
                 51      71273        1.68       30.44
                 52      72563        1.71       32.15
                 53      73074        1.72       33.86
                 54      73843        1.74       35.60
                 55      75701        1.78       37.38
                 56      76458        1.80       39.18
                 57      77226        1.82       40.99
                 58      78451        1.84       42.84
                 59      81300        1.91       44.75
                 60      83702        1.97       46.72
                 61      86255        2.03       48.75
                 62      88390        2.08       50.82
                 63      90342        2.12       52.95
                 64      93344        2.19       55.14
                 65      95670        2.25       57.39
                 66      98644        2.32       59.71
                 67      99341        2.34       62.05
                 68      99159        2.33       64.38
                 69      97931        2.30       66.68
                 70      96552        2.27       68.95
                 71      96419        2.27       71.22
                 72      96706        2.27       73.49
                 73      96340        2.27       75.76
                 74      95597        2.25       78.00
                 75      93872        2.21       80.21
                 76      92108        2.17       82.38
                 77      89961        2.12       84.49
                 78      86421        2.03       86.52
                 79      83040        1.95       88.48
                 80      77225        1.82       90.29
                 81      68321        1.61       91.90
                 82      62408        1.47       93.36
                 83      55859        1.31       94.68
                 84      48954        1.15       95.83
                 85      41276        0.97       96.80
                 86      34293        0.81       97.61
                 87      27507        0.65       98.25
                 88      21400        0.50       98.76
                 89      16452        0.39       99.14
                 90      11870        0.28       99.42
                 91       7805        0.18       99.60
                 92       5703        0.13       99.74
                 93       3915        0.09       99.83
                 94       2686        0.06       99.89
                 95       1717        0.04       99.93
                 96       1180        0.03       99.96
                 97         624        0.01       99.98
                 98         425        0.01       99.99
                 99         266        0.01       99.99
                100         130        0.00      100.00
                101          73        0.00      100.00
                102          33        0.00      100.00
                103          21        0.00      100.00
                104           9        0.00      100.00
                105           5        0.00      100.00
                106           3        0.00      100.00
                108           3        0.00      100.00
                109           2        0.00      100.00
                112           1        0.00      100.00
                113           1        0.00      100.00
                122           1        0.00      100.00
                130           1        0.00      100.00
                136           2        0.00      100.00
                137           1        0.00      100.00
                145           1        0.00      100.00
                152           2        0.00      100.00
                153           1        0.00      100.00
                159           1        0.00      100.00
                162           1        0.00      100.00
                163           2        0.00      100.00
                165           2        0.00      100.00
        end
        
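        * Default (continuous) binning, for comparison
        histogram ageatproced if ageatproced<=100 [fweight=Freq], percent
        graph rename plot1
        * Same plot with -discrete-: one bar per integer age
        histogram ageatproced if ageatproced<=100 [fweight=Freq], percent discrete
        graph rename plot2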
        [Image: plot2.png]
        --
        Bruce Weaver
        Email: [email protected]
        Version: Stata/MP 18.5 (Windows)

        • #5
          First, -dataex- does not reproduce the problem, even up to count(1000).

          You are correct in pointing out that my data, like all large, real-life data sets, contains some data entry errors. Fortunately, only 13/4,253,305 patients seem to be affected by this longevity problem.
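
          A quick way to check a count like that, sketched with the 130-year cutoff mentioned in #2 (the cutoff is an assumption for illustration, not an established rule):

          ```stata
          * Count ages above the cutoff; missing values are excluded explicitly
          * because Stata treats them as larger than any number
          count if ageatproced > 130 & !missing(ageatproced)

          * List the distinct implausible values and their frequencies
          tabulate ageatproced if ageatproced > 130 & !missing(ageatproced)
          ```

          From the frequencies tabulated in #1, the values above 130 add up to 13.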

          Thank you Bruce for a genuinely helpful comment to this admittedly simple problem. My error was in assuming that age is obviously a continuous variable, when in fact I do want it treated discretely in this case.
