Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • building groups

    Dear All, According to
    Code:
    . webuse grunfeld, clear
    
    . sum invest
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          invest |        200    145.9583    216.8753        .93     1486.7
    The values of `invest' are from 0.93 to 1486.7. Suppose that I 'd like to build groups (a variable group containing 1,2,...) where the range for each group is 100. As such, the first group is from 0-100, the second group is from 100-200, and so on. Any suggestions?
    Thanks.
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    Code:
    gen group= ceil(invest/100)
    Have a look at the floor and ceiling functions.

    To get a consecutive group order

    Code:
    sort group
    egen cgroup= group(group)
    Last edited by Andrew Musau; 20 Mar 2019, 06:08.

    Comment


    • #3
      Binning is one term for this. Here is a trick I particularly like to get what you want -- except that you really need a rule for whether bin limits join the lower or the upper bin: e.g. what happens if values are exactly 100, 200, ...?

      Code:
      . webuse grunfeld, clear
      
      . gen invest2 = 100 * floor(invest/100)
      
      . tab invest2
      
          invest2 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |        145       72.50       72.50
              100 |         15        7.50       80.00
              200 |          6        3.00       83.00
              300 |          6        3.00       86.00
              400 |         13        6.50       92.50
              500 |          7        3.50       96.00
              600 |          4        2.00       98.00
              700 |          1        0.50       98.50
              800 |          1        0.50       99.00
             1300 |          1        0.50       99.50
             1400 |          1        0.50      100.00
      ------------+-----------------------------------
            Total |        200      100.00
      I'd particularly want to recommend not mapping this to small integers and so needing to use value labels. The command line defining the variable and its possible values 0, 100, 200, ... as inclusive lower limits make it close to self-explanatory. If not, then use a variable label to explain.

      Binning is so basic that sometimes it seems that almost no-one discusses it thoroughly, although careful treatments of histograms are an exception. It needs a geographer to write about what everyone knows, or should know, and there was a survey of some fundamentals and simple tricks in

      SJ-18-3 dm0095 . . . . . . . . . . . Speaking Stata: From rounding to binning
      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
      Q3/18 SJ 18(3):741--754 (no commands)
      basic review of how to bin variables in Stata, meaning how to
      divide their range or support into disjoint intervals

      This was written after

      SJ-18-1 gr0072 . . . . . . . Speaking Stata: Logarithmic binning and labeling
      (help niceloglabels) . . . . . . . . . . . . . . . . . . . N. J. Cox
      Q1/18 SJ 18(1):262--286
      introduces the niceloglabels command for helping (even automating)
      label choice

      because I realised that I had discussed a special case but not the general case, although Section 4 on quantile binning in

      SJ-12-4 pr0054 . . . . . . . . . . Speaking Stata: Matrices as look-up tables
      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
      Q4/12 SJ 12(4):748--758 (no commands)
      illustrates the use of matrices as look-up tables

      https://www.stata-journal.com/sjpdf....iclenum=pr0054

      remains pertinent to discussions of that kind of binning, and is hidden by the column title.

      EDIT Further references on floor() and its sibling ceil()

      SJ-11-3 dm0058 . . . . . . . . Speaking Stata: Fun and fluency with functions
      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
      Q3/11 SJ 11(3):460--471 (no commands)
      a tour of easily missed or underestimated Stata functions

      https://www.stata-journal.com/sjpdf....iclenum=dm0058

      SJ-3-4 dm0002 . . . . . . . . Stata tip 2: Building with floors and ceilings
      . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
      Q4/03 SJ 3(4):446--447 (no commands)
      tips for using floor() and ceil()

      https://www.stata-journal.com/sjpdf....iclenum=dm0002



      Last edited by Nick Cox; 20 Mar 2019, 06:22.

      Comment


      • #4
        Dear Andrew, Thanks for this helpful suggestion.
        Ho-Chuan (River) Huang
        Stata 19.0, MP(4)

        Comment


        • #5
          Dear Nick, Many thanks for all the suggestions (including `binning').
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)

          Comment

          Working...
          X