  • Splitting data into terciles

    I currently have a data set of school districts and the number of students enrolled per district, covering all 50 states. I categorized the states into 10 regions with a new variable called Region (so, for example, Alabama and Arkansas have Region 1). Now I need to create terciles within each region. I want small, medium, and large districts for each of those 10 regions, resulting in 30 bins (3 categories for each of the 10 regions). I've tried some different egen commands but have not been able to split the districts within each region. I can split the entire data set into terciles easily, but have not figured out how to do it for each region.

  • #2
    egenmore from SSC has an xtile() function.

    Code:
    ssc install egenmore
    help egenmore

    • #3
      Thanks for the quick response! I'm pretty new to Stata, so this is the first time I'm taking a go at this.

      So if my variable for district population is "member"...it would go something like this?

      egen tercile=xtile(x), n(3) by(member)

      • #4
        Looks good from here.

        My impression is that tertile is a more common term than tercile.

        More on -iles here: http://stats.stackexchange.com/quest...half-a-percent

        • #5
          Thanks!

          So I used this code: egen tercile=xtile(hhsregion), n(3) by(member)

          And it produced this:

          tab tercile

              tercile |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    1 |      8,034       61.07       61.07
                    2 |      3,322       25.25       86.32
                    3 |      1,800       13.68      100.00
          ------------+-----------------------------------
                Total |     13,156      100.00


          Which doesn't seem quite right. I was hoping that for each region (1-10), it would be split into tertiles.

          • #6
            Not surprised that came next. http://www.statalist.org/forums/foru...-xtile-command should help.
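
            One thing worth checking in the command from #5, offered as a sketch rather than a definitive fix: with egenmore's xtile(), the variable inside the parentheses is the one being binned and by() names the groups, so the two variable names appear to be swapped. Assuming member holds enrollment and hhsregion holds the 10 regions (both names taken from the posts above), something like this may be closer to what was intended:

            ```stata
            * Requires egenmore from SSC (ssc install egenmore).
            * Bin enrollment (member) into 3 quantile groups, separately within each region.
            egen tertile = xtile(member), nquantiles(3) by(hhsregion)

            * Cross-tabulate to inspect the bin sizes region by region.
            tabulate hhsregion tertile
            ```

            Even with the arguments in this order, tied values of member can keep the three bins within a region from being exactly equal in size.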

            • #7
              So...is my pursuit futile?

              • #8
                You tell us. The ideal condition for quantile binning of n observations into k classes of equal frequency is that there are breaks in the distribution such that each class does indeed contain n/k observations. Ties make this difficult, most obviously whenever a single value accounts for more than n/k observations, but also in less extreme cases.
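
                To make the point about ties concrete, here is a small made-up example (the data are invented purely for illustration): nine observations, six of which share the value 1. Equal tertiles would need three observations per bin, but the six tied observations must stay together, so a 3, 3, 3 split is not possible.

                ```stata
                * Toy data: six observations tied at 1, two at 2, one at 3.
                clear
                set obs 9
                generate x = 1
                replace x = 2 in 7/8
                replace x = 3 in 9

                * Official xtile; identical values always receive the same bin.
                xtile t = x, nquantiles(3)

                * Compare the values with the bins they were assigned to.
                tabulate x t
                ```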

                • #9
                  When I looked at them this way, it's more promising...except shouldn't the tertiles be split evenly?
                  Attached Files

                  • #10
                    Same question? You can bin regions pooled or separately but the rule that the same values go into the same bin must trump all others either way.
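
                    A quick way to see how that rule plays out in each region, sketched under the assumption that a bin variable (here called tertile) has already been created within hhsregion from member. The variable names follow #5; the helper names bin_n, bin_min, bin_max, and tag are made up for this example.

                    ```stata
                    * Row percentages show how far each region is from an even three-way split.
                    tabulate hhsregion tertile, row

                    * Size and range of member within every region-by-bin cell.
                    egen long bin_n = count(member), by(hhsregion tertile)
                    egen bin_min = min(member), by(hhsregion tertile)
                    egen bin_max = max(member), by(hhsregion tertile)

                    * Flag one observation per cell so each cell is listed once.
                    egen byte tag = tag(hhsregion tertile)
                    sort hhsregion tertile
                    list hhsregion tertile bin_n bin_min bin_max if tag, sepby(hhsregion)
                    ```

                    Within a region, the ranges of adjacent bins should not overlap; unequal bin_n values then point to ties at the cut points.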

                    • #11
                      Right. I'll have to figure out what to do with that. I appreciate the help! Hoping to get a little better at Stata every day.

                      • #12
                        There is some discussion of the limits on binning in http://www.stata-journal.com/sjpdf.h...iclenum=pr0054 Section 4
