  • easy way to compute income deciles?

    I have a dataset containing each person's income and their income decile for the UK as a whole. However, I want to find their income decile within their own region. Are there any commands to compute this easily? At the moment I'm going through it by hand, calculating each decile's range for 21 regions!

    Cheers,
    David

  • #2
    Terminology matters here. I presume you actually mean that you want region-specific decile group to be attributed to each person. There are 9 deciles defining 10 decile groups. This is a difference between, say, pctile and xtile. (You also need to be careful with weights.)
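The distinction between deciles and decile groups can be seen in a minimal sketch. It runs on simulated data, and the variable names income, cutpoint and decile_group are placeholders rather than anything from the original dataset.

```stata
* Toy data: 1000 simulated incomes (placeholder for the real variable)
clear
set seed 2718
set obs 1000
generate income = exp(rnormal(10, 1))

* pctile stores the 9 deciles (the cut points) in the first 9 observations
pctile cutpoint = income, nquantiles(10)
list cutpoint in 1/9

* xtile assigns every observation to one of the 10 decile groups
xtile decile_group = income, nquantiles(10)
tabulate decile_group
```

Both commands also accept weights, for example [pweight = wt] with a hypothetical weight variable wt, which may matter for survey data.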

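    Suggestion: use levelsof to put the distinct values of region into a local macro. Then use foreach to loop over those values; in each iteration, use xtile ... if region == `local_macro_name' to create the decile groups for that region, where the macro holds the region value for the current iteration. (Note the double equals sign: Stata needs == to test equality.)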
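A rough sketch of that suggestion follows. It uses simulated data so that it runs on its own. The variable names region, income and decile_region are assumptions, and region is assumed to be numeric (a string region would need its value wrapped in double quotes inside the if condition).

```stata
* Toy data standing in for the real dataset: 21 regions, 100 people each
clear
set seed 12345
set obs 2100
generate region = ceil(_n / 100)
generate income = exp(rnormal(10, 1))

* Empty variable to hold the region-specific decile group
generate byte decile_region = .

* Collect the distinct region codes, then bin income within each region
levelsof region, local(regions)
foreach r of local regions {
    tempvar grp
    xtile `grp' = income if region == `r', nquantiles(10)
    replace decile_region = `grp' if region == `r'
    drop `grp'
}

tabulate region decile_region
```

On real data, replace the simulation block with the actual dataset and variable names. The temporary variable is needed because xtile always creates a new variable and cannot fill an existing one.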



    • #3
      I share Stephen's frustration here. Only yesterday I was reading a terrible book which isn't even consistent on the difference between quantiles and the groups, bins, or classes they imply if used as boundaries. They know what they mean, I know what they mean, but it's like swimming through mud. Unfortunately, some major non-technical dictionaries encourage use of quantiles to mean bins, not the boundaries of the bins!

      More to the main point: the technique in the last section of http://www.stata-journal.com/sjpdf.h...iclenum=pr0054 is directly relevant. Don't be put off by the title of the paper.



      • #4
        Thanks for the responses, guys.

        To clarify: yes, I would like region-specific decile groups to be attributed to each person.

        I'm not sure I understand your suggestion, Stephen. My knowledge of Stata is quite limited, so I struggled to follow those instructions!

        Nick, I read the article and it proved quite useful. I wrote the following command:

        . xtile incomedecile_rner = finios*rner, nquantiles(10)

        where finios = final equivalised income, and rner is a dummy variable for the region (0 or 1).

        However, this created a distribution in which 90% of the values were 0 (as income*zero = 0) and all the positive values ended up in the final decile! Is there any way around this? Perhaps replace all zero values of finios*rner with a null value?
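One possible way around the zeros, sketched here on simulated data, is to restrict xtile to the region with an if qualifier instead of multiplying income by the dummy. The names finios and rner follow the post; the simulated values are purely illustrative.

```stata
* Toy data: 1000 people, the first 100 of whom are in the region of interest
clear
set seed 4242
set obs 1000
generate byte rner = _n <= 100
generate finios = exp(rnormal(10, 1))

* Bin income among region members only; everyone else is left missing
xtile incomedecile_rner = finios if rner == 1, nquantiles(10)

tabulate incomedecile_rner, missing
```

People outside the region then get a missing value rather than a zero, so they no longer influence where the cut points fall.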



        • #5
          One of the key points in that article is to underline that ties have to be respected in binning of this kind. It's a substantive question for you depending on your research goals whether the zeros are better omitted.



          • #6
            The zeros represent all the people from a different region. Everyone in the sample has at least some income, so their presence is stopping me from creating deciles of the actual region's income distribution, so far as I can tell?



            • #7
              I am not clear what else you are asking as we seem to be cycling round the same point.

              Quantile-based division usually has as an ideal that the bins are equally populated, but there are exceptions.

              You could compromise by treating the zeros as one group and then split the rest by deciles (or noniles!). I guess wildly that you may still have problems with ties.
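For a variable that contains genuine zeros, that compromise might look something like the sketch below. The data are simulated and the variable names income and bin are hypothetical.

```stata
* Toy data: roughly 30% genuine zeros, the rest positive incomes
clear
set seed 99
set obs 1000
generate income = cond(runiform() < 0.3, 0, exp(rnormal(10, 1)))

* Zeros form their own group, coded 0
generate byte bin = 0 if income == 0

* Positive values are split into 10 groups, coded 1 to 10
tempvar positive_bin
xtile `positive_bin' = income if income > 0 & !missing(income), nquantiles(10)
replace bin = `positive_bin' if income > 0 & !missing(income)

tabulate bin, missing
```

The !missing() condition is there because Stata treats missing values as larger than any number, so income > 0 alone would also select missing incomes.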

              The decision is yours, unless there is some supervisor, mentor, boss or patron insisting otherwise.

              I never split anything into quantile-based bins for my own research. I'm of the view that a measurement is a measurement to be treated as such. But I am aware that people in business, economics and finance seem to find the approach congenial and useful; otherwise presumably they would cease forthwith. I can see the point in talking about (say) the top 10% performers of firms or the poorest 10% of people, perhaps most of all because firms aspire to be in that top fraction or researchers want to illuminate what is happening at the other end.

              Somehow it doesn't translate to environmental data, my main concern. But that has no implications for your work.
