  • Creating 'household income per capita' variable

    Dear all

    I am using the South African National Income Dynamics Study (NIDS) and have successfully created a panel dataset across all 5 waves.
    I am looking to create a variable for 'household income per capita' quintiles, so that observations are grouped into the "bottom 20%", "20th - 40th percentile", etc.
    It should be easy enough to make since I have a derived variable for 'household income' and a variable for the number of people in a household.

    I have tried using the following command:


    sort w`i'_hhincome

    xtile w`i'_inc = w`i'_hhincome, nq(5)
    tab w`i'_inc

    }

    forvalues i = 1(1)5 {


    // income per capita

    sort w`i'_hhincome
    cap gen w`i'_hh_capita = w`i'_hhincome/w`i'_hhsizer
    cap xtile w`i'_inc_capita = w`i'_hh_capita, nq(5)
    tab w`i'_inc_capita


    }

    But it ends up giving me the following result:

     5 quantiles |
              of |
     w5_hh_capit |
               a |      Freq.     Percent        Cum.
    -------------+-----------------------------------
               1 |      8,194       20.01       20.01
               2 |      8,186       19.99       40.01
               3 |      8,191       20.01       60.01
               4 |      8,185       19.99       80.00
               5 |      8,188       20.00      100.00
    -------------+-----------------------------------
           Total |     40,944      100.00


    Can anyone help me understand whether this is the correct way to generate such a variable as it seems odd that the percentages are so similar?

    Kind regards
    Sophie Gebers


  • #2
    "Can anyone help me understand whether this is the correct way to generate such a variable as it seems odd that the percentages are so similar?"
    That you ask this question suggests to me that either you do not understand what quintiles are, or that you want something other than quintiles but have not expressed it clearly.

    When you divide any variable into quintiles, you will, by definition, get about 20% of the observations in each group. So the distribution of the quintile groups for two different variables will always be nearly identical. The only deviations from exactly 20% in each group arise when the sample size is not exactly divisible by 5, or when there are tied values at the boundaries separating the groups.
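
    As a minimal sketch of this point, the following uses Stata's bundled auto dataset (an assumption for illustration; it is not the NIDS data) and splits two unrelated variables into quintiles. The names price_q and mpg_q are made up for the example. Both tabulations should show roughly 20% per group, with small deviations because 74 observations do not divide evenly by 5 and because mpg has tied values.

    ```stata
    * Illustration only: built-in example data, not the NIDS panel
    sysuse auto, clear

    * Split two unrelated variables into five quantile groups each
    xtile price_q = price, nq(5)
    xtile mpg_q   = mpg,   nq(5)

    * Each table should show close to 20% of observations per group
    tabulate price_q
    tabulate mpg_q
    ```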

    If quintiles are not what you want, try explaining more clearly what you are hoping to calculate, or perhaps show an example of what you are looking for.

    That said, there are aspects of your code that are confusing.

    Code:
    sort w`i'_hhincome 
    
    xtile w`i'_inc = w`i'_hhincome, nq(5)
    tab w`i'_inc
    
    }
    is problematic because it refers to a local macro i that is undefined at that point. I'm guessing this is a copy/paste error and that the code shown has an additional line at the beginning that looks like -forvalues i = 1/5 { - And I assume that in this context, i refers to the wave of the survey--is that correct?
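
    If that guess is right, the first block would presumably look like the sketch below. The variable names are taken from the original post; the forvalues line is the assumed missing piece. The sort is left out here because xtile does not need the data to be sorted.

    ```stata
    * Assumed reconstruction: loop over the five survey waves
    forvalues i = 1/5 {
        * Quintile groups of total household income in the current wave
        xtile w`i'_inc = w`i'_hhincome, nq(5)
        tabulate w`i'_inc
    }
    ```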



    • #3
      Hello,
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float year long(totinc incftx wagsal)
      1982 44257 36127 38375
      1982  7067  7059     0
      1982  6124  6124     0
      1982 11323 11081 10816
      1982 53423 46942 33274
      1982 34801 32626 27300
      1982 10864 10369  4942
      1982 37855 35581 13917
      1982 52098 44385 46683
      1982 22048 19357 20000
      1982 26063 21063 25459
      1982 30409 25311 30409
      1982 11153  9900 10307
      1982 19812 16248 17802
      1982  5082  5082     0
      1982 21691 21691 12000
      1982  6500  6225  6000
      1982 24261 21851 15767
      1982 28515 24094 28515
      1982 29481 24042 28456
      1982 17840 17788 15300
      1982  6212  6212     0
      1982  7486  7486  1685
      1982  7895  7220  6335
      1982 42572 32690 37823
      1982  6420  6317     0
      1982 21365 20677 20040
      1982 30075 24189 28654
      1982  5358  5358     0
      1982 19847 17845 13500
      1982 22379 20005 18503
      1982 52834 43324 25779
      1982 26701 24011 12562
      1982 17445 16955 10254
      1982  7604  7604  3479
      1982 12507 11887 11959
      1982  6623  6623     0
      1982 23164 18764 22616
      1982 13256 11045 17940
      1982 26270 22129 20250
      1982 24024 19819     0
      1982 26117 24319 14884
      1982 31047 30049  4087
      1982 32187 22827 31200
      1982 43226 31582 42134
      1982 15190 12814 15190
      1982 43257 34722 32523
      1982 32615 26084 31271
      1982  7092  7092     0
      1982 23515 20737 17568
      1982 10471 10471     0
      1982 26898 22968 26416
      1982 43160 34831 42584
      1982 10276 10091  7739
      1982 40343 30754 37976
      1982 28562 26857 21364
      1982  5400  5400     0
      1982 28710 25556 13728
      1982 11704 11436  1600
      1982 35480 31471 25954
      1982 28512 18302 28512
      1982  8296  7694  8296
      1982 31798 27335 21837
      1982 31912 26368 29702
      1982 38767 33084 37677
      1982  5324  5324     0
      1982  5970  5970     0
      1982 45254 35507 44512
      1982 29520 24726 28611
      1982 31959 25956 31959
      1982  5091  5091     0
      1982 21586 21457 20353
      1982 36165 34629 31473
      1982 44206 36579 38868
      1982 43224 35119 40000
      1982 10509 10509     0
      1982 13508 13508     0
      1982 31411 25168 30784
      1982 20315 18434 11643
      1982  4570  4570     0
      1982 18264 16920 17225
      1982 38677 32072 37173
      1982  9334  9334     0
      1982 37884 31536 33964
      1982 58665 49603 40126
      1982  6654  6654     0
      1982 27402 26646  8689
      1982 30314 24750 26016
      1982 29864 26064 28767
      1982 13690 12957  7118
      1982 30850 27570 28018
      1982  5091  5091     0
      1982  7429  6496  4919
      1982 27285 23231 25477
      1982  5091  5091     0
      1982 33758 30122 23153
      1982  9427  9427     0
      1982 23722 20132 21726
      1982 25700 20890 22700
      1982 26776 22450 23425
      end
      format %ty year
      I would like to create 25th, 50th, 75th, and 90th percentile groups of this income distribution. Which command can I use to get the result?

      Thanks in advance.



      • #4
        Your example data includes a variable called year, but it is actually a constant 1982. Is year always 1982 throughout your full data set? If not, do you want to do this for each year separately?

        What do you mean by "25th, 50th, 75th, 90th percentile group"? I understand what the 25th, 50th, etc. percentiles are--but they are single numbers from the distribution, not groups. People often also want to break a data set up into quartiles, with the 25th, 50th, and 75th percentiles serving as boundaries between those quartiles. But the 90th percentile doesn't fit into that paradigm. So it isn't clear to me what you want to do.
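
        To make the distinction concrete, here is a rough sketch using the totinc variable from the example data in #3. The name totinc_q4 is made up for illustration. The first command returns percentiles as single cut-off values; the second assigns every observation to a quartile group.

        ```stata
        * Percentiles are single numbers from the distribution
        _pctile totinc, percentiles(25 50 75 90)
        display "p25 = " r(r1) "  p50 = " r(r2) "  p75 = " r(r3) "  p90 = " r(r4)

        * Quartiles are groups of observations bounded by p25, p50 and p75
        xtile totinc_q4 = totinc, nq(4)
        tabulate totinc_q4
        ```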



        • #5
          The variable year runs from 1982 to 2011.
          I want to calculate the percentiles of these variables by year.



          • #6
            Code:
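            * Create one (initially missing) variable per requested percentile
            foreach p of numlist 25 50 75 90 {
                gen pctile`p' = .
            }
            * For each year, compute the percentiles of totinc and store them
            levelsof year, local(years)
            foreach y of local years {
                * -summarize, detail- leaves the percentiles in r(p25), r(p50), etc.
                summ totinc if year == `y', detail
                foreach p of numlist 25 50 75 90 {
                    replace pctile`p' = r(p`p') if year == `y'
                }
            }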
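
            Since #5 asks about "these variables", one possible alternative is sketched below using egen's pctile() function with a by-prefix, which avoids the explicit loop over years. The new variable names (for example totinc_p25) are an assumption for illustration, and I have not checked that the results match -summarize, detail- value for value in every case.

            ```stata
            * Assumed naming scheme: <variable>_p<percentile>, e.g. totinc_p25
            foreach v of varlist totinc incftx wagsal {
                foreach p of numlist 25 50 75 90 {
                    * Year-specific percentile, repeated on every observation of that year
                    bysort year: egen `v'_p`p' = pctile(`v'), p(`p')
                }
            }
            ```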

