  • Percentile based on age and year using loop

    Hi all,

    I have a dataset that includes total income, age (ranging from 25 to 54), year (1982 to 2018), and an immigrant dummy variable. I want to generate two dummy variables for those in the top 1% and top 10% of the income distribution within each year and age, and then plot the share of immigrants who are in the top 1% and top 10% over the years. So far, I have generated the percentiles for each age group and each year as follows:

    Code:
    gen ptile_inc=.
    
    forvalues a = 25/54 {
    xtile p`a' = totinc if age==`a' & year==1982 [aw=weight], nq(100)
    replace ptile_inc=p`a' if age==`a' & year==1982
    }
    
    gen ptile_inc1=.
    
    forvalues a = 25/54 {
    xtile p1_`a' = totinc if age==`a' & year==1983 [aw=weight], nq(100)
    replace ptile_inc1=p1_`a' if age==`a' & year==1983
    }
    
    gen top1_1982=ptile_inc==100
    gen top1_1983=ptile_inc1==100
    
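    * missing values count as larger than any number in Stata, so exclude them
    gen top10_1982=ptile_inc>90 & !missing(ptile_inc)
    gen top10_1983=ptile_inc1>90 & !missing(ptile_inc1)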
    This code covers only two years. Doing this for every year takes too much code and creates too many variables. Is there a neater syntax for creating the top 1% and top 10% dummy variables? Any guidance on this is much appreciated.
    Last edited by Marjan Habibi; 28 Feb 2021, 12:32.

  • #2
    Perhaps this (untested) example code will start you in a useful direction.
    Code:
    forvalues y=1982/2018 {
        gen top1_`y'  = .
        gen top10_`y' = .
        forvalues a = 25/54 {
            xtile p = totinc if age==`a' & year==`y' [aw=weight], nq(100)
            replace top1_`y'  = p==100 if age==`a' & year==`y'
            replace top10_`y' = p>90 & !missing(p) if age==`a' & year==`y'
            drop p
        }
    }
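
    If one pair of variables per year is more than is needed, a variant (equally untested, and only a sketch) is to fill a single pair of indicators across all years and then average them by year. The names top1, top10, and immigrant are assumptions, since the thread does not give the name of the immigrant dummy. The weighted mean of each indicator among immigrants is read here as the share of immigrants who are in the top 1% or top 10% of their age-year cell.
    Code:
    gen byte top1  = .
    gen byte top10 = .
    forvalues y = 1982/2018 {
        forvalues a = 25/54 {
            xtile p = totinc if age==`a' & year==`y' [aw=weight], nq(100)
            * a missing p compares as larger than any number, so exclude it explicitly
            replace top1  = p==100             if age==`a' & year==`y'
            replace top10 = p>90 & !missing(p) if age==`a' & year==`y'
            drop p
        }
    }
    * weighted share of immigrants in the top 1% and top 10%, by year
    * (immigrant is a hypothetical name for the immigrant dummy)
    preserve
    collapse (mean) top1 top10 if immigrant==1 [aw=weight], by(year)
    twoway line top1 top10 year
    restore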

    • #3
      Thanks, William! I tested it on a small sample dataset and it worked perfectly.
