I am working with CPS data in Stata 13. For each year from 1962 to 2014, I want to calculate the Gini coefficient of each age group between 20 and 80 (so 3720 different Ginis). The entire dataset has about 5 million observations. I am looking for a way to speed up this calculation; my current code is below.
One idea is to edit ineqdec0.ado so that it calculates only the Gini and skips the other inequality measures, but I am not sure how to do this.
Code:
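* apcgini is assumed to exist already (e.g. created with gen apcgini=.)
qui foreach yyy in $years {
    * Gini of adjusted income by cohort, one survey year at a time
    ineqdec0 inctot_adj [w=wtsupp] if year==`yyy', by(cohort)
    foreach ccc in $cohorts {
        di r(gini_`ccc')
        replace apcgini=r(gini_`ccc') if cohort==`ccc' & year==`yyy'
    }
}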
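For reference, the Gini-only idea could be sketched without any ado-file by using by-group running sums, so that every year-cohort cell is handled in a single sorted pass rather than one command call per year. This is a minimal sketch, not a tested replacement: it uses the covariance form of the weighted Gini, G = 2*sum(w*y*F)/sum(w*y) - 1, where F is the midpoint weighted rank within the cell. The variable names ok, w, y, cw, F, swy, swyF and apcgini2 are hypothetical and must not already exist in the data. It computes every year-cohort cell, not only those in $years and $cohorts, and it has not been checked against ineqdec0's output.
Code:
* Keep only usable observations; the rest get weight 0 and drop out of all sums
gen byte ok = !missing(inctot_adj, wtsupp, year, cohort) & wtsupp > 0
gen double w = cond(ok, wtsupp, 0)
gen double y = cond(ok, inctot_adj, 0)
* Sort by income within each year-cohort cell
sort year cohort y
* Running weight total and midpoint weighted rank within the cell
by year cohort: gen double cw = sum(w)
by year cohort: gen double F = (cw - w/2) / cw[_N]
* Running sums of w*y and w*y*F; the last observation holds the cell totals
by year cohort: gen double swy = sum(w*y)
by year cohort: gen double swyF = sum(w*y*F)
* Weighted Gini of the cell (missing if the cell has no usable observations)
by year cohort: gen double apcgini2 = 2*swyF[_N]/swy[_N] - 1
drop ok w y cw F swy swyF
The cost here is dominated by one sort of the full dataset, which is why it may be faster than repeated calls; whether it actually is on 5 million observations would need to be timed.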
I used the code below for a quick comparison of the different commands in terms of speed and accuracy. ineqdec0 took about 90 seconds, fastgini about 50 and inequal7 about 110. The last one, however, is too imprecise for my purposes: inequal7's Gini differed from ineqdec0's Gini by up to 0.09 for some years.
Code:
set more off
* Reset timers so that earlier runs do not accumulate
timer clear
qui gen g1=.
qui gen g2=.
qui gen g3=.

* Timer 1: ineqdec0, a single call with by(year)
qui timer on 1
qui ineqdec0 inctot_adj [w=wtsupp], by(year)
qui foreach yyy in $years {
    di r(gini_`yyy')
    replace g1=r(gini_`yyy') if year==`yyy'
}
qui timer off 1

* Timer 2: fastgini, one call per year
qui timer on 2
qui foreach yyy in $years {
    fastgini inctot_adj [w=wtsupp] if year==`yyy'
    di r(gini)
    replace g2=r(gini) if year==`yyy'
}
qui timer off 2

* Timer 3: inequal7, one call per year; real() converts r(gini) to a number
qui timer on 3
qui foreach yyy in $years {
    inequal7 inctot_adj [w=wtsupp] if year==`yyy'
    di r(gini)
    replace g3=real(r(gini)) if year==`yyy'
}
qui timer off 3

* Differences relative to ineqdec0
qui gen gdiff2=g1-g2
qui gen gdiff3=g1-g3
table year, contents(mean g1 mean g2 mean g3)
table year, contents(mean gdiff2 mean gdiff3)
timer list
Thanks, Joachim
