Dear all,
Im trying to fit different GB distributions to LIS unit data income datasets. The problem I keep facing is that some of the distributions do not seem to fit to the income data.
When fitting different distributions, an interesting situation occurs where the unrestricted model doesn't find a solution but the restricted model does (special case of the higher parameter distribution). I was wondering if someone could elaborate why would this happen? Naturally I'm very interested to learn about ways to try to fix the problems I am having.
Consistently, I keep having issues with
4-par GB2 and GB1, where GB2 solution is found in about half of the datasets and for GB1 only in a handful of ~160datasets.
3par Beta1 and Beta2 with similar or less success compered to GB1 and GB2.
Interestingly 2parameter distributions (lognormal, fisk, gamma, weibull) seem to fit in more or less 100% of the datasets considered aswell.
A solution is found for all Dagum fits and in most cases for SM distribution fits.
Im using STATA commands fiskfit, lognfit, gammafit, weibullfit, dagumfit, smfit and gb2fit (Thanks to Buis, Cox and Jenkins) as well as distribution commands modified by myself for the other distributions that I have based on the code of gb2fit. I also have tried a number of different tech() options as well as without difficult setting. Some settings seem to favor some datasets but mostly marginal changes. Same with the income scale changes between thousands units and units).
The error messages I get are (depending on distribution and dataset)
- could not find feasible values
- could not calculate numerical derivatives
missing values encountered
- convergence not achieved
My full do-file is about 1500 rows long but here are the most important parts
. program gb1fit_ll
1. version 8.1
2. args lnf a b p q
3. qui replace `lnf' = ln(`a') + (`a' * `p' - 1) * ln($S_MLy) + (`q'-1) * ln(1-($S_MLy/`b')^(`a')) - (`a' * `p')*ln(`b') - lngamm
> a(`p') - lngamma(`q') + lngamma(`p' + `q')
4. end
. program beta1fit_ll
1. version 8.1
2. args lnf b p q
3. qui replace `lnf' = (`p' - 1) * ln($S_MLy)+(`q' - 1) * (ln(1 - $S_MLy / `b'))-(`p') * ln(`b') - lngamma(`p') - lngamma(`q') +
> lngamma(`p'+`q')
4. end
. program beta2fit_ll
1. version 8.1
2. args lnf b p q
3. qui replace `lnf' = (`p' - 1) * ln($S_MLy) - (`p') * (ln (`b')) - lngamma(`p') - lngamma(`q') + lngamma(`p' + `q') - (`p' + `q
> ')*ln(1+$S_MLy/`b')
4. end
. program gengammafit_ll
1. version 8.1
2. args lnf a b p
3. qui replace `lnf' = ln(`a') + (`a'*`p'-1)*ln($S_MLy) - (`a'*`p')*ln(`b')-lngamma(`p')-(($S_MLy/`b')^(`a'))
4. end
local distrlist fisk logn gamma weibull dagum beta1 beta2 gengamma sm gb1 gb2
foreach distr in `distrlist' {
capture : `distr'fit $yvar if $yvar>0 [aw=we], difficult iterate(1000)
...
..
.
Resulting with a .csv
One of the worst fitting dataset (Finnish 2004)
dataset;Distribution;Mean;Log-L;SSE;SAE;Chi2;par1;par2;par3;par4;error_dummy
fi04;dagum;34995.552;-107011.82;.0029263063287779;.1762591078877449;484. 0472143648566;4.9403716;69296.152;.16656719;.;0
fi04;beta1;.;.;0;0;0;.;.;.;.;1
fi04;beta2;.;-107596.2;.0087654317035373;.3436714746057987;1483. 029804663156;7.659e+09;1.1169577;237518.22;.;0
fi04;gengamma;.;.;0;0;0;.;.;.;.;1
fi04;sm;.;-107502.13;.0070364654916375;.2907900214195252;1354 .312556440797;1.1499537;1.724e+09;229316.27;.;0
fi04;gb1;.;.;0;0;0;.;.;.;.;1
fi04;gb2;.;.;0;0;0;.;.;.;.;1
And one of the best fitting (Australia 1995)
dataset;Distribution;Mean;Log-L;SSE;SAE;Chi2;a;b;c;d;error_dummy
au95;fisk;53.698438;-21717.873;.0037656504070043;.2073850817978382;338. 1778163919059;2.4879985;40.524231;.;.;0
au95;logn;51.782965;-22019.962;.0088273451336249;.2844472713768482;547. 3337241130193;3.6415797;.78164124;.;.;0
au95;gamma;48.697075;-21684.452;.0026782182983851;.1625073105096817;191. 0720707902185;22.124455;2.2010519;.;.;0
au95;weibull;48.854015;-21777.489;.0038584757592162;.2187049686908722;295. 8086521186779;54.102029;1.4965573;.;.;0
au95;dagum;48.393966;-21548;.0012784655802562;.1206132508814335;116.1812 030004812;3.74688;61.238609;.43080483;.;0
au95;beta1;.;.;0;0;0;.;.;.;.;gini;1
au95;beta2;48.640651;-21665.653;.0028560391988321;.1681783348321915;199. 9628612559609;418.71786;2.4345877;21.957888;.;0
au95;gengamma;45.513946;-21683.456;.002779269434626;.165854673832655;195.93 96153486823;.94191628;18.608602;2.4458552;.;0
au95;sm;48.398556;-21586.624;.0016310668069544;.1278458423912525;131. 1987107450299;1.9813232;78.95848;2.8145186;.;0
au95;gb1;49.103449;-21752.068;.0498044795822352;.9949923679232597;2282 1302.75527501;.13384447;919.34325;34.801775;18.214 408;0
au95;gb2;48.485684;-21547.726;.0503780724247918;.9994861483573914;-5976257.133697152;3.9752689;60.125643;.40234381;.9 0979795;0
Any advice is very welcome, thanks in advance
-Sami
Im trying to fit different GB distributions to LIS unit data income datasets. The problem I keep facing is that some of the distributions do not seem to fit to the income data.
When fitting different distributions, an interesting situation occurs where the unrestricted model doesn't find a solution but the restricted model does (special case of the higher parameter distribution). I was wondering if someone could elaborate why would this happen? Naturally I'm very interested to learn about ways to try to fix the problems I am having.
Consistently, I keep having issues with
4-par GB2 and GB1, where GB2 solution is found in about half of the datasets and for GB1 only in a handful of ~160datasets.
3par Beta1 and Beta2 with similar or less success compered to GB1 and GB2.
Interestingly 2parameter distributions (lognormal, fisk, gamma, weibull) seem to fit in more or less 100% of the datasets considered aswell.
A solution is found for all Dagum fits and in most cases for SM distribution fits.
Im using STATA commands fiskfit, lognfit, gammafit, weibullfit, dagumfit, smfit and gb2fit (Thanks to Buis, Cox and Jenkins) as well as distribution commands modified by myself for the other distributions that I have based on the code of gb2fit. I also have tried a number of different tech() options as well as without difficult setting. Some settings seem to favor some datasets but mostly marginal changes. Same with the income scale changes between thousands units and units).
The error messages I get are (depending on distribution and dataset)
- could not find feasible values
- could not calculate numerical derivatives
missing values encountered
- convergence not achieved
My full do-file is about 1500 rows long but here are the most important parts
. program gb1fit_ll
1. version 8.1
2. args lnf a b p q
3. qui replace `lnf' = ln(`a') + (`a' * `p' - 1) * ln($S_MLy) + (`q'-1) * ln(1-($S_MLy/`b')^(`a')) - (`a' * `p')*ln(`b') - lngamm
> a(`p') - lngamma(`q') + lngamma(`p' + `q')
4. end
. program beta1fit_ll
1. version 8.1
2. args lnf b p q
3. qui replace `lnf' = (`p' - 1) * ln($S_MLy)+(`q' - 1) * (ln(1 - $S_MLy / `b'))-(`p') * ln(`b') - lngamma(`p') - lngamma(`q') +
> lngamma(`p'+`q')
4. end
. program beta2fit_ll
1. version 8.1
2. args lnf b p q
3. qui replace `lnf' = (`p' - 1) * ln($S_MLy) - (`p') * (ln (`b')) - lngamma(`p') - lngamma(`q') + lngamma(`p' + `q') - (`p' + `q
> ')*ln(1+$S_MLy/`b')
4. end
. program gengammafit_ll
1. version 8.1
2. args lnf a b p
3. qui replace `lnf' = ln(`a') + (`a'*`p'-1)*ln($S_MLy) - (`a'*`p')*ln(`b')-lngamma(`p')-(($S_MLy/`b')^(`a'))
4. end
local distrlist fisk logn gamma weibull dagum beta1 beta2 gengamma sm gb1 gb2
foreach distr in `distrlist' {
capture : `distr'fit $yvar if $yvar>0 [aw=we], difficult iterate(1000)
...
..
.
Resulting with a .csv
One of the worst fitting dataset (Finnish 2004)
dataset;Distribution;Mean;Log-L;SSE;SAE;Chi2;par1;par2;par3;par4;error_dummy
fi04;dagum;34995.552;-107011.82;.0029263063287779;.1762591078877449;484. 0472143648566;4.9403716;69296.152;.16656719;.;0
fi04;beta1;.;.;0;0;0;.;.;.;.;1
fi04;beta2;.;-107596.2;.0087654317035373;.3436714746057987;1483. 029804663156;7.659e+09;1.1169577;237518.22;.;0
fi04;gengamma;.;.;0;0;0;.;.;.;.;1
fi04;sm;.;-107502.13;.0070364654916375;.2907900214195252;1354 .312556440797;1.1499537;1.724e+09;229316.27;.;0
fi04;gb1;.;.;0;0;0;.;.;.;.;1
fi04;gb2;.;.;0;0;0;.;.;.;.;1
And one of the best fitting (Australia 1995)
dataset;Distribution;Mean;Log-L;SSE;SAE;Chi2;a;b;c;d;error_dummy
au95;fisk;53.698438;-21717.873;.0037656504070043;.2073850817978382;338. 1778163919059;2.4879985;40.524231;.;.;0
au95;logn;51.782965;-22019.962;.0088273451336249;.2844472713768482;547. 3337241130193;3.6415797;.78164124;.;.;0
au95;gamma;48.697075;-21684.452;.0026782182983851;.1625073105096817;191. 0720707902185;22.124455;2.2010519;.;.;0
au95;weibull;48.854015;-21777.489;.0038584757592162;.2187049686908722;295. 8086521186779;54.102029;1.4965573;.;.;0
au95;dagum;48.393966;-21548;.0012784655802562;.1206132508814335;116.1812 030004812;3.74688;61.238609;.43080483;.;0
au95;beta1;.;.;0;0;0;.;.;.;.;gini;1
au95;beta2;48.640651;-21665.653;.0028560391988321;.1681783348321915;199. 9628612559609;418.71786;2.4345877;21.957888;.;0
au95;gengamma;45.513946;-21683.456;.002779269434626;.165854673832655;195.93 96153486823;.94191628;18.608602;2.4458552;.;0
au95;sm;48.398556;-21586.624;.0016310668069544;.1278458423912525;131. 1987107450299;1.9813232;78.95848;2.8145186;.;0
au95;gb1;49.103449;-21752.068;.0498044795822352;.9949923679232597;2282 1302.75527501;.13384447;919.34325;34.801775;18.214 408;0
au95;gb2;48.485684;-21547.726;.0503780724247918;.9994861483573914;-5976257.133697152;3.9752689;60.125643;.40234381;.9 0979795;0
Any advice is very welcome, thanks in advance
-Sami
Comment