Help with decomposition of probabilities

Alina Faruk

Join Date: Oct 2018

Posts: 96
#1


12 Jun 2019, 11:00

Dear all,

I am trying to replicate the methodology from the following paper (see Section 3. A Multinomial Probability Model of Income Distribution, pg 6):
http://unpan1.un.org/intradoc/groups...npan048358.pdf

From what I could gather, I am supposed to use -margins- after -mlogit-. But I am completely lost as to how to separate the difference into a characteristics effect and a discrimination effect by creating counterfactual distributions, in which one group is given the other group's characteristics or coefficients, respectively.

Sample code:

Code:

sysuse auto
mlogit rep78 foreign price displacement gear_ratio weight

The outcome variable has 5 categories, and the group variable is foreign.

Any help would be much appreciated.

Thanks.
Tags: categorical, counterfactual, decomposition, margins, mlogit

FernandoRios

Join Date: Apr 2014
Posts: 2470

12 Jun 2019, 11:35

Hi Alina
Replicating the paper you linked is more complicated than what you are currently doing.
Below is a short piece of code that implements a basic version of what you are trying to replicate.

Code:

use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
xtile q5=lnwage, n(5)
drop if lnwage==.
mlogit q5 educ exper tenure age agesq if female==1
predict pf*
mlogit q5 educ exper tenure age agesq if female==0
predict pm*


mean pf* if female==1
est store m1
mean pf* if female==0
est store m2a
mean  pm* if female==1
est store m2b
mean pm* if female==0
est store m3

** This table has the results you are looking for
** m1 are the predicted probabilities of women being in each of the income quintiles.
** m2a are the predicted probabilities of men being in each of the income quintiles, using women's coefficients.
** m3 are the predicted probabilities of men being in each of the income quintiles.
** Differences between m1 and m2a are due to characteristics;
** differences between m2a and m3 are due to differences in coefficients.
est tab m1 m2a m3,  nose nostar not 

-----------------------------------------------------
    Variable |     m1          m2a           m3      
-------------+---------------------------------------
         pf1 |   .2781845    .23494095               
         pf2 |  .24304539    .22699103               
         pf3 |  .16983894    .19630346               
         pf4 |  .16691068     .1812393               
         pf5 |   .1420205    .16052526               
         pm1 |                            .12916112  
         pm2 |                             .2183755  
         pm3 |                            .19174434  
         pm4 |                            .21038615  
         pm5 |                            .25033289  
-----------------------------------------------------
** This is basically the same as above, but the counterfactual is different:
** m2b are the predicted probabilities of women, using men's coefficients.
est tab m1 m2b m3,  nose nostar not 

-----------------------------------------------------
    Variable |     m1          m2b           m3      
-------------+---------------------------------------
         pf1 |   .2781845                            
         pf2 |  .24304539                            
         pf3 |  .16983894                            
         pf4 |  .16691068                            
         pf5 |   .1420205                            
         pm1 |               .11402641    .12916112  
         pm2 |               .20694076     .2183755  
         pm3 |               .19885544    .19174434  
         pm4 |               .22126638    .21038615  
         pm5 |               .25891101    .25033289  
-----------------------------------------------------

This is the basic structure of the method you are looking for, but it needs a bit more work to obtain standard errors and to derive the actual decomposition.
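One possible route to standard errors in the unweighted case is a bootstrap. The following is only a sketch under stated assumptions: the program name mldecomp, the returned names chars and coefs, the number of replications, and the seed are all made up for illustration, the quintile cutoffs are held fixed across replications, and only the first quintile is decomposed.

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
xtile q5 = lnwage, n(5)
drop if lnwage==.

capture program drop mldecomp
program define mldecomp, rclass
    // hypothetical helper: decomposition of the first-quintile gap
    tempname m1 m2a m3
    capture drop pf? pm?
    quietly mlogit q5 educ exper tenure age agesq if female==1
    quietly predict pf*
    quietly mlogit q5 educ exper tenure age agesq if female==0
    quietly predict pm*
    // women's characteristics, women's coefficients
    summarize pf1 if female==1, meanonly
    scalar `m1' = r(mean)
    // men's characteristics, women's coefficients
    summarize pf1 if female==0, meanonly
    scalar `m2a' = r(mean)
    // men's characteristics, men's coefficients
    summarize pm1 if female==0, meanonly
    scalar `m3' = r(mean)
    return scalar chars = `m1' - `m2a'
    return scalar coefs = `m2a' - `m3'
    drop pf? pm?
end

// resample within sex so that group sizes stay fixed
bootstrap chars=r(chars) coefs=r(coefs), reps(200) seed(12345) strata(female): mldecomp
```

This ignores any survey design, so it should not be read as a solution for weighted or clustered data.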
HTH
Fernando


Alina Faruk

Join Date: Oct 2018
Posts: 96

12 Jun 2019, 11:46

Thank you for being a lifesaver once again, Mr Fernando.

If I understand correctly, the first has females as the reference category and the second one has males?

Would you please kindly elaborate on what you mean by the actual decomposition?


FernandoRios

Join Date: Apr 2014

Posts: 2470
#4

12 Jun 2019, 12:18

It depends on what you understand by reference category. I prefer not to use the "reference category" language because it confuses me. However, based on the way Borooah (2005) uses the method, yes, you are correct: in the first case women are used as the reference category.
By the actual decomposition I mean obtaining the differences as follows:
Say, using the first set of numbers and referring only to the first quintile: there is a 14.9 percentage point difference between women and men in the share of people who belong to the first quintile (27.8% vs. 12.9%).
Of this, 10.58pp (23.5 - 12.9) are due to differences in coefficients, and 4.32pp (27.8 - 23.5) are due to differences in characteristics.
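Written out for the first quintile, the arithmetic looks like this. The notation is my own shorthand, not taken from the paper: $\bar{P}(X_g, \hat{\beta}_h)$ is the mean predicted probability of the first quintile when group $g$'s characteristics are combined with group $h$'s coefficients, so m1, m2a and m3 correspond to $\bar{P}(X_f, \hat{\beta}_f)$, $\bar{P}(X_m, \hat{\beta}_f)$ and $\bar{P}(X_m, \hat{\beta}_m)$.

```latex
\begin{align*}
\underbrace{\bar{P}(X_f,\hat{\beta}_f) - \bar{P}(X_m,\hat{\beta}_m)}_{\text{total difference}}
  &= \underbrace{\bar{P}(X_f,\hat{\beta}_f) - \bar{P}(X_m,\hat{\beta}_f)}_{\text{characteristics}}
   + \underbrace{\bar{P}(X_m,\hat{\beta}_f) - \bar{P}(X_m,\hat{\beta}_m)}_{\text{coefficients}} \\
0.2782 - 0.1292
  &= (0.2782 - 0.2349) + (0.2349 - 0.1292) \\
0.1490 &= 0.0432 + 0.1058
\end{align*}
```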
Alina Faruk

Join Date: Oct 2018

Posts: 96
#5

12 Jun 2019, 12:33

Thanks a lot. And if I am using survey weights, I should just replace these with -svy: mlogit- and -svy: mean-, right?
FernandoRios

Join Date: Apr 2014

Posts: 2470
#6

12 Jun 2019, 12:40

That is the part that makes this more complicated, unfortunately.
If you are using survey data, you should use the weights. However, there is no easy way to obtain standard errors, which is why the paper you linked does not report them.
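For the point estimates only, a minimal sketch could pass the sampling weight as a pweight to -xtile-, -mlogit- and -mean-. It assumes the weight is stored in a variable called wt (an assumption; substitute the weight variable from your own survey), and weighting the quintile cutoffs is my own choice here. The standard errors that -mean- reports in this setup are not valid for the decomposition and should be ignored.

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
// wt is assumed to hold the sampling weight
xtile q5 = lnwage [pw=wt], n(5)
drop if lnwage==.

// group-specific models, weighted
mlogit q5 educ exper tenure age agesq if female==1 [pw=wt]
predict pf*
mlogit q5 educ exper tenure age agesq if female==0 [pw=wt]
predict pm*

// weighted mean predicted probabilities (point estimates only)
mean pf* if female==1 [pw=wt]
matrix m1 = e(b)
mean pf* if female==0 [pw=wt]
matrix m2a = e(b)
mean pm* if female==0 [pw=wt]
matrix m3 = e(b)

matrix list m1
matrix list m2a
matrix list m3
```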
Alina Faruk

Join Date: Oct 2018

Posts: 96
#7

12 Jun 2019, 12:51

I see that now. Thank you so much, once again!

Alina Faruk

Join Date: Oct 2018
Posts: 96

12 Jun 2019, 14:28

Just in case anyone is interested, building on the above, here's how I did it:

Code:

use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
xtile q5=lnwage, n(5)
drop if lnwage==.
mlogit q5 educ exper tenure age agesq if female==1
predict pf*
mlogit q5 educ exper tenure age agesq if female==0
predict pm*

mean pf* if female==1
matrix m1=e(b)'
mean pf* if female==0
matrix m2a=e(b)'
mean pm* if female==1
matrix m2b=e(b)'
mean pm* if female==0
matrix m3=e(b)'

//Using female as reference category
*Total Difference
matrix tf=m1-m3
matrix list tf

*Characteristics effect
matrix cf=m1-m2a
matrix list cf

*Coefficients effect
matrix df=m2a-m3
matrix list df

//Using male as reference category
*Total Difference
matrix tm=m3-m1
matrix list tm

*Characteristics effect
matrix cm=m3-m2b
matrix list cm

*Coefficients effect
matrix dm=m2b-m1
matrix list dm

There might be an easier way to do this that I do not know of, though.
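One way to shorten it, as a rough sketch only, is to join the three columns into a single matrix so that the total difference and its two parts sit side by side. The matrix name res and the row and column labels are made up for illustration; the example covers the female-reference case only.

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
xtile q5 = lnwage, n(5)
drop if lnwage==.

quietly mlogit q5 educ exper tenure age agesq if female==1
predict pf*
quietly mlogit q5 educ exper tenure age agesq if female==0
predict pm*

quietly mean pf* if female==1
matrix m1 = e(b)'
quietly mean pf* if female==0
matrix m2a = e(b)'
quietly mean pm* if female==0
matrix m3 = e(b)'

// columns: total difference, characteristics effect, coefficients effect
matrix res = (m1 - m3), (m1 - m2a), (m2a - m3)
matrix colnames res = total chars coefs
matrix rownames res = q1 q2 q3 q4 q5
matrix list res, format(%9.4f)
```

In each row the chars and coefs columns should add up to the total column, which is a quick check that the counterfactuals were paired correctly.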

And thanks once again to Mr Fernando!
