Firpo, Fortin and Lemieux methodology: recentered influence function and Oaxaca decomposition

Aleksandra Anic

Join Date: Aug 2017

Posts: 88
#1


19 Dec 2018, 10:43

Dear all,

I want to use the methodology defined by Firpo, Fortin and Lemieux in the paper "Decomposing wage distributions using recentered influence function regressions", Econometrics 2018, 6, 28; doi:10.3390/econometrics6020028. Although I have read the paper many times, I don't know how to estimate the reweighting and specification errors in Stata. See e.g. Table 4.

The FFL methodology combines recentered influence function (RIF) regressions, the Oaxaca-Blinder decomposition (Oaxaca 1973 and Blinder 1973) and reweighting (DiNardo et al. 1996).

Thank you in advance for your answer.

Best,
Aleksandra

FernandoRios

Join Date: Apr 2014
Posts: 2469

19 Dec 2018, 12:52

Hi Aleksandra
Unfortunately, there is no direct way (no ready-made program) to estimate the reweighting and specification errors in Stata. However, estimating them is not difficult. It requires some matrix notation. Below is an example that may help clarify how to do it:

Code:

* In this example I compare married women's wages to single women's wages.
* The counterfactual is what wages for single women would be if they
* were paid like married women.
* This means I am reweighting the sample of married women to look
* like single women.
webuse womenwk, clear
drop if wage==.

gen age2=age*age
gen educ2=education*education
gen ageeduc=age*education
logit married age education  age2 educ2 ageeduc
predict pmarried
gen w1=1
replace w1=(1-pmarried)/pmarried if married==1

gen c=1
reg wage age education if married==1 
est sto e1
matrix bm=e(b)
mean age education c if married==1
est sto x1
matrix xm=e(b)

reg wage age education if married==1 [pw=w1]
est sto ec
matrix bc=e(b)
mean age education c if married==1 [pw=w1]
est sto xc
matrix xc=e(b)


reg wage age education if married==0 
est sto e2
matrix bs=e(b)
mean age education c if married==0 
est sto x2
matrix xs=e(b)

** For this command you need to install estout (ssc install estout)
** This is just to compare the outputs.
esttab e1 ec e2 x1 xc x2, se compress mtitles(Bmarried BCmarried Bsingle Xmarried XCMarried Xsingle1)
----------------------------------------------------------------------------------------
                 (1)          (2)          (3)          (4)          (5)          (6)   
            Bmarried    BCmarried      Bsingle     Xmarried    XCMarried     Xsingle1   
----------------------------------------------------------------------------------------
age            0.156***    0.0717        0.137***     39.33***     33.50***     33.41***
            (0.0244)     (0.0388)     (0.0315)      (0.239)      (0.448)      (0.424)   

education      0.923***     0.876***     0.842***     13.94***     12.24***     12.24***
            (0.0607)     (0.0861)     (0.0923)     (0.0964)      (0.136)      (0.145)   

c                                                         1            1            1   
                                                        (.)          (.)          (.)   

_cons          5.273***     9.211***     7.223***                                       
             (1.207)      (1.843)      (1.501)                                          
----------------------------------------------------------------------------------------
N                976          976          367          976          976          367   
----------------------------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
** Here is where the decomposition is done.
** First aggregate decomposition
matrix Dx=vecdiag(bm'*xm)-vecdiag(bc'*xc)
matrix Dx=Dx,bm*xm'-bc*xc'
matrix DB=vecdiag(bc'*xc)-vecdiag(bs'*xs)
matrix DB=DB,bc*xc'-bs*xs'
matrix OB=Dx',DB'
matrix rowname OB=age education _cons T
matrix colname OB=DX DB
matrix list OB

                   DX          DB
      age    3.750833  -2.1818023
education    2.144517   .40561869
    _cons  -3.9387435   1.9885457
        T   1.9566066    .2123621
** Then separate the reweighting error (here DBe) from the specification error (Dxe)
matrix Dxx=vecdiag(bm'*(xm-xc)),bm*(xm-xc)'
matrix Dxe=vecdiag((bm'-bc')*(xc)),(bm-bc)*xc'
matrix DBe=vecdiag(bc'*(xc-xs)),bc*(xc-xs)'
matrix DBB=vecdiag((bc'-bs')*xs),(bc-bs)*xs'
matrix OB2=Dxx',Dxe',DBB',DBe'
matrix rowname OB2=age education _cons T
matrix colname OB2=Dxx Dxe DBB DBe
 matrix list OB2


OB2[4,4]
                  Dxx         Dxe         DBB         DBe
      age   .91255831   2.8382747  -2.1884821   .00667986
education   1.5685393   .57597776   .40842372  -.00280503
    _cons           0  -3.9387435   1.9885457           0
        T   2.4810976  -.52449097   .20848726   .00387484

* You can see that, as expected, the reweighting error is almost zero,
* but the specification error is large, suggesting that the model
* is misspecified.

The example above is for a standard regression, but the same process can be used with -rifreg-. To obtain standard errors, I would suggest bootstrapping the whole system.
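For readers who prefer to see the algebra, the matrix steps above can be written as follows. This is only a sketch, in notation that does not appear in the code: $\hat\beta_m$, $\hat\beta_c$, $\hat\beta_s$ stand for the coefficient vectors stored in bm, bc and bs, and $\bar X_m$, $\bar X_c$, $\bar X_s$ for the mean vectors (including the constant) stored in xm, xc and xs.

```latex
\begin{aligned}
\bar X_m\hat\beta_m - \bar X_s\hat\beta_s
  &= \underbrace{\left(\bar X_m\hat\beta_m - \bar X_c\hat\beta_c\right)}_{\texttt{DX}}
   + \underbrace{\left(\bar X_c\hat\beta_c - \bar X_s\hat\beta_s\right)}_{\texttt{DB}},\\
\texttt{DX}
  &= \underbrace{(\bar X_m - \bar X_c)\,\hat\beta_m}_{\texttt{Dxx}}
   + \underbrace{\bar X_c\,(\hat\beta_m - \hat\beta_c)}_{\texttt{Dxe}},\\
\texttt{DB}
  &= \underbrace{\bar X_s\,(\hat\beta_c - \hat\beta_s)}_{\texttt{DBB}}
   + \underbrace{(\bar X_c - \bar X_s)\,\hat\beta_c}_{\texttt{DBe}}.
\end{aligned}
```

In the totals row (T) of the output above this gives 2.481 - 0.524 = 1.957 for DX and 0.208 + 0.004 = 0.212 for DB, which matches the first table.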
HTH
Fernando


Aleksandra Anic

Join Date: Aug 2017

Posts: 88
#3

20 Dec 2018, 01:39

Thank you for such a detailed explanation. I had thought there would be an easier way to do that. Thanks.

Best,
Aleksandra

FernandoRios

Join Date: Apr 2014
Posts: 2469

20 Dec 2018, 04:33

Hi Aleksandra,
Yes, there is an additional way to do it. The code above is something I like to work through so that I know exactly where everything is coming from. Below is code that reproduces the same results, but follows the Firpo et al. (2017) recommendation:

Code:

webuse womenwk, clear
drop if wage==.

gen age2=age*age
gen educ2=education*education
gen ageeduc=age*education
logit married age education  age2 educ2 ageeduc
predict pmarried
gen w1=1
replace w1=(1-pmarried)/pmarried if married==1
 gen n=_n
expand 2 if married==1 
bysort n:gen id=_n

replace w1=1 if married==1 & id==1

gen grps=1 if married==1
replace grps=2 if married==1 & id==2
replace grps=3 if married==0

oaxaca wage age education [aw=w1] if grps==1 | grps==2, by(grps) w(1)

Blinder-Oaxaca decomposition                    Number of obs     =      1,952
                                                  Model           =     linear
Group 1: grps = 1                                 N of obs 1      =        976
Group 2: grps = 2                                 N of obs 2      =        976

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   24.28488   .2072612   117.17   0.000     23.87865     24.6911
     group_2 |   22.32827   .3084618    72.39   0.000      21.7237    22.93284
  difference |   1.956607    .371626     5.26   0.000     1.228233     2.68498
   explained |   2.481098   .2518829     9.85   0.000     1.987416    2.974779
 unexplained |   -.524491   .3713628    -1.41   0.158    -1.252349    .2033667
-------------+----------------------------------------------------------------
explained    |
         age |   .9125583   .1660705     5.50   0.000     .5870662     1.23805
   education |   1.568539   .1847625     8.49   0.000     1.206411    1.930667
-------------+----------------------------------------------------------------
unexplained  |
         age |   2.838275   1.547891     1.83   0.067    -.1955354    5.872085
   education |   .5759778   1.286132     0.45   0.654    -1.944795    3.096751
       _cons |  -3.938743   2.202122    -1.79   0.074    -8.254822    .3773355
------------------------------------------------------------------------------

                        


oaxaca wage age education [aw=w1] if grps==2 | grps==3, by(grps) w(1)



Blinder-Oaxaca decomposition                    Number of obs     =      1,343
                                                  Model           =     linear
Group 1: grps = 2                                 N of obs 1      =        976
Group 2: grps = 3                                 N of obs 2      =        367

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   22.32827   .3084618    72.39   0.000      21.7237    22.93284
     group_2 |   22.11591   .2904775    76.14   0.000     21.54658    22.68523
  difference |   .2123621   .4237049     0.50   0.616    -.6180843    1.042808
   explained |   .0038748   .1874601     0.02   0.984    -.3635402    .3712899
 unexplained |   .2084873   .3802472     0.55   0.583    -.5367835     .953758
-------------+----------------------------------------------------------------
explained    |
         age |   .0066799    .044375     0.15   0.880    -.0802936    .0936534
   education |   -.002805   .1738861    -0.02   0.987    -.3436156    .3380055
-------------+----------------------------------------------------------------
unexplained  |
         age |  -2.188482   1.672938    -1.31   0.191    -5.467381    1.090417
   education |   .4084237   1.535391     0.27   0.790    -2.600888    3.417735
       _cons |   1.988546   2.403184     0.83   0.408    -2.721609      6.6987
------------------------------------------------------------------------------

As you can see, we get the same results as with the matrix approach.
The only thing to keep in mind is that the standard errors you get from oaxaca are not the correct ones, since they treat the reweighting weights as fixed. Also, for RIF regressions (in particular unconditional quantile regressions) you are also estimating the quantile and the kernel density. For this reason, the Oaxaca decomposition above should also be bootstrapped.
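One way to bootstrap the whole system is to wrap every step, including the logit for the weights, in a small program and pass it to -bootstrap-. The block below is a minimal sketch under that assumption, using the womenwk example from this thread; the program name rwdecomp and the returned scalar names are hypothetical, and only the four aggregate components are returned.

```stata
* Sketch only: rwdecomp and the r() names are made up for illustration.
webuse womenwk, clear
drop if wage==.
gen age2    = age*age
gen educ2   = education*education
gen ageeduc = age*education
gen c = 1

capture program drop rwdecomp
program define rwdecomp, rclass
    tempvar p w
    tempname bm xm bc xc bs xs d
    * re-estimate the reweighting factor in every bootstrap sample
    logit married age education age2 educ2 ageeduc
    predict double `p'
    gen double `w' = 1
    replace `w' = (1-`p')/`p' if married==1
    * married women
    regress wage age education if married==1
    matrix `bm' = e(b)
    mean age education c if married==1
    matrix `xm' = e(b)
    * counterfactual: married women reweighted to look like single women
    regress wage age education if married==1 [pw=`w']
    matrix `bc' = e(b)
    mean age education c if married==1 [pw=`w']
    matrix `xc' = e(b)
    * single women
    regress wage age education if married==0
    matrix `bs' = e(b)
    mean age education c if married==0
    matrix `xs' = e(b)
    * aggregate components (same formulas as the matrix code earlier in the thread)
    matrix `d' = `bm'*(`xm' - `xc')'
    return scalar Dxx = `d'[1,1]
    matrix `d' = (`bm' - `bc')*(`xc')'
    return scalar Dxe = `d'[1,1]
    matrix `d' = (`bc' - `bs')*(`xs')'
    return scalar DBB = `d'[1,1]
    matrix `d' = `bc'*(`xc' - `xs')'
    return scalar DBe = `d'[1,1]
end

* number of replications and seed are arbitrary choices
bootstrap Dxx=r(Dxx) Dxe=r(Dxe) DBB=r(DBB) DBe=r(DBe), ///
    reps(200) seed(12345): rwdecomp
```

Because the weights are recomputed inside the program, the bootstrap standard errors reflect the fact that they are estimated. For a RIF version, the -regress- calls would be replaced by the corresponding RIF regressions.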
HTH
Fernando


Aleksandra Anic

Join Date: Aug 2017

Posts: 88
#5

20 Dec 2018, 10:54

Dear Fernando,

Thank you for your help. I get strange results for the Oaxaca-Blinder decomposition with reweighting. Could you please have a look at my code below, as well as the results of the Oaxaca-Blinder decomposition? I don't know how to interpret the results, since the result for the composition effect seems strange.

Thank you very much for your help.

Best,
Aleksandra

***demographic variables: education - secondary and tertiary (educ2 & educ3), work experience and work experience squared (liwwh & liwwh2)
***dummies for regions and settlement types (REG2-REG4, urb2-urb3)
***employment variables: sector of economic activity (services and industry, wact2 & wact3), dummies for number of workers (nw2-nw4)
***contract type (jobc) and part time (part_time)
***y2-y4 year dummies
global demo educ2 educ3 liwwh liwwh2 REG2 REG3 REG4 urb2 urb3
global emp wact2 wact3 nw2 nw3 nw4 jobc part_time
sum $emp $demo

* female==2 is the counterfactual sample of females

gen male=(female==0)

probit male $demo $emp y2-y4 if female==0 | female==1
predict pmale, pr
summ male if male<2
gen pbar=r(mean)
gen phix=(pmale)/(1-pmale)*((1-pbar)/pbar) if female==2
sum phix, detail

* rifreg for the median: female, female as male and male, respectively.

rifreg lyhw $demo $emp y2-y4 if female==1, q(0.5) re (r50f)
rifreg lyhw $demo $emp y2-y4 if female==2 [aw=phix], q(0.5) re (r50fm)
rifreg lyhw $demo $emp y2-y4 if female==0, q(0.5) re (r50m)

egen r50=rowtotal(r50fm r50f r50m)
recode r50 (0=.)

* males and females reweighted to males - wage effect
no oaxaca r50 $demo $emp y2-y4 if female==0 | female==2, by(female) ///
weight(0) ///
detail (edu: educ1 educ2 educ3, wact: wact1 wact2 wact3, wexp: liwwh liwwh2, REG: REG1 REG2 REG3 REG4, urb: urb1 urb2 urb3, nw: nw1 nw2 nw3 nw4, y: y2-y4) ///
categorical (educ1 educ2 educ3, wact1 wact2 wact3, REG1 REG2 REG3 REG4, urb1 urb2 urb3, nw1 nw2 nw3 nw4) ///
vce (r) ///
noisily

difference      0.098***
               (0.010)
explained      -0.071***
               (0.006)
unexplained     0.169***
               (0.010)

* females and females reweighted to males - composition effect
no oaxaca r50 $demo $emp y2-y4 if female==1 | female==2, by(female) ///
weight(1) ///
swap ///
detail (edu: educ1 educ2 educ3, wact: wact1 wact2 wact3, wexp: liwwh liwwh2, REG: REG1 REG2 REG3 REG4, urb: urb1 urb2 urb3, nw: nw1 nw2 nw3 nw4, y: y2-y4) ///
categorical (educ1 educ2 educ3, wact1 wact2 wact3, REG1 REG2 REG3 REG4, urb1 urb2 urb3, nw1 nw2 nw3 nw4) ///
vce (r) ///
noisily

difference     -0.026**
               (0.011)
explained       0.000
               (0.006)
unexplained    -0.026***
               (0.009)
FernandoRios

Join Date: Apr 2014

Posts: 2469
#6

20 Dec 2018, 13:41

Hi Aleksandra.
It is difficult to say what is going on. I would like to see the intermediate outputs as well (they may be useful for you too). That would help figure out what explains your results.
Fernando
Aleksandra Anic

Join Date: Aug 2017

Posts: 88
#7

21 Dec 2018, 02:00

Hi Fernando,

Here are the results. Thank you very much for your help. Any comments or suggestions are welcome.

Best,
Aleksandra

Attached Files

results.txt (47.0 KB, 1 view)
FernandoRios

Join Date: Apr 2014

Posts: 2469
#8

21 Dec 2018, 06:18

Hi Aleksandra.
I see the problem now. I think it arises because you did not include the reweighting weights in the oaxaca command. You can see this because your RIF regression "rifreg lyhw $demo $emp y2-y4 if female==2 [aw=phix], q(0.5) re (r50fm)" gives a different result from the one reported in your oaxaca output.
As of right now, your weight variable "phix" only has values for the counterfactual group and is missing for the other two groups. You can fix that by simply replacing the missing values with 1. Then just add the weight to your oaxaca command.
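The fix described above might look like the fragment below. It is a sketch only: it reuses the variable names from post #5 (female, phix, r50, $demo, $emp), assumes that dataset and those globals are already in memory, and leaves out the detail() and categorical() options for brevity, so it will not run on its own.

```stata
* Sketch only: relies on the data and globals from post #5 being loaded.
* The observed groups (female==0 and female==1) get a weight of 1;
* the counterfactual group (female==2) keeps its reweighting factor.
replace phix = 1 if missing(phix)

* wage effect: males vs. females reweighted to males
oaxaca r50 $demo $emp y2-y4 [aw=phix] if female==0 | female==2, ///
    by(female) weight(0)

* composition effect: females vs. females reweighted to males
oaxaca r50 $demo $emp y2-y4 [aw=phix] if female==1 | female==2, ///
    by(female) weight(1) swap
```

With the weights included, the group mean that oaxaca reports for the counterfactual should agree with the weighted rifreg output for female==2.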
Fernando
Aleksandra Anic

Join Date: Aug 2017

Posts: 88
#9

21 Dec 2018, 07:29

Thank you very much, now it works. I have one more question. I use Survey of Income and Living Conditions (SILC) data in my research. Generally, probability weights (personal cross-sectional weights) should be used to extrapolate results from the sample level to the whole population. Since I use reweighting weights, I cannot use sample weights at the same time. Could that be a problem when commenting on the results? Also, is it possible to somehow combine the Heckman selection procedure with the FFL + Oaxaca methodology?

Thank you Fernando, you helped me significantly.
Best,
Aleksandra
FernandoRios

Join Date: Apr 2014

Posts: 2469
#10

21 Dec 2018, 08:22

Based on my own work, it is possible to combine SIMPLE survey weights "[pw=sweight]" with the reweighting weights. You just need to multiply them: [pw=sweight*ipw]. If the IPW were fixed, that would give you the correct standard errors (other aspects of the survey structure aside). The problem lies in the standard errors.
As you read in the paper you cited, asymptotic standard errors are difficult to get, which is why FFL use bootstrapping.
Bootstrapping without sample weights is easy and straightforward, but doing so with weights is more involved. If you look online, you will see a few procedures for obtaining the right bootstrap weights.
The easiest approach to bootstrapping with survey structure is the one described in "The Analysis of Household Surveys: A Microeconometric Approach to Development Policy" by Angus Deaton. See the link here: http://web.worldbank.org/archive/web...WEB/BOOK-2.HTM. But it may not apply to all cases.
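The weight multiplication described above can be sketched as follows. The womenwk data has no sampling weights, so sweight here is a placeholder variable generated only to make the sketch run; with real survey data it would be the weight supplied with the survey. The names ipw and wcomb are also just illustrative.

```stata
* Sketch only: sweight is a made-up placeholder, not a real survey weight.
webuse womenwk, clear
drop if wage==.
set seed 12345
gen double sweight = 1 + runiform()

gen age2 = age*age
* estimate the propensity score using the sampling weights
logit married age education age2 [pw=sweight]
predict double pmarried

* reweighting factor: married women reweighted to look like single women
gen double ipw = 1
replace ipw = (1-pmarried)/pmarried if married==1

* combined weight = sampling weight x reweighting factor
gen double wcomb = sweight*ipw

* counterfactual regression using the combined weight
regress wage age education if married==1 [pw=wcomb]
```

Writing [pw=sweight*ipw] directly in the command is equivalent; generating wcomb first simply makes the combined weight easy to inspect. The standard errors reported here still treat the weights as fixed.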

Bottom line: I would make it clear that you are not using sample weights when describing your main results and statistics, and perhaps add a note comparing the statistics of interest (say the Gini and quantiles 10 and 90) with and without weights, just to be open regarding your results.

Now for Heckman selection and RIF. There is a recent paper that came out last year (see link https://ideas.repec.org/p/zbw/hohdps/262017.html) that covers this aspect in particular. I'm not sure how well it will work, as it is on my "to do list" to check the empirical validity of RIF regressions with Heckman selection models. The bottom line of their paper, however, is that to handle selection you need to add some nonlinearities to the selection term, similar to what you would do for quantile regression with selection.

HTH
Fernando
Aleksandra Anic

Join Date: Aug 2017

Posts: 88
#11

21 Dec 2018, 09:18

Thank you very much. One last question: How can I retain the rifreg results when bootstrap option is used, since it is not possible to combine bootstrap with retain?

The error message is: can't use norobust, generate or retain options with bootstrap

Thank you Fernando.

Best,
Aleksandra
FernandoRios

Join Date: Apr 2014

Posts: 2469
#12

21 Dec 2018, 09:52

That is a weird error. It may be because of how you are calling bootstrap.
I usually do it by wrapping the steps in a separate program,
something like this:

Code:

capture program drop brif
program brif, eclass
    capture drop rif
    rifreg y x1 x2 x3, retain(rif)
    reg rif x1 x2 x3
end
bootstrap: brif

That may help you
Fernando
Aleksandra Anic

Join Date: Aug 2017

Posts: 88
#13

21 Dec 2018, 10:15

Dear Fernando,

Thank you very much for your help.

Best wishes,
Aleksandra
Tamia Lavado

Join Date: May 2019

Posts: 1
#14

02 Jun 2019, 18:26

Dear all,
To continue this thread: I am interested in analyzing wage inequality over the period 2005-2015. The database I use is the National Household Survey. The method I follow is the one described in Firpo, Fortin and Lemieux (2018).

I have analyzed the inter-quantile difference. First, I computed the weighting term to see what the salary would be in 2005 if the salary structure were like the one in 2015. Then, I estimated RIF regressions of salaries for each quantile of interest and generated a new dependent variable for the inter-quantile difference I want to analyze. Finally, I applied the Oaxaca decomposition.

My question is this:
How can I apply the survey expansion factor?

I didn't use the information on women's salaries because, since their labor participation rate is lower, there may be selection bias. Is it possible to apply a selection bias correction in this methodology?

Regarding the interpretation of results, can the estimated coefficients be interpreted as percentage variations of the inter-quantile difference, as in common linear regressions?

Thanks in advance for the answers.

PS: I am not sure how to share a log file, so I am sharing a link instead.

Reference:
Firpo, S., Fortin, N., & Lemieux, T. (2018). Decomposing wage distributions using recentered influence function regressions. Econometrics, 6(2), 28.

oaxaca_rif.log - Google Drive

https://drive.google.com
FernandoRios

Join Date: Apr 2014

Posts: 2469
#15

03 Jun 2019, 04:55

Dear Tamia
First of all, a few months ago I added a new user-written package to Stata, which can now be installed using -ssc install rif-. The package introduces two commands: rifhdreg and oaxaca_rif. The latter will allow you to do the type of decomposition analysis you are working on in a much simpler way.
Based on what you provided in that log, you are using the simpler version of the RIF Oaxaca decomposition proposed in FFL 2018. The reweighted version requires a couple of additional steps, which oaxaca_rif performs. Some of the details of the command can be found at: http://www.levyinstitute.org/publica...and-inequality

Regarding your specific questions:
1. oaxaca_rif allows you to add sampling weights as usual [pw=weight]. The weights are applied in all the steps of the decomposition, including the estimation of the IPW and the estimation of the RIFs. However, to obtain corrected standard errors, you may want to look into https://www.stata.com/meeting/snasug...v_snasug08.pdf and the paper available in the Stata Journal by the same author.
2. In principle, selection issues could be addressed by just including the inverse Mills ratios in the outcome equation. However, as far as I know, there is no formal analysis of the best way to do it, in particular because the effect of the selection component is nonlinear in your outcome (the interquartile range).
3. Yes, you could interpret it as such.
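As an illustration of the kind of call involved, a reweighted RIF decomposition with oaxaca_rif might look like the sketch below, applied to the womenwk example used earlier in the thread. The option names (by(), rif(), rwlogit(), wgt()) are written from memory of the package documentation and should be treated as assumptions; check -help oaxaca_rif- for the exact syntax. The command also builds on -oaxaca-, which needs to be installed.

```stata
* Sketch only: option names are assumptions to be checked in -help oaxaca_rif-.
* ssc install rif       // installs rifhdreg and oaxaca_rif
* ssc install oaxaca    // oaxaca_rif relies on oaxaca
webuse womenwk, clear
drop if wage==.
gen age2    = age*age
gen educ2   = education*education
gen ageeduc = age*education

* reweighted decomposition of the median wage gap by marital status;
* rwlogit() lists the covariates of the logit used for the reweighting
oaxaca_rif wage age education, by(married) rif(q(50)) ///
    rwlogit(age education age2 educ2 ageeduc) wgt(1)
```

Sampling weights would be added in the usual place, for example [pw=weight] before the comma, as described in point 1.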
HTH
Fernando