  • Firpo, Fortin and Lemieux methodology: recentered influence function and Oaxaca decomposition

    Dear all,

    I want to use the methodology defined by Firpo, Fortin and Lemieux in the paper "Decomposing wage distributions using recentered influence function regressions", Econometrics 2018, 6, 28; doi:10.3390/econometrics6020028. Although I have read the paper many times, I don't know how to estimate the reweighting and specification errors in Stata. See e.g. Table 4.

    The FFL methodology combines the recentered influence function, the Oaxaca-Blinder decomposition (Oaxaca 1973 and Blinder 1973) and reweighting (DiNardo et al. 1996).

    Thank you in advance for your answer.

    Best,
    Aleksandra


  • #2
    Hi Aleksandra
    Unfortunately, there is no direct way (no ready-made program) to estimate the reweighting and specification errors in Stata. However, estimating them is not difficult. It requires some matrix notation. Below is an example that may help clarify how to do it:
    Code:
    * In this example I compare married women's wages to single women's wages.
    * The counterfactual is what wages for single women would be if they
    * were paid like married women.
    * This means I am reweighting the sample of married women to look
    * like single women.
    webuse womenwk, clear
    drop if wage==.
    
    gen age2=age*age
    gen educ2=education*education
    gen ageeduc=age*education
    logit married age education  age2 educ2 ageeduc
    predict pmarried
    gen w1=1
    replace w1=(1-pmarried)/pmarried if married==1
    
    gen c=1
    reg wage age education if married==1 
    est sto e1
    matrix bm=e(b)
    mean age education c if married==1
    est sto x1
    matrix xm=e(b)
    
    reg wage age education if married==1 [pw=w1]
    est sto ec
    matrix bc=e(b)
    mean age education c if married==1 [pw=w1]
    est sto xc
    matrix xc=e(b)
    
    
    reg wage age education if married==0 
    est sto e2
    matrix bs=e(b)
    mean age education c if married==0 
    est sto x2
    matrix xs=e(b)
    
    ** For this command you need to install estout (ssc install estout)
    ** This just compares the outputs.
    esttab e1 ec e2 x1 xc x2, se compress mtitles(Bmarried BCmarried Bsingle Xmarried XCMarried Xsingle1)
    ----------------------------------------------------------------------------------------
                     (1)          (2)          (3)          (4)          (5)          (6)   
                Bmarried    BCmarried      Bsingle     Xmarried    XCMarried     Xsingle1   
    ----------------------------------------------------------------------------------------
    age            0.156***    0.0717        0.137***     39.33***     33.50***     33.41***
                (0.0244)     (0.0388)     (0.0315)      (0.239)      (0.448)      (0.424)   
    
    education      0.923***     0.876***     0.842***     13.94***     12.24***     12.24***
                (0.0607)     (0.0861)     (0.0923)     (0.0964)      (0.136)      (0.145)   
    
    c                                                         1            1            1   
                                                            (.)          (.)          (.)   
    
    _cons          5.273***     9.211***     7.223***                                       
                 (1.207)      (1.843)      (1.501)                                          
    ----------------------------------------------------------------------------------------
    N                976          976          367          976          976          367   
    ----------------------------------------------------------------------------------------
    Standard errors in parentheses
    * p<0.05, ** p<0.01, *** p<0.001
    ** Here is where the decomposition is done.
    ** First aggregate decomposition
    matrix Dx=vecdiag(bm'*xm)-vecdiag(bc'*xc)
    matrix Dx=Dx,bm*xm'-bc*xc'
    matrix DB=vecdiag(bc'*xc)-vecdiag(bs'*xs)
    matrix DB=DB,bc*xc'-bs*xs'
    matrix OB=Dx',DB'
    matrix rowname OB=age education _cons T
    matrix colname OB=DX DB
    matrix list OB
    
                       DX          DB
          age    3.750833  -2.1818023
    education    2.144517   .40561869
        _cons  -3.9387435   1.9885457
            T   1.9566066    .2123621
    ** Then separate the reweighting error (DBe) from the specification error (Dxe)
    matrix Dxx=vecdiag(bm'*(xm-xc)),bm*(xm-xc)'
    matrix Dxe=vecdiag((bm'-bc')*(xc)),(bm-bc)*xc'
    matrix DBe=vecdiag(bc'*(xc-xs)),bc*(xc-xs)'
    matrix DBB=vecdiag((bc'-bs')*xs),(bc-bs)*xs'
    matrix OB2=Dxx',Dxe',DBB',DBe'
    matrix rowname OB2=age education _cons T
    matrix colname OB2=Dxx Dxe DBB DBe
     matrix list OB2
    
    
    OB2[4,4]
                      Dxx         Dxe         DBB         DBe
          age   .91255831   2.8382747  -2.1884821   .00667986
    education   1.5685393   .57597776   .40842372  -.00280503
        _cons           0  -3.9387435   1.9885457           0
            T   2.4810976  -.52449097   .20848726   .00387484
    
    * You can see that, as expected, the reweighting error is almost zero,
    * but the specification error is large, suggesting that the model
    * is misspecified.
    The example above is for a standard regression, but the same process can be used for -rifreg-. To obtain standard errors, I would suggest bootstrapping the whole system.
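    As an illustration of what "bootstrapping the whole system" could look like, here is a minimal sketch. It assumes the data steps above (age2, educ2, ageeduc and c) have already been run; the program name ffl_ob and the choices reps(200) and seed(12345) are assumptions for illustration only. The key point is that the logit for the reweighting factor is re-estimated inside every bootstrap sample.
    Code:
    capture program drop ffl_ob
    program ffl_ob, rclass
        tempvar p w
        tempname bm xm bc xc bs xs T
        * re-estimate the reweighting factor in every bootstrap sample
        logit married age education age2 educ2 ageeduc
        predict double `p', pr
        gen double `w' = cond(married==1, (1-`p')/`p', 1)
        * married women
        regress wage age education if married==1
        matrix `bm' = e(b)
        mean age education c if married==1
        matrix `xm' = e(b)
        * married women reweighted to look like single women
        regress wage age education if married==1 [pw=`w']
        matrix `bc' = e(b)
        mean age education c if married==1 [pw=`w']
        matrix `xc' = e(b)
        * single women
        regress wage age education if married==0
        matrix `bs' = e(b)
        mean age education c if married==0
        matrix `xs' = e(b)
        * totals of the four components
        matrix `T' = `bm'*(`xm'-`xc')'
        return scalar Dxx = `T'[1,1]
        matrix `T' = (`bm'-`bc')*`xc''
        return scalar Dxe = `T'[1,1]
        matrix `T' = (`bc'-`bs')*`xs''
        return scalar DBB = `T'[1,1]
        matrix `T' = `bc'*(`xc'-`xs')'
        return scalar DBe = `T'[1,1]
    end
    bootstrap Dxx=r(Dxx) Dxe=r(Dxe) DBB=r(DBB) DBe=r(DBe), reps(200) seed(12345): ffl_ob
    Only the four totals are returned here; the detailed (per-variable) components could be returned in the same way.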
    HTH
    Fernando



    • #3
      Thank you for such a detailed explanation. I believed that there was an easier way to do that. Thanks.

      Best,
      Aleksandra



      • #4
        Hi Aleksandra,
        Yes, there is an additional way to do it. The above code is something I like to work with so I know exactly where everything is coming from. Below is code that reproduces the same results, but follows the Firpo et al. (2017) recommendation:
        Code:
        webuse womenwk, clear
        drop if wage==.
        
        gen age2=age*age
        gen educ2=education*education
        gen ageeduc=age*education
        logit married age education  age2 educ2 ageeduc
        predict pmarried
        gen w1=1
        replace w1=(1-pmarried)/pmarried if married==1
         gen n=_n
        expand 2 if married==1 
        bysort n:gen id=_n
        
        replace w1=1 if married==1 & id==1
        
        gen grps=1 if married==1
        replace grps=2 if married==1 & id==2
        replace grps=3 if married==0
        
        oaxaca wage age education [aw=w1] if grps==1 | grps==2, by(grps) w(1)
        
        Blinder-Oaxaca decomposition                    Number of obs     =      1,952
                                                          Model           =     linear
        Group 1: grps = 1                                 N of obs 1      =        976
        Group 2: grps = 2                                 N of obs 2      =        976
        
        ------------------------------------------------------------------------------
                     |               Robust
                wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        overall      |
             group_1 |   24.28488   .2072612   117.17   0.000     23.87865     24.6911
             group_2 |   22.32827   .3084618    72.39   0.000      21.7237    22.93284
          difference |   1.956607    .371626     5.26   0.000     1.228233     2.68498
           explained |   2.481098   .2518829     9.85   0.000     1.987416    2.974779
         unexplained |   -.524491   .3713628    -1.41   0.158    -1.252349    .2033667
        -------------+----------------------------------------------------------------
        explained    |
                 age |   .9125583   .1660705     5.50   0.000     .5870662     1.23805
           education |   1.568539   .1847625     8.49   0.000     1.206411    1.930667
        -------------+----------------------------------------------------------------
        unexplained  |
                 age |   2.838275   1.547891     1.83   0.067    -.1955354    5.872085
           education |   .5759778   1.286132     0.45   0.654    -1.944795    3.096751
               _cons |  -3.938743   2.202122    -1.79   0.074    -8.254822    .3773355
        ------------------------------------------------------------------------------
        
                                
        
        
        oaxaca wage age education [aw=w1] if grps==2 | grps==3, by(grps) w(1)
        
        
        
        Blinder-Oaxaca decomposition                    Number of obs     =      1,343
                                                          Model           =     linear
        Group 1: grps = 2                                 N of obs 1      =        976
        Group 2: grps = 3                                 N of obs 2      =        367
        
        ------------------------------------------------------------------------------
                     |               Robust
                wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        overall      |
             group_1 |   22.32827   .3084618    72.39   0.000      21.7237    22.93284
             group_2 |   22.11591   .2904775    76.14   0.000     21.54658    22.68523
          difference |   .2123621   .4237049     0.50   0.616    -.6180843    1.042808
           explained |   .0038748   .1874601     0.02   0.984    -.3635402    .3712899
         unexplained |   .2084873   .3802472     0.55   0.583    -.5367835     .953758
        -------------+----------------------------------------------------------------
        explained    |
                 age |   .0066799    .044375     0.15   0.880    -.0802936    .0936534
           education |   -.002805   .1738861    -0.02   0.987    -.3436156    .3380055
        -------------+----------------------------------------------------------------
        unexplained  |
                 age |  -2.188482   1.672938    -1.31   0.191    -5.467381    1.090417
           education |   .4084237   1.535391     0.27   0.790    -2.600888    3.417735
               _cons |   1.988546   2.403184     0.83   0.408    -2.721609      6.6987
        ------------------------------------------------------------------------------
        As you can see, we get the same results as with the matrix approach.
        The only thing to keep in mind is that the standard errors you get from oaxaca are not the correct ones, since they assume the reweighting factor is fixed. Also, for RIF regressions (in particular unconditional quantile regression) you are also estimating the quantile and the kernel density. For this reason, the above Oaxaca decomposition should also be bootstrapped.
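        To make the link between the two oaxaca calls and the matrix version explicit, the two-step decomposition can be written out as follows. This is a sketch in notation I am assuming here (not taken from the paper): subscript m is married women, c is married women reweighted to look like single women, s is single women, \bar{X} is the row vector of means including the constant, and \hat\beta is the OLS coefficient vector.

        \[
        \bar{Y}_m - \bar{Y}_c
          = \underbrace{(\bar{X}_m - \bar{X}_c)\hat\beta_m}_{\text{pure composition effect (Dxx)}}
          + \underbrace{\bar{X}_c(\hat\beta_m - \hat\beta_c)}_{\text{specification error (Dxe)}}
        \]
        \[
        \bar{Y}_c - \bar{Y}_s
          = \underbrace{\bar{X}_s(\hat\beta_c - \hat\beta_s)}_{\text{pure wage structure effect (DBB)}}
          + \underbrace{(\bar{X}_c - \bar{X}_s)\hat\beta_c}_{\text{reweighting error (DBe)}}
        \]

        With the numbers above: 1.9566 = 2.4811 - 0.5245 for the first call (explained and unexplained), and 0.2124 = 0.2085 + 0.0039 for the second call (unexplained and explained).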
        HTH
        Fernando



        • #5
          Dear Fernando,

          Thank you for your help. I get strange results for the Oaxaca-Blinder decomposition with reweighting. Could you please have a look at my code below, as well as the results of the Oaxaca-Blinder decomposition? I don't know how to interpret the results, since the result for the composition effect seems strange.

          Thank you very much for your help.

          Best,
          Aleksandra



          ***demographic variables: education - secondary and tertiary (educ2 & educ3), work experience and work experience squared (liwwh & liwwh2)
          ***dummies for regions and settlement types (REG2-REG4, urb2-urb3)
          ***employment variables: sector of economic activity (services and industry, wact2 & wact3), dummies for number of workers (nw2-nw4)
          ***contract type (jobc) and part time (part_time)
          ***y2-y4 year dummies
          global demo educ2 educ3 liwwh liwwh2 REG2 REG3 REG4 urb2 urb3
          global emp wact2 wact3 nw2 nw3 nw4 jobc part_time
          sum $emp $demo

          * female==2 is the counterfactual sample of females

          gen male=(female==0)

          probit male $demo $emp y2-y4 if female==0 | female==1
          predict pmale, pr
          summ male if female==0 | female==1
          gen pbar=r(mean)
          gen phix=(pmale)/(1-pmale)*((1-pbar)/pbar) if female==2
          sum phix, detail

          * rifreg for the median: female, female reweighted as male, and male, respectively

          rifreg lyhw $demo $emp y2-y4 if female==1, q(0.5) re (r50f)
          rifreg lyhw $demo $emp y2-y4 if female==2 [aw=phix], q(0.5) re (r50fm)
          rifreg lyhw $demo $emp y2-y4 if female==0, q(0.5) re (r50m)

          egen r50=rowtotal(r50fm r50f r50m)
          recode r50 (0=.)

          * males and females reweighted to males: wage structure effect
          no oaxaca r50 $demo $emp y2-y4 if female==0 | female==2, by(female) ///
          weight(0) ///
          detail (edu: educ1 educ2 educ3, wact: wact1 wact2 wact3, wexp: liwwh liwwh2, REG: REG1 REG2 REG3 REG4, urb: urb1 urb2 urb3, nw: nw1 nw2 nw3 nw4, y: y2-y4) ///
          categorical (educ1 educ2 educ3, wact1 wact2 wact3, REG1 REG2 REG3 REG4, urb1 urb2 urb3, nw1 nw2 nw3 nw4) ///
          vce (r) ///
          noisily
          difference 0.098***
          (0.010)
          explained -0.071***
          (0.006)
          unexplained 0.169***
          (0.010)

          * females and females reweighted to males: composition effect
          no oaxaca r50 $demo $emp y2-y4 if female==1 | female==2, by(female) ///
          weight(1) ///
          swap ///
          detail (edu: educ1 educ2 educ3, wact: wact1 wact2 wact3, wexp: liwwh liwwh2, REG: REG1 REG2 REG3 REG4, urb: urb1 urb2 urb3, nw: nw1 nw2 nw3 nw4, y: y2-y4) ///
          categorical (educ1 educ2 educ3, wact1 wact2 wact3, REG1 REG2 REG3 REG4, urb1 urb2 urb3, nw1 nw2 nw3 nw4) ///
          vce (r) ///
          noisily
          difference -0.026**
          (0.011)
          explained 0.000
          (0.006)
          unexplained -0.026***
          (0.009)





          • #6
            Hi Aleksandra.
            It's difficult to say what is going on. I would like to see the intermediate outputs as well (they may be useful for you too). That would help figure out what explains your results.
            Fernando



            • #7
              Hi Fernando,

              Here are the results. Thank you very much for your help. Any comments or suggestions are welcome.

              Best,
              Aleksandra

              Attached Files



              • #8
                Hi Aleksandra.
                I now see the problem. I think it arises because you did not include the reweighting weights in the oaxaca command. You can see this because your RIF regression output, "rifreg lyhw $demo $emp y2-y4 if female==2 [aw=phix], q(0.5) re (r50fm)", gives different results from the ones reported in your Oaxaca output.
                As of right now, your weight variable "phix" only has values for the counterfactual group and is missing for the other two groups. You can fix that by simply replacing the missing values with 1. Then just add it to your oaxaca command.
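                A minimal sketch of that fix, using the variable names from your code. The detail() and categorical() options are left out only to keep the example short, and the [aw=phix] weight type is an assumption that mirrors the [aw=w1] used in my earlier example:
                Code:
                * phix is currently defined only for the counterfactual group (female==2)
                replace phix = 1 if missing(phix)

                * males vs. females reweighted to males
                oaxaca r50 $demo $emp y2-y4 if female==0 | female==2 [aw=phix], by(female) weight(0) vce(robust)

                * females vs. females reweighted to males
                oaxaca r50 $demo $emp y2-y4 if female==1 | female==2 [aw=phix], by(female) weight(1) swap vce(robust)
                After this change, the group mean that oaxaca reports for the counterfactual group should agree with the reweighted rifreg results.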
                Fernando



                • #9
                  Thank you very much, now it works. I have one more question. I use Survey of Income and Living Conditions (SILC) data in my research. Generally, probability weights (personal cross-sectional weights) should be used to extrapolate results from the sample to the whole population. Since I use reweighting weights, I cannot use sample weights at the same time. Could that be a problem when commenting on the results? Also, is it possible to somehow combine the Heckman selection procedure with the FFL + Oaxaca methodology?

                  Thank you Fernando, you helped me significantly.
                  Best,
                  Aleksandra



                  • #10
                    So, based on my own work: it is possible to use SIMPLE survey weights "[pw=sweight]" together with the reweighting weights. You just need to multiply them: [pw=sweight*ipw]. If the IPW were fixed, that would give you the correct standard errors (other aspects of the survey structure aside). The problem lies in the standard errors, because the IPW is itself estimated.
                    As you read in the paper you cited, asymptotic standard errors are difficult to get, which is why FFL use bootstrapping.
                    Bootstrapping without sample weights is easy and straightforward, but doing so with weights is more involved. If you look online, you will see a few procedures for obtaining the right bootstrap weights.
                    The easiest approach to bootstrapping with survey structure is the one described in "The Analysis of Household Surveys: A Microeconometric Approach to Development Policy" by Angus Deaton. See the link here: http://web.worldbank.org/archive/web...WEB/BOOK-2.HTM. But it may not apply to all cases.

                    Bottom line: I would make it clear that you are not using sample weights when describing your main results and statistics, and perhaps add a note comparing the statistics of interest (say the Gini and quantiles 10 and 90) with and without weights, just to be open regarding your results.
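                    A minimal sketch of combining the two sets of weights. All names here (sweight, ipw, wcomb, y, x1, x2) are hypothetical placeholders, and ipw is assumed to equal 1 for the groups that are not reweighted:
                    Code:
                    * sweight: survey weight; ipw: reweighting factor (1 for non-reweighted groups)
                    gen double wcomb = sweight*ipw
                    * point estimates use the combined weight; the reported standard errors
                    * still treat ipw as fixed, so they would need to be bootstrapped
                    regress y x1 x2 [pw=wcomb]
                    Generating the product as a variable is equivalent to writing [pw=sweight*ipw] directly, and makes it easy to inspect the combined weight with summarize before using it.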

                    Now for Heckman selection and RIF. There is a recent paper that came out last year (see link https://ideas.repec.org/p/zbw/hohdps/262017.html) that covers this aspect in particular. I'm not sure how well it will work, as it is on my "to do list" to check the empirical validity of RIF regressions with Heckman selection models. The bottom line of their paper, however, is that to handle selection you need to add some nonlinearities to the selection term, similar to what you would do for quantile regression with selection.

                    HTH
                    Fernando







                    • #11
                      Thank you very much. One last question: how can I retain the rifreg results when the bootstrap option is used, since it is not possible to combine bootstrap with retain?

                      The error message is: can't use norobust, generate or retain options with bootstrap

                      Thank you Fernando.

                      Best,
                      Aleksandra



                      • #12
                        That is a weird error. It may be because of how you are calling the bootstrap.
                        I usually do it by wrapping the steps in a separate program,
                        something like this:
                        Code:
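                        capture program drop brif
                        program brif, eclass
                            * generate the RIF without rifreg's own bootstrap option,
                            * then rerun the regression on each bootstrap sample
                            capture drop rif
                            rifreg y x1 x2 x3, retain(rif)
                            reg rif x1 x2 x3
                        end
                        bootstrap: brif
                        That may help you.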
                        Fernando



                        • #13
                          Dear Fernando,

                          Thank you very much for your help.

                          Best wishes,
                          Aleksandra



                          • #14
                            Dear all,
                            To continue the thread: I am interested in analyzing wage inequality over the period 2005-2015. The database I use is the National Household Survey. The method I follow is the one described in Firpo, Fortin and Lemieux (2018).

                            I have analyzed the interquantile difference. First, I estimated the reweighting factor to see what salaries would have been in 2005 if the salary structure had been like the one in 2015. Then, I ran RIF regressions of salaries for each quantile of interest and generated a new dependent variable for the interquantile difference I want to analyze. Finally, I applied the Oaxaca decomposition.

                            My questions are:
                            1. How can I apply the survey expansion factor?
                            2. I did not use the information on women's salaries because, since their labor participation rate is lower, there may be selection bias. Is it possible to apply a selection bias correction in this methodology?
                            3. Regarding the interpretation of results, can the estimated coefficients be interpreted as percentage variations of the interquantile difference, as in ordinary linear regressions?
                            Thanks in advance for the answers.

                            P.S.: I am not sure how to share a log file, so I am sharing a link.

                            Reference:
                            Firpo, S., Fortin, N., & Lemieux, T. (2018). Decomposing wage distributions using recentered influence function regressions. Econometrics, 6(2), 28.



                            • #15
                              Dear Tamia,
                              First of all, a few months ago I added a new user-written package to Stata, which can now be installed using -ssc install rif-. The package introduces two commands, rifhdreg and oaxaca_rif. The latter will allow you to do the type of decomposition analysis you are working on in a much simpler way.
                              Based on what you provided in that log, you are using the simpler version of the RIF Oaxaca decomposition proposed in FFL2018. The reweighted version requires a couple of additional steps, which oaxaca_rif performs. Some of the details of the command can be found at: http://www.levyinstitute.org/publica...and-inequality

                              Regarding your specific questions:
                              1. oaxaca_rif allows you to add sampling weights as usual [pw=weight]. The weights are applied in all the steps of the decomposition, including the estimation of the IPW and the estimation of the RIFs. However, to obtain corrected standard errors, you may want to look into https://www.stata.com/meeting/snasug...v_snasug08.pdf and the paper available in the Stata Journal by the same author.
                              2. In principle, selection issues could be addressed by just including the inverse Mills ratio in the outcome equation. However, as far as I know, there is no formal analysis of the best way to do it, in particular because the effect of the selection component is nonlinear in your outcome (the interquartile range).
                              3. Yes, you could interpret it as such.
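                              To give an idea of what a call could look like, here is a sketch. The variable names (lnwage, educ, exper, y2015, sweight) are hypothetical, and the option names by(), rif() and rwlogit() are written as I understand them from the package's help file, so please check -help oaxaca_rif- before relying on them:
                              Code:
                              * install the package (provides rifhdreg and oaxaca_rif)
                              ssc install rif
                              * reweighted RIF decomposition of the 90-10 interquantile range across two years;
                              * rwlogit() lists the variables used in the logit for the reweighting factor
                              oaxaca_rif lnwage educ exper [pw=sweight], by(y2015) rif(iqr(90 10)) rwlogit(educ exper)
                              The reported standard errors would still treat the reweighting factor as fixed, so bootstrapping the whole command is the safer choice.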
                              HTH
                              Fernando
