
  • Unconditional Quantile Regression: counterfactual decomposition analysis

    Dear Stata pros,

    For my research, using the India NSS data, I'm examining the gap in consumption of major food groups by religion, employing RIF-QR counterfactual decomposition methods.

    A study by Chittur S. Srinivasan, Giacomo Zanello and Bhavani Shankar (2013), https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3729423/, assesses the rural-urban differentials in HAZ scores by first estimating the distributions of HAZ scores separately for rural and urban children in each country using kernel smoothing techniques. I would like to generate figures like Figures 1 and 2 (page 9), which show the cumulative distribution functions of urban and rural HAZ scores in Bangladesh and Nepal, and to aggregate the results of the QR-CD analysis.

    My question is: how do you implement this in Stata? Which command should I use?

    I have plotted the cumulative distribution functions of milk consumption for Hindu and non-Hindu households, but I am struggling with how to plot the "counterfactual" curve on the same graph.

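    ** Plot the cumulative distribution functions (CDFs) of milk consumption by group
    cumul milk if hindu==1, gen(x)
    cumul milk if hindu==0, gen(y)
    ** Stack both CDFs on a common milk axis (the clear option replaces the data in memory)
    stack x milk y milk, into(c milk) wide clear
    line x y milk, sort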

    Thank you for your help.
    Samira.

  • #2
    Hi Samira,
    there are a few ways to get what you have in mind, and some of them involve UQR procedures. It all depends on what you consider to be your counterfactual.
    Here is a small example:
    Code:
    use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
    ** Say that you want to do this by gender
    ** First, estimate the probability of being female given the characteristics (a propensity score)
    probit female c.(educ exper tenure)##c.(educ exper tenure)
    predict prfem
    drop if prfem==.
    
    sum educ exper tenure if female==0
    ** two different counterfactuals:
    ** Women with characteristics that look like men
    sum educ exper tenure if female==1 [w=(1-prfem)/prfem]
    ** Men with characteristics that look like women
    sum educ exper tenure if female==0 [w=prfem/(1-prfem)]
    sum educ exper tenure if female==1
    
    pctile men_wage=lnwage if female==0, n(100)
    pctile women_wage=lnwage if female==1, n(100)
    pctile men_x_with_women_b=lnwage if female==0 [w=prfem/(1-prfem)], n(100)
    gen n=_n 
    replace n=. if n>100
    two line n men_wage, sort || line n women_wage, sort || line n men_x_with_women_b, sort legend(order(1 "Men" 2 "Women" 3 "Counterfactual")) xtitle(Log Wages) ytitle(Percentile)
    For a more formal way to derive the decomposition, you may want to take a look at "oaxaca_rif", which you can install using "ssc install oaxaca_rif".
    HTH
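For readers wondering what the rif(q(10)) option computes, here is a minimal sketch, assuming the recentered influence function (RIF) of a quantile from Firpo, Fortin and Lemieux (2009): RIF(y; q) = q + (tau - 1{y <= q})/f(q). It uses Stata's bundled nlsw88 data (not the NSS data), a Gaussian kernel with a Silverman rule-of-thumb bandwidth (our assumption; oaxaca_rif's internal density estimator may use different defaults), and hypothetical names such as rif10 and sc_q. Scalars are always read through scalar() so that short names cannot be mistaken for variable abbreviations.

```stata
* Sketch (assumption): RIF of the 10th percentile computed by hand,
* then an unconditional quantile regression (UQR) by OLS on the RIF.
* Uses Stata's bundled nlsw88 data, not the NSS data from the thread.
sysuse nlsw88, clear
generate double lnwage = ln(wage)

* Sample 10th percentile of log wages
_pctile lnwage, p(10)
scalar sc_q = r(r1)

* Silverman rule-of-thumb bandwidth: 0.9 * min(sd, IQR/1.349) * n^(-1/5)
quietly summarize lnwage, detail
scalar sc_h = 0.9 * min(r(sd), (r(p75) - r(p25))/1.349) * r(N)^(-1/5)

* Gaussian kernel density estimate at the quantile: f(q) = mean(phi((y - q)/h)) / h
generate double kern = normalden((lnwage - scalar(sc_q))/scalar(sc_h))
quietly summarize kern, meanonly
scalar sc_f = r(mean)/scalar(sc_h)
drop kern

* RIF(y; q) = q + (tau - 1{y <= q}) / f(q), with tau = 0.10
generate double rif10 = scalar(sc_q) + (0.10 - (lnwage <= scalar(sc_q)))/scalar(sc_f)

* The mean of the RIF should be close to the quantile itself
summarize rif10, meanonly
display "q10 = " scalar(sc_q) "   mean RIF = " r(mean)

* UQR: OLS of the RIF on covariates (unconditional partial effects on q10)
regress rif10 grade ttl_exp tenure
```

Running a regression of this RIF separately for each group is the building block that the RIF decomposition then combines in the usual Oaxaca-Blinder way.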



    • #3
      Hi,

      Thank you so much for your response. Yes, I have used the oaxaca_rif command to decompose the difference in milk consumption between Hindu and non-Hindu households at quantiles 10, 25, 50, 75 and 90, but I am still not sure how to replicate Figure 1 of Srinivasan et al. (2013). Also, in that paper, how would you further decompose the covariate and coefficient effects into the contributions of the individual covariates shown in Tables 4 and 5?

      For instance, running the following command will show the decomposition for the lowest quantile.

      oaxaca_rif milk logmpce hh_size hheduc femhh rural agri [aw=hhwt], cluster (FSU_Serial_No) by(hindu) wgt(1) rif(q(10))

      But I don't know which command they used to further decompose the covariate and coefficient effects into the contributions of individual characteristics.

      Many thanks for your help,
      Samira.



      • #4
        I'm glad you are finding the program useful.
        I'm not sure, however, why you do not see the desired results. oaxaca_rif should automatically give you the detailed decomposition.
        Can you share the exact results you are obtaining?
        Thank you
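To see what the detailed "explained" and "unexplained" rows of an RIF decomposition contain, here is a minimal by-hand sketch. It assumes the standard Oaxaca-Blinder formulas applied to group-specific RIF regressions, with group 1's coefficients as the reference (which is what the "Group c: x2*b1" line printed with wgt(1) suggests); it is not oaxaca_rif's own code and computes no standard errors. The data (nlsw88), the grouping (union), the covariates and all variable, matrix and scalar names are hypothetical choices for illustration.

```stata
* Sketch (assumption): Oaxaca-Blinder decomposition of RIF(q10) by hand.
* Hypothetical setup: nlsw88 data; group 1 = union==0, group 2 = union==1;
* reference coefficients b1 (group 1), so the counterfactual is x2*b1.
sysuse nlsw88, clear
generate double lnwage = ln(wage)
local X grade ttl_exp tenure
generate byte touse = !missing(lnwage, union, grade, ttl_exp, tenure)
generate byte one = 1

* Group-specific RIF of the 10th percentile (Gaussian kernel, Silverman bandwidth)
generate double rif10 = .
forvalues g = 0/1 {
    _pctile lnwage if touse & union==`g', p(10)
    scalar sc_q = r(r1)
    quietly summarize lnwage if touse & union==`g', detail
    scalar sc_h = 0.9*min(r(sd), (r(p75)-r(p25))/1.349)*r(N)^(-1/5)
    generate double kern = normalden((lnwage - scalar(sc_q))/scalar(sc_h)) if touse & union==`g'
    quietly summarize kern, meanonly
    scalar sc_f = r(mean)/scalar(sc_h)
    replace rif10 = scalar(sc_q) + (0.10 - (lnwage <= scalar(sc_q)))/scalar(sc_f) if touse & union==`g'
    drop kern
}

* RIF regressions by group; e(b) is ordered: grade ttl_exp tenure _cons
quietly regress rif10 `X' if touse & union==0
matrix Bg1 = e(b)
quietly regress rif10 `X' if touse & union==1
matrix Bg2 = e(b)

* Group means of X; "one" stands in for the constant term
quietly tabstat `X' one if touse & union==0, statistics(mean) save
matrix Xm1 = r(StatTotal)
quietly tabstat `X' one if touse & union==1, statistics(mean) save
matrix Xm2 = r(StatTotal)

* Detailed rows: explained_k = (x1_k - x2_k)*b1_k, unexplained_k = x2_k*(b1_k - b2_k)
scalar sc_expl = 0
scalar sc_unex = 0
local names grade ttl_exp tenure _cons
forvalues j = 1/4 {
    local v : word `j' of `names'
    scalar sc_e = (Xm1[1,`j'] - Xm2[1,`j']) * Bg1[1,`j']
    scalar sc_u = Xm2[1,`j'] * (Bg1[1,`j'] - Bg2[1,`j'])
    scalar sc_expl = scalar(sc_expl) + scalar(sc_e)
    scalar sc_unex = scalar(sc_unex) + scalar(sc_u)
    display %-8s "`v'" "  explained: " %9.4f scalar(sc_e) "  unexplained: " %9.4f scalar(sc_u)
}

* Totals: explained + unexplained equals the gap in mean RIF (OLS with a constant)
quietly summarize rif10 if touse & union==0, meanonly
scalar sc_m1 = r(mean)
quietly summarize rif10 if touse & union==1, meanonly
scalar sc_m2 = r(mean)
display "explained = " scalar(sc_expl) "  unexplained = " scalar(sc_unex)
display "sum = " scalar(sc_expl) + scalar(sc_unex) "  mean RIF gap = " scalar(sc_m1) - scalar(sc_m2)
```

The explained row for _cons is always zero (both groups have a constant of 1), which is why only the unexplained part lists _cons.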



        • #5
          Code:
          oaxaca_rif pcfruitvegg logmpce hh_size hheduc femhh rural agri [aw=hhwt], cluster (FSU_Serial_No) by(hindu) wgt(1) rif(q(10))
          No Reweighted Strategy Choosen
          Estimating Standard RIF-OAXACA using RIF:q(10)
          Model  : Oaxaca-Blinder RIF-decomposition
          Type   : Standard
          RIF    : q(10)
          Scale  : 1
          Group 1: hindu= 0    N of obs 1 = 32022
          Group c: x2*b1       N of obs C = .
          Group 2: hindu= 1    N of obs 2 = 13353

          ------------------------------------------------------------------------------
           pcfruitvegg |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          overall      |
               group_1 |   100.2119   1.973394    50.78   0.000     96.34407    104.0796
               group_2 |   66.04037   2.416401    27.33   0.000     61.30432    70.77643
            difference |   34.17148   3.098817    11.03   0.000     28.09791    40.24505
             explained |   21.46223   2.478677     8.66   0.000     16.60411    26.32035
           unexplained |   12.70925   4.160562     3.05   0.002       4.5547     20.8638
          -------------+----------------------------------------------------------------
          explained    |
               logmpce |   24.56972   2.803273     8.76   0.000     19.07541    30.06404
               hh_size |   -.329022   .1980491    -1.66   0.097    -.7171912    .0591471
                hheduc |   2.964485    1.68645     1.76   0.079    -.3408963    6.269865
                 femhh |   .0035343    .020232     0.17   0.861    -.0361197    .0431883
                 rural |  -2.861022   1.317567    -2.17   0.030    -5.443406   -.2786387
                agrihh |  -2.099375   .5005418    -4.19   0.000    -3.080419   -1.118332
          -------------+----------------------------------------------------------------
          unexplained  |
               logmpce |  -97.91773   47.87332    -2.05   0.041    -191.7477   -4.087752
               hh_size |  -3.547644   7.259225    -0.49   0.625    -17.77546    10.68018
                hheduc |   10.14495    3.61851     2.80   0.005     3.052796     17.2371
                 femhh |   .8236559   1.277081     0.64   0.519    -1.679377    3.326689
                 rural |  -6.098612   6.621972    -0.92   0.357    -19.07744    6.880214
                agrihh |  -7.444178   3.552301    -2.10   0.036    -14.40656   -.4817967
                 _cons |   116.9414   54.00685     2.17   0.030     11.08992    222.7929
          ------------------------------------------------------------------------------




          • #6
            Thank you for your response. As you can see, for the 10th quantile it shows, for each explanatory variable, the covariate effect in the "explained" part and the coefficient effect in the "unexplained" part. But in Table 4 of that paper, they show the covariate and coefficient effects of these explanatory variables for both the explained and the unexplained parts. I hope that makes sense!

            Thank you,
            Samira.



            • #7
              I see.
              To understand what that component is, I would suggest reading the Firpo, Fortin and Lemieux (2018) paper: https://www.mdpi.com/2225-1146/6/2/28/pdf-vor
              and the paper that explains the command, with a couple of examples, Rios-Avila (2019): http://www.levyinstitute.org/pubs/wp_927.pdf

              In a nutshell, these components, which the paper you cite refers to as unexplained, are what FFL (2018) call the specification error and the reweighting error. To obtain those results, you need to use syntax similar to the following:
              Code:
              oaxaca_rif lnwage educ exper tenure, by(female) wgt(1) rif(q(50)) rwlogit(educ exper tenure)
              This uses the reweighted RIF decomposition rather than just the Oaxaca RIF decomposition.
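One way to write the reweighted decomposition is sketched below, following the structure of FFL (2018) but in the thread's notation, where group c is group 1 reweighted to have group 2's characteristics (as in the "Group c: x2*b1" line printed with wgt(1)). Here the hat-nu terms are the distributional statistic (for example q10) of each group, the X-bar terms are (reweighted) covariate means and the hat-beta terms are RIF-regression coefficients; oaxaca_rif's exact conventions may differ, so treat this as an illustration rather than the command's definition.

```latex
\begin{aligned}
\hat\nu_g &= \bar X_g \hat\beta_g \qquad (g = 1, c, 2),\\
\hat\nu_1 - \hat\nu_2 &= (\hat\nu_1 - \hat\nu_c) + (\hat\nu_c - \hat\nu_2),\\
\hat\nu_1 - \hat\nu_c &= \underbrace{(\bar X_1 - \bar X_c)\hat\beta_1}_{\text{pure explained}}
  + \underbrace{\bar X_c(\hat\beta_1 - \hat\beta_c)}_{\text{specification error}},\\
\hat\nu_c - \hat\nu_2 &= \underbrace{\bar X_2(\hat\beta_c - \hat\beta_2)}_{\text{pure unexplained}}
  + \underbrace{(\bar X_c - \bar X_2)\hat\beta_c}_{\text{reweighting error}}.
\end{aligned}
```

The specification error should be small when the RIF regression is well specified (both coefficient vectors come from group 1's data), and the reweighting error should be small when the reweighting makes the counterfactual means match group 2's means.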
              HTH
              Fernando



              • #8
                Thank you so much Fernando!

                Samira.



                • #9
                  Hi Fernando,

                  In the World Development paper by Cavatorta et al. (2015) https://www.sciencedirect.com/scienc...05750X15001655, which explains cross-state disparities in child nutrition in rural India, Figure 1 plots density functions. Do you know the Stata code to plot this figure (the Tamil Nadu distribution vs. the Bihar distribution, along with the counterfactual) using the oaxaca_rif command?

                  I'm doing a RIF counterfactual decomposition analysis using the NSS data for quantiles Q10-Q90 and it would be nice to demonstrate the results graphically.

                  Many thanks for your help,
                  Samira.



                  • #10
                    Hi Samira,
                    Unfortunately, there is no command that produces those figures automatically. I tried to write one, but because graphs usually require a lot of detail in formatting, I gave up on the idea. In any case, below is some code that can be used to replicate the density figures.

                    Code:
                    webuse cattaneo2, clear
                    * treatment mbsmoke. Decomposition using Reweighted option
                    oaxaca_rif bweight prenatal1 mmarried mage fbaby, by(mbsmoke) w(0) rwprobit(mmarried c.mage##c.mage fbaby medu) rif(q(50))
                    ** this probit is internally estimated when using oaxaca_rif with the rwprobit option
                    probit mbsmoke  mmarried c.mage##c.mage fbaby medu
                    predict pr
                    
                    ** this is equivalent to using w(0) in the oaxaca_rif command
                    gen wc1=(1-pr)/pr if mbsmoke==1
                    ** this is equivalent to using w(1) in the oaxaca_rif command
                    gen wc2=pr/(1-pr) if mbsmoke==0
                    
                    ** Equivalent to figure 1
                    two kdensity bweight if mbsmoke==0 || kdensity bweight if mbsmoke==1 || kdensity bweight if mbsmoke==1 [aw=wc1], ///
                                legend(order(1 "Non-smokers" 2 "Smokers" 3 "Counterfactual: smokers reweighted to non-smoker characteristics"))
                    ** Equivalent to figure 1 with alternative counterfactual
                    two kdensity bweight if mbsmoke==0 || kdensity bweight if mbsmoke==1 || kdensity bweight if mbsmoke==0 [aw=wc2], ///
                                legend(order(1 "Non-smokers" 2 "Smokers" 3 "Counterfactual: non-smokers reweighted to smoker characteristics"))
                    ** Some people also like to use CDFs for comparing the distribution, so you can use the following
                    ** Using CDFs
                    cumul bweight if mbsmoke==0, gen(cdf0)
                    cumul bweight if mbsmoke==1, gen(cdf1)
                    cumul bweight if mbsmoke==1 [aw=wc1], gen(cdfc1)
                    cumul bweight if mbsmoke==0 [aw=wc2], gen(cdfc2)
                    two line cdf0 bweight, sort || line cdf1 bweight, sort || line cdfc1 bweight, sort legend(order(1 "Non-smokers" 2 "Smokers" 3 "Counterfactual: smokers reweighted to non-smoker characteristics"))
                    two line cdf0 bweight, sort || line cdf1 bweight, sort || line cdfc2 bweight, sort legend(order(1 "Non-smokers" 2 "Smokers" 3 "Counterfactual: non-smokers reweighted to smoker characteristics"))
                    HTH
                    Fernando
                    Last edited by FernandoRios; 03 Jul 2019, 10:51.
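The same weights can also be used to see, in raw numbers, how a reweighted decomposition splits a gap. Below is a minimal sketch at the median, assuming group 1 = non-smokers and group 2 = smokers (oaxaca_rif treats the lower value of by() as group 1), with the counterfactual built from the wc2 = pr/(1-pr) weights of Fernando's code above, i.e. the "x2*b1" (wgt(1)) case. The scalar names are hypothetical, and oaxaca_rif's Group_1, Group_c and Group_2 are RIF-regression based, so they should be close to, but need not equal, these raw weighted medians.

```stata
* Sketch (assumption): reweighted decomposition of the median gap by hand.
* Group 1 = nonsmokers (mbsmoke==0), group 2 = smokers (mbsmoke==1).
* Counterfactual c = nonsmokers reweighted to smoker characteristics.
webuse cattaneo2, clear
quietly probit mbsmoke mmarried c.mage##c.mage fbaby medu
predict double pr, pr
generate double wc2 = pr/(1 - pr) if mbsmoke==0

* Medians of the two observed distributions and of the counterfactual
_pctile bweight if mbsmoke==0, p(50)
scalar sc_q1 = r(r1)
_pctile bweight if mbsmoke==0 [aw=wc2], p(50)
scalar sc_qc = r(r1)
_pctile bweight if mbsmoke==1, p(50)
scalar sc_q2 = r(r1)

* Overall gap = composition part + structure part
display "overall     (q1 - q2): " scalar(sc_q1) - scalar(sc_q2)
display "composition (q1 - qc): " scalar(sc_q1) - scalar(sc_qc)
display "structure   (qc - q2): " scalar(sc_qc) - scalar(sc_q2)
```

Because the counterfactual sits between the two observed distributions, the two parts always add up to the overall gap; what changes with the choice of weights (wc1 vs. wc2) is how the gap is split.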



                    • #11
                      Hi Fernando,

                      Thank you so much!! This is extremely helpful. Sorry to bother you again but I have just one last question. Hope you can help me out.

                      I have used your syntax to run the reweighted RIF decomposition:
                      Code:
                      oaxaca_rif lnwage educ exper tenure, by(female) wgt(1) rif(q(50)) rwlogit(educ exper tenure)
                      For my analysis, I need to calculate the contribution of individual characteristics to caste differences in vegetable consumption (similar to Table 4 in Srinivasan et al. (2013)). I want to double-check that I'm calculating the contributions correctly. Below is the Stata output for one of the covariates, log per capita expenditure, at the lowest quantile (Q10).
                                       (1)      (2)        (3)             (4)         (5)          (6)               (7)
                      VARIABLES        Overall  Explained  Pure_explained  Specif_err  Unexplained  Pure_Unexplained  Reweight_err
                      logmpce          29.611***   -28.232    -68.024    -1.242
                                       (3.168)     (39.698)   (71.985)   (2.221)
                      Group_1          108.014***
                                       (1.420)
                      Group_c          81.623***
                                       (2.991)
                      Group_2          74.248***
                                       (1.589)
                      Tdifference      33.766***
                                       (2.131)
                      ToT_Explained    26.391***
                                       (3.289)
                      ToT_Unexplained  7.375*
                                       (4.425)
                      Total            26.391***   7.375*
                                       (3.289)     (4.425)
                      Pure_explained   26.246***
                                       (4.559)
                      Specif_err       0.144
                                       (4.052)
                      Reweight_err     4.192
                                       (2.909)
                      Pure_Unexplained 3.183
                                       (4.005)
                      For the covariate effect, the contribution of log per capita expenditure is (29.611/26.246)*100 = 112.8%.
                      For the coefficient effect, the contribution of log per capita expenditure is (-68.024/3.183)*100 = -2137%.
                      Is this correct?
                      Thank you so much for your help.
                      Samira.
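A quick way to check both the question's arithmetic and how the reported pieces fit together is to recompute them from the printed Q10 figures. This is a minimal sketch: the numbers are copied from the table above (rounded to three decimals, so sums agree only up to rounding), the scalar names are hypothetical, and, as in the question, 29.611 and -68.024 are taken to be the logmpce entries of the pure explained and pure unexplained components.

```stata
* Figures copied from the Q10 output above (rounded to three decimals)
scalar sc_g1   = 108.014
scalar sc_gc   = 81.623
scalar sc_g2   = 74.248
scalar sc_pexp = 26.246
scalar sc_spec = 0.144
scalar sc_puex = 3.183
scalar sc_rwer = 4.192

* Tdifference = Group_1 - Group_2
display scalar(sc_g1) - scalar(sc_g2)
* ToT_Explained = Group_1 - Group_c = Pure_explained + Specif_err
display scalar(sc_g1) - scalar(sc_gc) "   " scalar(sc_pexp) + scalar(sc_spec)
* ToT_Unexplained = Group_c - Group_2 = Pure_Unexplained + Reweight_err
display scalar(sc_gc) - scalar(sc_g2) "   " scalar(sc_puex) + scalar(sc_rwer)

* Relative shares of logmpce, as computed in the question
scalar sc_sh_x = 29.611/scalar(sc_pexp)*100
scalar sc_sh_b = -68.024/scalar(sc_puex)*100
display %9.1f scalar(sc_sh_x) "   " %9.1f scalar(sc_sh_b)
```

The two explained figures differ only in the third decimal, which is rounding; the shares reproduce the 112.8% and -2137% in the question.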



                      • #12
                        Hi Samira
                        I think your interpretation is technically correct. But, as you point out, the relative shares are very large (over 2000% in absolute value). It may be better to report the detailed decomposition in absolute rather than relative terms, which makes the results easier to interpret.
                        Fernando



                        • #13
                          Thank you so much for your help, Fernando. Is there any reason why I'm getting such large relative shares?



                          • #14
                            There is no particular reason, nor is it a sign of a problem.
                            It simply means that, within the coefficient and composition components, some characteristics are counterbalancing each other.
                            For example, it is possible that, say, education makes a positive contribution to the gap whereas another characteristic makes a negative contribution, and in total they almost cancel each other out. In a case like this, the component total is close to zero, so dividing each contribution by it produces these extremely large relative shares.
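A tiny numeric illustration of that point, using made-up contributions (not from the thread's data): two characteristics that nearly offset each other leave a small total, and shares computed against that total explode even though they still sum to 100%. In the Q10 output above, the same thing happens with a pure unexplained total of 3.183 set against a logmpce contribution of -68.024. The scalar names are hypothetical.

```stata
* Hypothetical detailed contributions (illustrative numbers only):
* characteristic A pushes the gap up, characteristic B pushes it down.
scalar sc_ca  = 10
scalar sc_cb  = -9.5
scalar sc_tot = scalar(sc_ca) + scalar(sc_cb)

* Relative shares = contribution / total * 100
scalar sc_sa = 100*scalar(sc_ca)/scalar(sc_tot)
scalar sc_sb = 100*scalar(sc_cb)/scalar(sc_tot)
display "total = " scalar(sc_tot) ", share A = " scalar(sc_sa) "%, share B = " scalar(sc_sb) "%"
```

Reporting the contributions themselves (10 and -9.5) alongside the small total conveys the same information without the misleading 2000% and -1900% figures.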
                            HTH
                            Fernando



                            • #15
                              Thanks a lot for the explanation !!

