Dear all,
I am currently working on a project that uses an Oaxaca-Blinder (OB) decomposition to decompose the test score gap between private and public schools.
In particular, I want to apply an OB-RIF (recentered influence function) decomposition to see how the explained and unexplained contributions of educational inputs (e.g., teachers' characteristics) evolve across the distribution of the score gap.
I have read Jann (2008) and Rios-Avila (2020), which describe the two OB commands ("oaxaca" and "oaxaca_rif", respectively), and tried to understand them as well as I can.
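For readers less familiar with the RIF approach, a minimal sketch of the recentered influence function of an unconditional quantile may help: RIF(y; q_tau) = q_tau + (tau - 1{y <= q_tau}) / f(q_tau). Its sample mean equals q_tau (exactly in this example, because the empirical CDF at q_tau equals tau), so regressing the RIF on covariates by OLS with an intercept lets the usual OB algebra decompose a quantile gap instead of a mean gap. The Gaussian kernel, the Silverman bandwidth and the name `quantile_rif` are illustrative assumptions of this sketch, not what oaxaca_rif does internally.

```python
import math

def quantile_rif(y, tau):
    """Return (q_tau, RIF values) for the tau-th unconditional quantile of y.

    Illustrative sketch: density at q_tau is estimated with a Gaussian kernel
    and Silverman's rule-of-thumb bandwidth (assumed choices).
    """
    n = len(y)
    q = sorted(y)[math.ceil(tau * n) - 1]          # inverse empirical CDF
    mean = sum(y) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / (n - 1))
    h = 1.06 * sd * n ** (-0.2)                    # Silverman rule-of-thumb bandwidth
    # Gaussian kernel density estimate of f at q
    f_q = sum(math.exp(-0.5 * ((q - v) / h) ** 2) for v in y) / (n * h * math.sqrt(2 * math.pi))
    # RIF(y; q) = q + (tau - 1{y <= q}) / f(q)
    rif = [q + (tau - (1.0 if v <= q else 0.0)) / f_q for v in y]
    return q, rif

scores = [float(i) for i in range(1, 101)]
q50, rif50 = quantile_rif(scores, 0.5)
```

In an OB-RIF setting this RIF would be computed within each group, and the OB decomposition would then be run on it in place of the raw score.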
- My first question concerns the standard (non-reweighted) RIF decomposition. Jann's (2008) "oaxaca" command offers a "pooled" option for the twofold decomposition, but "oaxaca_rif" has no such option and requires either w(1) or w(0), which amounts to assuming that discrimination is directed toward only one group. In my setting, group A = private and group B = public, so I use w(1). Does that really make sense? Is it normal that "oaxaca_rif" has no "pooled" option, at least for estimation at the mean?
- My second question concerns the choice between the standard and the reweighted OB-RIF when my reweighting model is not ideal. When I run "oaxaca_rif" with "rwlogit" reweighting, some specification errors are significant (not at the mean, but at Q10, Q20 and Q80). As stated by Rios-Avila (2019), this suggests that the latent model is misspecified. However, I cannot improve this model further. What would you do? Use the reweighted model anyway, since it is important to estimate the counterfactual distribution at least approximately, or use the non-reweighted model?
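To make the reference-coefficient issue in the first question concrete, here is a minimal sketch (one covariate, NumPy) of the twofold decomposition of the mean gap under three choices of reference coefficients: group A's, group B's, and a pooled regression over both groups that includes a group indicator (as Jann (2008) describes for "pooled"). The function name `twofold` and the labels "A"/"B"/"pooled" are hypothetical; which group w(1) and w(0) refer to in oaxaca_rif should be checked in its help file. In every case explained + unexplained equals the raw gap; only the split between them changes.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def twofold(xa, ya, xb, yb, reference="A"):
    """Twofold OB decomposition of mean(ya) - mean(yb); reference in {'A', 'B', 'pooled'}."""
    Xa = np.column_stack([np.ones(len(ya)), xa])
    Xb = np.column_stack([np.ones(len(yb)), xb])
    ba, bb = ols(Xa, ya), ols(Xb, yb)
    if reference == "A":
        bref = ba
    elif reference == "B":
        bref = bb
    elif reference == "pooled":
        # Pooled regression over both groups with a group indicator;
        # the indicator's coefficient is dropped from the reference vector
        d = np.r_[np.ones(len(ya)), np.zeros(len(yb))]
        Xp = np.column_stack([np.vstack([Xa, Xb]), d])
        bref = ols(Xp, np.r_[ya, yb])[:-1]
    else:
        raise ValueError("reference must be 'A', 'B' or 'pooled'")
    ma, mb = Xa.mean(axis=0), Xb.mean(axis=0)
    explained = (ma - mb) @ bref                        # endowments valued at bref
    unexplained = ma @ (ba - bref) + mb @ (bref - bb)   # coefficient differences
    return explained, unexplained

# Simulated data (hypothetical): groups differ in both means and slopes
rng = np.random.default_rng(0)
xa = rng.normal(1.0, 1.0, 300)
xb = rng.normal(0.0, 1.0, 500)
ya = 2.0 + 1.5 * xa + rng.normal(0.0, 1.0, 300)
yb = 1.0 + 1.0 * xb + rng.normal(0.0, 1.0, 500)
gap = ya.mean() - yb.mean()
results = {ref: twofold(xa, ya, xb, yb, ref) for ref in ("A", "B", "pooled")}
```

Comparing the three explained values on real data gives a quick sense of how sensitive the mean decomposition is to the reference choice.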
For information, I attach a graph showing how the explained and unexplained parts vary across the distribution of the score gap. The results are quite sensitive to the model used, and I am a bit lost. I prefer the reweighted model because it can estimate a counterfactual distribution (even if imperfectly, at least at some quantiles), but I wonder whether that is the right decision, or whether there is a rule of thumb for choosing between the reweighted and the standard model.
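The reweighted decomposition can be sketched in the same way for the mean, where the RIF is simply y. Assuming DiNardo-Fortin-Lemieux-type weights from a logit of group membership, this sketch reweights group B to resemble group A's covariates and splits the gap into pure composition, specification error, pure structure and reweighting error. The specification error, x̄_C(β_C - β_B), is non-zero when B's coefficients change under reweighting, which is what a misspecified linear approximation produces; the quadratic term in the simulated outcome is there for that reason. All names are hypothetical, and oaxaca_rif's own estimator details (which group is reweighted, how RIFs are computed for the counterfactual) may differ.

```python
import numpy as np

def fit_logit(X, d, iters=50):
    """Logit MLE by Newton-Raphson; X must already contain a constant column."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (d - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        b = b + np.linalg.solve(hess, grad)
    return b

def wls(X, y, w):
    """Weighted least squares via rescaling rows by sqrt(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

def reweighted_ob_mean(xa, ya, xb, yb):
    """Four-part reweighted OB decomposition of mean(ya) - mean(yb) (sketch)."""
    Xa = np.column_stack([np.ones(len(ya)), xa])
    Xb = np.column_stack([np.ones(len(yb)), xb])
    # Logit of membership in group A, estimated on the pooled sample
    d = np.r_[np.ones(len(ya)), np.zeros(len(yb))]
    g = fit_logit(np.vstack([Xa, Xb]), d)
    pb = 1.0 / (1.0 + np.exp(-Xb @ g))
    # DFL-type weights: make B's covariate distribution resemble A's
    w = pb / (1.0 - pb) * len(yb) / len(ya)
    ba = wls(Xa, ya, np.ones(len(ya)))
    bb = wls(Xb, yb, np.ones(len(yb)))
    bc = wls(Xb, yb, w)                          # counterfactual coefficients
    ma, mb = Xa.mean(axis=0), Xb.mean(axis=0)
    mc = (Xb * w[:, None]).sum(axis=0) / w.sum()  # reweighted covariate means
    return {
        "pure_composition": (mc - mb) @ bb,
        "specification_error": mc @ (bc - bb),
        "pure_structure": ma @ (ba - bc),
        "reweighting_error": (ma - mc) @ bc,
    }

# Simulated data (hypothetical): outcome is quadratic in x, model is linear in x
rng = np.random.default_rng(1)
xa = rng.normal(1.0, 1.0, 400)
xb = rng.normal(0.0, 1.0, 600)
ya = 2.0 + 1.5 * xa + 0.3 * xa**2 + rng.normal(0.0, 1.0, 400)
yb = 1.0 + 1.0 * xb + 0.3 * xb**2 + rng.normal(0.0, 1.0, 600)
parts = reweighted_ob_mean(xa, ya, xb, yb)
gap = ya.mean() - yb.mean()
```

The four parts add up to the raw gap by construction; the first two form the explained component and the last two the unexplained one.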
Thank you for any answers you can provide on these questions,
Best,