Dear Fernando,
I am using data from Brazil to estimate the wage gap between men and women.
I tried to use the command oaxaca_rif to estimate this gap for different quantiles.
I am particularly interested in the detailed decomposition. I would like to know the contribution of each variable X to the part of the wage gap explained by differences in the composition of the two groups, and the contribution of each variable X to the part attributed to differences in wage structures.
To understand the command, I am first running it with only two independent variables.
My simple example and results are presented below. Unfortunately, in the end, I get an error message.
Would you be so kind as to explain to me why I am not getting the detailed decomposition?
. oaxaca_rif LNwage education hoursWoked, by(man) rif(q(50)) rwlogit(education hoursWoked) noisily
No wgt specified. Using default 0
Estimating Reweighted RIF-OAXACA using RIF:q(50)
Iteration 0: log pseudolikelihood = -136538.64
Iteration 1: log pseudolikelihood = -128750.52
Iteration 2: log pseudolikelihood = -128725.35
Iteration 3: log pseudolikelihood = -128725.35
Logistic regression                             Number of obs     =    200,402
                                                Wald chi2(2)      =   12567.67
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -128725.35               Pseudo R2         =     0.0572
------------------------------------------------------------------------------
             |               Robust
    __000003 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   education |  -.1155691   .0012232   -94.48   0.000    -.1179665   -.1131717
  hoursWoked |   .0334967   .0004398    76.16   0.000     .0326347    .0343587
       _cons |   .2750013    .020349    13.51   0.000      .235118    .3148846
------------------------------------------------------------------------------
RIF regression group 1
      Source |       SS           df       MS      Number of obs   =    84,823
-------------+----------------------------------   F(2, 84820)     =  12952.10
       Model |  9217.58863         2  4608.79432   Prob > F        =    0.0000
    Residual |  30181.8157    84,820  .355833714   R-squared       =    0.2340
-------------+----------------------------------   Adj R-squared   =    0.2339
       Total |  39399.4043    84,822  .464495111   Root MSE        =    .59652
------------------------------------------------------------------------------
      LNwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   education |   .0643899   .0005345   120.47   0.000     .0633423    .0654375
  hoursWoked |   .0143415    .000164    87.46   0.000     .0140201    .0146629
       _cons |   6.080057   .0083169   731.05   0.000     6.063756    6.096358
------------------------------------------------------------------------------
Distributional Statistic: q(50)
Sample Mean RIF q(50) : 7.3693
RIF regression counterfactual group
      Source |       SS           df       MS      Number of obs   =   115,579
-------------+----------------------------------   F(2, 115576)    =  15169.09
       Model |  17041.1926         2  8520.59632   Prob > F        =    0.0000
    Residual |  64919.9404   115,576  .561707797   R-squared       =    0.2079
-------------+----------------------------------   Adj R-squared   =    0.2079
       Total |   81961.133   115,578  .709141299   Root MSE        =    .74947
------------------------------------------------------------------------------
      LNwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   education |    .081595   .0005969   136.69   0.000      .080425     .082765
  hoursWoked |    .017034   .0001869    91.15   0.000     .0166678    .0174003
       _cons |    6.04341   .0095272   634.33   0.000     6.024737    6.062083
------------------------------------------------------------------------------
Distributional Statistic: q(50)
Sample Mean RIF q(50) : 7.6353
RIF regression group 2
      Source |       SS           df       MS      Number of obs   =   115,579
-------------+----------------------------------   F(2, 115576)    =  12038.73
       Model |  22590.0487         2  11295.0244   Prob > F        =    0.0000
    Residual |  108436.154   115,576  .938223799   R-squared       =    0.1724
-------------+----------------------------------   Adj R-squared   =    0.1724
       Total |  131026.203   115,578  1.13366041   Root MSE        =    .96862
------------------------------------------------------------------------------
      LNwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   education |   .0799026   .0006567   121.66   0.000     .0786154    .0811898
  hoursWoked |   .0207496   .0002479    83.69   0.000     .0202636    .0212355
       _cons |   5.899544   .0118873   496.29   0.000     5.876245    5.922843
------------------------------------------------------------------------------
Distributional Statistic: q(50)
Sample Mean RIF q(50) : 7.5521
overall:group_1 not found
r(111);
When I click on r(111), I get the text below, but none of these cases seems to describe my problem.
[P] error . . . . . . . . . . . . . . . . . . . . . . . . Return code 111
    __________ not found;
    no variables defined;
        The variable does not exist. You may have mistyped the
        variable's name.
    variables out of order;
        You specified a varlist containing varname1-varname2, yet
        varname1 occurs after varname2. Reverse the order of the
        variables if you did not make some other typographical error.
        Remember, varname1-varname2 is taken by Stata to mean varname1,
        varname2, and all the variables in dataset order in between.
        Type describe to see the order of the variables in your dataset.
    __________ not found in using data;
        You specified a varlist with merge, but the variables on which
        you wish to merge are not found in the using dataset, so the
        merge is not possible.
    __________ ambiguous abbreviation;
        You typed an ambiguous abbreviation for a variable in your data.
        The abbreviation could refer to more than one variable. Use a
        nonambiguous abbreviation, or if you intend all the variables
        implied by the ambiguous abbreviation, append a `*' to the end
        of the abbreviation.
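As a rough cross-check, the aggregate pieces of the decomposition can be recovered by hand from the three "Sample Mean RIF q(50)" values in the output above, even though the command stops before reporting them. The sketch below assumes the usual reweighting logic: the counterfactual group consists of group 2 observations reweighted to resemble group 1 in education and hoursWoked, so "group 1 minus counterfactual" is read as the wage structure part and "counterfactual minus group 2" as the composition part. That labeling and the scalar names are my own assumptions, not output of oaxaca_rif.

```stata
* Sample means of the RIF for q(50), copied from the output above.
* Scalar names are hypothetical and chosen only for this illustration.
scalar rif_g1 = 7.3693
scalar rif_c  = 7.6353
scalar rif_g2 = 7.5521

* Overall gap at the median (group 1 minus group 2)
display "overall gap          : " %7.4f rif_g1 - rif_g2

* Assumed wage structure part: group 1 minus counterfactual
display "g1 - counterfactual  : " %7.4f rif_g1 - rif_c

* Assumed composition part: counterfactual minus group 2
display "counterfactual - g2  : " %7.4f rif_c - rif_g2
```

With the values above, the three lines should display -0.1828, -0.2660 and 0.0832, and the last two sum to the first. This recovers only the two aggregate terms; the per-variable contributions I am asking about also require the group means of each X, which the output does not show.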
Many Thanks!
Cassia
P.S.: I saw your e-mail address in the paper "Recentered influence functions (RIFs) in Stata", and I am taking the liberty of sending you the data now.