I am working on the impact of education on wage inequality between 2002 and 2010. Knight and Sabot (1983) use a simulation method that breaks the change in inequality down into two effects: compression and composition (the latter from the Kuznets effect). I have attached the paper if anyone wants to explore it. I am trying to apply this method to two data sets, one from 2002 and the other from 2010. However, my implementation in Stata seems to be giving incorrect results. I would really appreciate it if someone could help me out with this. I explain my method in detail below:
- Put the 2 data sets in one file for the years 2002 and 2010 (hereafter referred to as 02 and 10).
- Create dummy variables for all education levels where E1=no education, E2=primary, E3=lower secondary, E4=upper secondary and E5=tertiary
- The focus is on wage inequality, so log wages were calculated
- Estimate an earnings function
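For reference, a minimal sketch of the setup steps above as I understand them. The variable names (lnwage, wage, year, E1-E5, exper) are placeholders I am assuming for illustration, not the names in my actual files, and the specification is deliberately simplified.

```stata
* Hypothetical variable names: wage, lnwage, year, E1-E5, exper
* Log wages, since the focus is on wage inequality
generate double lnwage = ln(wage)

* Earnings function for 2002; E1 (no education) is the omitted category
regress lnwage E2 E3 E4 E5 exper if year == 2002
estimates store ef_02

* Same specification for 2010
regress lnwage E2 E3 E4 E5 exper if year == 2010
estimates store ef_10

* Side-by-side comparison of the two sets of coefficients
estimates table ef_02 ef_10, b se
```

Storing both sets of estimates makes it possible to switch between them later with "estimates restore" rather than re-running the regressions.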
Methodology in Stata:
“_b[E1_10]” is the saved coefficient from Equation 1 and “[E1_02]” is the 2002 education coefficient
The same is done for E2, E3, E4 and E5
These new variables (E1_coeff, E2_coeff, E3_coeff, E4_coeff and E5_coeff) are substituted into the year 2002 earnings function to replace the original variables E1_02, E2_02, E3_02, E4_02 and E5_02.
Predicted wages and the variance for the predicted wages are obtained.
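A minimal sketch of one way to get the compression counterfactual without re-estimating anything: keep the 2002 constant and non-education coefficients, and plug the 2010 education coefficients into a hand-built linear prediction for the 2002 observations. This is my reading of the step, not necessarily the exact Knight and Sabot procedure, and all variable and scalar names (lnwage, year, E2-E5, exper, b2_10 and so on) are assumptions for illustration.

```stata
* Hypothetical names: lnwage, year, E1-E5, exper
* 2010 earnings function; save its education coefficients as scalars
regress lnwage E2 E3 E4 E5 exper if year == 2010
scalar b2_10 = _b[E2]
scalar b3_10 = _b[E3]
scalar b4_10 = _b[E4]
scalar b5_10 = _b[E5]

* 2002 earnings function; its constant and exper coefficient are kept
regress lnwage E2 E3 E4 E5 exper if year == 2002

* Ordinary 2002 fitted values, for comparison
predict double lnw_fit_02 if year == 2002, xb

* Counterfactual: 2002 people, 2002 constant and exper coefficient,
* but 2010 education coefficients. No regression is re-run here.
generate double lnw_compress = _b[_cons] + b2_10*E2 + b3_10*E3 ///
    + b4_10*E4 + b5_10*E5 + _b[exper]*exper if year == 2002

* Variance of the counterfactual predicted log wage
summarize lnw_compress, detail
display "Variance under 2010 education coefficients: " r(Var)
```

Because the counterfactual is built with "generate" from fixed coefficients, it can differ from lnw_fit_02 whenever the 2010 and 2002 education coefficients differ.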
- Composition effect: Generate proportion variables for each education category.
Methodology in Stata:
Another set of variables was created: gen E1_02_comp=E1_proportion*E1_02
The same is done for E2, E3, E4 and E5
These variables (E1_02_comp, E2_02_comp, E3_02_comp, E4_02_comp and E5_02_comp) are substituted into the year 2002 regression to replace the original variables E1_02, E2_02, E3_02, E4_02 and E5_02.
Predicted wages and the variance for the predicted wages are obtained.
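A possible alternative for the composition step, which is not what I did above: instead of multiplying the dummies by proportions, reweight the 2002 observations so that the education shares match those of 2010, and compute the weighted variance of the 2002 predicted wages. This is a reweighting sketch under my own assumptions, not the method from the paper; educ (a 1-5 categorical version of E1-E5) and the other names are hypothetical.

```stata
* Hypothetical names: lnwage, year, educ (1-5), E2-E5, exper
* Weight for each 2002 observation = 2010 share / 2002 share of its category
generate double w_comp = .
quietly count if year == 2002
local n02 = r(N)
quietly count if year == 2010
local n10 = r(N)
forvalues k = 1/5 {
    quietly count if year == 2002 & educ == `k'
    local s02 = r(N) / `n02'
    quietly count if year == 2010 & educ == `k'
    local s10 = r(N) / `n10'
    quietly replace w_comp = `s10' / `s02' if year == 2002 & educ == `k'
}

* 2002 earnings function and its fitted values (2002 coefficients throughout)
regress lnwage E2 E3 E4 E5 exper if year == 2002
predict double lnw_fit_w if year == 2002, xb

* Variance of predicted log wages under the 2010 education composition
summarize lnw_fit_w [aweight = w_comp] if year == 2002, detail
display "Variance under 2010 education shares: " r(Var)
```

Here the coefficients stay at their 2002 values and only the mix of education levels changes, which is how I understand the composition effect.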
- The resulting wage inequality is the combination of both effects:
The same is done for E2, E3, E4 and E5
These new variables (E1_02_ineq, E2_02_ineq, E3_02_ineq, E4_02_ineq and E5_02_ineq) are substituted into the year 2002 earnings function to replace the original variables E1_02, E2_02, E3_02, E4_02 and E5_02.
Predicted wages and the variance for the predicted wages are obtained.
I found this problem in my work:
The predicted wages are identical for each effect, so the variance is also identical for each effect. Is there something wrong with the way I compute predicted wages? I just use the command "predict wage_hat, xb". To obtain the variance, I use the command "summarize (wage_hat), detail" and read the variance from that output.
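One thing I should probably rule out: if "substituting" the new variables means re-running regress with them, then multiplying a regressor by a nonzero constant leaves the OLS fitted values unchanged, because the estimated coefficient rescales inversely. That alone would make "predict, xb" return the same values every time. A small check using Stata's built-in auto data (so the variables here have nothing to do with my wage data):

```stata
* Note: this clears the data in memory; run it in a separate session
sysuse auto, clear

* Baseline regression and fitted values
regress price mpg weight
predict double xb_orig, xb

* Rescale one regressor by a constant and re-estimate
generate double weight_scaled = 2.5 * weight
regress price mpg weight_scaled
predict double xb_scaled, xb

* The two sets of fitted values agree up to rounding error
generate double diff = abs(xb_orig - xb_scaled)
summarize diff
```

If this is what is happening in my case, the counterfactual would need to be built from fixed coefficients rather than from a new regression.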
Any feedback on my method will be greatly appreciated.