Advice on how to implement an inequality method in Stata

Anuththari Bandara

17 Feb 2015, 01:45

I am working on the impact of education on wage inequality between 2002 and 2010. Knight and Sabot (1983) used a simulation method that breaks inequality down into two effects, compression and composition (the composition effect relating to the Kuznets effect). I have attached the paper if anyone wants to explore it. I am trying to apply this method using two data sets, one from 2002 and the other from 2010. However, my implementation seems to be giving incorrect results in Stata. I would really appreciate it if someone could help me out with this. I will explain my method in detail below:
1. Put the two data sets, for 2002 and 2010 (hereafter referred to as 02 and 10), in one file.

2. Create dummy variables for all education levels, where E1 = no education, E2 = primary, E3 = lower secondary, E4 = upper secondary and E5 = tertiary.

3. Since the focus is on wage inequality, compute log wages.

4. Estimate an earnings function.

5. Compression effect: Take the education coefficients from the 2010 regression and substitute them into the other (2002) regression.

Methodology in Stata:
gen E1_coeff=_b[E1_10]*E1_02, where _b[E1_10] is the education coefficient saved from the 2010 earnings function and E1_02 is the 2002 education dummy.
The same is done for E2, E3, E4 and E5.
These new variables (E1_coeff, E2_coeff, E3_coeff, E4_coeff and E5_coeff) are substituted into the 2002 earnings function in place of the original variables E1_02, E2_02, E3_02, E4_02 and E5_02.
Predicted wages and the variance of the predicted wages are then obtained.
6. Composition effect: Generate a proportion variable for each education category.

Here λ is the ratio of the share of workers in an education category in 2010 to the share of workers in the same category in 2002, so it represents the change in the educational distribution between the two years.
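Written out, using notation introduced here only for illustration (n_j^t is the number of wage earners in education category j in year t, and N^t is the total number of wage earners in year t), the proportion for category j is:

$$\lambda_j = \frac{n_j^{10} / N^{10}}{n_j^{02} / N^{02}}, \qquad j = 1, \dots, 5$$

For example, the no-education value hard-coded in the code posted later in this thread, (1014/26874)/(1398/25835), is about 0.697.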

Methodology in Stata:
Another set of variables was created: gen E1_02_comp=E1_proportion*E1_02
The same is done for E2, E3, E4 and E5.
These variables (E1_02_comp, E2_02_comp, E3_02_comp, E4_02_comp and E5_02_comp) are substituted into the 2002 regression in place of the original variables E1_02, E2_02, E3_02, E4_02 and E5_02.
Predicted wages and the variance of the predicted wages are then obtained.
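Multiplying a dummy by a constant such as λ and re-running regress only rescales that regressor, so regress rescales its coefficient and the fitted values stay the same. A different way to mimic a change in educational composition, swapped in here and not necessarily Knight and Sabot's own procedure, is to keep the 2002 coefficients fixed and reweight each 2002 worker in category j by λ_j. The sketch below uses a made-up stacked data set (year, educ, z and w are hypothetical names) and reweights on education only, so the 2002 distribution of other covariates within each education group is kept.

```stata
* Made-up stacked data for illustration: year, education category educ
* (0 = lowest) that expands between the years, covariate z, log wage w
clear
set seed 2015
set obs 2000
generate year = cond(_n <= 1000, 2002, 2010)
generate double u = runiform()
generate byte educ = cond(u < 0.5, 0, cond(u < 0.85, 1, 2)) if year == 2002
replace educ = cond(u < 0.3, 0, cond(u < 0.75, 1, 2)) if year == 2010
generate double z = rnormal()
generate double w = 1 + 0.3*(educ == 1) + 0.8*(educ == 2) + 0.2*z + rnormal(0, 0.5)

* lambda_j = share of category j in 2010 / share of category j in 2002
bysort year educ: generate double n_j = _N
bysort year: generate double n_t = _N
generate double s = n_j / n_t
egen double s10 = max(cond(year == 2010, s, .)), by(educ)
generate double lambda = s10 / s if year == 2002

* 2002 earnings function; its coefficients stay fixed throughout
regress w i.educ z if year == 2002
predict double xb02 if year == 2002, xb

* Dispersion of 2002 predictions: actual vs. 2010 educational composition
quietly summarize xb02 if year == 2002
scalar v_actual = r(Var)
quietly summarize xb02 if year == 2002 [aweight = lambda]
scalar v_comp = r(Var)
display "Var, 2002 composition: " v_actual "   Var, 2010 composition: " v_comp
```

Analytic weights only change how much each observation counts; whether this matches the simulation in the paper is an assumption worth checking against it.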
7. Overall effect: The resulting wage inequality combines both effects.

Methodology in Stata: gen E1_02_ineq=E1_coeff*E1_proportion
The same is done for E2, E3, E4 and E5.
These new variables (E1_02_ineq, E2_02_ineq, E3_02_ineq, E4_02_ineq and E5_02_ineq) are substituted into the 2002 earnings function in place of the original variables E1_02, E2_02, E3_02, E4_02 and E5_02.
Predicted wages and the variance of the predicted wages are then obtained.

I found these problems in my work:

The predicted wages are the same for each effect, so the variance is also the same for each effect. Is there something wrong in the way I compute predicted wages? I just use the command "predict wage_hat, xb". To obtain the variance, I use the command "summarize (wage_hat), detail" and read the variance from the output.
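A minimal sketch of reading the variance programmatically, using Stata's shipped auto data as a stand-in (ln(price) plays the role of log wages; the regressors are arbitrary): summarize stores the variance in r(Var), so it can be kept in a scalar instead of being read off the table. This does not by itself explain why the three effects give the same variance; that depends on how the predictions are built.

```stata
* Stand-in data shipped with Stata; ln(price) plays the role of log wages
sysuse auto, clear
generate double ln_price = ln(price)

* Earnings-style regression and its linear prediction
regress ln_price mpg weight foreign
predict double lnp_hat if e(sample), xb

* summarize (with or without detail) stores the variance in r(Var)
summarize lnp_hat, detail
scalar var_hat = r(Var)
display "Variance of predicted ln(price): " var_hat
```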

Any feedback on my method will be greatly appreciated.

Nick Cox


17 Feb 2015, 01:59

The terms and conditions of accessing www.jstor.org do not, I believe, include permission to post .pdf copies in other forums. The issue arises so long as this site is accessible to people who are not Jstor users, which is the case.

http://www.jstor.org/page/info/about/policies/terms.jsp

5. Prohibited Uses of the Content.

Institutions and users may not:

... (b) ... provide and/or authorize access to the Content available through Individual Access, the Publisher Sales Service, or other programs to persons or entities other than Authorized Users;

Last edited by Nick Cox; 17 Feb 2015, 02:07.
Anuththari Bandara


19 Feb 2015, 02:14

It's deleted. Any other advice, other than on the rules?
Stephen Jenkins


19 Feb 2015, 02:40

Advice: provide details of your Stata code, and do so using CODE delimiters (as explained in the Forum FAQ). Without specific details, how can readers try and work out where the "errors" in your work are?
Anuththari Bandara


20 Feb 2015, 02:05

Yes, I did post the method I used in my initial post, but I guess that isn't enough, so I will post the code below (for the year 2002):

COMPRESSION EFFECT:

regress ln_wage no_educ pri lower_sec upper_sec tert experience experience2 urban sinhalese if female==0 // this is the 2010 regression
gen no_educ_coeff = _b[no_educ] * no_educ2 // no_educ2, pri2, etc. are the dummy variables from 2002
gen pri_coeff = _b[pri] * pri2
gen lowersec_coeff = _b[lower_sec] * lower_sec2
gen uppersec_coeff = _b[upper_sec] * upper_sec2
gen tert_coeff = _b[tert] * tert2
regress ln_wage2 no_educ_coeff pri_coeff lowersec_coeff uppersec_coeff tert_coeff experience experience2 urban sinhalese if female==0 // substituting the above coefficients in the 2002 regression
predict x_hat, xb
summarize x_hat if female==0 & ln_wage2!=., detail
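Re-running regress on the rescaled dummies re-estimates the education coefficients, so the 2010 coefficients never make it into the prediction. A minimal sketch of computing a_02 + b_10*e_02 + c_02*z_02 directly instead, by swapping the education terms in the 2002 linear prediction. It uses a small made-up stacked data set; year, ed1, ed2, z and w are hypothetical names, not the poster's variables.

```stata
* Made-up stacked data for illustration: year, two education dummies
* ed1 ed2 (base = neither), one other covariate z, log wage w
clear
set seed 12345
set obs 2000
generate year = cond(_n <= 1000, 2002, 2010)
generate double u = runiform()
generate byte ed1 = u > 0.4 & u <= 0.8
generate byte ed2 = u > 0.8
generate double z = rnormal()
generate double w = 1 + 0.3*ed1 + 0.8*ed2 + 0.2*z + rnormal(0, 0.5) if year == 2002
replace w = 1 + 0.2*ed1 + 0.5*ed2 + 0.2*z + rnormal(0, 0.5) if year == 2010

* 2010 regression: keep only its education coefficients
regress w ed1 ed2 z if year == 2010
scalar b10_ed1 = _b[ed1]
scalar b10_ed2 = _b[ed2]

* 2002 regression: ordinary fitted values a_02 + b_02*e_02 + c_02*z_02
regress w ed1 ed2 z if year == 2002
predict double xb02 if year == 2002, xb

* Counterfactual a_02 + b_10*e_02 + c_02*z_02: swap the education terms
* without re-estimating (the 2002 regression is the active one here)
generate double xb02_comp = xb02 - _b[ed1]*ed1 - _b[ed2]*ed2 + b10_ed1*ed1 + b10_ed2*ed2 if year == 2002

quietly summarize xb02
scalar v_actual = r(Var)
quietly summarize xb02_comp
scalar v_comp = r(Var)
display "Var, 2002 coefficients: " v_actual "   Var, 2010 education coefficients: " v_comp
```

The same idea extends to five dummies and the other covariates in the post: only the education terms change, and workers in the base category keep their 2002 prediction.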

COMPOSITION EFFECT:

// obtaining the proportion in each level of education for 2002 and 2010, then generating the proportion variables
count if ln_wage!=.
count if no_educ==1 & ln_wage!=.
count if pri==1 & ln_wage!=.
count if lower_sec ==1 & ln_wage!=.
count if upper_sec ==1 & ln_wage!=.
count if tert==1 & ln_wage!=.
count if ln_wage2!=.
count if no_educ2==1 & ln_wage2!=.
count if pri2 ==1 & ln_wage2!=.
count if lower_sec2 ==1 & ln_wage2!=.
count if upper_sec2 ==1 & ln_wage2!=.
count if tert2 ==1 & ln_wage2!=.
gen no_educ_prop=(1014/26874)/(1398/25835)
gen pri_prop=(5158/26874)/(5574/25835)
gen lower_sec_prop=(5909/26874)/(6336/25835)
gen upper_sec_prop=(13614/26874)/(11762/25835)
gen tert_prop=(1066/26874)/(765/25835)
gen no_educ4=no_educ2* no_educ_prop
gen primary4=pri2* pri_prop
gen lower_sec4=lower_sec2* lower_sec_prop
gen upper_sec4=upper_sec2* upper_sec_prop
gen tert4=tert2* tert_prop
regress ln_wage2 no_educ4 primary4 lower_sec4 upper_sec4 tert4 experience experience2 urban sinhalese if female==0
predict y_hat, xb
summarize( y_hat ) if female==0 & ln_wage2!=., detail
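The counts above are typed in by hand. A sketch that computes the same proportions from the data, assuming (as the counts suggest) that the 2010 variables (ln_wage, no_educ, pri, lower_sec, upper_sec, tert) and the 2002 variables (ln_wage2, no_educ2, pri2, ...) sit side by side in one file, and that each dummy is non-missing wherever the corresponding wage is. The mean of a 0/1 dummy over wage earners is that category's share. The data generated at the top are random stand-ins.

```stata
* Random stand-in for the combined file: 2010 variables and 2002
* variables (suffix 2) side by side; the 2002 sample is smaller
clear
set seed 42
set obs 500
local cats no_educ pri lower_sec upper_sec tert
generate byte c10 = 1 + floor(5*runiform())
generate byte c02 = 1 + floor(5*runiform())
forvalues k = 1/5 {
    local e : word `k' of `cats'
    generate byte `e' = (c10 == `k')
    generate byte `e'2 = (c02 == `k')
}
generate double ln_wage = rnormal(4, 0.6)
generate double ln_wage2 = rnormal(4, 0.6) if _n <= 450

* Share of each category among wage earners = mean of its 0/1 dummy;
* each proportion variable is the 2010 share over the 2002 share
foreach e of local cats {
    quietly summarize `e' if ln_wage < .
    scalar sh10 = r(mean)
    quietly summarize `e'2 if ln_wage2 < .
    scalar sh02 = r(mean)
    generate double `e'_prop = sh10 / sh02
}
```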

OVERALL EFFECT:

gen no_educ5= no_educ_prop*no_educ_coeff
gen primary5= pri_prop*pri_coeff
gen lower_sec5= lower_sec_prop *lowersec_coeff
gen upper_sec5= upper_sec_prop *uppersec_coeff
gen tert5= tert_prop *tert_coeff
regress ln_wage2 no_educ5 primary5 lower_sec5 upper_sec5 tert5 experience experience2 urban sinhalese if female==0
predict z_hat, xb
summarize( z_hat ) if female==0 & ln_wage2!=., detail

I think there is something wrong in the way I have predicted x_hat, y_hat and z_hat as I get the same linear predictions for each effect. Also, I need to calculate the variance of the predicted values for each effect but I don't know whether the command "summarize( y_hat ), detail" (for example) is the right command.
Stephen Jenkins


20 Feb 2015, 02:32

You haven't used CODE delimiters. Please read the Forum FAQ on why use of CODE delimiters is requested. From a quick glance, your self-diagnosis about calculating predicted values is correct. You might get more buy-in from readers if you managed to (a) simplify your example so its essence was much clearer, and (b) do this with a data set that all have access to.
Anuththari Bandara


20 Feb 2015, 05:07


I will try to make it clearer.

I have two samples - from 2002 and 2010.

I first estimate a wage function: w = a + b*e + c*z + u
where "e" represents the dummy variables for each level of education, "z" is a set of independent variables, and "u" is the error term
The estimated earnings function is used to predict the wage of each individual worker (w_hat) from his set of characteristics; the inequality of predicted earnings is then measured.

Secondly, I look at the compression effect: the wages of workers in each sample are predicted using the education coefficients estimated for the other sample instead of the sample's own coefficients.
The estimated model is: w_hat02 = a_02 + b_10*e_02 + c_02*z_02
where "02" denotes the 2002 data set and "10" denotes coefficients from the 2010 regression (the first equation, estimated on the 2010 sample).

My problem is that if I just multiply the coefficients from the 2010 regression (say, _b[primary]) by the dummy variables for 2002, the only thing that changes in the second regression is the coefficients on the education variables, because I am essentially multiplying the same value (_b[primary]) by zeros and ones. This does not change the fit of the model: I still get the same R-squared.
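The diagnosis can be checked directly. In a sketch on Stata's shipped auto data (foreign is a 0/1 dummy standing in for an education dummy, and 0.37 is an arbitrary stand-in for a coefficient borrowed from the other year), multiplying the dummy by a constant and re-running regress leaves the fitted values and R-squared unchanged, and the new coefficient is simply the old one divided by the constant:

```stata
* Stand-in data shipped with Stata: foreign plays the role of an
* education dummy; 0.37 is an arbitrary "borrowed" coefficient
sysuse auto, clear
generate double ln_price = ln(price)

* Original regression
regress ln_price foreign mpg
predict double xb_orig, xb
scalar r2_orig = e(r2)
scalar b_orig = _b[foreign]

* Same regression with the dummy multiplied by a constant
generate double foreign_scaled = 0.37 * foreign
regress ln_price foreign_scaled mpg
predict double xb_scaled, xb

* Fit is unchanged; only the coefficient is rescaled (b_orig / 0.37)
display "R-squared: " r2_orig " vs " e(r2)
display "coefficient: " _b[foreign_scaled] " vs " b_orig / 0.37
```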

So I am doing something wrong in my technique.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1439
#8

20 Feb 2015, 07:00

Here's some pseudo-code

Code:

use sample2002.dta, clear
regress w ed1 ed2 z1 z2            // get 2002 coeffs
use sample2010.dta, clear          // assuming w ed1 ed2 z1 z2 present in these data too
predict w10_02, xb                 // predicted log wages in 2010, using fitted coefficients from 2002
save results2010, replace
regress w ed1 ed2 z1 z2            // get 2010 coeffs
use sample2002.dta, clear
predict w02_10, xb                 // predicted log wages in 2002, using fitted coefficients from 2010
save results2002, replace
// then manipulate and compare the two sets of "results"

This pseudo-code generates predicted ("expected") wage values using covariate values from the 'current' sample and coefficients from the 'other' sample. Counterfactual comparisons of inequality in w are more problematic because the variance of w also depends on the var(residual), and this variance will differ between 2002 and 2010. There are various ways of handling this, some of which are employed in the imputation literature.
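The role of var(residual) can be seen from an in-sample identity: for OLS with a constant, fitted values and residuals are uncorrelated in the estimation sample, so var(w) = var(w_hat) + var(residual), and the variance of predicted log wages alone understates the variance of actual log wages. A sketch on Stata's shipped auto data (ln(price) standing in for log wages) checks this; it does not implement any of the imputation approaches mentioned above.

```stata
* Stand-in data shipped with Stata; ln(price) plays the role of log wages
sysuse auto, clear
generate double ln_price = ln(price)
regress ln_price mpg weight foreign
predict double lnp_hat, xb
predict double resid, residuals

* With a constant in the model, fitted values and residuals are uncorrelated
* in the estimation sample, so var(w) = var(xb) + var(resid) there
quietly summarize ln_price
scalar v_w = r(Var)
quietly summarize lnp_hat
scalar v_xb = r(Var)
quietly summarize resid
scalar v_e = r(Var)
display "var(w) = " v_w "   var(xb) + var(resid) = " v_xb + v_e
```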
Anuththari Bandara


20 Feb 2015, 09:01


Thank you! That was really useful!