Hello!
I'm replicating the findings of Autor, Katz and Kearney (2008), specifically Table 1 and Figures 1-3. I've hit a wall with the generation of predicted values. The table describes the regression that generates the predicted values. AKK generate a predicted value of the log of the weekly wage for each combination of year, sex, education group, and experience group. There are 43 years, two sexes, five education groups, and four experience groups, so there are 1,720 cells with predicted values.
This is what I HOPE to accomplish.
Run one regression using the statsby prefix, as well as by(sex year). Have statsby save the regression coefficients, so statsby ..., by(): regress ... creates a data set with 86 observations, each with a full set of regression coefficients.
Next you create 20 prediction variables, one for each combination of education group and experience group. Use generate to multiply the coefficient estimates, which are variables here, by numbers, which are values of the original variables. (You're just evaluating the regression at 20 different values of the X matrix.) Each of the 20 generate commands treats races and regions the same: set race to white, which means you can ignore the race effects; define macros with the region proportions, and use those proportions as the region values. Across the 20 generate commands, the only things that change are the zeros and ones associated with the education and experience groups.
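The plan above might be sketched as follows. This is only a rough sketch under stated assumptions, not AKK's actual code: it assumes education groups are coded 1-5 and experience groups 0-3, that regions are coded 1-4 (hypothetical), that race_dummy equals 0 for whites, and the region shares shown are placeholders rather than real proportions. The names b_cons, b_ed*, b_ex*, b_rg*, and pred_e*_x* are invented for illustration. Naming each statistic explicitly in statsby avoids having to work out which _stat_# variable holds which coefficient.

```stata
* Build the statsby expression list with explicit names (all names hypothetical).
local stats "b_cons=_b[_cons]"
forvalues e = 2/5 {
    local stats "`stats' b_ed`e'=_b[`e'.potential_education_group]"
}
forvalues x = 1/3 {
    local stats "`stats' b_ex`x'=_b[`x'.potential_experience_group]"
}
forvalues r = 2/4 {   // assumes regions coded 1-4 with region 1 as the base
    local stats "`stats' b_rg`r'=_b[`r'.region_group]"
}

* One regression per sex-year cell, saving only the named coefficients.
statsby `stats', by(female year) saving(coefs, replace): ///
    regress lrwwage i.potential_education_group i.potential_experience_group ///
    i.region_group race_dummy

use coefs, clear

* Placeholder region shares; replace with the actual proportions.
local share2 = 0.25
local share3 = 0.25
local share4 = 0.25

* Base-level coefficients are zero by construction.
generate double b_ed1 = 0
generate double b_ex0 = 0

* 5 education groups x 4 experience groups = 20 prediction variables.
* Race is set to white (race_dummy assumed 0), so its coefficient drops out.
forvalues e = 1/5 {
    forvalues x = 0/3 {
        generate double pred_e`e'_x`x' = b_cons + b_ed`e' + b_ex`x' ///
            + `share2'*b_rg2 + `share3'*b_rg3 + `share4'*b_rg4
    }
}
```

Looping over the two group indices replaces 20 hand-written generate commands; only the education and experience terms change from one variable to the next.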
I've run the statsby regression, and my code looks like this:
* Save the regression coefficients for each sex-year cell in coefs.dta
statsby _b, by(female year) saving(coefs, replace): ///
    regress lrwwage i.potential_education_group i.potential_experience_group ///
    i.region_group race_dummy
However, for some reason three of the coefficient variables, _stat_1, _stat_6, and _stat_10, are all 0:
. tabulate _stat_1

_b[1b.poten |
tial_educat |
 ion_group] |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         84      100.00      100.00
------------+-----------------------------------
      Total |         84      100.00

. tabulate _stat_6

_b[0b.poten |
tial_experi |
ence_group] |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         84      100.00      100.00
------------+-----------------------------------
      Total |         84      100.00

. tabulate _stat_10

_b[1b.regio |
   n_group] |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         84      100.00      100.00
------------+-----------------------------------
      Total |         84      100.00
.
I'm looking for help figuring out why that is, and for tips on formatting the generate commands to create the 20 prediction variables. Anything helps! Thank you!
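One possible reading of the zeros, offered as a hedged note: in factor-variable notation the "b" in labels such as 1b.potential_education_group marks the base (omitted) level, and Stata fixes the coefficient on a base level at 0. If that is what is happening here, the three all-zero variables are the reference categories for education, experience, and region rather than a sign that something went wrong. A minimal illustration on Stata's bundled auto dataset (not the wage data) is:

```stata
* Illustration only: uses the auto data shipped with Stata, not the wage data.
sysuse auto, clear

* The baselevels option lists the omitted category in the coefficient table.
regress price i.rep78, baselevels

* The base level of rep78 is 1; its coefficient can be referenced directly.
display _b[1b.rep78]
```

Under this reading, each predicted value is measured relative to the base education, experience, and region groups, with the constant carrying the level for that reference cell.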