Test difference between predicted values

Anshul Anand

Join Date: May 2015
Posts: 113


24 Nov 2018, 12:46

Hello,

I have a model that regresses log wages on immigrant/native status, with immigrants split into arrival cohorts. Variables:

is051: 1 if immigrant, 0 if native
arrival: 0 if immigrant arrived before 1980; 1980 if 1980-84; 1985 if 1985-89; 1990 if 1990-94; 1995 if 1995-99; 2000 if 2000-04; 2005 if 2005-09; 2010 if 2010-14; 9999 if native
age2 = age^2, age3 = (age^3)*(10^(-4))

My regression is:

Code:

. svy: regress lnhourlyw_w c.age c.age2 c.age3 i.is051#c.age i.is051#c.age2 i.is051#c.age3 i.ib9999.arrival if year==2004
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         1        Number of obs     =     10,726
Number of PSUs     =    10,726        Population size   =  1,317,293
                                      Design df         =     10,725
                                      F(  12,  10714)   =     244.61
                                      Prob > F          =     0.0000
                                      R-squared         =     0.2189

--------------------------------------------------------------------------------
               |             Linearized
   lnhourlyw_w |      Coef.  Std. Err.        t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
           age |   .1339665   .0152074     8.81   0.000     .1041572    .1637757
          age2 |  -.0022769      .0004    -5.69   0.000    -.0030609    -.001493
          age3 |   .1261596    .033307     3.79   0.000     .0608718    .1914474
               |
   is051#c.age |
       foreign |   .1088957   .0338962     3.21   0.001     .0424529    .1753385
               |
  is051#c.age2 |
       foreign |  -.0024298   .0008359    -2.91   0.004    -.0040683   -.0007913
               |
  is051#c.age3 |
       foreign |   .1797601   .0657533     2.73   0.006     .0508716    .3086487
               |
       arrival |
      pre 1980 |  -1.962534   .4349641    -4.51   0.000    -2.815144   -1.109924
       1980-84 |  -1.930812   .4371311    -4.42   0.000    -2.787669   -1.073954
       1985-89 |  -1.942779   .4417473    -4.40   0.000    -2.808686   -1.076872
       1990-94 |  -1.943912   .4435777    -4.38   0.000    -2.813407   -1.074418
       1995-99 |  -1.743931   .4441677    -3.93   0.000    -2.614582   -.8732798
       2000-04 |  -1.600682   .4372272    -3.66   0.000    -2.457728   -.7436355
               |
         _cons |   1.260013   .1825067     6.90   0.000     .9022659     1.61776
--------------------------------------------------------------------------------

I have to predict the log wage at age 40 for each immigrant arrival cohort and test the difference from that of natives. My idea looks like this:

Code:

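. predict lnwage
(option xb assumed; fitted values)

Code:

. sum lnwage if arv2000==1 & age==40

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      lnwage |         79    3.800933           0   3.800933   3.800933

. sum lnwage if is051==0 & age==40

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      lnwage |        686    3.782984           0   3.782984   3.782984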

Is there a way to test this difference? Do I have to store

Code:

sum lnwage if arv2000==1 & age==40

and

Code:

sum lnwage if is051==0 & age==40

and then test with

Code:

ttest
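A quick check shows why the stored predictions leave nothing for a t-test to work with. The sketch below assumes the svy: regress results shown above are still in memory and that age3 was generated as age^3 * 10^(-4), as described in the variable list. For natives (base category of arrival, is051 == 0) the fitted value at age 40 depends only on the constant and the three age coefficients:

```stata
* Sketch under the assumptions stated above; run right after the regression.
local a  = 40
local a2 = `a'^2          // 1600
local a3 = `a'^3 * 1e-4   // 6.4, matching the scaling of age3

* Fitted log wage for a native aged 40, computed by hand from e(b)
display _b[_cons] + `a'*_b[age] + `a2'*_b[age2] + `a3'*_b[age3]
```

This should reproduce the single value that -summarize- reports for natives aged 40 (about 3.78). Because every observation in a group receives the same fitted value, Std. Dev. is 0 within each group, so the uncertainty has to come from the coefficient estimates rather than from the spread of the predictions.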


Anshul Anand

Join Date: May 2015
Posts: 113

24 Nov 2018, 13:59

Sorry, one may want to see the coding:

Code:

. table arrival if year==2004

----------------------
  arrival |      Freq.
----------+-----------
 pre 1980 |        576
  1980-84 |        325
  1985-89 |        534
  1990-94 |        901
  1995-99 |        521
  2000-04 |        841
   native |      7,028
----------------------

Code:

. table is051 if year==2004

----------------------
   Heimat |      Freq.
----------+-----------
   native |      7,028
  foreign |      3,698
----------------------

Comment

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

24 Nov 2018, 15:08

I assume you must stick with the survey design.

That being so, you can probably test the difference in two steps: 1) first, use the subpop() option of the - svy - prefix; 2) then apply the - lincom - command.
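A minimal sketch of those two steps, using the variable names from the first post. The coefficient names used in -lincom- (1.is051#c.age, 2000.arrival, and so on) are assumptions about how Stata labels them and should be checked with -matrix list e(b)- after estimation. The weights 1600 and 6.4 are age2 and age3 evaluated at age 40.

```stata
* Step 1: restrict to 2004 via subpop() rather than -if-
svy, subpop(if year == 2004): regress lnhourlyw_w c.age c.age2 c.age3 ///
    i.is051#c.age i.is051#c.age2 i.is051#c.age3 ib9999.arrival

* Immigrant-specific age terms at age 40
* (age2 = 40^2 = 1600; age3 = 40^3 * 10^(-4) = 6.4)
local agepart 40*_b[1.is051#c.age] + 1600*_b[1.is051#c.age2] + 6.4*_b[1.is051#c.age3]

* Step 2: immigrant-minus-native difference in predicted log wage at age 40,
* one -lincom- per arrival cohort observed in 2004
foreach c in 0 1980 1985 1990 1995 2000 {
    lincom _b[`c'.arrival] + `agepart'
}
```

Each -lincom- reports the estimated difference together with a standard error, t statistic, and confidence interval based on the survey variance estimate, so no separate t-test on stored predictions is needed.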

Best regards,

Marcos
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#4

24 Nov 2018, 15:08

This is somewhat complicated. Your data structure, with is051 = 0 and arrival = 9999 both representing the native subgroup, complicates matters considerably, and using both variables in the regression makes it impossible for you to use -margins- afterwards, because -margins- will not understand that is051 is related to arrival. So we need to put everything just in terms of arrival. In addition, there is no reason to create separate variables for age^2 and age^3; factor variable notation will do that for you, and will also enable -margins- to handle the polynomial terms appropriately. You won't get the different scaling on the cubic term out of this, but as the coefficient of the cubic term is, by itself, meaningless, this is a very small price to pay. So I would code this as:

Code:

svy: regress lnhourlyw_w 9999.arrival##c.age##c.age##c.age i(0 1980 1985 1990 1995 2000 2005 2010).arrival if year == 2004
margins arrival, at(age = 40) pwcompare

Spelling out all those levels of the arrival variable is annoying, but, unfortunately, necessary because the previous mention of 9999.arrival will otherwise cause Stata to neglect the existence of these other values.

The -margins, pwcompare- output will compare all levels of the arrival cohort variable with each other, but you can just pick out the ones you want to report.
Comment
Anshul Anand

Join Date: May 2015

Posts: 113
#5

24 Nov 2018, 15:37

Thanks a lot for your inputs.

I have to use the code I have written since I am duplicating a paper with data from another country.

Is there no way with predict()?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#6

24 Nov 2018, 16:43

I have to use the code I have written since I am duplicating a paper with data from another country.

That makes no sense to me. But if it's really true, then use the code that they used in the paper.

Is there no way with predict()?

Yes, but it's long, complicated, and error prone. In essence it involves writing your own customized version of -margins-.
Comment
Anshul Anand

Join Date: May 2015

Posts: 113
#7

26 Nov 2018, 06:53

The problem is that the code is not available. The author only describes what his regression looks like. He writes:

I use the 1970 census to compare the wage of the typical worker in the 1965-69 immigrant wave to that of natives aged 25-64. I then use the 1990 census to again compare the earnings of the same immigrants (i.e., those who arrived between 1965-1969) to natives aged 25-64. Because the typical immigrant cohort is aging while the age composition of the native base is held (roughly) constant, the rate of wage growth overstates the actual wage growth.

To avoid this bias, I calculated the relative wage of immigrants after adjusting for differences in the age composition of the native and immigrant population. In each census cross-section, I estimated a regression of the worker's log wage on age (introduced as a third-order polynomial), on dummy variables indicating if the worker is an immigrant and which cohort he belongs to, and on interactions of the age variables with the immigrant dummy. The age-adjusted wage differential between immigrants and natives is then evaluated at the age of 40 (which is approximately the mean age of the immigrant sample in both 1980 and 1990).
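Written out, the regression and the age-40 differential described in this passage might look as follows. The notation is assumed here and is not taken from the paper: $a_i$ is age, $I_i$ the immigrant dummy, and $C_{ik}$ a dummy for arrival cohort $k$ (natives are the omitted group).

```latex
\ln w_i = \alpha + \beta_1 a_i + \beta_2 a_i^2 + \beta_3 a_i^3
        + I_i \left( \gamma_1 a_i + \gamma_2 a_i^2 + \gamma_3 a_i^3 \right)
        + \sum_k \delta_k C_{ik} + \varepsilon_i

% Age-adjusted log wage differential of cohort k relative to natives at age 40:
\Delta_k(40) = \delta_k + 40\,\gamma_1 + 40^2\,\gamma_2 + 40^3\,\gamma_3
```

The native terms $\alpha$ and $\beta_1, \beta_2, \beta_3$ cancel in the difference, so $\Delta_k(40)$ is a linear combination of the cohort dummy coefficient and the three interaction coefficients only. If the cubic regressor is rescaled, as with age3 above, the weight on $\gamma_3$ is rescaled accordingly.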

I think my coding of the regression should look correct?

Maybe this helps:

Code:

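. tab arrival is051 if year==2004

           |        Heimat
   arrival |    native    foreign |     Total
-----------+----------------------+----------
  pre 1980 |         0        576 |       576
   1980-84 |         0        325 |       325
   1985-89 |         0        534 |       534
   1990-94 |         0        901 |       901
   1995-99 |         0        521 |       521
   2000-04 |         0        841 |       841
    native |     7,028          0 |     7,028
-----------+----------------------+----------
     Total |     7,028      3,698 |    10,726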

Last edited by Anshul Anand; 26 Nov 2018, 07:01.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#8

26 Nov 2018, 09:23

Well, if the original code is not available, then it seems to me that you are free to use any code that implements the approach the author described in your quote in #7. The code I suggested in #4 does that (at least for the second paragraph of your quote--it is, evidently, indifferent to the source of the data.)
Comment
Anshul Anand

Join Date: May 2015

Posts: 113
#9

27 Nov 2018, 14:49

Thank you very much, Sir! I think maybe I should try to implement a simple framework by myself rather than duplicating a paper.
Comment
