
  • Test difference between predicted values

    Hello,

    I have a model that regresses log wages on immigrant/native status. However, immigrants are split into different arrival cohorts. Variables:
    • is051: 1 if immigrant, 0 if native
    • arrival: 0 if pre 1980 immigrant arrival, 1980 if immigrant 1980-84 arrival, 1985 if 1985-89 immigrant arrival, 1990 if 1990-94 immigrant arrival, 1995 if 1995-99 immigrant arrival, 2000 if 2000-04 immigrant arrival, 2005 if 2005-09 immigrant arrival, 2010 if 2010-14 immigrant arrival, 9999 if native;
    • age2=age^2, age3=(age^3)*(10^(-4))

    My regression is:
    Code:
    svy: regress lnhourlyw_w c.age c.age2 c.age3 i.is051#c.age i.is051#c.age2 i.is051#c.age3 i.ib9999.arrival if year==2004
    (running regress on estimation sample)
    
    Survey: Linear regression
    
    Number of strata   =         1    Number of obs     =     10,726
    Number of PSUs     =    10,726    Population size   =  1,317,293
        Design df         =     10,725
        F(  12,  10714)   =     244.61
        Prob > F          =     0.0000
        R-squared         =     0.2189
    
        
    ------------------------------------------------------------------------------
                 |             Linearized
     lnhourlyw_w |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .1339665   .0152074     8.81   0.000     .1041572    .1637757
            age2 |  -.0022769      .0004    -5.69   0.000    -.0030609    -.001493
            age3 |   .1261596    .033307     3.79   0.000     .0608718    .1914474
                 |
     is051#c.age |
         foreign |   .1088957   .0338962     3.21   0.001     .0424529    .1753385
                 |
    is051#c.age2 |
         foreign |  -.0024298   .0008359    -2.91   0.004    -.0040683   -.0007913
                 |
    is051#c.age3 |
         foreign |   .1797601   .0657533     2.73   0.006     .0508716    .3086487
                 |
         arrival |
        pre 1980 |  -1.962534   .4349641    -4.51   0.000    -2.815144   -1.109924
         1980-84 |  -1.930812   .4371311    -4.42   0.000    -2.787669   -1.073954
         1985-89 |  -1.942779   .4417473    -4.40   0.000    -2.808686   -1.076872
         1990-94 |  -1.943912   .4435777    -4.38   0.000    -2.813407   -1.074418
         1995-99 |  -1.743931   .4441677    -3.93   0.000    -2.614582   -.8732798
         2000-04 |  -1.600682   .4372272    -3.66   0.000    -2.457728   -.7436355
                 |
           _cons |   1.260013   .1825067     6.90   0.000     .9022659     1.61776
    ------------------------------------------------------------------------------
    I have to predict the log wage at age 40 for each immigrant arrival cohort and test its difference from that of natives. My idea looks like this:
    Code:
    predict    lnwage
    (option    xb assumed;    fitted    values)
    Code:
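    sum lnwage if arv2000==1 & age==40

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          lnwage |         79    3.800933           0   3.800933   3.800933

    . sum lnwage if is051==0 & age==40

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
          lnwage |        686    3.782984           0   3.782984   3.782984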
    Is there a way to test this difference? Do I have to store the results of
    Code:
    sum lnwage if arv2000==1 &    age ==40
    and
    Code:
    sum lnwage if is051==0 &    age ==40
    and then test with
    Code:
    ttest
    ?



  • #2
    Sorry, one may want to see the coding:
    Code:
    table arrival    if    year==2004
    
            
    arrival        Freq.
            
    pre 1980        576
    1980-84        325
    1985-89        534
    1990-94        901
    1995-99        521
    2000-04        841
    native        7,028
    Code:
    .    table is051    if    year==2004
    
                
        Heimat        Freq.
                
        native        7,028
        foreign        3,698



    • #3
      I assume you must stick with the survey design.

      That being so, you can probably test the difference in two steps: (1) specify the subpopulation with the subpop() option of the -svy- prefix; (2) then apply -lincom- to the coefficients.
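
      A minimal sketch of that two-step idea, assuming the variable names from #1 and data that are already -svyset-. The base level is written here as ib9999.arrival, and the weights 40, 1600 and 6.4 are age, age^2 and age^3*10^(-4) evaluated at age 40. Note that a -ttest- on the predicted values would not work: within each group at a fixed age the prediction is a constant (Std. Dev. = 0 in #1), so all the uncertainty sits in the coefficients, which is what -lincom- uses.

      ```stata
      * Sketch only, written for a do-file (/// continues a line).
      * Assumes lnhourlyw_w, age, age2, age3, is051, arrival, year exist
      * and that -svyset- has been run. subpop() keeps the full design for
      * variance estimation instead of dropping observations with -if-.
      svy, subpop(if year == 2004): regress lnhourlyw_w c.age c.age2 c.age3 ///
          i.is051#c.age i.is051#c.age2 i.is051#c.age3 ib9999.arrival

      * Immigrant (2000-04 cohort) minus native log wage at age 40:
      * cohort dummy + 40*(age interaction) + 1600*(age2 interaction)
      * + 6.4*(age3 interaction), because age3 = age^3 * 10^(-4)
      lincom 2000.arrival + 40*1.is051#c.age + 1600*1.is051#c.age2 ///
          + 6.4*1.is051#c.age3
      ```

      Replacing 2000.arrival with another cohort level (0, 1980, 1985, ...) gives the corresponding test for that cohort. If a coefficient name is not recognized, replaying the results with the coeflegend option lists the exact names.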
      Best regards,

      Marcos



      • #4
        This is somewhat complicated. Your data structure, with is051 = 0 and arrival = 9999 both representing the native subgroup, complicates matters considerably, and using both variables in the regression makes it impossible for you to use -margins- afterwards, because -margins- will not understand that is051 is related to arrival. So we need to put everything just in terms of arrival. In addition, there is no reason to create separate variables for age^2 and age^3; factor variable notation will do that for you, and will also enable -margins- to handle the linear, quadratic, and cubic terms appropriately. You won't get the different scaling on the cubic term out of this, but as the coefficient of the cubic term is, by itself, meaningless, this is a very small price to pay. So I would code this as:

        Code:
        svy: regress lnhourlyw_w 9999.arrival##c.age##c.age##c.age i(0 1980 1985 1990 1995 2000 2005 2010).arrival if year == 2004
        margins arrival, at(age = 40) pwcompare
        Spelling out all those levels of the arrival variable is annoying, but, unfortunately, necessary because the previous mention of 9999.arrival will otherwise cause Stata to neglect the existence of these other values.

        The -margins, pwcompare- output will compare all levels of the arrival cohort variable with each other, but you can just pick out the ones you want to report.
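
        As a hedged sketch of how to read those comparisons: by default the pairwise table shows confidence intervals only, and the effects suboption adds test statistics and p-values. This assumes the -svy: regress- command above has just been run; the Bonferroni adjustment in the second line is an illustrative choice, not something this thread prescribes.

        ```stata
        * Sketch only: run directly after the regression in this post.
        * pwcompare(effects) reports each pairwise difference at age 40
        * with its standard error, test statistic, p-value and CI.
        margins arrival, at(age = 40) pwcompare(effects)

        * Optional: adjust p-values and CIs for multiple comparisons.
        margins arrival, at(age = 40) pwcompare(effects) mcompare(bonferroni)
        ```

        The rows that involve the native level (9999) are the immigrant-native differences; the remaining rows compare cohorts with each other.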



        • #5
          Thanks a lot for your inputs.

          I have to use the code I have written since I am duplicating a paper with data from another country.

          Is there no way with predict()?



          • #6
            "I have to use the code I have written since I am duplicating a paper with data from another country."
            That makes no sense to me. But if it's really true, then use the code that they used in the paper.

            "Is there no way with predict()?"
            Yes, but it's long, complicated, and error prone. In essence it involves writing your own customized version of -margins-.



            • #7
              The problem is that the code is not available. The author only describes what his regression looks like. He writes:
              I use the 1970 census to compare the wage of the typical worker in the 1965-69 immigrant wave to that of natives aged 25-64. I then use the 1990 census to again compare the earnings of the same immigrants (i.e., those who arrived between 1965-1969) to natives aged 25-64. Because the typical immigrant cohort is aging while the age composition of the native base is held (roughly) constant, the rate of wage growth overstates the actual wage growth.

              To avoid this bias, I calculated the relative wage of immigrants after adjusting for differences in the age composition of the native and immigrant population. In each census cross-section, I estimated a regression of the worker's log wage on age (introduced as a third-order polynomial), on dummy variables indicating if the worker is an immigrant and which cohort he belongs to, and on interactions of the age variables with the immigrant dummy. The age-adjusted wage differential between immigrants and natives is then evaluated at the age of 40 (which is approximately the mean age of the immigrant sample in both 1980 and 1990).
              Does my coding of the regression look correct?
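
              For reference, the regression in #1 and the age-adjusted differential at age 40 can be written out as below. This is a sketch in notation introduced here, not taken from the paper: a_i is age, F_i is the immigrant dummy (is051), D_ic are the arrival-cohort dummies with natives as the base, and the cubic term is scaled by 10^(-4) as in #1. The last line plugs in the rounded coefficients from the output in #1 for the 2000-04 cohort.

              ```latex
              \begin{aligned}
              \ln w_i &= \beta_0 + \beta_1 a_i + \beta_2 a_i^2 + \beta_3\,(10^{-4} a_i^3)
                + F_i\left[\delta_1 a_i + \delta_2 a_i^2 + \delta_3\,(10^{-4} a_i^3)\right]
                + \sum_c \gamma_c D_{ic} + \varepsilon_i \\
              \Delta_c(40) &= \gamma_c + 40\,\delta_1 + 40^2\,\delta_2 + (10^{-4}\cdot 40^3)\,\delta_3
                = \gamma_c + 40\,\delta_1 + 1600\,\delta_2 + 6.4\,\delta_3 \\
              \Delta_{2000\text{-}04}(40) &\approx -1.600682 + 40(0.1088957) + 1600(-0.0024298) + 6.4(0.1797601)
                \approx 0.018
              \end{aligned}
              ```

              The value of about 0.018 agrees, up to rounding of the coefficients, with the gap between the two predicted means in #1 (3.800933 - 3.782984). The native terms cancel in the difference, so only the cohort dummy and the three interaction coefficients enter it.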

              Maybe this helps:
              Code:
              .    tab arrival    is051 if year==2004
              
                                          Heimat
                  arrival         native    foreign    Total
                          
                  pre 1980        0          576    576
                  1980-84         0          325    325
                  1985-89         0          534    534
                  1990-94         0          901    901
                  1995-99         0          521    521
                  2000-04         0           841    841
                  native        7,028           0    7,028
                          
                  Total        7,028      3,698    10,726
              Last edited by Anshul Anand; 26 Nov 2018, 07:01.



              • #8
                Well, if the original code is not available, then it seems to me that you are free to use any code that implements the approach the author described in your quote in #7. The code I suggested in #4 does that (at least for the second paragraph of your quote--it is, evidently, indifferent to the source of the data.)



                • #9
                  Thank you very much, Sir! I think maybe I should try to implement a simple framework by myself rather than duplicating a paper.
