
  • Sample selection in the control function approach

    I am trying to understand which sample should be used in the first stage when estimating a model with the control function (CF) approach and lagged explanatory variables. Below, I explain in detail what I mean.

    The CF approach is an alternative to xtivreg, fe estimation. Suppose X is an endogenous independent variable. In the CF approach, we first run
    xtreg X Z C1 C2, fe
    (where C1 and C2 are the controls and Z is an instrument), then predict the residuals with
    predict CF, resid
    and then insert CF in the second stage:
    xtreg Y X C1 C2 CF, fe
    In this case, the coefficients on X, C1, and C2 should be the same in both xtreg Y X C1 C2 CF, fe and xtivreg Y C1 C2 (X = Z), fe, while the standard errors will differ unless the ones from xtreg, fe are adjusted via bootstrapping (I did not bootstrap here, to avoid additional confusion).
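
    In equations, the two steps I have in mind look roughly as follows. The notation is my own shorthand and only an assumption about how to write the model: a_i and c_i are the individual fixed effects of the two equations, v_it is the first-stage error, and \hat{v}_{it} is the residual stored in CF.

    ```latex
    % First stage: endogenous X on the instrument Z and the controls
    X_{it} = \pi Z_{it} + \delta_1 C_{1,it} + \delta_2 C_{2,it} + a_i + v_{it}

    % Control function: the first-stage residual
    \widehat{CF}_{it} = \hat{v}_{it}

    % Second stage: outcome on X, the controls, and the control function
    Y_{it} = \beta X_{it} + \gamma_1 C_{1,it} + \gamma_2 C_{2,it}
             + \rho \, \widehat{CF}_{it} + c_i + \varepsilon_{it}
    ```

    Here \beta, \gamma_1, and \gamma_2 are the coefficients that I compare with the xtivreg, fe output.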

    Indeed, here are the results from xtreg, fe and xtivreg, fe that I obtained using the nlswork data:

    xtreg, fe (standard errors not bootstrapped):
    Code:
    webuse nlswork, clear
    quietly xtreg tenure union south age c.age#c.age not_smsa, fe
    predict cf, resid
    xtreg ln_w tenure age c.age#c.age not_smsa cf, fe
    
    Fixed-effects (within) regression               Number of obs     =     19,007
    Group variable: idcode                          Number of groups  =      4,134
    
    R-sq:                                           Obs per group:
         within  = 0.1328                                         min =          1
         between = 0.2365                                         avg =        4.6
         overall = 0.2073                                         max =         12
    
                                                    F(5,14868)        =     455.53
    corr(u_i, Xb)  = 0.2033                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          tenure |   .2403531   .0151385    15.88   0.000     .2106797    .2700264
             age |   .0118437   .0036499     3.24   0.001     .0046894     .018998
                 |
     c.age#c.age |  -.0012145   .0000798   -15.22   0.000    -.0013709    -.001058
                 |
        not_smsa |  -.0167178   .0137527    -1.22   0.224    -.0436748    .0102393
              cf |  -.2227325   .0151602   -14.69   0.000    -.2524484   -.1930167
           _cons |   1.678287   .0659452    25.45   0.000     1.549027    1.807548
    -------------+----------------------------------------------------------------
         sigma_u |  .38999138
         sigma_e |  .25552281
             rho |  .69964877   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(4133, 14868) = 8.30                 Prob > F = 0.0000
    xtivreg, fe:
    Code:
    xtivreg ln_w age c.age#c.age not_smsa (tenure = union south), fe
    
    Fixed-effects (within) IV regression            Number of obs     =     19,007
    Group variable: idcode                          Number of groups  =      4,134
    
    R-sq:                                           Obs per group:
         within  =      .                                         min =          1
         between = 0.1304                                         avg =        4.6
         overall = 0.0897                                         max =         12
    
                                                    Wald chi2(4)      =  147926.58
    corr(u_i, Xb)  = -0.6843                        Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          tenure |   .2403531   .0373419     6.44   0.000     .1671643    .3135419
             age |   .0118437   .0090032     1.32   0.188    -.0058023    .0294897
                 |
     c.age#c.age |  -.0012145   .0001968    -6.17   0.000    -.0016003   -.0008286
                 |
        not_smsa |  -.0167178   .0339236    -0.49   0.622    -.0832069    .0497713
           _cons |   1.678287   .1626657    10.32   0.000     1.359468    1.997106
    -------------+----------------------------------------------------------------
         sigma_u |  .70661941
         sigma_e |  .63029359
             rho |  .55690561   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F  test that all u_i=0:     F(4133,14869) =     1.44      Prob > F    = 0.0000
    ------------------------------------------------------------------------------
    Instrumented:   tenure
    Instruments:    age c.age#c.age not_smsa union south
    ------------------------------------------------------------------------------
    As you can see, the coefficients are the same and only the standard errors differ (the standard errors also coincide once bootstrapped, which confirms that both approaches yield the same results when the same instruments are used).

    However, my question is: which sample is it correct to use in the first stage once the explanatory variables are lagged?

    When the explanatory variables are lagged by one year, the fixed-effects IV estimator produces the following:
    Code:
    xtivreg ln_w l.age cl.age#cl.age l.not_smsa (l.tenure = l.union l.south), fe
    
    Fixed-effects (within) IV regression            Number of obs     =      7,500
    Group variable: idcode                          Number of groups  =      3,294
    
    R-sq:                                           Obs per group:
         within  =      .                                         min =          1
         between = 0.0685                                         avg =        2.3
         overall = 0.0571                                         max =          6
    
                                                    Wald chi2(4)      =   80781.56
    corr(u_i, Xb)  = -0.5474                        Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          tenure |
             L1. |   .1755435   .0389611     4.51   0.000     .0991811    .2519059
                 |
             age |
             L1. |   .0106753   .0134104     0.80   0.426    -.0156085    .0369592
                 |
          cL.age#|
          cL.age |  -.0008867   .0002305    -3.85   0.000    -.0013384   -.0004351
                 |
        not_smsa |
             L1. |  -.0452809   .0509685    -0.89   0.374    -.1451773    .0546154
                 |
           _cons |   1.671945   .2302329     7.26   0.000     1.220697    2.123194
    -------------+----------------------------------------------------------------
         sigma_u |  .59050356
         sigma_e |  .54146412
             rho |  .54324114   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F  test that all u_i=0:     F(3293,4202) =     1.08       Prob > F    = 0.0089
    ------------------------------------------------------------------------------
    Instrumented:   L.tenure
    Instruments:    L.age cL.age#cL.age L.not_smsa L.union L.south
    ------------------------------------------------------------------------------
    The following CF model yields the same coefficients:
    Code:
    * drop the cf left over from the earlier first stage before predicting again
    capture drop cf
    quietly xtreg l.tenure l.union l.south l.age cl.age#cl.age l.not_smsa, fe
    predict cf, resid
    xtreg ln_w l.tenure l.age cl.age#cl.age l.not_smsa cf, fe
    
    Fixed-effects (within) regression               Number of obs     =      7,500
    Group variable: idcode                          Number of groups  =      3,294
    
    R-sq:                                           Obs per group:
         within  = 0.1351                                         min =          1
         between = 0.1783                                         avg =        2.3
         overall = 0.1770                                         max =          6
    
                                                    F(5,4201)         =     131.21
    corr(u_i, Xb)  = 0.1436                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          tenure |
             L1. |   .1755435   .0205221     8.55   0.000     .1353094    .2157776
                 |
             age |
             L1. |   .0106753   .0070637     1.51   0.131    -.0031732    .0245239
                 |
          cL.age#|
          cL.age |  -.0008867   .0001214    -7.30   0.000    -.0011247   -.0006488
                 |
        not_smsa |
             L1. |  -.0452809   .0268467    -1.69   0.092    -.0979147    .0073528
                 |
              cf |  -.1641325    .020582    -7.97   0.000     -.204484   -.1237809
           _cons |   1.671945   .1212711    13.79   0.000      1.43419    1.909701
    -------------+----------------------------------------------------------------
         sigma_u |  .41441731
         sigma_e |   .2852065
             rho |  .67859444   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(3293, 4201) = 3.72                  Prob > F = 0.0000
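
    For a side-by-side comparison of the two sets of estimates, something like the following sketch could be used. It reruns the same two models from a fresh copy of the data. The names iv_lag, cf_lag_fit, and cf_lag are my own labels for illustration, and the display formats are arbitrary.

    ```stata
    * Sketch only: store both fits and tabulate them next to each other.
    * iv_lag, cf_lag_fit and cf_lag are hypothetical names chosen for this example.
    webuse nlswork, clear

    * Fixed-effects IV with lagged regressors and lagged instruments
    quietly xtivreg ln_wage l.age cl.age#cl.age l.not_smsa (l.tenure = l.union l.south), fe
    estimates store iv_lag

    * CF approach: first stage in lags, then add its residual to the second stage
    quietly xtreg l.tenure l.union l.south l.age cl.age#cl.age l.not_smsa, fe
    predict cf_lag, resid
    quietly xtreg ln_wage l.tenure l.age cl.age#cl.age l.not_smsa cf_lag, fe
    estimates store cf_lag_fit

    * Coefficients and (unadjusted) standard errors in one table
    estimates table iv_lag cf_lag_fit, b(%10.7f) se(%10.7f)
    ```

    The row for cf_lag is filled only in the CF column, since xtivreg has no such regressor.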
    However, if I do not use lags in the first stage and instead lag the residual in the second stage, the coefficients differ (because a different sample is used in the first stage).

    Code:
    * drop the cf left over from the previous first stage before predicting again
    capture drop cf
    quietly xtreg tenure union south age c.age#c.age not_smsa, fe
    predict cf, resid
    xtreg ln_w l.tenure l.age cl.age#cl.age l.not_smsa l.cf, fe
    
    Fixed-effects (within) regression               Number of obs     =      7,500
    Group variable: idcode                          Number of groups  =      3,294
    
    R-sq:                                           Obs per group:
         within  = 0.1353                                         min =          1
         between = 0.1785                                         avg =        2.3
         overall = 0.1767                                         max =          6
    
                                                    F(5,4201)         =     131.45
    corr(u_i, Xb)  = 0.1454                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          tenure |
             L1. |   .2566965   .0304213     8.44   0.000     .1970547    .3163383
                 |
             age |
             L1. |   .0144529    .006859     2.11   0.035     .0010056    .0279002
                 |
          cL.age#|
          cL.age |  -.0013382   .0001577    -8.48   0.000    -.0016475    -.001029
                 |
        not_smsa |
             L1. |  -.0346281    .027326    -1.27   0.205    -.0882015    .0189453
                 |
              cf |
             L1. |  -.2452925   .0305005    -8.04   0.000    -.3050896   -.1854954
                 |
           _cons |   1.710315   .1238945    13.80   0.000     1.467417    1.953214
    -------------+----------------------------------------------------------------
         sigma_u |  .41454272
         sigma_e |  .28517027
             rho |  .67878182   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(3293, 4201) = 3.72                  Prob > F = 0.0000
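
    To see how the two first stages differ in the observations they use, one could flag the estimation sample after each of them and compare the flags. This is only a sketch of such a check. The variable names fs_levels and fs_lags are made up for the example and are not part of the nlswork data.

    ```stata
    * Sketch only: flag the estimation sample of each first stage and compare.
    * fs_levels and fs_lags are hypothetical variable names.
    webuse nlswork, clear

    * First stage in levels (its residual is lagged later, in the second stage)
    quietly xtreg tenure union south age c.age#c.age not_smsa, fe
    generate byte fs_levels = e(sample)

    * First stage written directly in lags
    quietly xtreg l.tenure l.union l.south l.age cl.age#cl.age l.not_smsa, fe
    generate byte fs_lags = e(sample)

    * Number of observations used by each first stage, and their overlap
    count if fs_levels
    count if fs_lags
    tabulate fs_levels fs_lags
    ```

    The cross-tabulation shows how many observations enter one first stage but not the other, which is the difference in samples I am asking about.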
    Is it completely incorrect to do this
    Code:
    quietly xtreg tenure union south age c.age#c.age not_smsa, fe
    predict cf, resid
    xtreg ln_w l.tenure l.age cl.age#cl.age l.not_smsa l.cf, fe
    instead of this?
    Code:
    quietly xtreg l.tenure l.union l.south l.age cl.age#cl.age l.not_smsa, fe
    predict cf, resid
    xtreg ln_w l.tenure l.age cl.age#cl.age l.not_smsa cf, fe
    Sorry for the long post. I just wanted to demonstrate my reasoning with examples.

    Last edited by Marco Tacchi; 27 Feb 2019, 08:26.