
  • Weak IV and log transformation of a control variable

    Hello Statalist Community,

    I hope you are well.

    I have been trying to implement PPML FE IV with the control function approach, as in this topic. To test for a weak instrument, I run "ivregress 2sls" on data to which I have applied the within transformation manually (as a check, the xtivreg2 estimates match those from this manual approach) and then call "weakivtest" to obtain the Montiel Olea-Pflueger (MO-P) effective F, as recommended by Andrews et al. (2019). In theory, if I am not mistaken, the first stage of the control function approach is the same as that of TSLS, and therefore the MO-P F statistic applies. My question is: how can a change in the specification of a control variable affect the strength of the instrument itself, and therefore the effective F statistic, so heavily?

    My code is something like

    Y = dependent variable (count)
    X2 = EEV (endogenous explanatory variable)
    Z = instrument
    X1, X3 and X4 = control variables (all of them are continuous)
    l_ = log()
    id_sector_year = sector#year
    id = municipality

    Code:
    xi: xtivreg2 Y l_X1 X3 X4 (l_X2 = l_Z) i.id_sector_year, i(id) fe first r
    In this case, I obtain an effective F of 1797.96, against a critical value of 37.418 for tau = 5% (that is, well above it).

    But if I run the following (the only change from the previous command is the log transformation of X4):
    Code:
    xi: xtivreg2 Y l_X1 X3 l_X4 (l_X2 = l_Z) i.id_sector_year, i(id) fe first r
    The effective F is now 93.83 (again, well above the critical value).
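
    The weak-instrument check described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the exact code behind the numbers reported: it assumes the variables of the first specification are in memory, it uses reghdfe to partial out both sets of fixed effects (a variant of the manual within transformation), and the touse and r_ names are hypothetical. weakivtest is a user-written command (ssc install weakivtest).

    ```stata
    * Common estimation sample, so every variable is residualized on the same observations
    generate byte touse = !missing(Y, l_X1, l_X2, l_Z, X3, X4, id, id_sector_year)

    * Partial the municipality and sector-year effects out of each variable
    * (r_ prefix is hypothetical; reghdfe also drops singleton groups by default)
    foreach v of varlist Y l_X1 l_X2 l_Z X3 X4 {
        quietly reghdfe `v' if touse, absorb(id id_sector_year) residuals(r_`v')
    }

    * 2SLS on the residualized data, with heteroskedasticity-robust errors
    ivregress 2sls r_Y r_l_X1 r_X3 r_X4 (r_l_X2 = r_l_Z), vce(robust)

    * Effective F statistic and critical values for the model just estimated
    weakivtest
    ```

    The point estimates should match the fixed-effects IV estimates, but the degrees-of-freedom corrections (and any singleton groups dropped) can differ from xtivreg2, so the F statistics need not agree to the last digit. Replacing X4 with l_X4 in the varlist and in the ivregress line gives the second specification.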

    I have also estimated the main regression, that is:
    Code:
    * First-stage
    reghdfe X2 Z X1 X3 X4, absorb(id id_sector_year) res
    predict double v2hat, r
    * Second-stage
    ppmlhdfe Y v2hat X2 X1 X3 X4, absorb(id id_sector_year) vce(cluster id)
    And the results are very different: the estimates for X2 are around 8.96 in the log case and 1.48 in the linear case. For comparison, the PPML FE estimate for X2 (without IV) is around 0.88. That implies a "bias" of roughly 10 times and 1.7 times, respectively. A huge difference.

    Thank you very much!

  • #2
    I do not think there is anything surprising here. log X is not X, and if you use log X instead of X as a dependent variable in the first stage or as a regressor in the second stage, you will get different results.


    Regarding the second issue, PPML is a nonlinear estimator, no? Generally, plugging predicted values into a nonlinear estimator is not a correct estimation strategy. You speak of a "control function", but what you show is only plugging in predicted first-stage values. So I guess the last line of code you are showing is not a consistent estimator of what you want to estimate.
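
    The distinction between the two strategies can be sketched as below. This is only an illustration using the variable names from post #1; v2res and x2hat are hypothetical names, and the plug-in version is included purely for contrast.

    ```stata
    * First stage, saving the residual under a hypothetical name
    reghdfe X2 Z X1 X3 X4, absorb(id id_sector_year) residuals(v2res)

    * Control function: keep the observed X2 and add the first-stage residual
    ppmlhdfe Y X2 v2res X1 X3 X4, absorb(id id_sector_year) vce(cluster id)

    * Plug-in, for contrast only: replace X2 with its first-stage fitted value
    * (fitted value = observed value minus residual, fixed effects included)
    generate double x2hat = X2 - v2res
    ppmlhdfe Y x2hat X1 X3 X4, absorb(id id_sector_year) vce(cluster id)
    ```

    In the control function version, the coefficient on v2res also serves as a test of the endogeneity of X2. The reported standard errors do not account for v2res being estimated, so bootstrapping both stages together is a common remedy.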



    • #3
      Thank you very much, Prof Kolev

      Yes, they are different things, but should the difference be that large?

      Regarding the control function: I am aware of that problem, and I believe I am using the residual from the first stage, right? According to the -reghdfe- help file, the option "res" means "save regression residuals". Correct?

      Thank you again!



      • #4
        These are effects of nonlinearities, which arise when you use a strongly nonlinear function in place of the original variable. There is no way to know in advance whether using a nonlinear function will have a large or a small effect. You should not worry about this.
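
        One way to see why a control's functional form can move the first-stage F is the partialling-out (Frisch-Waugh-Lovell) representation. The notation below is introduced here for illustration only: $W$ collects the controls and fixed effects, and tildes denote residuals after regressing on $W$. The homoskedastic F is shown for intuition; with a single instrument the effective F coincides with the heteroskedasticity-robust F, which has the same structure.

        ```latex
        % First-stage coefficient on the instrument after partialling out W
        \hat{\pi} = \frac{\tilde{z}'\tilde{x}_2}{\tilde{z}'\tilde{z}},
        \qquad
        \tilde{z} = M_W z, \quad \tilde{x}_2 = M_W x_2, \quad M_W = I - W(W'W)^{-1}W'

        % Homoskedastic first-stage F for a single instrument
        F = \frac{\hat{\pi}^2 \, \tilde{z}'\tilde{z}}{\hat{\sigma}_v^2}
        ```

        Replacing X4 with log X4 changes $W$, and with it $\tilde{z}$, $\tilde{x}_2$ and $\hat{\sigma}_v^2$. If the new control is more strongly correlated with the instrument, less residual variation $\tilde{z}'\tilde{z}$ is left and F falls, even though the instrument itself is unchanged.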

        Yes, you are right, you have used the residual, which is the control function approach. I got tripped up because you call it hat and you abbreviated the residual prediction to "r." A difference between 0.88 and 1.48 is not huge and can be expected as a result of the instrumentation.
