
  • PPML / Control function approach with time-invariant endogenous regressor in panel data

    Hello everyone!

    I'm running gravity-like estimations using panel data with T = 25 years and 50 country-pairs. My dependent variable is a count, so I’m estimating the model using PPML via ppmlhdfe, including high-dimensional fixed effects.
    My key explanatory variable is continuous and time-invariant. Because I suspect it is endogenous, I also have a continuous, time-invariant instrument for it. Since there is no built-in IV procedure in ppmlhdfe, I am following the control function (CF) Poisson approach suggested by Lin & Wooldridge (2019).

    That brings me to two questions I would love to get some insight on.

    1) Given that my endogenous variable is time-invariant, is it still appropriate to use this approach? I’ve seen a couple of working papers that also have time-invariant endogenous variables and use it, but without formal justification. Also, since the dependent variable in the first stage is time-invariant, I’m unsure whether I should keep the data in panel form or collapse it to a cross-section for that regression. Alternatively, would it be justifiable to run the first stage in cross-section and the second in panel? I’m thinking of something like:

    First stage:

    \[ \ln(PP_{ij}) = \tilde{\beta}_1 Z_{ij} + \tilde{\beta}_2 X_{ij} + \mathrm{FE} + \tilde{\varepsilon}_{ij} \]

    Second stage:

    \[ Y_{ijt} = \exp\left(\beta_1 \ln(PP_{ij}) + \gamma \widehat{\tilde{\varepsilon}}_{ij} + \beta_2 X_{ijt} + \mathrm{FE}\right) + \varepsilon_{ijt} \]

    where PP is the endogenous variable, Z is the instrument, X are the controls, and Y is the dependent variable for countries i and j in year t.

    In case it is useful: some specifications include time-varying controls (at the country-pair level), but my main specification only includes time-invariant controls.
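
    To make the proposed procedure concrete, here is a minimal Stata sketch of the two steps written above. All variable names (y, ln_pp, z, x, exporter, importer, year, pair_id) are hypothetical placeholders, and the fixed effects are assumed to be separate country_i, country_j and year effects. It only illustrates the mechanics and does not settle whether the approach is valid with a time-invariant endogenous variable.

    ```stata
    * Hypothetical names: y (count outcome), ln_pp (log of endogenous variable),
    * z (instrument), x (time-invariant control), pair_id (country-pair identifier)

    * Flag one row per country pair so the first stage is a pure cross-section
    egen byte one_per_pair = tag(pair_id)

    * First stage: OLS on the pair-level cross-section, saving the residuals
    reghdfe ln_pp z x if one_per_pair, absorb(exporter importer) residuals(v_hat_cs)

    * Copy each pair's residual to all of its yearly rows
    * (missing values sort last, so [1] is the nonmissing residual)
    bysort pair_id (v_hat_cs): gen double v_hat = v_hat_cs[1]

    * Second stage: PPML with the first-stage residual as an additional regressor
    ppmlhdfe y ln_pp v_hat x, absorb(exporter importer year) vce(cluster pair_id)
    ```

    The standard errors reported by the second stage treat v_hat as if it were observed data rather than an estimate.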

    2) If I understood correctly, one can test the endogeneity of the variable within the CF approach by looking at the significance of the residual term included in the second-stage regression. Is a standard t-test on that coefficient sufficient for this, or should I also consider bootstrapped standard errors? If the residual is not significant (i.e., I can’t reject the null of gamma = 0), I assume it is valid to proceed with a standard ppmlhdfe estimation and drop the CF approach.
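
    For reference, bootstrapping here would mean resampling whole country pairs and re-running both steps in every replication, so that the standard errors reflect the estimated residual. A rough sketch, again with hypothetical variable names; cf_ppml is a made-up program name, and the number of replications and the seed are arbitrary choices:

    ```stata
    capture program drop cf_ppml
    program define cf_ppml, rclass
        * Re-estimate the first stage on the resampled data.
        * It is run on all rows for simplicity; every variable in it is
        * time-invariant, so this assumes a balanced panel.
        capture drop v_boot
        reghdfe ln_pp z x, absorb(exporter importer) residuals(v_boot)

        * Second stage with the freshly estimated residual
        ppmlhdfe y ln_pp v_boot x, absorb(exporter importer year)

        * Return the two coefficients of interest
        return scalar b_pp  = _b[ln_pp]
        return scalar gamma = _b[v_boot]
    end

    * Resample country pairs (clusters), not individual pair-year rows
    bootstrap b_pp = r(b_pp) gamma = r(gamma), reps(500) seed(12345) ///
        cluster(pair_id): cf_ppml
    ```

    The bootstrap output then gives a standard error for gamma that can be compared with the one from the plain second-stage t-test.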

    Any guidance would be greatly appreciated!

    Jose



  • #2
    The FE approach won’t work with a time-constant variable and/or IV. Do you have time-varying controls? If so, I’d include their time averages in a correlated RE approach. Then your CF won’t be perfectly collinear.
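
    One possible reading of this suggestion, sketched in Stata. The variable names are hypothetical (x_tv stands for a time-varying pair-level control, x for a time-invariant one), and keeping only year effects in absorb() is an assumption of the sketch, not something stated above:

    ```stata
    * Time average of the time-varying control within each country pair
    bysort pair_id: egen double x_tv_bar = mean(x_tv)

    * First stage on one row per pair, with the time average as a regressor
    egen byte first_obs = tag(pair_id)
    regress ln_pp z x x_tv_bar if first_obs, vce(robust)
    predict double v_cre_cs if e(sample), residuals

    * Copy the pair-level residual to every year of the pair
    bysort pair_id (v_cre_cs): gen double v_cre = v_cre_cs[1]

    * Second stage: correlated random effects Poisson with the control function
    ppmlhdfe y ln_pp v_cre x x_tv x_tv_bar, absorb(year) vce(cluster pair_id)
    ```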



    • #3
      Dear Prof. Wooldridge, thank you very much for the rapid response.

      I understand the CF would be perfectly collinear if I included a country-pair FE, as this would absorb my variable of interest. However, in my specification I’m only including separate fixed effects for country_i, country_j, and year. Would the FE approach still be invalid in this case?

      I do have time-varying controls, although they are potentially endogenous. I will check the literature on this.

      Thanks again!
