PPML / Control function approach with time-invariant endogenous regressor in panel data

Jose Fuentes

Join Date: Jun 2025

Posts: 4
#1

21 Jun 2025, 15:38

Hello everyone!

I'm running gravity-like estimations using panel data with T = 25 years and 50 country-pairs. My dependent variable is a count, so I’m estimating the model using PPML via ppmlhdfe, including high-dimensional fixed effects.
My key explanatory variable of interest is continuous and time-invariant, for which I also have a time-invariant continuous instrument as I suspect it may be endogenous. Since there is no built-in IV procedure in ppmlhdfe, I am following the control function (CF) Poisson approach as suggested by Lin & Wooldridge (2019).

That brings me to two questions I would love to get some insight on.

1) Given that my endogenous variable is time-invariant, would it still be appropriate to use this approach? I’ve seen a couple of working papers that also have time-invariant endogenous variables and use it, but without formal justification. Also, since the first stage will have a time-invariant dependent variable, I’m unsure whether I should keep the data in panel form or collapse it to a cross-section regression. Alternatively, could an approach where the first stage is estimated in cross section and the second in panel be justified? I’m thinking of something like:

First stage:

\[ \ln(PP_{ij}) = \tilde{\beta}_1 Z_{ij} + \tilde{\beta}_2 X_{ij} + \text{FE} + \tilde{\varepsilon}_{ij} \]

Second stage:

\[ Y_{ijt} = \exp\left(\beta_1 \ln(PP_{ij}) + \gamma \widehat{\tilde{\varepsilon}}_{ij} + \beta_2 X_{ijt} + \text{FE}\right) + \varepsilon_{ijt} \]

Where PP is the endogenous var, Z is the instrument, and Y the dependent var for countries i and j in year t.

In case it is useful: for some specifications I include time-varying controls (at the country-pair level), but my main specification only includes time-invariant controls.

2) If I understood correctly, one can test whether a variable is endogenous through the CF approach by looking at the significance of the residual term included in the second-stage regression. Is a standard t-test on that coefficient sufficient for this, or should I also consider bootstrapped standard errors? If the residual is not significant (i.e., I can’t reject the null of gamma = 0), I assume it is valid to proceed with a standard ppmlhdfe estimation and omit the CF approach.
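
To fix notation for the test in question 2, a sketch (the standard-error symbol is introduced here for illustration and is not part of the original post): the hypotheses concern the coefficient on the first-stage residual in the second-stage equation,

\[ H_0: \gamma = 0 \quad \text{vs.} \quad H_1: \gamma \neq 0, \qquad t = \frac{\hat{\gamma}}{\widehat{\text{se}}(\hat{\gamma})} \]

A commonly stated result in the control function literature is that, under H_0, estimating the first stage does not affect the limiting distribution of this t statistic, so a (cluster-)robust t-test can serve as the endogeneity test. When gamma differs from zero, the second-stage standard errors need to account for the generated regressor, for example by bootstrapping both stages together, resampling at the country-pair level.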

Any guidance would be greatly appreciated!

Jose
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#2

21 Jun 2025, 17:48

The FE approach won’t work with a time-constant variable and/or IV. Do you have time-varying controls? If so, I’d include their time averages in a correlated RE approach. Then your CF won’t be perfectly collinear.
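
One way to read the correlated RE suggestion above, written as a rough sketch rather than a definitive specification. The symbols \bar{X}_{ij}, \pi, \xi, \delta_t and v_{ij} are introduced here for illustration and do not appear in the thread, and whether exporter and importer effects should also enter is left open. The time averages of the time-varying controls stand in for the country-pair effects:

\[ \bar{X}_{ij} = \frac{1}{T} \sum_{t=1}^{T} X_{ijt} \]

\[ \ln(PP_{ij}) = \pi_1 Z_{ij} + \pi_2 \bar{X}_{ij} + v_{ij} \]

\[ E(Y_{ijt} \mid \cdot) = \exp\left(\beta_1 \ln(PP_{ij}) + \gamma \hat{v}_{ij} + \beta_2 X_{ijt} + \xi \bar{X}_{ij} + \delta_t\right) \]

Because the pair-level heterogeneity is modeled through \bar{X}_{ij} instead of being absorbed by country-pair fixed effects, the time-constant terms \ln(PP_{ij}) and \hat{v}_{ij} are not swept out of the second stage.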
Jose Fuentes

Join Date: Jun 2025

Posts: 4
#3

21 Jun 2025, 18:51

Dear Prof. Wooldridge, thank you very much for the rapid response.

I understand the CF would be perfectly collinear if I included a country-pair FE, as this would absorb my variable of interest. However, in my specification I’m only including separate fixed effects for country_i, country_j, and year. Would the FE approach still be invalid in this case?

I do have time-varying controls, although they are potentially endogenous. I will check the literature on this.

Thanks again!