Hello everyone!
I'm running gravity-like estimations using panel data with T = 25 years and 50 country-pairs. My dependent variable is a count, so I’m estimating the model using PPML via ppmlhdfe, including high-dimensional fixed effects.
My key explanatory variable of interest is continuous and time-invariant. I suspect it may be endogenous, and I have a continuous, time-invariant instrument for it. Since there is no built-in IV procedure in ppmlhdfe, I am following the control function (CF) Poisson approach suggested by Lin & Wooldridge (2019).
That brings me to two questions I would love to get some insight on.
1) Considering that my endogenous variable is time-invariant, would it still be appropriate to use this approach? I’ve seen a couple of working papers that also have time-invariant endogenous variables and use this approach, but without formal justification. Also, given that the dependent variable in the first stage is time-invariant, I’m unsure whether I should keep the data as a panel or collapse it to a cross-section for that regression. Alternatively, could an approach where the first stage is run in cross-section and the second in panel be justified? I’m thinking of something like:
First stage:
\[ \ln(PP_{ij}) = \tilde{\beta}_1 Z_{ij} + \tilde{\beta}_2 X_{ij} + \text{FE} + \tilde{\varepsilon}_{ij} \]
Second stage:
\[ Y_{ijt} = \exp\left(\beta_1 \ln(PP_{ij}) + \gamma\,\widehat{\tilde{\varepsilon}}_{ij} + \beta_2 X_{ijt} + \text{FE}\right) + \varepsilon_{ijt} \]
where PP is the endogenous variable, Z is the instrument, X denotes the controls, and Y is the dependent variable for countries i and j in year t.
In case it is useful: some specifications include time-varying controls (at the country-pair level), but my main specification only includes time-invariant controls.
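For concreteness, here is a rough Stata sketch of the cross-section first stage followed by the panel second stage described above. All variable names (y, lnpp, z, x, pair_id, exp_id, imp_id, year) are hypothetical, the fixed-effects structure in absorb() is only an assumption since I have not spelled out FE above, and using reghdfe for the first stage is my own choice rather than something prescribed by Lin & Wooldridge (2019).

```stata
* Hypothetical variable names; adapt the absorb() lists to the actual FE structure.

* Flag one observation per country-pair so the first stage is a pure cross-section
egen first_obs = tag(pair_id)

* First stage (cross-section): endogenous variable on instrument and controls,
* saving the residuals as the control function term
reghdfe lnpp z x if first_obs, absorb(exp_id imp_id) residuals(v_hat_cs)

* Copy the pair-level residual to every year of the same pair
bysort pair_id: egen v_hat = max(v_hat_cs)

* Second stage (panel): PPML including the first-stage residual
ppmlhdfe y lnpp v_hat x, absorb(exp_id#year imp_id#year) vce(cluster pair_id)
```

Because v_hat is a generated regressor, the standard errors reported by the second stage do not account for first-stage estimation error, which is part of what question 2 is about.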
2) If I understood correctly, the CF approach lets one test whether the variable is endogenous by looking at the significance of the first-stage residual included in the second-stage regression. Is a standard t-test on that coefficient sufficient for this, or should I also consider bootstrapped standard errors? If the residual is not significant (i.e., I can’t reject the null of gamma=0), I assume it is valid to drop the CF term and proceed with a standard ppmlhdfe estimation.
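If bootstrapping turns out to be the way to go, my understanding is that both stages would need to be re-estimated within each replication, resampling whole country-pairs. Below is a minimal sketch under that assumption. The program name cf_ppml, the variable names, the FE structure, the number of replications, and the seed are all hypothetical.

```stata
* Hypothetical names throughout; both stages are re-run in every replication.
capture program drop cf_ppml
program define cf_ppml, eclass
    capture drop first_obs
    capture drop v_hat_cs
    capture drop v_hat
    * boot_pair identifies resampled pairs, so a pair drawn twice counts as two clusters
    egen first_obs = tag(boot_pair)
    reghdfe lnpp z x if first_obs, absorb(exp_id imp_id) residuals(v_hat_cs)
    bysort boot_pair: egen v_hat = max(v_hat_cs)
    ppmlhdfe y lnpp v_hat x, absorb(exp_id#year imp_id#year)
end

* Pre-create the resampled-cluster identifier so the run on the original data works
generate boot_pair = pair_id

* Pairs cluster bootstrap over country-pairs
bootstrap _b, reps(500) seed(12345) cluster(pair_id) idcluster(boot_pair): cf_ppml
```

The bootstrap z-statistic on v_hat would then play the role of the test of gamma=0.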
Any guidance would be greatly appreciated!
Jose