Number of observations must be greater than number of instruments and under- and weak identification test

Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#1

05 Dec 2022, 17:26

Dear all,

I am trying to estimate a dynamic panel data model of GDP growth rates with large T, far greater than N. I use ivreghdfe, which fits a linear model with multi-way fixed effects as in Correia (2016), with unit and time fixed effects, Driscoll-Kraay AR(1) standard errors, and exogenous regressors partialled out. However, when I restrict the estimation to specific groups, I get the error

Code:

Error: number of observations must be greater than number of instruments including constant. insufficient observations

I understand that this has to do with the number of observations, as I did not run into this error when I estimated the full-scale model.

My model is specified as

Code:

y_{i,t} - y_{i,t-1} = (\alpha - 1) y_{i,t-1} + \beta_1 y_{i,t-1} + \beta_2 T y_{i,t-1} + \beta_3 T^2 y_{i,t-1} + y_{i,t} + \sum_{n=1}^{N} \beta_{4+n} y_{i,t} + \beta_5 Z_{i,t} + D_{i,t} + \mu_t + \eta_i + \varepsilon_{i,t}

where y is the dependent variable of interest, Z is a set of control variables, and D is a set of dummy and categorical variables. T is an operator for a set of indicators: nine continuous (not binary) indicator variables that are allowed to interact with one another.

My code, abbreviated, is

Code:

ivreghdfe Dgdpgroth other RHS variables list of dummies and categorical year c.ind_*##c.ind_* (humancapital=ky) if imf_income2==1 & year >=1990, absorb(id) dkraay(1) partial(i.year)

where Dgdpgroth is y, the GDP growth rate (the dependent variable), year is the time period in years, c.ind_*##c.ind_* is the interaction corresponding to the T operator in the model above, humancapital is a measure of human capital, and ky is the capital-to-output ratio. Following standard practice in previous papers, the capital-to-output ratio is the instrument.

All independent variables are endogenous. Dummy variables capture events, and categorical variables capture the classification or duration of an event in years. The indicators are nine continuous (not binary) variables that are allowed to interact with one another. I am trying to see their effects on GDP growth and the other macro variables. Their total number is large.

The general code I run, which appears to work fine, is

Code:

ivreghdfe growthgdp var1 var2 var2 ...... var_n dummy1 dummy2....dummy_n year c.ind_*##c.ind_* (humancapital=ky), absorb(id) dkraay(1) partial(i.year)

but when I look at a specific group with few members and limit the window of years, the above-mentioned error appears for some groups only:

Code:

ivreghdfe growthgdp var1 var2 var2 ...... var_n dummy1 dummy2....dummy_n year c.ind_*##c.ind_* (humancapital=ky) if group2==1 & year >=1990, absorb(id) dkraay(1) partial(i.year)

I have three questions:

First, is there a way or trick to deal with this error? The obvious answer is to reduce the number of instruments, but that would make it hard to compare the results with the full-scale model and with the other groups where no such problem occurs. In other words, can I obtain estimates for those groups that are comparable to the other groups and the general model, still using multiple fixed effects, clustering, partialling out, and dkraay AR(1) errors? Is there a way to approach this?

Second, is the syntax correct? I would like to use the capital-to-output ratio (ky) as the instrument, and I am interested in the coefficients on human capital, but I am not sure the syntax is right. Could I just instrument the capital-to-output ratio with itself (ky=ky)?
Note that the error can nonetheless occur when I impose the time restriction or limit the group so that N>T, depending on the individual case.

Third, and my biggest concern: since I am partialling out exogenous regressors, do I have a weak-instrument problem here? The output for the general model is:

Code:

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on ts
  and kernel-robust to common correlated disturbances (Driscoll-Kraay)
  kernel=Bartlett; bandwidth=1
  time variable (t):  ts
  group variable (i): id

Number of clusters (ts) =           35          Number of obs =      828
                                                F(124,    34) =  7.1e+05
                                                Prob > F      =   0.0000
Total (centered) SS     =  3885.575361          Centered R2   =   0.9057
Total (uncentered) SS   =  3885.575361          Uncentered R2 =   0.9057
Residual SS             =  366.2462637          Root MSE      =    .7559

(The variable coefficients are omitted for brevity.)
and

Code:

Underidentification test (Kleibergen-Paap rk LM statistic):              0.029
                                                   Chi-sq(1) P-val =    0.8638
Weak identification test (Cragg-Donald Wald F statistic):                0.019
                         (Kleibergen-Paap rk Wald F statistic):          0.025
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
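
For orientation, a small sketch of how these two diagnostics could be pulled out after estimation rather than read off the table. It assumes that ivreghdfe leaves the ivreg2 stored results e(widstat) and e(idp) in memory, which should be confirmed with ereturn list before relying on them. In the output above, the weak-identification F statistic (0.025) lies far below the Stock-Yogo 10% maximal IV size critical value (16.38), and the underidentification p-value (0.8638) does not reject the null hypothesis that the equation is underidentified.

Code:

* Run immediately after the ivreghdfe command.
* Assumption: e(widstat) and e(idp) are stored as in ivreg2; check with ereturn list.
ereturn list
display "Weak-identification F statistic: " e(widstat)
display "Underidentification test p-value: " e(idp)
display "Stock-Yogo 10% maximal IV size critical value (from the table): 16.38"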

The R² is very high, over 0.90, which may be a sign of a pathology, but I am told that R² is not that important when it comes to IV. Is that correct?

I know I should have provided data to reproduce the problem, but there are too many variables to include.

I would appreciate any comments, and I would be grateful if you could kindly provide code showing how to proceed with the estimation.

Fei Wang Joao Santos Silva Jeff Wooldridge

Last edited by Giorgio Di Stefano; 05 Dec 2022, 17:33.
Eric Makela

Join Date: Aug 2022

Posts: 45
#2

06 Dec 2022, 02:42

Hi Giorgio, your consideration for providing a data sample is definitely appreciated, and thank you for describing the contents of the variables. Fellow Statalisters will be more likely to help with your final question once there is some sort of dataset with code that can be replicated. I assume you know you can use the 'set trace on' command to better troubleshoot your code.

1) It seems you are already on to the answer to this question. Either your data are not matching with the full-scale model or perhaps you're including some superfluous variable? Your code is trying to 'sort' the observations into categories. Searching the syntax of ivreg2 and its subroutines for the error's text will guide you to the piece of code where your model is failing.

2) Without having read Correia's paper, let's assume the coefficients on human capital are both relevant and able to be interpreted in a meaningful way. Are there academic papers where a researcher instruments a variable with itself? My understanding is that this would shift the resulting instrument (in the 2nd stage) by the amount of the constant, but perhaps you can point to where in the literature this is practiced?

3) Certainly, you can test each individual instrument as a weak instrument. If this is a contribution of your paper, Wooldridge's textbooks should give you a basis as to which test you want to run in your case. The partialling-out of your exogenous regressors doesn't present a particular problem statistically.
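
A minimal sketch of the check behind point 1, assuming the variable names used in #1 (growthgdp, humancapital, ky, group2, year): count the complete observations and the distinct years left in the restricted sample, then compare them with the number of regressors. The full-sample output in #1 reports F(124, 34), which suggests on the order of 124 estimated coefficients, so a subsample with roughly that many observations or fewer cannot support the same specification. The variable yeartag is a hypothetical helper introduced only for this illustration, and the list inside missing() would need to be extended to every variable in the model.

Code:

* Usable observations in the restricted sample (extend missing() to all model variables)
count if group2==1 & year>=1990 & !missing(growthgdp, humancapital, ky)
display "Usable observations: " r(N)

* Distinct years in the same sample (the Driscoll-Kraay clusters)
* yeartag is a hypothetical helper variable, dropped again below
egen yeartag = tag(year) if group2==1 & year>=1990 & !missing(growthgdp, humancapital, ky)
count if yeartag==1
display "Distinct years: " r(N)
drop yeartag
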
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#3

06 Dec 2022, 16:11

Hi Eric,
Thank you so much for your comments.

Starting from your point 2: I know that in the literature the capital-to-output ratio is usually the instrument, and human capital, sometimes referred to as schooling years or education, is included in the regression results. But I do not know what the exact syntax should be here.
I don't know of any academic paper in which a variable is instrumented by itself. It was a thought I had, probably a wrong one, so I asked.

On point 1, I probably need to check my data again, or rather my group treatment.

On point 3, I am not sure about the statistics I reported in #1 and whether I have a weak-instrument problem here. It is the first time I am dealing with those tests and with IVs in general.

In short, I am looking for the correct way to write the IV part in the parentheses of my code, where the capital-to-output ratio is the instrument

Code:

(endogenous variables = instrumental variables)
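
For reference, a hedged sketch of how the parentheses are read in ivreg2-style syntax, which ivreghdfe follows: variables to the left of the equals sign are treated as endogenous regressors, and variables to the right are the excluded instruments. Using the names from #1, with var1 and var2 standing in for the remaining regressors, the command below instruments humancapital with ky. Writing (ky=ky) would ask a variable to instrument itself, which brings in no outside information for identification.

Code:

* Assumes the panel has already been declared (for example with xtset)
* humancapital: endogenous regressor; ky: excluded instrument
* var1 and var2 are placeholders for the remaining exogenous regressors
ivreghdfe growthgdp var1 var2 (humancapital = ky), absorb(id) dkraay(1)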

Last edited by Giorgio Di Stefano; 06 Dec 2022, 16:17.
Eric Makela

Join Date: Aug 2022

Posts: 45
#4

06 Dec 2022, 16:34

Hi Giorgio, so let me get this straight, when you enter

Code:

ivreghdfe growthgdp var1 var2 var2 ...... var_n dummy1 dummy2....dummy_n year c.ind_*##c.ind_* (humancapital=ky), absorb(id) dkraay(1) partial(i.year)

the model runs and produces your presented results table. You are looking to add additional instruments to the parentheses section, as well as other variables and possibly interactions that are not listed above, to coalesce into the first written model above, no?
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#5

07 Dec 2022, 11:57

Eric, I was checking whether the syntax within the parentheses was correct, since I was getting zero coefficients for human capital in a growth model, where the relationship is usually positive.

In addition, I was looking for another way to estimate the same model when restricting the sample to only one group.

After carefully watching some videos on YouTube, I think I have the answer to the former. For the latter, I still have to think about it.

Thank you so much for your time and comments!

PS A well known Harvard economist once told me that when people at Harvard are mentioning instrumental variables, the Instrumental Variables policy is coming!
