Significant Sargan Statistic (Overidentification test) in presence of additional variable

lakhi narayan

Join Date: Jul 2024

Posts: 7
#1

12 Jul 2024, 01:03

Hi, I am using 2SLS to estimate the impact of TFP on employment and wages in a simultaneous system of equations. My panel has N = 65 and T = 22. Both the structure of the system and the Hausman endogeneity tests for the individual equations indicate that Employment and Wage are endogenous. I use L2.Employment and L2.kl as the respective instruments for the endogenous variables, as they pass a falsification test. The system of equations is:

Employment = a1 + a2*tfp + a3*L.tfp + a4*gva + a5*kl + a6*ngov + a7*sc + a8*(tfp.gva) + a9*lp + a10*L.Employment + error

Wage = b1 + b2*Employment + b3*tfp + b4*L.tfp + b5*lp + b6*Contract + b7*L.Wage + error

Here,
tfp= Total factor productivity
gva= Gross value added
kl= Capital-Labour ratio
ngov= share of non-government firm
sc= share of subcontracting
tfp.gva= interaction term of tfp and gva (both continuous variables)
lp= labour productivity
Contract= Share of Contractual workers in total number of workers

Because I am estimating these two equations as a simultaneous system, in xtivreg I included all exogenous variables of the system as instruments, together with the individual instruments for the endogenous variables.
When I estimate the second equation, the overidentification test statistic is significant, which casts doubt on the validity of my instruments. When I remove ln_gva from the list of instruments (I had included it because it is an exogenous variable in the first equation), the overidentification test statistic becomes insignificant.

I have got the following results when I used xtivreg:

After that I have used xtoverid, nois and I got the following results:

It would be very helpful if anyone could help me with the following two questions:
1. Why does this problem occur when I include ln_gva in the list of instruments?
2. How can I avoid it?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2581
#2

12 Jul 2024, 02:57

A simple answer could be that ln_gva might have predictive power for wages and therefore would need to be included as a regressor in the second equation, not just as an instrument.

However, there are more serious issues with your estimation strategy.
You are using a fixed-effects IV estimator. This transforms all variables (including the instruments) into deviations from their group-specific means. Consequently, if ln_workers is endogenous, then its lags L.ln_workers and L2.ln_workers are endogenous as well, because their within-group means include ln_workers. Thus, the instruments you started with are already invalid by construction (unless ln_workers is actually exogenous).

You have included the lagged dependent variable L.ln_wage as a regressor. This makes it a dynamic panel data model. By the same argument as above, its deviation from within-group means is necessarily endogenous (unless T is very large, but here T=20 is usually still considered to be relatively small).
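
The endogeneity created by the within-group transformation can be seen in a small simulation. The sketch below is not part of the original exchange: it uses Python's standard library rather than Stata, a hypothetical AR(1) panel with group effects, and made-up parameter values (rho = 0.5, with 65 groups and 22 periods chosen only to mimic the dimensions mentioned in #1). It computes the fixed-effects estimate of the coefficient on the lagged dependent variable and averages it over replications.

```python
import random


def within_estimate(n_groups, n_periods, rho, rng, burn_in=50):
    """Simulate y_it = rho*y_i,t-1 + a_i + e_it and return the
    within-group (fixed-effects) estimate of rho."""
    num = 0.0
    den = 0.0
    for _ in range(n_groups):
        a_i = rng.gauss(0.0, 1.0)          # hypothetical group effect
        y = a_i / (1.0 - rho)              # start at the group's long-run mean
        series = []
        for t in range(burn_in + n_periods):
            y = rho * y + a_i + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                series.append(y)
        y_cur = series[1:]                 # y_it
        y_lag = series[:-1]                # y_i,t-1
        mean_cur = sum(y_cur) / len(y_cur)
        mean_lag = sum(y_lag) / len(y_lag)
        # Deviations from group means: the mean of the lag contains
        # later values of y, which depend on the current error term.
        for c, l in zip(y_cur, y_lag):
            num += (l - mean_lag) * (c - mean_cur)
            den += (l - mean_lag) ** 2
    return num / den


def mean_within_estimate(n_groups=65, n_periods=22, rho=0.5,
                         reps=100, seed=12345):
    """Average the within estimate of rho over independent replications."""
    rng = random.Random(seed)
    draws = [within_estimate(n_groups, n_periods, rho, rng)
             for _ in range(reps)]
    return sum(draws) / reps


if __name__ == "__main__":
    print("true rho: 0.5, mean within estimate:",
          round(mean_within_estimate(), 3))
```

Under these assumptions the average estimate should fall noticeably below the true value of 0.5, which is the downward bias that a short time dimension produces in the fixed-effects estimator of a dynamic model.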

To deal with the above issues, one would typically use a GMM estimator with lags as instruments for the first-differenced model (not the model in deviations from within-group means). The following presentation might be helpful:

Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.
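
As a rough illustration of why first-differencing helps, the sketch below applies the just-identified Anderson-Hsiao IV estimator to a simulated panel. This is a simpler relative of the GMM estimators covered in the presentation, not what xtdpdgmm computes, and everything in it is an assumption made for illustration: the Python standard library instead of Stata, a hypothetical AR(1) data-generating process, and arbitrary parameter values. The first-differenced equation D.y_t = rho*D.y_{t-1} + D.e_t is estimated with the level y_{t-2} as the instrument for D.y_{t-1}.

```python
import random
import statistics


def simulate_panel(n_groups, n_periods, rho, rng, burn_in=50):
    """Return one list per group from y_it = rho*y_i,t-1 + a_i + e_it."""
    panel = []
    for _ in range(n_groups):
        a_i = rng.gauss(0.0, 1.0)          # hypothetical group effect
        y = a_i / (1.0 - rho)              # start at the group's long-run mean
        series = []
        for t in range(burn_in + n_periods):
            y = rho * y + a_i + rng.gauss(0.0, 1.0)
            if t >= burn_in:
                series.append(y)
        panel.append(series)
    return panel


def anderson_hsiao(panel):
    """Just-identified IV estimate of rho in the first-differenced model,
    using the level y_{t-2} as instrument for D.y_{t-1}."""
    num = 0.0
    den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            d_y = y[t] - y[t - 1]          # D.y_t (group effect drops out)
            d_y_lag = y[t - 1] - y[t - 2]  # D.y_{t-1}, the endogenous regressor
            z = y[t - 2]                   # instrument: dated before D.e_t
            num += z * d_y
            den += z * d_y_lag
    return num / den


def median_ah_estimate(n_groups=65, n_periods=22, rho=0.5,
                       reps=100, seed=2024):
    """Median of the Anderson-Hsiao estimate over replications."""
    rng = random.Random(seed)
    draws = [anderson_hsiao(simulate_panel(n_groups, n_periods, rho, rng))
             for _ in range(reps)]
    return statistics.median(draws)


if __name__ == "__main__":
    print("true rho: 0.5, median Anderson-Hsiao estimate:",
          round(median_ah_estimate(), 3))
```

The median is reported rather than the mean because a just-identified IV estimator can produce occasional extreme draws. In this simulated setting the median should sit close to the true value, because y_{t-2} is uncorrelated with the differenced error when the errors are serially uncorrelated.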

https://www.kripfganz.de/stata/
lakhi narayan

Join Date: Jul 2024

Posts: 7
#3

29 Jul 2024, 11:51

Thank you so much, Prof. Kripfganz, for your suggestions. I read the presentation that you mentioned in your reply. Accordingly, I estimated the model using first-difference GMM (xtdpdgmm) and got the following results:

The estimates of employment equation:

Tests for Autocorrelation and Overidentification for the equation of employment:

Then I have estimated the equation of wage and got the following estimates:

Tests for Autocorrelation and Overidentification for the equation of wage:

Now, in the context of these two estimates, I have some queries which I mentioned below. It would be of great help if you could give your suggestions on them:

1. As you said in your last reply, because the dependent variables in my models are endogenous, their lagged values are also endogenous under fixed-effects estimation, so first-difference GMM should be used instead. My first question is: can I use system GMM to estimate the two equations above?

2. Because I want to estimate the two equations as a simultaneous system, I entered all exogenous variables (from both equations) as GMM-type instruments starting at lag 0 in both equations. Is this the correct way to estimate the two equations as a simultaneous system? If not, how can I estimate both equations as a simultaneous system with GMM? My intuition for this strategy is that in 2SLS the endogenous variable is regressed on all exogenous variables in the first stage, and its fitted value is then used to estimate the final equation in the second stage.

3. Is there a limit on the number of moment conditions that should be used? There are 101 moment conditions in my first equation and 142 in the second.

It would be of great help if you could help me to solve these queries.

Last edited by lakhi narayan; 29 Jul 2024, 11:57.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2581
#4

30 Jul 2024, 09:23

1. In general, yes, but note that System GMM requires an additional (possibly strong) assumption compared to Difference GMM. You can test this assumption with an incremental overidentification test (Difference-in-Hansen test); see again my presentation slides for examples.

2. xtdpdgmm does not estimate a simultaneous system of equations. If you want to separately estimate them equation by equation, then you need to consider what the system implies about the exogeneity of the variables in a given equation. For those variables that are exogenous in the given equation (or all equations), you can use the gmm() option starting with lag 0.

3. This is an important question. In your case, the number of instruments is very large (too large) relative to the number of groups. It should be (much) smaller than the number of groups. The high p-value (close to 1) of the overidentification tests is a clear indication here that there are too many instruments. While the collapse option helps to some degree, it would make sense to restrict the upper bound for the lag range given your relatively large number of time periods; e.g., lag(2 5).
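
To get a feel for how quickly the instrument count grows, the following sketch counts the GMM-type instruments that a single variable contributes to the first-differenced model. It is a simplified count under stated assumptions, not a reproduction of what xtdpdgmm reports: periods are numbered 1 to T, the first usable differenced equation is assumed to be period 3 (one lag of the dependent variable in the model), and the function name and arguments are hypothetical. It is written in Python's standard library rather than Stata.

```python
def count_gmm_instruments(n_periods, first_lag, last_lag=None,
                          collapse=False, first_equation=3):
    """Count GMM-type instruments one variable contributes to the
    first-differenced model (simplified, illustrative count).

    The differenced equation at period t can use the level dated
    t - lag whenever t - lag >= 1. last_lag=None means all
    available lags are used.
    """
    if last_lag is None:
        max_lag = n_periods - 1
    else:
        max_lag = min(last_lag, n_periods - 1)
    if collapse:
        # Collapsed: one instrument column per lag distance.
        return max(0, max_lag - first_lag + 1)
    total = 0
    for t in range(first_equation, n_periods + 1):
        # Uncollapsed: a separate column per period and lag distance.
        deepest = min(max_lag, t - 1)
        total += max(0, deepest - first_lag + 1)
    return total


if __name__ == "__main__":
    T = 22
    print("lags 2+, uncollapsed:", count_gmm_instruments(T, 2))
    print("lags 2+, collapsed:  ", count_gmm_instruments(T, 2, collapse=True))
    print("lags 2-5, uncollapsed:", count_gmm_instruments(T, 2, 5))
    print("lags 2-5, collapsed:  ", count_gmm_instruments(T, 2, 5, collapse=True))
```

With 22 periods, this count gives 210 instruments for one variable using all lags from 2 onward, 20 when collapsed, 74 when restricted to lags 2 to 5 without collapsing, and 4 when both collapsed and restricted. Multiplied across several variables, the unrestricted versions quickly exceed 65 groups, which is why both collapsing and a lag limit matter here.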

https://www.kripfganz.de/stata/