  • Significant Sargan Statistic (Overidentification test) in presence of additional variable

    Hi, I am using 2SLS to estimate the impact of TFP on employment and wage with the help of a simultaneous system of equations. In my data, n=65 and t=22. The structure of the system as well as the Hausman endogeneity test for the individual equations show that Employment and wage are endogenous. I use L2.Employment and L2.kl as the instruments for the endogenous variables, respectively, as they satisfy the falsification test. The system of equations is:

    Employment=a1+a2. tfp+a3. L.tfp+a4. gva+a5.kl+a6. ngov+a7. sc+a8. (tfp.gva)+a9. Lp+a10. L.employment + error

    Wage=b1+b2. Employment+ b3. tfp+ b4. L.tfp+ b5. Lp + b6. Contract+ b7. L. wage+ error

    Here,
    tfp= Total factor productivity
    gva= Gross value added
    kl= Capital-Labour ratio
    ngov= share of non-government firm
    sc= share of subcontracting
    tfp.gva= interaction term of tfp and gva (both continuous variables)
    lp= labour productivity
    Contract= Share of Contractual workers in total number of workers

    As I am estimating these two equations as a simultaneous system, when using xtivreg I have included all the exogenous variables of the system as instruments, along with the individual instruments of the endogenous variables.
    Now, when I estimate the second equation, the significant overidentification test statistic casts doubt on the validity of my instruments. When I remove ln.gva from the list of instruments (I had included it there because it is an exogenous variable in the first equation), the overidentification test statistic becomes insignificant.

    I have got the following results when I used xtivreg:

    [Image: 2SLS.png]

    After that, I ran xtoverid, nois and got the following results:

    [Image: xtoverid, nois.png]

    It would be very helpful if anyone could help me with the following two questions:
    1. Why does this problem occur when I include ln.gva in the list of instruments?
    2. How can I avoid it?

  • #2
    A simple answer could be that ln_gva might have predictive power for wages and therefore would need to be included as a regressor in the second equation, not just as an instrument.
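    One way to check this simple explanation is to compare the overidentification test with ln_gva as an excluded instrument and with ln_gva as a regressor. The following is only a rough sketch: the panel identifiers id and year and the variable names ln_tfp, ln_lp, ln_kl, ngov, sc and contract are assumptions (only ln_wage, ln_workers and ln_gva appear in this thread), and the instrument list is a guess at the original specification.

```stata
* Hypothetical panel identifiers; ln_tfp, ln_lp, ln_kl, ngov, sc and contract
* are assumed variable names
xtset id year

* Lags generated by hand, because xtoverid may not accept time-series operators
generate L1_ln_wage    = L.ln_wage
generate L1_ln_tfp     = L.ln_tfp
generate L2_ln_workers = L2.ln_workers

* (a) ln_gva used only as an excluded instrument
xtivreg ln_wage ln_tfp L1_ln_tfp ln_lp contract L1_ln_wage ///
    (ln_workers = L2_ln_workers ln_gva ln_kl ngov sc), fe
xtoverid, nois

* (b) ln_gva moved into the wage equation as a regressor
xtivreg ln_wage ln_tfp L1_ln_tfp ln_lp contract L1_ln_wage ln_gva ///
    (ln_workers = L2_ln_workers ln_kl ngov sc), fe
xtoverid, nois
```

    If ln_gva is statistically significant in (b) and the test no longer rejects, that would be consistent with ln_gva belonging in the wage equation. xtoverid is a community-contributed command (ssc install xtoverid).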

    However, there are more serious issues with your estimation strategy.
    1. You are using a fixed-effects IV estimator. This transforms all variables (including the instruments) into deviations from their group-specific means. Consequently, if ln_workers is endogenous, then its lags L.ln_workers and L2.ln_workers are endogenous as well, because their within-group means include ln_workers. Thus, the instruments you started with are already invalid by construction (unless ln_workers is actually exogenous).
    2. You have included the lagged dependent variable L.ln_wage as a regressor. This makes it a dynamic panel data model. By the same argument as above, its deviation from within-group means is necessarily endogenous (unless T is very large, but here T=20 is usually still considered to be relatively small).
    3. To deal with the above issues, one would typically use a GMM estimator with lags as instruments for the first-differenced model (not the model in deviations from within-group means). The following presentation might be helpful:
    https://www.kripfganz.de/stata/
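
    The argument in points 1 to 3 can be sketched for a generic dynamic panel model. The symbols $y_{it}$, $x_{it}$, $\alpha_i$, $u_{it}$, $\rho$ and $\beta$ are notation introduced here for illustration only; the sketch assumes serially uncorrelated errors $u_{it}$.

```latex
% Generic dynamic panel model (notation assumed for illustration)
y_{it} = \rho\, y_{i,t-1} + x_{it}'\beta + \alpha_i + u_{it}

% Within transformation: subtract group-specific means
\tilde{y}_{i,t-1} = y_{i,t-1} - \frac{1}{T}\sum_{s=1}^{T} y_{i,s-1}, \qquad
\tilde{u}_{it} = u_{it} - \frac{1}{T}\sum_{s=1}^{T} u_{is}

% The mean of the lag contains y_{it} (for t < T), which depends on u_{it},
% and the mean error contains u_{i,t-1}, which is part of y_{i,t-1}; hence
\operatorname{Cov}\left(\tilde{y}_{i,t-1}, \tilde{u}_{it}\right) \neq 0
\quad \text{for fixed } T

% First differences remove \alpha_i without using group means
\Delta y_{it} = \rho\, \Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta u_{it}

% With serially uncorrelated u_{it}, lags dated t-2 and earlier are valid instruments
E\left[ y_{i,t-s}\, \Delta u_{it} \right] = 0, \qquad s \geq 2
```

    The correlation in the within-transformed model vanishes only as T grows, which is why the problem persists with around 20 time periods.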

    • #3
      Thank you so much, Prof. Kripfganz, for your suggestions. I read the presentation that you mentioned in your reply. Accordingly, I tried to estimate the model using first-difference GMM (xtdpdgmm) and got the following results:

      The estimates of employment equation:
      [Image: Employment Equation-GMM.png]

      Tests for Autocorrelation and Overidentification for the equation of employment:
      [Image: Employment Equation-OVERID and SERIAL.png]

      Then I have estimated the equation of wage and got the following estimates:

      [Image: Wage Equation-GMM.png]

      Tests for Autocorrelation and Overidentification for the equation of wage:
      [Image: Wage Equation-OVERID and SERIAL.png]

      In the context of these two estimates, I have some questions, listed below. It would be of great help if you could give your suggestions on them:

      1. As you said in your last reply, since the dependent variables in my models are endogenous, their lagged values are also endogenous under fixed-effects estimation, so first-difference GMM should be used. My first question is: can I use system GMM to estimate the above-mentioned equations?

      2. As I want to estimate the two equations above as a simultaneous system of equations, I included all the exogenous variables (of both equations) as GMM instruments with lag 0 in both equations. Is this the correct way to estimate the two equations as a simultaneous system? If not, how can I estimate both equations as a simultaneous system under GMM? My main intuition for this strategy is that in 2SLS, the endogenous variable is first regressed on all the exogenous variables in the first stage, and the fitted value of the endogenous variable is then used in the second stage to estimate the final equation.

      3. With respect to moment conditions, is there any limit on the number of moment conditions that can be used? My first equation has 101 moment conditions and my second equation has 142.

      It would be of great help if you could help me with these questions.
      Last edited by lakhi narayan; 29 Jul 2024, 11:57.

      • #4
        1. In general, yes, but note that System GMM requires an additional (possibly strong) assumption compared to Difference GMM. You can test this assumption with an incremental overidentification test (Difference-in-Hansen test); see again my presentation slides for examples.

        2. xtdpdgmm does not estimate a simultaneous system of equations. If you want to separately estimate them equation by equation, then you need to consider what the system implies about the exogeneity of the variables in a given equation. For those variables that are exogenous in the given equation (or all equations), you can use the gmm() option starting with lag 0.

        3. This is an important question. In your case, the number of instruments is very large (too large) relative to the number of groups. It should be (much) smaller than the number of groups. The high p-value (close to 1) of the overidentification tests is a clear indication here that there are too many instruments. While the collapse option helps to some degree, it would make sense to restrict the upper bound for the lag range given your relatively large number of time periods; e.g., lag(2 5).
        https://www.kripfganz.de/stata/
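
        As an illustration of points 1 and 3, the wage equation could be set up along the following lines. This is only a sketch under stated assumptions: the panel identifiers id and year and the variable names ln_tfp, ln_lp and contract are hypothetical, and the classification of variables (ln_wage and ln_workers endogenous, the rest exogenous in this equation) as well as the lag ranges are illustrative choices, not recommendations for this data set.

```stata
* Hypothetical panel identifiers and variable names; adapt to your data
xtset id year

* Difference GMM with collapsed instruments and a capped lag range, so that
* the instrument count stays well below the number of groups (65)
xtdpdgmm ln_wage L.ln_wage ln_workers ln_tfp L.ln_tfp ln_lp contract, ///
    model(diff) collapse                                             ///
    gmm(ln_wage, lag(2 5))                                           ///
    gmm(ln_workers, lag(2 5))                                        ///
    gmm(ln_tfp ln_lp contract, lag(0 3))                             ///
    twostep vce(robust) nocons

estat overid             // overidentification tests
estat serial, ar(1/2)    // tests for serial correlation in the differenced residuals

* System GMM: adds differenced instruments for the level equation, which
* rests on the additional assumption mentioned in point 1
xtdpdgmm ln_wage L.ln_wage ln_workers ln_tfp L.ln_tfp ln_lp contract, ///
    model(diff) collapse                                             ///
    gmm(ln_wage, lag(2 5))                                           ///
    gmm(ln_workers, lag(2 5))                                        ///
    gmm(ln_tfp ln_lp contract, lag(0 3))                             ///
    gmm(ln_wage, lag(1 1) diff model(level))                         ///
    gmm(ln_workers, lag(1 1) diff model(level))                      ///
    gmm(ln_tfp ln_lp contract, lag(0 0) diff model(level))           ///
    twostep vce(robust)

estat overid, difference // incremental (Difference-in-Hansen) tests
```

        With collapse and these lag ranges, the number of instruments is far smaller than the 101 and 142 moment conditions reported above. Whether a variable may enter gmm() starting at lag 0 depends on what the system implies about its exogeneity in the given equation, as discussed in point 2.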
