Thank you very much, Manh Hoang Ba, for your feedback.
Here is the procedure I followed. I first generated internal instruments with the Lewbel (heteroskedasticity-based) method, passing marital status (married) and religion through the z() option as the variables from which those instruments are built. I then added the death of a close relative as an external instrument.
I ran the different estimations and performed the Sargan–Hansen test. The results show that:
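- With the external instrument alone (death), the Sargan statistic is zero, presumably because the equation is then exactly identified and the overidentification test has no degrees of freedom;
- With the internal instruments, the test is not significant (p-value > 0.05), so the null hypothesis that the instruments are valid is not rejected;
- When combining internal and external instruments, the test remains non-significant (p-value > 0.05), which is consistent with their joint validity.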
With the Lewbel method, or more generally with GMM or 2SLS estimations (using ivreg2 or ivreg2h), I sometimes obtain a negative or very small positive R².
So, is the R² really meaningful in this context?
From what I understand, a negative R² does not imply that the model is misspecified. Rather, the IV residuals are computed from the actual values of the endogenous regressor, so after correcting for endogeneity the residual sum of squares can exceed the total sum of squares. This seems to happen more often when the instruments are only weakly correlated with the endogenous variable.
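To make this concrete, here is a minimal simulated sketch in Python (not my actual data or Stata code). Everything in it is an assumption chosen for illustration: a continuous outcome, a single endogenous regressor x, a single instrument z, a true coefficient of 1, and an error term that is negatively correlated with x. It computes R² as 1 - RSS/TSS for both OLS and just-identified IV, using the actual regressor to form the IV residuals.

```python
import numpy as np

# Simulated data: all names and parameter values are illustrative assumptions.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)          # instrument, independent of the error
v = rng.normal(size=n)          # source of endogeneity
e = rng.normal(size=n)          # idiosyncratic noise
x = z + v                       # endogenous regressor
u = -1.5 * v + e                # structural error, correlated with x
y = 1.0 * x + u                 # true coefficient on x is 1


def r_squared(y, X, beta):
    """R^2 = 1 - RSS/TSS, with residuals built from the actual regressors."""
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / tss


X = np.column_stack([np.ones(n), x])   # regressors (constant + x)
Z = np.column_stack([np.ones(n), z])   # instruments (constant + z)

# OLS coefficients.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Just-identified IV estimator: beta = (Z'X)^{-1} Z'y.
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)

r2_ols = r_squared(y, X, beta_ols)
r2_iv = r_squared(y, X, beta_iv)

print("OLS slope:", beta_ols[1], "R2:", r2_ols)
print("IV  slope:", beta_iv[1], "R2:", r2_iv)
```

In this setup the IV slope is close to the true value while the IV R² is negative, even though the instrument is strong by construction. This suggests that the sign of R² says little about whether the IV model is adequate.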
I was wondering whether there are references in the literature that discuss this issue, and whether it is a common finding in IV/GMM models.
For reference, my dependent variable is access to the labor market (Currently working, binary: 0 = No, 1 = Yes).
