Unable to replicate IV 2SLS and GMM results from published paper, with supplied do-file and .dta files

Ryan Patel

Join Date: Apr 2016

Posts: 11
#1

05 Apr 2016, 19:34

Dear all,

This is my first post on Statalist, so please excuse any errors.

The paper in question is http://onlinelibrary.wiley.com/doi/1...ssa.12048/full. It is open access, so I've attached the main article and online appendix as PDFs for convenience. The authors also make their .dta files and do-file (as a .txt document) available for download from http://onlinelibrary.wiley.com/journ...Marcellino.zip - I've attached the unzipped version for convenience.

The authors used Stata 11 with the latest versions (at the time) of -xtivreg2-, -ivreg2-, -ranktest-, -bacon-. I'm running Stata 13 and have installed the latest versions of those programs.

What I've done so far is download the files and run the relevant section of the _do.txt file for "TABLES 2, 3 and 4". In Table 2 of the main article (page 113 of the .pdf), I've been able to replicate columns (1), (3) and (5) but am unable to reproduce the numbers for the IV-2SLS baseline model in columns (2), (4) and (6). I would like some assistance to see whether I'm missing something or whether something has changed in the -xtivreg2- or -ivreg2- commands. In the attached "5 April 2016.txt" log file, column (1) of Table 2 refers to line 221, column (3) to line 764 and column (5) to line 1493. These are all Model 3.1, where .1 refers to the basic OLS-FE model (line 138 of the _do.txt file ... "* OLS-FE model for comparison (TABLE 2)"). Columns (2), (4) and (6) are, from my understanding, the baseline Model 3.2 (line 142 of the _do.txt file).

This should give me the appropriate results, but the output does not match the published table. If you search my "5 April 2016.txt" log file for Model 3.2, my understanding is that lines 244-326 refer to Column (2) of Table 2 [mrc_ihme is the under-5 mortality rate], lines 787-869 refer to Column (4) [mr_fem is the female mortality rate] and lines 1518-1600 refer to Column (6) [mr_mal is the male mortality rate]. For convenience I've attached a screenshot of Table 2, annotated with the corresponding variable names from the Stata regression output.

I've also attached a .txt with Model 3.2 extracted from the full "5 April 2016.txt" log file.

Thank you for any assistance you can provide.
Attached Files

Moreno-Serra_et_al-2015-Journal_of_the_Royal_Statistical_Society-_Series_A_(Statistics_in_Society).sup-1.pdf (199.2 KB, 1 view)

Moreno-Serra_et_al-2015-Journal_of_the_Royal_Statistical_Society-_Series_A_(Statistics_in_Society).pdf (835.3 KB, 1 view)

5 April 2016.txt (136.8 KB, 1 view)

Model 3.2 all.txt (15.9 KB, 1 view)

Broader health coverage is good for the nation's health: evidence from country level panel data - Moreno-Serra - 2014 - Journal of the Royal Statistical Society: Series A (Statistics in Society) - Wiley Online Library

http://onlinelibrary.wiley.com

Main Paper (Open Access)

Last edited by Ryan Patel; 05 Apr 2016, 19:38. Reason: Fix images, upload files
Ryan Patel

Join Date: Apr 2016

Posts: 11
#2

05 Apr 2016, 19:35

There is a limit on file attachments, so please find the extracted .zip file attached.
Attached Files

Readme.txt (1.6 KB, 1 view)

Moreno-Serra_Smith_JRSSA2013_do.txt (14.2 KB, 1 view)

Moreno-Serra_Smith_JRSSA2013_data_A2-A6.dta (452.1 KB, 1 view)

Moreno-Serra_Smith_JRSSA2013_data_A7.dta (747.2 KB, 1 view)

Moreno-Serra_Smith_JRSSA2013_data_main.dta (750.2 KB, 1 view)
Mark Schaffer

Join Date: Mar 2014

Posts: 324
#3

06 Apr 2016, 05:29

ivreg2 now has a limited kind of version control built in. When we upgrade so that it requires a more recent version of Stata in order to run, we freeze the existing version so that it will continue to run under the older Stata. You can invoke these earlier versions of ivreg2 using Stata's version command. For example,

Code:

version 10: ivreg2 y (x=z)

invokes the version of ivreg2 that ran under Stata 10 before it required Stata 11. You can try running the code under versions 10, 9 and 8 and see if one of those replicates.
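As a rough sketch of that suggestion, the frozen versions can be tried in one loop. Here y, x and z are the same placeholder variable names as in the example above, not variables from any particular dataset, and the loop assumes the legacy ivreg2 files for each version are installed alongside the current one.

```stata
* Sketch only: y, x and z are placeholders; substitute your own model.
* Runs the ivreg2 code that was current under Stata 10, 9 and 8 in turn.
foreach v in 10 9 8 {
    display as text _n "--- ivreg2 called under version `v' ---"
    version `v': ivreg2 y (x = z)
}
```

Comparing each set of output with the published table shows which version, if any, replicates.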
Ryan Patel

Join Date: Apr 2016

Posts: 11
#4

06 Apr 2016, 08:17

Thank you Mark!

Code:

version 10: xi: xtivreg2 mrc_ihme gdppc edu_prie pop14 pop65 i.year (hcepc_pu hcepc_oop hcepc_vhi im_md = z3_hcepc_pu_mrc_ihme z3_hcepc_oop_mrc_ihme z3_hcepc_vhi_mrc_ihme z3_im_md_mrc_ihme), fe gmm2s clu(country) small

replicated the results in column (2) of Table 2, and similar commands changing the dependent variable replicated columns (4) and (6). Please could you confirm what is different between version 10 and the current version, to help me choose which one I should use?
Mark Schaffer

Join Date: Mar 2014

Posts: 324
#5

06 Apr 2016, 09:49

Hard to say. There are differences in how collinearities are handled, for example, so you could get different results if different variables are dropped to remove the collinearity. Or it could be a bug fix in the more up-to-date version. Or something else. You might be able to tell by comparing the version notes at the bottom of the ado files for ivreg2.ado and ivreg210.ado.

Ryan Patel

Join Date: Apr 2016

Posts: 11
#6

06 Apr 2016, 10:57

Originally posted by Mark Schaffer:

You might be able to tell by comparing the version notes at the bottom of the ado files for ivreg2.ado and ivreg210.ado.

Okay, thank you. I'm still quite new to Stata, but I think I found the two .ado files: ivreg210.ado is version 3.1.10, and ivreg2.ado has the following changes:

Code:

* 4.0.00 25Jan15. Promote to require Stata version 11.2
* Rewrite of s_gmm1s, s_iegmm, s_egmm etc. to use matrix solvers rather than inversion.
* rankS and rankV now calculated along with estimators; rankS now always saved.
* Returned to use of _rmcollright to detect collinearities since bug was in Stata 10's _rmcollright and now not relevant.
* Added reporting of collinearities and duplicates in replay mode.
* Rewrite of legacy support for previous ivreg2x version. Main program calls ivreg2x depending on _caller().
* Estimation and replay moved to ivreg211 subroutine above.
* 4.0.01 8Feb15. Fixed bug in default name and command used used for saved first and RF equations
* Fixed bug in saved command line (was ivreg211, should be ivreg2).
* 4.0.02 9Feb15. Changed forced exit at Stata <11 before continuing loading to forced exit pre-Mata code at Stata <9.
* 4.1.00 Substantial rewrite to allow factor variables. Now also accepts TS ops as well as FV ops in partial varlist.
* Rewrite included code for dropped/collinear/reclassified.
* Saved RF and 1st-stage estimations have "if e(sample)" instead of "if `touse'" in e(cmdline).
* Rewrite of s_gmm1s etc. to use qrsolve if weighting matrix not full rank or cholsolve fails
* Fixed bug in display subroutines that would display hyperlink to wrong (nonexistent) help file.
* 4.1.01 15Jun15. Fixed bug that did not allow dropped variables to be in partial(.) varlist.
* Major rewrite of parsing code and collinearity/dropped/reclassified code.
* Added support for display options noomitted, vsquish, noemptycells, baselevels, allbaselevels.
* Changed from _rmcoll/_rmcollright/_rmcoll2list to internal ivreg2_rmcollright2
* Changed failure of ranktest to obtain id stats to non-fatal so that estimation proceeds.
* Removed recount via _rmcoll if noid option specified
* Added partial(_all) option.
* Improved checks of smatrix, wmatrix, b0 options
* Rewrite of first-stage and reduced form code; rewrite of replay(.) functionality
* Added option for displaying system of first-stage/reduced form eqns.
* Replaced AP first-stage test stats with SW (Sanderson-Windmeijer) first-stage stats
* Corrected S LM stat option; now calcuated in effect as J stat for case of no endog (i.e. b=0)
* with inexog partialled out i.e. LM version of AR stat; now matches weakiv
* Undocumented FV-related options: fvsep (expand endo, inexog, exexog separately) fvall (expand together)
* 4.1.02 17Jun15. Fixed bug in collinearity check - was ignoring weights.
* More informative error message if invalid matrix provided to smatrix(.) or wmatrix(.) options.
* Caught error if depvar was FV or TS var that expanded to >1 variable.
* 4.1.03 18Jun15. Fixed bug with robust + rf option.
* 4.1.04 18Jun15. Fixed bug in AR stat with dofminus option + cluster (was subtracting dof, shouldn't).
* 4.1.05 18Jun15. Added rmse, df_m, df_r to saved RF and first-stage equation results.
* 4.1.06 4July15. Replaced mvreg with Mata code for partialling out (big speed gains with many vars).
* Rewrote AddOmitted to avoid inefficient loop; replaced with Mata subscripting.
* Failure of id stats because of collinearities triggers error message only; estimation continues.
* Calculation of dofs etc. uses rankS and rankV instead of iv1_ct and rhs1_ct;
* counts are therefore correct even in presence of collinearities and use of nocollin option.
* nocollin options triggers use of QR instead of default Cholesky.
* rankxx and rankzz now based on diag0cnt of (XX)^-1 and (ZZ)^-1.
* CUE fails if either S or V not full rank; can happen if nocollin option used.
* Added undocumented useqr option to force use of QR instead of Cholesky.
* Misc other code tweaks to make results more robust to nocollin option.
* 4.1.07 12July15. Fixed bugs in calculation of rank(V) (had miscounted in some cases if omega not full rank)
* Changed calc of dofs etc. from rankS and rankV to rankzz and rankxx (had miscounted in some cases etc.).
* Restored warning message for all exog regressors case if S not full rank.
* 4.1.08 27July15. Replaced wordcount(.) function with word count macro in AddOmitted;
* AddOmitted called only if any omitted regressors to add.
* Added center option for centering moments.
* 4.1.09 20Aug15. Expanded error message for failure to save first-stage estimations (var name too long).
* Fixed bug when weighting used with new partial-out code (see 4.1.06 4July15).
* Tweaked code so that if called under Stata version < 11, main ivreg2.ado is exited immediately after
* loading parent ivreg2 program. Removed automatic use of QR solver when nocollin option used.
* Added saved condition numbers for XX and ZZ.
* e(cmdline) now saves original string including any "s (i.e., saves `0' instead of `*').
* 4.1.10 Fixed bug with posting first-stage results if sort had been disrupted by Mata code.
* Fixed bug which mean endog(.) and orthog(.) varlists weren't saved or displayed.

Reading through, I think it could be any of the following:

* 4.0.00 Rewrite of s_gmm1s, s_iegmm, s_egmm etc. to use matrix solvers rather than inversion.
* 4.0.00 Returned to use of _rmcollright to detect collinearities since bug was in Stata 10's _rmcollright and now not relevant.
* 4.1.00 Rewrite of s_gmm1s etc. to use qrsolve if weighting matrix not full rank or cholsolve fails
* 4.1.02 Fixed bug in collinearity check - was ignoring weights.

For the purposes of my work, I'm thinking the best step would be to run the code with both the latest version and version 10, compare the results to see whether they're substantively different, and then choose based on that?


Mark Schaffer

Join Date: Mar 2014

Posts: 324
#7

06 Apr 2016, 11:12

Are there any collinearities being detected? This should be reported in the output. (BTW it's not the bug fix in 4.1.02 since you aren't using weights in the estimation.)

Ryan Patel

Join Date: Apr 2016

Posts: 11
#8

06 Apr 2016, 12:45

Yes. Take, for example, the regression for Column (2) of Table 2. The latest -ivreg2- output reports:

Code:

Warning - collinearities detected
Vars dropped:  z3_hcepc_vhi_mrc_ihme z3_im_md_mrc_ihme hcepc_pu

------------------------------------------------------------------------------
Instrumented:         hcepc_pu
Included instruments: hcepc_oop hcepc_vhi im_md gdppc edu_prie pop14 pop65
                      _Iyear_1996 _Iyear_1997 _Iyear_1998 _Iyear_1999
                      _Iyear_2000 _Iyear_2001 _Iyear_2002 _Iyear_2003
                      _Iyear_2004 _Iyear_2005 _Iyear_2006 _Iyear_2007
                      _Iyear_2008
Excluded instruments: z3_hcepc_pu_mrc_ihme z3_hcepc_oop_mrc_ihme
Dropped collinear:    z3_hcepc_vhi_mrc_ihme z3_im_md_mrc_ihme hcepc_pu
Reclassified as exog: hcepc_oop hcepc_vhi im_md

Whereas the version 10 ivreg2 does not detect any collinearity:

Code:

------------------------------------------------------------------------------
Instrumented:         hcepc_pu hcepc_oop hcepc_vhi im_md
Included instruments: gdppc edu_prie pop14 pop65 _Iyear_1996 _Iyear_1997
                      _Iyear_1998 _Iyear_1999 _Iyear_2000 _Iyear_2001
                      _Iyear_2002 _Iyear_2003 _Iyear_2004 _Iyear_2005
                      _Iyear_2006 _Iyear_2007 _Iyear_2008
Excluded instruments: z3_hcepc_pu_mrc_ihme z3_hcepc_oop_mrc_ihme
                      z3_hcepc_vhi_mrc_ihme z3_im_md_mrc_ihme
------------------------------------------------------------------------------

This seems to be the source of the problem: the variables 'reclassified as exog' are ones we believe to be endogenous (hence the reason for running the IV estimation in the first place, using the 'dropped collinear' variables as the instruments). The full output for the latest -ivreg2- is in "Model 3.2 all.txt", attached to the first post. I can upload the output for version 10 if that will help.


Mark Schaffer

Join Date: Mar 2014

Posts: 324
#9

06 Apr 2016, 12:52

I think I see what is going on. What if you try running using the current version of ivreg2 but with the nocollin option? That will stop it from reclassifying the variables as exog etc.
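For the Column (2) model, that would look something like the following. This is a sketch that simply appends nocollin to the command posted in #4; it assumes the authors' main dataset is loaded and the panel is declared as in their do-file, and that xtivreg2 hands the option through to ivreg2. The /// line continuations work in a do-file but not when typed interactively.

```stata
* Sketch: the command from #4 run under the current ivreg2, with nocollin
* added so that the collinearity checks (and the reclassification) are skipped.
xi: xtivreg2 mrc_ihme gdppc edu_prie pop14 pop65 i.year ///
    (hcepc_pu hcepc_oop hcepc_vhi im_md = ///
     z3_hcepc_pu_mrc_ihme z3_hcepc_oop_mrc_ihme ///
     z3_hcepc_vhi_mrc_ihme z3_im_md_mrc_ihme), ///
    fe gmm2s clu(country) small nocollin
```

If the option takes effect, the "Instrumented:" line of the output should list all four health expenditure and immunisation variables again, with nothing under "Reclassified as exog".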
Ryan Patel

Join Date: Apr 2016

Posts: 11
#10

06 Apr 2016, 13:48

Great, in that case it reports the same as the version 10 of -ivreg2- for Columns (2) (4) and (6). Would it be appropriate to use this option?

Thinking through the econometrics: an instrument needs to be exogenous and relevant. That is, for an instrument z, error term e and endogenous regressor x, we need Cov(z,e) = 0 (exogeneity, so the instrument is valid) and Cov(z,x) != 0 (relevance), so the instrument is expected to be correlated with the endogenous explanatory variable?

Please could you also tell me what level of correlation -ivreg2- requires before it reports variables as collinear?

Thanks again
Mark Schaffer

Join Date: Mar 2014

Posts: 324
#11

06 Apr 2016, 14:49

ivreg2 uses Stata's internal routines to detect collinearity, so as long as there are no scaling problems with the variables, it needs to be pretty much perfectly collinear.

The problem is that the newer version of ivreg2 is getting caught out by unusual combinations of variables when it tries to work out whether there are collinearities between the different variable lists (included exogenous, excluded exogenous, endogenous). The program tries to catch cases where there are perfect collinearities between endogenous regressors and instruments, which if undetected would mean that you think you have an endogenous regressor but in fact, since it's perfectly predicted by instruments, it's essentially being treated as exogenous. But the current code, in some odd cases, will mistakenly think this is happening when it isn't. The program is clear about what it is doing when this happens, so when it arises it's clear to the user that something is amiss, and the results that are reported are fine (just not for the specification that was requested). It's rare (this is the second time I've seen it happen) but it needs fixing. Unfortunately, it means (another) rewrite of the collinearity code to fix properly. On my to-do list....
Ryan Patel

Join Date: Apr 2016

Posts: 11
#12

06 Apr 2016, 15:26

Thank you for the clarification. So in my case, if I use the nocollin option while following the exact methodology here for the construction of instruments, this should work as expected? I imagine that when I apply this methodology to my own dataset I can run the code without the nocollin option and, if it shows similar collinearity problems, then add nocollin?
Mark Schaffer

Join Date: Mar 2014

Posts: 324
#13

06 Apr 2016, 15:34

Yes, that's basically right. Except that in the vast majority of cases, if collinearity is detected, it will be a genuine one and adding the nocollin option won't be a good idea. It's only if ivreg2 reclassifies some endogenous variables as exogenous that you might want to consider the nocollin option. And/or use ivreg2 with version 10: to confirm that you get the same results.
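One way to do that check is to keep the coefficient vectors from both runs and compare them. The sketch below reuses the Column (2) command from earlier in the thread; the local macro spec and the matrix names b_new and b_v10 are made up for illustration, and mreldif() assumes both runs return coefficient vectors of the same dimension. Run it from a do-file, since it uses /// line continuations and a local macro.

```stata
* Sketch: compare the current ivreg2 with nocollin against the version 10 code.
* Assumes the authors' main dataset is loaded and set up as in their do-file.
local spec mrc_ihme gdppc edu_prie pop14 pop65 i.year ///
    (hcepc_pu hcepc_oop hcepc_vhi im_md = ///
     z3_hcepc_pu_mrc_ihme z3_hcepc_oop_mrc_ihme ///
     z3_hcepc_vhi_mrc_ihme z3_im_md_mrc_ihme)

* Current version, collinearity checks switched off
xi: xtivreg2 `spec', fe gmm2s clu(country) small nocollin
matrix b_new = e(b)

* Frozen version 10 code
version 10: xi: xtivreg2 `spec', fe gmm2s clu(country) small
matrix b_v10 = e(b)

* Largest relative difference between the two coefficient vectors
display "max relative difference: " mreldif(b_new, b_v10)
```

A value at or very near zero would suggest the two runs estimate the same specification.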
