
  • Traditional 3SLS with reg3 vs GMM - GMM takes far too long to fit the model

    Dear Statalist,

    Estimating the model below with gmm and with reg3 produces identical results. Yet reg3 finishes the job in a few seconds, while gmm takes forever to do the same estimation. I am trying to understand why. Can someone clarify?

    Code:
    reg3 (y1 y2 y3 y4 = e1 e2 x1 x2), exog(z1 z2) endog(e1 e2) 3sls
    
    gmm (eq1: y1 - {xb1: e1 e2 x1 x2 _cons}) ///
        (eq2: y2 - {xb2: e1 e2 x1 x2 _cons}) ///
        (eq3: y3 - {xb3: e1 e2 x1 x2 _cons}) ///
        (eq4: y4 - {xb4: e1 e2 x1 x2 _cons}) ///
        , instruments(z1 z2 x1 x2) ///
        winitial(unadjusted, independent)  wmatrix(unadjusted) twostep
    I know this is a SUR+IV (3SLS) problem, which reg3 handles well. However, since I have some heteroskedasticity issues in the data, I would eventually like to fit the GMM 3SLS with the robust wmatrix() option, something reg3 does not allow, as 3SLS with reg3 returns inconsistent parameter estimates when there is heteroskedasticity.

    Thanking you in advance,
    Rijo.

  • #2
    Hi Rijo,

    What you are doing is correct, and it is not a total surprise that gmm takes so long.

    reg3 uses a closed-form matrix solution for the estimators. gmm, on the other hand, is a very general nonlinear estimator that searches numerically for the solution.

    gmm does not use the known closed-form matrix solution for this linear problem, but rather approaches it as a more general nonlinear problem.

    Be happy that gmm converged at all, and converged on the right solution.
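    The gap can be illustrated with a toy just-identified IV problem (a sketch in Python/NumPy rather than Stata, with made-up data, just to show the idea): the closed-form estimate that reg3-style algebra delivers in one matrix operation is the same number a derivative-free search has to creep toward through many criterion evaluations.

```python
# Toy sketch: a linear, just-identified IV/GMM problem has a closed-form
# answer, but a generic optimizer must search for that same number.
import numpy as np

rng = np.random.default_rng(42)
n = 1000
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # endogenous regressor
y = 2.0 * x + u                               # true coefficient is 2

# Closed form (what reg3-style algebra does): beta = (z'x)^(-1) z'y
beta_closed = (z @ y) / (z @ x)

# GMM criterion Q(b) = gbar(b)^2, with gbar(b) = mean(z * (y - x*b))
def Q(b):
    g = np.mean(z * (y - x * b))
    return g * g

# A crude derivative-free search (standing in for gmm's numerical solver):
# hundreds of criterion evaluations instead of one matrix operation.
def golden_min(f, lo, hi, iters=100):
    phi = (5 ** 0.5 - 1) / 2
    for _ in range(iters):
        c, d = hi - phi * (hi - lo), lo + phi * (hi - lo)
        lo, hi = (lo, d) if f(c) < f(d) else (c, hi)
    return (lo + hi) / 2

beta_search = golden_min(Q, -10.0, 10.0)
# beta_closed and beta_search agree; only the route to them differs
```

    In many parameters the cost of the search multiplies, which is roughly why adding equations blows up gmm's run time while the matrix solution barely notices.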



    • #3
      Originally posted by Joro Kolev
      Hi Rijo,

      What you are doing is correct, and it is not a total surprise that gmm takes so long.

      reg3 uses a closed-form matrix solution for the estimators. gmm, on the other hand, is a very general nonlinear estimator that searches numerically for the solution.

      gmm does not use the known closed-form matrix solution for this linear problem, but rather approaches it as a more general nonlinear problem.

      Be happy that gmm converged at all, and converged on the right solution.
      Thank you, Joro. When I increase the number of equations (up to 8), the whole thing is completed in reg3 in less than a minute, whereas gmm takes several hours to return the results. But I am glad both gave the same results in the end. For the heteroskedasticity part I have no choice but to use gmm; otherwise I have to forget system estimation and do simple equation-by-equation IV.

      Anyway, thanks for your explanation of why gmm takes so much longer.



      • #4
        Unfortunately, as of now we do not have a pre-programmed solution: if we want sureg or reg3 with a robust variance, we need to go through the nonlinear gmm.

        I have never had to use this on as many equations as you describe, and on small systems I have never encountered the problem of gmm calculating for hours.

        You have some options to help the solution:

        1. You can try setting the initial weighting matrix to the identity matrix. Sometimes this helps, sometimes it makes things worse. (I am not showing this in the example below, because in this example the initial identity makes things worse.)

        2. You can try to center your moments.

        3. You can feed the results from reg3 as initial values.

        Here is an example of feeding initial values and centering. I have no idea whether this helps here, because in this example the solution converges fast and without problems anyway:

        Code:
        . webuse klein, clear
        
        . reg3 (consump wagepriv wagegovt) (wagepriv consump govt capital1)
        
        Three-stage least-squares regression
        --------------------------------------------------------------------------
        Equation             Obs   Parms        RMSE    "R-sq"       chi2        P
        --------------------------------------------------------------------------
        consump               22       2    1.776297    0.9388     208.02   0.0000
        wagepriv              22       3    2.372443    0.8542      80.04   0.0000
        --------------------------------------------------------------------------
        
        ------------------------------------------------------------------------------
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        consump      |
            wagepriv |   .8012754   .1279329     6.26   0.000     .5505314    1.052019
            wagegovt |   1.029531   .3048424     3.38   0.001      .432051    1.627011
               _cons |    19.3559   3.583772     5.40   0.000     12.33184    26.37996
        -------------+----------------------------------------------------------------
        wagepriv     |
             consump |   .4026076   .2567312     1.57   0.117    -.1005764    .9057916
                govt |   1.177792   .5421253     2.17   0.030     .1152461    2.240338
            capital1 |  -.0281145   .0572111    -0.49   0.623    -.1402462    .0840173
               _cons |   14.63026   10.26693     1.42   0.154    -5.492552    34.75306
        ------------------------------------------------------------------------------
        Endogenous variables:  consump wagepriv 
        Exogenous variables:   wagegovt govt capital1 
        ------------------------------------------------------------------------------
        
        . mat b = e(b)
        
        . gmm (eq1: consump - {xb1: wagepriv wagegovt _cons}) (eq2: wagepriv - {xb2: consump govt capital1 _cons}), instruments(wagegovt govt capital1) winitial(unadjusted, independent) wmatrix(unadjusted) center twostep from(b)
        
        Step 1
        Iteration 0:   GMM criterion Q(b) =  .33947959  
        Iteration 1:   GMM criterion Q(b) =  .22175631  
        Iteration 2:   GMM criterion Q(b) =  .22175631  
        
        Step 2
        Iteration 0:   GMM criterion Q(b) =  .09716589  
        Iteration 1:   GMM criterion Q(b) =  .07028208  
        Iteration 2:   GMM criterion Q(b) =  .07028208  
        
        GMM estimation 
        
        Number of parameters =   7
        Number of moments    =   8
        Initial weight matrix: Unadjusted                 Number of obs   =         22
        GMM weight matrix:     Unadjusted
        
        ------------------------------------------------------------------------------
                     |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        xb1          |
            wagepriv |   .8012754   .1279329     6.26   0.000     .5505314    1.052019
            wagegovt |   1.029531   .3048424     3.38   0.001      .432051    1.627011
               _cons |    19.3559   3.583772     5.40   0.000     12.33184    26.37996
        -------------+----------------------------------------------------------------
        xb2          |
             consump |   .4026076   .2567312     1.57   0.117    -.1005764    .9057916
                govt |   1.177792   .5421253     2.17   0.030     .1152461    2.240338
            capital1 |  -.0281145   .0572111    -0.49   0.623    -.1402462    .0840173
               _cons |   14.63026   10.26693     1.42   0.154    -5.492552    34.75306
        ------------------------------------------------------------------------------
        Instruments for equation eq1: wagegovt govt capital1 _cons
        Instruments for equation eq2: wagegovt govt capital1 _cons
        Last edited by Joro Kolev; 13 Nov 2018, 19:06.



        • #5
          Originally posted by Joro Kolev
          Unfortunately, as of now we do not have a pre-programmed solution: if we want sureg or reg3 with a robust variance, we need to go through the nonlinear gmm.

          I have never had to use this on as many equations as you describe, and on small systems I have never encountered the problem of gmm calculating for hours.

          You have some options to help the solution:

          1. You can try setting the initial weighting matrix to the identity matrix. Sometimes this helps, sometimes it makes things worse. (I am not showing this in the example below, because in this example the initial identity makes things worse.)

          2. You can try to center your moments.

          3. You can feed the results from reg3 as initial values.

          Here is an example of feeding initial values and centering. I have no idea whether this helps here, because in this example the solution converges fast and without problems anyway:
          Thank you again, Joro, and I appreciate you taking the time to explain this. I tried your suggestion of feeding the results from reg3 to gmm as initial values. It did improve the time a little, but it is still several hours for an 8-equation system, compared to a few seconds in reg3.

          By the way, I have a related question:

          If I estimate the standard errors in reg3 with a bootstrap prefix, it will return bootstrap standard errors. However, if heteroskedasticity is actually present, this alone does not solve the problem, right? As I understand it, even the parameter estimates from reg3 are inconsistent if there is heteroskedasticity. Correct me if I am wrong on this.

          Thanks,
          Rijo.
          Last edited by Rijo John; 14 Nov 2018, 02:28.



          • #6
            Hi Rijo,

            Estimating reg3 with the bootstrap prefix is an option and will give you the correct variance under heteroskedasticity. I thought about this, but I did not propose it to you because you mentioned that it takes "less than a minute to estimate reg3". Say it is a minute: if you do 99 bootstrap replications, that is 99 minutes; if you want (more realistically) to do 999 replications, that is 999 minutes, which is worse than the current situation with gmm.

            Otherwise reg3 is still consistent under heteroskedasticity, but it is no longer the optimal estimator. There is another estimator which Wooldridge (2010, Chapter 8) even calls the three stage least squares, while he calls what reg3 does "the traditional three stage least squares" (if I remember correctly).

            The relationship between reg3 with robust standard errors and the optimal system GMM three stage least squares is the same as the relationship between 2SLS with robust variance, and the optimal single equation GMM estimator.

            reg3 is still consistent, but no longer the optimal estimator, under system heteroskedasticity. There is nothing wrong with doing reg3 with the bootstrap prefix: it will give you consistent parameter estimates and the right variance. Note also that this optimal system GMM three stage least squares is more efficient only asymptotically, so there is no guarantee, even if you could implement it (and you could, easily, but again through gmm), that it will give you better results in finite samples than just reg3 with the bootstrap prefix.
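            To make the analogy concrete, here is a single-equation numerical sketch (Python/NumPy rather than Stata, with invented data, purely illustrative): under heteroskedastic errors both 2SLS and the two-step efficient GMM estimator stay consistent; GMM just reweights the same moment conditions with a heteroskedasticity-robust weighting matrix.

```python
# Single-equation sketch: 2SLS vs. two-step efficient GMM under
# heteroskedasticity.  Both are consistent; they differ only in how the
# overidentifying moment conditions E[z_i u_i] = 0 are weighted.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
Z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # const + 2 instruments
v = rng.normal(size=n)                                      # endogeneity source
x = Z[:, 1] + Z[:, 2] + v + rng.normal(size=n)              # endogenous regressor
u = (1 + 0.5 * np.abs(Z[:, 1])) * rng.normal(size=n) + 0.5 * v  # heteroskedastic error
y = 1.0 + 2.0 * x + u                                       # true coefficients (1, 2)
X = np.column_stack([np.ones(n), x])

ZX, Zy = Z.T @ X, Z.T @ y

# 2SLS = GMM with weighting matrix (Z'Z/n)^(-1)
W1 = np.linalg.inv(Z.T @ Z / n)
b_2sls = np.linalg.solve(ZX.T @ W1 @ ZX, ZX.T @ W1 @ Zy)

# Two-step efficient GMM: reweight with the robust matrix
# S = (1/n) sum_i z_i z_i' uhat_i^2, built from first-step residuals
uhat = y - X @ b_2sls
S = (Z * (uhat ** 2)[:, None]).T @ Z / n
b_gmm = np.linalg.solve(ZX.T @ np.linalg.inv(S) @ ZX,
                        ZX.T @ np.linalg.inv(S) @ Zy)
# b_2sls and b_gmm are both close to (1, 2); gmm is asymptotically more efficient
```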





            • #7
              Originally posted by Joro Kolev
              Hi Rijo,

              Estimating reg3 with the bootstrap prefix is an option and will give you the correct variance under heteroskedasticity. I thought about this, but I did not propose it to you because you mentioned that it takes "less than a minute to estimate reg3". Say it is a minute: if you do 99 bootstrap replications, that is 99 minutes; if you want (more realistically) to do 999 replications, that is 999 minutes, which is worse than the current situation with gmm.

              Otherwise reg3 is still consistent under heteroskedasticity, but it is no longer the optimal estimator. There is another estimator which Wooldridge (2010, Chapter 8) even calls the three stage least squares, while he calls what reg3 does "the traditional three stage least squares" (if I remember correctly).

              The relationship between reg3 with robust standard errors and the optimal system GMM three stage least squares is the same as the relationship between 2SLS with robust variance, and the optimal single equation GMM estimator.

              reg3 is still consistent, but no longer the optimal estimator, under system heteroskedasticity. There is nothing wrong with doing reg3 with the bootstrap prefix: it will give you consistent parameter estimates and the right variance. Note also that this optimal system GMM three stage least squares is more efficient only asymptotically, so there is no guarantee, even if you could implement it (and you could, easily, but again through gmm), that it will give you better results in finite samples than just reg3 with the bootstrap prefix.
              Thank you again. Yes, I have seen Wooldridge (2010), Chapter 8, on this. He calls it a GMM 3SLS estimator, as against the traditional 3SLS (which reg3 does). But, as I understood it, with heteroskedasticity the 3SLS parameters as well as the covariance matrix are inconsistent. Wooldridge says, and I quote, "without Assumption SIV.5 (system homoskedasticity assumption), the 3SLS estimator is generally less efficient, asymptotically, than the minimum chi-square estimator, and the asymptotic variance estimator for 3SLS in equation (8.41) is inappropriate". Cameron and Trivedi (Microeconometrics Using Stata) are even more explicit about it and say "3SLS estimator becomes inconsistent if errors are heteroskedastic, and errors are often heteroskedastic".

              I have also read in a few other places that heteroskedasticity leaves both the traditional 3SLS parameters and the standard errors inconsistent. This is why I was reluctant to use the bootstrap with reg3.

              Since my reg3 took only a few seconds (perhaps some 20 seconds), 1,000 bootstrap replications should be over in about 5 hours, which I guess is pretty much the time gmm took as well. So it may not make a big difference in practice. But I would still like to confirm that a bootstrap with reg3 gives consistent results (both coefficients and standard errors).


              Edit: Let me also quote a para from Wooldridge: "Given the fact that the GMM estimator using expression (8.32) as the weighting matrix (i.e., the GMM 3SLS estimator) is never worse, asymptotically, than 3SLS, and in some important cases is strictly better, why is 3SLS ever used? There are at least two reasons. First, 3SLS has a long history in simultaneous equations models, whereas the GMM approach has been around only since the early 1980s, starting with the work of Hansen (1982) and White (1982b). Second, the 3SLS estimator might have better finite sample properties than the optimal GMM estimator when Assumption SIV.5 (i.e., homoskedasticity assumption) holds. However, whether it does or not must be determined on a case-by-case basis."

              So the finite-sample advantage that reg3 may have arises only in the case of homoskedasticity.

              Thank you again for your continued support.

              Best,
              Rijo.
              Last edited by Rijo John; 14 Nov 2018, 05:42.



              • #8
                In my view you are not interpreting what Wooldridge (2010) says correctly.

                Under SIV.1, SIV.2 and SIV.3 the traditional three stage least squares (reg3) is consistent. It is the optimal estimator, and the usual standard errors are correct, only under SIV.5 (system homoskedasticity); without SIV.5 it is no longer optimal and the usual standard errors are wrong.

                Not only this, but what Wooldridge calls the system two stage least squares is also consistent (under SIV.1, SIV.2 and SIV.3). The bottom line is that the GLS/SUREG weighting that we do does not affect consistency, only optimality.

                Up to here we just have different interpretations of what Wooldridge says. From here on, and concerning your Edit, different schools of thought emerge:

                Wooldridge asks, in my opinion and in my words: if under SIV.5 the system GMM three stage least squares and the traditional three stage least squares are both optimal, why would anybody ever want to use the traditional three stage least squares?

                My answer is as follows, and this is just my opinion; there is no agreement on it in the literature. The traditional three stage least squares estimates second-order moments of the error term for the weighting matrix, that is, cross moments of the form Sum(u*u'). The GMM three stage least squares estimates fourth-order moments involving both the errors and the regressors, of the form Sum(Xu*u'X'). Asymptotically it might make no difference, and both might be optimal under SIV.5, but in finite samples lower-order moments are estimated much more precisely. Hence to me it makes perfect sense to be "minimal" and not overly ambitious, and to go just for reg3 with robust standard errors rather than for a full-blown asymptotically optimal GMM three stage least squares.
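                The two weighting-matrix ingredients can be written down in a few lines (a Python/NumPy sketch with made-up dimensions, just to count the sample moments each one requires):

```python
# Sketch: weighting-matrix ingredients for a system of G equations with
# K common instruments and n observations.
import numpy as np

rng = np.random.default_rng(0)
n, G, K = 200, 2, 3
U = rng.normal(size=(n, G))   # residuals, one column per equation
Z = rng.normal(size=(n, K))   # instrument matrix (common across equations)

# Traditional 3SLS weighting: second-order moments of the errors only,
# Sigma = (1/n) sum_i u_i u_i'  -- a G x G matrix.
Sigma = U.T @ U / n

# Robust GMM 3SLS weighting: fourth-order moments mixing instruments and
# errors, W = (1/n) sum_i (z_i kron u_i)(z_i kron u_i)' -- (G*K) x (G*K).
M = np.stack([np.kron(Z[i], U[i]) for i in range(n)])
W = M.T @ M / n

# Distinct moments to estimate from the same n observations:
print(G * (G + 1) // 2)             # 3  distinct entries in Sigma
print((G * K) * (G * K + 1) // 2)   # 21 distinct entries in W
```

                With the same n, the robust matrix asks the data to pin down many more (and higher-order) moments, which is the finite-sample precision trade-off described above.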

                I do not know why Cameron and Trivedi say what they say; to my knowledge, what you quote from them is incorrect. Note that, as you quoted it ("3SLS estimator becomes inconsistent if errors are heteroskedastic, and errors are often heteroskedastic"), it is not clear to which "estimator" the statement refers. The estimator of the variance is also an estimator, and that may be what they mean. Or maybe they are simply saying something that is wrong.

                I would be interested to hear which other references you have in mind when you say "I also read in few other places that heteroskedasticity will leave both traditional 3SLS parameter and standard errors inconsistent."








                • #9
                  Thank you, Joro, and I appreciate you taking the time to respond again. I agree with your interpretation of Wooldridge. I read it once again and see that he is actually talking about the traditional 3SLS being less efficient, and its variance estimator being inappropriate, in the presence of heteroskedasticity; he is not exactly saying the parameters become inconsistent. I was a bit confused by his statement on page 197 that "The GMM 3SLS estimator is guaranteed to be consistent under Assumptions SIV.1–SIV.3, while the traditional 3SLS estimator is not."

                  As for Cameron and Trivedi, Microeconometrics Using Stata, revised edition 2010: I am quoting the full paragraph to give you the correct context. It appears to me that they are referring to the parameter estimates of 3SLS and not to the variance estimator. "Under the strong assumption that errors are i.i.d., more-efficient estimation is possible by exploiting cross-equation correlation of errors, just as for the SUR model discussed in section .5.4. This estimator is called the three-stage least-squares (3SLS) estimator. We do not pursue it in detail, however, because the 3SLS estimator becomes inconsistent if errors are heteroskedastic, and errors are often heteroskedastic."

                  My conclusion that the 3SLS estimator becomes inconsistent under heteroscedasticity was also guided by lecture notes from some professors available online. For example, "If there is heteroscedasticity, then both 3SLS and I3SLS produce inconsistent estimates of the parameters and they should not be used", from the Econ 515 notes by Prof. Thornton, Dept. of Economics, Eastern Michigan University:
                  https://people.emich.edu/jthornton/t...ions_model.doc

                  I do not intend to argue, only to get better clarity. Your point about 3SLS's use of second-order moments in the weighting matrix, and the resulting finite-sample advantage, makes sense to me. But that holds only under SIV.5, i.e., when there is system homoskedasticity, right? I would happily use reg3 whenever SIV.5 holds, as that saves a lot of computing time. However, does your argument still hold when SIV.5 does not hold? In other words, the need to "go just for reg3 with robust standard errors, rather than for a full blown asymptotically optimal GMM three stage least squares" arises in the first place only when SIV.5 does not hold, right? Of course, the answer is that the 3SLS (reg3) parameter estimates are still consistent. But then, as we discussed above, the computing time for 1,000 bootstrap replications and that for a single gmm 3SLS run (in my particular case) are more or less the same.

                  Thank you once again for this discussion. I must say, I learned quite a bit in the process, too.

