SUR with Roodman's cmp command: convergence/speed problem

Tatjana Schulze

Join Date: May 2017
Posts: 3

22 May 2017, 13:21

Hi,

I am trying to estimate a SUR model with 40 equations, 5 regressors per equation and linear coefficient constraints across the 40 equations using Roodman's cmp command. Each of the 40 equations corresponds to one country. Each country has ~110 to 180 monthly observations. The coefficients are restricted to be equal across countries, i.e. across equations.

The main reason why I want to use cmp rather than sureg is that I have an unbalanced panel of data. I am following the suggestion by Kit Baum to use cmp as an equivalent command to sureg: http://fmwww.bc.edu/EC-C/S2016/8823/...n14.slides.pdf

After specifying the equations and constraints in macros and increasing matsize, I tried to run the SUR regression with cmp. However, the estimation does not converge.

I have specified the regression as follows:

Code:

*------------------------------------------------------------------------------*
* Step 1: Create macros for the cmp regression
*------------------------------------------------------------------------------*

// Define equations
forvalues i = 1/40 {
    global c`i' "(r2liq_`i' = mktret_`i' mktvol_`i' mktliq_`i' mktturn_`i' time)"
    di "$c`i'"
}

// Define coefficient constraints linking adjacent equations (one equation per country).
// Each of the 5 regressors gets its own block of 39 constraint numbers:
// 1-39 (mktret), 40-78 (mktvol), 79-117 (mktliq), 118-156 (mktturn), 157-195 (time).
forvalues i = 1/39 {
    local j = `i' + 1
    local k = `i' + 39
    local l = `i' + 2*39
    local m = `i' + 3*39
    local n = `i' + 4*39
    constraint `i' [r2liq_`i']mktret_`i' = [r2liq_`j']mktret_`j'
    constraint `k' [r2liq_`i']mktvol_`i' = [r2liq_`j']mktvol_`j'
    constraint `l' [r2liq_`i']mktliq_`i' = [r2liq_`j']mktliq_`j'
    constraint `m' [r2liq_`i']mktturn_`i' = [r2liq_`j']mktturn_`j'
    constraint `n' [r2liq_`i']time = [r2liq_`j']time
}
*------------------------------------------------------------------------------*
* Step 2: Run the cmp regression
*------------------------------------------------------------------------------*

set matsize 1500

// SUR using cmp; indicator 1 marks each equation as continuous (linear)
cmp $c1 $c2 $c3 $c4 $c5 $c6 $c7 $c8 $c9 $c10 $c11 $c12 $c13 $c14 $c15 $c16 ///
    $c17 $c18 $c19 $c20 $c21 $c22 $c23 $c24 $c25 $c26 $c27 $c28 $c29 $c30 $c31 ///
    $c32 $c33 $c34 $c35 $c36 $c37 $c38 $c39 $c40, ///
    indicators(1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1) ///
    constraint(1-195) ///
    nonrtolerance tech(dfp nr) difficult interactive
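
A possible way to shorten the command (a sketch, not part of the original do-file): collect the 40 equations and the 40 indicator values in local macros inside one loop. The names eqs, inds and n_inds are made up for illustration, and the cmp call itself is left as a comment because it needs the data and the constraints defined in Step 1.

```stata
* Sketch: build the equation list and the indicator list in one loop.
* eqs, inds and n_inds are illustrative names, not from the original post.
local eqs
local inds
forvalues i = 1/40 {
    local eqs  `eqs' (r2liq_`i' = mktret_`i' mktvol_`i' mktliq_`i' mktturn_`i' time)
    local inds `inds' 1
}

* Check that one indicator was generated per equation
local n_inds : word count `inds'
display "indicators built: `n_inds'"

* The call would then read (commented out; requires the data and constraints 1-195):
* cmp `eqs', indicators(`inds') constraints(1-195) nonrtolerance tech(dfp nr) difficult interactive
```

Locals keep the equation list out of the global namespace, and the number of equations is changed in one place only.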

When I first ran it without nonrtolerance tech(dfp nr) difficult interactive, Stata needed 48 hours for just 11 ML iterations, and the log still reported "not concave".

I followed the "Tips for achieving and speeding convergence", but cmp still cannot handle the SUR model specification. It does not get any further than this:

Code:

      Source |       SS       df       MS              Number of obs =     180
-------------+------------------------------           F(  5,   174) =   41.05
       Model |  7.05891694     5  1.41178339           Prob > F      =  0.0000
    Residual |  5.98415464   174  .034391693           R-squared     =  0.5412
-------------+------------------------------           Adj R-squared =  0.5280
       Total |  13.0430716   179  .072866322           Root MSE      =  .18545

------------------------------------------------------------------------------
    r2liq_40 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   mktret_40 |  -.2172259   .3576781    -0.61   0.544    -.9231722    .4887203
   mktvol_40 |   .0461518   .0331787     1.39   0.166    -.0193328    .1116364
   mktliq_40 |  -17195.91   19340.27    -0.89   0.375    -55367.64    20975.82
  mktturn_40 |   50.69199   15.18905     3.34   0.001     20.71348    80.67049
        time |   .0011005   .0010363     1.06   0.290    -.0009449    .0031459
       _cons |  -1.668928   .0879425   -18.98   0.000      -1.8425   -1.495357
------------------------------------------------------------------------------

Warning: regressor matrix for r2liq_40 equation appears ill-conditioned. (Condition number = 157.03271.)
This might prevent convergence. If it does, and if you have not done so already, you may need to remove nearly
collinear regressors to achieve convergence. Or you may need to add a nrtolerance(#) or nonrtolerance option to the command line.
See cmp tips.

Fitting full model.

(setting technique to dfp)
Iteration 0:   log likelihood = -579061.22  
Iteration 1:   log likelihood = -453965.72  (backed up)
Iteration 2:   log likelihood = -189756.27  (backed up)
Iteration 3:   log likelihood = -59124.591  (backed up)
Iteration 4:   log likelihood = -34253.207  (backed up)
(switching technique to nr)

Is there any way to speed up the process? Or else, is there another way to estimate a SUR with an unbalanced panel AND linear cross-equation coefficient constraints? The latter question is related to my previous post:
http://www.statalist.org/forums/foru...ross-equations

Many thanks in advance!
Tatjana

Tags: cmp, convergence, panel data, regression, sur

David Roodman

Join Date: Jul 2014

Posts: 470
#2

23 May 2017, 07:57

Tatjana,
The covariance matrix for the errors will be a symmetric 40x40 matrix. So it will have 40*41/2 = 820 independent entries that must be estimated in your set-up, which is vastly more parameters than observations. So this approach essentially cannot work. You might try covar(exchangeable), which imposes the requirement that all of the cross-equation error correlations are the same.
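
To make the count concrete, the arithmetic can be sketched in a few lines of Stata (the scalar names are illustrative only): an unstructured 40x40 error covariance matrix has 40 variances plus 40*39/2 distinct covariances.

```stata
* Count the free elements of an unstructured covariance matrix for 40 equations.
* Scalar names are illustrative only.
scalar n_eq    = 40
scalar n_var   = n_eq                   // one variance per equation
scalar n_cov   = n_eq*(n_eq - 1)/2      // distinct off-diagonal covariances
scalar n_total = n_var + n_cov          // equals n_eq*(n_eq + 1)/2
display "variances: " n_var "   covariances: " n_cov "   total: " n_total
```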
--David
Tatjana Schulze

Join Date: May 2017

Posts: 3
#3

29 May 2017, 08:24

Hi David,

Many thanks for your response. I added the covar(exchangeable) option to cmp as you suggested and it worked, although it still took ~1.5 hours to converge. Stata applied the coefficient constraints as specified. However, the estimates are far from what they should be (as a first step, I am trying to replicate the estimates of an existing paper). Correct me if I'm wrong, but the covar(exchangeable) option does not correctly account for the contemporaneous correlation in the errors across equations that the SUR model assumes.

I forgot to mention in my previous post that the intercepts are also restricted to be the same:

Y_1 = a + b1*X1_1 + … + b9*X9_1 + e_1
Y_2 = a + b1*X1_2 + … + b9*X9_2 + e_2
.
.
.
Y_40 = a + b1*X1_40 + … + b9*X9_40 + e_40

where the intercept a and the coefficients b1 to b9 are restricted to be the same across the 40 equations.
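
A sketch of how the intercept restriction might be expressed as one more block of constraints, continuing the numbering after the 195 slope constraints from the first post. The local p and the range 196-234 are assumptions made for illustration, and the equation names follow the r2liq_1 to r2liq_40 naming used earlier.

```stata
* Sketch: constrain the intercepts to be equal across adjacent equations.
* Numbers 196-234 continue after the 195 slope constraints defined earlier;
* the local p is an illustrative name.
forvalues i = 1/39 {
    local j = `i' + 1
    local p = `i' + 5*39
    constraint `p' [r2liq_`i']_cons = [r2liq_`j']_cons
}

* Show the first and last of the new constraints
constraint dir 196 234

* The cmp call would then use constraints(1-234) instead of constraints(1-195).
```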

If I am not mistaken, this model setup essentially boils down to something like pooled OLS with contemporaneous correlation in the errors. While pooled OLS is consistent in this context, it is not efficient. Alternatively, I believe FGLS should be used to account for spatial dependence/correlation across panels (countries in this case).
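
As a rough illustration of the pooled OLS baseline, here is a minimal sketch on simulated toy data (3 countries, 60 months, one regressor plus the time trend; all values are made up). It reshapes the wide layout to long and runs one pooled regression. Clustering the standard errors by month is an assumption of this sketch, chosen as one simple way to allow errors to be correlated across countries within a month; it is not FGLS and not something proposed in this thread.

```stata
* Toy data in the same wide layout as the post (simulated, not the real data)
clear
set seed 12345
set obs 60
generate month = _n
generate time  = _n
forvalues i = 1/3 {
    generate mktret_`i' = rnormal()
    generate r2liq_`i'  = 1 + 0.5*mktret_`i' + rnormal()
}

* Make the panel unbalanced: country 3 starts 10 months late
replace r2liq_3  = . in 1/10
replace mktret_3 = . in 1/10

* Wide to long: one row per country-month
reshape long r2liq_ mktret_, i(month) j(country)
rename r2liq_  r2liq
rename mktret_ mktret

* Pooled OLS with a common intercept and common slopes;
* standard errors clustered by month (assumption of this sketch)
regress r2liq mktret time, vce(cluster month)
```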

I have been looking into what Stata offers for contemporaneous correlation in unbalanced panels. The four commands that seem to come closest to SUR with coefficient restrictions are:

xtgls: fits panel-data linear models by FGLS and allows for AR(1) autocorrelation within panels as well as cross-sectional correlation and heteroskedasticity across panels. Unfortunately, as far as I can tell, it cannot handle unbalanced panels once cross-sectional correlation is requested.

xtpcse: calculates panel-corrected standard error (PCSE) estimates for linear cross-sectional time-series models where the parameters are estimated by either OLS or Prais–Winsten regression. When computing the standard errors and the variance–covariance estimates, xtpcse assumes by default that the disturbances are heteroskedastic and contemporaneously correlated across panels.

xtscc: produces Driscoll and Kraay (1998) standard errors for coefficients estimated by pooled OLS/WLS or fixed-effects (within) regression. The error structure is assumed to be heteroskedastic, autocorrelated up to some lag, and possibly correlated between the groups (panels). Driscoll-Kraay standard errors are robust to very general forms of cross-sectional ("spatial") and temporal dependence when the time dimension becomes large. The argument is presented here: http://fmwww.bc.edu/repec/bocode/x/xtscc_paper

xtgee: fits population-averaged panel-data models. In particular, xtgee fits generalized linear models and lets you specify the within-group correlation structure for the panels. McDowell (2004) gives instructions for implementing a SUR regression with xtgee. This requires scaling all variables and intercepts by the RMSE from individual OLS regressions of each SUR equation (why?) and converting the panel dataset from Stata's wide format to long format. Finally, the rescaled dependent variable is regressed on the rescaled regressors and intercepts using xtgee. I attempted this procedure, but the resulting estimates were far off those in the original paper. I might have made a mistake with the coefficient and intercept restrictions.
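
For comparison, a minimal sketch of the xtpcse route on simulated toy data (3 countries, 60 months; all values are made up). It assumes the pairwise option is used so that each covariance is computed from the periods the two panels have in common, which lets the unbalanced panel through. Note that the coefficients are still plain pooled OLS; only the standard errors are panel-corrected, so this is not equivalent to SUR/FGLS.

```stata
* Toy long-format panel (simulated, not the real data)
clear
set seed 2017
set obs 180
generate country = ceil(_n/60)              // 3 countries
bysort country: generate month = _n         // 60 months each
generate time   = month
generate mktret = rnormal()
generate r2liq  = 1 + 0.5*mktret + rnormal()

* Make the panel unbalanced: country 3 starts 10 months late
drop if country == 3 & month <= 10

* Declare the panel structure
xtset country month

* Pooled OLS coefficients with panel-corrected standard errors;
* pairwise uses all periods common to each pair of panels
xtpcse r2liq mktret time, pairwise
```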

Which of these approaches would you deem to be the most efficient and legitimate/ closest to SUR? Or is there any other way I could work this out with cmp?

Best,
Tatjana