MM robust regression residuals

Sanaullah Farooq

Join Date: Mar 2016

Posts: 107
#1

10 Apr 2016, 18:31

Hello, I am following a specific econometric model in finance that requires an MM robust regression, from which I then need to extract the fitted values and residuals. I downloaded the MM robust regression package for Stata and ran my regression with the "mmregress" command.

I need the residuals, fitted values, and R-squared from this regression, but I cannot get them through any of the conventional procedures. For example, when I try to extract residuals by typing "predict residuals, residuals", Stata says "option not allowed" with error r(198). When I try to retrieve residuals from the menu by clicking on Postestimation > Predictions, residuals, etc., it says "this operation requires that you previously performed an estimation". Please help. I would highly appreciate any feedback from members. Thanks.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

10 Apr 2016, 19:39

If you take another look at the help for mmregress, you will see that the outlier option creates the Robust Standardized Residual (residual divided by a robust estimate of scale) as the variable S_stdres; the estimate of scale is in the results section, but is also given in the returned scalar e(scale). Notice that the Stata Journal article doesn't mention r-square. Usually it's the squared correlation of observed and predicted, but mmregress downweights or excludes outliers and high leverage points, thus the usual measure is not correct and you compute it at your own risk.

Code:

sysuse auto, clear
replace mpg = 80 in 5
mmregress mpg turn weight, outlier graph
gen residual = S_stdres*e(scale)
gen predicted = mpg - residual

See the associated Stata Journal article.

Verardi, V., and C. Croux. 2009. Robust regression in Stata. Stata Journal 9, no. 3: 439-453.

http://www.stata-journal.com

Last edited by Steve Samuels; 10 Apr 2016, 19:56.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Sanaullah Farooq

Join Date: Mar 2016

Posts: 107
#3

10 Apr 2016, 22:05

Thanks a lot. That's very helpful indeed. I got it up and running. Thanks again!
Nick Cox

Join Date: Mar 2014

Posts: 35642
#4

11 Apr 2016, 12:03

http://www.stata.com/support/faqs/st...red/index.html is sympathetic to the idea that you can knit your own analog[ue] to R-squared by just squaring the correlation between observed and predicted. But that measure plays no part, even implicitly, in the robust regression. It might be of some use descriptively as a summary of how far the robust regression is from the usual version. (It would be even more important to keep track of changes in coefficients and graphically to monitor how the regression fit falls relative to the data.)
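
As a rough illustration of that descriptive use, the squared correlation between observed and predicted values can be computed for the robust fit and set beside the ordinary R-squared. This is only a sketch: it assumes mmregress (a user-written command, not part of official Stata) is already installed, and the variable and scalar names yhat_mm, r2_mm, and r2_ols are made up for the example. Two regressors are used on purpose, because with a single regressor the squared correlation would be the same for any fitted line.

```stata
* Sketch only: assumes the user-written mmregress command is installed
sysuse auto, clear
set seed 438205                      // mmregress checks random subsets, so fix the seed

* Robust fit and its linear prediction
mmregress mpg turn weight
predict yhat_mm, xb

* Squared correlation of observed and predicted: descriptive only,
* it plays no part in the robust estimation itself
quietly correlate mpg yhat_mm
scalar r2_mm = r(rho)^2

* Ordinary least squares for comparison
quietly regress mpg turn weight
scalar r2_ols = e(r2)

display "squared corr, MM fit : " %6.4f r2_mm
display "R-squared, OLS       : " %6.4f r2_ols
```

Because least squares maximizes the correlation between observed and fitted values, the robust figure can never exceed the OLS one; the size of the gap is one crude summary of how far the two fits differ.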
Sanaullah Farooq

Join Date: Mar 2016

Posts: 107
#5

11 Apr 2016, 16:13

Hello, I got a reply from the original author of this model. He suggested the procedure below for postestimation and R-squared. I thought it would be helpful for others.

Let's say you estimate the model

mmregress y x1 x2 x3

You can predict the residuals by doing:

predict yhat
gen res=y-yhat

For the R2 there are several options. Probably the easiest is to compute 1-(s1/s2)^2, where s1 is the residual scale of the complete model and s2 is the scale of a model with just a constant.
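
A minimal sketch of that recipe on the auto data might look like the following. It assumes mmregress is installed, and it takes s1 from the returned scalar e(scale) mentioned earlier in the thread. The names yhat, res, and s1 are illustrative. The constant-only scale s2 is deliberately left out, since nothing in this thread confirms how to fit a constant-only model with mmregress.

```stata
* Sketch only: assumes the user-written mmregress command is installed
sysuse auto, clear
set seed 438205                  // mmregress checks random subsets, so fix the seed

mmregress mpg weight

* Fitted values and residuals from the final MM fit
predict yhat, xb
generate res = mpg - yhat

* Residual scale of the complete model (s1 in the formula above)
scalar s1 = e(scale)

summarize res
display "s1 = " s1
```

With s2 in hand from a constant-only fit, the suggested measure would then be 1 - (s1/s2)^2.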
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#6

11 Apr 2016, 21:13

Thanks for sharing this, Sanaullah. This version is akin to adjusted \(R^2\) in ordinary regression and would be my first choice. Would you please tell us which of the esteemed authors of mmregress wrote to you, and what other options he suggested? Perhaps you can quote from the original email.
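
The analogy with adjusted \(R^2\) can be spelled out. The notation here is an assumption for illustration and not from the email: \(n\) observations, \(k\) estimated coefficients including the constant, \(SSR\) the residual sum of squares, and \(SST\) the total sum of squares about the mean. In ordinary regression,

\[
R^2_{adj} = 1 - \frac{SSR/(n-k)}{SST/(n-1)} = 1 - \left(\frac{s_1}{s_2}\right)^2 ,
\qquad s_1 = \sqrt{\frac{SSR}{n-k}}, \quad s_2 = \sqrt{\frac{SST}{n-1}} .
\]

Here \(s_1\) is the root mean squared error of the full model and \(s_2\) is the root mean squared error of a constant-only model, which is just the sample standard deviation of the response. The suggested robust measure has the same form, with both scales replaced by their robust counterparts.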

Last edited by Steve Samuels; 11 Apr 2016, 21:17.

Sanaullah Farooq

Join Date: Mar 2016

Posts: 107
#7

11 Apr 2016, 23:54

I received the answer from Professor Vincenzo Verardi. He did not suggest any other options. I copied his email reply in full in the post above.
Nick Cox

Join Date: Mar 2014

Posts: 35642
#8

12 Apr 2016, 00:58

I think much depends on whether you want to compare robust regressions with each other or with plain regressions.
Jonas Selmeryd

Join Date: Jun 2016

Posts: 2
#9

05 Jun 2016, 05:10

I believe this question fits in this thread. I'm having problems getting the residuals from mmregress right. When calculated in two different ways they don't match:

mmregress mpg weight, outlier
predict p, xb
gen r=mpg-p
gen r2=S_stdres*e(scale)
pwcorr r2 r

| r2 r
-------------+------------------
r2 | 1.0000
r | 0.9983 1.0000

sum r2 r

Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
r2 | 74 .957506 3.515488 -5.348241 15.82534
r | 74 .8222398 3.472092 -5.646473 15.42817

From my understanding the results ought to be identical, but they aren't. The result is the same when p is calculated manually from the coefficients. Where do I go wrong?

Regards

Jonas Selmeryd


Steve Samuels

Join Date: Mar 2014
Posts: 1786

#10

05 Jun 2016, 11:16

Good catch. The difference occurs because mmregress does two robust regressions. The initial one is an S-regression to estimate the scale parameter. (This S-regression can be displayed by adding the initial option to mmregress.) According to the Verardi-Croux article (p. 443), the "S-estimator" is very robust (it can tolerate up to 50% outliers) but has low efficiency (high standard errors) at a Gaussian distribution. Therefore, as a second step, the program does MM regression, but with the scale parameter fixed at the value produced in the first step. The residual from the S-regression is the one produced by

Code:

gen r2=S_stdres*e(scale)

This is illustrated in the following log. Here r3 is the observed-minus-predicted residual from the S-estimator.

Code:

. set seed 438205

. sysuse auto, clear

. mmregress mpg weight, outlier
The total number of p-subsets to check is 20
------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0052036   .0003089   -16.85   0.000    -.0058194   -.0045879
       _cons |   36.18723   1.139205    31.77   0.000     33.91627    38.45819
------------------------------------------------------------------------------
Scale parameter=  1.890264

. predict p1, xb

. gen r1 = mpg-p1

. gen r2=S_stdr*e(scale)

. mmregress mpg weight, outlier initial  // S-estimator
The total number of p-subsets to check is 20
------------------------------------------------------------------------------
         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |  -.0049362   .0004281   -11.53   0.000    -.0057896   -.0040827
       _cons |   35.24437   1.554421    22.67   0.000     32.14569    38.34305
------------------------------------------------------------------------------
Scale parameter=  1.890267

. predict p3, xb

. gen r3  = mpg-p3

. sum r1 r2 r3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          r1 |         74    .8222416    3.472094  -5.646461   15.42819
          r2 |         74    .9576024     3.51551  -5.348072   15.82556
          r3 |         74    .9575548    3.515501   -5.34815   15.82546

. corr r1 r2 r3
             |       r1       r2       r3
-------------+---------------------------
          r1 |   1.0000
          r2 |   0.9983   1.0000
          r3 |   0.9983   1.0000   1.0000

Now a request: FAQ 12 asks that you post all code and results between CODE delimiters, described there. Please do so in the future. It isn't enough to use a monospace font, because, as you can see, column headings and dividers don't line up. That isn't a serious issue here, but can be with more extensive output.

Last edited by Steve Samuels; 05 Jun 2016, 11:32.


Jonas Selmeryd

Join Date: Jun 2016

Posts: 2
#11

06 Jun 2016, 12:26

Sorry about the missing CODE delimiters. This was my first post on statalist.org. I will do better next time!

If I interpret what you say correctly, it is better to use observed-predicted than e(scale)*S_stdres, since the latter comes from the initial S-estimation rather than the final MM-estimation, yes?
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#12

07 Jun 2016, 21:27

You do interpret correctly: it's better to use observed-predicted, as that is the final fit.
