Durbin Watson d-statistic

Adrian Cernescu

Join Date: Oct 2020

Posts: 27
#1

28 Feb 2021, 07:16

Hi, I am running a regression that includes several lags of the dependent variable:

Code:

reg CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort CSAD_L1 CSAD_L2 CSAD_L3 CSAD_L4 CSAD_L5 CSAD_L6 CSAD_L7 CSAD_L8

Can one use the Durbin-Watson d-statistic,

Code:

estat dwatson

to check whether serial correlation has been removed from my initial model, shown below?

Code:

reg CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort

From what I have read online, I am concerned that the Durbin-Watson d-statistic can only be used when there is one lag of the dependent variable, but I am not sure. Could someone please clarify this for me?

It is also confusing because, when I run the Durbin-Watson test in Stata with only 2 lags of the dependent variable, I get a value closer to 2 (about 2.005), whereas with 8 lags the d-statistic is around 1.95.
Is this because the Durbin-Watson d-statistic cannot be used for regressions with more than one lag of the dependent variable on the right-hand side?

Thank you.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

28 Feb 2021, 07:44

The Durbin Watson statistic is valid only if your regressors are strictly exogenous, so it is not appropriate for cases where you have lagged dependent variable(s) on the right hand side of your equation.
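A minimal sketch of two alternatives that do not require strict exogeneity: Durbin's alternative test and the Breusch-Godfrey test, both official postestimation commands after regress. It assumes the data have been tsset. The time variable name date is a placeholder, not taken from your post, and the lag range 1/8 is only illustrative.

Code:

* Sketch only: date is a hypothetical name for the time variable
tsset date
reg CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
* Durbin's alternative test for serial correlation of orders 1 to 8
estat durbinalt, lags(1/8)
* Breusch-Godfrey LM test for the same orders
estat bgodfrey, lags(1/8)

Each row of the output tests H0: no serial correlation against serial correlation up to the listed order.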
Comment
Adrian Cernescu

Join Date: Oct 2020

Posts: 27
#3

28 Feb 2021, 09:15

Hi Joro Kolev, thank you for your response! If I cannot use the Durbin-Watson statistic, could you suggest an alternative test for autocorrelation in this case?

I was inclined towards Durbin's alternative test or the Breusch-Godfrey test, but I am quite unfamiliar with them and unsure how many lags to use when computing them. Do you reckon I should carry out these tests with 8 lags, since I have 8 lags in my model?

I would appreciate the help here. My aim is to find out whether adding the lags has removed the autocorrelation in my initial model:

Code:

reg CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
Comment
Eric de Souza

Join Date: Mar 2014

Posts: 587
#4

28 Feb 2021, 10:02

findit actest
On edit: more explicitly, in Stata type -findit actest-. It will find the community contributed command -actest-
Or you can directly from within Stata type

Code:

ssc install actest

Last edited by Eric de Souza; 28 Feb 2021, 10:06.
Comment
Adrian Cernescu

Join Date: Oct 2020

Posts: 27
#5

28 Feb 2021, 10:21

Eric de Souza, I tried this and it is really weird: when I run the test after my regression with only 2 lags the p-value is 0.8043, but when I run it after the regression with 8 lags I get a p-value of 0.1643. Would you have any idea why this is so?

More specifically, after adding more than two lags the p-value starts to decrease. What would this be a sign of? Should I be using only two lags instead of 8? It is really confusing, because the partial correlogram of my dependent variable suggests autocorrelation up to at least 8 lags.

Last edited by Adrian Cernescu; 28 Feb 2021, 10:27.
Comment
Eric de Souza

Join Date: Mar 2014

Posts: 587
#7

28 Feb 2021, 11:24

I just looked at your first post. The way you introduce lags is weird, that is to say, not conventional Stata.
Your original variables seem to be

Code:

CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort

I assume you have a date variable.
In that case you should -tsset- your data indicating clearly which is your date variable
Then your regression command would be

Code:

reg CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort

After this regression just issue the command
actest, lag(4)
and show us your results.
Without seeing your results and an excerpt of your data it is not possible to comment.
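Put together as one sequence, this might look like the sketch below. The variable name date is an assumption, since the data have not been shown.

Code:

* One-time installation of the community-contributed command
ssc install actest
* Sketch only: date is a hypothetical name for the time variable
tsset date
* L(1/8).CSAD stands for L1.CSAD through L8.CSAD, so no hand-made lag variables are needed
reg CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
actest, lag(4)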
Comment

Adrian Cernescu

Join Date: Oct 2020
Posts: 27

#8

28 Feb 2021, 11:37

Code:

Cumby-Huizinga test for autocorrelation (Breusch-Godfrey)
  H0: variable is MA process up to order q
  HA: serial correlation present at specified lags >q
-----------------------------------------------------------------------------
  H0: q=0 (serially uncorrelated)        |  H0: q=specified lag-1
  HA: s.c. present at range specified    |  HA: s.c. present at lag specified
-----------------------------------------+-----------------------------------
    lags   |      chi2      df     p-val | lag |      chi2      df     p-val
-----------+-----------------------------+-----+-----------------------------
   1 -  1  |      1.941      1    0.1636 |   1 |      1.941      1    0.1636
   1 -  2  |      2.789      2    0.2479 |   2 |      0.635      1    0.4254
   1 -  3  |      3.158      3    0.3679 |   3 |      0.329      1    0.5662
   1 -  4  |      3.171      4    0.5297 |   4 |      0.043      1    0.8366
-----------------------------------------------------------------------------
  Test allows predetermined regressors/instruments
  Test requires conditional homoskedasticity

This is the result after doing everything you mentioned with 8 lags, that is, running actest, lag(4) after the regression

Code:

reg CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort

Below I also give the results for 2 lags only. After running

Code:

reg CSAD L(1/2).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort

and then actest, lag(4), I get:

Code:

Cumby-Huizinga test for autocorrelation (Breusch-Godfrey)
  H0: variable is MA process up to order q
  HA: serial correlation present at specified lags >q
-----------------------------------------------------------------------------
  H0: q=0 (serially uncorrelated)        |  H0: q=specified lag-1
  HA: s.c. present at range specified    |  HA: s.c. present at lag specified
-----------------------------------------+-----------------------------------
    lags   |      chi2      df     p-val | lag |      chi2      df     p-val
-----------+-----------------------------+-----+-----------------------------
   1 -  1  |      0.061      1    0.8043 |   1 |      0.061      1    0.8043
   1 -  2  |     30.562      2    0.0000 |   2 |     28.814      1    0.0000
   1 -  3  |     30.590      3    0.0000 |   3 |      0.956      1    0.3281
   1 -  4  |     30.649      4    0.0000 |   4 |      2.749      1    0.0973

Could you tell me which model is the correct one to follow (2 lags or 8 lags), and what the test actually looks at (i.e., what exactly the values mean)? I apologise for taking up your time on this matter, but I would dearly like to know, and your input would be extremely valuable to me.

Below I have also attached the output of

Code:

pac CSAD

This is the reason I am considering 8 lags. Any suggestions?

[Attached image: code.jpg]

Last edited by Adrian Cernescu; 28 Feb 2021, 11:42.

Comment

Eric de Souza

Join Date: Mar 2014

Posts: 587
#9

28 Feb 2021, 11:56

The acf for CSAD conveys no information here. It is the acf of the residuals from the regression that counts.
Your regression with two lags is telling you that you still have serial correlation in the residuals at the second lag (chi2 = 28.814, p-value 0.0000).
With eight lags the problem of serial correlation has disappeared (all p-values for lags 1 to 4 are above 0.16).
But since you haven't shown the results of your regression I cannot tell whether you really need eight lags.
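To look at the residual correlogram yourself, something along these lines should work once the data are tsset. The variable name uhat and the choice of 20 lags are illustrative only.

Code:

reg CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
* Store the residuals under a new (hypothetical) variable name
predict double uhat, residuals
* Autocorrelations and partial autocorrelations of the residuals, not of CSAD
ac uhat, lags(20)
pac uhat, lags(20)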
I am logging off now.
Comment

Adrian Cernescu

Join Date: Oct 2020
Posts: 27

#10

28 Feb 2021, 12:07

Code:

      Source |       SS           df       MS      Number of obs   =     1,143
-------------+----------------------------------   F(11, 1131)     =    242.89
       Model |  .250038619        11  .022730784   Prob > F        =    0.0000
    Residual |  .105842212     1,131  .000093583   R-squared       =    0.7026
-------------+----------------------------------   Adj R-squared   =    0.6997
       Total |   .35588083     1,142  .000311629   Root MSE        =    .00967


------------------------------------------------------------------------------
        CSAD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        CSAD |
         L1. |   .3336416   .0243921    13.68   0.000     .2857828    .3815004
         L2. |   .0740344   .0260313     2.84   0.005     .0229595    .1251094
         L3. |   .0658262   .0261828     2.51   0.012     .0144538    .1171985
         L4. |   .0369317   .0262639     1.41   0.160    -.0145996    .0884631
         L5. |   .0749067   .0262794     2.85   0.004     .0233448    .1264686
         L6. |   .0090667   .0261697     0.35   0.729      -.04228    .0604134
         L7. |   .0437392     .02598     1.68   0.093    -.0072353    .0947137
         L8. |   .0994451   .0243214     4.09   0.000     .0517249    .1471653
             |
 RtnMrktPort |   .0903693   .0064486    14.01   0.000     .0777168    .1030218
AbsRtnMrkt~t |   .1457908   .0175011     8.33   0.000     .1114524    .1801291
SquRtnMrkt~t |   .2127399   .1026588     2.07   0.038     .0113168    .4141631
       _cons |   .0012906   .0006429     2.01   0.045     .0000292    .0025519
------------------------------------------------------------------------------

These are the results from running the regression with 8 lags:

Code:

reg CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort

Would this now help in determining whether I need 8 lags or not? How would I know whether I have included the correct number of lags?

Could you also tell me why it is the acf of the residuals from the regression that counts, not the acf of CSAD, given that CSAD is the dependent variable? (Below I have also attached the code and the acf of the residuals from running reg CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort.)

Code:

. predict residual, r
(9 missing values generated)

. pac residual

Also, is there a rule behind choosing lag(4)? I tried lag(20), and after lag 15 the p-value goes to zero for all the remaining lags.

Thank you very much for your patience and help.

Attached Files

Last edited by Adrian Cernescu; 28 Feb 2021, 12:24.

Comment

Joro Kolev

Join Date: Aug 2018
Posts: 3050

#11

28 Feb 2021, 14:22

I have not used what Eric proposes.

This is how I test for autocorrelation:

Code:

. webuse qsales

. reg csales l.csales, noheader
------------------------------------------------------------------------------
      csales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      csales |
         L1. |   1.015392   .0367864    27.60   0.000     .9377791    1.093004
             |
       _cons |   .0368356    .899291     0.04   0.968    -1.860502    1.934174
------------------------------------------------------------------------------

. predict e, resid
(1 missing value generated)

. estat durbinalt

Durbin's alternative test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          2.888               1                   0.0892
---------------------------------------------------------------------------
                        H0: no serial correlation

. reg csales l.csales l.e, noheader
------------------------------------------------------------------------------
      csales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      csales |
         L1. |   1.028136   .0389193    26.42   0.000     .9451811     1.11109
             |
           e |
         L1. |  -.3993484   .2393386    -1.67   0.116    -.9094866    .1107898
             |
       _cons |  -.2818436   .9584674    -0.29   0.773    -2.324768    1.761081
------------------------------------------------------------------------------

Comment

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#12

28 Feb 2021, 14:27

The first is Durbin's alternative test (estat durbinalt), and the second is what I actually do: just include the lagged residual and test whether its coefficient is 0. The p-value here, 0.116, is about the same as that of Durbin's alternative test, 0.0892.
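The same idea extends to more than one lag: add several lags of the residual and test them jointly. A sketch, where the choice of four lags is purely illustrative and not a recommendation:

Code:

webuse qsales, clear
reg csales L.csales
predict double e, residuals
* Auxiliary regression with lags 1 to 4 of the residual
reg csales L.csales L(1/4).e
* Joint test that the four lagged-residual coefficients are all 0
testparm L(1/4).e
* Official counterpart for comparison; rerun the original model first,
* because estat durbinalt uses the most recent regress results
reg csales L.csales
estat durbinalt, lags(1/4)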
Comment
Adrian Cernescu

Join Date: Oct 2020

Posts: 27
#13

01 Mar 2021, 01:31

Could you please tell me how you would determine the number of lags to use? In this case I can see that you use one lag of the residual, but when is it appropriate to stop adding lagged values, i.e., why stop at only 1 lag?
In the comment below I show the partial autocorrelation of the residuals of my model without lags. Will this convey any information about the appropriate number of lags to use?

Last edited by Adrian Cernescu; 01 Mar 2021, 01:37.
Comment
Adrian Cernescu

Join Date: Oct 2020

Posts: 27
#14

01 Mar 2021, 01:35

Code:

reg CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort

After running this regression and running

Code:

predict residual, r

followed by

Code:

pac residual

I get the following graph. Would this provide information on the number of lags I should use?

Attached Files
Comment
