  • Durbin Watson d-statistic

    Hi, I am running a regression that includes multiple lags of the dependent variable:
    Code:
    reg CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort CSAD_L1 CSAD_L2 CSAD_L3 CSAD_L4 CSAD_L5 CSAD_L6 CSAD_L7 CSAD_L8
    Can one use the Durbin Watson d-statistic (
    Code:
    estat dwatson
    ) to check whether serial correlation has been removed from my initial model
    Code:
    reg CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
    ?

    From what I have read online, the Durbin Watson d-statistic can only be used when there is one lag of the dependent variable, but I am not sure. Could someone please clarify this for me?

    It is also confusing because the Durbin Watson d-statistic in Stata is closer to 2 (about 2.005) when I include only 2 lags of the dependent variable, whereas with 8 lags it is around 1.95.
    Is this because the Durbin Watson d-statistic cannot be used for regressions with more than one lag of the dependent variable on the right-hand side?

    Thank you.

  • #2
    The Durbin Watson statistic is valid only if your regressors are strictly exogenous, so it is not appropriate for cases where you have lagged dependent variable(s) on the right hand side of your equation.

    • #3
      Hi Joro Kolev, thank you for your response! Could you suggest an alternative test for autocorrelation in this case, given that I cannot use the Durbin Watson statistic?

      I was leaning towards Durbin's alternative test or the Breusch-Godfrey test, but I am unfamiliar with them and unsure how many lags to use when computing them. Do you think I should run these tests with 8 lags, since I have 8 lags in my model?

      I would appreciate the help. My aim is to find out whether adding the lags has removed the autocorrelation in my initial model (
      Code:
       
       reg CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
      )

      • #4
        findit actest
        On edit: more explicitly, in Stata type -findit actest-. It will find the community contributed command -actest-
        Or you can directly from within Stata type
        Code:
         ssc install actest
        Last edited by Eric de Souza; 28 Feb 2021, 10:06.

        • #5
          Eric de Souza, I tried this and the result is puzzling: after my regression with only 2 lags the p-value is 0.8043, but after the regression with 8 lags it is 0.1643. Would you have any idea why this is so?

          More specifically, once I add more than two lags the p-value starts to decrease. What would this be a sign of? Should I be using only two lags instead of 8? It is confusing because the partial correlogram of my dependent variable suggests autocorrelation up to at least 8 lags.
          Last edited by Adrian Cernescu; 28 Feb 2021, 10:27.

            • #7
              I just looked at your first post. The way you introduce lags is weird, that is to say, not conventional Stata.
              Your original variables seem to be
              Code:
               CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
              I assume you have a date variable.
              In that case you should -tsset- your data, indicating clearly which is your date variable.
              Then your regression command would be
              Code:
               reg CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
              After this regression just issue the command
              actest, lag(4)
              and show us your results.
              Without seeing your results and an excerpt of your data it is not possible to comment.
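
              A minimal sketch of the full sequence, for illustration only. It assumes the time variable is called date (a hypothetical name, not given in the thread). Alongside -actest- it runs two of Stata's built-in postestimation tests, -estat bgodfrey- and -estat durbinalt-, which do not require strictly exogenous regressors:
              Code:
               * date is an assumed variable name; substitute your own time variable
               tsset date
               regress CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
               * community-contributed test (ssc install actest)
               actest, lag(4)
               * built-in alternatives: Breusch-Godfrey and Durbin's alternative test
               estat bgodfrey, lags(1/4)
               estat durbinalt, lags(1/4)
              If the dates have gaps (for example, trading days only), a consecutive time index may be a better choice for -tsset-, since the lag operator returns missing values across gaps.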

              • #8
                Code:
                Cumby-Huizinga test for autocorrelation (Breusch-Godfrey)
                  H0: variable is MA process up to order q
                  HA: serial correlation present at specified lags >q
                -----------------------------------------------------------------------------
                  H0: q=0 (serially uncorrelated)        |  H0: q=specified lag-1
                  HA: s.c. present at range specified    |  HA: s.c. present at lag specified
                -----------------------------------------+-----------------------------------
                    lags   |      chi2      df     p-val | lag |      chi2      df     p-val
                -----------+-----------------------------+-----+-----------------------------
                   1 -  1  |      1.941      1    0.1636 |   1 |      1.941      1    0.1636
                   1 -  2  |      2.789      2    0.2479 |   2 |      0.635      1    0.4254
                   1 -  3  |      3.158      3    0.3679 |   3 |      0.329      1    0.5662
                   1 -  4  |      3.171      4    0.5297 |   4 |      0.043      1    0.8366
                -----------------------------------------------------------------------------
                  Test allows predetermined regressors/instruments
                  Test requires conditional homoskedasticity
                This is the result of doing everything you mentioned with 8 lags, that is, running actest, lag(4) after the regression
                Code:
                reg CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort

                Below I also give the results for 2 lags only. After running
                Code:
                reg CSAD L(1/2).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
                and then actest, lag(4), I get:


                Code:
                Cumby-Huizinga test for autocorrelation (Breusch-Godfrey)
                  H0: variable is MA process up to order q
                  HA: serial correlation present at specified lags >q
                -----------------------------------------------------------------------------
                  H0: q=0 (serially uncorrelated)        |  H0: q=specified lag-1
                  HA: s.c. present at range specified    |  HA: s.c. present at lag specified
                -----------------------------------------+-----------------------------------
                    lags   |      chi2      df     p-val | lag |      chi2      df     p-val
                -----------+-----------------------------+-----+-----------------------------
                   1 -  1  |      0.061      1    0.8043 |   1 |      0.061      1    0.8043
                   1 -  2  |     30.562      2    0.0000 |   2 |     28.814      1    0.0000
                   1 -  3  |     30.590      3    0.0000 |   3 |      0.956      1    0.3281
                   1 -  4  |     30.649      4    0.0000 |   4 |      2.749      1    0.0973

                Could you tell me which model is the correct one to follow (2 lags or 8 lags), and what the test actually looks at (i.e., what the values mean)? I apologise for taking up your time on this, but I would really like to understand, and your input would be extremely valuable to me.

                Below I have also attached the output of
                Code:
                pac CSAD
                This is the reason I am considering 8 lags. Any suggestions?
                [Attached image: code.jpg (partial correlogram of CSAD)]

                Last edited by Adrian Cernescu; 28 Feb 2021, 11:42.

                • #9
                  The acf for CSAD conveys no information here. It is the acf of the residuals from the regression that counts.
                  Your regression with two lags is telling you that you still have serial correlation in the residuals at the second lag.
                  With eight lags, the test no longer detects serial correlation at lags 1 to 4.
                  But since you haven't produced the results of your regression I cannot tell whether you really need eight lags.
                  I am logging off now.
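
                  For illustration, a sketch of inspecting the residuals rather than CSAD itself. The variable name uhat and the choice of 20 lags are assumptions made for this example, and the data are assumed to be -tsset- already:
                  Code:
                  regress CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
                  * uhat is a hypothetical name for the stored residuals
                  predict double uhat, residuals
                  * correlogram and partial correlogram of the residuals
                  ac uhat, lags(20)
                  pac uhat, lags(20)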

                  • #10
                    Code:
                          Source |       SS           df       MS      Number of obs   =     1,143
                    -------------+----------------------------------   F(11, 1131)     =    242.89
                           Model |  .250038619        11  .022730784   Prob > F        =    0.0000
                        Residual |  .105842212     1,131  .000093583   R-squared       =    0.7026
                    -------------+----------------------------------   Adj R-squared   =    0.6997
                           Total |   .35588083     1,142  .000311629   Root MSE        =    .00967
                    
                    
                    ------------------------------------------------------------------------------
                            CSAD |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                            CSAD |
                             L1. |   .3336416   .0243921    13.68   0.000     .2857828    .3815004
                             L2. |   .0740344   .0260313     2.84   0.005     .0229595    .1251094
                             L3. |   .0658262   .0261828     2.51   0.012     .0144538    .1171985
                             L4. |   .0369317   .0262639     1.41   0.160    -.0145996    .0884631
                             L5. |   .0749067   .0262794     2.85   0.004     .0233448    .1264686
                             L6. |   .0090667   .0261697     0.35   0.729      -.04228    .0604134
                             L7. |   .0437392     .02598     1.68   0.093    -.0072353    .0947137
                             L8. |   .0994451   .0243214     4.09   0.000     .0517249    .1471653
                                 |
                     RtnMrktPort |   .0903693   .0064486    14.01   0.000     .0777168    .1030218
                    AbsRtnMrkt~t |   .1457908   .0175011     8.33   0.000     .1114524    .1801291
                    SquRtnMrkt~t |   .2127399   .1026588     2.07   0.038     .0113168    .4141631
                           _cons |   .0012906   .0006429     2.01   0.045     .0000292    .0025519
                    ------------------------------------------------------------------------------
                    These are the results from running the regression with 8 lags:
                    Code:
                    reg CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
                    Would this now help in determining whether I need 8 lags or not? How would I know whether I have included the correct number of lags?
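
                    One possible way to approach this question, sketched under assumptions rather than as a definitive rule: test whether the higher-order lags are jointly zero, and check each shorter specification for remaining serial correlation. The split at lag 2 and the 4 test lags are arbitrary choices for the example:
                    Code:
                    * joint test that lags 3 to 8 of CSAD can be dropped
                    regress CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
                    testparm L(3/8).CSAD
                    * Breusch-Godfrey test after each candidate lag length
                    forvalues p = 1/8 {
                        quietly regress CSAD L(1/`p').CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
                        display "lags of CSAD: `p'"
                        estat bgodfrey, lags(1/4)
                    }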


                    Could you also tell me why it is the acf of the residuals from the regression that counts, and not the acf of CSAD, given that CSAD is the dependent variable? (Below I have also attached the code and the acf of the residuals from running: reg CSAD L(1/8).CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort)


                    Code:
                    . predict residual, r
                    (9 missing values generated)
                    
                    . pac residual

                    Also, is there a rule behind choosing lag(4)? I tried lag(20), and after lag 15 the p-value goes to zero for all the remaining lags.

                    Thank you very much for your patience and help.
                    Last edited by Adrian Cernescu; 28 Feb 2021, 12:24.

                    • #11
                      I have not used what Eric proposes.

                      This is how I test for autocorrelation:

                      Code:
                      . webuse qsales
                      
                      . reg csales l.csales, noheader
                      ------------------------------------------------------------------------------
                            csales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                            csales |
                               L1. |   1.015392   .0367864    27.60   0.000     .9377791    1.093004
                                   |
                             _cons |   .0368356    .899291     0.04   0.968    -1.860502    1.934174
                      ------------------------------------------------------------------------------
                      
                      . predict e, resid
                      (1 missing value generated)
                      
                      . estat durbinalt
                      
                      Durbin's alternative test for autocorrelation
                      ---------------------------------------------------------------------------
                          lags(p)  |          chi2               df                 Prob > chi2
                      -------------+-------------------------------------------------------------
                             1     |          2.888               1                   0.0892
                      ---------------------------------------------------------------------------
                                              H0: no serial correlation
                      
                      . reg csales l.csales l.e, noheader
                      ------------------------------------------------------------------------------
                            csales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                            csales |
                               L1. |   1.028136   .0389193    26.42   0.000     .9451811     1.11109
                                   |
                                 e |
                               L1. |  -.3993484   .2393386    -1.67   0.116    -.9094866    .1107898
                                   |
                             _cons |  -.2818436   .9584674    -0.29   0.773    -2.324768    1.761081
                      ------------------------------------------------------------------------------

                      • #12
                        The first thing is Durbin's alternative test, and the second is what I actually do: just include the lagged residual and test whether its coefficient is 0. The p-value here, 0.116, is about the same as the one from Durbin's test (0.0892).
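
                        A sketch of how the lagged-residual check could be extended to more than one lag, using the same example data. The choice of two lags is an assumption for illustration. The joint test on the lagged residuals plays the role of the single t-test in the one-lag case:
                        Code:
                        webuse qsales, clear
                        regress csales L.csales
                        predict double e, residuals
                        * auxiliary regression with two lags of the residual
                        regress csales L.csales L(1/2).e
                        * joint test that both lagged-residual coefficients are zero
                        testparm L(1/2).e
                        * built-in counterpart for the same lag orders
                        quietly regress csales L.csales
                        estat durbinalt, lags(1/2)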

                        • #13
                          Could you please tell me how you would determine the number of lags to use? In this case I can see that you use a lag of the residual, but when is it appropriate to stop adding lagged values, i.e., why stop at only 1 lag?
                          In the comment below I show the partial autocorrelation of the residuals of my model without lags. Does this convey any information about the appropriate number of lags to use?
                          Last edited by Adrian Cernescu; 01 Mar 2021, 01:37.

                          • #14
                            Code:
                            reg CSAD RtnMrktPort AbsRtnMrktPort SquRtnMrktPort
                            After running this regression and running
                            Code:
                            predict residual, r
                            followed by
                            Code:
                            pac residual
                            I get the following graph. Would this provide information on the number of lags I should use?
