
  • Durbin Watson

    I ran a simple multiple regression: 1 DV, 2 IVs. I wanted to check the independence of the IVs, so I ran the Durbin-Watson test with -estat dwatson- and got this error message: time variable not set, use tsset varname ...

    I do not understand this. The Stata manual says to simply type estat dwatson.

    Any ideas?

  • #2
    The Durbin-Watson test is a test of serial correlation of the regression residuals, which requires knowing the chronological order of the observations. The only way Stata will know that is if you first -tsset- the data. See -help tsset- for the use of that command.
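
    For data that do have a chronological order, a minimal sketch of that workflow would be (the ordering variable -obsno- here is hypothetical; substitute whatever variable records the order of your observations):

    Code:
    * declare the time variable so Stata knows the order of the observations
    tsset obsno
    * fit the regression, then run the Durbin-Watson test on its residuals
    regress y x1 x2
    estat dwatson

    If your data are purely cross-sectional with no natural ordering, there is nothing sensible to -tsset-, and the test is not meaningful in the first place.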

    However, I think most people would not describe testing for serial correlation of the residuals as a test of the "independence of the IVs." So if you mean something other than looking for serial correlation of residuals, then you need to be clear about what exactly you want to check for and then find an appropriate approach to do that, which would not be a Durbin-Watson test.



    • #3
      Thanks, Clyde. I got my information about using Durbin-Watson from https://www.statology.org/multiple-l...n-assumptions/ . Apparently the rather simplistic instruction at Item 3, Independence of IVs, is either not correct or too abbreviated. Further checking suggests vif may be used to test the independence of the IVs, but that is the same test used to determine multicollinearity. If vif is used for both, then it would seem multicollinearity and independence of IVs are one and the same, at least for practical purposes.
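
      For reference, the usual way to get variance inflation factors in Stata is as a postestimation command after the regression (variable names y, x1, x2 assumed, as elsewhere in this thread):

      Code:
      regress y x1 x2
      * variance inflation factors for the predictors; values near 1 suggest little collinearity
      estat vif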

      Thanks again for your prompt and helpful response.



      • #4
        The source you give does mention serial correlation twice and autocorrelation once under that heading. But it's not a very clear explanation at all.

        Twice, the assumption (following others, I prefer to say "ideal condition") that residuals are normally distributed is referred to as an assumption of multivariate normality. But the residuals (or more pedantically the errors) are a single distribution; there is nothing multivariate there.

        I would start by looking at the correlations between your predictor variables and at a scatter plot matrix. I wish that the terminology "independent variables" would just fade away.
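
        With just two predictors, the first of those is a one-liner (variable names assumed):

        Code:
        * pairwise correlation between the two predictors
        correlate x1 x2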

        It's really hard to be so concise without introducing scope for misunderstanding. For example, adding the square of a predictor to model a quadratic relationship usually means adding a variable -- that square -- that is highly correlated with the original predictor, which rather goes against other advice. I think the apparent contradiction can be resolved, but in a much longer discussion.
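
        One standard device in that longer discussion is centering a predictor before squaring it, which typically reduces the correlation between the linear and quadratic terms a great deal. A sketch, with assumed variable names:

        Code:
        * center x1 at its mean, then square the centered version
        summarize x1, meanonly
        generate x1c = x1 - r(mean)
        generate x1c2 = x1c^2
        * the centered pair is usually far less correlated than x1 and its raw square
        correlate x1c x1c2
        regress y x1c x1c2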



        • #5
          Thank you, Nick, for this clarification and for confirming that the source I used was, to say the least, confusing.

          I don't quite understand what you mean by "the correlations between your predictor variables and a scatter plot matrix." Scatter plot matrix of what?



          • #6
            A scatterplot matrix of your data, including the response and the predictors. Suppose your response (DV in your terminology) is called y and your predictors (IVs) are called x1 x2. Then

            Code:
            graph matrix x1 x2 y, half
            gives you a scatterplot matrix, with the detail that the last-named variable y appears on the vertical axis of each graph on the last row, as is conventional.

            In your case, you have just two rows. Naturally the option half is not compulsory, but it is usually a good idea.



            • #7
              Thanks.
