Here's how. Suppose I want to predict homicides with ice cream sales. I have the weekly homicide rate as my outcome, a weekly vector of sales in millions for each city (let's say), and unit and time indicators. I uncover a significant relationship with a coefficient of 10. Does this mean that as sales increase by a million dollars, the homicide rate rises by 10 points on average? Is ice cream causing an increase in killings?
If your panel has a large cross section, then the time series properties are not important.
I don't understand. Is this a general rule? I mean, take the often-cited example I gave above. Let's say we have 52 weeks of data and 100 cities. A regression of the kind I describe would be unacceptable, right, even if I had 500 cities?
Yes, this is a good example to explain the meaning of spurious regressions, but I want to know how this is detected statistically in Stata.
For example, in other software the Durbin-Watson statistic is compared with the R-squared: if the Durbin-Watson statistic is less than the R-squared, the regression may be spurious.
I want to know how this is detected statistically in Stata
Not possible. As the analyst, you must adjust for every predictor you think is theoretically relevant and that would also affect your outcome. But you can't adjust for everything, so after you've adjusted for everything you think might be relevant, ultimately there is no way to test for this statistically.
In fact, there's always the possibility of omitted predictors. More precisely, there are always some omitted predictors! The issue is whether this omission makes a practical difference.
The example you give is of spurious correlation. That is, correlation that is driven by some other unobserved/unmeasured factor omitted from your model. Spurious correlation can happen always and everywhere, and it generally disappears once you include the omitted factor in your model, as the sketch below illustrates.
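A minimal sketch of that point (my own illustration, not from the thread; all variable names are invented): generate a common factor z, build x and y from it, and watch the apparent effect of x vanish once z enters the model.
Code:
* Hypothetical illustration: spurious correlation through an omitted factor
clear
set seed 2024
set obs 1000
gen z = rnormal()        // unobserved common cause (say, summer heat)
gen x = z + rnormal()    // "ice cream sales"
gen y = z + rnormal()    // "homicide rate"
regress y x              // x looks significant, but only through z
regress y x z            // once z is included, the x coefficient collapses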
"Spurious regression" is a term coined by Granger, C. W., & Newbold, P. (1974). Spurious regressions in econometrics. Journal of econometrics, 2(2), 111-120
and is a term which has a strict technical meaning in econometrics.
The meaning of "spurious regression" is the spurious correlation that you get when you regress one nonstationary variable on another nonstationary variable.
You can also check out Kolev, G. I. (2011). The "spurious regression problem" in the classical regression model framework. Economics Bulletin, 31(1), 925-937, for a short reading on the matter, and for the point that "spurious regression" is intimately related to the estimation method. E.g., it occurs if you use OLS, and it does not occur (under certain conditions) if you use GLS.
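To see the Granger-Newbold phenomenon concretely, here is a minimal sketch (my own, not from the posts above): regress one simulated random walk on another, independent one. The slope is typically "significant", the R-squared is sizable, and the Durbin-Watson statistic is far below 2; -dfuller- confirms that neither series is stationary.
Code:
* Hypothetical simulation of a spurious regression
clear
set seed 12345
set obs 200
gen t = _n
tsset t
gen x = sum(rnormal())    // random walk #1
gen y = sum(rnormal())    // random walk #2, independent of x
regress y x               // typically "significant" despite no true relation
estat dwatson             // Durbin-Watson statistic well below 2
dfuller x                 // augmented Dickey-Fuller: cannot reject a unit root
dfuller y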
I'll admit, I'm not familiar with the literature on spurious regressions. I just want to point out that you can use a postestimation command to compute the Durbin-Watson statistic. After running a regression on tsset time-series data, use the following command:
Code:
estat dwatson
Then I think you can just look at the output and check whether the Durbin-Watson statistic is less than the model R-squared.
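For completeness, here is a sketch of that whole check in one place (my own; y, x, and time are placeholder names, and the data must be -tsset- for -estat dwatson- to run):
Code:
* Hypothetical workflow: compare Durbin-Watson with R-squared
tsset time                 // declare the time variable first
regress y x
local r2 = e(r2)           // R-squared saved by -regress-
estat dwatson
local dw = r(dw)           // Durbin-Watson statistic saved by -estat dwatson-
display "R-squared = " `r2' "   DW = " `dw'
if `r2' > `dw' {
    display "R-squared exceeds DW: possible spurious regression"
}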