OLS with dependent variables

Flavio Ciampi

Join Date: Sep 2019

Posts: 4
#1

OLS with dependent variables

25 Sep 2019, 04:36

Hi everybody,

I have to run a regression with a dummy variable as the response variable and a vector of regressors. My question is the following: is it entirely incorrect to apply OLS when the dependent variable is binary? From what I've studied in my econometrics courses, the answer would be yes. However, my new professor told me that OLS can be used even in these cases and produces good estimates. I am a bit confused.
Nick Cox

Join Date: Mar 2014

Posts: 35711
#2

25 Sep 2019, 05:02

By OLS (strictly an estimation method) I think you mean a linear regression model. Linear regression with a binary response is often called a linear probability model. Enough people say that it can often work well in practice that calling it incorrect is itself loaded. It's clear that many other people would prefer logit or some other link to ensure certain kinds of behaviour.

Last edited by Nick Cox; 25 Sep 2019, 05:54.
Flavio Ciampi

Join Date: Sep 2019

Posts: 4
#3

25 Sep 2019, 06:16

That's exactly what I meant, thanks for replying. Is there any paper shedding light on this problem/debate?
Nick Cox

Join Date: Mar 2014

Posts: 35711
#4

25 Sep 2019, 06:47

I will leave to others where this is best discussed. People who think the linear probability model beyond scorn won’t necessarily trouble to demolish it. Considering a binary response and any predictor, it seems immediate that with the linear probability model:

1. Nothing constrains predictions to [0, 1]. How much that bites for data and/or within the space of interest is empirical.

2. It can’t even be a convenient fiction that errors are normal or Gaussian.

Nevertheless that model may often seem to track average response as it varies and to be often simpler than alternatives.
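Both points can be made explicit with a short derivation. This is only a sketch under assumed notation that does not appear earlier in the thread: y is the binary response, x the vector of predictors, \beta the coefficients, \varepsilon the error, and p(x) the conditional probability that y = 1.

```latex
% Linear probability model (assumed notation)
y = x'\beta + \varepsilon, \qquad y \in \{0, 1\}, \qquad
E[y \mid x] = \Pr(y = 1 \mid x) = p(x) = x'\beta

% Point 1: x'\beta is a linear function of x, so nothing keeps it inside [0, 1].

% Point 2: given x, the error takes only two values
\varepsilon =
\begin{cases}
  1 - x'\beta & \text{with probability } p(x) \\
  -x'\beta    & \text{with probability } 1 - p(x)
\end{cases}

% so it cannot be normal, and its variance depends on x
E[\varepsilon \mid x] = p(x)\,(1 - p(x)) - (1 - p(x))\,p(x) = 0

\operatorname{Var}(\varepsilon \mid x)
  = p(x)\,(1 - p(x))^2 + (1 - p(x))\,p(x)^2
  = p(x)\,(1 - p(x))
```

Because the error variance changes with x, the errors are heteroskedastic by construction, which is one reason robust standard errors are commonly requested when this model is fitted.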
Flavio Ciampi

Join Date: Sep 2019

Posts: 4
#5

25 Sep 2019, 06:55

Thanks again!
Richard Williams

Join Date: Apr 2014

Posts: 5008
#6

25 Sep 2019, 08:25

Paul Allison has a good discussion at

https://statisticalhorizons.com/in-d...f-logit-part-2

I agree with his main conclusion: logit + margins gives you the best of both worlds and is better than the lpm:

The upshot is that combining logistic regression with the margins command gives you the best of both worlds. You get a model that is likely to be a more accurate description of the world than a linear regression model. It will always produce predicted probabilities within the allowable range, and its parameters will tend to be more stable over varying conditions. On the other hand, you can also get numerical estimates that are interpretable as probabilities. And you can see how those probability-based estimates vary under different conditions.
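A minimal sketch of that workflow, for anyone who wants to try it. The auto dataset and the choice of predictors are assumptions made purely for illustration; they are not from Allison's post or from the question in #1.

```stata
* Illustration only: dataset and predictors are assumed, not from the thread
sysuse auto, clear

* Logit, then average marginal effects on the probability scale
logit foreign mpg weight
margins, dydx(*)

* Linear probability model for comparison, with robust standard errors
regress foreign mpg weight, vce(robust)

* Count LPM fitted values that fall outside [0, 1]
predict double p_lpm, xb
count if (p_lpm < 0 | p_lpm > 1) & !missing(p_lpm)
```

Comparing the output of -margins- with the -regress- coefficients shows how close the two sets of probability-scale effects are for a given dataset, and the final -count- shows whether out-of-range predictions are a practical concern there.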

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
EMAIL: [email protected]
WWW: https://www3.nd.edu/~rwilliam

Bruce Weaver

Join Date: May 2014
Posts: 1133

25 Sep 2019, 09:05

Notice that using -regress- with vce(robust) produces the same estimate of the risk difference that you get with -cs- or -csi-.

Code:

csi 7 12 9 2 // Example from help for -cs- command

* Duplicate example using -cs- rather than -csi-
clear *
input byte(case exposed n)
1 1 7
1 0 12
0 1 9
0 0 2
end

cs case exposed [fw=n]

* Now use -regress- with robust SEs
regress case i.exposed [fw=n], vce(robust)
* Use margins to show the risks in the two exposure groups
margins exposed
* Use margins with r. contrast operator to show the risk difference
margins r.exposed, contrast(nowald effects)

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 19.5 (Windows)
