Can I use a linear probability model (LPM) over logit and probit under this circumstance?

Arren Zhou

Join Date: Aug 2022

Posts: 12
#1


13 Sep 2022, 08:04

Hi there! My question is simple:

If I only care about the correlation between a binary dependent variable and a predictor, and have no interest in the coefficients or in prediction, can I use an LPM for the regression analysis?

I heard that the LPM only goes wrong in prediction, since the distribution of the error terms is not normal, but does this affect the correlation between variables?

Thank you!
daniel klein

Join Date: Mar 2014

Posts: 3850
#2

13 Sep 2022, 08:25

Your question implies the answer. A correlation summarizes a linear association. If you are interested in a linear association, then the LPM is a reasonable choice. Whether a linear association is a meaningful approximation for answering your substantive research questions is a different question; we cannot tell because you have not told us anything about that.
Arren Zhou

Join Date: Aug 2022

Posts: 12
#3

13 Sep 2022, 08:29

Dear Mr. Klein,

How can I know whether the association is linear or not? My research is about the correlation between willingness to split the ticket in an election and the fractionalization of political parties. In my research so far I do not see any sign suggesting whether the correlation is linear or not.

Sorry if you feel my question is dumb. I only got into the world of statistics a few weeks ago. I'm struggling with it every day.
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

13 Sep 2022, 08:58

If you are interested in the association of a binary outcome with a binary predictor, why not just use cross tabs? If it’s a continuous predictor, why not just use a t-test?
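
As a rough sketch of those two checks, the example below uses Stata's bundled auto dataset rather than the original poster's data. Here foreign stands in for the binary outcome, mpg for a continuous predictor, and highmpg is a made-up indicator (with an arbitrary cutoff) that plays the role of a binary predictor. All of these names are placeholders.

```stata
* Illustration only: the bundled auto dataset stands in for the real data
sysuse auto, clear

* Hypothetical binary predictor, created only for this example (cutoff is arbitrary)
generate byte highmpg = mpg > 25 if !missing(mpg)

* Binary outcome vs. binary predictor: cross tabulation with a chi-squared test;
* the exact option adds Fisher's exact test, which is useful with small cell counts
tabulate foreign highmpg, chi2 exact

* Binary outcome vs. continuous predictor: compare the predictor's mean
* between the two outcome groups
ttest mpg, by(foreign)
```

With a sample as small as a few dozen observations, the exact test is likely the safer of the two cross-tab results to report.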

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.
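
For instance, assuming the variables of interest were called foreign and mpg (placeholder names taken from Stata's bundled auto dataset, not from the poster's data), a small extract could be produced roughly like this:

```stata
* Illustration only: load a bundled dataset to have something to show
sysuse auto, clear

* Write a copy-and-paste-ready listing of two variables for the first 10 observations
dataex foreign mpg in 1/10
```

The resulting output can be pasted into a post as is. As far as I know, dataex ships with fully updated Stata 14.2 and 15.1 or newer; on older versions it may need to be installed with ssc install dataex.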

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
daniel klein

Join Date: Mar 2014

Posts: 3850
#5

13 Sep 2022, 09:01

Generally, the association between a (quasi) continuous predictor and a binary outcome cannot be linear.* However, a linear model often provides a useful starting point and sometimes provides a reasonable approximation. There is much more to say about the LPM and non-linear models than I could possibly cover in a post to this forum. There are multiple blog entries on the topic on Paul Allison's blog (start with and ).

Why do you not want to use a logit or probit model?

* There are exceptions. A continuous variable might be orthogonal to a binary outcome, in which case the correlation of 0 would perfectly summarize the association.
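
One way to judge how well the linear approximation works in a given dataset is to put the LPM slope next to the average marginal effect from a logit. The following is a minimal sketch under stated assumptions: it uses the bundled auto dataset, with foreign as a placeholder binary outcome and mpg as a placeholder continuous predictor, and phat_lpm is a name invented for this example.

```stata
* Illustration only: the bundled auto dataset stands in for the real data
sysuse auto, clear

* The plain correlation; its sign matches the sign of the LPM slope below
correlate foreign mpg

* Linear probability model; robust standard errors because the error
* variance of an LPM depends on the fitted probability
regress foreign mpg, vce(robust)

* Fitted values from the LPM, and how many fall outside the unit interval
predict double phat_lpm
count if (phat_lpm < 0 | phat_lpm > 1) & !missing(phat_lpm)

* Logit on the same variables, followed by the average marginal effect
* of mpg on the probability scale, which is comparable to the LPM slope
logit foreign mpg
margins, dydx(mpg)
```

If the two effect estimates are similar and few fitted values fall outside [0, 1], the linear summary is probably adequate for describing the direction and rough size of the association.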

Last edited by daniel klein; 13 Sep 2022, 09:07.
Arren Zhou

Join Date: Aug 2022

Posts: 12
#6

13 Sep 2022, 09:13

Dear Mr. Klein

I do not want to use a logit or probit model because when I ran xtlogit with fixed effects on my data, the iteration log kept reporting "not concave" and no results were produced. I have no idea how to fix that, even after googling it for a whole day. The regression does work as a linear probability model, though. I'll keep trying to figure it out, but I just want to know: if I fail to solve the "not concave" problem, can I just use an LPM? (It seems I cannot.)
daniel klein

Join Date: Mar 2014

Posts: 3850
#7

13 Sep 2022, 09:22

Why did you use xtlogit? Do you have panel data? Do you have hierarchical data?
Arren Zhou

Join Date: Aug 2022

Posts: 12
#8

13 Sep 2022, 09:30

Dear Klein,

Yes, I have panel data, and when I try to run xtlogit with fixed effects, it does not converge. I think the problem may be the scarcity of my observations (37 observations in 12 groups; some of the groups even have only one observation), but I cannot collect more data since the data are historical and no new data are available.
daniel klein

Join Date: Mar 2014

Posts: 3850
#9

13 Sep 2022, 09:39

Are you sure you want a fixed-effects estimator? It will only use the groups with more than one observation.

Given the sample size, I would probably stick with the simple linear model and, perhaps, clustered standard errors. Asymptotic results might not apply well here either way, so interpret your results cautiously.
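
A hedged sketch of that suggestion follows. It uses the union panel from the Stata manuals (loaded with webuse, which needs an internet connection) in place of the poster's data; idcode, year, union, age, and grade are variables of that example dataset, while n_obs and first are helper names invented here. Besides single-observation groups, the conditional fixed-effects logit also drops groups in which the outcome never changes, which can leave very little to estimate from in a small sample.

```stata
* Illustration only: an example panel stands in for the real data
webuse union, clear
xtset idcode year

* How many panels consist of a single observation?
* A fixed-effects estimator gets no information from these
bysort idcode (year): generate long n_obs = _N
egen byte first = tag(idcode)
count if first & n_obs == 1

* Simple (pooled) linear probability model with standard errors
* clustered on the panel identifier
regress union age grade, vce(cluster idcode)

* For comparison: the conditional fixed-effects logit; its output notes
* how many groups were dropped for lack of within-group variation
xtlogit union age grade, fe
```

With only 12 groups, as in the thread, cluster-robust standard errors rest on few clusters, so the caution about asymptotics applies to them as well.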
Arren Zhou

Join Date: Aug 2022

Posts: 12
#10

13 Sep 2022, 09:52

Dear Klein,

Thank you for your help! You're not obligated to do so but you still answer all my questions. I really appreciate that. It is people like you who make the world a better place ))