This is not a question directly related to Stata, but perhaps there are experts who can nevertheless help:
I am involved in a discussion about the use of a linear probability model (LPM) instead of a logistic regression model for cross-sectional data. The variables are a binary outcome variable (0/1; 6.5% and 8.8% = 1 in the two groups), two groups (4,100 and 8,400 cases), and 10 predictors (3 binary, 2 three-category, 5 quasi-continuous). One model aims to test the interaction of 2 quasi-continuous variables and the interaction of 1 binary with 1 continuous variable. The research question: Do the effects of the predictors (model 1) or the interactions (model 2) differ between the two groups (identical models per group)?
I am advocating a binary logistic regression model, while my opponent is arguing for the use of an LPM because (a) we are studying rare events (6.5% and 8.8% in groups A and B, respectively), and (b) the interpretation of the interaction effects is more straightforward. The opponent came up with a paper by Timoneda (2021), who argues that the LPM outperforms logistic regression when estimating group fixed effects in panel data with a binary dependent variable. My counterargument: The paper discusses time-series cross-sectional data with many groups (likely to result in the "incidental parameter problem"), whereas our data are cross-sectional, with only two groups and a relatively small number of predictors. Precisely because the outcome is binary (and comparatively rare), we should use logistic regression. The problem of interpreting interaction effects can be handled using average marginal effects (AMEs) if a single coefficient is required, or better, using plots of predicted values.
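To make the two positions concrete, here is a minimal sketch of how the competing specifications might be set up in Stata. All variable names (y, group, b1, x1, x2, x3) and the values in at() are hypothetical placeholders, and the remaining predictors are left out for brevity; this is only meant to illustrate the approach, not the actual analysis.

```stata
* Hypothetical names: y (0/1 outcome), group (0/1 indicator),
* b1 (binary), x1 x2 x3 (quasi-continuous). Other predictors omitted.

* Logistic model, fully interacted with group, so that every
* coefficient is allowed to differ between the two groups
logit y i.group##(c.x1##c.x2 i.b1##c.x3), vce(robust)

* Average marginal effect (AME) of x1 within each group
margins group, dydx(x1)

* Between-group difference in the AME of x1
margins r.group, dydx(x1)

* Predicted probabilities over a grid of x1 and x2 (placeholder values),
* by group, followed by a plot
margins group, at(x1=(0(1)4) x2=(0 2 4))
marginsplot

* Linear probability model with the same specification, for comparison
regress y i.group##(c.x1##c.x2 i.b1##c.x3), vce(robust)
margins group, dydx(x1)
```

Comparing the AMEs from both models would show whether the choice between logit and LPM changes the substantive conclusions in our data.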
However, I am far from an expert in time-series analysis (even this is an overstatement). Hence my question: Is there anyone on the Stata Forum who can help shed light on the issue?
Reference: Timoneda, J. C. (2021). Estimating group fixed effects in panel data with a binary dependent variable: How the LPM outperforms logistic regression in rare events data. Social Science Research, 93, 102486. https://doi.org/10.1016/j.ssresearch.2020.102486