Probit / Logit Regression with an unbalanced Panel (CRE?)

Anela Kien

Join Date: Oct 2023

Posts: 24
#1

21 Apr 2026, 14:51

Hello Statalist users,

I would like to estimate a model with a binary dependent variable using panel data. For linear panel models, I would usually compare specifications (for example with a Hausman test) to decide between fixed and random effects. What is the appropriate approach when estimating a nonlinear panel model such as logit or probit with unbalanced data?

I have read about the correlated random effects (CRE) approach proposed by Jeffrey Wooldridge, using Mundlak terms, which seems to combine features of fixed and random effects models.

Is this generally the preferred approach to control for unobserved heterogeneity in nonlinear panel settings? And how should standard errors be handled—should they be clustered at the panel level?

How would this be implemented correctly in Stata? For example:
tsset id year
xtprobit y x1 x2 mean_x1 mean_x2 i.year, re
or:
xtprobit y x1 x2 mean_x1 mean_x2 i.year, re vce(cluster id)
(Maybe I should also include means of the time variable?)

With the second specification, I only obtain coefficients, but no standard errors or p-values.

In addition, estimation of the full model takes a very long time. Even after several hours, a single regression has still not converged.

I also came across xtprobitunbal by Albarrán et al. for unbalanced panels:
xtprobitunbal y x1 x2, meansvar(x1 x2)

However, I repeatedly receive warnings such as:
Warning: subpanel 2 cannot be used in estimation

Does anyone have guidance on the most appropriate estimator in this setting, especially for unbalanced panels with many observations?

I ran the Mundlak specification test and had to reject the null hypothesis. So plain random effects is not the right model, and I should use CRE or FE instead, right?

Many thanks in advance for any advice or suggestions. I would greatly appreciate your guidance.

Best regards,
Anela

Last edited by Anela Kien; 21 Apr 2026, 15:13.
Tags: fixed effects, logit, panel, panel data, regression

Manh Hoang Ba

Join Date: Aug 2023

Posts: 87
#2

21 Apr 2026, 21:46

With unbalanced panel data, when the model contains time dummies (yr2-yrT), manual CRE estimation requires time-averaging for these dummies (mean_yr2-mean_yrT) as well.

Code:

tab year, gen(yr)
qui foreach var of varlist yr* {
    egen double mean_`var' = mean(`var'), by(id)
}
xtprobit y x1 x2 yr* mean_x1 mean_x2 mean_yr*, re vce(cluster id)
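
The command above takes mean_x1 and mean_x2 as given. A minimal sketch of one way to build all the Mundlak terms together follows. It assumes the data from #1 are in memory (y, x1, x2, id, year) and, as a further assumption not stated in the thread, that the panel means should be computed only over rows with complete data on the model variables. The names insample, td* and m_* are hypothetical and chosen so they do not clash with variables that may already exist.

```stata
* Sketch only; assumes y, x1, x2, id and year are in memory (names from the thread).
* insample, td* and m_* are hypothetical names introduced for this illustration.

* Flag rows with complete data on every model variable
generate byte insample = !missing(y, x1, x2, year)

* Year dummies built from the complete-case rows; drop the first as the base year
quietly tabulate year if insample, generate(td)
drop td1

* Panel-level means of the covariates and year dummies over complete-case rows
foreach var of varlist x1 x2 td* {
    egen double m_`var' = mean(`var') if insample, by(id)
}

* Random-effects probit with the Mundlak terms; any redundant mean terms
* are omitted by Stata as collinear
xtset id year
xtprobit y x1 x2 td* m_x1 m_x2 m_td* if insample, re vce(cluster id)
```

Restricting the means to insample keeps the averaged rows and the estimation rows identical, which matters when missing values differ across variables in an unbalanced panel.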

Last edited by Manh Hoang Ba; 21 Apr 2026, 21:47. Reason: Edited: "mean_yr2*" --> "mean_yr*"

FernandoRios

Join Date: Apr 2014

Posts: 2534
#3

22 Apr 2026, 08:23

If xtprobit still gives you problems, you may also want to consider a pooled probit with cluster-robust standard errors:

Code:

probit y x1 x2 yr* mean_x1 mean_x2 mean_yr* , vce(cluster id)

It will be faster and relies on fewer distributional assumptions.
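
As a possible follow-up, here is a short sketch of two post-estimation steps after the pooled probit. It assumes the yr* and mean_* variables from #2 already exist. The joint Wald test of the mean terms is one way to carry out the Mundlak-type check mentioned in #1, and margins reports average partial effects, which are usually easier to interpret than the pooled probit coefficients.

```stata
* Sketch only; assumes y, x1, x2, id, yr* and mean_* exist as in the posts above.
probit y x1 x2 yr* mean_x1 mean_x2 mean_yr*, vce(cluster id)

* Joint Wald test that the Mundlak terms for x1 and x2 are zero
test mean_x1 mean_x2

* Average partial effects of the time-varying covariates
margins, dydx(x1 x2)
```

Rejecting the joint null suggests the covariates are correlated with the unobserved heterogeneity, so the mean terms should stay in the model.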