Hello Statalist users,
I would like to estimate a model with a binary dependent variable using panel data. For linear panel models, I would usually compare specifications (for example with a Hausman test) to decide between fixed and random effects. What is the appropriate approach when estimating a nonlinear panel model such as logit or probit with unbalanced data?
I have read about the correlated random effects (CRE) approach proposed by Jeffrey Wooldridge, using Mundlak terms, which seems to combine features of fixed and random effects models.
Is this generally the preferred approach to control for unobserved heterogeneity in nonlinear panel settings? And how should standard errors be handled—should they be clustered at the panel level?
How would this be implemented correctly in Stata? For example:
tsset id year
xtprobit y x1 x2 mean_x1 mean_x2 i.year, re
or:
xtprobit y x1 x2 mean_x1 mean_x2 i.year, re vce(cluster id)
(Should the means of the time variable perhaps be included as well?)
With the second specification, I only obtain coefficients, but no standard errors or p-values.
In addition, estimation of the full model takes a very long time. Even after several hours, a single regression has still not converged.
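For reference, here is a minimal sketch of how panel means such as mean_x1 and mean_x2 might be built for an unbalanced panel. It assumes the means should be taken only over the observations that actually enter the estimation, and that the mean variables do not exist yet. The marker variable touse is a hypothetical name introduced for illustration, not part of the commands above:

```stata
* Sketch only: assumes variables id, year, y, x1, x2 exist in memory
xtset id year

* touse (hypothetical name) flags rows with no missing values in the model variables
generate byte touse = !missing(y, x1, x2)

* Panel means (Mundlak terms) computed over the flagged rows only
bysort id: egen mean_x1 = mean(cond(touse, x1, .))
bysort id: egen mean_x2 = mean(cond(touse, x2, .))

* CRE probit restricted to the same rows
xtprobit y x1 x2 mean_x1 mean_x2 i.year if touse, re
```

Restricting both the means and the estimation to the same rows keeps the Mundlak terms consistent with the sample that is actually used, which seems to matter when panels have gaps.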
I also came across xtprobitunbal by Albarrán et al. for unbalanced panels:
xtprobitunbal y x1 x2, meansvar(x1 x2)
However, I repeatedly receive warnings such as:
Warning: subpanel 2 cannot be used in estimation
Does anyone have guidance on the most appropriate estimator in this setting, especially for unbalanced panels with many observations?
I ran the Mundlak specification test and have to reject the null hypothesis. So a plain random effects model is not appropriate, and I should use CRE or FE instead, right?
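One common way to carry out such a test is a joint Wald test on the Mundlak terms after the CRE probit. The lines below are only a sketch of that idea, assuming mean_x1 and mean_x2 are the panel means of x1 and x2 as in the commands above:

```stata
* Sketch only: assumes mean_x1 and mean_x2 are the panel means of x1 and x2
xtprobit y x1 x2 mean_x1 mean_x2 i.year, re

* Joint Wald test that the coefficients on the Mundlak terms are zero
test mean_x1 mean_x2
```

Rejecting the null here suggests that the unobserved heterogeneity is correlated with the regressors, which is the case the plain random effects model rules out.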
Many thanks in advance for any advice or suggestions. I would greatly appreciate your guidance.
Best regards,
Anela