  • Probit / Logit Regression with an unbalanced Panel (CRE?)

    Hello Statalist users,

    I would like to estimate a model with a binary dependent variable using panel data. For linear panel models, I would usually compare specifications (for example with a Hausman test) to decide between fixed and random effects. What is the appropriate approach when estimating a nonlinear panel model such as logit or probit with unbalanced data?

    I have read about the correlated random effects (CRE) approach proposed by Jeffrey Wooldridge, using Mundlak terms, which seems to combine features of fixed and random effects models.

    Is this generally the preferred approach to control for unobserved heterogeneity in nonlinear panel settings? And how should standard errors be handled—should they be clustered at the panel level?


    How would this be implemented correctly in Stata? For example:
    tsset id year
    xtprobit y x1 x2 mean_x1 mean_x2 i.year, re
    or, with standard errors clustered at the panel level:
    xtprobit y x1 x2 mean_x1 mean_x2 i.year, re vce(cluster id)
    Should I also include the panel-level means of the time dummies?

    With the second specification, I only obtain coefficients, but no standard errors or p-values.

    In addition, estimation of the full model takes a very long time. Even after several hours, a single regression has still not converged.



    I also came across xtprobitunbal by Albarrán et al. for unbalanced panels:
    xtprobitunbal y x1 x2, meansvar(x1 x2)

    However, I repeatedly receive warnings such as:
    Warning: subpanel 2 cannot be used in estimation

    Does anyone have guidance on the most appropriate estimator in this setting, especially for unbalanced panels with many observations?

    I ran the Mundlak specification test and had to reject the null hypothesis. So plain random effects is not the right model, and I should use CRE or FE instead, right?
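
    For reference, a manual version of such a test is usually a joint Wald test on the Mundlak terms after the CRE fit. A minimal sketch, assuming mean_x1 and mean_x2 are the within-id time averages of x1 and x2 and that no other variable names start with mean_:

    ```stata
    * CRE probit with Mundlak terms (variable names as in this post)
    xtset id year
    xtprobit y x1 x2 mean_x1 mean_x2 i.year, re
    * H0: the coefficients on all Mundlak terms are jointly zero
    testparm mean_*
    ```

    A rejection is evidence against plain random effects and in favour of keeping the Mundlak terms in the model.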


    Many thanks in advance for any advice or suggestions. I would greatly appreciate your guidance.

    Best regards,
    Anela
    Last edited by Anela Kien; 21 Apr 2026, 15:13.

  • #2
    With unbalanced panel data, when the model contains time dummies (yr2-yrT), manual CRE estimation requires time-averaging for these dummies (mean_yr2-mean_yrT) as well.
    Code:
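    * one dummy per year: yr1, yr2, ..., yrT
    tab year, gen(yr)
    * time average of each year dummy within each id
    qui foreach var of varlist yr* {
        egen double mean_`var' = mean(`var') , by(id)
    }
    * mean_x1 and mean_x2 must already exist (time averages of x1 and x2 by id)
    xtprobit y x1 x2 yr* mean_x1 mean_x2 mean_yr* , re vce(cluster id)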
    Last edited by Manh Hoang Ba; 21 Apr 2026, 21:47. Reason: Edited: "mean_yr2*" --> "mean_yr*"
    Manh Hoang-Ba,
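
    In an unbalanced panel it is generally recommended to take the time averages over the same rows that enter the estimation. A minimal sketch of that idea, assuming the data contain y, x1, x2, id and year as in this thread, that none of the generated variables exist yet, and using touse as a hypothetical marker name:

    ```stata
    * Sketch only: variable names follow the thread, touse is hypothetical
    xtset id year
    tabulate year, generate(yr)

    * flag the rows with complete data on the model variables
    generate byte touse = !missing(y, x1, x2, id, year)

    * within-id time averages, computed on the flagged rows only
    foreach var of varlist x1 x2 yr* {
        egen double mean_`var' = mean(`var') if touse, by(id)
    }

    * CRE probit on the same rows that were used for the averages
    xtprobit y x1 x2 yr* mean_x1 mean_x2 mean_yr* if touse, re vce(cluster id)
    ```

    The if touse restriction keeps the averages and the estimation sample aligned, so rows with missing values do not silently change the Mundlak terms.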


    • #3
      If xtprobit still gives you problems, you may also want to consider a pooled probit with the same Mundlak terms:
      Code:
      probit y x1 x2 yr* mean_x1 mean_x2 mean_yr* , vce(cluster id)
      It will be faster and relies on fewer distributional assumptions.
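
      The pooled probit and xtprobit scale their coefficients differently, so raw coefficients from the two are not directly comparable, while average partial effects are. A minimal sketch, assuming the variables from post #2 already exist:

      ```stata
      * pooled CRE probit with panel-clustered standard errors
      probit y x1 x2 yr* mean_x1 mean_x2 mean_yr*, vce(cluster id)
      * average partial effects of x1 and x2 over the estimation sample
      margins, dydx(x1 x2)
      ```

      The margins output, rather than the coefficient table, is then the natural place to read off effect sizes.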
