  • Pooled OLS vs. Tobit vs. Poisson FE for panel data with many zeros and time-stable covariates

    Hi everyone,

    I am a master's student writing my thesis on variation in Danish municipalities' utilization of solar panel potential on agricultural land (2018–2025). I have read the 2019 thread on OLS vs. Tobit for solar energy installation data and found it very helpful, but my setup differs in some important ways and I would appreciate further input.

    My panel covers 73 municipalities over 8 years (584 observations). The dependent variable is a utilization rate (installed MW / potential based on 1% of agricultural area). Of 584 observations, 313 are zero, and 25 municipalities have zero utilization across all 8 years. The variable is left-censored at zero by construction — utilization cannot be negative.

    I have tested several estimators:

    - Pooled OLS with clustered standard errors: several significant results across covariates
    - Pooled Tobit (tobit, ll(0)) with clustered standard errors: similar results, consistent direction and magnitude
    - Poisson FE (xtpoisson, fe): drops 25 municipalities with all-zero outcomes
    - PPML with absorbed fixed effects (ppmlhdfe): drops 194 observations, leaving only 47 of 73 municipalities. Results almost entirely insignificant.

    An additional complication is that many of my key independent variables are time-stable or nearly so: local political party affiliation, DK2020 climate plan membership, socioeconomic index. My rho is around 0.70–0.76 (depending on specification), meaning most of the variation is between municipalities rather than within them. Fixed effects, whether linear or Poisson, absorb most of what I am trying to explain. The F-test confirms significant municipal heterogeneity (F = 3.51, p < 0.0001), and the Hausman test favors FE over RE (chi2 = 44.42, p = 0.0008), but FE is substantively problematic given the time-stable covariates.

    My research question concerns what explains variation in the utilization rate across municipalities — so between-variation is central to the analysis. Losing a third of the sample in Poisson FE removes precisely the variation I need.
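
    The sample figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are hypothetical, and it assumes the panel is balanced (which 73 × 8 = 584 suggests).

```python
# Counts taken from the post; variable names are illustrative only.
n_muni, n_years = 73, 8
n_obs = n_muni * n_years                         # balanced panel assumed

zero_obs = 313                                   # observations with zero utilization
always_zero_muni = 25                            # municipalities that are zero in every year
always_zero_obs = always_zero_muni * n_years     # observations Poisson FE cannot use
other_zero_obs = zero_obs - always_zero_obs      # zeros in municipalities with some positive year

share_zero = zero_obs / n_obs
share_dropped = always_zero_muni / n_muni

print(f"observations: {n_obs}")
print(f"share of zeros: {share_zero:.3f}")
print(f"observations in always-zero municipalities: {always_zero_obs}")
print(f"remaining zeros: {other_zero_obs}")
print(f"share of municipalities dropped by Poisson FE: {share_dropped:.3f}")
```

    The last share is what "a third of the sample" refers to.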

    I am not attached to any particular model. I simply want to use whatever is methodologically defensible. My questions are:

    1. Is pooled OLS with clustered standard errors defensible as the primary estimator here, given the high share of zeros and the time-stable covariates?
    2. Is pooled Tobit a meaningful robustness check, or is the left-censoring argument too weak given that zero is a genuine outcome rather than a censored value?
    3. Is there an alternative estimator I may have overlooked?

    Thank you in advance for any input.

    Best regards,
    Elias
    Master's student, Politics and Administration
    Aalborg University, Denmark

  • #2
    Dear Elias von Froelich,

    First things first, your data are not censored because, as you say, utilisation cannot be negative and therefore the zeros do not result from censoring. Data of this kind are called corner-solution data.

    For this type of data, I would say that a linear model is not a sensible choice because it does not take into account the lower bound at zero. In these cases, my preferred approach is Poisson regression.

    The other relevant feature of your data is that the variables vary little over time. In this case, as you found, using fixed effects mops up most of the variation and you end up explaining very little. So, I would consider models without fixed effects. Another thing you can consider is to use a binary model for whether the outcome is positive or not. You can also use a two-part model, but your sample is not very large, so you will not have a lot of data for the second part.
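
    In symbols, the two specifications mentioned above can be sketched as follows. The notation is an assumption introduced here for illustration: y_it is the utilisation rate of municipality i in year t and x_it collects the covariates.

```latex
% Notation assumed for illustration: y_{it} = utilisation rate, x_{it} = covariates.
\begin{align}
  \text{Pooled Poisson:}\quad
    & E[y_{it} \mid x_{it}] = \exp(x_{it}'\beta) > 0 \\
  \text{Two-part model:}\quad
    & E[y_{it} \mid x_{it}] = \Pr(y_{it} > 0 \mid x_{it}) \cdot
      E[y_{it} \mid y_{it} > 0,\, x_{it}]
\end{align}
```

    The exponential mean keeps fitted values positive, which is how the lower bound at zero is respected. Estimated by pseudo-maximum likelihood, the Poisson model only needs this conditional mean to be correctly specified, so the outcome does not have to be a count.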

    Best wishes,

    Joao

    • #3
      Dear Professor Silva,

      Thank you very much for your response — it was extremely helpful and has given me a much clearer sense of direction.

      Based on your advice, I will use pooled Poisson regression with clustered standard errors as my primary estimator, and include pooled OLS and linear fixed effects as robustness checks. I will also consider a binary model for whether utilization is positive or not as a supplementary analysis.
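
      To make the planned primary estimator concrete, here is a minimal sketch of pooled Poisson pseudo-ML with cluster-robust (sandwich) standard errors. It is not the Stata workflow used in this thread: it is plain Python on a tiny made-up panel, with one time-stable dummy as the only regressor, and the function and variable names (pooled_poisson, rate, plan, muni) are hypothetical. No finite-sample correction is applied to the variance, so software that applies one will report somewhat different standard errors.

```python
import math

def pooled_poisson(y, x, cluster, max_iter=50, tol=1e-10):
    """Poisson pseudo-ML for E[y|x] = exp(b0 + b1*x), cluster-robust SEs.

    Illustrative helper only: a single regressor, Newton-Raphson steps,
    and no finite-sample correction to the sandwich variance.
    """
    def pieces(b0, b1):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))               # score wrt b0
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))  # score wrt b1
        h00 = sum(mu)                                            # information matrix
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        return mu, g0, g1, h00, h01, h11

    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from the intercept-only fit
    for _ in range(max_iter):
        _, g0, g1, h00, h01, h11 = pieces(b0, b1)
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det      # Newton step: solve H d = g
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break

    # Sandwich variance: H^{-1} (sum over clusters of s_c s_c') H^{-1}
    mu, _, _, h00, h01, h11 = pieces(b0, b1)
    det = h00 * h11 - h01 * h01
    a00, a01, a11 = h11 / det, -h01 / det, h00 / det
    scores = {}
    for yi, mi, xi, c in zip(y, mu, x, cluster):
        s = scores.setdefault(c, [0.0, 0.0])  # scores summed within each cluster
        s[0] += yi - mi
        s[1] += (yi - mi) * xi
    m00 = sum(s[0] * s[0] for s in scores.values())
    m01 = sum(s[0] * s[1] for s in scores.values())
    m11 = sum(s[1] * s[1] for s in scores.values())
    v00 = a00 * a00 * m00 + 2 * a00 * a01 * m01 + a01 * a01 * m11
    v11 = a01 * a01 * m00 + 2 * a01 * a11 * m01 + a11 * a11 * m11
    return b0, b1, math.sqrt(v00), math.sqrt(v11)

# Made-up panel: 6 municipalities x 3 years, many zeros, time-stable dummy.
muni = ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3 + ["E"] * 3 + ["F"] * 3
plan = [0] * 9 + [1] * 9                      # e.g. climate plan membership
rate = [0, 0, 0,  0, 0.1, 0.2,  0, 0, 0.3,
        0, 0.2, 0.4,  0.1, 0.3, 0.5,  0, 0, 0.3]

b0, b1, se_b0, se_b1 = pooled_poisson(rate, plan, muni)
print(f"b0 = {b0:.4f} (se {se_b0:.4f}), b1 = {b1:.4f} (se {se_b1:.4f})")
```

      Because the regressor is a dummy, exp(b1) is simply the ratio of mean utilization between the two groups, which gives an easy sanity check on the estimate. Note also that the time-stable dummy is estimable here precisely because no municipality fixed effects are included.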

      I really appreciate you taking the time to respond.

      Best regards,
      Elias
      Master's student, Politics and Administration
      Aalborg University, Denmark
