Stationarity test for an unbalanced panel with gaps and a dummy dependent variable

Ariyo Irhamna

Join Date: Jul 2017

Posts: 9
#1

18 Jun 2025, 12:32

I have a question about how to estimate a panel ARDL model with a logit link, in particular about the stationarity test.

My dataset has 4163 unique household IDs (hhid) observed in 2007, 2008, 2010, 2011, 2013, 2016, and 2017. The dependent variable is a dummy, and the independent variable is continuous. I have dropped households that appear in fewer than 3 years. Below is the Stata output showing the errors.

xtunitroot fisher dummy_y, dfuller lags(1)
performing unit-root test on first panel using the syntax
dfuller dummy_y, lags(1)
returned error code 2000
r(2000);

xtunitroot fisher continous_X1, dfuller lags(1)
performing unit-root test on first panel using the syntax
dfuller continous_X1, lags(1)
returned error code 2000
r(2000);

xtunitroot ips continous_X1, lags(1)
Im–Pesaran–Shin test cannot have gaps in data
r(498);

. xtsum continous_X1 dummy_y lag_dummy_y lag_continous_X1

Variable                 |      Mean  Std. dev.         Min        Max |    Observations
-------------------------+---------------------------------------------+----------------
continous_X1     overall |  .1560596   .4047683   -.8895833   1.108167 |     N =   21494
                 between |             .1288414   -.2840556   .6132778 |     n =    3766
                 within  |             .3854087   -.6924237   .9570041 | T-bar = 5.70738
                         |                                             |
dummy_y          overall |  .0715549   .2577554           0          1 |     N =   21494
                 between |             .1497109           0          1 |     n =    3766
                 within  |             .2135162   -.7617785   .9286977 | T-bar = 5.70738
                         |                                             |
lag_dummy_y      overall |  .0724278   .2592022           0          1 |     N =   17728
                 between |             .1596944           0          1 |     n =    3766
                 within  |             .2098082   -.7275722   .9057611 | T-bar = 4.70738
                         |                                             |
lag_continous_X1 overall |  .0920504   .3714877   -.8895833   .9658333 |     N =   17728
                 between |             .1331359   -.4109583   .5668333 |     n =    3766
                 within  |             .3507517   -.6689705   .8854671 | T-bar = 4.70738

I hope this is clear. Thanks.

Thanks in Advance,

Ariyo DP Irhamna
(StataNow/MP 18.5)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2156
#2

18 Jun 2025, 18:58

Why are you testing that? It’s not needed for anything you want to do. You have a large N and very small T. Include a full set of time dummies and think about using correlated random effects.
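As a rough illustration of this suggestion, a correlated random effects (Mundlak-type) logit with year dummies might look like the sketch below. It assumes the data from post #1 are in memory, that hhid is numeric, and that the time variable is called year; xbar_X1 is a made-up name for the household time average and is not in the original data.

```stata
* Assumes hhid (numeric household id), year, dummy_y (0/1) and
* continous_X1 (SPEI) are in memory; xbar_X1 is an illustrative name.
xtset hhid year

* Mundlak device: household-level time average of the time-varying
* regressor, taken over the years in which the household is observed.
bysort hhid: egen double xbar_X1 = mean(continous_X1)

* Random-effects logit with a full set of year dummies and the time average.
xtlogit dummy_y continous_X1 xbar_X1 i.year, re vce(cluster hhid)

* Average partial effect of SPEI, using the default prediction for xtlogit, re.
margins, dydx(continous_X1)
```

Because the panel is unbalanced, the time averages are computed over a different number of years for each household, so this should be read as a starting point rather than a complete treatment of the unbalancedness.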
Ariyo Irhamna

Join Date: Jul 2017

Posts: 9
#3

19 Jun 2025, 06:31

Hi Jeff,

Thank you for your response.

I implemented a panel ARDL logit model because the lags of both my dependent variable and my key independent variable influence the current period's dependent variable. However, I am having difficulty performing a stationarity test, which I believe is related to the nature of my dataset, as it is an unbalanced panel that includes lags. I'm curious as to why correlated random effects might be considered more reliable than the panel ARDL logit approach. My dependent variable is a dummy variable indicating whether a farmer adopts private irrigation (1 for adoption, 0 for non-adoption), while my independent variable is the standardized precipitation evapotranspiration index (SPEI).

By the way, I follow your X account. Thanks for always being humble.

Thanks in Advance,

Ariyo DP Irhamna
(StataNow/MP 18.5)
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2156
#4

20 Jun 2025, 12:58

Estimating a dynamic model (ARDL) is fine with a binary response. But you don't need to worry about stationarity unless T is large and N is small. In your setting, anything is allowed. One potential drawback is that you aren't allowing an unobserved effect, making it hard to distinguish between heterogeneity and dynamics. I can recommend the approach in my 2005 Journal of Applied Econometrics paper that models the relationship between the unobserved heterogeneity and the initial condition.
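For readers who want the structure spelled out, the kind of specification referred to here is commonly written along the following lines. This is a sketch, not a quotation from the paper: the logit link and the use of time averages in place of the full history of the regressors are assumptions made here, and the symbols are illustrative.

```latex
\begin{align}
P(y_{it} = 1 \mid y_{i,t-1}, \dots, y_{i0}, z_i, c_i)
  &= \Lambda\left( z_{it}\gamma + \rho\, y_{i,t-1} + c_i \right),
  \qquad t = 1, \dots, T, \\
c_i &= \alpha_0 + \alpha_1 y_{i0} + \bar{z}_i \alpha_2 + a_i,
  \qquad a_i \mid (y_{i0}, z_i) \sim \mathrm{Normal}(0, \sigma_a^2).
\end{align}
```

Here \rho captures state dependence, c_i is the unobserved heterogeneity, y_{i0} is the initial condition, and \bar{z}_i is the time average of the exogenous regressors. Substituting the second line into the first gives a random-effects logit in which y_{i0} and \bar{z}_i appear as additional regressors.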
Ariyo Irhamna

Join Date: Jul 2017

Posts: 9
#5

20 Jun 2025, 15:10

Thank you for this important clarification, Jeff. You're absolutely correct about the identification problem—I need to distinguish between true state dependence and unobserved heterogeneity. Would you recommend applying your 2005 JAE approach with Mundlak's correlated random effects? This would involve modeling unobserved heterogeneity as correlated with the time averages of the time-varying regressors.
Regarding the issue of initial conditions, my panel starts in 2007, but farmers may have been irrigating for years prior to my observation. When I use the 2007 irrigation status as the lagged dependent variable for 2008, it could potentially correlate with unobserved heterogeneity since early adopters likely possess different unobserved characteristics. In your 2005 approach, should I model the 2007 irrigation decision based on pre-2007 observable factors (such as average historical weather), or would it be more appropriate to make auxiliary assumptions about the irrigation adoption process?

Lastly, to clarify, are you suggesting that I implement a correlated random effects panel ARDL logit in which I:
Include time averages of weather variables as additional regressors,

Model the relationship between unobserved heterogeneity and initial irrigation status,

Use xtlogit with the CRE specification, or would you recommend a different Stata implementation?
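One possible reading of the three steps above is sketched below. It assumes the variables from post #1 are in memory and that hhid is numeric; y_lag, x_lag, y_init and X1_mean are illustrative names. The lag is taken from the previous observed wave, since the survey years are unevenly spaced, and the "initial" period is each household's first observed wave, which is an assumption forced by the unbalanced panel rather than something taken from the 2005 paper.

```stata
* Assumes hhid (numeric), year, dummy_y and continous_X1 are in memory.
* y_lag, x_lag, y_init and X1_mean are hypothetical names.

* Lags from the household's previous observed wave (not previous calendar
* year, because the survey years 2007, 2008, 2010, ... are unevenly spaced).
bysort hhid (year): generate byte   y_lag = dummy_y[_n-1]
bysort hhid (year): generate double x_lag = continous_X1[_n-1]

* Initial condition: adoption status in the household's first observed wave.
bysort hhid (year): generate byte y_init = dummy_y[1]

* Time average of SPEI over the waves in which the household is observed.
bysort hhid: egen double X1_mean = mean(continous_X1)

* Dynamic random-effects logit: lagged outcome, current and lagged SPEI,
* initial condition, time average, and year dummies. The first wave of each
* household drops out automatically because y_lag is missing there.
xtset hhid year
xtlogit dummy_y y_lag continous_X1 x_lag y_init X1_mean i.year, re
```

The coefficient on y_lag is then interpreted as state dependence conditional on the heterogeneity that y_init and X1_mean are meant to absorb. Whether xtlogit or xtprobit is the better fit for this design is left open here.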

I appreciate your input, Jeff Wooldridge.

Thanks in Advance,

Ariyo DP Irhamna
(StataNow/MP 18.5)