unit-root test: unbalanced panel data problem with xtunitroot var, dfuller lags(#)

Moritz Weis

Join Date: Jun 2020

Posts: 14
#1

03 Jun 2020, 12:48

Hi,
I want to test for a unit root in a strongly unbalanced panel dataset. I am using Stata 16 for Mac. My dataset has around 650 variables and around 420,000 observations.

When entering the command:

. xtunitroot fisher grossmargin, dfuller lags(10)

I get the message:

(119,908 missing values generated)
could not compute test for panel 2
could not compute test for panel 3
could not compute test for panel 4
could not compute test for panel 6

The list continues for a few hundred rows. I cannot drop the panels that are too short for 10 lags, because I want to avoid "survivor's bias". I would also rather not start with fewer lags in the dfuller regressions. Do you know a better way to test for a unit root that does not require a balanced dataset?

Thank you!
Tags: panel data, unit-root
Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#2

03 Jun 2020, 13:42

Apologies, I know next to nothing about unit root tests. However, it might be helpful to know why 119,908 missing values are generated when the xtunitroot command is run. Are there 420,000 total observations, or 420,000 observations per panel?

Regarding "survival bias": you could gauge how much of an issue it is by dividing the data into two sets, one containing the panels you might exclude and one containing the rest, and then examining the two sets for meaningful differences, e.g. in summary statistics, bivariate analyses, or model coefficient estimates. The extent of those differences may approximate the extent to which you should worry about survival bias.
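A minimal sketch of that comparison, under stated assumptions: it uses the Grunfeld example data rather than the poster's dataset, shortens company 10 artificially, and the variable names nobs and shortpanel as well as the cutoff of 20 observations are hypothetical choices for illustration only.

```stata
webuse grunfeld, clear
set seed 123456

* make company 10 artificially short (illustration only)
drop if company == 10 & runiform() < .5

* non-missing observations of the test variable within each panel
bysort company: egen nobs = count(mvalue)

* flag panels that would be excluded; the cutoff of 20 is an assumption
generate byte shortpanel = nobs < 20

* compare the two groups on summary statistics and on the mean
tabstat mvalue invest kstock, by(shortpanel) statistics(n mean sd p50)
ttest mvalue, by(shortpanel) unequal
```

With real data the same flag could be used as a grouping variable or interaction term in whatever model is of interest, to see whether the coefficients differ between the two groups.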
Scott Merryman

Join Date: Mar 2014

Posts: 895
#3

03 Jun 2020, 14:15

Since the Fisher-type test combines the p-values from individual unit-root tests, you could do this by hand and pass a separate lag length to each panel. You could even start with 10 lags and decrement until there is no error (return code is 0) for each individual unit-root test.

"it might be helpful to know why 119,908 missing values are generated when the xtunitroot command is run."

I believe this is because it creates a copy of the variable, which is used if the -demean- option is selected.
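To make the "by hand" idea concrete, here is a small sketch of the Fisher inverse chi-squared combination: P = -2 times the sum of the log p-values, compared against a chi-squared distribution with 2N degrees of freedom for N panels. The three p-values are made up for illustration and do not come from any dataset; in practice each would be the r(p) returned by a separate -dfuller- run.

```stata
* hypothetical p-values from three separate dfuller runs (made up)
local pvals 0.20 0.65 0.03

local sumlog = 0
local n = 0
foreach pv of local pvals {
    * accumulate the log p-values and count the panels
    local sumlog = `sumlog' + ln(`pv')
    local ++n
}

* Fisher statistic and its chi-squared(2n) p-value
local P = -2*`sumlog'
display "chi2(" 2*`n' ") = " %9.4f `P' "  p-value = " %9.4f chi2tail(2*`n', `P')
```

Because each panel enters only through its own p-value, nothing in the formula requires the individual tests to share a lag length.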
Moritz Weis

Join Date: Jun 2020

Posts: 14
#4

03 Jun 2020, 14:49

Wow thank you for the prompt replies!

"420,000 total observations, or 420,000 observations per panel?"

- total observations
Thank you for the tip, I will come back with the results on dividing the datasets tomorrow.

"do this by hand and pass a unique lag length to each panel."

- That sounds good. But how would one implement this?

"start with 10 lags and increment down until there is no error"

- Many of the panels exist for only one period, so even with 1 lag I get similar results.

Concerning the missing values, I believe Stata creates them after the xtset command is used; I think it just fills in the missing periods.

Edit: I did not use the demean option.
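One way to see how many panels are too short for a given lag length is to count observations per panel. The sketch below uses the Grunfeld example data, not the dataset from this thread, and the names nobs, tag, and maxfeasible are hypothetical. The feasibility rule is a rough assumption rather than documented -dfuller- behaviour: an ADF regression with k lags and a constant estimates k+2 parameters from nobs-k-1 usable observations, so it needs roughly nobs >= 2k+4, and gaps inside a panel reduce the usable observations further.

```stata
webuse grunfeld, clear
set seed 123456

* create an unbalanced panel (illustration only)
drop if company == 10 & runiform() < .15

* overview of the participation pattern
xtdescribe

* non-missing observations per panel, and one tagged row per panel
bysort company: egen nobs = count(mvalue)
egen byte tag = tag(company)

* largest lag length the rule of thumb allows (negative: no test possible)
generate maxfeasible = floor((nobs - 4)/2)
tabulate maxfeasible if tag

* number of panels too short for 10 lags under the same rule
count if tag & nobs < 2*10 + 4
```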
Scott Merryman

Join Date: Mar 2014

Posts: 895
#5

04 Jun 2020, 04:52

1. Panels with 1 observation - that does make it difficult to run a regression on 10 lagged differences.

Below is an example implementation.

In the example, the P statistic reported by -xtunitroot- (5.0236) differs from the one produced by the example implementation (10.7616) because -xtunitroot- drops panel 10, whereas the example implementation keeps panel 10 and tests it with only 2 lags.

Code:

. display "Variable lag length Fisher unit root test"
Variable lag length Fisher unit root test

. display "chi2(`=2*`ng'')  =" %9.4f `P' " pval = " %9.4f chi2tail(2*`ng', `P')
chi2(20)  =  10.7616 pval =    0.9522

. mat list lags

lags[10,1]
     c1
 r1   8
 r2   8
 r3   8
 r4   8
 r5   8
 r6   8
 r7   8
 r8   8
 r9   8
r10   2

. xtunitroot fisher mvalue, lag(8) dfuller
could not compute test for panel 10

Fisher-type unit-root test for mvalue
Based on augmented Dickey-Fuller tests
--------------------------------------
Ho: All panels contain unit roots           Number of panels       =     10
Ha: At least one panel is stationary        Avg. number of periods =  19.60

AR parameter: Panel-specific                Asymptotics: T -> Infinity
Panel means:  Included
Time trend:   Not included
Drift term:   Not included                  ADF regressions: 8 lags
------------------------------------------------------------------------------
                                  Statistic      p-value
------------------------------------------------------------------------------
 Inverse chi-squared(18)   P         5.0236       0.9988
 Inverse normal            Z         4.8035       1.0000
 Inverse logit t(49)       L*        5.1916       1.0000
 Modified inv. chi-squared Pm       -2.1627       0.9847
------------------------------------------------------------------------------
 P statistic requires number of panels to be finite.
 Other statistics are suitable for finite or infinite number of panels.
------------------------------------------------------------------------------

. qui dfuller mvalue if company == 10, lag(2)

. disp -2*ln(r(p))
5.7379332
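As a quick arithmetic check that reuses only the numbers printed in the log above: adding panel 10's contribution, -2*ln(p) = 5.7379332, to the P statistic from -xtunitroot- (5.0236) gives about 10.7615, which agrees with the 10.7616 from the variable-lag version up to rounding of the displayed P.

```stata
* numbers copied from the log above
* P from xtunitroot (9 panels) plus panel 10's contribution
display %9.4f (5.0236 + 5.7379332)

* p-value of the combined statistic with 2*10 degrees of freedom
display %9.4f chi2tail(20, 10.7616)

* panel 10's dfuller p-value implied by -2*ln(p) = 5.7379332
display %9.4f exp(-5.7379332/2)
```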

Code:

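webuse grunfeld, clear
set seed 123456

// create an unbalanced panel by randomly dropping observations from company 10
drop if com == 10 & runiform() < .15

// maximum lag length to try for each panel
local maxlag = 10

// p accumulates the log p-values; ng counts the panels that yield a test
local p  = 0
local ng = 0

// lags stores the lag length actually used, one row per tested panel
levelsof com, local(levels)
mat lags = J(r(r), 1, .)
local row = 1

foreach l of local levels {
    local rc  = 1
    local lag = `maxlag'

    while `rc' != 0 {
        capture dfuller mvalue if com == `l', lags(`lag')
        if _rc != 0 {
            // test failed: retry with one lag fewer; give up on the panel at lag 0
            local lag = `lag' - 1
            local rc  = _rc
            if `lag' == 0 {
                local rc = 0
            }
        }
        else {
            // test ran: if the statistic is missing, keep reducing the lag length
            local zt = .
            qui dfuller mvalue if com == `l', lags(`lag')
            local zt = r(Zt)
            while `zt' == . {
                qui dfuller mvalue if com == `l', lags(`lag')
                local lag = `lag' - 1
                local l2  = r(lags)
                local zt  = r(Zt)
                if `l2' == 0 {
                    local zt = 0
                }
            }
            // add this panel's log p-value and record the lag length used
            local p = `p' + log(r(p))
            mat lags[`row', 1] = r(lags)
            local ++row
            local ng = `ng' + 1
            local rc = _rc
            local lag = `maxlag'
        }
    }
}

// Fisher inverse chi-squared statistic: P = -2 * sum of log p-values
local P = -2*`p'

display "Variable lag length Fisher unit root test"
display "chi2(`=2*`ng'')  =" %9.4f `P' " pval = " %9.4f chi2tail(2*`ng', `P')

//mat list lags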

Moritz Weis

Join Date: Jun 2020

Posts: 14
#6

07 Jun 2020, 06:57

"1. Panels with 1 observation - that does make it difficult to run a difference regression on 10 lagged differences."

Thank you for the example. However, after trying it, it appears that I would have to determine the maximum lag length for each panel and write out conditions for each panel individually, which is impossible given the size of the dataset.
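For what it is worth, the lag length does not have to be typed in per panel; it can be derived from each panel's length inside a loop. The following is only a sketch under assumptions, not a tested recipe for the dataset in this thread: it uses the Grunfeld example data, the starting lag comes from a rule of thumb (roughly nobs >= 2k+4 for k lags, my assumption), and panels for which no lag length works are skipped and reported, which does not by itself address the concern about excluding short panels.

```stata
webuse grunfeld, clear
set seed 123456
drop if company == 10 & runiform() < .15

local maxlag = 10
local sumlog = 0
local ng = 0

levelsof company, local(ids)
foreach i of local ids {
    * starting lag from the panel's own length (rule-of-thumb assumption)
    quietly count if company == `i' & !missing(mvalue)
    local k = min(`maxlag', floor((r(N) - 4)/2))
    local done = 0

    * step the lag down until dfuller returns a usable p-value
    while `k' >= 0 & !`done' {
        capture dfuller mvalue if company == `i', lags(`k')
        if _rc == 0 & !missing(r(p)) {
            local sumlog = `sumlog' + ln(r(p))
            local ++ng
            local done = 1
        }
        else {
            local --k
        }
    }
    if !`done' {
        display as text "panel `i' skipped: no feasible lag length"
    }
}

* Fisher combination over the panels that could be tested
local P = -2*`sumlog'
display "panels tested: `ng'"
display "chi2(" 2*`ng' ") = " %9.4f `P' "  p-value = " %9.4f chi2tail(2*`ng', `P')
```

To adapt it, replace company and mvalue with the panel identifier and the variable of interest; the count of skipped panels then shows how much of the sample the test cannot cover.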