
  • unit-root test: unbalanced panel data problem with xtunitroot var, dfuller lags(#)

    Hi,
    I want to test for a unit root in a strongly unbalanced panel dataset. I am using Stata 16 for Mac. My dataset contains around 650 variables and around 420,000 observations.

    When entering the command:

    . xtunitroot fisher grossmargin, dfuller lags(10)

    I get the message:

    (119,908 missing values generated)
    could not compute test for panel 2
    could not compute test for panel 3
    could not compute test for panel 4
    could not compute test for panel 6

    This list continues for a few hundred rows. I cannot drop the panels that are too short for 10 lags, because I do not want to introduce survivorship bias. I also do not want to use fewer lags in the Dickey-Fuller regressions. Do you know a better way to test for a unit root that does not require a balanced dataset?

    Thank you!

  • #2
    Apologies, I know next to nothing about unit root tests. However, it might be helpful to know why 119,908 missing values are generated when the xtunitroot command is run. Are there 420,000 total observations, or 420,000 observations per panel?

    Regarding "survival bias": you could gauge how much of an issue it will be by splitting the data into two sets, one with the panels you might exclude and one without them, and then examining the two sets for meaningful differences, e.g. in summary statistics, bivariate analysis, or model coefficient estimates. The extent of any such differences may indicate how much you should worry about survival bias.

    • #3
      Because the Fisher-type test combines the p-values from individual unit-root tests, you could do this by hand and pass a different lag length to each panel. You could even start with 10 lags and step down until each individual unit-root test runs without error (return code 0).
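      For reference, the Fisher (inverse chi-squared) combination that -xtunitroot fisher- reports as P can be written out explicitly. With p_i the Dickey-Fuller p-value for panel i, N the number of panels for which a test could be computed, and assuming the individual tests are independent, under the null that all panels contain unit roots:

      $$P = -2\sum_{i=1}^{N}\ln p_i \;\sim\; \chi^2(2N)$$

      Nothing in this sum requires the panels to share a lag length or a time span, which is what makes a by-hand version with panel-specific lags possible.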

      it might be helpful to know why 119,908 missing values are generated when the xtunitroot command is run.
      I believe this is because -xtunitroot- creates a copy of the variable, which is used if the -demean- option is specified.

      • #4
        Wow, thank you for the prompt replies!

        420,000 total observations, or 420,000 observations per panel?
        - Total observations.
        Thank you for the tip; I will report back tomorrow with the results of splitting the dataset.


        do this by hand and pass a unique lag length to each panel.
        - That sounds good. But how would one implement this?

        start with 10 lags and increment down until there is no error
        - Because of the nature of the data, many panels exist for only one period, so even with 1 lag I get similar results.

        Concerning the missing values, I believe Stata creates them after the xtset command; I think it simply fills in the missing periods.

        Edit: I did not use the demean option.

        • #5
          1. Panels with 1 observation: that does make it difficult to run a regression on 10 lagged differences.
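          As a rough guide to how many lags a panel can support, here is a sketch assuming the default -dfuller- specification (constant, no trend), no gaps in the series, and that at least one residual degree of freedom is needed. With k lagged differences the ADF regression is

          $$\Delta y_t = \alpha + \rho\, y_{t-1} + \sum_{j=1}^{k}\gamma_j\,\Delta y_{t-j} + \varepsilon_t,$$

          which has k + 2 coefficients and T - k - 1 usable observations for a panel with T consecutive periods, so

          $$T - k - 1 \ge (k + 2) + 1 \iff T \ge 2k + 4.$$

          Under these assumptions, 10 lags need T >= 24, 1 lag needs T >= 6, and even k = 0 needs T >= 4, so a single-period panel cannot be tested at any lag length. With T = 20, as in the Grunfeld data used in the example, the largest feasible k is 8, which matches the lag length chosen for the complete panels. Gaps reduce the usable observations further, which is presumably why panel 10 supports only 2 lags.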

          Below is an example implementation.

          In the example, the difference between the P statistic reported by -xtunitroot- (5.0236) and the one from the example implementation (10.7616) comes from -xtunitroot- dropping panel 10, whereas the example implementation keeps panel 10 and uses only 2 lags for it.

          Code:
          . display "Variable lag length Fisher unit root test"
          Variable lag length Fisher unit root test
          
          . display "chi2(`=2*`ng'')  =" %9.4f `P' " pval = " %9.4f chi2tail(2*`ng', `P')
          chi2(20)  =  10.7616 pval =    0.9522
          
          . mat list lags
          
          lags[10,1]
               c1
           r1   8
           r2   8
           r3   8
           r4   8
           r5   8
           r6   8
           r7   8
           r8   8
           r9   8
          r10   2
          
          . xtunitroot fisher mvalue, lag(8) dfuller
          could not compute test for panel 10
          
          Fisher-type unit-root test for mvalue
          Based on augmented Dickey-Fuller tests
          --------------------------------------
          Ho: All panels contain unit roots           Number of panels       =     10
          Ha: At least one panel is stationary        Avg. number of periods =  19.60
          
          AR parameter: Panel-specific                Asymptotics: T -> Infinity
          Panel means:  Included
          Time trend:   Not included
          Drift term:   Not included                  ADF regressions: 8 lags
          ------------------------------------------------------------------------------
                                            Statistic      p-value
          ------------------------------------------------------------------------------
           Inverse chi-squared(18)   P         5.0236       0.9988
           Inverse normal            Z         4.8035       1.0000
           Inverse logit t(49)       L*        5.1916       1.0000
           Modified inv. chi-squared Pm       -2.1627       0.9847
          ------------------------------------------------------------------------------
           P statistic requires number of panels to be finite.
           Other statistics are suitable for finite or infinite number of panels.
          ------------------------------------------------------------------------------
          
          . qui dfuller mvalue if company == 10, lag(2)
          
          . disp -2*ln(r(p))
          5.7379332
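          The two P statistics in the log above can be reconciled by hand. -xtunitroot- could test only 9 of the 10 panels at 8 lags (hence chi2(18)); the example adds panel 10 at 2 lags, whose contribution is the -2*ln(r(p)) shown at the end:

          $$P = 5.0236 + 5.7379 = 10.7615 \approx 10.7616, \qquad \text{df} = 2 \times 10 = 20$$

          (the last digit differs only through rounding). The modified statistic, $P_m = (P - 2N)/(2\sqrt{N})$, agrees with the same 9 panels: $(5.0236 - 18)/(2\sqrt{9}) = -12.9764/6 \approx -2.1627$.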

          Code:
          webuse grunfeld, clear
          set seed 123456
          // create an unbalanced panel by randomly dropping about 15% of company 10's years
          drop if company == 10 & runiform() < .15
          
          // largest number of lagged differences to try in each panel
          local maxlag = 10
          
          local sumlogp = 0   // running sum of ln(p_i) over panels that could be tested
          local ng = 0        // number of panels that contribute to the statistic
          
          quietly levelsof company, local(levels)
          // one row per panel; a row stays missing if no lag length works for that panel
          matrix lags = J(r(r), 1, .)
          local row = 0
          foreach l of local levels {
              local ++row
              // start at maxlag and step down until dfuller runs and returns a p-value
              forvalues lag = `maxlag'(-1)0 {
                  capture quietly dfuller mvalue if company == `l', lags(`lag')
                  if _rc == 0 & !missing(r(p)) {
                      local sumlogp = `sumlogp' + ln(r(p))
                      matrix lags[`row', 1] = `lag'
                      local ++ng
                      continue, break
                  }
              }
          }
          
          // Fisher statistic: P = -2 * sum of ln(p_i), chi2 with 2*ng df under H0
          local P = -2*`sumlogp'
          
          display "Variable lag length Fisher unit root test"
          display "chi2(`=2*`ng'')  =" %9.4f `P' " pval = " %9.4f chi2tail(2*`ng', `P')
          
          // lag length actually used for each panel
          matrix list lags

          • #6
            1. Panels with 1 observation: that does make it difficult to run a regression on 10 lagged differences.
            Thank you for the example. However, when I tried it, it seemed that I would have to determine the maximum lag length for each process and write conditions for each panel individually, which is impossible given the size of the dataset.
