  • Can I Drop Independent Variables That Are Always Zero When Testing for Stationarity and Cross-Sectional Dependence?

    Hi!

    In my panel dataset, I find evidence of both non-stationarity and cross-sectional dependence when running the Pesaran CIPS unit root test and the Pesaran CD test on the full sample. One independent variable, carbon_tax, has a value of zero for many observations. When I drop all observations where carbon_tax = 0, my sample size falls from about 1,100 to 500 observations. In this reduced sample, both issues disappear: the variables appear stationary and there is no cross-sectional dependence. My question is whether it is econometrically sound to remove these zero-value observations (and thus half of my sample), and what this means for the validity and interpretation of my regression results.

    Thanks in advance!

  • #2
    Is carbon_tax uncorrelated with everything else in your data, so that filtering out one value of carbon_tax would not cause selection bias in your coefficient of interest?

    My guess is that the answer is probably no: the very fact that the stationarity and cross-sectional dependence characteristics change when you filter suggests that carbon_tax is related to the rest of your data.
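
A minimal simulated sketch of the selection concern raised above, not the poster's data or model. Everything here is an assumption for illustration: the names `x`, `u`, `v`, and `tax_positive` are hypothetical, the true slope is set to 1, and the rule that makes the tax nonzero is deliberately tied to the regression error.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

x = rng.standard_normal(n)   # regressor of interest (hypothetical)
u = rng.standard_normal(n)   # regression error
v = rng.standard_normal(n)   # extra noise in the policy decision
y = 1.0 * x + u              # true slope is 1 by construction

# Assumed selection rule: the tax is nonzero only when x + u + v > 0,
# so whether an observation is kept depends on the error term u.
tax_positive = (x + u + v) > 0

# OLS slope of y on x, full sample versus the filtered sample.
slope_full = np.polyfit(x, y, 1)[0]
slope_filtered = np.polyfit(x[tax_positive], y[tax_positive], 1)[0]

print(f"slope, full sample:    {slope_full:.3f}")
print(f"slope, tax > 0 only:   {slope_filtered:.3f}")
```

Under this assumed rule the filtered-sample slope is pulled away from the true value, because keeping only the tax > 0 rows makes the error correlated with `x`. If the tax were unrelated to everything else, the two slopes would agree up to sampling noise.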



    • #3
      In addition to Hemanshu's comments: a lack of significance does not mean the null hypothesis is true. It only means you did not detect enough deviation to reject it. That is hardly a surprise: you halved the data, so of course you will detect less. So in all likelihood you did not solve the problem; you only killed enough statistical power that you can no longer detect it. That is like poking out your eyes to solve the problem that you don't like the color of your room.
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------
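
The power argument above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the poster's setup: the panel is balanced, cross-sectional dependence comes from a single common factor with a weak loading (`loading=0.2`, a made-up value), the CD statistic is computed on the raw simulated series rather than on regression residuals, and "halving the data" is simplified to keeping the first half of the time periods. The function names are hypothetical.

```python
import numpy as np

def pesaran_cd(x):
    """Pesaran CD statistic for a balanced panel stored as a T x N array."""
    n_periods, n_units = x.shape
    corr = np.corrcoef(x, rowvar=False)              # N x N pairwise correlations
    upper = corr[np.triu_indices(n_units, k=1)]      # pairs with i < j
    return np.sqrt(2.0 * n_periods / (n_units * (n_units - 1))) * upper.sum()

def rejection_rates(n_units=20, n_periods=40, loading=0.2, reps=1000, seed=0):
    """Share of replications in which |CD| > 1.96, full versus halved sample."""
    rng = np.random.default_rng(seed)
    reject_full = 0
    reject_half = 0
    for _ in range(reps):
        factor = rng.standard_normal((n_periods, 1))      # common shock
        noise = rng.standard_normal((n_periods, n_units)) # idiosyncratic part
        panel = loading * factor + noise                  # dependence is always present
        reject_full += abs(pesaran_cd(panel)) > 1.96
        reject_half += abs(pesaran_cd(panel[: n_periods // 2])) > 1.96
    return reject_full / reps, reject_half / reps

power_full, power_half = rejection_rates()
print(f"rejection rate, full sample:   {power_full:.2f}")
print(f"rejection rate, halved sample: {power_half:.2f}")
```

The dependence is built into every simulated panel, so any drop in the rejection rate after halving reflects lost power, not a problem that went away.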



      • #4
        Thanks! So there would be no econometric reason to drop the observations where carbon_tax is 0 purely to improve the test results?

        The reason I’m asking is that in the paper Missing Values in Panel Data Unit Root Tests (Karavias, Tzavalis & Zhang, 2022), the authors discuss how different ways of handling missing data affect panel unit root tests. They show that “closing up the gaps”, i.e., dropping the missing-value observations and treating the remaining ones as continuous, keeps the null distribution of the test unchanged and delivers the highest power compared to forward fill or linear interpolation.

        In my case, carbon_tax is zero for long stretches of time in some countries, or even the entire period for a few countries. Removing these zero observations (treating them as “gaps”) cuts my sample in half, but stationarity and cross-sectional dependence problems disappear. I was wondering whether this kind of gap-closing approach would be a valid econometric strategy in this context, or whether I’m just masking the issue by reducing the sample and its variability.



        • #5
          To me, these are two entirely different things. The paper you are citing is giving you ideas on how to overcome missing information -- specifically, how to test for stationarity in the presence of missing values. Notice that the default approach to missing values in Stata, at least for purposes of estimation, is listwise deletion -- basically, those observations are just dropped before the estimation is conducted. Now, I don't work with time series data, but I presume that with that type of data, depending on how much one knows about the data generating process, you could try to retain some or all of those missing observations by doing other things, like some type of interpolation. For the specific case of unit root testing, the paper you cite seems to suggest that dropping the observations may be fine or even better, relative to interpolation, etc. Which is fine.

          You, on the other hand, do not have missing information. You have information that a variable takes on a specific value over particular periods of time. You want to throw away the observations where the variable takes on that value because doing so leads to a more convenient "result" about the stationarity of the data generating process. I am happy to be corrected by a time series data expert on this, but to me, that is pure cherry picking. You may be okay if all you are trying to claim is something about the data generating process conditional on carbon_tax being nonzero. But that will usually not be a sensible condition, insofar as it represents policy that is itself (partly) endogenous.

          Not to belabor the point, but why is the value of zero even special? While we're at it, why not drop all observations where it is 1, or 2, or 5? Or where it alternates between 0 and 1? Or anything else at all?
          Last edited by Hemanshu Kumar; 10 Aug 2025, 07:49.



          • #6
            I suppose you could try to model the zeros.

