
  • ARDL Lag Selection and Short-Run Elasticity Inference in a Small Sample

    I have a small sample of 29 annual observations. The literature recommends selecting lags with the AIC, but in my case this produces so many estimated parameters that I run out of degrees of freedom, and Stata won't allow me to run the bounds test for cointegration.

    My main goal is to estimate a short-run elasticity, and N = 29 in my case.

    If I use an ARDL model with BIC-selected lags, I can test for cointegration (bounds test).

    But when I use an ARDL model with AIC-selected lags, I cannot run the bounds test for cointegration because Stata won't let me. Stata returns an error saying that the number of observations should be at least twice the number of estimated parameters; that is, there are not enough degrees of freedom. Surprisingly, though, in this AIC case my short-run elasticity (the parameter I'm interested in) is significant.
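The degrees-of-freedom problem described above is simple arithmetic. The sketch below counts the coefficients of an ARDL(p, q_1, ..., q_k) model with a constant and compares them with the usable sample. Assumptions: each regressor enters with lags 0 to q_j, the first max(p, q_j) observations are lost to lagging, and the "at least twice as many observations as parameters" rule is taken from the error message described in the question, not from Stata's documentation. The function names and lag orders are hypothetical, not the poster's actual selections.

```python
def ardl_param_count(p, q):
    """Coefficients in an ARDL(p, q_1, ..., q_k) model with a constant.

    p: number of lags of the dependent variable.
    q: lag orders of the regressors; regressor j enters with lags 0..q_j,
       i.e. q_j + 1 coefficients.
    """
    return 1 + p + sum(qj + 1 for qj in q)


def effective_obs(n, p, q):
    """Observations left after dropping the first max-lag periods (assumed)."""
    return n - max([p] + list(q))


def passes_twice_rule(n, p, q):
    # Assumed rule, as reported in the question: usable obs >= 2 * parameters.
    return effective_obs(n, p, q) >= 2 * ardl_param_count(p, q)


N = 29  # annual observations, as in the question
for label, p, q in [("long lags (hypothetical AIC pick)", 4, [4, 4]),
                    ("short lags (hypothetical BIC pick)", 1, [0, 1])]:
    k = ardl_param_count(p, q)
    n_eff = effective_obs(N, p, q)
    print(f"{label}: {k} parameters, {n_eff} usable obs, "
          f"rule satisfied: {passes_twice_rule(N, p, q)}")
```

With two regressors and N = 29, an ARDL(4,4,4) has 15 coefficients but only 25 usable observations, so it fails the rule, while an ARDL(1,0,1) with 5 coefficients and 28 usable observations passes easily.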

    Should I trust my short-run elasticity in this case, or might it be spurious because I could not test for a genuine relationship?


    Or, to phrase my doubt another way: is the bounds test for cointegration relevant only for long-run elasticities/parameters, with nothing to do with the short-run elasticities? Or is the cointegration bounds test a necessary requirement whether we're interested in the short run or the long run? I can't tell whether being unable to test for cointegration makes all the coefficients, and the entire equation, spurious. What exactly are we testing for when we test for cointegration?

    In short: should I prioritize the AIC model (more lags), with a significant short-run result despite not being able to test for cointegration, or the BIC model (fewer lags), where cointegration can be verified but the short-run elasticity is insignificant?

    Does being unable to conduct a bounds test for cointegration make the entire model spurious/meaningless, so that nothing can be trusted?


    I would greatly appreciate your guidance on how you would approach this situation.
    Thank you very much for your time and advice.

  • #2
    For an ARDL analysis, a sample size of 29 observations is very small, especially if you have multiple regressors. You should probably reduce the maximum number of lags with the maxlags() option to conserve degrees of freedom. In addition, the BIC should be the preferred criterion with such small data sets, as it is more conservative in its lag choice.
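Why the BIC is more conservative shows up in the penalty terms. Using the standard textbook definitions AIC = -2 lnL + 2k and BIC = -2 lnL + k ln(n), the BIC charges ln(n) per parameter instead of 2, which is larger whenever n > e^2 (about 7.4). A minimal Python sketch; the log-likelihood values are made up for illustration and do not come from any real regression.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: -2 lnL + 2k."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Schwarz/Bayesian information criterion: -2 lnL + k ln(n)."""
    return -2.0 * loglik + k * math.log(n)


# Penalty charged for each extra coefficient at a few sample sizes.
for n in (25, 100, 750):
    print(f"n={n}: AIC penalty per parameter = 2.00, BIC penalty = {math.log(n):.2f}")

# Hypothetical example: adding one lag raises the log-likelihood by 1.5.
n = 25
small = dict(loglik=-40.0, k=5)
large = dict(loglik=-38.5, k=6)
print("AIC prefers the larger model:", aic(**large) < aic(**small))
print("BIC prefers the larger model:", bic(n=n, **large) < bic(n=n, **small))
```

With 25 usable observations, the extra lag lowers the AIC by 1 (gain of 3, penalty of 2) but raises the BIC by about 0.22 (gain of 3, penalty of about 3.22), so the BIC keeps the shorter model.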

    Whether the estimate is significant or not should not guide the model choice. Here, the results from AIC are essentially unreliable due to the lack of degrees of freedom.

    The bounds test is a test for the existence of a long-run relationship. If there is no long-run relationship, you can still interpret the short-run elasticities.
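For reference, the bounds test of Pesaran, Shin and Smith (2001) is typically based on the conditional error-correction form of the ARDL(p, q) model, shown here for a single regressor x_t with an unrestricted constant and no trend (the symbols are chosen for illustration):

```latex
\Delta y_t = c_0 + \pi_y\, y_{t-1} + \pi_x\, x_{t-1}
  + \sum_{i=1}^{p-1} \psi_i\, \Delta y_{t-i}
  + \omega_0\, \Delta x_t + \sum_{j=1}^{q-1} \omega_j\, \Delta x_{t-j} + u_t

F\text{-test: } H_0:\ \pi_y = \pi_x = 0
\qquad
t\text{-test: } H_0:\ \pi_y = 0
```

The statistics are compared with a lower bound (all variables I(0)) and an upper bound (all variables I(1)): a statistic beyond the upper bound rejects the null of no level relationship, one within the lower bound does not reject, and a value in between is inconclusive. The test concerns only the lagged levels (pi_y, pi_x); the short-run coefficients psi_i and omega_j are part of the same regression whatever the test outcome.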
    https://www.kripfganz.de/stata/



    • #3
      Thanks, Professor Sebastian, your reply answered most of my doubts. One small thing remains. I'll try to ask it in different ways, so that you understand all the possible scenarios that are bothering me.

      1. Is testing for cointegration a necessary requirement (a rule of thumb) in time-series analysis, or is it only a choice when we're interested in long-run coefficients?

      2. If cointegration is absent (or untestable because of the small sample size/degrees of freedom, as in my case), does that mean the relationship is necessarily spurious, so that one should stop there and trust no coefficients, neither short-run nor long-run? In other words, without evidence of cointegration, or without the ability to test for it, are we implicitly ASSUMING that a genuine economic relationship exists between non-stationary variables when none may exist, so that the results might be entirely spurious?

      3. Or can there still be a genuine economic relationship between non-stationary variables even if cointegration is not found, or cannot be detected because of the sample size/degrees of freedom?

      I guess that's why this question of mine remains: what do we learn from cointegration? Is its main role to confirm that the relationship, as specified by the chosen lags, is not spurious and that a meaningful link exists?

      Thanks, and apologies for the long question.



      • #4
        1. Testing for cointegration is not a necessity. It is done for two main reasons: a) people might be interested in the cointegrating/long-run relationship itself; b) if there is no such cointegrating/long-run relationship, the short-run effects can be estimated more efficiently by simply estimating a model in first differences.
        2. In the absence of cointegration, you might find a spurious relationship if you regress levels on levels without accounting for the dynamics. As long as you allow for dynamics (i.e., lags of the dependent and independent variables), there is no risk of running a spurious regression. Estimates of the short-run effects are still valid. Note, however, that absence of significant evidence of a cointegrating/long-run relationship does not imply evidence of absence of such a relationship. You might simply not have enough data to estimate the effects precisely enough.
        3. There can still be a genuine short-run (!) relationship between the variables; i.e., a relationship in their first differences.
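Point 2 can be illustrated with a small Monte Carlo sketch, assuming numpy is available. It simulates two independent random walks (so there is no true relationship and no cointegration) and counts how often a nominal 5% t-test on x_t rejects in (a) a static levels-on-levels regression and (b) an ARDL(1,1) regression that also includes y_{t-1} and x_{t-1}. The sample size, number of replications, and seed are arbitrary choices for illustration; this is plain OLS, not Stata's ardl command.

```python
import numpy as np


def ols_t(y, X, j):
    """OLS t-statistic for coefficient j, using homoskedastic standard errors."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[j] / np.sqrt(cov[j, j])


def rejection_rates(reps=500, T=100, seed=0):
    rng = np.random.default_rng(seed)
    rej_static = rej_dynamic = 0
    for _ in range(reps):
        # Two independent random walks: no relationship, no cointegration.
        y = np.cumsum(rng.standard_normal(T))
        x = np.cumsum(rng.standard_normal(T))
        ones = np.ones(T - 1)
        # (a) Static levels regression: y_t on a constant and x_t.
        X_static = np.column_stack([ones, x[1:]])
        if abs(ols_t(y[1:], X_static, 1)) > 1.96:
            rej_static += 1
        # (b) ARDL(1,1): y_t on a constant, y_{t-1}, x_t and x_{t-1}.
        X_dyn = np.column_stack([ones, y[:-1], x[1:], x[:-1]])
        if abs(ols_t(y[1:], X_dyn, 2)) > 1.96:
            rej_dynamic += 1
    return rej_static / reps, rej_dynamic / reps


static_rate, dynamic_rate = rejection_rates()
print(f"levels on levels: {static_rate:.2f}, ARDL(1,1): {dynamic_rate:.2f}")
```

The static regression typically rejects far more often than 5% (the classic spurious-regression result), while the dynamic regression stays close to the nominal level: the coefficient on x_t can be rewritten as the coefficient on the stationary Δx_t once x_{t-1} is in the model.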


        • #5
          Dear Prof. Sebastian Kripfganz,

          I am working on a time series model with a sample size of approximately 750 observations. We are estimating a dynamic regression that includes a lagged dependent variable among the regressors.

          The dependent variable was found to be nonstationary, so we used its first difference in the model. I am not sure whether the lagged dependent variable should then also enter in first differences, or whether it should remain in levels.
          Which specification should we adopt in this case?
          1) Δyt = Δy(t-1) + Xt + εt; or
          2) Δyt = y(t-1) + Xt + εt; or
          3) Δyt = Δy(t-1) + y(t-1) + Xt + εt;

          The other regressors in the model are stationary.

          Could you please guide me on this point?

          Thanks,
          Sarah
          Last edited by Sarah Magd; 31 Jul 2025, 03:35.



          • #6
            Specification 3 is the most general specification among those three; this is effectively an error correction model.

            Specification 1 imposes the absence of lagged level effects as a restriction; in essence, you are ruling out the existence of a long-run level relationship between y and X.

            Specification 2 is an error correction model with restricted short-run dynamics.
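Writing the three specifications with explicit coefficients makes the nesting visible. The symbols c, φ, α and β are introduced here for illustration (X_t may be a vector with coefficient vector β):

```latex
\text{(3)}\quad \Delta y_t = c + \phi\,\Delta y_{t-1} + \alpha\, y_{t-1} + X_t'\beta + \varepsilon_t

\text{(1)}\quad \text{(3) with } \alpha = 0
\qquad
\text{(2)}\quad \text{(3) with } \phi = 0

\text{(3) in levels:}\quad
y_t = c + (1 + \alpha + \phi)\, y_{t-1} - \phi\, y_{t-2} + X_t'\beta + \varepsilon_t
```

So specification 3 is an ARDL model with two lags of y and contemporaneous X, written in error-correction form, with α as the adjustment coefficient. Setting α = 0 forces the coefficients on the lags of y in the levels form to sum to one, i.e. it imposes a unit root in y and leaves no error-correction term.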

            Unless you have prior information about which specification is most meaningful for your application, the best way forward might be to simply estimate an ARDL / error correction model and let the data speak for itself.


            • #7
              Dear Prof. Sebastian Kripfganz
              Thank you very much for the constructive feedback.

              I have one follow-up question:


              Regarding the point of "simply estimate an ARDL / error correction model and let the data speak for itself"

              Δyt is a stationary series, and the independent variables are either dummy variables or stationary independent variables.
              In this case, should I run the ARDL/ECM with y in levels? Is that possible when all the independent variables are stationary?



              Thanks,
              Sarah



              • #8
                You should generally estimate the ARDL model in levels. Note that the corresponding error correction representation has the dependent variable in first differences; this is done automatically if you use the ec option of the ardl command. There is nothing wrong with this even if all independent variables are stationary.
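The error correction representation is a reparameterization of the same levels model, not a different model. A minimal numpy sketch, assuming a hypothetical ARDL(2,0) data-generating process with one stationary regressor (all coefficient values are made up): OLS on the levels form and OLS on the error-correction form give identical residuals, and the coefficients map exactly into each other. This is plain OLS and does not reproduce the output of Stata's ardl command.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 300
x = rng.standard_normal(T)  # stationary regressor (hypothetical)
y = np.zeros(T)
for t in range(2, T):
    # Hypothetical DGP: ARDL(2,0) in levels.
    y[t] = 0.5 + 1.2 * y[t - 1] - 0.3 * y[t - 2] + 0.8 * x[t] + rng.standard_normal()

yt, y1, y2, xt = y[2:], y[1:-1], y[:-2], x[2:]
ones = np.ones_like(yt)

# (a) Levels form: y_t on 1, y_{t-1}, y_{t-2}, x_t.
Xa = np.column_stack([ones, y1, y2, xt])
ba, *_ = np.linalg.lstsq(Xa, yt, rcond=None)

# (b) Error-correction form: dy_t on 1, dy_{t-1}, y_{t-1}, x_t.
dy, dy1 = yt - y1, y1 - y2
Xb = np.column_stack([ones, dy1, y1, xt])
bb, *_ = np.linalg.lstsq(Xb, dy, rcond=None)

# Mapping between the two: phi = -a2, alpha = a1 + a2 - 1;
# the constant and the coefficient on x_t are unchanged.
alpha_from_levels = ba[1] + ba[2] - 1
phi_from_levels = -ba[2]
resid_a = yt - Xa @ ba
resid_b = dy - Xb @ bb
print("alpha:", bb[2], "vs", alpha_from_levels)
print("phi:  ", bb[1], "vs", phi_from_levels)
print("identical residuals:", np.allclose(resid_a, resid_b))
```

The residuals coincide because the regressors in (b) span the same space as those in (a), and y_{t-1}, subtracted from the dependent variable in (b), lies in that space.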

                If you are sure that the dependent variable is the only nonstationary variable and that all independent variables are stationary, then you could also estimate the ARDL model directly with first differences of the dependent variable. In this case, you would implicitly be incorporating this prior knowledge about the integration orders.