  • ARDL in stata

    Does anyone know how to estimate an autoregressive distributed lag (ARDL) model in Stata? It is also known as the bounds testing method (Pesaran et al. 2001).

  • #2
    I was about to ask the same question. It would be very helpful if someone could post a step-by-step guide to ARDL.

    By the way, an ARDL model in Stata can be estimated by

    regress infln pcwage L1.pcwage L2.pcwage L3.pcwage L1.infln L2.infln

    The above model has p = 2 lags of the dependent variable, y_t, and q = 3 lags of the independent variable, x_t. This ARDL(2,3) model of inflation can be estimated using least squares.
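
    Written out, the regression above corresponds to the following equation. The coefficient symbols (alpha, phi, beta) are my own labels and not from any particular textbook:

    ```latex
    \mathit{infln}_t = \alpha
      + \phi_1 \, \mathit{infln}_{t-1} + \phi_2 \, \mathit{infln}_{t-2}
      + \beta_0 \, \mathit{pcwage}_t + \beta_1 \, \mathit{pcwage}_{t-1}
      + \beta_2 \, \mathit{pcwage}_{t-2} + \beta_3 \, \mathit{pcwage}_{t-3}
      + \varepsilon_t
    ```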


    • #3
      Here's my question:

      1) Do all lags of the variables in an ARDL model have to be stationary? The model I posted above includes the variable pcwage in levels and its lags too. Is it okay if pcwage is non-stationary?

      2) Is there a formal test to decide how many lags of each variable should be added in an ARDL model?



      • #4
        1) I do not fully understand your first question. In essence, your variables do not have to be stationary. The ARDL model is appropriate whenever you have (at most) one co-integrating relationship among your variables.

        2) The optimal lag length is usually decided on the basis of model selection criteria, like the Akaike or Schwarz information criterion. You run the model for all possible lag combinations and eventually choose the model that delivers the smallest value of the respective criterion among all models. For example, with the Schwarz-Bayesian criterion (SBC), one independent variable, and a maximum lag length of 4:
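        * Search over ARDL(p,q) lag orders and keep the one with the smallest SBC.
        * The sample restriction assumes that the time index starts at 1.
        local maxlag = 4
        local sbcstar = .
        local pstar = 0
        local qstar = 0
        local p = 1
        while `p' <= `maxlag' {
                local q = 0
                while `q' <= `maxlag' {
                        * Same estimation sample for every lag combination
                        reg L(0/`p').depvar L(0/`q').indepvar if time >= 1 + `maxlag'
                        estat ic
                        mat stats = r(S)
                        * Column 6 of r(S) holds the BIC (SBC)
                        local sbc = el(stats, 1, 6)
                        * sbcstar starts as missing, which Stata treats as larger than any number
                        if `sbc' < `sbcstar' {
                                local sbcstar = `sbc'
                                local pstar = `p'
                                local qstar = `q'
                        }
                        local ++q
                }
                local ++p
        }
        * Re-estimate the selected model
        reg L(0/`pstar').depvar L(0/`qstar').indepvar if time >= 1 + `maxlag'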
        Importantly, make sure to restrict the sample to be the same for all lag combinations such that the same number of observations is used in each case. Otherwise, the model selection criteria would not be comparable.


        • #5
          Thanks a lot for answering the second question. I understand that now.

          Are you saying that the order of integration is not important for ARDL? In other words, does this imply that one need not use the augmented Dickey-Fuller test to check the variables for stationarity?


          • #6
            If your variables are I(1) and you have more than one co-integrating relationship among them, the single-equation ARDL model would be misspecified as it can accommodate only one co-integrating relationship. In that case you would prefer to estimate a vector error-correction model (VECM).

            If your variables are I(1) and you have exactly one co-integrating relationship, you can rewrite the ARDL model analytically in error-correction representation with first-differences of depvar on the left-hand side, the co-integrating relationship of the level variables as well as additional lags of first-differenced depvar and indepvars on the right-hand side. All those components are then I(0) which shows that you can safely estimate this ARDL model in levels.
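
            To illustrate the rewriting for the simplest case, here is a sketch for an ARDL(1,1) model with one regressor. The symbols (c, phi, beta_0, beta_1, theta) are labels chosen for this illustration only:

            ```latex
            \begin{aligned}
            y_t &= c + \phi \, y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t \\
            \Delta y_t &= c - (1 - \phi) \, y_{t-1} + \beta_0 \, \Delta x_t
                          + (\beta_0 + \beta_1) \, x_{t-1} + \varepsilon_t \\
                       &= c - (1 - \phi) \left( y_{t-1} - \theta \, x_{t-1} \right)
                          + \beta_0 \, \Delta x_t + \varepsilon_t ,
            \qquad \theta = \frac{\beta_0 + \beta_1}{1 - \phi}
            \end{aligned}
            ```

            The second line follows from subtracting y_{t-1} on both sides and using beta_0 x_t = beta_0 Delta x_t + beta_0 x_{t-1}. The term in parentheses is the deviation from the long-run relationship, and -(1 - phi) is the speed-of-adjustment coefficient. The last line requires phi different from 1.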

            If your variables are I(1) but you do not have any co-integrating relationship among them, estimation is still fine because there exist values for the population parameters such that the error term can be I(0) due to the inclusion of lags of the dependent variable (the sum of the coefficients for the lags of depvar would equal unity in the underlying data generating process such that the level term drops out in the error-correction representation of the model; similarly for indepvars that are I(1)). However, in this case it would be more efficient to estimate an ARDL model directly in first differences.

            If all of your variables are I(0) then you obviously do not have any problem with the ARDL model.

            The point that I want to make is the following: Testing for non-stationarity and co-integration of your variables is still useful as it guides you towards the optimal model choice (VECM, ARDL in levels, ARDL in first differences).
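
            As a minimal sketch of the unit-root pre-testing step, assuming placeholder variable names depvar and indepvar, a time index called time, and an arbitrary choice of 4 lags in the test regressions:

            ```stata
            * Placeholder names: depvar, indepvar, time. Adjust to your data.
            tsset time

            * Augmented Dickey-Fuller tests in levels
            dfuller depvar, lags(4)
            dfuller indepvar, lags(4)

            * The same tests on first differences. Not rejecting a unit root in
            * levels but rejecting it in first differences points to I(1).
            dfuller D.depvar, lags(4)
            dfuller D.indepvar, lags(4)
            ```

            Depending on the series, you may want to add the trend option or choose a different lag order.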
            Last edited by Sebastian Kripfganz; 26 Jul 2014, 11:42.


            • #7
              Thanks a lot. One last question: how would I know if my model has more than one co-integrating relationship? I am trying to find out the impact of private investment, public investment, and road length on employment in the transport sector. The model I estimated (picture attached) does not have any significant p-values, though the Wald test F-statistic is higher than the upper bound critical value of Pesaran, which suggests that there exists a long-run relationship among the variables. But what about the p-values? Should I be worried? All variables of the model are I(0).
              Last edited by danishussalam; 26 Jul 2014, 15:34.


              • #8


                • #9
                  You would estimate a vector error-correction model and test for the cointegrating rank. See vecrank and its documentation in the Stata manual.

                  Regarding your EViews output: Your Wald test includes the coefficient of the lagged dependent variable such that it is not surprising that it rejects the null hypothesis. However, I am worried about the coefficient of your lagged dependent variable in the error-correction representation of the ARDL which is -1.39. This coefficient should typically lie in the range [-1,0] but yours is by far outside of this economically meaningful range.
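
                  For the rank test mentioned above, a minimal sketch could look like this. The variable names (emp, privinv, pubinv, road, year) are hypothetical, and the lag order and trend specification are arbitrary choices for illustration:

                  ```stata
                  * Hypothetical variable names for the four series and the year index
                  tsset year

                  * Lag-order selection for the underlying VAR
                  varsoc emp privinv pubinv road, maxlag(4)

                  * Johansen trace test for the cointegrating rank, here with 2 lags
                  vecrank emp privinv pubinv road, lags(2) trend(constant)

                  * If the test points to rank 1, a VECM with one cointegrating
                  * equation could be estimated next
                  vec emp privinv pubinv road, rank(1) lags(2) trend(constant)
                  ```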


                  • #10
                    Thank you, Sebastian. That was really helpful. I am now getting a grip on ARDL models.

                    Let's say that after the Johansen cointegration test, I estimate an ARDL model of the following type:

                    d(emp_t) c @trend emp_t(-1) lnroad(-1) d(emp_t(-1)) d(lnroad(-1)) ,

                    The coefficient of emp_t(-1) was between -1 and 0, and the F-statistic for the joint test of c(3) and c(4) was greater than the upper bound critical value. Now I save the residuals of the above equation in a series called ECM and estimate another regression for the short-run relationship.

                    My question: which regression should I now estimate?

                    1) d(emp_t) c @trend d(emp_t(-1)) d(lnroad(-1)) ecm(-1)


                    2) d(emp_t) c @trend d(emp_t) d(lnroad) ecm(-1)

                    Secondly, let's say that the coefficient of ecm(-1) is not between -1 and 0. Can we conclude that the variables have a long-run relationship but no short-run relationship?

                    I would really appreciate it if you could reply at your earliest convenience.


                    • #11
                      Maybe it is too early in the morning for me, but I do not see any reason why you want to estimate 1) or 2). The short-run effects are already included in the initial model. They are given by the coefficients of the first-differenced regressors. The coefficient of ecm in 1) and 2) should actually be zero because it contains the noisy part (the residuals) of your initial model that does not explain d(emp_t). If it is non-zero, that may be a consequence of using ecm one period lagged in 1) and 2) (Again, why?). I would then suspect that you are missing some lags in your initial model to fully capture the dynamics.

                      Btw: As this is the Stata forum it would be better if you use Stata syntax instead of EViews syntax to make it easier for the Stata listers to understand your problem.
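
                      For reference, a rough Stata counterpart of the initial EViews equation might look as follows. This is only a sketch, assuming the data are tsset on a variable called year and the variables are named emp_t and lnroad as in the post above; the name trend is my own choice:

                      ```stata
                      * Assumed names: year (time index), emp_t, lnroad
                      tsset year
                      generate trend = _n    // linear time trend, counterpart of @trend

                      * d(emp_t) c @trend emp_t(-1) lnroad(-1) d(emp_t(-1)) d(lnroad(-1))
                      regress D.emp_t trend L.emp_t L.lnroad LD.emp_t LD.lnroad

                      * Joint F-test on the lagged levels. Compare the statistic with the
                      * bounds-test critical values, not with the standard F distribution.
                      test L.emp_t L.lnroad

                      * Implied long-run coefficient of lnroad
                      nlcom -_b[L.lnroad] / _b[L.emp_t]
                      ```

                      The coefficients of LD.emp_t and LD.lnroad are the short-run effects, and the coefficient of L.emp_t is the speed of adjustment.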
                      Last edited by Sebastian Kripfganz; 30 Jul 2014, 02:43.


                      • #12
                        Hi Sebastian, just a question in relation to your code for choosing the optimal lag length. Is it possible to modify it so as to run (p+1)^k regressions, where p is the maximum lag length and k is the number of variables included? With the above code, Stata runs only combinations where the lags are equal on all explanatory variables, as opposed to running all possible combinations (which would give a lot more regressions).

                        Any help would be much appreciated! Karl.


                        • #13
                          I just released a user-written command by myself and Daniel Schneider that accomplishes this task.

                          The command ardl fits a linear regression model of depvar on indepvars with lagged depvar and indepvars as additional regressors. Information criteria can be used to find the optimal lag lengths. Estimation output is delivered either in levels form or in error-correction form. As an option, results from the Pesaran/Shin/Smith (2001) bounds testing procedure for the existence of a levels relationship can be displayed.

                          You can find and install the ardl package by typing the following line in the Stata command window:
                          net from ""

                          Please see the Stata help file for additional information about the command. Comments, suggestions, and bug reports are highly welcome.
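
                          A minimal usage sketch, assuming the package is installed and the data are tsset. The variable names depvar and indepvar are placeholders, and option and postestimation command names may differ between package versions, so please check help ardl for the version you have installed:

                          ```stata
                          * Placeholder variable names: depvar, indepvar

                          * Lag selection by the BIC over all lag combinations up to 4 lags
                          ardl depvar indepvar, maxlags(4) bic

                          * The same model displayed in error-correction form
                          ardl depvar indepvar, maxlags(4) bic ec

                          * Bounds test for a levels relationship
                          * (postestimation command in recent versions of the package)
                          estat ectest
                          ```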


                          • #14
                            Great, thanks. I will run through it, and if I have any feedback, suggestions, or questions, I will let you know.


                            • #15
                              @Sebastian,

                              When specifying the ec option in order to run the model in first-difference form, I noticed that in the resulting estimates the long-run effects, which are usually given by the lagged levels of depvar and indepvar, are given as the level of each variable (i.e., the variable at time t instead of t-1). Is this deliberate in the model, or is there a way to respecify it so as to obtain a 'standard' ARDL bounds testing model? If you could let me know when you get a chance, I would appreciate it!