Does anyone know how to estimate an autoregressive distributed lag (ARDL) model in Stata? It is also known as the bounds testing method (Pesaran et al. 2001).

I was about to ask the same question. It would be very helpful if someone could post a step-by-step guide to ARDL.
By the way, an ARDL model in Stata can be estimated with
regress infln pcwage L1.pcwage L2.pcwage L3.pcwage L1.infln L2.infln
In general, an ARDL(p,q) model has p lags of the dependent variable, y_t, and q lags of the independent variable, x_t. The ARDL(2,3) model of inflation above can be estimated using least squares.
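The same ARDL(2,3) regression can be written more compactly with Stata's time-series operators. This is only a sketch: it assumes the data have a time variable, here given the hypothetical name time, which must be declared with tsset before any lag operator can be used.

```stata
* Assumption: "time" is a hypothetical name for the time index in the dataset
tsset time

* L(1/2).infln expands to L1.infln L2.infln
* L(0/3).pcwage expands to pcwage L1.pcwage L2.pcwage L3.pcwage
regress infln L(1/2).infln L(0/3).pcwage
```

The regressor list is identical to the one typed out above; only the notation is shorter.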

Here's my question:
1) Do all lags of the variables in an ARDL model have to be stationary? The model I posted above has the variable pcwage (in levels) and its first differences too. Is it okay if pcwage is nonstationary?
2) Is there a formal test to decide how many lags of each variable should be added in an ARDL model?
Thanks.

1) I do not fully understand your first question. In essence, your variables do not have to be stationary. The ARDL model is appropriate whenever you have (at most) one cointegrating relationship among your variables.
2) The optimal lag length is usually decided on the basis of model selection criteria, like the Akaike or Schwarz information criterion. You run the model for all possible lag combinations and choose the model that delivers the smallest value of the respective criterion among all models. For example, with the Schwarz Bayesian criterion (SBC), one independent variable, and a maximum lag length of 4:
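Code:
    local maxlag = 4
    * missing (.) compares larger than any number, so the first model fitted always becomes the incumbent
    local sbcstar = .
    local pstar = 0
    local qstar = 0
    local p = 1
    while `p' <= `maxlag' {
        local q = 0
        while `q' <= `maxlag' {
            * fix the estimation sample so that all models are compared on the same observations
            reg L(0/`p').depvar L(0/`q').indepvar if time >= 1 + `maxlag'
            estat ic
            mat stats = r(S)
            * column 6 of r(S) holds the BIC (SBC)
            local sbc = el(stats, 1, 6)
            if `sbc' < `sbcstar' {
                local sbcstar = `sbc'
                local pstar = `p'
                local qstar = `q'
            }
            local ++q
        }
        local ++p
    }
    * re-estimate the selected model
    reg L(0/`pstar').depvar L(0/`qstar').indepvar if time >= 1 + `maxlag'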

If your variables are I(1) and you have more than one cointegrating relationship among them, the single-equation ARDL model would be misspecified as it can accommodate only one cointegrating relationship. In that case you would prefer to estimate a vector error-correction model (VECM).
If your variables are I(1) and you have exactly one cointegrating relationship, you can rewrite the ARDL model analytically in error-correction representation with first differences of depvar on the left-hand side, and the cointegrating relationship of the level variables as well as additional lags of first-differenced depvar and indepvars on the right-hand side. All those components are then I(0), which shows that you can safely estimate this ARDL model in levels.
If your variables are I(1) but you do not have any cointegrating relationship among them, estimation is still fine because there exist values for the population parameters such that the error term can be I(0) due to the inclusion of lags of the dependent variable (the sum of the coefficients for the lags of depvar would equal unity in the underlying data generating process such that the level term drops out in the error-correction representation of the model; similarly for indepvars that are I(1)). However, in this case it would be more efficient to estimate an ARDL model directly in first differences.
If all of your variables are I(0) then you obviously do not have any problem with the ARDL model.
The point that I want to make is the following: Testing for nonstationarity and cointegration of your variables is still useful as it guides you towards the optimal model choice (VECM, ARDL in levels, ARDL in first differences).
Last edited by Sebastian Kripfganz; 26 Jul 2014, 10:42.
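To make the error-correction representation concrete, here is a worked sketch for the simplest case, an ARDL(1,1) model with a single regressor. The symbols c, \phi, \beta_0, \beta_1 and \theta are notation chosen for this illustration only; they do not come from the posts above.

```latex
\begin{align*}
y_t &= c + \phi\, y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t \\
\Delta y_t &= c - (1-\phi)\, y_{t-1} + (\beta_0 + \beta_1)\, x_{t-1} + \beta_0 \Delta x_t + \varepsilon_t \\
           &= c - (1-\phi)\left[\, y_{t-1} - \theta x_{t-1} \right] + \beta_0 \Delta x_t + \varepsilon_t,
\qquad \theta = \frac{\beta_0 + \beta_1}{1-\phi} \quad (\phi \neq 1)
\end{align*}
```

The second line follows by subtracting y_{t-1} from both sides and writing \beta_0 x_t = \beta_0 \Delta x_t + \beta_0 x_{t-1}. With one cointegrating relationship the bracketed term is I(0), so every component of the last line is I(0). Without cointegration, \phi = 1 and \beta_0 + \beta_1 = 0, so the level terms in the second line vanish and the model reduces to a regression in first differences.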

Thanks a lot. One last question: how would I know if my model has more than one cointegrating relationship? I am trying to find out the impact of private investment, public investment and road length on employment in the transport sector. The model I estimated (picture attached) does not have any significant p-values, though the Wald test F-statistic is higher than the upper bound value of Pesaran, which suggests that there exists a long-run relationship among the variables. But what about the p-values? Should I be worried? All variables of the model are I(0).
Last edited by danishussalam; 26 Jul 2014, 14:34.

You would estimate a vector error-correction model and test for the cointegrating rank. See vecrank and its documentation in the Stata manual.
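A minimal sketch of such a rank test is shown below. The variable names (emp_t, privinv, pubinv, lnroad, time) are hypothetical stand-ins for the series described in the question, and the lag order of 2 is only a placeholder that should be chosen for the data at hand.

```stata
* Assumption: all variable names are hypothetical; lags(2) is a placeholder
tsset time

* Johansen trace test for the number of cointegrating relationships,
* with an unrestricted constant in the underlying VECM
vecrank emp_t privinv pubinv lnroad, lags(2) trend(constant)
```

In the output table, the estimated rank is the first row at which the trace statistic falls below its critical value. A rank above one would point to a VECM rather than a single-equation ARDL model.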
Regarding your EViews output: Your Wald test includes the coefficient of the lagged dependent variable, so it is not surprising that it rejects the null hypothesis. However, I am worried about the coefficient of your lagged dependent variable in the error-correction representation of the ARDL, which is 1.39. This coefficient should typically lie in the range [-1,0], but yours is far outside this economically meaningful range.

Thank you, Sebastian. That was really helpful. I am now getting a grip on ARDL models.
Let's say that after the Johansen cointegration test, I estimate an ARDL model of the following type:
d(emp_t) c @trend emp_t(-1) lnroad(-1) d(emp_t(-1)) d(lnroad(-1))
The coefficient of emp_t(-1) was between -1 and 0, and the F-statistic for c(3) and c(4) was greater than the upper bound value. Now I save the residuals of the above equation in a series called ecm and estimate another regression for the short-run relationship.
My question: which regression should I now estimate?
1) d(emp_t) c @trend d(emp_t(-1)) d(lnroad(-1)) ecm(-1)
or
2) d(emp_t) c @trend d(emp_t) d(lnroad) ecm(-1)
Secondly, let's say that the coefficient of ecm(-1) is not between -1 and 0. Can we conclude that the variables have a long-run relationship but do not have any short-run relationship?
I would really appreciate it if you could reply at your earliest convenience.

Maybe it is too early in the morning for me, but I do not see any reason why you want to estimate 1) or 2). The short-run effects are already included in the initial model. They are given by the coefficients of the first-differenced regressors. The coefficient of ecm in 1) and 2) should actually be zero because it contains the noisy part (the residuals) of your initial model that does not explain d(emp_t). If it is nonzero, that may be a consequence of using ecm lagged by one period in 1) and 2) (Again, why?). I would then suspect that you are missing some lags in your initial model to fully capture the dynamics.
Btw: As this is the Stata forum, it would be better if you used Stata syntax instead of EViews syntax to make it easier for the Stata listers to understand your problem.
Last edited by Sebastian Kripfganz; 30 Jul 2014, 01:43.
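For Stata readers, the initial EViews equation from the question could be written roughly as follows. This is a sketch under assumptions: the time index is called time (a hypothetical name) and is a regularly spaced counter, so that it can stand in for the EViews @trend term.

```stata
* Assumption: "time" is a hypothetical, regularly spaced time index
tsset time

* D. = first difference, L. = one-period lag, LD. = lagged first difference
* The constant is included by default; "time" plays the role of @trend
regress D.emp_t time L.emp_t L.lnroad LD.emp_t LD.lnroad

* Joint F-test on the lagged levels (c(3) and c(4) in the EViews equation).
* Compare the statistic with the Pesaran/Shin/Smith (2001) bounds,
* not with standard F critical values.
test L.emp_t L.lnroad
```

The coefficients on LD.emp_t and LD.lnroad are the short-run effects, so no second-stage regression on the residuals is needed.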

Hi Sebastian, just in relation to your code for choosing optimal lag length. Is it possible to modify it so as to run (p+1)^k regressions where p is your max lag length and k is the number of variables one has included? As with the above code Stata runs combinations where the lags are equal on all explanatory variables as opposed to running all possible combinations (thus giving a lot more regressions).
Any help would be much appreciated! Karl.
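One way to extend the loop approach to separate lag orders per regressor is to nest one loop per variable. The sketch below does this for two regressors; x1 and x2 are hypothetical names, and depvar and time are used as in the earlier code. As there, the dependent variable's lag order starts at 1, so with a maximum lag of 4 this runs 4 x 5 x 5 = 100 regressions.

```stata
* Assumption: depvar, x1, x2 and time are placeholder variable names
tsset time
local maxlag = 4
* missing (.) compares larger than any number, so the first model is always accepted
local sbcstar = .
forvalues p = 1/`maxlag' {
    forvalues q1 = 0/`maxlag' {
        forvalues q2 = 0/`maxlag' {
            * same estimation sample for every lag combination
            quietly regress L(0/`p').depvar L(0/`q1').x1 L(0/`q2').x2 if time >= 1 + `maxlag'
            quietly estat ic
            matrix stats = r(S)
            * column 6 of r(S) holds the BIC (SBC)
            local sbc = el(stats, 1, 6)
            if `sbc' < `sbcstar' {
                local sbcstar = `sbc'
                local pstar = `p'
                local q1star = `q1'
                local q2star = `q2'
            }
        }
    }
}
display "Selected ARDL(`pstar',`q1star',`q2star') with SBC = `sbcstar'"
regress L(0/`pstar').depvar L(0/`q1star').x1 L(0/`q2star').x2 if time >= 1 + `maxlag'
```

Each additional regressor needs another nested loop, so the number of regressions grows quickly with the number of variables.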

I just released a user-written command by myself and Daniel Schneider that accomplishes this task.
The command ardl fits a linear regression model of depvar on indepvars with lagged depvar and indepvars as additional regressors. Information criteria can be used to find the optimal lag lengths. Estimation output is delivered either in levels form or in errorcorrection form. As an option, results from the Pesaran/Shin/Smith (2001) bounds testing procedure for the existence of a levels relationship can be displayed.
You can find and install the ardl package by typing the following line in the Stata command window:
net from "http://www.kripfganz.de/stata/"
Please see the Stata help file for additional information about the command. Comments, suggestions, and bug reports are highly welcome.
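A typical session might look like the sketch below. The variable names emp_t, lnroad and time are hypothetical, and the option names (maxlags(), bic, ec) are given as recalled from the package's help file, so please verify them with help ardl for the installed version.

```stata
* Install the package from the author's site
net from "http://www.kripfganz.de/stata/"
net install ardl

* Assumption: emp_t, lnroad and time are hypothetical variable names
tsset time

* Pick lag orders up to 4 by the Schwarz/Bayesian criterion and
* report the result in error-correction form
ardl emp_t lnroad, maxlags(4) bic ec

* Check the available options and postestimation commands
help ardl
```

Dropping the ec option should report the same model in levels form instead.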

@ Sebastian,
When specifying the ec option in order to run the model in first-difference form, I noticed that in the resulting estimates the long-run effects, which are usually given by the lagged levels of your depvar and indepvars, are given as the level of each variable (i.e. the variable at time t instead of t-1). Is this deliberate in the model, or is there a way to respecify it so as to obtain a 'standard' ARDL bounds testing model? If you could let me know when you get a chance, I would appreciate it!