Dear all,
I'm having some trouble determining the number of lags I should use in an ARDL model. I am regressing the size of a country's economy (y) on a form of interest rates (x). It is very likely that autocorrelation is present in both the dependent and the independent variable. To address this, I want to use an ARDL model. However, according to the literature (Principles of Econometrics, 4th edition, by Hill, Griffiths and Lim), there are a number of subjective criteria for choosing the number of lags of both variables:
1. Has serial correlation in the errors been eliminated? (this can be checked with a correlogram or LM tests)
2. Are the coefficient signs consistent with expectations?
3. Are the estimates significantly different from zero?
4. Which number of lags minimizes information criteria such as the AIC and SC?
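To make criterion 4 concrete, here is a minimal sketch (in Python, with synthetic data; the function name and lag search ranges are my own choices, not anything from the book) of picking the ARDL lag orders (p, q) that minimize the AIC of a plain OLS fit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic series: y depends on its own first lag, on current x, and on x lagged once.
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.3 * x[t] + 0.2 * x[t - 1] + 0.1 * rng.normal()

def aic_for_ardl(y, x, p, q):
    """AIC of an ARDL(p, q) fit by OLS: y on y lags 1..p and x lags 0..q."""
    m = max(p, q)          # observations lost to lagging
    n = len(y) - m
    cols = [np.ones(n)]
    cols += [y[m - i : len(y) - i] for i in range(1, p + 1)]   # lags of y
    cols += [x[m - j : len(x) - j] for j in range(0, q + 1)]   # lags of x (incl. current)
    X = np.column_stack(cols)
    yy = y[m:]
    beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
    rss = np.sum((yy - X @ beta) ** 2)
    k = X.shape[1]
    return n * np.log(rss / n) + 2 * k

# Search over candidate lag orders and keep the AIC-minimizing pair.
best = min(((aic_for_ardl(y, x, p, q), p, q)
            for p in range(1, 5) for q in range(0, 5)))
print("chosen (p, q):", (best[1], best[2]))
```

The same grid-search idea carries over to Stata or to statsmodels' ARDL tools; the point is simply that the lags of y and of x are chosen jointly by scanning (p, q) pairs, not one variable at a time.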
Obs: 420
Countries: 28
Years per country: 15
Strongly balanced panel
To normalise Y, I take logs.
The H0 of no first-order autocorrelation can be rejected, as per:
Code:
xtserial logy x

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
    F(  1,      27) =    124.790
           Prob > F =      0.0000
Question 1: which lags do I determine first, those of X or those of Y? Or do they need to be determined simultaneously?
I started off by determining the lags of X first, but since it's a panel data set, I cannot use -corrgram- to see how many lags are statistically significant. I have looked at scatter diagrams of x against x_1, x_2 and x_3, but those do not give a definitive answer. I then used a correlation table to see which lags (out of 5, to start with) are correlated with x, but I'm not sure whether that's the right way:
Code:
             |        x      x_1      x_2      x_3      x_4      x_5
-------------+------------------------------------------------------
           x |   1.0000
             |
             |
         x_1 |   0.6639*  1.0000
             |   0.0000
             |
         x_2 |   0.3015*  0.6528*  1.0000
             |   0.0000   0.0000
             |
         x_3 |   0.1671*  0.2796*  0.6514*  1.0000
             |   0.0021   0.0000   0.0000
             |
         x_4 |   0.1044   0.1655*  0.2779*  0.6536*  1.0000
             |   0.0672   0.0036   0.0000   0.0000
             |
         x_5 |   0.0299   0.1364*  0.1816*  0.2983*  0.6710*  1.0000
             |   0.6188   0.0225   0.0023   0.0000   0.0000
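One pitfall when building these lagged variables by hand in a panel is letting a lag spill across panel boundaries, so that the first year of one country borrows the last year of the previous one. A small sketch of panel-safe lagging (Python/pandas with a toy balanced panel; the column names are illustrative, not from my data):

```python
import pandas as pd

# Toy strongly balanced panel: 3 countries, 5 years each.
df = pd.DataFrame({
    "country": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "year": list(range(2001, 2006)) * 3,
    "x": range(15),
})
df = df.sort_values(["country", "year"])

# Lag x within each country, so the first observation of one panel
# never borrows the last observation of the previous panel.
for k in (1, 2, 3):
    df[f"x_{k}"] = df.groupby("country")["x"].shift(k)

# Pairwise correlations between x and its lags; rows with missing lagged
# values are dropped pairwise, similar to how pwcorr handles them.
print(df[["x", "x_1", "x_2", "x_3"]].corr().round(4))
```

In Stata the analogue is to -xtset Country Year- and use time-series operators (L.x, L2.x, ...), which respect panel boundaries automatically.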
This suggests that lags 1 to 3 are significantly correlated with x. When I include them in the regression, however, lags 1 and 2 are not statistically significant (and lag 2 also has an unexpected sign):
Code:
Random-effects GLS regression                   Number of obs      =       336
Group variable: Country                         Number of groups   =        28

R-sq:  within  = 0.0835                         Obs per group: min =        12
       between = 0.0459                                        avg =      12.0
       overall = 0.0312                                        max =        12

                                                Wald chi2(4)       =     28.26
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      log(y) |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .0017694   .0007606     2.33   0.020     .0002787    .0032601
         x_1 |   .0005826   .0009607     0.61   0.544    -.0013004    .0024656
         x_2 |  -.0000425    .000964    -0.04   0.965    -.0019318    .0018468
         x_3 |   .0023062   .0007694     3.00   0.003     .0007982    .0038143
       _cons |    .742227   .0200991    36.93   0.000     .7028335    .7816205
-------------+----------------------------------------------------------------
     sigma_u |  .10615044
     sigma_e |  .02462382
         rho |  .94893718   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Also, I do not seem to be able to use an LM test (-estat bgodfrey- is not valid) or -dwstat2- because I'm using multiple panels.
Question 2: what am I doing wrong with regards to these lags?
As for the lags of Y, I'm running into a similar problem: I am not able to test whether the number of lags I choose is correct.
I hope I have been detailed enough. However, if any information is missing, I'm very happy to supply it. Thanks in advance for any help!
Mark