Dynamic panel vs XTREG vs XTREGAR vs other

Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#1


22 Mar 2022, 14:32

Dear all,

I am working with macro panel data where N = 50 and T = 80. The equation includes a quadratic term but is linear in the coefficients. Among other regressors, the model contains a number of dummies, count variables and categorical variables. Macro variables are in logs or percentages, and the data are stationary.

The model is the one shown in the picture below, with one difference: instead of Π, the dependent variable is GDP growth (in logs), and the intercept a also enters through the term (1-a)Y_{t-1}. The rest is the same.

GDP growth is the dependent variable, and lagged GDP enters the equation as an independent variable. T is an indicator, and an overall constant a (through (1-a)Y_{t-1}) is included. Z is a vector of control variables that affect the level of GDP.

The data are stationary, and xtserial detects first-order autocorrelation. I suspect, but have not confirmed, heteroskedasticity, given the structure of the panel and the heterogeneity in its composition.
In short, it is a classic macro panel.
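A minimal sketch of the two diagnostics mentioned above, run on the public nlswork data that Carlo uses in #10 rather than on the macro panel. -xtserial- and -xttest3- are community-contributed commands (assumed installed; locate them with -search xtserial- and -ssc install xttest3-). The last line shows the usual response when both problems are suspected.

```stata
* Illustration on the public nlswork data, not on the macro panel
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year

* Wooldridge test for first-order serial correlation in panel data
* (community-contributed; find it with -search xtserial-)
xtserial ln_wage age

* Modified Wald test for groupwise heteroskedasticity after -xtreg, fe-
* (community-contributed; -ssc install xttest3-)
xtreg ln_wage age, fe
xttest3

* Standard errors clustered by panel are robust to both problems
xtreg ln_wage age, fe vce(cluster idcode)
```

Cluster-robust standard errors rely on asymptotics in the number of panels, so how well they behave depends on having a reasonable number of countries.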

So far I have run the following two regressions: xtregar, simply because T > N, and xtreg. The equations are shown here in reduced form, for obvious reasons.

xtregar growthgdp l.gdp cpi u Output dummy1 dummy2 c.indicator1##c.indicator1, fe

and

xtreg growthgdp l.gdp cpi u Output dummy1 dummy2 c.indicator1##c.indicator1, fe vce(cluster id)

I initially chose xtregar because of the large T, but I now suspect that may be wrong. Note also that I re-estimate the model on sub-periods, which reduces T to only ten years (T = 10).

The estimates for the entire period and for the sub-periods must be compared.
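One way to lay out full-period and sub-period estimates side by side, sketched on nlswork with the same specification in every window. The split at 1978 is an arbitrary assumption for illustration only.

```stata
* Illustration on nlswork; year runs from 68 to 88 in these data
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year

* Full period
xtreg ln_wage age, fe vce(cluster idcode)
estimates store full

* Two sub-periods (hypothetical cut-off at 1978)
xtreg ln_wage age if inrange(year, 68, 77), fe vce(cluster idcode)
estimates store early
xtreg ln_wage age if inrange(year, 78, 88), fe vce(cluster idcode)
estimates store late

* Coefficients, standard errors, observations and number of panels
estimates table full early late, b se stats(N N_g)
```

A side-by-side table is descriptive; a formal test of whether a coefficient changes between periods needs both periods in one regression, for example with period interactions.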

When I run xtreg, many of the dummies and indicators are omitted because of collinearity, although their squared terms are not.
The number of panels also falls to 42, and the number of observations falls accordingly, so I lose information on those panels and on the entire groups they belong to.

The F tests are:

F(87,1016) = 4.21, Prob > F = 0.0000 for xtreg
F(87,974) = 2.77, Prob > F = 0.0000 for xtregar

With xtregar I get better results.

For estimation I was thinking of a dynamic panel estimator, such as the command written by Sebastian Kripfganz, but I am not sure this is the correct way to go. I do not have instruments in the model.

Digging in the forum, I found that Jeff Wooldridge says he avoids xtregar:
https://www.statalist.org/forums/for...72#post1572672

Other panel experts (Joao Santos Silva, Clyde Schechter, Carlo Lazzaro) have expressed different views in older threads.

Can you please suggest which method is best?
In any case, the method must be able to produce margins.

Last but not least, have I written the factor-variable notation and the regression correctly in Stata, given the equation?
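A sketch of the factor-variable syntax on nlswork, where c.age##c.age plays the role of c.indicator1##c.indicator1. Written this way, -margins- knows that the linear and squared terms belong together. In Giorgio's command, the 0/1 dummies could enter as i.dummy1 i.dummy2 so that -margins- treats them as discrete, and l.gdp requires the data to be -xtset- first.

```stata
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year   // required before using L. or other time-series operators

* c.age##c.age enters age and age squared; in Giorgio's model this would be
* c.indicator1##c.indicator1, with dummies entered as i.dummy1 i.dummy2
xtreg ln_wage c.age##c.age, fe vce(cluster idcode)

* Average marginal effect of age, combining the linear and squared terms
margins, dydx(age)

* Turning point of the quadratic, -b1/(2*b2), with a delta-method standard error
nlcom -_b[age]/(2*_b[c.age#c.age])
```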

Last edited by Giorgio Di Stefano; 22 Mar 2022, 15:25.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17730
#2

23 Mar 2022, 00:53

Giorgio:
you did not post the output tables, so I cannot say.

Kind regards,
Carlo
(Stata 19.0)
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#3

23 Mar 2022, 11:15

Originally posted by Carlo Lazzaro

Giorgio:
you do not post outcome tables, so I cannot say.

I am running a big panel.
Here is the first part of the diagnostics; would that help? I am getting statistically better results with xtregar, but that does not mean it is correct. I cannot post the regression output for the variables: there are far too many of them, and there are privacy reasons.

I have not yet tried GMM or other dynamic panel estimators. I much appreciate your help!

xtreg varlist, fe

Fixed-effects (within) regression               Number of obs     =      1,145
Group variable: id                              Number of groups  =         42

R-sq:                                           Obs per group:
     within  = 0.2648                                         min =          4
     between = 0.0071                                         avg =       27.3
     overall = 0.1767                                         max =         40

                                                F(87,1016)        =       4.21
corr(u_i, Xb)  = -0.2466                        Prob > F          =     0.0000

xtreg varlist, fe vce(cluster id)

Fixed-effects (within) regression               Number of obs     =      1,145
Group variable: id                              Number of groups  =         42

R-sq:                                           Obs per group:
     within  = 0.2648                                         min =          4
     between = 0.0071                                         avg =       27.3
     overall = 0.1767                                         max =         40

                                                F(44,41)          =          .
corr(u_i, Xb)  = -0.2466                        Prob > F          =          .

                                     (Std. Err. adjusted for 42 clusters in id)

xtregar varlist, fe

FE (within) regression with AR(1) disturbances  Number of obs     =      1,103
Group variable: id                              Number of groups  =         42

R-sq:                                           Obs per group:
     within  = 0.1986                                         min =          3
     between = 0.0904                                         avg =       26.3
     overall = 0.1019                                         max =         39

                                                F(87,974)         =       2.77
corr(u_i, Xb)  = -0.3154                        Prob > F          =     0.0000
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17730
#4

23 Mar 2022, 11:37

Giorgio:
as per your description, I would stick with -xtregar,fe-.

Kind regards,
Carlo
(Stata 19.0)
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#5

23 Mar 2022, 12:42

Originally posted by Carlo Lazzaro

Giorgio:
as per your description, I would stick with -xtregar,fe-.

That's what I thought.
Nonetheless, I know that xtregar is meant for T > N. That is fine for the entire panel, but won't it cause problems when I estimate sub-periods with N > T?
And does xtregar correct for heteroskedasticity? I am thinking of using xtreg, vce(cluster panelid) for the shorter periods. Will the results be comparable to the full-sample results if I alternate between the two? How should I handle the shorter periods?
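On the heteroskedasticity question: -xtregar- models the AR(1) disturbance but does not accept a vce() option, so it offers no heteroskedasticity-robust standard errors. A sketch contrasting the two routes on nlswork (illustrative data, not the macro panel):

```stata
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year

* FE with AR(1) disturbances: rho is estimated and the data are quasi-differenced;
* no vce(robust) or vce(cluster) option is available here
xtregar ln_wage age, fe
display "estimated AR(1) coefficient: " e(rho_ar)

* Alternative: plain within estimator with panel-clustered standard errors,
* robust to heteroskedasticity and to serial correlation within panels
xtreg ln_wage age, fe vce(cluster idcode)
```

The first route changes the point estimates (through the quasi-differencing); the second keeps the within estimates and only adjusts the standard errors.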
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#6

23 Mar 2022, 12:55

Originally posted by Carlo Lazzaro

Giorgio:
as per your description, I would stick with -xtregar,fe-.

Carlo, I've just run a Hausman test and got these results at the end:

Note: the rank of the differenced variance matrix (59) does not equal the
number of coefficients being tested (87); be sure this is what you
expect, or there may be problems computing the test. Examine the
output of your estimators for anything unexpected and possibly
consider scaling your variables so that the coefficients are on a
similar scale.

b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test: Ho: difference in coefficients not systematic

                  chi2(59) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                           =   -89.82    chi2<0 ==> model fitted on these
                                         data fails to meet the asymptotic
                                         assumptions of the Hausman test;
                                         see suest for a generalized test

Should I be concerned about my model?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17730
#7

24 Mar 2022, 09:21

Giorgio:
first, you should decide which setting best suits your research goal: T>N or N>T.
That said, -hausman- output is, from time to time, a painful read.
You can replace it with the community-contributed modules -rhausman- or -xtoverid- (the latter if you go -xtreg,re-).
Last but not least, you may want to take a look at https://blog.stata.com/tag/mundlak/
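A minimal sketch of the Mundlak approach from the linked blog, on nlswork rather than the macro panel: add the panel means of the time-varying regressors to a random-effects model and test them with cluster-robust standard errors. The statistic cannot be negative, and because the means are ordinary variables, it also sidesteps the factor-variable problem Giorgio reports below with the community-contributed tests (means for each factor level would have to be created by hand; that step is not shown).

```stata
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year

* Panel-level mean of the regressor, computed on the estimation sample only
generate byte touse = !missing(ln_wage, age)
bysort idcode: egen mean_age = mean(cond(touse, age, .))

* Random effects plus the panel mean, with cluster-robust standard errors
xtreg ln_wage age mean_age if touse, re vce(cluster idcode)

* H0: coefficient on mean_age = 0, i.e. no evidence against random effects
test mean_age
```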

Kind regards,
Carlo
(Stata 19.0)
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#8

24 Mar 2022, 11:27

Originally posted by Carlo Lazzaro (#7)

Carlo,

Both N > T and T > N matter, because I need to compare the results for the entire period with those for the shorter sub-periods.

I've looked at all of them, but because of the factor variables, the relations between the variables and the notation in my model, the procedures cannot be completed. With the classic Hausman test, run with the sigmamore option, I get:

b = consistent under Ho and Ha; obtained from xtreg
B = inconsistent under Ha, efficient under Ho; obtained from xtreg

Test: Ho: difference in coefficients not systematic

                  chi2(39) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                           =   151.18
                Prob>chi2 =   0.0000
                (V_b-V_B is not positive definite)

I know that I have dummies here that may play a role.

I would like to include individual country effects in the model. Heteroskedasticity (or rather heterogeneity) is clearly present in the data, along with autocorrelation.

I also found xtfevd, which has attracted a lot of criticism.

So I might go for xtreg, vce(cluster panelid), but that would be a blind choice, based on testparm; it is more of a best-guess option.

By the way, what is the difference between areg y x1, absorb(country) and xtreg y x1, vce(cluster id)?

Or rather, how can I identify time-varying coefficients and obtain time-varying effects?

Last edited by Giorgio Di Stefano; 24 Mar 2022, 11:30.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#9

24 Mar 2022, 11:53

You have asked me in this other thread if my xtdpdgmm command can be of use here. As I mentioned there, it is normally used for situations where T is relatively small. As long as you ensure that the number of instruments remains small when T gets large, you might still be able to use it here. You do not necessarily need additional external instruments. You can simply use internal instruments, which are lags (or lagged differences) of the variables in your model. If you want to proceed along those lines, I recommend familiarizing yourself with the literature on dynamic panel models. The respective methods cannot be explained in just a few sentences here.
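A minimal sketch of what a specification with internal instruments can look like, using the Arellano-Bond employment data shipped with Stata (abdata) rather than the GDP panel. It assumes the community-contributed xtdpdgmm is installed (ssc install xtdpdgmm); the lag ranges are illustrative choices, not recommendations, and it is the lag limits together with collapse that keep the instrument count from growing with T.

```stata
* Requires: ssc install xtdpdgmm
webuse abdata, clear
xtset id year

* Difference GMM with internal instruments only:
*  - lags 2 to 4 of n instrument the differenced equation (n is the dependent variable)
*  - lags 1 to 3 of w and k, treating them as predetermined
*  - collapse uses one instrument column per lag instead of one per lag and period
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
    nocons twostep vce(robust)

* Arellano-Bond tests for serial correlation and Hansen overidentification test
estat serial
estat overid
```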

Notice that in your model GDP growth is regressed on lagged GDP. Because GDP growth is typically approximated as the difference of log GDP and lagged log GDP, your model is essentially already a dynamic model. GDP is definitely not a strictly exogenous regressor. If T is small, xtreg and xtregar are not appropriate. If T is large, the bias from including such a lagged dependent variable vanishes, and therefore using those commands might be acceptable.
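Written out in the notation Giorgio uses later in the thread (gdpgrowth = a + (1-a)gdp_{t-1} + ...), with y the log of GDP, z the controls and μ_i a country effect, the point is that the growth regression is an autoregression in levels:

```latex
\begin{aligned}
\Delta y_{it} \equiv y_{it} - y_{i,t-1} &= a + (1-a)\, y_{i,t-1} + z_{it}'\gamma + \mu_i + \varepsilon_{it} \\
\Longleftrightarrow \quad y_{it} &= a + (2-a)\, y_{i,t-1} + z_{it}'\gamma + \mu_i + \varepsilon_{it}
\end{aligned}
```

After the within transformation, demeaned lagged GDP contains the country mean of past errors and is therefore correlated with the demeaned error; that correlation, and the resulting bias, shrinks as T grows.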

https://www.kripfganz.de/stata/

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17730

#10

24 Mar 2022, 12:01

Giorgio:
I fail to get what you're after.
As far as your last question is concerned:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage age, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1026                                         min =          1
     Between = 0.0877                                         avg =        6.1
     Overall = 0.0774                                         max =         15

                                                F(1,4709)         =     884.05
corr(u_i, Xb) = 0.0314                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
       _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
-------------+----------------------------------------------------------------
     sigma_u |  .40635023
     sigma_e |  .30349389
         rho |  .64192015   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. areg ln_wage age, abs(idcode) vce(cluster idcode)

Linear regression, absorbing indicators             Number of obs     = 28,510
Absorbed variable: idcode                           No. of categories =  4,710
                                                    F(1, 4709)        = 738.02
                                                    Prob > F          = 0.0000
                                                    R-squared         = 0.6636
                                                    Adj R-squared     = 0.5970
                                                    Root MSE          = 0.3035

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006675    27.17   0.000     .0168262    .0194436
       _cons |   1.148214   .0193889    59.22   0.000     1.110202    1.186225
------------------------------------------------------------------------------
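The two coefficients are identical; the standard errors differ only through the small-sample adjustment. A quick check using the figures above, consistent with -areg- counting the 4,709 absorbed indicators in its (N-1)/(N-k) factor while -xtreg, fe- does not (the panels are nested within the clusters). The R-squared values differ for a related reason: -areg- reports the fit including the absorbed indicators, -xtreg- the within fit.

```stata
* Figures copied from the two outputs above (nlswork: 28,510 obs, 4,710 panels)
scalar n_obs    = 28510
scalar n_groups = 4710
scalar k_xtreg  = 2                          // age and _cons
scalar k_areg   = 2 + scalar(n_groups) - 1   // plus 4,709 absorbed indicators
scalar se_xtreg = .0006099

* The cluster factor G/(G-1) is common to both, so only (N-1)/(N-k) differs
scalar ratio = sqrt((scalar(n_obs) - scalar(k_xtreg)) / (scalar(n_obs) - scalar(k_areg)))
display "implied areg s.e. = " %9.7f scalar(se_xtreg) * scalar(ratio)
```

The implied value, about .0006675, matches the -areg- standard error for age.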


Kind regards,
Carlo
(Stata 19.0)


Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#11

24 Mar 2022, 12:54

Originally posted by Sebastian Kripfganz (#9)

Yes, I did ask you in another thread. My T is large enough (T = 80, larger than N = 50), but I then reduce T to 10, and N either stays the same (50) or is reduced according to the groups I am considering. I need to consider both: the full sample and the reduced sub-periods by group.
Now, will the bias from using those commands still be acceptable in the sub-periods, or do I need to go in another direction, and if so, which one?
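A rough sense of magnitudes for the sub-period question, using Nickell's (1981) large-N approximation for the bias of the fixed-effects estimator of an autoregressive coefficient: plim(rho_hat - rho) is approximately -(1 + rho)/(T - 1). Here rho would be the coefficient on lagged log GDP in levels; the value 0.5 below is an arbitrary assumption, not an estimate from this thread.

```stata
* Nickell (1981) approximation: plim(rho_hat - rho) ~ -(1 + rho)/(T - 1)
* rho = 0.5 is an arbitrary illustrative value
local rho = 0.5
foreach T in 80 10 {
    display "T = `T': approximate bias = " %7.3f (-(1 + `rho')/(`T' - 1))
}
```

With rho = 0.5 this gives about -0.019 at T = 80 but about -0.167 at T = 10, so a bias that is negligible over the full period is not negligible in ten-year windows. The formula is for a pure AR(1) with fixed effects and |rho| < 1; additional regressors change the exact figure.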

As for GMM, I cannot think of any of my variables that could play the role of an instrument without risking a visit from the instrumental-variables police!

On another point, my model states that gdpgrowth = a + (1-a)gdp_{t-1} + etc. In my hypothesis, the term (1-a)gdp_{t-1} is essential. Should I still include the constant term a, or can I omit it?

Finally, what would be a way to include individual country effects and time variation in a fixed-effects model like this? Would clustering by country be enough?
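A sketch on nlswork of how panel (country) and time effects can enter together: the fe option removes the panel effects, i.year adds a full set of time effects, and testparm tests the time effects jointly. Clustering by country does not add country effects; it only changes the standard errors. ttl_exp is used instead of age because, within a woman, age moves almost one-for-one with year and would be nearly collinear with the year effects.

```stata
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year

* Panel effects via -fe-, time effects via i.year, clustered standard errors
xtreg ln_wage ttl_exp i.year, fe vce(cluster idcode)

* Joint test that all year effects are zero
testparm i.year
```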

Thanks a million for the suggestions!

Last edited by Giorgio Di Stefano; 24 Mar 2022, 13:02.

Giorgio Di Stefano

Join Date: Oct 2021
Posts: 154

#12

24 Mar 2022, 13:01

Originally posted by Carlo Lazzaro (#10)

Carlo, I asked Sebastian above; should you wish to comment, I would be grateful.
Regarding xtreg y x1, fe vce(cluster idcode) and areg y x1, abs(idcode) vce(cluster idcode): I can see the results, but I do not understand how the two differ. And, similar to what I asked Sebastian above, how could I include individual country effects and time-varying coefficients in this model?


Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#13

24 Mar 2022, 16:33

What about reghdfe in this case?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17730
#14

25 Mar 2022, 00:58

Giorgio:
the -areg- and -xtreg- entries in the Stata .pdf manuals clearly report the differences between the two.
I support your idea of giving the community-contributed module -reghdfe- a shot (provided you're dealing with an N>T dataset).
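A sketch of that comparison on nlswork, assuming the community-contributed reghdfe and its dependency ftools are installed from SSC. The slope from absorbing panel and year effects should match the one from explicit year dummies; reghdfe drops singleton observations by default, so N (and possibly the standard errors) can differ slightly.

```stata
* Requires: ssc install ftools   and   ssc install reghdfe
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year

* Two-way fixed effects with explicit year dummies
xtreg ln_wage ttl_exp i.year, fe vce(cluster idcode)
scalar b_xtreg = _b[ttl_exp]

* Same model, absorbing both sets of effects instead of reporting dummies
reghdfe ln_wage ttl_exp, absorb(idcode year) vce(cluster idcode)
scalar b_hdfe = _b[ttl_exp]

display "xtreg: " scalar(b_xtreg) "   reghdfe: " scalar(b_hdfe)
```

Absorbing the effects saves computation and output, not degrees of freedom: the effects are still estimated implicitly from the data.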

Kind regards,
Carlo
(Stata 19.0)
Giorgio Di Stefano

Join Date: Oct 2021

Posts: 154
#15

28 Mar 2022, 13:01

Originally posted by Carlo Lazzaro (#14)

Carlo,
I am dealing with T > N, but I also switch to N > T to estimate sub-periods.

I have the following points to ask before closing this topic.

1. If I alternate between xtreg, vce(cluster panelid) when T > N and reghdfe when N > T, will I get significantly different results from using two different commands?

2. How could I get individual country and time effects without creating dummy variables and thus losing a lot of degrees of freedom?

3. How can I include or capture time-varying coefficients in the model?
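For question 3, one common way to let a coefficient vary over time is to interact it with period indicators. A sketch on nlswork; the period variable late (1978 onwards) is hypothetical, and c.ttl_exp#i.year or c.ttl_exp#c.year would give year-specific or linearly trending slopes instead.

```stata
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year

* Hypothetical period indicator: 1 from 1978 onwards (year is stored as 68-88)
generate byte late = year >= 78

* Let the slope on ttl_exp differ between periods; the panel effects stay common
xtreg ln_wage c.ttl_exp##i.late, fe vce(cluster idcode)

* Slope of ttl_exp in each period
margins late, dydx(ttl_exp)

* H0: the coefficient on ttl_exp is the same in both periods
testparm c.ttl_exp#i.late
```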
