  • xtreg, xtgls, xtpcse, OR xtregar?

    Hello,

    I am trying to estimate the effect of a state law on the number of crashes relative to total vehicle miles traveled. To assess the effect of the law's implementation, I created a dummy variable, law (1 = presence of the law; 0 = absence of the law). My unit of analysis is the state, and I have pooled cross-sectional time-series data with N=50 and T=29 (1985-2014), about 1,500 observations. There is significant autocorrelation in the dataset based on the Wooldridge test (p < .001), along with heteroskedasticity and contemporaneous correlation. The data are stationary.
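    For reference, the diagnostics were run along these lines (a sketch, assuming the user-written commands -xtserial-, -xttest3-, and -xtcsd-, all available from SSC):
    Code:
    xtserial crashes_r_vtm law          // Wooldridge test for serial correlation in panel data
    quietly xtreg crashes_r_vtm l.law, fe
    xttest3                             // modified Wald test for groupwise heteroskedasticity
    xtcsd, pesaran                      // Pesaran test for cross-sectional dependence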

    Here are the models that I consider:
    Code:
    xtgls crashes_r_vtm l.law year Alabama-Wyoming, panels(heteroskedastic) corr(ar1) force
    Code:
    xtreg crashes_r_vtm l.law year, fe cluster(state)
    Code:
    xtregar crashes_r_vtm l.law, fe
    Would you please help me decide which command is best for my N and T? I have read in previous discussions that if N>T, xtreg is more appropriate. Would that be the case in my data even though my T=29? The results for xtgls and xtreg for the dummy variable l.law are significant, but they are not significant for xtregar.
    How can I decide whether I should use xtregar?
    I read on the forum that
    "-xtregar- is recommended whenever you have a T>N panel data structure, when the autocorrelation process is AR(1) (something unfeasible with -xtreg-)." Would that mean that I should choose xtreg with clustered errors? (Again, there is significant autocorrelation.)


    Finally, my second variable is drug-related crashes relative to total vehicle miles traveled. This measure is highly positively skewed. Should I use a log transformation of this variable?
    Thank you so much for your help.

    Sincerely,
    Sylwia

  • #2
    Sylwia:
    as the T dimension of your panel dataset is not negligible, I would consider -xtgls- or -xtregar- (with an -fe- or -re- specification) regardless of the statistical significance of your predictors (which should not be considered the goal of any inference procedure).
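    For instance, both specifications of -xtregar- (a sketch, using your variable names):
    Code:
    xtregar crashes_r_vtm l.law, fe
    xtregar crashes_r_vtm l.law, re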
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Dear Carlo,

      Thank you so much for your quick response. It truly helps a lot.

      I would like to follow up a little on the xtgls model.

      1) Since there is significant autocorrelation and I will include the option corr(ar1), would you suggest that I ALSO include a time trend (year) in the model? How can I decide whether I should include a time trend? Or should I include fixed effects for time instead?

      2) As I mentioned, my second dependent variable is positively skewed. Would you suggest logging this measure?

      3) Finally, I have just seen a similar study on this forum. The author used a difference-in-differences approach instead.
      I tried to follow his study. I created a dummy variable "Intervention": if a state received the law at any time, "Intervention"=1; if a state never received the law, "Intervention"=0. Again, my variable "law" indicates the years in which a state had the law. Then, I created an "interaction" between "Intervention" and "law". However, while "interaction"
      is in the model, "law" and "Intervention" have been dropped (see below). Would you please advise whether this approach is correct and whether it is more suitable for my study than xtgls?
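      For clarity, a sketch of how these variables can be generated (assuming -law- is the 0/1 policy indicator already in the data):
      Code:
      bysort state: egen Intervention = max(law)   // 1 for states that ever adopted the law
      generate interaction = Intervention * law    // equals -law- whenever -law- is 1
      Because -Intervention- is constant within state, the state fixed effects absorb it; and since -law- is 1 only where -Intervention- is 1, -interaction- coincides with -law-, which explains the omissions in the output below.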


      Code:
      . xtreg crashes_r_vtm interaction Intervention law year, fe cluster(state)
      note: Intervention omitted because of collinearity
      note: law omitted because of collinearity
      
      Fixed-effects (within) regression               Number of obs     =      1,600
      Group variable: state                           Number of groups  =         50
      
      R-sq:                                           Obs per group:
           within  = 0.7718                                         min =         32
           between = 0.0018                                         avg =       32.0
           overall = 0.4664                                         max =         32
      
                                                      F(2,49)           =     326.38
      corr(u_i, Xb)  = -0.0323                        Prob > F          =     0.0000
      
                                       (Std. Err. adjusted for 50 clusters in state)
      ------------------------------------------------------------------------------
                   |               Robust
      crashes_r_~m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
       interaction |  -.1759452   .0342623    -5.14   0.000    -.2447978   -.1070926
      Intervention |          0  (omitted)
               law |          0  (omitted)
              year |  -.0352209   .0017007   -20.71   0.000    -.0386387   -.0318032
             _cons |   72.01776   3.396325    21.20   0.000     65.19259    78.84293
      -------------+----------------------------------------------------------------
           sigma_u |  .32214032
           sigma_e |   .1941473
               rho |  .73355607   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------


      Thank you so much.
      Sylwia

      • #4
        Sylwia:
        1) -corr(ar1)- is OK if you think/suspect the AR(1) process is the same across all panels; otherwise, you should consider the -psar1- option. A time trend is probably more interesting in T>N panel data regressions. I would also check whether a squared term for time matters via the following chunk of code:
        Code:
        c.time##c.time
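        For instance (a sketch, assuming the time variable is -year-, as in your models):
        Code:
        xtgls crashes_r_vtm l.law c.year##c.year, panels(heteroskedastic) corr(ar1)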
        2) There's no prerequisite concerning the normality of the regressand. Logging the dependent variable makes sense if your goal is a log-linear regression model and/or if logging fixes omitted-variable bias and/or heteroskedasticity.
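        For example (a sketch; -drug_r_vtm- is a placeholder name for your second outcome, and this assumes strictly positive values):
        Code:
        generate ln_drug_r_vtm = ln(drug_r_vtm)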
        3) Admittedly, I'm not really familiar with DID. However, I would still stick with -xtgls-, as the T dimension of your panel dataset is not negligible.
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Thank you very much, Carlo. I really appreciate your help!

          • #6
            Dear Carlo,

            In the situation given by Sylwia above, her N=50 and T=29, and you say that, because the T dimension of the panel dataset is not negligible, you would consider -xtgls- or -xtregar-.

            My question is: doesn't that make the choice more complicated? How does one determine whether the T dimension is negligible or not?

            In a lot of the literature, I see that xtgls is ruled out when N>T. What would be your expert opinion on that?

            Thank you.

            • #7
              Sydra:
              unfortunately, there's no hard-and-fast rule about the "right" T dimension that makes a short panel long.
              Much depends on the autocorrelation issue, which needs to be modeled as the T dimension increases.
              Kind regards,
              Carlo
              (Stata 19.0)

              • #8
                Hello,

                I have a short unbalanced panel dataset, N=82 and T=5. The unit of analysis is the firm. There is significant heteroskedasticity and autocorrelation based on the Wooldridge and modified Wald tests. According to the Hausman test, fe is suitable. But because of the autocorrelation and heteroskedasticity in my data, I considered xtgls:

                Code:
                xtgls y1 x1 x2 x3 x4 c1 c2 c3 c4

                Cross-sectional time-series FGLS regression

                Coefficients:  generalized least squares
                Panels:        homoskedastic
                Correlation:   no autocorrelation
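                Note that, run as above, -xtgls- defaults to homoskedastic panels and no autocorrelation (as the header shows); a sketch of the call with options matching the issues detected by my tests would presumably be:
                Code:
                xtgls y1 x1 x2 x3 x4 c1 c2 c3 c4, panels(heteroskedastic) corr(ar1)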

                Please help me decide which command is best given my short unbalanced panel data: xtreg (fe or re), xtgls, or xtpcse? The results for xtgls are significant, but they are not for xtpcse.

                How can I decide which estimation to go for? What are the steps I should follow? For all my IVs, between variation > within variation.

                This is the first time I'm doing panel data modelling.

                Best regards,
                Bhavna

                • #9
                  Bhavna:
                  welcome to this forum.
                  If, in a short (i.e., N>T) panel dataset, you detect heteroskedasticity and/or autocorrelation, you can simply invoke -robust- or -vce(cluster clusterid)- standard errors to handle the issue.
                  Unfortunately, -hausman- does not support non-default standard errors, so you have to switch to the user-written command -xtoverid- to test which specification fits your data better.
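                  A minimal sketch of the -xtoverid- approach (assuming it is installed from SSC; variable names are placeholders):
                  Code:
                  ssc install xtoverid                       // user-written; install once
                  xtreg y1 x1 x2, re vce(cluster firm_id)    // -xtoverid- requires the -re- fit
                  xtoverid                                   // rejection favours -fe- over -re-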
                  Kind regards,
                  Carlo
                  (Stata 19.0)

                  • #10
                    Dear Carlo Lazzaro,

                    I hope you are doing well.

                    I would like to ask you about your last intervention in this thread (#9).

                    Having N>T, if I would like to invoke -robust-, should I run it with the -reg- command or with the -xtreg- command?

                    Best regards,
                    SEDKI

                    • #11
                      Sedki:
                      I do hope you're well, too.
                      If you have a panel dataset, your first choice should be -xtreg-.
                      Assuming that you are going to run an -re- specification model (the default in -xtreg-), your code can be:
                      Code:
                      xtreg <depvar> <indepvars> <controls>, vce(cluster <panelid>)
                      or

                      Code:
                      xtreg <depvar> <indepvars> <controls>, robust
                      as both non-default standard error options do the same job under -xtreg-.

                      As an aside, the above holds for the -fe- specification, too.
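                      A sketch of the -fe- variant (same placeholder syntax):
                      Code:
                      xtreg <depvar> <indepvars> <controls>, fe vce(cluster <panelid>)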
                      Kind regards,
                      Carlo
                      (Stata 19.0)

                      • #12
                        Dear Carlo,

                        Thank you very much.

                        Kind regards,
                        SEDKI

                        • #13
                          Dear Carlo Lazzaro

                          I'd like to know whether, in the case of a heteroskedasticity issue, we systematically have to run the overidentification test instead of the Hausman test to choose between fixed and random effects.

                          Kind regards,
                          SEDKI

                          • #14
                            Sedki:
                            yes, because -hausman- does not support non-default standard errors.
                            Kind regards,
                            Carlo
                            (Stata 19.0)

                            • #15
                              Dear Carlo Lazzaro sir, I came across this thread. I have an unbalanced panel with N=11 and T=17 (maximum), which is a case of long panels.
                              1. I have applied the xtgls, xtregar (fe and re), xtpcse, and xtscc (pooled and fe) commands, but I am not sure which one of these I should rely upon.
                              2. Also, my model has heteroskedasticity. What test can I do to choose between the fe and re specifications of xtregar, if not hausman?
                              3. Further, I would like to use dynamic modelling, and I cannot apply cointegration since my variables are not I(1). I applied panel ARDL and estimated the model using PMG, but the estimates are not consistent. Can you suggest why this may be the issue?
                              Thanks.
