  • xtreg, xtgls, xtpcse, OR xtregar?

    Hello,

    I am trying to estimate the effect of a state law on the number of crashes relative to total vehicle miles traveled. To assess the effect of the law's implementation, I created a dummy variable, law (1 = presence of the law; 0 = absence of the law). My unit of analysis is the state, and I have pooled cross-sectional time-series data with N=50 and T=29 (1985-2014), about 1,500 observations. There is significant autocorrelation in the dataset based on the Wooldridge test (p < .001), along with heteroskedasticity and contemporaneous correlation. The data are stationary.
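    For reference, the diagnostics were run along these lines (a sketch, assuming the user-written commands -xtserial-, -xttest3-, and -xtcsd-, all available from SSC):
    Code:
    xtserial crashes_r_vtm law          // Wooldridge test for serial correlation in panel data
    quietly xtreg crashes_r_vtm l.law, fe
    xttest3                             // modified Wald test for groupwise heteroskedasticity
    xtcsd, pesaran                      // Pesaran test for cross-sectional dependence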

    Here are the models that I consider:
    Code:
    xtgls crashes_r_vtm l.law year Alabama-Wyoming, panels(heteroskedastic) corr(ar1) force
    Code:
    xtreg crashes_r_vtm l.law year, fe cluster(state)
    Code:
    xtregar crashes_r_vtm l.law, fe
    Would you please help me decide which command is best for my N and T? I have read in previous discussions that if N>T, xtreg is more appropriate. Would that be the case in my data even though my T=29? The results for xtgls and xtreg for the dummy variable l.law are significant, but they are not significant for xtregar.
    How can I decide whether I should use xtregar?
    I read on the forum that
    "-xtregar- is recommended whenever you have a T>N panel data structure, when the autocorrelation process is AR(1) (something unfeasible with -xtreg-)." Would that mean that I should choose xtreg with clustered errors? (Again, there is significant autocorrelation.)


    Finally, my second variable is drug-related crashes relative to total vehicle miles traveled. This measure is highly positively skewed. Should I use a log transformation of this variable?
    Thank you so much for your help.

    Sincerely,
    Sylwia

  • #2
    Sylwia:
    as the T dimension of your panel dataset is not negligible, I would consider -xtgls- or -xtregar- (with an -fe- or -re- specification) regardless of the statistical significance of your predictors (which should not be considered the goal of any inference procedure).
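    For instance, both specifications of -xtregar- (a sketch, using your variable names):
    Code:
    xtregar crashes_r_vtm l.law, fe
    xtregar crashes_r_vtm l.law, re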
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Dear Carlo,

      Thank you so much for your quick response. It truly helps a lot.

      I would like to follow up a little on the xtgls model.

      1) Since there is significant autocorrelation and I will include the option corr(ar1), would you suggest that I ALSO include a time trend (year) in the model? How can I decide whether I should include a time trend? Or should I include fixed effects for time instead?

      2) As I mentioned, my second dependent variable is positively skewed. Would you suggest logging this measure?

      3) Finally, I have just seen a similar study on this forum. The author used a difference-in-differences approach instead.
      I tried to follow his study. I created a dummy variable "Intervention": if a state received the law at any time, "Intervention"=1; if a state never received the law, "Intervention"=0. Again, my variable "law" indicates the years in which a state had the law. Then, I created an "interaction" between "Intervention" and "law". However, while "interaction"
      is in the model, "law" and "Intervention" have been dropped (see below). Would you please advise whether this approach is correct and whether it is more suitable for my study than xtgls?
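      For clarity, a sketch of how these variables can be generated (assuming -law- is the 0/1 policy indicator already in the data):
      Code:
      bysort state: egen Intervention = max(law)   // 1 for states that ever adopted the law
      generate interaction = Intervention * law    // equals -law- whenever -law- is 1
      Because -Intervention- is constant within state, the state fixed effects absorb it; and since -law- is 1 only where -Intervention- is 1, -interaction- coincides with -law-, which explains the omissions in the output below.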


      Code:
      . xtreg crashes_r_vtm interaction Intervention law year, fe cluster(state)
      note: Intervention omitted because of collinearity
      note: law omitted because of collinearity
      
      Fixed-effects (within) regression               Number of obs     =      1,600
      Group variable: state                           Number of groups  =         50
      
      R-sq:                                           Obs per group:
           within  = 0.7718                                         min =         32
           between = 0.0018                                         avg =       32.0
           overall = 0.4664                                         max =         32
      
                                                      F(2,49)           =     326.38
      corr(u_i, Xb)  = -0.0323                        Prob > F          =     0.0000
      
                                       (Std. Err. adjusted for 50 clusters in state)
      ------------------------------------------------------------------------------
                   |               Robust
      crashes_r_~m |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
       interaction |  -.1759452   .0342623    -5.14   0.000    -.2447978   -.1070926
      Intervention |          0  (omitted)
               law |          0  (omitted)
              year |  -.0352209   .0017007   -20.71   0.000    -.0386387   -.0318032
             _cons |   72.01776   3.396325    21.20   0.000     65.19259    78.84293
      -------------+----------------------------------------------------------------
           sigma_u |  .32214032
           sigma_e |   .1941473
               rho |  .73355607   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------


      Thank you so much.
      Sylwia

      • #4
        Sylwia:
        1) -corr(ar1)- is OK if you think/suspect the AR(1) process is the same across all panels; otherwise, you should consider the -psar1- option. A time trend is probably more interesting in T>N panel data regressions. I would also check whether a squared term for time matters via the following chunk of code:
        Code:
        c.time##c.time
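        For instance (a sketch, assuming the time variable is -year-, as in your models):
        Code:
        xtgls crashes_r_vtm l.law c.year##c.year, panels(heteroskedastic) corr(ar1)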
        2) There's no prerequisite concerning the normality of the regressand. Logging the dependent variable makes sense if your goal is a log-linear regression model and/or if logging fixes omitted-variable bias and/or heteroskedasticity.
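        For example (a sketch; -drug_r_vtm- is a placeholder name for your second outcome, and this assumes strictly positive values):
        Code:
        generate ln_drug_r_vtm = ln(drug_r_vtm)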
        3) Admittedly, I'm not really familiar with DID. However, I would still stick with -xtgls-, as the T dimension of your panel dataset is not negligible.
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Thank you very much, Carlo. I really appreciate your help!

          • #6
            Dear Carlo,

            In the situation given by Sylwia above, her N=50 and T=29, and you say that, because the T dimension of the panel dataset is not negligible, you would consider -xtgls- or -xtregar-.

            My question is: doesn't that make the choice more complicated? How does one determine whether the T dimension is negligible or not?

            In a lot of the literature, I see that xtgls is ruled out when N>T. What would be your expert opinion on that?

            Thank you.

            • #7
              Sydra:
              unfortunately, there's no hard-and-fast rule about the "right" T dimension that makes a short panel long.
              Much depends on the autocorrelation issue, which needs to be modeled as the T dimension increases.
              Kind regards,
              Carlo
              (Stata 19.0)

              • #8
                Hello,

                I have a short unbalanced panel dataset, N=82 and T=5. The unit of analysis is the firm. There is significant heteroskedasticity and autocorrelation based on the Wooldridge and modified Wald tests. According to the Hausman test, fe is suitable. But because of the autocorrelation and heteroskedasticity in my data, I considered xtgls:

                Code:
                xtgls y1 x1 x2 x3 x4 c1 c2 c3 c4

                Cross-sectional time-series FGLS regression

                Coefficients:  generalized least squares
                Panels:        homoskedastic
                Correlation:   no autocorrelation
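                Note that, run as above, -xtgls- defaults to homoskedastic panels and no autocorrelation (as the header shows); a sketch of the call with options matching the issues detected by my tests would presumably be:
                Code:
                xtgls y1 x1 x2 x3 x4 c1 c2 c3 c4, panels(heteroskedastic) corr(ar1)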

                Please help me decide which command is best given my short unbalanced panel data: xtreg (fe or re), xtgls, or xtpcse? The results for xtgls are significant, but they are not for xtpcse.

                How can I decide which estimation to go for? What are the steps I should follow? For all my IVs, between variation > within variation.

                This is the first time I'm doing panel data modelling.

                Best regards,
                Bhavna

                • #9
                  Bhavna:
                  welcome to this forum.
                  If, in a short (i.e., N>T) panel dataset, you detect heteroskedasticity and/or autocorrelation, you can simply invoke -robust- or -vce(cluster clusterid)- standard errors to handle the issue.
                  Unfortunately, -hausman- does not support non-default standard errors, so you have to switch to the user-written command -xtoverid- to test which specification fits your data better.
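                  A minimal sketch of the -xtoverid- approach (assuming it is installed from SSC; variable names are placeholders):
                  Code:
                  ssc install xtoverid                       // user-written; install once
                  xtreg y1 x1 x2, re vce(cluster firm_id)    // -xtoverid- requires the -re- fit
                  xtoverid                                   // rejection favours -fe- over -re-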
                  Kind regards,
                  Carlo
                  (Stata 19.0)

                  • #10
                    Dear Carlo Lazzaro,

                    I hope you are doing well.

                    I would like to ask you about your last intervention in this thread (#9).

                    Having N>T, if I would like to invoke -robust-, should I run it with the -reg- command or with the -xtreg- command?

                    Best regards,
                    SEDKI

                    • #11
                      Sedki:
                      I do hope you're well, too.
                      If you have a panel dataset, your first choice should be -xtreg-.
                      Assuming that you are going to run an -re- specification model (the default in -xtreg-), your code can be:
                      Code:
                      xtreg <depvar> <indepvars> <controls>, vce(cluster <panelid>)
                      or

                      Code:
                      xtreg <depvar> <indepvars> <controls>, robust
                      as both non-default standard error options do the same job under -xtreg-.

                      As an aside, the above holds for the -fe- specification, too.
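                      A sketch of the -fe- variant (same placeholder syntax):
                      Code:
                      xtreg <depvar> <indepvars> <controls>, fe vce(cluster <panelid>)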
                      Kind regards,
                      Carlo
                      (Stata 19.0)

                      • #12
                        Dear Carlo,

                        Thank you very much.

                        Kind regards,
                        SEDKI

                        • #13
                          Dear Carlo Lazzaro

                          I'd like to know whether, in the case of a heteroskedasticity issue, we systematically have to run the overidentification test instead of the Hausman test to choose between fixed and random effects.

                          Kind regards,
                          SEDKI

                          • #14
                            Sedki:
                            yes, because -hausman- does not support non-default standard errors.
                            Kind regards,
                            Carlo
                            (Stata 19.0)

                            • #15
                              Dear Carlo Lazzaro sir, I came across this thread. I have an unbalanced panel with N=11 and T=17 (maximum), which is a case of long panels.
                              1. I have applied the xtgls, xtregar (fe and re), xtpcse, and xtscc (pooled and fe) commands, but I am not sure which one of these I should rely upon.
                              2. Also, my model has heteroskedasticity. What test can I do to choose between the fe and re specifications of xtregar, if not hausman?
                              3. Further, I would like to use dynamic modelling, and I cannot apply cointegration since my variables are not I(1). I applied panel ARDL and estimated the model using PMG, but the estimates are not consistent. Can you suggest why this may be the issue?
                              Thanks.
