  • Regression preconditions in cross-sectional time-series data.

    Hi!

    I have a dataset that is a hierarchical, cross-sectional time series.

    There are 438 units, each belonging to one of 19 superior clusters. The observations span 46 occasions (monthly data, e.g. 2012m1 through 2015m10), making 20130 observations.

    I have a question about whether or not the same preconditions exist for simple multivariate linear regression models and multivariate cross-sectional time series models?

    Some of the usual preconditions specified in my literature (which is in Norwegian, so it won't be helpful here), and echoed in the dss.princeton.edu/training series for Stata, are:
    - normally distributed residuals
    - the absence of heteroskedasticity
    - the absence of omitted-variable bias.
    These preconditions are tested with these commands:

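    regress Y X1 X2 X3
    estat hettest                      // Breusch-Pagan test for heteroskedasticity
    estat ovtest                       // Ramsey RESET test for omitted variables
    predict double resid, residuals    // swilk needs a variable to test
    swilk resid                        // Shapiro-Wilk test for normal residuals
    linktest                           // run last: it replaces the stored estimates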


    Are these tests, and others based on the regress command, still valid once the data have been declared as time-series cross-sectional with xtset id time?

    I have tried to find the answers in the methodology literature (as per Statalist's FAQ). Maybe the question is too basic, but how do I test the basic assumptions of the regression methodology when I have xt data? I know many of them stop mattering once they are corrected for (e.g. robust standard errors for heteroskedasticity), but I still want to inspect for flaws.

    Alternatively:

    What are the definitive model precondition statistical tests for xt-data that I cannot do without?




    Thanks,
    Andreas Roaldsnes

  • #2
    In theory, the same issues apply to many panel estimators as apply to simple OLS regression. However, I don't know that all of the OLS tests and checks have easy analogues for panel data. xtreg,fe is identical to reg with i.panel as one of the rhs variables. So, if your version of Stata will handle the model size you're estimating and you're interested in a fixed effect estimator, you can run a fixed effect regression using regress i.panel and then use the OLS tests.
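
The equivalence described above can be sketched as follows. This is a minimal illustration, not the poster's model: it assumes the Grunfeld example data shipped with Stata (variables invest, mvalue, kstock, company, year) as a stand-in for the 438-unit panel.

```stata
* Sketch only: Grunfeld data stand in for the poster's panel
webuse grunfeld, clear
xtset company year

* Within (fixed-effects) estimator
xtreg invest mvalue kstock, fe

* Same slope coefficients from OLS with one dummy per panel unit
regress invest mvalue kstock i.company

* The usual -regress- postestimation checks are now available
estat hettest                      // Breusch-Pagan heteroskedasticity test
estat ovtest                       // Ramsey RESET test
predict double ehat, residuals     // residuals from the dummy-variable model
swilk ehat                         // Shapiro-Wilk normality test
linktest                           // run last: it replaces the stored estimates
```

The slope coefficients on mvalue and kstock should agree between the two fits; the checks that follow apply to the dummy-variable regression.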

    We often use clustered robust standard errors in panel models since heteroskedasticity is so common. You should be aware that with 20,000 observations you could easily have a statistically significant test statistic even though the actual issue is very small.


    • #3
      Andreas:
      as an aside to Phil's helpful comments, I would only add that:
      - under -xtreg- (and unlike -regress-) the -robust- and -cluster- options for standard errors work the same, and they handle not only residual heteroskedasticity but autocorrelation, too;
      - you may also want to investigate quasi-extreme multicollinearity. This task is easily accomplished via -estat vif- after -regress-. Unfortunately, since -xtreg- does not support -estat vif-, you may want to consider -estat vce, corr- instead.
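
A short sketch of the two collinearity checks named above, again assuming the Grunfeld example data as a stand-in for the poster's variables:

```stata
* Sketch only: Grunfeld data stand in for the poster's panel
webuse grunfeld, clear
xtset company year

* Variance inflation factors are available after -regress-
regress invest mvalue kstock
estat vif

* After -xtreg-, inspect the correlations among the coefficient
* estimates instead; large absolute values hint at collinearity
xtreg invest mvalue kstock, fe vce(cluster company)
estat vce, correlation
```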
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Thank you!

        I have run the xtserial test and found my model to be just shy of first-order serial correlation. The command xtserial y x1 x2 x3 (etc.) returned a p-value of 0.065, thereby rejecting the null of first-order serial correlation.
        But should I be worried about not rejecting the null by a clearer margin than 95%? Should the rejection of the null have a higher threshold, let's say ten percent instead of five percent?

        The model I will run is:

        xtreg y x1 x2 x3, mle vce(bootstrap)

        Am I correct in assuming that vce(bootstrap) will give me robust standard errors and protect me from first order autocorrelation the same way vce(robust) will? It seems that way from most of the documentation I'm reading.


        Thank you again,
        Andreas


        • #5
          Andreas:
          - as per the -xtserial- outcome, I would not worry about autocorrelation;
          - I cannot say whether bootstrap can shelter you from first-order autocorrelation; anyway, you do not seem to have that problem.

          As a sidelight, if you are dealing with a hierarchical model, why not consider the -mixed- capabilities?
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            You can try using the xtqptest and xthrtest (available on ssc) to get a second opinion on whether you have serial correlation.

            Note that vce only affects the way your standard errors are calculated. If your model is misspecified (e.g. static instead of dynamic, levels vs differences, trends or not, ...), choosing robust/cluster/bootstrap won't save your estimates. The presence of serial correlation can be an indication of mis-specification (though it is usually ignored, at least in economics).
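
The point that vce changes only the standard errors can be checked directly. The sketch below assumes the Grunfeld example data and a random-effects fit purely for illustration; the stored-result names conventional and clustered are arbitrary.

```stata
* Sketch only: Grunfeld data stand in for the poster's panel
webuse grunfeld, clear
xtset company year

* Same model, two ways of computing the standard errors
xtreg invest mvalue kstock, re
estimates store conventional

xtreg invest mvalue kstock, re vce(cluster company)
estimates store clustered

* Point estimates match across columns; only the standard errors differ
estimates table conventional clustered, b(%9.4f) se(%9.4f)
```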


            • #7
              Carlo: I have considered -xtmixed-, but I'm unsure if I have the theoretical justification for running a multilevel model.
              My higher level is a cluster of counties to which municipalities belong. Could I still run a multilevel model if I'm unsure of the theoretical reasoning, just to protect myself from cross-sectional dependence?

              Jesse: I may be too green here, but I don't understand what static vs dynamic pertains to in model specification. As for levels vs differences, most of my variables, including the dependent variable, are ratios expressed in percentages (0 min - 1 max), so I guess they are levels, not differences.
              Trends or not is more difficult, as the -dfuller- test for stationarity does not work with multiple panels (e.g. xtreg), which is kind of my fundamental problem. Few of the -regress- postestimation commands are easily transferable to -xt- data.
              I have considered misspecification, though. Any advice?

              Thank you!

              Andreas


              • #8
                Andreas:
                you seem to have a theoretical justification to switch to -mixed- (of which -xtmixed- is the ancestor) if your municipalities are nested within counties (the issue might be whether you are dealing with a convenience or a randomized sample of municipalities and/or counties).
                Moreover, -mixed- supports the -vce(cluster clustvar)- option for standard errors.
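
A minimal sketch of such a nested model with clustered standard errors. It assumes the productivity example data from the Stata manuals (states nested in regions) as a stand-in for municipalities nested in counties; in the poster's data the corresponding random-effects part would presumably read || level2_id: || id:, which is an assumption, not something confirmed in the thread.

```stata
* Sketch only: states-in-regions data stand in for
* municipalities-in-counties
webuse productivity, clear

* Random intercepts for region (top level) and for state within region,
* fit by maximum likelihood; standard errors are clustered on the
* top-level grouping variable
mixed gsp private emp hwy water other unemp || region: || state:, ///
    mle vce(cluster region)
```

With few top-level groups (9 regions here, 19 counties in the poster's case), cluster-robust standard errors rest on a small number of clusters and are worth reading with caution.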
                Kind regards,
                Carlo
                (Stata 19.0)


                • #9
                  Originally posted by Andreas Roaldsnes
                  Jesse: I may be too green here, but I don't understand what static vs dynamic pertains to in model specification. As for levels vs differences, most of my variables, including the dependent variable, are ratios expressed in percentages (0 min - 1 max), so I guess they are levels, not differences.
                  Trends or not is more difficult, as the -dfuller- test for stationarity does not work with multiple panels (e.g. xtreg), which is kind of my fundamental problem. Few of the -regress- postestimation commands are easily transferable to -xt- data.
                  I have considered misspecification, though. Any advice?

                  In a dynamic model, lagged dependent and/or independent variables are included. In panel models this opens up a whole can of worms, so it's usually avoided. There are many panel stationarity tests, the most popular currently being Pesaran's CADF test (ssc install pescadf). The interpretation becomes complicated, however, as you might have some stationary and some non-stationary panels... Specification tests in panels are hard to run and difficult to interpret in general.
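
To make the static/dynamic distinction concrete, here is a minimal sketch assuming the Grunfeld example data. The unit-root step uses the built-in Im-Pesaran-Shin test from -xtunitroot- as a stand-in; it is not the CADF test from pescadf mentioned above.

```stata
* Sketch only: Grunfeld data stand in for the poster's panel
webuse grunfeld, clear
xtset company year

* Static specification: no lags on the right-hand side
xtreg invest mvalue kstock, fe

* Dynamic specification: adds the lagged dependent variable
* (the FE estimator of this model is biased when T is small)
xtreg invest L.invest mvalue kstock, fe

* Built-in panel unit-root test (Im-Pesaran-Shin), with
* cross-sectional means removed; requires a balanced panel
xtunitroot ips invest, lags(1) demean
```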


                  • #10
                    Carlo: Both the municipalities and the counties include all potential members.
                    So there are 438 municipalities (let's call them -id-) and 19 counties (let's call them level2_id).
                    I've been looking for an answer as to whether I need vce(robust) or vce(cluster level2_id) when I run the model using level2_id as level 2 in the model.

                    That is:
                    xtset id timevar
                    xtmixed y x1 x2 x3 || level2_id: , mle vce(robust)

                    For Jesse (and Carlo):
                    The model does not include lagged variables yet, but I might consider toying with them if it proves necessary.
                    Most of the cross-sectional dependence tests show dependence (Pesaran's CADF, xtunitroot).
                    But is that cross-sectional dependence relevant when I know the structure of the spatial dependence, as it is very likely (theoretically at least) clustered around the level-2 geographical cluster?
                    Does running a multilevel model (with a spatial level 2) take care of cross-sectional dependence? And if not, are there any approaches to fixing it? I find none.

                    (Just a reminder: I have no serial correlation, even though I have a large T (46).)

                    thanks again!

                    Andreas


                    • #11
                      Originally posted by Andreas Roaldsnes
                      I have run the xtserial test and found my model to be just shy of first-order serial correlation. The command xtserial y x1 x2 x3 (etc.) returned a p-value of 0.065, thereby rejecting the null of first-order serial correlation.
                      It might sound nit-picking, but you are NOT REJECTING the null of NO serial correlation.
                      https://www.kripfganz.de/stata/


                      • #12
                        Originally posted by Andreas Roaldsnes
                        Most of the cross-sectional dependence tests show dependence (Pesaran's CADF, xtunitroot).
                        But is that cross-sectional dependence relevant when I know the structure of the spatial dependence, as it is very likely (theoretically at least) clustered around the level-2 geographical cluster?
                        Does running a multilevel model (with a spatial level 2) take care of cross-sectional dependence? And if not, are there any approaches to fixing it? I find none.

                        The CADF tests for stationarity, not cross-sectional dependence? You need xtcd for that.


                        • #13
                          Originally posted by Sebastian Kripfganz
                          It might sound nit-picking, but you are NOT REJECTING the null of NO serial correlation.
                          Right you are. Damned double negatives.


                          • #14
                            Follow up on the cross-sectional dependence:
                            I've run -xtcsd- to check for cross-sectional dependence and found dependence to be present using every option accompanying the -xtcsd- test (pesaran, friedman and frees).
                            This dependence is likely to be spatial given the data in my model. The level-1 municipalities are clustered into level-2 states. There is no random sample; every state and municipality is included.
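
For readers who want to reproduce this kind of check, a minimal sketch follows. It assumes the user-written xtcsd command has been installed (for example with ssc install xtcsd) and uses the Grunfeld example data rather than the poster's.

```stata
* Sketch only: requires the user-written -xtcsd- command
webuse grunfeld, clear
xtset company year

* -xtcsd- is a postestimation command for -xtreg, fe- or -xtreg, re-
xtreg invest mvalue kstock, fe

xtcsd, pesaran abs     // Pesaran's CD test, plus average absolute correlation
xtcsd, frees           // Frees' test
xtcsd, friedman        // Friedman's test
```

All three tests take cross-sectional independence of the residuals as the null hypothesis.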

                            Carlo Lazzaro: You advocated a multilevel model before. Is it possible to argue that the (spatial) level-2 part of a multilevel model will correct for the dependence?
                            My model is not dynamic (which is supposed to be the worst case for CSD), but the consequences of not correcting for CSD seem severe according to De Hoyos and Sarafidis in the Feb 2006 Stata Journal. It seems both significance testing and coefficients will be off.

                            If a multilevel model is not the answer in itself, is there some kind of correction procedure recommended? I have considered -xtgls- and -xtpcse-, but they only work with fixed-effects models, and my Hausman test was clear that a random-effects model was preferable, barring me from those solutions.

                            As always: thanks for the input.

                            Andreas


                            • #15
                              What you should do largely depends on your field. It's quite likely no one in your field has ever heard of cross-sectional dependence. At least AFAIK, most economics papers, for example, don't mention it, even though they almost definitely suffer from it. The most popular technique currently is to add cross-sectional averages to your equation; this is the "CCEP" estimator, aka the common correlated effects pooled estimator.
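
The idea of adding cross-sectional averages can be sketched by hand. This is a rough illustration under stated assumptions, not a reference implementation of CCEP: it uses the Grunfeld example data, builds per-period averages of the dependent and independent variables (the _bar names and the local macro cce are made up here), and lets each unit have its own coefficient on those averages.

```stata
* Sketch only: hand-rolled pooled CCE-style regression
webuse grunfeld, clear
xtset company year

* Cross-sectional average of each variable in every period, plus a
* unit-specific loading on each average
local cce
foreach v of varlist invest mvalue kstock {
    bysort year: egen double `v'_bar = mean(`v')
    local cce `cce' i.company#c.`v'_bar
}

* Pooled slopes on mvalue and kstock, unit intercepts, and unit-specific
* coefficients on the cross-sectional averages
regress invest mvalue kstock i.company `cce'
```

The standard errors from this plain -regress- call are the conventional OLS ones; a dedicated user-written estimator would be the safer choice for inference.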
