  • Best-suited estimator for a large N, small T panel that controls for heteroscedasticity, serial correlation and spatial dependence?

    Dear Statalist users,

    I have several questions regarding the best approach to estimate the following model. I am estimating the effect of temperature on the amount of loans provided to enterprises. Data are aggregated at the municipal level in a single country (there is no bilateral relationship enterprise-bank) in my dataset. Temperature data are split between a trend (slow-moving) and deviation from trend (iid) obtained in an earlier step by decomposing the observed temperature series at the municipal level with a Local Level model in state-space form. I also add control variables at the national level to capture the overall determinants of credit. Credit is trending upwards and its properties resemble GDP (fairly smooth, but with some inter-annual variation).
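The trend/deviation split described above can be illustrated with a minimal local level model: a random-walk trend plus a white-noise deviation, run through a Kalman filter and then a Rauch-Tung-Striebel smoother. This is a sketch under stated assumptions, not the poster's actual procedure: the variances `var_eps` and `var_eta` are taken as known (in practice they would be estimated by maximum likelihood), the large initial variance `p0` is a crude stand-in for a diffuse prior, and the function name is hypothetical.

```python
from typing import List, Tuple


def local_level_smooth(y: List[float], var_eps: float, var_eta: float,
                       p0: float = 1e7) -> Tuple[List[float], List[float]]:
    """Split y into a smoothed trend mu_t and a deviation y_t - mu_t.

    Model (hypothetical illustration): y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.
    var_eps and var_eta are assumed known; p0 approximates a diffuse prior.
    """
    n = len(y)
    a_pred, p_pred = y[0], p0          # predicted state mean and variance
    a_filt, p_filt = [0.0] * n, [0.0] * n
    a_next, p_next = [0.0] * n, [0.0] * n
    for t in range(n):
        f = p_pred + var_eps           # prediction-error variance
        k = p_pred / f                 # Kalman gain
        a_filt[t] = a_pred + k * (y[t] - a_pred)
        p_filt[t] = p_pred * (1.0 - k)
        # random-walk state: prediction for t+1 keeps the mean, adds var_eta
        a_next[t], p_next[t] = a_filt[t], p_filt[t] + var_eta
        a_pred, p_pred = a_next[t], p_next[t]
    # Rauch-Tung-Striebel backward pass (the last smoothed value = filtered)
    trend = a_filt[:]
    for t in range(n - 2, -1, -1):
        j = p_filt[t] / p_next[t]
        trend[t] = a_filt[t] + j * (trend[t + 1] - a_next[t])
    deviation = [yt - mt for yt, mt in zip(y, trend)]
    return trend, deviation
```

Applied to one municipality's annual temperature series, `trend` would play the role of the slow-moving component and `deviation` the role of the deviation from trend; a larger `var_eta` relative to `var_eps` lets the trend follow the data more closely.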

    The dimensions of the dataset are: N=7040, T=35.

    The abridged model is therefore:
    Code:
    Credit_it = L1.Credit_it Temperature_trend_it Temperature_deviation_it L1.GDPnational_t
    I have already checked that the model requires fixed effects (I include only municipality, i.e. panel-id, fixed effects). I am, however, unsure which estimator and which variance estimator are best suited to my case.
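Written out with municipality fixed effects α_i (the symbols below are shorthand for the variables in the abridged model), the specification and the within transformation applied by xtreg, fe can be sketched as follows, where z̄_i is the municipality mean of a variable z over the estimation sample:

```latex
\begin{aligned}
\text{Credit}_{it} &= \rho\,\text{Credit}_{i,t-1} + \beta_1\,\text{TempTrend}_{it} + \beta_2\,\text{TempDev}_{it} + \gamma\,\text{GDP}_{t-1} + \alpha_i + \varepsilon_{it} \\
\widetilde{\text{Credit}}_{it} &= \rho\,\widetilde{\text{Credit}}_{i,t-1} + \beta_1\,\widetilde{\text{TempTrend}}_{it} + \beta_2\,\widetilde{\text{TempDev}}_{it} + \gamma\,\widetilde{\text{GDP}}_{t-1} + \tilde{\varepsilon}_{it},
\qquad \tilde{z}_{it} \equiv z_{it} - \bar{z}_i
\end{aligned}
```

Two consequences follow. First, γ is identified only because the specification has no year fixed effects: year dummies would absorb any regressor that varies only over time. Second, ε̃_it contains ε̄_i, which includes the past shocks ε_{i,t-1}, ε_{i,t-2}, ... that also drive Credit_{i,t-1}, so the demeaned lag is correlated with the demeaned error; this is the source of the Nickell bias, which is of order 1/T.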

    In a nutshell, my question is: what would be the best estimator and variance estimator for a panel dataset with N=7040 and T=35 that accounts for heteroscedasticity, serial correlation and potentially spatial dependence, given that the model includes a lagged dependent variable and some regressors measured at a higher level of aggregation than the dependent variable?

    I break the big question down into several smaller ones:
    1. According to what I have read, some estimators and variance estimators are better suited than others depending on how large or small N and T are. What is the rule of thumb in that context? I am fairly confident that N=7040 is large, but what about T=35?
    2. My reading suggests that
      Code:
      xtreg credit l.credit temptrend tempsdeviation l.GDPnational,fe vce(cluster municipality)
      would be the best overall choice, because the cluster-robust variance estimator accounts for heteroscedasticity and within-municipality autocorrelation at the same time.
      As a side note, I have checked the value of the autocorrelation of residuals in each municipality with the following:
      1. Estimate the model with
        Code:
        xtreg credit l.credit temptrend tempsdeviation l.GDPnational, fe vce(cluster municipality)
      2. Generate the residuals with
        Code:
        * option e returns the idiosyncratic error e_it after xtreg, fe
        predict residuals, e
      3. For each municipality (panel id), obtain the autocorrelation coefficient via
        Code:
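        * one AR(1) regression per municipality; results saved to a (hypothetically named) file
        statsby _b, by(municipality) saving(rho_by_municipality, replace): regress residuals L.residuals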
      4. The residuals of most municipalities are autocorrelated, and the degree of autocorrelation is roughly the same for all municipalities within the same province.
    3. However, having detected autocorrelation, I now wonder whether I should take it into account at the estimation stage. That is, should I use xtregar or xtgee?
    4. Does the fact that I use regressors at the national level (merely for control) introduce some kind of cross-sectional correlation in my model? If so, what would be the solution?
    5. If I have cross-sectional dependence, would you suggest modelling it with a spatial model (at least in the error term) or applying Driscoll and Kraay standard errors via xtscc? The latter inflates my standard errors substantially, but I am not sure what error structure, if any, xtscc imposes or allows.
    6. Given the size of my dataset and the presence of a lagged dependent variable among the regressors, is the Nickell bias a potential problem? If so, should I use xtdpdbc? This method, however, requires serially uncorrelated errors, which I do not have.
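The residual check in item 2 (estimate the model, keep the idiosyncratic residuals, regress each municipality's residual on its own lag) can be sketched outside Stata as follows. The input layout (panel id, year, residual) and the function name are hypothetical; only consecutive years are paired, similar to how the L. operator treats gaps in xtset data.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def ar1_by_panel(rows: List[Tuple[Hashable, int, float]]) -> Dict[Hashable, float]:
    """OLS slope (with intercept) of e_t on e_{t-1}, separately for each panel.

    rows: (panel_id, year, residual). A year without its predecessor
    contributes no lagged pair, as with Stata's L. on xtset data.
    """
    by_panel: Dict[Hashable, Dict[int, float]] = defaultdict(dict)
    for pid, year, e in rows:
        by_panel[pid][year] = e
    rho = {}
    for pid, series in by_panel.items():
        pairs = [(series[t - 1], series[t]) for t in series if t - 1 in series]
        if len(pairs) < 3:
            continue  # too few lagged pairs to estimate a slope
        xs, ys = zip(*pairs)
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        if sxx > 0:
            rho[pid] = sxy / sxx
    return rho
```

Summarising the returned coefficients by province (for example, their median) would reproduce the comparison in step 4.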
    Thanks a lot in advance.

    Best regards,
    Olivier.
    Last edited by sladmin; 04 Oct 2022, 09:20. Reason: reformat text



    • #3
      Lagging the DV requires special attention.

      Links:
      https://www.statalist.org/forums/forum/general-stata-discussion/general/1379035-lagging-one-period-of-the-dependent-variable-in-panel-data-what-model-could-i-use
      https://www.cafed.sssup.it/~federico/etrix_allievi/Baum_on_GMM.pdf



      • #4
        If you still have serially dependent errors after including a lagged dependent variable, you could add higher-order lags of the dependent variable and/or lags of independent variables to obtain a dynamically complete model. T=35 seems fairly large. Unless your variables are quite strongly dependent over time, you may not have to worry too much about the Nickell bias.
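The point about T=35 and the Nickell bias can be checked with a small simulation. The sketch below assumes a pure AR(1) panel with unit-specific intercepts (no other regressors), an illustrative persistence rho = 0.8, and hypothetical function names. It compares the within estimator's bias with Nickell's large-T approximation -(1 + rho)/(T - 1), where T is the number of periods left after lagging (about 34 here, since one year is lost to L.credit).

```python
import random


def simulate(n, t, rho, seed=1, burn=50):
    """n AR(1) panels with unit-specific intercepts, each with t + 1 observations
    (so t periods remain once the first observation is used as a lag)."""
    rng = random.Random(seed)
    panels = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)       # unit fixed effect
        y = alpha / (1.0 - rho)           # start at the unit's long-run mean
        path = []
        for s in range(burn + t + 1):
            y = alpha + rho * y + rng.gauss(0.0, 1.0)
            if s >= burn:                 # discard burn-in draws
                path.append(y)
        panels.append(path)
    return panels


def within_ar1(panels):
    """Within (fixed-effects) OLS estimate of rho in y_it = a_i + rho*y_i,t-1 + e_it."""
    num = den = 0.0
    for y in panels:
        lag, cur = y[:-1], y[1:]
        m_lag, m_cur = sum(lag) / len(lag), sum(cur) / len(cur)
        num += sum((a - m_lag) * (b - m_cur) for a, b in zip(lag, cur))
        den += sum((a - m_lag) ** 2 for a in lag)
    return num / den


def nickell_approx(rho, t):
    """Nickell's large-T approximation to the bias: -(1 + rho) / (T - 1)."""
    return -(1.0 + rho) / (t - 1)


if __name__ == "__main__":
    rho = 0.8  # illustrative persistence, not estimated from the poster's data
    for t in (5, 10, 34):
        bias = within_ar1(simulate(2000, t, rho)) - rho
        print(f"T={t:2d}  simulated bias={bias:+.3f}  approximation={nickell_approx(rho, t):+.3f}")
```

Rerunning with `rho` closer to 1 shows how strongly the bias depends on persistence, which is the caveat about strongly time-dependent variables in the reply.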

        If anything, regressors at the national level reduce the remaining cross-sectional dependence in your errors.

        If you have a good theoretical reason to model cross-sectional dependence with a spatial model, go for it. Otherwise, Driscoll-Kraay standard errors might do. If those standard errors appear inflated, this might just be a sign that the conventional standard errors are too small.
        https://www.kripfganz.de/stata/
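The remark that inflated Driscoll-Kraay standard errors may simply reveal conventional ones that are too small can be illustrated with a small simulation. The sketch below (plain Python on hypothetical data, not xtscc's code) computes a within-estimator slope with (a) standard errors clustered by unit and (b) Driscoll-Kraay standard errors: the scores x_it*e_it are first summed across units within each period, which allows any pattern of cross-sectional correlation within a period, and a Newey-West (Bartlett-kernel) estimator is then applied to that time series, which allows serial correlation up to the chosen lag. Because that series has only T elements, the approach relies on T being reasonably large. No small-sample corrections are applied, and the lag rule floor(4(T/100)^(2/9)) is, as far as I know, the default in xtscc; treat both as assumptions.

```python
import math
import random


def default_lag(t):
    """Rule-of-thumb lag length floor(4 * (T/100)^(2/9))."""
    return math.floor(4.0 * (t / 100.0) ** (2.0 / 9.0))


def fe_slope_with_ses(data, lags):
    """Within (FE) slope of y on x with unit-clustered and Driscoll-Kraay SEs.

    data: dict panel_id -> list of (x, y) pairs, one per period, balanced.
    """
    demeaned = {}
    for pid, obs in data.items():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        demeaned[pid] = [(x - mx, y - my) for x, y in obs]
    sxx = sum(x * x for obs in demeaned.values() for x, _ in obs)
    sxy = sum(x * y for obs in demeaned.values() for x, y in obs)
    beta = sxy / sxx
    # score x_it * e_it for every observation, one list per unit
    scores = [[x * (y - beta * x) for x, y in obs] for obs in demeaned.values()]
    # clustered by unit: sum each unit's scores over time, then square
    meat_unit = sum(sum(s) ** 2 for s in scores)
    # Driscoll-Kraay: sum scores across units within each period ...
    n_periods = len(scores[0])
    h = [sum(s[t] for s in scores) for t in range(n_periods)]
    # ... then apply a Newey-West (Bartlett-kernel) estimator to that series
    meat_dk = sum(v * v for v in h)
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)
        meat_dk += 2.0 * w * sum(h[t] * h[t - lag] for t in range(lag, n_periods))
    return beta, math.sqrt(meat_unit) / sxx, math.sqrt(meat_dk) / sxx


def simulate_common_shocks(n=200, t=35, seed=7):
    """Hypothetical panel in which x and the error share common year shocks."""
    rng = random.Random(seed)
    f = [rng.gauss(0.0, 1.0) for _ in range(t)]  # common shock in x (national-level)
    g = [rng.gauss(0.0, 1.0) for _ in range(t)]  # common shock in the error
    data = {}
    for i in range(n):
        alpha = rng.gauss(0.0, 1.0)  # unit fixed effect
        obs = []
        for s in range(t):
            x = f[s] + 0.5 * rng.gauss(0.0, 1.0)
            y = alpha + 1.0 * x + g[s] + rng.gauss(0.0, 1.0)
            obs.append((x, y))
        data[i] = obs
    return data


if __name__ == "__main__":
    beta, se_unit, se_dk = fe_slope_with_ses(simulate_common_shocks(), default_lag(35))
    print(f"beta={beta:.3f}  SE clustered by unit={se_unit:.3f}  SE Driscoll-Kraay={se_dk:.3f}")
```

In this setup the unit-clustered standard error ignores the correlation across units within a year, so it comes out much smaller than the Driscoll-Kraay one even though the latter is the appropriate measure of uncertainty here.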



        • #5
          Thank you all for your replies.
