  • xtdcce2 - Help with using cross-sectional averages

    I have the following code working as intended:
    Code:
    xtdcce2 d.growth d.l(growth) d.l(0/1)(gfcfgdp employmentgrowth yearsedugrowth), ///
        lr(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) ///
        p(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) nocross
    However, the results show that many variables one would expect to be significant are not, for example human capital growth in the long run:
    Code:
    Units (cno) to be removed due to insufficient numbers of observations:  17
    (Dynamic) Common Correlated Effects Estimator - Pooled Mean Group (CS-ECM)
    
    Panel Variable (i): cno                               Number of obs     =       1166
    Time Variable (t): year                               Number of groups  =         31
    
    Degrees of freedom per group:                         Obs per group:    
     without cross-sectional avg. min   = 4                             min =         17
                                  max   = 32                            avg =         40
     with cross-sectional avg.    min   = 4                             max =         45
                                  max   = 32
    Number of                                             F(253, 913)       =       9.06
     cross-sectional lags               none              Prob > F          =       0.00
     variables in mean group regression = 222             R-squared         =       0.28
     variables partialled out           = 31              Adj. R-squared    =       0.09
                                                          Root MSE          =       1.93
                                                          CD Statistic      =      37.05
                                                             p-value        =     0.0000
    ------------------------------------------------------------------------------------
                D.growth|     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------------+---------------------------------------------------------------
     Short Run Est.    |
    --------------------+---------------------------------------------------------------
       Mean Group:      |
               LD.growth| -.0538945   .0441897   -1.22    0.223     -.1405047   .0327157
               D.gfcfgdp|  .8163549   .0822089    9.93    0.000      .6552285   .9774814
              LD.gfcfgdp| -.0743282   .0610556   -1.22    0.223     -.1939949   .0453385
      D.employmentgrowth|  .3011228    .067428    4.47    0.000      .1689662   .4332793
     LD.employmentgrowth|  .0174869   .0429406    0.41    0.684     -.0666751   .1016489
        D.yearsedugrowth| -.2871611   .6437639   -0.45    0.656     -1.548915    .974593
       LD.yearsedugrowth|  .6205096   .4298602    1.44    0.149     -.2220009    1.46302
    --------------------+---------------------------------------------------------------
     Adjust. Term       |
    --------------------+---------------------------------------------------------------
       Pooled:          |
                L.growth| -.7996822   .1247142   -6.41    0.000     -1.044118  -.5552469
    --------------------+---------------------------------------------------------------
     Long Run Est.      |
    --------------------+---------------------------------------------------------------
       Pooled:          |
                 gfcfgdp|  .0392039   .0725913    0.54    0.589     -.1030724   .1814803
        employmentgrowth|  .3290876   .1407156    2.34    0.019      .0532901   .6048852
          yearsedugrowth|  .9766148   .9375461    1.04    0.298     -.8609419   2.814171
               L.lngdpph| -2.366211   .5916889   -4.00    0.000       -3.5259  -1.206522
    ------------------------------------------------------------------------------------
    Pooled Variables: L.growth gfcfgdp employmentgrowth yearsedugrowth L.lngdpph
    Mean Group Variables: LD.growth D.gfcfgdp LD.gfcfgdp D.employmentgrowth LD.employmentgrowth D.yearsedugrowt
    > h LD.yearsedugrowth
    Long Run Variables: gfcfgdp employmentgrowth yearsedugrowth L.lngdpph
    Cointegration variable(s): L.growth
    Heterogenous constant partialled out.
    Upon further reading I realised that cross-sectional dependence may be a real problem in this dataset. For example, the 2007 financial crisis is a common shock in this period that the model does not control for. I tested for cross-sectional dependence with the command below:
    Code:
    xtcse2, res
    which outputs:
    Code:
    Cross-Sectional Dependence Exponent Estimation and Test
    
    Panel Variable (i): cno
    Time Variable (t): year
    
    Estimation of Cross-Sectional Exponent (alpha)
    ----------------------------------------------------------------
           variable|     alpha   Std. Err.    [95% Conf. Interval]
    ---------------+------------------------------------------------
          residuals|   .549541          .            .           .
    ----------------------------------------------------------------
    0.5 <= alpha < 1 implies strong cross-sectional dependence.
    SE and CI not available. Use option reps() to bootstrap SE and CI.
    Residuals calculated using  predict, residuals from xtdcce2.
    Unbalanced panel detected, test adjusted.
     Missing values imputed for CD*.
    
    Testing for weak cross-sectional dependence (CSD)
       H0: weak cross-section dependence
       H1: strong cross-section dependence
    ------------------------------------------------------------------------
                   |    CD            CDw           CDw+          CD*
    ---------------+--------------------------------------------------------
    residuals      |    37.05        1.23        955.96         20.43
                   |  (0.000)     (0.220)       (0.000)       (0.000)
    ------------------------------------------------------------------------
    p-values in parenthesis.
    So this tells me I have cross-sectional dependence and should add cross-sectional averages to the command. However, running the commands below causes the estimate of alpha to actually increase!
    Code:
    xtdcce2 d.growth d.l(growth) d.l(0/1)(gfcfgdp employmentgrowth yearsedugrowth), ///
        lr(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) ///
        p(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) cr(growth l.lngdpph) cr_lags(3)
    xtcse2, res lags(3)
    generates
    Code:
    Cross-Sectional Dependence Exponent Estimation and Test
    
    Panel Variable (i): cno
    Time Variable (t): year
    
    Estimation of Cross-Sectional Exponent (alpha)
    ----------------------------------------------------------------
           variable|     alpha   Std. Err.    [95% Conf. Interval]
    ---------------+------------------------------------------------
          residuals|   .7663756          .            .           .
    ----------------------------------------------------------------
    0.5 <= alpha < 1 implies strong cross-sectional dependence.
    SE and CI not available. Use option reps() to bootstrap SE and CI.
    Residuals calculated using  predict, residuals from xtdcce2.
    Unbalanced panel detected, test adjusted.
     Missing values imputed for CD*.
    
    Testing for weak cross-sectional dependence (CSD)
       H0: weak cross-section dependence
       H1: strong cross-section dependence
    ------------------------------------------------------------------------
                   |    CD            CDw           CDw+          CD*
    ---------------+--------------------------------------------------------
    residuals      |     1.64       -2.10        459.23          7.52
                   |  (0.101)     (0.036)       (0.000)       (0.000)
    ------------------------------------------------------------------------
    p-values in parenthesis.
    Could someone please explain what the problem here is? How do I know which variables to include cross-sectional averages for?
    Last edited by Bailey Stringer; 14 Apr 2023, 06:43.

  • #2
    Hi Bailey,
    Indeed, in your specification I would be worried about strong cross-sectional dependence. Regarding your second model, I would suggest adding some of the other covariates as cross-section averages and checking whether a) the values of the CD test improve and b) the regression results improve.
    To be very honest, I would not put too much emphasis on the point estimate of the exponent, and I advise bootstrapping it to obtain confidence intervals.
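    As a rough sketch of both suggestions, not a tested recommendation: the first command adds one more covariate, gfcfgdp, to cr(), which is an arbitrary choice for illustration. The second uses the reps() option named in the xtcse2 output to bootstrap the standard error and confidence interval of alpha; the 200 replications are likewise an assumption. Each extra cross-section average with 3 lags adds 4 regressors per unit, so fewer lags may be needed.

    ```stata
    * Assumption: gfcfgdp is added to cr() purely as an example; try other covariates too
    xtdcce2 d.growth d.l(growth) d.l(0/1)(gfcfgdp employmentgrowth yearsedugrowth), ///
        lr(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) ///
        p(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) ///
        cr(growth l.lngdpph gfcfgdp) cr_lags(3)

    * Bootstrap SE and CI for alpha; 200 replications is an arbitrary choice
    xtcse2, res lags(3) reps(200)
    ```

    Comparing the CD statistic and the bootstrapped interval across such variants is more informative than the point estimate of alpha alone.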
    Hope this helps.
    Best,
    Jan



    • #3
      Hi Jan, thanks for responding. I have started using the exponent option at the end of the xtdcce2 command, and basing my decisions on the CD statistic and its p-value. The CD statistic here does in fact fall when cross-sectional averages are added.

      I still run into the problem that there is simply not enough data for T^(1/3) lags of the cross-sectional averages (cr_lags), but this is simply a data issue. Honestly, I think all of my problems so far in this project can be put down to data issues. Have you done any work with Monte Carlo simulations for ECM with PMG estimators? While exploring the concept last year, I remember finding that even 5,000 observations might not be enough for a random time series with some serial correlation to appear mean-reverting, so in comparison 1166 observations feels pitifully small.
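      For reference, a quick check of what the T^(1/3) rule implies for this panel, using the min, avg and max of "Obs per group" from the first output above. Taking the integer part with floor() is an assumption about how the rule is applied.

      ```stata
      * Rule-of-thumb lag count floor(T^(1/3)) for T = 17, 40 and 45
      * Displays 2
      display floor(17^(1/3))
      * Displays 3
      display floor(40^(1/3))
      * Displays 3
      display floor(45^(1/3))
      ```

      So cr_lags(3) matches the rule for the average and longest units, while the shortest units would only support 2.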

      To be clear, I consider this thread to be solved. I ask mostly out of curiosity. Thanks!



      • #4
        Indeed, the time dimension might be problematic, though you "only" add 3 lags of the cross-section averages. I think I would try to find a balanced panel or remove those units with a small number of observations over time. This might improve results. I haven't done any simulations of the CS-ECM or PMG estimators, but if I recall correctly, they are in the original papers. Due to the time dynamics, it won't be an easy task.
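        A minimal sketch of dropping short units, assuming the panel is identified by cno and year as in the output above. The names n_miss and n_obs and the cutoff of 30 are hypothetical, and the count ignores observations lost to lags and differences, so it is only a proxy for the estimation sample.

        ```stata
        * Flag rows where any model variable is missing
        egen n_miss = rowmiss(growth gfcfgdp employmentgrowth yearsedugrowth lngdpph)

        * Count complete rows per unit
        bysort cno: egen n_obs = total(n_miss == 0)

        * Inspect the distribution before choosing a cutoff
        tabulate n_obs

        * Hypothetical cutoff: keep units with at least 30 complete observations
        drop if n_obs < 30
        ```

        Fixing the retained units once and reusing them for every regression keeps the cross-section comparable across specifications.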



        • #5
          Hi Jan,

          I followed your advice on removing those units with a small number of observations over time to make the panel more balanced. This is something I should have already considered, since I want the cross-section to be the same for each regression I run for easier comparison. However, in doing so and looking over my new estimates, a thought has occurred to me.

          I am trying to estimate 200 variables (heterogeneous short-run coefficients and constants, homogeneous long-run coefficients, and cross-sectional averages) with a panel of 577 observations. Surely I am right in thinking that this is obviously absurd? No wonder my estimates are giving me so much grief. To clarify, I am attempting to replicate the results of a paper published in the Canadian Journal of Economics in 2015, so it was not my idea to estimate the panel in this way. If my logic above is correct, this leaves me in a bit of a sticky situation, deadline-wise...

          Thanks for your help so far.



          • #6
            This is indeed a problem common to models with heterogeneous slope coefficients. However, it is essentially a time series regression, and your degrees of freedom are around 30-40 for the individual time series, so it is not that bad.
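            The counts in the first output above (31 units, 1166 observations, no cross-section averages) can be reproduced by hand, which shows how the total splits into per-unit pieces. The breakdown into 7 mean group coefficients per unit, 5 pooled coefficients and 1 constant per unit is read off the variable lists in that output. That the per-unit degrees of freedom equal T_i minus 13 is an inference from the reported min and max, not something the output states.

            ```stata
            * Variables in mean group regression: 7 MG coefficients x 31 units + 5 pooled
            * Displays 222
            display 7*31 + 5
            * Add the 31 partialled-out constants: numerator df of F(253, 913)
            * Displays 253
            display 7*31 + 5 + 31
            * Denominator df of F(253, 913)
            * Displays 913
            display 1166 - 253
            * Per-unit df for the shortest (T = 17) and longest (T = 45) units
            * Displays 4
            display 17 - 13
            * Displays 32
            display 45 - 13
            ```

            The large total is mostly the same small model repeated for each unit, so what matters is how many observations each unit has relative to its own regressors.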

            Since it is a replication, I would try to stay as close as possible to the publication but discuss the shortcomings.



            • #7
              Jan, is serial correlation a problem with this estimation, or does PMG account for this? I have read some contradictory answers, but I think Pesaran (2006) states that PMG is consistent in the presence of serial correlation, so this is not an issue. Thanks.

