xtdcce2 - Help with using cross-sectional averages

Bailey Stringer

Join Date: Apr 2023
Posts: 16

14 Apr 2023, 06:37

I have the following code working as intended:

Code:

xtdcce2 d.growth d.l(growth) d.l(0/1)(gfcfgdp employmentgrowth yearsedugrowth), ///
    lr(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) ///
    p(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) nocross

However, the results show that many variables one would expect to be significant are not, for example human capital growth in the long run:

Code:

Units (cno) to be removed due to insufficient numbers of observations:  17
(Dynamic) Common Correlated Effects Estimator - Pooled Mean Group (CS-ECM)

Panel Variable (i): cno                               Number of obs     =       1166
Time Variable (t): year                               Number of groups  =         31

Degrees of freedom per group:                         Obs per group:    
 without cross-sectional avg. min   = 4                             min =         17
                              max   = 32                            avg =         40
 with cross-sectional avg.    min   = 4                             max =         45
                              max   = 32
Number of                                             F(253, 913)       =       9.06
 cross-sectional lags               none              Prob > F          =       0.00
 variables in mean group regression = 222             R-squared         =       0.28
 variables partialled out           = 31              Adj. R-squared    =       0.09
                                                      Root MSE          =       1.93
                                                      CD Statistic      =      37.05
                                                         p-value        =     0.0000
------------------------------------------------------------------------------------
            D.growth|     Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+---------------------------------------------------------------
 Short Run Est.    |
--------------------+---------------------------------------------------------------
   Mean Group:      |
           LD.growth| -.0538945   .0441897   -1.22    0.223     -.1405047   .0327157
           D.gfcfgdp|  .8163549   .0822089    9.93    0.000      .6552285   .9774814
          LD.gfcfgdp| -.0743282   .0610556   -1.22    0.223     -.1939949   .0453385
  D.employmentgrowth|  .3011228    .067428    4.47    0.000      .1689662   .4332793
 LD.employmentgrowth|  .0174869   .0429406    0.41    0.684     -.0666751   .1016489
    D.yearsedugrowth| -.2871611   .6437639   -0.45    0.656     -1.548915    .974593
   LD.yearsedugrowth|  .6205096   .4298602    1.44    0.149     -.2220009    1.46302
--------------------+---------------------------------------------------------------
 Adjust. Term       |
--------------------+---------------------------------------------------------------
   Pooled:          |
            L.growth| -.7996822   .1247142   -6.41    0.000     -1.044118  -.5552469
--------------------+---------------------------------------------------------------
 Long Run Est.      |
--------------------+---------------------------------------------------------------
   Pooled:          |
             gfcfgdp|  .0392039   .0725913    0.54    0.589     -.1030724   .1814803
    employmentgrowth|  .3290876   .1407156    2.34    0.019      .0532901   .6048852
      yearsedugrowth|  .9766148   .9375461    1.04    0.298     -.8609419   2.814171
           L.lngdpph| -2.366211   .5916889   -4.00    0.000       -3.5259  -1.206522
------------------------------------------------------------------------------------
Pooled Variables: L.growth gfcfgdp employmentgrowth yearsedugrowth L.lngdpph
Mean Group Variables: LD.growth D.gfcfgdp LD.gfcfgdp D.employmentgrowth LD.employmentgrowth D.yearsedugrowt
> h LD.yearsedugrowth
Long Run Variables: gfcfgdp employmentgrowth yearsedugrowth L.lngdpph
Cointegration variable(s): L.growth
Heterogenous constant partialled out.

Upon further reading I realised that cross-sectional dependence may be a real problem in this dataset. For example, a common shock in this time period which is not controlled for is the 2007 Financial Crisis. I have been trying to use the command for testing for cross-sectional dependence below:

Code:

xtcse2, res

which outputs:

Code:

Cross-Sectional Dependence Exponent Estimation and Test

Panel Variable (i): cno
Time Variable (t): year

Estimation of Cross-Sectional Exponent (alpha)
----------------------------------------------------------------
       variable|     alpha   Std. Err.    [95% Conf. Interval]
---------------+------------------------------------------------
      residuals|   .549541          .            .           .
----------------------------------------------------------------
0.5 <= alpha < 1 implies strong cross-sectional dependence.
SE and CI not available. Use option reps() to bootstrap SE and CI.
Residuals calculated using  predict, residuals from xtdcce2.
Unbalanced panel detected, test adjusted.
 Missing values imputed for CD*.

Testing for weak cross-sectional dependence (CSD)
   H0: weak cross-section dependence
   H1: strong cross-section dependence
------------------------------------------------------------------------
               |    CD            CDw           CDw+          CD*
---------------+--------------------------------------------------------
residuals      |    37.05        1.23        955.96         20.43
               |  (0.000)     (0.220)       (0.000)       (0.000)
------------------------------------------------------------------------
p-values in parenthesis.

So this tells me I have cross-sectional dependence and should add cross-sectional averages to the command. However, running the code below actually increases the estimate of alpha!

Code:

xtdcce2 d.growth d.l(growth) d.l(0/1)(gfcfgdp employmentgrowth yearsedugrowth), ///
    lr(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) ///
    p(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) cr(growth l.lngdpph) cr_lags(3)
xtcse2, res lags(3)

generates

Code:

Cross-Sectional Dependence Exponent Estimation and Test

Panel Variable (i): cno
Time Variable (t): year

Estimation of Cross-Sectional Exponent (alpha)
----------------------------------------------------------------
       variable|     alpha   Std. Err.    [95% Conf. Interval]
---------------+------------------------------------------------
      residuals|   .7663756          .            .           .
----------------------------------------------------------------
0.5 <= alpha < 1 implies strong cross-sectional dependence.
SE and CI not available. Use option reps() to bootstrap SE and CI.
Residuals calculated using  predict, residuals from xtdcce2.
Unbalanced panel detected, test adjusted.
 Missing values imputed for CD*.

Testing for weak cross-sectional dependence (CSD)
   H0: weak cross-section dependence
   H1: strong cross-section dependence
------------------------------------------------------------------------
               |    CD            CDw           CDw+          CD*
---------------+--------------------------------------------------------
residuals      |     1.64       -2.10        459.23          7.52
               |  (0.101)     (0.036)       (0.000)       (0.000)
------------------------------------------------------------------------
p-values in parenthesis.

Could someone please explain what the problem here is? How do I know which variables to include cross-sectional averages for?

Last edited by Bailey Stringer; 14 Apr 2023, 06:43.

Tags: Cross-Sectional Dependenc, error correction model, panel, xtdcce2

JanDitzen

Join Date: Jan 2015

Posts: 350
#2

17 Apr 2023, 01:22

Hi Bailey,
indeed, in your specification I would be worried about strong cross-sectional dependence. Regarding your second model, I would suggest adding some of the other covariates as cross-section averages and seeing whether a) the values of the CD test improve and b) the regression results improve.
To be very honest, I would not put too much emphasis on the value of the exponent, and I would advise bootstrapping it to obtain confidence intervals.
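As a sketch of both suggestions, reusing the commands from #1: the extended cr() list below is just one possible choice of additional cross-section averages, and reps(200) is an arbitrary illustrative number of bootstrap replications.

Code:

* Sketch only: add the remaining covariates to the cross-section averages
xtdcce2 d.growth d.l(growth) d.l(0/1)(gfcfgdp employmentgrowth yearsedugrowth), ///
    lr(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) ///
    p(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) ///
    cr(growth l.lngdpph gfcfgdp employmentgrowth yearsedugrowth) cr_lags(3)

* Re-test the residuals and bootstrap SE and CI for alpha
* (reps(200) is an assumed value, chosen only for illustration)
xtcse2, res lags(3) reps(200)

Each additional cross-section average uses up degrees of freedom in every unit, so with short units a smaller cr_lags() may be needed for this to run.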
Hope this helps.
Best,
Jan

Bailey Stringer

Join Date: Apr 2023

Posts: 16
#3

17 Apr 2023, 04:27

Hi Jan, thanks for responding. I have started using the exponent option at the end of the xtdcce2 command, and basing my decisions off the CD-statistic and corresponding p-value. The CD statistic here does in fact reduce when adding cross-sectional averages.
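For reference, a sketch of that call, assuming the exponent option is simply appended to the second specification from #1:

Code:

* Same model as in #1, with the CSD exponent of the residuals reported after estimation
xtdcce2 d.growth d.l(growth) d.l(0/1)(gfcfgdp employmentgrowth yearsedugrowth), ///
    lr(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) ///
    p(l.growth gfcfgdp employmentgrowth yearsedugrowth l.lngdpph) ///
    cr(growth l.lngdpph) cr_lags(3) exponent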

I still run into the problem that there is simply not enough data for T^(1/3) lags of the cross-sectional averages, but this is simply a data issue. Honestly, I think all of my problems so far in this project can be put down to data issues. Have you done any work with Monte Carlo simulations for ECM with PMG estimators? While exploring the concept last year, I remember finding that even 5,000 observations might not be enough for a random time series with some serial correlation to be mean-reverting, so in comparison 1166 observations feels pitifully small.
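As a rough check of the T^(1/3) rule of thumb, assuming the integer part is taken and using the observations per group reported in #1 (min 17, avg 40, max 45). The count of added terms further assumes each cross-section average enters contemporaneously and with each of its lags:

Code:

* Integer part of T^(1/3) for the shortest, average and longest unit
display floor(17^(1/3))
display floor(40^(1/3))
display floor(45^(1/3))
* Cross-section average terms per unit with cr(growth l.lngdpph) cr_lags(3):
* 2 variables x (1 contemporaneous + 3 lags)
display 2*(1 + 3)

This gives 2, 3 and 3 lags respectively, and 8 additional terms per unit, which is a large share of the time dimension for the shortest units.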

To be clear, I consider this thread to be solved. I ask mostly out of curiosity. Thanks!

JanDitzen

Join Date: Jan 2015

Posts: 350
#4

17 Apr 2023, 07:34

Indeed, the time dimension might be problematic, though you "only" add 3 lags of the cross-section averages. I think I would try to find a balanced panel or remove those units with a small number of observations over time. This might improve results. I haven't done any simulations of the CS-ECM or PMG estimators, but if I recall correctly, they are in the original papers. Due to the time dynamics, it won't be an easy task.
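A minimal sketch of the trimming step, assuming a cut-off of 30 time periods per unit (a hypothetical threshold, not one suggested in this thread) and that observations are counted on the dependent variable growth. n_obs is a hypothetical helper variable:

Code:

* Count non-missing observations of growth within each unit
bysort cno: egen n_obs = count(growth)
* Inspect the distribution before dropping anything
tabulate n_obs
* Drop units with fewer than 30 usable observations (illustrative threshold)
drop if n_obs < 30
xtset cno year

Lags and differences in the model use up further observations at the start of each unit's sample, so this count is only an upper bound on the usable time periods.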
Bailey Stringer

Join Date: Apr 2023

Posts: 16
#5

17 Apr 2023, 16:04

Hi Jan,

I followed your advice on removing those units with a small number of observations over time to make the panel more balanced. This is something I should have already considered since I want the cross-section to be the same for each regression I run for easier comparison. However, in doing so and looking over my new estimates a thought has occurred to me.

I am trying to estimate 200 variables (heterogeneous short run coefficients and constants, homogeneous long run coefficients, and cross-sectional averages) with a panel of 577 observations. Surely I am right in thinking that this is obviously absurd? No wonder my estimates are giving me so much grief. To clarify, I am attempting to replicate the results of a paper published in the Canadian Journal of Economics in 2015, so it was not my idea to estimate the panel in this way. If my logic above is correct, this leaves me in a bit of a sticky situation, deadline-wise...

Thanks for your help so far.

JanDitzen

Join Date: Jan 2015

Posts: 350
#6

18 Apr 2023, 00:56

This is indeed a common problem for models with heterogeneous slope coefficients. However, it is essentially a time series regression, and your degrees of freedom are around 30-40 for the individual time series, so it is not that bad.
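To make the parameter count concrete, the figures in the xtdcce2 header in #1 appear consistent with the following accounting. This is an inference from the reported numbers (1166 observations, 31 units, no cross-section averages), not something taken from the xtdcce2 documentation:

Code:

* 7 heterogeneous short-run coefficients for each of the 31 units
display 7*31
* plus 5 pooled coefficients (adjustment term and four long-run terms)
display 7*31 + 5
* plus 31 unit-specific constants that are partialled out
display 7*31 + 5 + 31
* residual degrees of freedom for the whole panel
display 1166 - 253
* per-unit degrees of freedom for the longest (T = 45) and shortest (T = 17) unit,
* assuming each unit loses 7 slopes, 1 constant and the 5 pooled coefficients
display 45 - (7 + 1 + 5)
display 17 - (7 + 1 + 5)

These give 217, 222, 253, 913, 32 and 4, matching "variables in mean group regression = 222", F(253, 913) and the reported degrees of freedom per group (min 4, max 32).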

Since it is a replication I would try to keep as close as possible to the publication but discuss the shortcomings.

Bailey Stringer

Join Date: Apr 2023

Posts: 16
#7

20 Apr 2023, 07:32

Jan, is serial correlation a problem with this estimation, or does PMG account for this? I have read some contradictory answers, but I think Pesaran (2006) states that PMG is consistent in the presence of serial correlation, so this is not an issue. Thanks.