Panel Data fixed effects and time effects

Abdelkarim VUB

Join Date: Jun 2022
Posts: 10

26 Jun 2022, 15:28

Hello there,
For my master's thesis I am researching the effects of the digital divide on educational attainment in Europe. I gathered data on 29 countries over a period of 14 years.
My dependent variable is the percentage of the population (age group 24-34) that has completed tertiary education.
The independent variables are the percentage of the population with access to broadband internet and the Gini score (from 0 to 100; lower means less inequality).
I then added some control variables: total population and mean income. (I am still thinking about adding the unemployment rate as another control variable.)

I estimated the model with both fixed and random effects.
Fixed effects:

Code:

. xtreg educ population gini broadband incomeMean, fe

Fixed-effects (within) regression               Number of obs     =        398
Group variable: country                         Number of groups  =         29

R-squared:                                      Obs per group:
     Within  = 0.7214                                         min =         11
     Between = 0.3053                                         avg =       13.7
     Overall = 0.3832                                         max =         14

                                                F(4,365)          =     236.33
corr(u_i, Xb) = -0.3124                         Prob > F          =     0.0000

------------------------------------------------------------------------------
        educ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  population |  -9.70e-08   2.92e-07    -0.33   0.740    -6.71e-07    4.77e-07
        gini |   -.150809   .1070503    -1.41   0.160    -.3613219    .0597038
   broadband |   .2142734   .0098261    21.81   0.000     .1949504    .2335963
  incomeMean |   .0004968   .0000762     6.52   0.000      .000347    .0006467
       _cons |   20.37698   5.580489     3.65   0.000     9.403037    31.35093
-------------+----------------------------------------------------------------
     sigma_u |  7.6449458
     sigma_e |  2.5569006
         rho |  .89939297   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(28, 365) = 91.84                    Prob > F = 0.0000

Random effects:

Code:

 xtreg educ population gini broadband incomeMean, re

Random-effects GLS regression                   Number of obs     =        398
Group variable: country                         Number of groups  =         29

R-squared:                                      Obs per group:
     Within  = 0.7206                                         min =         11
     Between = 0.3157                                         avg =       13.7
     Overall = 0.3976                                         max =         14

                                                Wald chi2(4)      =     945.94
corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
        educ | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  population |  -8.62e-08   5.70e-08    -1.51   0.130    -1.98e-07    2.54e-08
        gini |  -.0609734   .1026333    -0.59   0.552     -.262131    .1401841
   broadband |   .2166468   .0096297    22.50   0.000     .1977729    .2355207
  incomeMean |   .0004461   .0000651     6.85   0.000     .0003184    .0005737
       _cons |   18.24877   3.490679     5.23   0.000     11.40716    25.09037
-------------+----------------------------------------------------------------
     sigma_u |  6.9655036
     sigma_e |  2.5569006
         rho |  .88125285   (fraction of variance due to u_i)
------------------------------------------------------------------------------

I used the Hausman test to confirm that fixed effects would be the better method to use:

Code:

 hausman fixed random

Note: the rank of the differenced variance matrix (3) does not equal the number of coefficients being tested (4); be sure this is what you expect, or there may be problems
        computing the test.  Examine the output of your estimators for anything unexpected and possibly consider scaling your variables so that the coefficients are on a
        similar scale.

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |     fixed        random       Difference       Std. err.
-------------+----------------------------------------------------------------
  population |   -9.70e-08    -8.62e-08       -1.08e-08        2.86e-07
        gini |    -.150809    -.0609734       -.0898356        .0304333
   broadband |    .2142734     .2166468       -.0023734        .0019549
  incomeMean |    .0004968     .0004461        .0000508        .0000395
------------------------------------------------------------------------------
                          b = Consistent under H0 and Ha; obtained from xtreg.
           B = Inconsistent under Ha, efficient under H0; obtained from xtreg.

Test of H0: Difference in coefficients not systematic

    chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
            =   8.95
Prob > chi2 = 0.0299
(V_b-V_B is not positive definite)
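
The rank note and the "(V_b-V_B is not positive definite)" warning both suggest that this classical Hausman test may be unreliable here, and -hausman- cannot be used after estimation with robust or clustered standard errors. One commonly used alternative is a regression-based (Mundlak-type) test, which works with clustered standard errors. The following is only a minimal sketch, assuming the data are already xtset with country as the panel variable. The names pop_m, touse_h, and the m_* variables are hypothetical and introduced purely for illustration:

```stata
* Sketch only; pop_m, touse_h and m_* are hypothetical names, not from the thread.
* Rescale population to millions so the coefficients are on a similar scale,
* as the note in the hausman output suggests.
generate double pop_m = population / 1e6

* Flag complete cases so the country means use the same rows as the regression.
generate byte touse_h = !missing(educ, pop_m, gini, broadband, incomeMean)

* Country-level mean of each time-varying regressor.
foreach v of varlist pop_m gini broadband incomeMean {
    egen double m_`v' = mean(`v') if touse_h, by(country)
}

* Random-effects model augmented with the country means, clustered by country.
xtreg educ pop_m gini broadband incomeMean ///
    m_pop_m m_gini m_broadband m_incomeMean if touse_h, re vce(cluster country)

* Joint test that the coefficients on the means are all zero.
* Rejection indicates that the random-effects assumption fails, favoring fixed effects.
test m_pop_m m_gini m_broadband m_incomeMean
```

If year indicators are part of the final model, they would be added to the augmented regression as well.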

I then added the robust option, which clusters the standard errors by country, and got this result:

Code:

. xtreg educ population gini broadband incomeMean, fe robust

Fixed-effects (within) regression               Number of obs     =        398
Group variable: country                         Number of groups  =         29

R-squared:                                      Obs per group:
     Within  = 0.7214                                         min =         11
     Between = 0.3053                                         avg =       13.7
     Overall = 0.3832                                         max =         14

                                                F(4,28)           =      35.53
corr(u_i, Xb) = -0.3124                         Prob > F          =     0.0000

                               (Std. err. adjusted for 29 clusters in country)
------------------------------------------------------------------------------
             |               Robust
        educ | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  population |  -9.70e-08   4.97e-07    -0.20   0.847    -1.12e-06    9.21e-07
        gini |   -.150809   .1814625    -0.83   0.413    -.5225182    .2209001
   broadband |   .2142734   .0252991     8.47   0.000     .1624506    .2660962
  incomeMean |   .0004968   .0001977     2.51   0.018      .000092    .0009017
       _cons |   20.37698   9.725685     2.10   0.045     .4548208    40.29915
-------------+----------------------------------------------------------------
     sigma_u |  7.6449458
     sigma_e |  2.5569006
         rho |  .89939297   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Two of my independent variables (broadband and incomeMean) are significant, and judging by the F statistic the model as a whole is significant too.

I then added i.year to the xtreg command, like this:

Code:

. xtreg educ population gini broadband incomeMean i.year, fe robust

Fixed-effects (within) regression               Number of obs     =        398
Group variable: country                         Number of groups  =         29

R-squared:                                      Obs per group:
     Within  = 0.7746                                         min =         11
     Between = 0.0199                                         avg =       13.7
     Overall = 0.0545                                         max =         14

                                                F(17,28)          =      24.52
corr(u_i, Xb) = -0.8738                         Prob > F          =     0.0000

                               (Std. err. adjusted for 29 clusters in country)
------------------------------------------------------------------------------
             |               Robust
        educ | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
  population |  -8.02e-07   4.62e-07    -1.73   0.094    -1.75e-06    1.45e-07
        gini |  -.1726822    .165679    -1.04   0.306    -.5120602    .1666957
   broadband |   .0004282   .0541787     0.01   0.994    -.1105518    .1114082
  incomeMean |  -.0000562   .0001719    -0.33   0.746    -.0004084     .000296
             |
        year |
       2008  |   1.393604    .514999     2.71   0.011     .3386763    2.448531
       2009  |   2.785943   .9833509     2.83   0.008     .7716395    4.800246
       2010  |   3.947826   1.240529     3.18   0.004     1.406718    6.488935
       2011  |   4.918202   1.571527     3.13   0.004     1.699074    8.137329
       2012  |   6.289633   1.930353     3.26   0.003     2.335484    10.24378
       2013  |   7.574748   2.091623     3.62   0.001     3.290252    11.85924
       2014  |   9.325942    2.29589     4.06   0.000     4.623026    14.02886
       2015  |    9.79276   2.449228     4.00   0.000     4.775744    14.80978
       2016  |   10.65857   2.584618     4.12   0.000     5.364219    15.95292
       2017  |   11.28827   2.743002     4.12   0.000     5.669486    16.90705
       2018  |   12.10324   2.847884     4.25   0.000     6.269612    17.93686
       2019  |   12.90674   3.009904     4.29   0.000     6.741226    19.07224
       2020  |   13.88196   3.176958     4.37   0.000     7.374253    20.38966
             |
       _cons |   50.14916   8.640671     5.80   0.000     32.44955    67.84878
-------------+----------------------------------------------------------------
     sigma_u |  19.538604
     sigma_e |  2.3420248
         rho |  .98583553   (fraction of variance due to u_i)
------------------------------------------------------------------------------

And here is testparm for the year indicators:

Code:

. testparm i.year

 ( 1)  2008.year = 0
 ( 2)  2009.year = 0
 ( 3)  2010.year = 0
 ( 4)  2011.year = 0
 ( 5)  2012.year = 0
 ( 6)  2013.year = 0
 ( 7)  2014.year = 0
 ( 8)  2015.year = 0
 ( 9)  2016.year = 0
 (10)  2017.year = 0
 (11)  2018.year = 0
 (12)  2019.year = 0
 (13)  2020.year = 0

       F( 13,    28) =    3.71
            Prob > F =    0.0018

Now my question is: am I doing this right by adding i.year to the regression? It seems that the independent variables that were significant no longer are. The R-squared values also changed drastically, although the F statistic still says the model is significant.
How can I fix this? Any help or hints would be enormously appreciated.
Thank you and sorry for this very long message, but I tried to be as clear as possible by adding every step I took.

Kind regards,
Karim

Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#2

26 Jun 2022, 15:56

My interpretation of the change in results when you add the year effects to the model is that the estimates obtained in the model without year effects were inflated by omitted variable bias: they were actually just standing in as proxies for the time trend in the outcome variable. If you look at the coefficients of the time indicators, you can see that there is a very regular and strong upward progression, increasing by approximately 1 percentage point each year. The absence of this time trend in the model led other variables that had some level of linear trend over time to get larger-magnitude coefficients than they should have, to roughly represent the time trend.
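
One way to see this proxying is to check how much of the within-country variation in each regressor is itself a common time pattern. This is only a rough sketch, assuming the data are xtset by country and year as in the output above, and it uses only variables from the thread:

```stata
* Sketch only: how much of each regressor's within-country variation
* is explained by common year effects? Look at the "Within" R-squared.
xtreg broadband i.year, fe
xtreg incomeMean i.year, fe

* The same check for the outcome itself.
xtreg educ i.year, fe

* A single linear trend in place of the 13 year indicators, for comparison
* with the roughly 1-point-per-year pattern in the year coefficients.
xtreg educ population gini broadband incomeMean c.year, fe vce(cluster country)
```

A high within R-squared in the first two regressions would mean that, once year effects are in the model, little independent within-country variation in that regressor remains to identify its coefficient.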
Abdelkarim VUB

Join Date: Jun 2022

Posts: 10
#3

26 Jun 2022, 16:05

Thank you Clyde, that makes sense, but now I'm really stuck if I have to include the time effects, as they make everything completely insignificant... Is there a way to fix that bias? Or is the model doomed?

Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#4

26 Jun 2022, 16:21

The model is not doomed, nor is it broken. You just don't like the fact that it leads to a conclusion different from what you had hoped for. There is nothing to fix here. And it is pretty certain that any other approach to fixing the omitted variable bias that afflicts the model lacking time indicators will lead to the same conclusion.

However, I do have a suggestion for a different analysis that may shed more light. Notwithstanding the prevailing obsession with fixed effects models that pervades some disciplines, they have a severe limitation that is too often overlooked: the parameters that are estimated by that model are exclusively within panel effects. Whatever effects variables population, gini index, mean income, and broadband may have across countries are lost in fixed effects analyses. In other words, it is possible that, for example, countries with higher values of broadband access might have a higher proportion of young adults completing tertiary education, and yet over time as use of broadband increases within the country, it does so in ways that are unrelated to that outcome. A fixed effects model is completely incapable of recognizing and telling that kind of story.

To see if something like that might be going on, try using the Mundlak correlated random effects model. It enables you to simultaneously estimate the within-country and between-country effects of all of the variables in the model. It is implemented in the command -xthybrid-, which is available from SSC. You may find that although these variables are not directly related to your educ outcome over time within countries, they may be associated across countries.

That said, if you do find associations, given what these variables appear to mean, I would worry about reverse causality here. (Just to be clear, I would have worried about reverse causality had you found meaningful effects in the fixed-effects models, too.)
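
For readers who want to see what the within/between decomposition amounts to, here is a hand-rolled sketch. It is not -xthybrid- itself, only an approximation of the same idea built from official commands, and it assumes the data are xtset by country and year. The names touse_wb, W_*, and B_* are hypothetical, chosen to echo the W__ and B__ prefixes that -xthybrid- reports:

```stata
* Sketch only; touse_wb, W_* and B_* are hypothetical names.
* Flag complete cases so the country means match the estimation sample.
generate byte touse_wb = !missing(educ, broadband, gini, incomeMean, population)

foreach v of varlist broadband gini incomeMean population {
    * Between part: the country mean of the variable.
    egen double B_`v' = mean(`v') if touse_wb, by(country)
    * Within part: the deviation from that country mean.
    generate double W_`v' = `v' - B_`v' if touse_wb
}

* Random-effects model with both parts, year indicators, clustered by country.
xtreg educ W_broadband W_gini W_incomeMean W_population ///
    B_broadband B_gini B_incomeMean B_population i.year if touse_wb, ///
    re vce(cluster country)
```

The W_ coefficients describe changes over time within a country, and the B_ coefficients describe differences across countries. Because this uses official -xtreg-, factor-variable notation such as i.year is accepted directly.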

Abdelkarim VUB

Join Date: Jun 2022
Posts: 10

26 Jun 2022, 16:29

Okay, thank you for the tips; this is my first master's thesis. Is it still possible to hand in research in which none of the variables are significant? I will still try the Mundlak correlated random effects model and see what it shows. Thanks again!

Edit: I tried the method you recommended and read the help file (help xthybrid), but I'm not sure I'm doing everything correctly:

Code:

. xthybrid educ broadband gini incomeMean population, clusterid(country) vce(robust) full


----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Model model
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Mixed-effects GLM                               Number of obs     =        398
Family: Gaussian
Link:   Identity
Group variable: country                         Number of groups  =         29

                                                Obs per group:
                                                              min =         11
                                                              avg =       13.7
                                                              max =         14

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(8)      =     382.64
Log pseudolikelihood = -1000.7814               Prob > chi2       =     0.0000
                                (Std. err. adjusted for 29 clusters in country)
-------------------------------------------------------------------------------
              |               Robust
         educ | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
 W__broadband |   .2142734   .0251713     8.51   0.000     .1649385    .2636082
      W__gini |   -.150809    .180546    -0.84   0.404    -.5046728    .2030547
W__incomeMean |   .0004968   .0001967     2.53   0.012     .0001114    .0008823
W__population |  -9.70e-08   4.95e-07    -0.20   0.845    -1.07e-06    8.72e-07
 B__broadband |   .2487463   .1947198     1.28   0.201    -.1328975    .6303901
      B__gini |   .9156804   .3248014     2.82   0.005     .2790813    1.552279
B__incomeMean |   .0004381   .0001754     2.50   0.013     .0000943     .000782
B__population |  -1.19e-07   6.44e-08    -1.84   0.066    -2.45e-07    7.71e-09
        _cons |   -12.3188   17.64988    -0.70   0.485    -46.91192    22.27433
--------------+----------------------------------------------------------------
country       |
    var(_cons)|    40.0854   8.856034                      25.99746    61.80754
--------------+----------------------------------------------------------------
   var(e.educ)|   6.466811    1.42578                      4.197785    9.962311
-------------------------------------------------------------------------------

I don't know how to interpret this, as there are two versions of each independent variable, and I haven't found anything in the documentation.

Last edited by Abdelkarim VUB; 26 Jun 2022, 16:42.

Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#6

26 Jun 2022, 17:06

You forgot to include i.year!

The coefficients that begin with W__ are estimates of the within-country effects of those variables. You will also notice that these coefficients will be the same as those you got from the fixed-effects model (with the same variables). The standard errors, and therefore also the test statistics and confidence intervals, will be somewhat different. Those that begin with B__ are estimates of the between (or across)-country effects.

You can find more information about -xthybrid- at https://www.stata-journal.com/articl...article=st0283.

Abdelkarim VUB

Join Date: Jun 2022

Posts: 10
#7

26 Jun 2022, 17:44

Sadly, I cannot add i.year: I get the error "factor-variable and time-series operators not allowed".

Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#8

26 Jun 2022, 19:37

Yes, sorry, I forgot that -xthybrid- is old and predates factor-variable notation in Stata. So do it this way:

Code:

xi: xthybrid educ broadband gini incomeMean population i.year, clusterid(country) vce(robust) full

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2291
#9

27 Jun 2022, 06:41

If you had a balanced panel, there would be no difference between the Mundlak estimator and fixed effects. FE is more robust than Mundlak unless you adjust Mundlak to make it the same as FE. The key question seems to be whether to include time dummies, and they seem necessary.
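
The equivalence can be checked directly. The sketch below is an illustration under stated assumptions, not necessarily the adjustment meant above: it assumes the data are xtset by country and year, and it takes the adjustment for an unbalanced panel to be adding the country means of every time-varying regressor, including the year indicators, computed over the estimation sample. The names touse_cre, yr_*, and cm_* are hypothetical:

```stata
* Sketch only; touse_cre, yr_* and cm_* are hypothetical names
* (make sure no existing variables already use these prefixes).
generate byte touse_cre = !missing(educ, population, gini, broadband, incomeMean)

* Year indicators, so that their country means can be computed too.
tabulate year if touse_cre, generate(yr_)

* Country means of all time-varying regressors, year indicators included.
foreach v of varlist population gini broadband incomeMean yr_* {
    egen double cm_`v' = mean(`v') if touse_cre, by(country)
}

* Two-way fixed effects.
xtreg educ population gini broadband incomeMean i.year if touse_cre, ///
    fe vce(cluster country)
estimates store fe_twoway

* Mundlak / correlated random effects; collinear means are dropped automatically.
xtreg educ population gini broadband incomeMean i.year cm_* if touse_cre, ///
    re vce(cluster country)
estimates store cre_twoway

* Compare the coefficients on the time-varying regressors side by side.
estimates table fe_twoway cre_twoway, ///
    keep(population gini broadband incomeMean) b se
```

If the adjustment is right, the two columns of coefficients should coincide. Any difference would point to country means that were not computed over the same observations as the regression.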
John Schawrz

Join Date: Nov 2019

Posts: 30
#10

30 Jun 2022, 00:25

Your estimates are most likely biased anyway, as a country's adoption of broadband is likely correlated with the unobservables, which fixed effects can only partly purge.
Try to find an instrument for broadband adoption, as only that will give your findings a causal interpretation. Year dummies capture macro effects and are almost certainly needed here.