  • Panel Data fixed effects and time effects

    Hello there,
    For my master's thesis I am researching the effect of the digital divide on educational attainment in Europe. I gathered data on 29 countries over a period of 14 years.
    My dependent variable is the % of the population that completed tertiary education (age group 24-34).
    The independent variables are: population with access to broadband internet (in %) and Gini score (from 0 to 100, lower means better).
    I also added some control variables: population (total) and mean income. (I am still thinking about adding the unemployment rate as another control variable.)

    I estimated both fixed-effects and random-effects models.
    Fixed:
    Code:
    . xtreg educ population gini broadband incomeMean, fe
    
    Fixed-effects (within) regression               Number of obs     =        398
    Group variable: country                         Number of groups  =         29
    
    R-squared:                                      Obs per group:
         Within  = 0.7214                                         min =         11
         Between = 0.3053                                         avg =       13.7
         Overall = 0.3832                                         max =         14
    
                                                    F(4,365)          =     236.33
    corr(u_i, Xb) = -0.3124                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
            educ | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      population |  -9.70e-08   2.92e-07    -0.33   0.740    -6.71e-07    4.77e-07
            gini |   -.150809   .1070503    -1.41   0.160    -.3613219    .0597038
       broadband |   .2142734   .0098261    21.81   0.000     .1949504    .2335963
      incomeMean |   .0004968   .0000762     6.52   0.000      .000347    .0006467
           _cons |   20.37698   5.580489     3.65   0.000     9.403037    31.35093
    -------------+----------------------------------------------------------------
         sigma_u |  7.6449458
         sigma_e |  2.5569006
             rho |  .89939297   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(28, 365) = 91.84                    Prob > F = 0.0000
    Random:
    Code:
     xtreg educ population gini broadband incomeMean, re
    
    Random-effects GLS regression                   Number of obs     =        398
    Group variable: country                         Number of groups  =         29
    
    R-squared:                                      Obs per group:
         Within  = 0.7206                                         min =         11
         Between = 0.3157                                         avg =       13.7
         Overall = 0.3976                                         max =         14
    
                                                    Wald chi2(4)      =     945.94
    corr(u_i, X) = 0 (assumed)                      Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
            educ | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      population |  -8.62e-08   5.70e-08    -1.51   0.130    -1.98e-07    2.54e-08
            gini |  -.0609734   .1026333    -0.59   0.552     -.262131    .1401841
       broadband |   .2166468   .0096297    22.50   0.000     .1977729    .2355207
      incomeMean |   .0004461   .0000651     6.85   0.000     .0003184    .0005737
           _cons |   18.24877   3.490679     5.23   0.000     11.40716    25.09037
    -------------+----------------------------------------------------------------
         sigma_u |  6.9655036
         sigma_e |  2.5569006
             rho |  .88125285   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    .
    I used the Hausman test to confirm that fixed effects would be the better method to use:
    Code:
     hausman fixed random
    
    Note: the rank of the differenced variance matrix (3) does not equal the number of coefficients being tested (4); be sure this is what you expect, or there may be problems
            computing the test.  Examine the output of your estimators for anything unexpected and possibly consider scaling your variables so that the coefficients are on a
            similar scale.
    
                     ---- Coefficients ----
                 |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                 |     fixed        random       Difference       Std. err.
    -------------+----------------------------------------------------------------
      population |   -9.70e-08    -8.62e-08       -1.08e-08        2.86e-07
            gini |    -.150809    -.0609734       -.0898356        .0304333
       broadband |    .2142734     .2166468       -.0023734        .0019549
      incomeMean |    .0004968     .0004461        .0000508        .0000395
    ------------------------------------------------------------------------------
                              b = Consistent under H0 and Ha; obtained from xtreg.
               B = Inconsistent under Ha, efficient under H0; obtained from xtreg.
    
    Test of H0: Difference in coefficients not systematic
    
        chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                =   8.95
    Prob > chi2 = 0.0299
    (V_b-V_B is not positive definite)
    I then added robust to cluster my standard errors and got this result:
    Code:
    . xtreg educ population gini broadband incomeMean, fe robust
    
    Fixed-effects (within) regression               Number of obs     =        398
    Group variable: country                         Number of groups  =         29
    
    R-squared:                                      Obs per group:
         Within  = 0.7214                                         min =         11
         Between = 0.3053                                         avg =       13.7
         Overall = 0.3832                                         max =         14
    
                                                    F(4,28)           =      35.53
    corr(u_i, Xb) = -0.3124                         Prob > F          =     0.0000
    
                                   (Std. err. adjusted for 29 clusters in country)
    ------------------------------------------------------------------------------
                 |               Robust
            educ | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      population |  -9.70e-08   4.97e-07    -0.20   0.847    -1.12e-06    9.21e-07
            gini |   -.150809   .1814625    -0.83   0.413    -.5225182    .2209001
       broadband |   .2142734   .0252991     8.47   0.000     .1624506    .2660962
      incomeMean |   .0004968   .0001977     2.51   0.018      .000092    .0009017
           _cons |   20.37698   9.725685     2.10   0.045     .4548208    40.29915
    -------------+----------------------------------------------------------------
         sigma_u |  7.6449458
         sigma_e |  2.5569006
             rho |  .89939297   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    Now two of my independent variables are significant, and the model as a whole also seems significant judging by the F statistic.

    I then added i.year to the xtreg command like this:
    Code:
    . xtreg educ population gini broadband incomeMean i.year, fe robust
    
    Fixed-effects (within) regression               Number of obs     =        398
    Group variable: country                         Number of groups  =         29
    
    R-squared:                                      Obs per group:
         Within  = 0.7746                                         min =         11
         Between = 0.0199                                         avg =       13.7
         Overall = 0.0545                                         max =         14
    
                                                    F(17,28)          =      24.52
    corr(u_i, Xb) = -0.8738                         Prob > F          =     0.0000
    
                                   (Std. err. adjusted for 29 clusters in country)
    ------------------------------------------------------------------------------
                 |               Robust
            educ | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
      population |  -8.02e-07   4.62e-07    -1.73   0.094    -1.75e-06    1.45e-07
            gini |  -.1726822    .165679    -1.04   0.306    -.5120602    .1666957
       broadband |   .0004282   .0541787     0.01   0.994    -.1105518    .1114082
      incomeMean |  -.0000562   .0001719    -0.33   0.746    -.0004084     .000296
                 |
            year |
           2008  |   1.393604    .514999     2.71   0.011     .3386763    2.448531
           2009  |   2.785943   .9833509     2.83   0.008     .7716395    4.800246
           2010  |   3.947826   1.240529     3.18   0.004     1.406718    6.488935
           2011  |   4.918202   1.571527     3.13   0.004     1.699074    8.137329
           2012  |   6.289633   1.930353     3.26   0.003     2.335484    10.24378
           2013  |   7.574748   2.091623     3.62   0.001     3.290252    11.85924
           2014  |   9.325942    2.29589     4.06   0.000     4.623026    14.02886
           2015  |    9.79276   2.449228     4.00   0.000     4.775744    14.80978
           2016  |   10.65857   2.584618     4.12   0.000     5.364219    15.95292
           2017  |   11.28827   2.743002     4.12   0.000     5.669486    16.90705
           2018  |   12.10324   2.847884     4.25   0.000     6.269612    17.93686
           2019  |   12.90674   3.009904     4.29   0.000     6.741226    19.07224
           2020  |   13.88196   3.176958     4.37   0.000     7.374253    20.38966
                 |
           _cons |   50.14916   8.640671     5.80   0.000     32.44955    67.84878
    -------------+----------------------------------------------------------------
         sigma_u |  19.538604
         sigma_e |  2.3420248
             rho |  .98583553   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    With testparm for the year indicators:
    Code:
    . testparm i.year
    
     ( 1)  2008.year = 0
     ( 2)  2009.year = 0
     ( 3)  2010.year = 0
     ( 4)  2011.year = 0
     ( 5)  2012.year = 0
     ( 6)  2013.year = 0
     ( 7)  2014.year = 0
     ( 8)  2015.year = 0
     ( 9)  2016.year = 0
     (10)  2017.year = 0
     (11)  2018.year = 0
     (12)  2019.year = 0
     (13)  2020.year = 0
    
           F( 13,    28) =    3.71
                Prob > F =    0.0018

    Now my question is: am I doing this right by adding i.year to the regression? The independent variables that were significant are not anymore. The R-squared values also changed drastically, although the F statistic still says the model is significant.
    How can I fix this? Any help or hints would be enormously appreciated.
    Thank you and sorry for this very long message, but I tried to be as clear as possible by adding every step I took.

    Kind regards,
    Karim

  • #2
    My interpretation about the change in results when you add the year effects to the model is that the estimates obtained in the model without year effects were inflated by omitted variable bias: they were actually just standing in as proxies for the time trend in the outcome variable. If you look at the coefficients of the time indicators, you can see that there is a very regular and strong upward progression, increasing by approximately 1 percentage point each year. The absence of this time trend in the model led other variables that had some level of linear trend over time to get larger magnitude coefficients than they should have, to roughly represent the time trend.
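
    One way to check this explanation is to look at whether the regressors themselves rise steadily over time, and then to put the two fixed-effects models side by side. The following is only a sketch. It assumes the data are already -xtset- as in #1 and uses the variable names from that post; the stored-estimate names no_year and with_year are made up for illustration.

    ```stata
    * Year profile of each regressor within countries: steadily rising
    * year coefficients would point to a common upward trend
    xtreg broadband i.year, fe vce(cluster country)
    xtreg incomeMean i.year, fe vce(cluster country)

    * The two models from #1, stored under hypothetical names
    xtreg educ population gini broadband incomeMean, fe vce(cluster country)
    estimates store no_year
    xtreg educ population gini broadband incomeMean i.year, fe vce(cluster country)
    estimates store with_year

    * Coefficients and standard errors side by side, year terms left out of the table
    estimates table no_year with_year, keep(population gini broadband incomeMean) b(%9.4f) se
    ```

    If broadband and incomeMean show the same kind of steady yearly climb as educ, that is consistent with their coefficients in the model without year indicators having absorbed the common trend.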



    • #3
      In reply to Clyde Schechter (#2):
      Thank you Clyde, that makes sense, but now I'm really stuck: if I have to include that time trend, it makes everything completely insignificant. Is there a way to fix that bias? Or is the model doomed?



      • #4
        The model is not doomed, nor is it broken. You just don't like the fact that it leads to a conclusion different from what you had hoped for. There is nothing to fix here. And it is pretty certain that any other approach to fixing the missing variable bias that afflicts the model lacking time indicators will lead to the same conclusion.

        However, I do have a suggestion for a different analysis that may shed more light. Notwithstanding the prevailing obsession with fixed effects models that pervades some disciplines, they have a severe limitation that is too often overlooked: the parameters that are estimated by that model are exclusively within panel effects. Whatever effects variables population, gini index, mean income, and broadband may have across countries are lost in fixed effects analyses. In other words, it is possible that, for example, countries with higher values of broadband access might have a higher proportion of young adults completing tertiary education, and yet over time as use of broadband increases within the country, it does so in ways that are unrelated to that outcome. A fixed effects model is completely incapable of recognizing and telling that kind of story.

        To see if something like that might be going on, try using the Mundlak correlated random effects model. It enables you to simultaneously estimate the within-country and between-country effects of all of the variables in the model. It is implemented in the command -xthybrid-, which is available from SSC. You may find that although these variables are not directly related to your educ outcome over time within countries, they may be associated across countries.

        That said, if you do find associations, given what these variables appear to mean, I would worry about reverse causality here. (Just to be clear, I would have worried about reverse causality had you found meaningful effects in the fixed-effects models, too.)
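
        For readers who want to see what the Mundlak approach does under the hood, here is a rough hand-rolled sketch, not the -xthybrid- implementation itself. It assumes the variable names from #1; the mean_* variable names are hypothetical and the wildcard assumes no other variables in the dataset start with mean_. Note that -egen- averages over all non-missing rows of each variable, which may differ slightly from the estimation sample in an unbalanced panel.

        ```stata
        * Install the user-written command from SSC (needed only once)
        ssc install xthybrid

        * Hand-rolled Mundlak sketch: add the country mean of each regressor
        * to a random-effects model (mean_* names are hypothetical)
        foreach v of varlist population gini broadband incomeMean {
            bysort country: egen mean_`v' = mean(`v')
        }
        xtreg educ population gini broadband incomeMean mean_* i.year, re vce(cluster country)

        * Joint test that the country-mean terms are all zero
        testparm mean_*
        ```

        In this formulation the coefficient on each original variable is the within-country effect, while the coefficient on its mean_ counterpart is the difference between the between-country and within-country effects, not the between effect itself.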



        • #5
          In reply to Clyde Schechter (#4):
          Okay, thank you for the tips; this is my first master's thesis. Is it still possible to hand in research where the variables are not significant at all? I will still try the Mundlak correlated random effects model and see what it shows. Thanks again!

          Edit: I tried the method you recommended and read the help file (h xthybrid), but I am not sure I'm doing everything correctly:
          Code:
          . xthybrid educ broadband gini incomeMean population, clusterid(country) vce(robust) full
          
          
          ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          Model model
          ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          
          Mixed-effects GLM                               Number of obs     =        398
          Family: Gaussian
          Link:   Identity
          Group variable: country                         Number of groups  =         29
          
                                                          Obs per group:
                                                                        min =         11
                                                                        avg =       13.7
                                                                        max =         14
          
          Integration method: mvaghermite                 Integration pts.  =          7
          
                                                          Wald chi2(8)      =     382.64
          Log pseudolikelihood = -1000.7814               Prob > chi2       =     0.0000
                                          (Std. err. adjusted for 29 clusters in country)
          -------------------------------------------------------------------------------
                        |               Robust
                   educ | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
          --------------+----------------------------------------------------------------
           W__broadband |   .2142734   .0251713     8.51   0.000     .1649385    .2636082
                W__gini |   -.150809    .180546    -0.84   0.404    -.5046728    .2030547
          W__incomeMean |   .0004968   .0001967     2.53   0.012     .0001114    .0008823
          W__population |  -9.70e-08   4.95e-07    -0.20   0.845    -1.07e-06    8.72e-07
           B__broadband |   .2487463   .1947198     1.28   0.201    -.1328975    .6303901
                B__gini |   .9156804   .3248014     2.82   0.005     .2790813    1.552279
          B__incomeMean |   .0004381   .0001754     2.50   0.013     .0000943     .000782
          B__population |  -1.19e-07   6.44e-08    -1.84   0.066    -2.45e-07    7.71e-09
                  _cons |   -12.3188   17.64988    -0.70   0.485    -46.91192    22.27433
          --------------+----------------------------------------------------------------
          country       |
              var(_cons)|    40.0854   8.856034                      25.99746    61.80754
          --------------+----------------------------------------------------------------
             var(e.educ)|   6.466811    1.42578                      4.197785    9.962311
          -------------------------------------------------------------------------------
          I don't know how to interpret this, as there are two versions of each independent variable, and I haven't found anything in the documentation.
          Last edited by Abdelkarim VUB; 26 Jun 2022, 16:42.



          • #6
            You forgot to include i.year!

            The coefficients that begin with W__ are estimates of the within-country effects of those variables. You will also notice that these coefficients will be the same as those you got from the fixed-effects model (with the same variables). The standard errors, and therefore also the test statistics and confidence intervals, will be somewhat different. Those that begin with B__ are estimates of the between (or across)-country effects.

            You can find more information about -xthybrid- at https://www.stata-journal.com/articl...article=st0283.
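
            As a sketch of what the W__ and B__ terms stand for, a hybrid model with a single regressor x can be written as follows. The symbols are notation chosen for illustration; they do not appear in the -xthybrid- output.

            ```latex
            y_{it} = \beta_0 + \beta_W \,(x_{it} - \bar{x}_i) + \beta_B \,\bar{x}_i + u_i + e_{it}
            ```

            Here \bar{x}_i is the mean of x for country i over its observed years, u_i is the country random effect, W__x reports \beta_W and B__x reports \beta_B. Read this way, the output in #5 (which still lacks the year indicators) says that a one-point rise in broadband within a country goes with a rise of about 0.21 points in educ (W__broadband = .2142734), while a country whose average gini is one point higher has an educ level about 0.92 points higher (B__gini = .9156804).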



            • #7
              In reply to Clyde Schechter (#6):
              Sadly, I cannot add i.year: I get the error "factor-variable and time-series operators not allowed".



              • #8
                Yes, sorry, I forgot that -xthybrid- is old and predates factor-variable notation in Stata. So do it this way:
                Code:
                xi: xthybrid educ broadband gini incomeMean population i.year, clusterid(country) vce(robust) full
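
                If the xi: prefix gives trouble, an alternative sketch is to build the year indicators by hand and pass them as ordinary variables. The yr stub is a hypothetical name, and the range yr2-yr14 assumes the 14 distinct years implied by the output in #1, with the first year left out as the base.

                ```stata
                * Create one indicator per distinct year: yr1, yr2, ..., yr14
                tabulate year, generate(yr)

                * Omit yr1 (the first year) as the base category
                xthybrid educ broadband gini incomeMean population yr2-yr14, clusterid(country) vce(robust) full
                ```

                As with the xi: version, the year indicators are treated like any other regressor here, so they will also appear with W__ and B__ prefixes in the output.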



                • #9
                  If you had a balanced panel, there would be no difference between the Mundlak estimator and fixed effects. FE is more robust than Mundlak unless you adjust Mundlak to make it the same as FE. The key question seems to be whether or not to include time dummies, and they seem necessary.



                  • #10
                    Your estimates are most likely biased anyway as country adoption of broadband is likely correlated with the unobservables, which you can partly purge with fixed effects.
                    Try to find an instrument for broadband adoption, as only that will give you a causal interpretation of the findings. Year dummies capture macro effects and are almost certainly needed here.
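
                    Purely as a sketch of what such an estimation could look like: z_broadband below is a HYPOTHETICAL instrument that does not exist in the data shown in this thread, and whether any real variable would be a valid instrument is a substantive question the code cannot answer. The country and year effects are entered as indicator variables, which assumes country is stored as a non-negative integer identifier.

                    ```stata
                    * z_broadband is a hypothetical instrument, named here only for illustration
                    ivregress 2sls educ population gini incomeMean i.country i.year ///
                        (broadband = z_broadband), vce(cluster country)

                    * First-stage diagnostics, to gauge whether the instrument is weak
                    estat firststage
                    ```

                    The /// line continuation works in a do-file; when typing interactively, put the command on one line.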
