  • Panel Data Analysis Fixed Effects

    Good morning,

    I'm writing my master's thesis and have run into an issue with including Year fixed effects in my regression. I'm studying the effect of the Gender Development Index (GDI) on GDP per capita in a panel of 27 countries from 1990 to 2020. However, the GDI coefficient changes sign when I include i.Year, going from positive to negative (and completely changing the policy implications of my analysis).

    I expected a positive correlation. Why do you think including time fixed effects might reverse the sign of the coefficient? Would it make sense, in this specific case, not to include them?

    Without Year fixed effects (country fixed effects only):

    Code:
    xtreg ln_GDPpc GDI GINI INV_r UN_r POP_g HCPI GOV_exp, fe
    
    Fixed-effects (within) regression               Number of obs     =        614
    Group variable: Country_ID                      Number of groups  =         27
    
    R-squared:                                      Obs per group:
         Within  = 0.6758                                         min =          6
         Between = 0.2723                                         avg =       22.7
         Overall = 0.3684                                         max =         31
    
                                                    F(7, 580)         =     172.74
    corr(u_i, Xb) = -0.4103                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
        ln_GDPpc | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             GDI |   6.744834   .5237326    12.88   0.000     5.716191    7.773478
            GINI |  -.0214288   .0043979    -4.87   0.000    -.0300666   -.0127911
           INV_r |   .0253672   .0034532     7.35   0.000     .0185849    .0321494
            UN_r |  -.0620038   .0085478    -7.25   0.000    -.0787922   -.0452153
           POP_g |  -.3070663   .0485502    -6.32   0.000    -.4024218   -.2117107
            HCPI |  -.0123887   .0022354    -5.54   0.000    -.0167792   -.0079981
         GOV_exp |   .0245106   .0043933     5.58   0.000     .0158818    .0331393
           _cons |   1.922476   .5395802     3.56   0.000     .8627065    2.982245
    -------------+----------------------------------------------------------------
         sigma_u |  .68725516
         sigma_e |  .30698036
             rho |  .83366721   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(26, 580) = 62.12                    Prob > F = 0.0000

    With Year fixed effects:

    Code:
    xtreg ln_GDPpc GDI GINI INV_r UN_r POP_g HCPI GOV_exp i.Year, fe
    
    Fixed-effects (within) regression               Number of obs     =        614
    Group variable: Country_ID                      Number of groups  =         27
    
    R-squared:                                      Obs per group:
         Within  = 0.9097                                         min =          6
         Between = 0.0484                                         avg =       22.7
         Overall = 0.3801                                         max =         31
    
                                                    F(37, 550)        =     149.67
    corr(u_i, Xb) = -0.0162                         Prob > F          =     0.0000
    
    ------------------------------------------------------------------------------
        ln_GDPpc | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             GDI |  -.1145937   .3549888    -0.32   0.747    -.8118934     .582706
            GINI |    .016365   .0028453     5.75   0.000      .010776    .0219539
           INV_r |   .0049661   .0020244     2.45   0.014     .0009896    .0089425
            UN_r |  -.0106453   .0049719    -2.14   0.033    -.0204114   -.0008792
           POP_g |  -.0181923   .0300618    -0.61   0.545    -.0772423    .0408578
            HCPI |  -.0071328   .0013566    -5.26   0.000    -.0097976    -.004468
         GOV_exp |  -.0067474   .0026602    -2.54   0.011    -.0119728    -.001522
                     |
                Year |
           1991  |   .0552928   .0703768     0.79   0.432    -.0829475     .193533
           1992  |   .0339597   .0723705     0.47   0.639    -.1081967     .176116
           1993  |   .0529695   .0722955     0.73   0.464    -.0890396    .1949787
           1994  |   .1105175    .072307     1.53   0.127    -.0315141    .2525491
           
    (...)

    Thank you in advance for your support!

  • #2
    Time is included when the outcome variable is subject to time effects. A variable like GDPpc, or its log transform, is such a variable. Not only are there long-term trends in this variable, but there are often yearly shocks of appreciable magnitude. At a minimum, including these in the model reduces the residual variance, thereby improving the efficiency of the model. But there is also the possibility that the focal independent variable, GDI in your case, might itself be subject to time effects. And if so, it may be that the GDI variable serves as a "proxy" for the time effects. The fact that your GDI coefficient changes drastically depending on whether you include the time effects tells us that GDI is, indeed, itself time sensitive, and that the effects of time on GDI are (using language loosely here) parallel to the time effects on GDPpc. I think that the best interpretation of your results is that the apparent effect of GDI in your "timeless" model is the result of this confounding (aka omitted variable bias). The model with time effects demonstrates that the apparent GDI effect is just the consequence of GDI serving as a proxy for time when time is omitted. If you are looking to establish something that you can plausibly argue is a causal relationship, you have to go with the model that includes time effects here. The "timeless" model is showing you an association that is demonstrably non-causal. Such associations are useful for some purposes, but not others. So ultimately it depends on your research goal. But typically studies like this hope to demonstrate a causal relationship, and the timeless model will not do that here.
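
    A minimal simulated sketch of this proxy-for-time mechanism, not the poster's data: every name here (id, year, trend, x, y) and every parameter value is made up for illustration. By construction x has no effect on y, but both drift upward over time, so the model without year effects attributes the common trend to x.

    ```stata
    * Simulated panel: 27 units, 1990-2020 (all values are assumptions)
    clear
    set seed 12345
    set obs 27
    gen id = _n
    gen u = rnormal()                      // unit-specific fixed effect
    expand 31                              // 31 yearly observations per unit
    bysort id: gen year = 1989 + _n        // years 1990 to 2020
    gen trend = (year - 1990) / 30         // common time trend, scaled 0 to 1

    * x and y share the trend; the true effect of x on y is zero
    gen x = 0.5 * trend + 0.1 * rnormal()
    gen y = u + 2 * trend + 0.2 * rnormal()

    xtset id year

    * Unit fixed effects only: x picks up the omitted time trend
    xtreg y x, fe
    scalar b_no_year = _b[x]

    * Unit and year fixed effects: the common trend is absorbed by i.year
    xtreg y x i.year, fe
    scalar b_year = _b[x]

    display "without year FE: " b_no_year "   with year FE: " b_year
    ```

    In this setup the first coefficient on x should come out large and positive and the second close to zero, which mirrors the change seen in the GDI coefficient above.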
    Last edited by Clyde Schechter; 11 Jul 2024, 09:43.

    • #3
      Iris:
      Clyde gave excellent advice.
      On a different note, while your within R-squared is very high (0.9097), most of your coefficients do not reach statistical significance.
      I would take another look at your regression specification, as it probably suffers from overfitting (see the Cross Validated thread "High R-squared although many insignificant coefficients" on stackexchange.com).
      In addition, with 27 panels you should check whether or not cluster-robust standard errors are the way to go.
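
      One way to make that check, sketched with the variable names from the original post: refit the model with vce(cluster Country_ID) and compare the standard errors side by side. The stored-estimate names fe_default and fe_cluster are arbitrary labels chosen for this example, and with only 27 clusters the cluster-robust results should themselves be read with some caution.

      ```stata
      * Two-way FE with default (conventional) standard errors
      xtreg ln_GDPpc GDI GINI INV_r UN_r POP_g HCPI GOV_exp i.Year, fe
      estimates store fe_default

      * Same model with standard errors clustered by country
      xtreg ln_GDPpc GDI GINI INV_r UN_r POP_g HCPI GOV_exp i.Year, fe vce(cluster Country_ID)
      estimates store fe_cluster

      * Coefficients are identical; only the standard errors differ
      estimates table fe_default fe_cluster, ///
          keep(GDI GINI INV_r UN_r POP_g HCPI GOV_exp) b(%9.4f) se(%9.4f)
      ```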
      Kind regards,
      Carlo
      (Stata 19.0)

      • #4
        Dear @Carlo and @Clyde, I would like to thank you both for your valuable and meaningful feedback.
