Time dummies in a fixed effects model cause an unexpected sign

Vishaal Gupta

Join Date: Apr 2016

Posts: 4
#1


24 Apr 2016, 08:41

Dear Statalist
I am investigating the effects of corruption on FDI inflows (19 countries, 2000-2010).

HTML Code:

xtreg lnfdipc L.corr L.econopen L.lnpcgdp L.elec L.agglom L.law L.pol i.year, fe cluster(region)

In my first model I ran a fixed effects regression, and in my second model I added year dummies to this regression (see the regression above). This caused the sign of one of my main explanatory variables, the natural log of per capita GDP (a proxy for market size), to go from positive (as in my first model) to negative.
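
A minimal sketch of how the two specifications could be put side by side, assuming the panel is declared with -xtset region year- (that step is not shown in the post) and using -vce(cluster region)-, the current spelling of the -cluster()- option. The stored names m1 and m2 are arbitrary:

```stata
* Assumption: region identifies the panel unit and year is the time variable
xtset region year

* Model 1: fixed effects without year dummies
xtreg lnfdipc L.corr L.econopen L.lnpcgdp L.elec L.agglom L.law L.pol, fe vce(cluster region)
estimates store m1

* Model 2: same regressors plus year dummies
xtreg lnfdipc L.corr L.econopen L.lnpcgdp L.elec L.agglom L.law L.pol i.year, fe vce(cluster region)
estimates store m2

* Coefficients and standard errors side by side, with N and within R-squared
estimates table m1 m2, b(%9.3f) se(%9.3f) stats(N r2_w)
```

Laying the estimates out this way makes it easier to see which coefficients move, and by how much relative to their standard errors, once the year dummies are added.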

A list of variable definitions is as follows (every explanatory variable has been lagged one year):
lnfdipc - log of per capita fdi inflows
corr - corruption
econopen - economic openness
lnpcgdp - log of per capita gdp
elec - electricity consumption (proxy for infrastructure)
agglom - agglomeration effect
law - rule of law
pol - polity

I apologise if this is a naive question, but I was wondering whether there is an explanation for this.
Thanks in advance

Vishaal
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

24 Apr 2016, 08:59

Vishaal:
welcome to the list.
As per FAQ #12, please post what you typed and what Stata gave you back via CODE delimiters (not HTML Code, please).
It's almost impossible (for me, at least) to comment positively on your query with such scant details (a regression coefficient can flip its sign for several reasons).

Kind regards,
Carlo
(Stata 19.0)

Vishaal Gupta

Join Date: Apr 2016

Posts: 4
#3

24 Apr 2016, 09:11

Hi Carlo

Sorry for the lack of detail. The results of the regression are as follows.

Model 1:

Code:

. xtreg  lnfdipc L.corr L.econopen L.lnpcgdp L.elec L.agglom L.law L.pol, fe cluster(region)

Fixed-effects (within) regression               Number of obs      =       157
Group variable: region                          Number of groups   =        19

R-sq:  within  = 0.2918                         Obs per group: min =         6
       between = 0.7872                                        avg =       8.3
       overall = 0.6340                                        max =         9

                                                F(7,18)            =     11.51
corr(u_i, Xb)  = -0.9373                        Prob > F           =    0.0000

                                (Std. Err. adjusted for 19 clusters in region)
------------------------------------------------------------------------------
             |               Robust
     lnfdipc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        corr |
         L1. |   .1449457   .1807158     0.80   0.433    -.2347242    .5246155
             |
    econopen |
         L1. |    .010007   .0054691     1.83   0.084    -.0014832    .0214972
             |
     lnpcgdp |
         L1. |   2.024846   .8355394     2.42   0.026     .2694429    3.780249
             |
        elec |
         L1. |   .0008041   .0004997     1.61   0.125    -.0002458     .001854
             |
      agglom |
         L1. |    .003663   .0074741     0.49   0.630    -.0120395    .0193654
             |
         law |
         L1. |   .7285327   .5571249     1.31   0.207    -.4419433    1.899009
             |
         pol |
         L1. |   .1136691    .105916     1.07   0.297    -.1088521    .3361903
             |
       _cons |  -14.65124   6.477692    -2.26   0.036    -28.26036   -1.042109
-------------+----------------------------------------------------------------
     sigma_u |  1.7538733
     sigma_e |  .58414056
         rho |  .90014895   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Model 2:

Code:

. xtreg  lnfdipc L.corr L.econopen L.lnpcgdp L.elec L.agglom L.law L.pol i.year, fe cluster(region)

Fixed-effects (within) regression               Number of obs      =       157
Group variable: region                          Number of groups   =        19

R-sq:  within  = 0.4537                         Obs per group: min =         6
       between = 0.0543                                        avg =       8.3
       overall = 0.1182                                        max =         9

                                                F(15,18)           =    252.36
corr(u_i, Xb)  = -0.5129                        Prob > F           =    0.0000

                                (Std. Err. adjusted for 19 clusters in region)
------------------------------------------------------------------------------
             |               Robust
     lnfdipc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        corr |
         L1. |   .1171643   .1355097     0.86   0.399    -.1675311    .4018597
             |
    econopen |
         L1. |    .011487   .0068036     1.69   0.109    -.0028069    .0257808
             |
     lnpcgdp |
         L1. |  -1.605614   1.534577    -1.05   0.309     -4.82964    1.618413
             |
        elec |
         L1. |   .0007026   .0003832     1.83   0.083    -.0001026    .0015077
             |
      agglom |
         L1. |   .0040171   .0056572     0.71   0.487    -.0078681    .0159024
             |
         law |
         L1. |   1.200021   .6309905     1.90   0.073    -.1256409    2.525683
             |
         pol |
         L1. |   .0081509   .0684112     0.12   0.906    -.1355757    .1518774
             |
        year |
       2003  |  -.4646744   .1834023    -2.53   0.021    -.8499884   -.0793605
       2004  |  -.3184604   .2248359    -1.42   0.174    -.7908232    .1539024
       2005  |  -.0936331   .2364591    -0.40   0.697    -.5904153    .4031491
       2006  |   .0750272   .2732951     0.27   0.787    -.4991445    .6491988
       2007  |   .5630124   .2409954     2.34   0.031     .0566998    1.069325
       2008  |   .7694771   .3080264     2.50   0.022     .1223376    1.416617
       2009  |   .2637801   .2842839     0.93   0.366    -.3334781    .8610384
       2010  |   .7155929   .2846822     2.51   0.022     .1174978    1.313688
             |
       _cons |   15.55407    12.0978     1.29   0.215    -9.862465    40.97061
-------------+----------------------------------------------------------------
     sigma_u |  1.4222801
     sigma_e |  .52947737
         rho |  .87828109   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Kind regards

Vishaal

Last edited by Vishaal Gupta; 24 Apr 2016, 09:19.


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#4

24 Apr 2016, 10:24

Vishaal:
thanks for providing further details.
Considering that most of the year dummies in the second model show no evidence of statistical significance and markedly increase the number of predictors relative to the sample size, I would stay with the first model (which loses something in terms of -R-sq within- but outperforms the second one in terms of both -R-sq overall- and -rho-).
As a closing remark, and noting that your research field is far away from mine, I would also say that in both models most of the predictors do not reach statistical significance: even though I'm not a "p-value less than" fan, I would compare those results with findings reported in the literature of your research field, just to see how others specify their regression models when dealing with the same research topic (by the way: have you already checked via -hausman- that there's no room for an -re- specification?).
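
For reference, a sketch of the -hausman- check mentioned above, under the assumption that the panel is declared with -xtset region year-. Note that -hausman- does not work after estimation with cluster-robust standard errors, so both models are fitted here with conventional standard errors; the stored names fixed and random are arbitrary:

```stata
* Assumption: panel declared as in the thread's output (region, year)
xtset region year

* Fixed effects with conventional (non-robust) standard errors
xtreg lnfdipc L.corr L.econopen L.lnpcgdp L.elec L.agglom L.law L.pol, fe
estimates store fixed

* Random effects with the same regressors
xtreg lnfdipc L.corr L.econopen L.lnpcgdp L.elec L.agglom L.law L.pol, re
estimates store random

* Hausman test: consistent estimator first, efficient estimator second
hausman fixed random, sigmamore
```

If cluster-robust inference is needed for the comparison as well, the user-written -xtoverid- command (available from SSC) is one commonly suggested alternative; whether it suits this dataset is not something the thread establishes.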

Kind regards,
Carlo
(Stata 19.0)
Vishaal Gupta

Join Date: Apr 2016

Posts: 4
#5

24 Apr 2016, 10:54

Hi Carlo

Thanks a lot for the advice. The variable choices have been based upon existing literature and the Hausman test I ran rejected the null hypothesis, thus I used fixed effects. Following the regression including the year dummies I used testparm to test the significance of the years. It yielded the following result:

Code:

. testparm i.year

 ( 1)  2003.year = 0
 ( 2)  2004.year = 0
 ( 3)  2005.year = 0
 ( 4)  2006.year = 0
 ( 5)  2007.year = 0
 ( 6)  2008.year = 0
 ( 7)  2009.year = 0
 ( 8)  2010.year = 0

       F(  8,    18) =   11.54
            Prob > F =    0.0000

Would you still recommend staying with the first model despite this?

Vishaal
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#6

24 Apr 2016, 11:41

Vishaal:
I would base a potential preference for model 2 not on the result of -testparm-, but on its better -R-sq: within- vs model 1 (a higher R-sq within is a good result when it comes to the -fe- specification).
That said, I would still prefer model 1 (although the concerns about the predictors' contribution in models 1 and 2 still hold).

Kind regards,
Carlo
(Stata 19.0)
Vishaal Gupta

Join Date: Apr 2016

Posts: 4
#7

24 Apr 2016, 12:07

Thanks for these comments Carlo.

Vishaal
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#8

25 Apr 2016, 11:50

Carlo,

Could you help me understand your thinking and the meaning of these R-sq's a bit? I am not disagreeing with your advice but want to understand it a bit better.

As I understand it, the within R-square is closest to what is actually estimated in the fe model. The other two R-squares are then calculated with the parameters from the fe estimate.

I wondered why adding the year dummies lowered the between and overall R-squares, but this makes sense: the extra variables improve the within fit, and their effect on the other two R-squares is incidental. The usual rule that adding variables cannot lower R-square applies only to the within R-square, because that is the quantity the estimator fits; the between and overall R-squares can move in either direction.

How exactly should we think about the between and overall R-squares in a fixed effect regression? Why should we pay attention to them when we use an estimator designed based on the within explained variance?

Once you run the model with the year effects and find the year dummies are statistically significant, aren't you ignoring omitted variables bias in dropping them?
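
One way to probe why the L.lnpcgdp coefficient moves so much once the year dummies enter is to check how much of the within-region variation in lnpcgdp is common to all regions in a given year. This is only a sketch, not something run in the thread, and it assumes the panel is declared with -xtset region year-:

```stata
* Assumption: region is the panel unit and year the time variable
xtset region year

* Regress log per capita GDP on year dummies alone, with region fixed effects
xtreg lnpcgdp i.year, fe

* Share of within-region variation in lnpcgdp explained by common year effects
display "within R-sq of lnpcgdp on year dummies = " %5.3f e(r2_w)
```

If that share turns out to be close to 1, the year dummies absorb most of the variation that identified the coefficient in Model 1, which would be consistent with its standard error rising from about .84 to about 1.53 in Model 2.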

I would appreciate your guidance.

Phil
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#9

25 Apr 2016, 23:53

Phil:
I agree with your concern.
In a -fe- model, R-sq between and overall are simply squared correlation coefficients and the focus of the analysis should be on R-sq within.
In Vishaal's case, R-sq within is higher in the second model, where the -i.year- dummies taken together are also jointly different from 0 (even though some of them are quite far from reaching statistical significance). I agree that omitting -i.year- can lead to omitted variable bias.
Even if we do not consider R-sq between and overall (which is a legitimate choice when it comes to the -xtreg, fe- specification), the first model outperforms the second one as far as -sigma_u- is concerned (a feature I should have reported in my previous reply).
So the choice may be between a higher R-sq within with a lower -sigma_u-, or vice versa (even though in both models most of the predictors hardly explain the variation in -depvar-; however, even this outcome may be an interesting result).
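
To illustrate the point that the overall R-sq after -xtreg, fe- is a squared correlation, here is a sketch that recomputes it by hand for Model 2. It assumes the panel is declared with -xtset region year-; the variable name xb_hat is made up for the example:

```stata
* Assumption: region is the panel unit and year the time variable
xtset region year
xtreg lnfdipc L.corr L.econopen L.lnpcgdp L.elec L.agglom L.law L.pol i.year, fe vce(cluster region)

* Linear prediction a + xb, which excludes the estimated fixed effects u_i
predict double xb_hat if e(sample), xb

* Squared correlation between the outcome and the linear prediction
quietly correlate lnfdipc xb_hat if e(sample)
display "overall R-sq by hand = " %6.4f r(rho)^2 "   reported = " %6.4f e(r2_o)
```

Because the fixed effects are left out of the prediction, nothing forces this quantity to rise when regressors are added, unlike the within R-sq.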

Kind regards,
Carlo
(Stata 19.0)
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#10

26 Apr 2016, 10:36

Carlo,

Thanks for the clarification. I'm afraid I have another question.

I would have thought that lower sigma u was a good thing - indicating that the variables in the model are explaining more of the variance. Hypothetically, if we had all the right variables, sigma u should be zero - we would not need the fixed effects. Why is larger sigma u a good thing?

I suspect underneath it all Vishaal is estimating too many parameters with too little data. Even in the first smaller model he has 19 fixed effects plus 7 variables which is a lot if you only have 157 observations, particularly if you're looking for a modest effect. Adding the 8 time dummies makes this worse. I would think he should work on getting more data.
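
A quick way to look at how much data there is per panel, and how much of each variable's variation is within regions rather than between them, might be the following sketch (again assuming -xtset region year-; the choice of variables passed to -xtsum- is only illustrative):

```stata
* Assumption: region is the panel unit and year the time variable
xtset region year

* Pattern of observations per region and year (gaps, panel length)
xtdescribe

* Split the variation of key variables into between and within components
xtsum lnfdipc lnpcgdp corr
```

Since the fixed effects estimator uses only within variation, a small within standard deviation in -xtsum- would point to weak identification for that variable.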

Phil
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#11

26 Apr 2016, 15:12

Phil:
I meant that in model 1 -sigma_u- outperforms -sigma_e- and a higher portion of the variation in -depvar- is explained by individual effect rather than idiosyncratic error.
However, the more I look at Vishaal's example, the stronger my belief that the problem (if any) lies in the predictors, which hardly explain any variation in the -depvar-. It would be interesting for Vishaal to take this matter further and investigate whether the absence of evidence (of statistical significance) is indeed evidence of absence, or whether increasing the sample size would give different findings.
Otherwise I agree with you that both models suffer from an excess of predictors when contrasted against their sample sizes.
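
For readers comparing -sigma_u-, -sigma_e- and -rho- across the two outputs: rho is the share of the total variance attributed to u_i, that is, sigma_u^2 / (sigma_u^2 + sigma_e^2). A small sketch using the values printed earlier in the thread:

```stata
* rho = sigma_u^2 / (sigma_u^2 + sigma_e^2), using the values reported in the thread
display "Model 1 rho = " %9.7f (1.7538733^2/(1.7538733^2 + .58414056^2))
display "Model 2 rho = " %9.7f (1.4222801^2/(1.4222801^2 + .52947737^2))
```

These should reproduce the reported values of roughly .900 and .878, which shows that rho adds no information beyond the two sigmas.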

Kind regards,
Carlo
(Stata 19.0)