  • Time dummies in fixed effects model cause an unexpected sign

    Dear Statalist
    I am investigating the effects of corruption on FDI inflows (19 countries, 2000-2010).

    HTML Code:
    xtreg  lnfdipc L.corr L.econopen L.lnpcgdp L.elec L.agglom L.law L.pol i.year, fe cluster(region)
    In my first model I ran a fixed effects regression, and in my second model I added year dummies to this regression (see the regression above). This has caused the
    sign of one of my main explanatory variables, the natural log of per capita GDP (a proxy for market size), to go from positive (as in my first model) to negative.


    A list of variable definitions is as follows (every explanatory variable has been lagged one year):
    lnfdipc - log of per capita fdi inflows
    corr - corruption
    econopen - economic openness
    lnpcgdp - log of per capita gdp
    elec - electricity consumption (proxy for infrastructure)
    agglom - agglomeration effect
    law - rule of law
    pol - polity

    I apologise if this is a naive question, but I was wondering whether there is an explanation for this.

    Thanks in advance

    Vishaal


  • #2
    Vishaal:
    welcome to the list.
    As per FAQ #12, please post what you typed and what Stata gave you back via CODE delimiters (not HTML Code, please).
    It's almost impossible (for me, at least) to comment positively on your query with such scant details (a regression coefficient can flip its sign for several reasons).
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Hi Carlo

      Sorry for the lack of detail. The results of the regression are as follows.

      Model 1:
      Code:
      . xtreg  lnfdipc L.corr L.econopen L.lnpcgdp L.elec L.agglom L.law L.pol, fe cluster(region)
      
      Fixed-effects (within) regression               Number of obs      =       157
      Group variable: region                          Number of groups   =        19
      
      R-sq:  within  = 0.2918                         Obs per group: min =         6
             between = 0.7872                                        avg =       8.3
             overall = 0.6340                                        max =         9
      
                                                      F(7,18)            =     11.51
      corr(u_i, Xb)  = -0.9373                        Prob > F           =    0.0000
      
                                      (Std. Err. adjusted for 19 clusters in region)
      ------------------------------------------------------------------------------
                   |               Robust
           lnfdipc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              corr |
               L1. |   .1449457   .1807158     0.80   0.433    -.2347242    .5246155
                   |
          econopen |
               L1. |    .010007   .0054691     1.83   0.084    -.0014832    .0214972
                   |
           lnpcgdp |
               L1. |   2.024846   .8355394     2.42   0.026     .2694429    3.780249
                   |
              elec |
               L1. |   .0008041   .0004997     1.61   0.125    -.0002458     .001854
                   |
            agglom |
               L1. |    .003663   .0074741     0.49   0.630    -.0120395    .0193654
                   |
               law |
               L1. |   .7285327   .5571249     1.31   0.207    -.4419433    1.899009
                   |
               pol |
               L1. |   .1136691    .105916     1.07   0.297    -.1088521    .3361903
                   |
             _cons |  -14.65124   6.477692    -2.26   0.036    -28.26036   -1.042109
      -------------+----------------------------------------------------------------
           sigma_u |  1.7538733
           sigma_e |  .58414056
               rho |  .90014895   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------


      Model 2:
      Code:
      . xtreg  lnfdipc L.corr L.econopen L.lnpcgdp L.elec L.agglom L.law L.pol i.year, fe cluster(region)
      
      Fixed-effects (within) regression               Number of obs      =       157
      Group variable: region                          Number of groups   =        19
      
      R-sq:  within  = 0.4537                         Obs per group: min =         6
             between = 0.0543                                        avg =       8.3
             overall = 0.1182                                        max =         9
      
                                                      F(15,18)           =    252.36
      corr(u_i, Xb)  = -0.5129                        Prob > F           =    0.0000
      
                                      (Std. Err. adjusted for 19 clusters in region)
      ------------------------------------------------------------------------------
                   |               Robust
           lnfdipc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              corr |
               L1. |   .1171643   .1355097     0.86   0.399    -.1675311    .4018597
                   |
          econopen |
               L1. |    .011487   .0068036     1.69   0.109    -.0028069    .0257808
                   |
           lnpcgdp |
               L1. |  -1.605614   1.534577    -1.05   0.309     -4.82964    1.618413
                   |
              elec |
               L1. |   .0007026   .0003832     1.83   0.083    -.0001026    .0015077
                   |
            agglom |
               L1. |   .0040171   .0056572     0.71   0.487    -.0078681    .0159024
                   |
               law |
               L1. |   1.200021   .6309905     1.90   0.073    -.1256409    2.525683
                   |
               pol |
               L1. |   .0081509   .0684112     0.12   0.906    -.1355757    .1518774
                   |
              year |
             2003  |  -.4646744   .1834023    -2.53   0.021    -.8499884   -.0793605
             2004  |  -.3184604   .2248359    -1.42   0.174    -.7908232    .1539024
             2005  |  -.0936331   .2364591    -0.40   0.697    -.5904153    .4031491
             2006  |   .0750272   .2732951     0.27   0.787    -.4991445    .6491988
             2007  |   .5630124   .2409954     2.34   0.031     .0566998    1.069325
             2008  |   .7694771   .3080264     2.50   0.022     .1223376    1.416617
             2009  |   .2637801   .2842839     0.93   0.366    -.3334781    .8610384
             2010  |   .7155929   .2846822     2.51   0.022     .1174978    1.313688
                   |
             _cons |   15.55407    12.0978     1.29   0.215    -9.862465    40.97061
      -------------+----------------------------------------------------------------
           sigma_u |  1.4222801
           sigma_e |  .52947737
               rho |  .87828109   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      Kind regards

      Vishaal
      Last edited by Vishaal Gupta; 24 Apr 2016, 09:19.



      • #4
        Vishaal:
        thanks for providing further details.
        Considering that most of the year dummies in the second model show no evidence of statistical significance, and that they markedly increase the number of predictors relative to the sample size, I would stay with the first model (which loses something in terms of -R-sq within- but outperforms the second one in terms of both -R-sq overall- and -rho-).
        As a closing remark, and given that your research field is far away from mine, I would also say that in both models most of the predictors do not reach statistical significance. Even though I'm not a "p-value less than" fan, I would compare those results with findings reported in the literature of your research field, just to see how others specify their regression models when presented with the same research topic (by the way: have you already checked via -hausman- that there's no room for an -re- specification?).
        Kind regards,
        Carlo
        (Stata 19.0)
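
        The -hausman- check mentioned above can be sketched as follows. This is only a minimal illustration: the FDI data are not posted, so it uses Stata's Grunfeld example panel as a stand-in (an assumption), with `invest`, `mvalue` and `kstock` as placeholder variables. Both models are fitted with the default standard errors, because -hausman- does not accept cluster-robust ones.

        ```stata
        * Stand-in data (an assumption): Grunfeld example panel, not the FDI data
        webuse grunfeld, clear
        xtset company year

        * Fixed effects fit with the default (non-robust) VCE
        xtreg invest mvalue kstock, fe
        estimates store fe

        * Random effects fit on the same specification
        xtreg invest mvalue kstock, re
        estimates store re

        * Consistent estimator first, efficient estimator second
        hausman fe re
        ```

        A rejection of the null would point towards -fe-, which is the outcome reported later in the thread for the FDI data.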



        • #5
          Hi Carlo

          Thanks a lot for the advice. The variable choices are based on the existing literature, and the Hausman test I ran rejected the null hypothesis, so I used fixed effects. After the regression including the year dummies, I used testparm to test the joint significance of the years. It yielded the following result:

          Code:
          . testparm i.year
          
           ( 1)  2003.year = 0
           ( 2)  2004.year = 0
           ( 3)  2005.year = 0
           ( 4)  2006.year = 0
           ( 5)  2007.year = 0
           ( 6)  2008.year = 0
           ( 7)  2009.year = 0
           ( 8)  2010.year = 0
          
                 F(  8,    18) =   11.54
                      Prob > F =    0.0000
          Would you still recommend staying with the first model despite this?

          Vishaal



          • #6
            Vishaal:
            I would not base a potential preference for model 2 on the result of -testparm-, but on its better -R-sq: within- compared with model 1 (a higher R-sq within is a good result when it comes to the -fe- specification).
            That said, I would still prefer model 1 (although the concerns about the predictors' contribution in models 1 and 2 still hold).
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Thanks for these comments Carlo.

              Vishaal



              • #8
                Carlo,

                Could you help me understand your thinking and the meaning of these R-sq's a bit? I am not disagreeing with your advice but want to understand it a bit better.

                As I understand it, the within R-square is closest to what is actually estimated in the fe model. The other two R-squares are then calculated with the parameters from the fe estimate.

                I wondered why adding the year dummies lowered the between and overall R-squares, but this makes sense: the extra variables improve the within fit and merely happen to change the other two R-squares. The rule that "adding variables doesn't lower R-square" does not apply here, because the added variables enter an estimation that targets the within variation; they could have any impact on the between and overall R-squares.

                How exactly should we think about the between and overall R-squares in a fixed effect regression? Why should we pay attention to them when we use an estimator designed based on the within explained variance?

                Once you run the model with the year effects and find the year dummies are statistically significant, aren't you ignoring omitted variables bias in dropping them?

                I would appreciate your guidance.

                Phil
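
                One way a common time trend can flip a sign, which is also the omitted variable concern raised above, can be illustrated with simulated data. The sketch below is purely hypothetical: the panel dimensions mimic the thread (19 regions, 2000-2010), but the variables `x`, `y` and `alpha` and every coefficient are made up. Here `x` drifts upward over time, `y` has its own common time trend, and the true effect of `x` on `y` is set to -1.

                ```stata
                clear
                set seed 2016

                * 19 hypothetical regions, each with its own fixed effect alpha
                set obs 19
                generate region = _n
                generate alpha = rnormal()

                * 11 yearly observations per region: 2000-2010
                expand 11
                bysort region: generate year = 1999 + _n
                xtset region year

                * x trends upward; y has a common time trend; true effect of x is -1
                generate x = alpha + 0.1*(year - 2000) + rnormal(0, 0.1)
                generate y = alpha + 0.5*(year - 2000) - x + rnormal(0, 0.2)

                * Without year dummies, x also picks up the common trend in y
                xtreg y x, fe

                * With year dummies, only deviations from the common trend identify x
                xtreg y x i.year, fe
                ```

                Under this setup the coefficient on `x` should come out positive without the year dummies, because it absorbs the common trend, and close to -1 once `i.year` is included. This is only one of several possible mechanisms and says nothing definite about the FDI data.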



                • #9
                  Phil:
                  I agree with your concern.
                  In a -fe- model, R-sq between and overall are simply squared correlation coefficients, and the focus of the analysis should be on R-sq within.
                  In Vishaal's case, R-sq within is higher in the second model, where the -i.year- dummies taken together are also jointly different from 0 (even though some of them are quite far from reaching statistical significance). I agree that omitting -i.year- can lead to omitted variable bias.
                  Even if we do not consider R-sq between and overall (which is a legitimate choice when it comes to the -xtreg, fe- specification), the first model outperforms the second one as far as -sigma_u- is concerned (a feature I should have reported in my previous reply).
                  So the choice may be between a higher R-sq within but a lower -sigma_u-, or vice versa (even though in both models most of the predictors hardly explain the variation in -depvar-; however, even this outcome may be an interesting result).
                  Kind regards,
                  Carlo
                  (Stata 19.0)
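
                  The three R-squares and -sigma_u- discussed here are left behind in e() after -xtreg, fe-, so the two specifications can be compared side by side. A minimal sketch, assuming Stata's Grunfeld example panel as a stand-in for the FDI data (`invest`, `mvalue` and `kstock` are placeholder variables):

                  ```stata
                  * Stand-in data (an assumption): Grunfeld example panel
                  webuse grunfeld, clear
                  xtset company year

                  * Specification without year dummies
                  xtreg invest mvalue kstock, fe
                  display "within " %5.3f e(r2_w) "  between " %5.3f e(r2_b) "  overall " %5.3f e(r2_o)
                  display "sigma_u " %9.3f e(sigma_u) "  sigma_e " %9.3f e(sigma_e)

                  * Same specification with year dummies added
                  xtreg invest mvalue kstock i.year, fe
                  display "within " %5.3f e(r2_w) "  between " %5.3f e(r2_b) "  overall " %5.3f e(r2_o)
                  display "sigma_u " %9.3f e(sigma_u) "  sigma_e " %9.3f e(sigma_e)
                  ```

                  On the same estimation sample the within R-squared cannot fall when regressors are added, whereas the between and overall figures can move in either direction.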



                  • #10
                    Carlo,

                    Thanks for the clarification. I'm afraid I have another question.

                    I would have thought that lower sigma u was a good thing - indicating that the variables in the model are explaining more of the variance. Hypothetically, if we had all the right variables, sigma u should be zero - we would not need the fixed effects. Why is larger sigma u a good thing?

                    I suspect underneath it all Vishaal is estimating too many parameters with too little data. Even in the first smaller model he has 19 fixed effects plus 7 variables which is a lot if you only have 157 observations, particularly if you're looking for a modest effect. Adding the 8 time dummies makes this worse. I would think he should work on getting more data.

                    Phil
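
                    The parameter count above can be put in terms of observations per estimated parameter, taking the figures in this post at face value (157 observations, 19 fixed effects, 7 regressors, 8 year dummies):

                    ```stata
                    * Model 1: 19 fixed effects + 7 regressors
                    display 157 / (19 + 7)

                    * Model 2: the same plus 8 year dummies
                    display 157 / (19 + 7 + 8)
                    ```

                    That is roughly 6 observations per parameter in the first model and about 4.6 in the second.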




                    • #11
                      Phil:
                      I meant that in model 1 -sigma_u- outperforms -sigma_e-, so that a higher portion of the variation in -depvar- is explained by the individual effect rather than by the idiosyncratic error.
                      However, the more I look at Vishaal's example, the stronger my belief that the problem (if any) lies in the predictors, which hardly explain any variation in -depvar-. It would be interesting for Vishaal to take this matter further and investigate whether the absence of evidence (of statistical significance) is indeed evidence of absence, or whether increasing the sample size can give different findings.
                      Otherwise I agree with you that both models suffer from an excess of predictors when contrasted against their sample sizes.
                      Kind regards,
                      Carlo
                      (Stata 19.0)
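
                      The comparison of -sigma_u- with -sigma_e- is what -rho- summarises: it is the share of the total variance attributed to u_i, sigma_u^2 / (sigma_u^2 + sigma_e^2). As a quick check, the figures below are copied from the two outputs in #3:

                      ```stata
                      * Model 1 (no year dummies)
                      display 1.7538733^2 / (1.7538733^2 + .58414056^2)

                      * Model 2 (with year dummies)
                      display 1.4222801^2 / (1.4222801^2 + .52947737^2)
                      ```

                      These should reproduce the reported -rho- values (.90014895 and .87828109) up to rounding.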

