
  • Wrong coefficient in a fixed effects panel regression

    Hello,

    I'm trying to estimate the impact that fiscal rules might have on a country's primary balance. I have panel data on 28 countries over 28 years, so T = 28 and N = 28. The problem is that when I use the fixed-effects regression for panel data, I get a coefficient that disagrees with all of the previous literature.

    When I estimate the same regression with random effects, I get the expected coefficient. However, according to the Breusch-Pagan Lagrange multiplier test, random effects are not appropriate, and I need robust standard errors because of heteroskedasticity...

    I've been using -xtreg, fe vce(r)- to estimate this unbalanced dynamic panel. I get the same result with the fixed-effects IV estimator... Is this a problem with the fixed-effects model? Here's an example:

    Code:
    Fixed-effects (within) regression               Number of obs     =        552
    Group variable: id                              Number of groups  =         28
    
    R-sq:                                           Obs per group:
         within  = 0.5762                                         min =         12
         between = 0.5118                                         avg =       19.7
         overall = 0.5144                                         max =         23
    
                                                    F(7,27)           =     128.73
    corr(u_i, Xb)  = -0.4775                        Prob > F          =     0.0000
    
                                        (Std. Err. adjusted for 28 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
              PB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             PB1 |    .624859   .0553634    11.29   0.000     .5112626    .7384554
           Debt1 |   .0388924   .0104554     3.72   0.001     .0174397    .0603451
            Gap1 |   .0363654   .0568418     0.64   0.528    -.0802644    .1529951
          EXPDEC |   .0863861   .0625988     1.38   0.179    -.0420561    .2148284
        Election |   -.071106   .2191869    -0.32   0.748    -.5208403    .3786282
             FSI |  -11.35462   2.135158    -5.32   0.000     -15.7356   -6.973639
           Rules |  -.3425743    .136504    -2.51   0.018    -.6226573   -.0624912
           _cons |  -3.107349   2.058504    -1.51   0.143    -7.331051    1.116353
    -------------+----------------------------------------------------------------
         sigma_u |  1.5384986
         sigma_e |  1.9666157
             rho |  .37965466   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
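For reference: as far as I know from the -xtreg- documentation, with -xtreg, fe- the option vce(robust) is treated as vce(cluster id), which is why the output above reports "Std. Err. adjusted for 28 clusters in id". The sketch below checks this on simulated data; the variable names (id, year, x, y) and the data-generating process are hypothetical, not the poster's.

```stata
* Hypothetical simulated panel: 28 ids observed for 20 years each
clear
set seed 2019
set obs 28
gen id = _n
gen u = rnormal()                                  // unit-specific effect
expand 20
bysort id: gen year = 1990 + _n
xtset id year

gen x = u + rnormal()                              // regressor correlated with u
gen y = 1 + 0.5*x + u + (1 + abs(x))*rnormal()     // heteroskedastic error

* vce(robust) with -xtreg, fe- ...
xtreg y x, fe vce(robust)
scalar se_robust = _se[x]

* ... should match clustering on the panel variable
xtreg y x, fe vce(cluster id)
scalar se_cluster = _se[x]

display "robust SE = " se_robust "   clustered SE = " se_cluster
```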

  • #2
    If I understand your post, the issue is that the results you are getting do not agree with your expectations of what they should be. Since -xtreg, fe- is a very old Stata command at this point and few people post questions about whether it has a bug, we can probably just forget the possibility that the command is at fault.

    So either your data is wrong or your expectations are wrong.

    Are you running the same exact model as was used in the studies you are comparing your results to? Adding or removing even a single new variable to the model can change everything. How was your data sample accrued: is it similar to the way the data in the earlier studies was gathered? Are the same measurement procedures in use? Are we talking about the same overall time periods in your study and the others? Same geography? It may be that the findings of the earlier studies are implicitly dependent on those contextual features and your study is not a replication in that respect.

    But there is a more enticing clue. You indicate that with random effects modeling you get results along the lines you were expecting, but not with fixed effects. You then go on to mention a test that says you should be using fixed effects. I am familiar with the widespread practice in economics and econometrics of using such tests to choose between fixed and random effects modeling. But it is wrong-headed to apply it mindlessly. You need to first consider what kind of effects you are trying to estimate. In panel data, you have to think about both effects within panels (ids) and effects between panels. They can be very different--even opposite in sign, as the following code demonstrates:
    Code:
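    clear
    //    FIVE PANELS, TWO OBSERVATIONS EACH
    set obs 5
    gen panel_id = _n
    expand 2
    
    //    WITHIN EACH PANEL y FALLS AS x RISES; ACROSS PANELS BOTH RISE TOGETHER
    set seed 1234
    by panel_id , sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
    by panel_id: gen x = panel_id + _n
    
    xtset panel_id 
    
    xtreg y x, fe    //    WITHIN-PANEL EFFECT: NEGATIVE
    regress y x      //    POOLED OLS, DRIVEN BY BETWEEN-PANEL VARIATION: POSITIVE
    
    //    GRAPH THE DATA TO SHOW WHAT'S HAPPENING
    separate y, by(panel_id)
    
    graph twoway connect y? x || lfit y x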
    A fixed effects model estimates only within-panel effects. A random effects model assumes that the within and between effects are the same and models a parameter that is an estimate of those common effects. It sounds to me like the previous studies have looked at a between-panel effect and you are getting thrown off by using a within-panel analysis. Of course, if the -fe- and -re- results are so discrepant, it implies that the "within effect = between effect" assumption implicit in random-effects modeling is incorrect. So you might be better off doing separate modeling of within- and between- effects. One way to get that is with Francisco Perales' -xthybrid- command, available from SSC. If you run this I suspect that you will find that the between effects match your expectations and the within-effects do not.
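The idea behind -xthybrid- can be sketched by hand with the simulated data from the code above: split x into its panel mean and the deviation from that mean, and enter both in one random-effects regression. This is a minimal illustration of the within-between (hybrid) specification, not the -xthybrid- code itself; the names x_between and x_within are made up for the example. The coefficient on x_within should reproduce the -xtreg, fe- estimate, while the coefficient on x_between estimates the between effect.

```stata
* Rebuild the simulated data from the example above
clear
set obs 5
gen panel_id = _n
expand 2
set seed 1234
by panel_id, sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
by panel_id: gen x = panel_id + _n
xtset panel_id

* Split x into a between part (panel mean) and a within part (deviation)
egen double x_between = mean(x), by(panel_id)
gen double x_within = x - x_between

* Fixed-effects (within) estimate, for comparison
xtreg y x, fe
scalar b_fe = _b[x]

* Hybrid model: both parts in one random-effects regression
xtreg y x_within x_between, re
scalar b_within  = _b[x_within]
scalar b_between = _b[x_between]

display "fe: " b_fe "   within: " b_within "   between: " b_between
```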

    In the future, when showing Stata output, please place it between code delimiters so that it will align readably. If you are not familiar with code delimiters, please read Forum FAQ #12, or watch David Benson's video at https://youtu.be/bXfaRCAOPbI. (In either place you will also learn about using -dataex- to show example data.)



    • #3
      Thank you for the reply. Regarding the data, I already checked it twice before posting, so I believe nothing is wrong there... The studies I checked mostly used a "fixed effects OLS regression" and 2SLS (I assume these correspond to the panel-data methods I'm using) with more or less the same variables. It is true that my study covers a longer period, but only by 4 more years and one extra country.

      I'll try the command you suggested and report back...

      And sorry for the lack of code delimiters...



      • #4
        Here's an example of my panel data:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte id int Year double(PB PB1 Debt1 Gap1 EXPDEC) byte Election double(FSI Rules) byte(EMU SGP ENL)
        1 1990          .          .       .          .                  . 1 .08955833333333334 -.949364 0 0 0
        1 1991          .          .       .  1.4292178                  . 0 .18814166666666665 -.949364 0 0 0
        1 1992          .          .       .  1.6432503                  . 0 .25890833333333335 -.949364 0 0 0
        1 1993          .          .       .   .7410163                  . 0 .19639166666666666 -.949364 0 0 0
        1 1994          .          .       . -1.1694382                  . 1            .091825 -.949364 0 0 0
        1 1995 -2.1370968          .       . -1.0350857  33.58018214814436 1               .146 -.949364 1 0 0
        1 1996  -.5822203 -2.1370968 68.3205  -.6094489  33.50052722711015 0 .11419166666666665 -.949364 1 0 0
        1 1997  1.0246299  -.5822203 68.2596  -.5113661  32.87075193266632 0 .13790833333333333 -.949364 1 0 0
        1 1998   .8804277  1.0246299 63.4934  -.8796375 33.191438084032995 0 .09439999999999998 -.949364 1 0 0
        1 1999   .8382758   .8804277  63.859   .0623869 33.633089456844914 1 .18689166666666665  .224278 0 1 0
        1 2000  1.1549509   .8382758 66.6905   .9046167  33.69649516397724 0 .10743333333333333  .224278 0 1 0
        1 2001  2.9436833  1.1549509 66.1242  1.6168195 32.840013984541706 0 .06568333333333333  .370483 0 1 0
        1 2002  2.0627188  2.9436833  66.729   .4728021  33.56798646938247 1 .03391666666666666  .370483 0 1 0
        1 2003  1.3936754  2.0627188  66.728  -.0780398   33.4312592894763 0 .05971666666666666  .370483 0 1 0
        1 2004 -1.7988411  1.3936754 65.8529 -1.2765579 31.509362574357073 0            .066325  .370483 0 1 0
        end
        format %ty Year

        The result I obtained with the -fe- command (with a negative Rules coefficient) is:

        Code:
        Fixed-effects (within) regression               Number of obs     =        552
        Group variable: id                              Number of groups  =         28
        
        R-sq:                                           Obs per group:
             within  = 0.5887                                         min =         12
             between = 0.5054                                         avg =       19.7
             overall = 0.5177                                         max =         23
        
                                                        F(10,27)          =     117.69
        corr(u_i, Xb)  = -0.4882                        Prob > F          =     0.0000
        
                                            (Std. Err. adjusted for 28 clusters in id)
        ------------------------------------------------------------------------------
                     |               Robust
                  PB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 PB1 |   .6127309   .0569516    10.76   0.000     .4958758     .729586
               Debt1 |   .0380682   .0092282     4.13   0.000     .0191335    .0570028
                Gap1 |   .0452214   .0578105     0.78   0.441     -.073396    .1638387
              EXPDEC |   .0990221   .0545673     1.81   0.081    -.0129408    .2109849
            Election |    -.05914   .2161574    -0.27   0.786    -.5026584    .3843784
                 FSI |  -11.00049   2.082507    -5.28   0.000    -15.27345   -6.727541
               Rules |   -.216667   .1136778    -1.91   0.067    -.4499145    .0165806
                 EMU |   .9387213   .3226792     2.91   0.007     .2766382    1.600804
                 SGP |  -.3660889   .4234953    -0.86   0.395    -1.235029    .5028516
                 ENL |   .3710115   .3068094     1.21   0.237    -.2585093    1.000532
               _cons |  -3.498912   1.735103    -2.02   0.054     -7.05905    .0612257
        -------------+----------------------------------------------------------------
             sigma_u |  1.5663862
             sigma_e |  1.9432308
                 rho |  .39384898   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        The result of the command suggested:

        Code:
        Hybrid model. Family: gaussian. Link: identity.
        
        +-----------------------------------+
        |             Variable |   model    |
        |----------------------+------------|
        | PB                   |            |
        |               W__PB1 |     0.6127 |
        |             W__Debt1 |     0.0381 |
        |              W__Gap1 |     0.0452 |
        |            W__EXPDEC |     0.0990 |
        |          W__Election |    -0.0591 |
        |               W__FSI |   -11.0005 |
        |             W__Rules |    -0.2167 |
        |               W__EMU |     0.9387 |
        |               W__SGP |    -0.3661 |
        |               W__ENL |     0.3710 |
        |               B__PB1 |     0.9899 |
        |             B__Debt1 |    -0.0011 |
        |              B__Gap1 |     0.0108 |
        |            B__EXPDEC |    -0.0004 |
        |          B__Election |     1.1762 |
        |               B__FSI |    -2.5772 |
        |             B__Rules |    -0.1081 |
        |               B__EMU |     1.4356 |
        |               B__SGP |     0.0545 |
        |               B__ENL |     0.2680 |
        |                _cons |     0.0087 |
        |----------------------+------------|
        |        var(_cons[id])|            |
        |                _cons |     0.0000 |
        |----------------------+------------|
        |             var(e.PB)|            |
        |                _cons |     3.5342 |
        |----------------------+------------|
        | Statistics           |            |
        |                   ll | -1131.7025 |
        |                 chi2 | 16966.7127 |
        |                    p |     0.0000 |
        |                  aic |  2307.4051 |
        |                  bic |  2402.3031 |
        +-----------------------------------+
        Level 1: 552 units. Level 2: 28 units.
        Both Rules coefficients (within and between) remain negative. However, if I exclude the lagged dependent variable, I get:

        Code:
        Hybrid model. Family: gaussian. Link: identity.
        
        +-----------------------------------+
        |             Variable |   model    |
        |----------------------+------------|
        | PB                   |            |
        |             W__Debt1 |     0.0437 |
        |              W__Gap1 |     0.3014 |
        |            W__EXPDEC |     0.1756 |
        |          W__Election |    -0.0679 |
        |               W__FSI |   -14.0767 |
        |             W__Rules |    -0.0222 |
        |               W__EMU |     2.0770 |
        |               W__SGP |    -0.0243 |
        |               W__ENL |     0.1246 |
        |             B__Debt1 |     0.0076 |
        |              B__Gap1 |    -0.2754 |
        |            B__EXPDEC |     0.0235 |
        |          B__Election |   -22.0083 |
        |               B__FSI |    11.2416 |
        |             B__Rules |     1.4942 |
        |               B__EMU |   -13.6932 |
        |               B__SGP |     0.4878 |
        |               B__ENL |    -2.8541 |
        |                _cons |     4.4337 |
        |----------------------+------------|
        |        var(_cons[id])|            |
        |                _cons |     0.6333 |
        |----------------------+------------|
        |             var(e.PB)|            |
        |                _cons |     6.3011 |
        |----------------------+------------|
        | Statistics           |            |
        |                   ll | -1308.8622 |
        |                 chi2 |   433.9111 |
        |                    p |     0.0000 |
        |                  aic |  2659.7243 |
        |                  bic |  2750.3469 |
        +-----------------------------------+
        Level 1: 553 units. Level 2: 28 units.
        One of them is now positive... Is it possible that including the lagged dependent variable as a regressor distorts the results? If so, how can I solve it?



        • #5
          Is it possible that including the lagged dependent variable as a regressor distorts the results? If so, how can I solve it?
          Well, it is definitely possible for inclusion of the lagged dependent variable as a predictor to change the results, even dramatically. Whether it is appropriate to call that a "distortion" is a different question. It depends on whether the real world data generating process depends on the lagged outcome or not. If it does, then including it leads to improving the results, not distorting them. On the other hand, if the real world data generating process is independent of the lagged outcome, then including it in the analysis would be properly called a distortion.
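A small simulation may make this concrete. In the hypothetical data below the outcome really does depend on its own lag and the regressor x is persistent, so leaving the lag out lets part of its effect load onto x; adding it changes the x coefficient. All names and parameter values are made up, and the lag is built with Stata's L. operator after -xtset- rather than as a separate variable like PB1.

```stata
* Hypothetical dynamic panel: 28 ids, 30 periods
clear
set seed 2019
set obs 28
gen id = _n
gen u = rnormal()                          // unit effect
expand 30
bysort id: gen year = 1980 + _n
xtset id year

* Persistent regressor: x_t = 0.8*x_(t-1) + noise
gen x = rnormal()
by id: replace x = 0.8*x[_n-1] + rnormal() if _n > 1

* Outcome that depends on its own lag: y_t = 0.6*y_(t-1) + 0.5*x_t + u + e
gen y = u + rnormal()
by id: replace y = 0.6*y[_n-1] + 0.5*x + u + rnormal() if _n > 1

* Discard the first periods so the start-up values matter less
drop if year <= 1985

* Same fixed-effects model without and with the lagged outcome
xtreg y x, fe vce(cluster id)
scalar b_without = _b[x]
xtreg y L.y x, fe vce(cluster id)
scalar b_with = _b[x]

display "coefficient on x: without L.y = " b_without "   with L.y = " b_with
```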

          As for which of those scenarios applies, that is a substantive question of economics, and you will need to consult an economist about that.

          Added: By the way, looking at the output of -xthybrid-, it is very clear that the within and between effects in your data are dramatically different. So you really need to be very clear about which effects are the appropriate ones for your purposes.

          Also, when posting results from Stata, it is better to post the direct output of the regression command (as you did for -xtreg, fe-) rather than results that have been laundered through -estout- or -esttab- or some other pretty-print program. Often, to really understand your results, you need to see the standard errors or confidence intervals, not just the coefficients. For example, it could be the case that the confidence interval for the coefficient you are concerned about is so wide that it actually includes the "correct" value you were hoping to see. In that case, your problem is just that your data provide very imprecise estimates of that parameter.
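To see the point about confidence intervals in practice, a sketch on simulated data: read the 95% limits off the coefficient table (r(table)) and test the coefficient against the value you expected. The value 0.2 stands in for a published estimate and is purely hypothetical, as are the data.

```stata
* Hypothetical simulated panel
clear
set seed 2019
set obs 28
gen id = _n
gen u = rnormal()
expand 20
bysort id: gen year = 1990 + _n
xtset id year
gen x = u + rnormal()
gen y = 0.1*x + u + 2*rnormal()

xtreg y x, fe vce(cluster id)

* 95% confidence limits for x, taken from the coefficient table
matrix T = r(table)
scalar lo = T[5, 1]
scalar hi = T[6, 1]
display "95% CI for x: [" lo ", " hi "]"

* Is the hypothetical "expected" value 0.2 rejected?
test x = 0.2
display "p-value for H0: b_x = 0.2 is " r(p)
```

If the expected value lies inside the interval, the test will not reject it at the 5% level, which is the "imprecise estimate" situation described above.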
          Last edited by Clyde Schechter; 12 Oct 2019, 11:06.



          • #6
            An update...

            What makes the tests recommend fixed effects is the presence of the lagged dependent variable... If I remove it and run a random-effects regression, I get the following results:

            Code:
            Random-effects GLS regression                   Number of obs     =        553
            Group variable: id                              Number of groups  =         28
            
            R-sq:                                           Obs per group:
                 within  = 0.2823                                         min =         12
                 between = 0.2004                                         avg =       19.8
                 overall = 0.2501                                         max =         23
            
                                                            Wald chi2(9)      =     203.28
            corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
            
            ------------------------------------------------------------------------------
                      PB |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   Debt1 |   .0211899    .006601     3.21   0.001     .0082522    .0341276
                    Gap1 |   .2703571   .0388877     6.95   0.000     .1941385    .3465756
                  EXPDEC |   .0565655   .0164415     3.44   0.001     .0243408    .0887902
                Election |  -.1235842   .2525402    -0.49   0.625    -.6185538    .3713855
                     FSI |  -13.76216   1.351895   -10.18   0.000    -16.41183   -11.11249
                   Rules |   .1978891    .146105     1.35   0.176    -.0884714    .4842496
                     EMU |   2.130496   .5166952     4.12   0.000     1.117792      3.1432
                     SGP |   .0923995   .3879298     0.24   0.812     -.667929    .8527279
                     ENL |   .0563332   .3571908     0.16   0.875    -.6437479    .7564143
                   _cons |  -1.277455   .7057384    -1.81   0.070    -2.660677    .1057667
            -------------+----------------------------------------------------------------
                 sigma_u |  1.0552278
                 sigma_e |  2.5329955
                     rho |  .14788433   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            Although the result is not significant, all the coefficients match the previous studies... However, those studies included the lagged dependent variable PB1 and used a fixed-effects model to obtain the same result.



            • #7
              Well, again, let me emphasize that reliance on a statistical test to decide between fixed and random effects modeling here is wrong-headed and you should not do it. The -xthybrid- results are quite unambiguous: the within- and between- effects are radically different for most of your predictors. This means that the random effects model is reliant on an assumption that isn't even a reasonable approximation of the truth for your data. So you just can't use random effects here. But, you also need to be clear about whether you want the within- or between- effects. If the former, use -xtreg, fe-. If the latter, use the B_ outputs from -xthybrid- (or use -xtreg, be- for yet another approach).
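The two options can be compared side by side. A sketch on simulated data in which the within effect (-0.5) and the between effect (+1) differ by construction; all names and values are hypothetical. It also confirms by hand that -xtreg, be- amounts to OLS on the panel means.

```stata
* Hypothetical panel in which within and between effects differ by design
clear
set seed 2019
set obs 28
gen id = _n
gen xbar = rnormal()                            // country-level part of x
expand 20
bysort id: gen year = 1990 + _n
xtset id year
gen x = xbar + rnormal()
gen y = 1*xbar - 0.5*(x - xbar) + rnormal()     // between +1, within -0.5

xtreg y x, fe
scalar b_fe = _b[x]
estimates store fe_model

xtreg y x, be
scalar b_be = _b[x]
estimates store be_model

estimates table fe_model be_model, b(%9.3f) se

* -xtreg, be- is OLS on the panel means; check by hand
preserve
collapse (mean) y x, by(id)
regress y x
scalar b_means = _b[x]
restore
```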

              As for your data set being a reasonable replication of the earlier studies' data, have you actually checked that you have the same values for all the variables for the countries and years that are common to both data sets? Or, if the earlier data sets are not available to you, can you at least reproduce the earlier studies' published summary statistics (means, standard deviations and ranges) of the variables when you restrict your data set to the common country-year pairs? If so, the implication is that the additional four years (and additional countries if there are any) really are different from what happened in the earlier study (or there are serious data errors in your data for those four years).
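A minimal sketch of the second check: restrict the sample to the country-years the earlier study covers and compute the statistics such papers usually report (mean, median, standard deviation, N). The data are made-up stand-ins, and the overlap rule (countries 1 to 27, years 1990 to 2012) is hypothetical; it would have to be replaced with the study's actual country list and period.

```stata
* Hypothetical stand-in data: 28 countries x 28 years (1990-2017)
clear
set seed 2019
set obs 28
gen byte id = _n
expand 28
bysort id: gen int Year = 1989 + _n
gen double Rules = rnormal()

* Hypothetical overlap with the earlier study (replace with the real one)
gen byte common = id <= 27 & inrange(Year, 1990, 2012)

* The statistics a published summary table typically reports
summarize Rules if common, detail
display "mean = " r(mean) "  median = " r(p50) "  sd = " r(sd) "  N = " r(N)
```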



              • #8
                I have a question... How is it possible to run an "OLS fixed effects" regression and a 2SLS with panel data? Are the commands I'm using correct?



                • #9
                  -xtreg, fe- is implemented in Stata as OLS applied to group-demeaned data. The term "OLS fixed effects" is presumably a short-hand way of saying that. In any case, all of the ways of implementing fixed-effects regression that I am familiar with ultimately apply OLS regression to something derived from the original data. As for 2SLS with panel data, see -help xtivreg-.
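The group-demeaning equivalence can be checked directly. A sketch on simulated data (hypothetical names and values): demean y and x by panel, run OLS without a constant, and compare the slope with -xtreg, fe-; -areg- with absorb(id) gives the same slope too. Only the coefficients should match: the hand-rolled regression does not subtract the 28 estimated panel means from its degrees of freedom, so its standard errors come out somewhat too small.

```stata
* Hypothetical simulated panel
clear
set seed 2019
set obs 28
gen id = _n
gen u = rnormal()                          // unit effect, correlated with x below
expand 20
bysort id: gen year = 1990 + _n
xtset id year
gen x = u + rnormal()
gen y = 0.5*x + 2*u + rnormal()

* Built-in fixed-effects (within) estimator
xtreg y x, fe
scalar b_fe = _b[x]

* Same slope by hand: subtract each panel's mean, then plain OLS
foreach v in y x {
    egen double `v'_mean = mean(`v'), by(id)
    gen double `v'_dm = `v' - `v'_mean
}
regress y_dm x_dm, noconstant
scalar b_dm = _b[x_dm]

* -areg- absorbs the panel effects and also reproduces the slope
areg y x, absorb(id)
scalar b_areg = _b[x]

display "xtreg, fe: " b_fe "   demeaned OLS: " b_dm "   areg: " b_areg
```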



                  • #10
                    So, with random effects ruled out... I focused on the between effects, and I get these results:

                    Code:
                    Between regression (regression on group means)  Number of obs     =        552
                    Group variable: id                              Number of groups  =         28
                    
                    R-sq:                                           Obs per group:
                         within  = 0.4593                                         min =         12
                         between = 0.9929                                         avg =       19.7
                         overall = 0.5619                                         max =         23
                    
                                                                    F(10,17)          =     237.88
                    sd(u_i + avg(e_i.))=  .1711315                  Prob > F          =     0.0000
                    
                    ------------------------------------------------------------------------------
                              PB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                             PB1 |   .9876384   .0339807    29.06   0.000     .9159453    1.059332
                           Debt1 |  -.0009253    .001789    -0.52   0.612    -.0046998    .0028493
                            Gap1 |     .01601   .0600337     0.27   0.793      -.11065      .14267
                          EXPDEC |  -.0008328    .003113    -0.27   0.792    -.0074006    .0057349
                        Election |   1.056956   1.295901     0.82   0.426    -1.677155    3.791068
                             FSI |  -2.595907   1.682728    -1.54   0.141    -6.146153    .9543382
                           Rules |  -.1044775   .0972972    -1.07   0.298    -.3097567    .1008017
                             EMU |   1.276397   1.335746     0.96   0.353    -1.541781    4.094574
                             SGP |   .0614065   .1255529     0.49   0.631    -.2034869    .3262999
                             ENL |   .2223172   .2323802     0.96   0.352    -.2679621    .7125966
                           _cons |   .0651435   .4467789     0.15   0.886    -.8774774    1.007765
                    ------------------------------------------------------------------------------
                    Once again they look good, except for the coefficient on the Rules variable...



                    • #11
                      OK, so those are exactly the commands I was using... both -xtreg, fe- and -xtivreg-.



                      • #12
                        Regarding the data set... For Rules, one of the studies I'm reading reports the following:

                        Mean: 0.00; Median: -0.21; Std. Dev.: 1.00; 593 Obs

                        And this is what I get from Stata for the "same" variable:

                        Code:
                          Variable |        Obs        Mean    Std. Dev.       Min        Max
                        -------------+---------------------------------------------------------
                               Rules |        784     .000908    1.001225   -.949364   3.404152
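One thing to keep in mind with this comparison: 784 counts every non-missing country-year, whereas the published 593 may refer to the observations the study actually used (that is a guess; the paper would have to say). A sketch, on made-up data, of computing the same statistics only over a regression's estimation sample via e(sample):

```stata
* Hypothetical panel: 28 countries x 28 years, outcome missing before 1995
clear
set seed 2019
set obs 28
gen byte id = _n
expand 28
bysort id: gen int Year = 1989 + _n
xtset id Year
gen double Rules = rnormal()
gen double PB = rnormal()
replace PB = . if Year < 1995

xtreg PB Rules, fe

* Statistics over the estimation sample only (observations used above)
summarize Rules if e(sample), detail
display "mean = " r(mean) "  median = " r(p50) "  sd = " r(sd) "  N = " r(N)
```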



                        • #13
                          Here is what I get from -xtivreg-:

                          Code:
                          Fixed-effects (within) IV regression            Number of obs     =        543
                          Group variable: id                              Number of groups  =         28
                          
                          R-sq:                                           Obs per group:
                               within  = 0.5886                                         min =         12
                               between = 0.5395                                         avg =       19.4
                               overall = 0.5390                                         max =         23
                          
                          
                                                                          Wald chi2(10)     =    2355.55
                          corr(u_i, Xb)  = -0.4393                        Prob > chi2       =     0.0000
                          
                                                              (Std. Err. adjusted for 28 clusters in id)
                          ------------------------------------------------------------------------------
                                       |               Robust
                                    PB |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                  Gap1 |  -.0543808   .0718442    -0.76   0.449    -.1951928    .0864313
                                   PB1 |   .6530802    .063096    10.35   0.000     .5294143    .7767461
                                 Debt1 |   .0282966   .0104082     2.72   0.007     .0078968    .0486963
                                EXPDEC |   .0995508   .0497539     2.00   0.045     .0020351    .1970666
                              Election |  -.0802867   .2262745    -0.35   0.723    -.5237767    .3632032
                                   FSI |  -11.08375   2.200692    -5.04   0.000    -15.39702   -6.770471
                                   EMU |    .848432   .3601742     2.36   0.018     .1425037     1.55436
                                   SGP |  -.3362498     .42773    -0.79   0.432    -1.174585    .5020855
                                   ENL |   .6529903   .3106763     2.10   0.036      .044076    1.261905
                                 Rules |  -.2398042   .1106926    -2.17   0.030    -.4567576   -.0228507
                                 _cons |  -3.065612    1.67161    -1.83   0.067    -6.341907    .2106842
                          -------------+----------------------------------------------------------------
                               sigma_u |  1.4142447
                               sigma_e |  1.9580111
                                   rho |  .34283922   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------
                          Instrumented:   Gap1
                          Instruments:    PB1 Debt1 EXPDEC Election FSI EMU SGP ENL Rules Gap2 Gap3
                          ------------------------------------------------------------------------------
                          Basically the same results, with even higher statistical significance...
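For comparison, the general form of a fixed-effects IV regression with -xtivreg- is sketched below on simulated data. The names z and v, the data-generating process and the true coefficient 0.5 are all hypothetical; in #13 the excluded instruments are Gap2 and Gap3 (presumably further lags of the output gap, though the thread does not say). Clustered standard errors are requested as in the output above.

```stata
* Hypothetical simulated panel with one endogenous regressor
clear
set seed 2019
set obs 28
gen id = _n
gen u = rnormal()
expand 25
bysort id: gen year = 1990 + _n
xtset id year

gen z = rnormal()                    // excluded instrument (hypothetical)
gen v = rnormal()                    // shock that hits both x and y
gen x = u + z + v + rnormal()        // x correlated with the error through v
gen y = 0.5*x + u + v + rnormal()

* Fixed-effects OLS is biased here because x is endogenous
xtreg y x, fe vce(cluster id)
scalar b_ols = _b[x]

* Fixed-effects 2SLS: x instrumented by z, SEs clustered by id
xtivreg y (x = z), fe vce(cluster id)
scalar b_iv = _b[x]

display "FE-OLS: " b_ols "   FE-2SLS: " b_iv "   (true value 0.5)"
```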



                          • #14
                            One minor detail... All my variables (excluding the dummies, Year and id) have the format %10.0g, and Rules is the only one with %14.2f.

                            Does this indicate something I should be aware of?



                            • #15
                              One minor detail... All my variables (excluding the dummies, Year and id) have the format %10.0g, and Rules is the only one with %14.2f.
                              The display format does not affect the actual values that Stata uses for computing--just how Stata shows them to you.
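A quick way to convince yourself of this, with a made-up one-observation data set: the %14.2f format only rounds what is shown, not what is stored.

```stata
* Hypothetical one-observation data set
clear
set obs 1
gen double Rules = -.949364
format Rules %14.2f

list Rules                    // displayed as -0.95
display %12.6f Rules[1]       // the stored value is still -0.949364
summarize Rules               // computations use the full stored value
```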

                              Now, you might wonder what transpired during the data management that culminated in the creation of this data set: something was different about that rules variable, perhaps in the original source, or perhaps by virtue of some processing that was applied to it. But it seems from #12 that its summary statistics are a pretty good match to the other study. So I wouldn't be worried about that.

                              However, I do have a question about the other variables. The %10.0g format is appropriate for variables that only take on integer values (or where only the integer part is meaningful). Are all of these other variables supposed to be integers? Because it is likely that that is what you have. Do these other variables have value labels attached to them? If any of them are not supposed to be integers, and if they have value labels attached to them, then it would suggest that those variables were initially imported, for who knows what reason, as strings, and then inappropriately -encode-d rather than being fixed with -destring-. In that case, the actual values that Stata is calculating with are the underlying encodings: 1, 2, 3, 4,... and not the values you actually need. That could throw off any kind of regression in any way imaginable. So, again, did you look at the summary statistics for all of these variables? Just because the Rules variable came out as expected doesn't mean you don't have a problem with some other variable(s). And a problem in any variable could affect the results for the Rules variable.
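One quick way to follow up on the -encode- concern, sketched on made-up values (gap_str, gap_bad and gap_ok are hypothetical names). Note that a display format alone is not conclusive either way: %10.0g is, for example, the default format Stata gives a double variable, so the checks below look at the stored values and value labels instead.

```stata
* Hypothetical illustration: numbers that were read in as text
clear
input str6 gap_str
"-1.17"
"0.74"
"1.43"
end

encode gap_str, generate(gap_bad)     // wrong: stores codes 1, 2, 3 plus a value label
destring gap_str, generate(gap_ok)    // right: stores -1.17, 0.74, 1.43

list gap_str gap_bad gap_ok, nolabel
describe gap_bad gap_ok

* In the real data set: list every variable that has a value label attached
ds, has(vallabel)
```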
