Wrong coefficient in a fixed effects panel regression

David Coelho

Join Date: Oct 2019

Posts: 16
#1

12 Oct 2019, 09:31

Hello,

I'm trying to estimate the impact that fiscal rules might have on the primary balance of a country. I have panel data with 28 countries and 28 years, so T = 28 and N = 28. The problem is that when I use fixed-effects regression for panel data, I get the wrong coefficient compared with all the previous literature.

When I run the same regression with random effects I get a correct coefficient; however, according to the Breusch-Pagan Lagrange multiplier result, random effects are not appropriate, and I need robust standard errors because of heteroskedasticity...

I've been using -xtreg, fe vce(r)- to estimate this unbalanced dynamic panel model. I'm getting the same result with the IV-FE estimator... Is it a problem with the fixed-effects model? Here's an example:

Fixed-effects (within) regression Number of obs = 552
Group variable: id Number of groups = 28

R-sq: Obs per group:
within = 0.5762 min = 12
between = 0.5118 avg = 19.7
overall = 0.5144 max = 23

F(7,27) = 128.73
corr(u_i, Xb) = -0.4775 Prob > F = 0.0000

(Std. Err. adjusted for 28 clusters in id)

Robust
PB Coef. Std. Err. t P>t [95% Conf. Interval]

PB1 .624859 .0553634 11.29 0.000 .5112626 .7384554
Debt1 .0388924 .0104554 3.72 0.001 .0174397 .0603451
Gap1 .0363654 .0568418 0.64 0.528 -.0802644 .1529951
EXPDEC .0863861 .0625988 1.38 0.179 -.0420561 .2148284
Election -.071106 .2191869 -0.32 0.748 -.5208403 .3786282
FSI -11.35462 2.135158 -5.32 0.000 -15.7356 -6.973639
Rules -.3425743 .136504 -2.51 0.018 -.6226573 -.0624912
_cons -3.107349 2.058504 -1.51 0.143 -7.331051 1.116353

sigma_u 1.5384986
sigma_e 1.9666157
rho .37965466 (fraction of variance due to u_i)
Tags: fixed effects, panel data, regression
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#2

12 Oct 2019, 09:47

If I understand your post, the issue is that the results you are getting do not agree with your expectations of what they should be. Since -xtreg, fe- is a very old Stata command at this point and it does not lead to a lot of people posting questions about whether it has a bug, we can probably just forget the possibility that that command is at fault.

So either your data is wrong or your expectations are wrong.

Are you running the same exact model as was used in the studies you are comparing your results to? Adding or removing even a single new variable to the model can change everything. How was your data sample accrued: is it similar to the way the data in the earlier studies was gathered? Are the same measurement procedures in use? Are we talking about the same overall time periods in your study and the others? Same geography? It may be that the findings of the earlier studies are implicitly dependent on those contextual features and your study is not a replication in that respect.

But there is a more enticing clue. You indicate that with random effects modeling you get results along the lines you were expecting, but not with fixed effects. You then go on to mention a test that says you should be using fixed effects. I am familiar with the widespread practice in economics and econometrics of using such tests to choose between fixed and random effects modeling. But it is wrong-headed to apply it mindlessly. You need to first consider what kind of effects you are trying to estimate. In panel data, you have to think about both effects within panels (ids) and effects between panels. They can be very different--even opposite in sign, as the following code demonstrates:

Code:

clear
set obs 5
gen panel_id = _n
expand 2

set seed 1234
by panel_id, sort: gen y = 4*panel_id - _n + 3 + rnormal(0, 0.5)
by panel_id: gen x = panel_id + _n

xtset panel_id
xtreg y x, fe
regress y x

// GRAPH THE DATA TO SHOW WHAT'S HAPPENING
separate y, by(panel_id)
graph twoway connect y? x || lfit y x

A fixed effects model estimates only within-panel effects. A random effects model assumes that the within and between effects are the same and models a parameter that is an estimate of those common effects. It sounds to me like the previous studies have looked at a between-panel effect and you are getting thrown off by using a within-panel analysis. Of course, if the -fe- and -re- results are so discrepant, it implies that the "within effect = between effect" assumption implicit in random-effects modeling is incorrect. So you might be better off doing separate modeling of within- and between- effects. One way to get that is with Francisco Perales' -xthybrid- command, available from SSC. If you run this I suspect that you will find that the between effects match your expectations and the within-effects do not.
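
A minimal sketch of that suggestion, using the variable names from the output in #1. The option choices (-se-, -test-) are assumptions based on my reading of the -xthybrid- help file, so check -help xthybrid- after installing:

Code:

* install the user-written command from SSC (needed only once)
ssc install xthybrid

* one model with separate within- and between-panel coefficients;
* clusterid() names the panel identifier
* se: also report standard errors (assumed option, see the help file)
* test: test within = between for each predictor (assumed option)
xthybrid PB PB1 Debt1 Gap1 EXPDEC Election FSI Rules, clusterid(id) se test

Each predictor should then appear twice in the output, once with a W__ prefix (within-panel effect) and once with a B__ prefix (between-panel effect), which makes it easy to see where the two disagree.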

In the future, when showing Stata output, please place it between code delimiters so that it will align readably. If you are not familiar with code delimiters, please read Forum FAQ #12, or watch David Benson's video at https://youtu.be/bXfaRCAOPbI. (In either place you will also learn about using -dataex- to show example data.)
David Coelho

Join Date: Oct 2019

Posts: 16
#3

12 Oct 2019, 10:14

Thank you for the reply. Regarding the data, I already checked it twice before posting, so I believe that nothing is wrong there... The studies that I checked mostly used a "fixed effects OLS regression" and 2SLS (I expect that these two methods correspond to the ones I'm using for panel data) with more or less the same variables. However, it is true that my study covers a wider period, but only four more years and one extra country.

I'll test the command that you suggested and come back with a reply...

And sorry for the lack of code delimiters...

David Coelho

Join Date: Oct 2019
Posts: 16

12 Oct 2019, 10:49

Here's an example of my panel data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id int Year double(PB PB1 Debt1 Gap1 EXPDEC) byte Election double(FSI Rules) byte(EMU SGP ENL)
1 1990          .          .       .          .                  . 1 .08955833333333334 -.949364 0 0 0
1 1991          .          .       .  1.4292178                  . 0 .18814166666666665 -.949364 0 0 0
1 1992          .          .       .  1.6432503                  . 0 .25890833333333335 -.949364 0 0 0
1 1993          .          .       .   .7410163                  . 0 .19639166666666666 -.949364 0 0 0
1 1994          .          .       . -1.1694382                  . 1            .091825 -.949364 0 0 0
1 1995 -2.1370968          .       . -1.0350857  33.58018214814436 1               .146 -.949364 1 0 0
1 1996  -.5822203 -2.1370968 68.3205  -.6094489  33.50052722711015 0 .11419166666666665 -.949364 1 0 0
1 1997  1.0246299  -.5822203 68.2596  -.5113661  32.87075193266632 0 .13790833333333333 -.949364 1 0 0
1 1998   .8804277  1.0246299 63.4934  -.8796375 33.191438084032995 0 .09439999999999998 -.949364 1 0 0
1 1999   .8382758   .8804277  63.859   .0623869 33.633089456844914 1 .18689166666666665  .224278 0 1 0
1 2000  1.1549509   .8382758 66.6905   .9046167  33.69649516397724 0 .10743333333333333  .224278 0 1 0
1 2001  2.9436833  1.1549509 66.1242  1.6168195 32.840013984541706 0 .06568333333333333  .370483 0 1 0
1 2002  2.0627188  2.9436833  66.729   .4728021  33.56798646938247 1 .03391666666666666  .370483 0 1 0
1 2003  1.3936754  2.0627188  66.728  -.0780398   33.4312592894763 0 .05971666666666666  .370483 0 1 0
1 2004 -1.7988411  1.3936754 65.8529 -1.2765579 31.509362574357073 0            .066325  .370483 0 1 0
end
format %ty Year

The result that I've obtained with the -fe- command (with a negative Rules coefficient) is

Code:

Fixed-effects (within) regression               Number of obs     =        552
Group variable: id                              Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.5887                                         min =         12
     between = 0.5054                                         avg =       19.7
     overall = 0.5177                                         max =         23

                                                F(10,27)          =     117.69
corr(u_i, Xb)  = -0.4882                        Prob > F          =     0.0000

                                    (Std. Err. adjusted for 28 clusters in id)
------------------------------------------------------------------------------
             |               Robust
          PB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         PB1 |   .6127309   .0569516    10.76   0.000     .4958758     .729586
       Debt1 |   .0380682   .0092282     4.13   0.000     .0191335    .0570028
        Gap1 |   .0452214   .0578105     0.78   0.441     -.073396    .1638387
      EXPDEC |   .0990221   .0545673     1.81   0.081    -.0129408    .2109849
    Election |    -.05914   .2161574    -0.27   0.786    -.5026584    .3843784
         FSI |  -11.00049   2.082507    -5.28   0.000    -15.27345   -6.727541
       Rules |   -.216667   .1136778    -1.91   0.067    -.4499145    .0165806
         EMU |   .9387213   .3226792     2.91   0.007     .2766382    1.600804
         SGP |  -.3660889   .4234953    -0.86   0.395    -1.235029    .5028516
         ENL |   .3710115   .3068094     1.21   0.237    -.2585093    1.000532
       _cons |  -3.498912   1.735103    -2.02   0.054     -7.05905    .0612257
-------------+----------------------------------------------------------------
     sigma_u |  1.5663862
     sigma_e |  1.9432308
         rho |  .39384898   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The result of the command suggested:

Code:

Hybrid model. Family: gaussian. Link: identity.

+-----------------------------------+
|             Variable |   model    |
|----------------------+------------|
| PB                   |            |
|               W__PB1 |     0.6127 |
|             W__Debt1 |     0.0381 |
|              W__Gap1 |     0.0452 |
|            W__EXPDEC |     0.0990 |
|          W__Election |    -0.0591 |
|               W__FSI |   -11.0005 |
|             W__Rules |    -0.2167 |
|               W__EMU |     0.9387 |
|               W__SGP |    -0.3661 |
|               W__ENL |     0.3710 |
|               B__PB1 |     0.9899 |
|             B__Debt1 |    -0.0011 |
|              B__Gap1 |     0.0108 |
|            B__EXPDEC |    -0.0004 |
|          B__Election |     1.1762 |
|               B__FSI |    -2.5772 |
|             B__Rules |    -0.1081 |
|               B__EMU |     1.4356 |
|               B__SGP |     0.0545 |
|               B__ENL |     0.2680 |
|                _cons |     0.0087 |
|----------------------+------------|
|        var(_cons[id])|            |
|                _cons |     0.0000 |
|----------------------+------------|
|             var(e.PB)|            |
|                _cons |     3.5342 |
|----------------------+------------|
| Statistics           |            |
|                   ll | -1131.7025 |
|                 chi2 | 16966.7127 |
|                    p |     0.0000 |
|                  aic |  2307.4051 |
|                  bic |  2402.3031 |
+-----------------------------------+
Level 1: 552 units. Level 2: 28 units.

Both Rules coefficients remain negative; however, if I exclude the lagged dependent variable I get

Code:

Hybrid model. Family: gaussian. Link: identity.

+-----------------------------------+
|             Variable |   model    |
|----------------------+------------|
| PB                   |            |
|             W__Debt1 |     0.0437 |
|              W__Gap1 |     0.3014 |
|            W__EXPDEC |     0.1756 |
|          W__Election |    -0.0679 |
|               W__FSI |   -14.0767 |
|             W__Rules |    -0.0222 |
|               W__EMU |     2.0770 |
|               W__SGP |    -0.0243 |
|               W__ENL |     0.1246 |
|             B__Debt1 |     0.0076 |
|              B__Gap1 |    -0.2754 |
|            B__EXPDEC |     0.0235 |
|          B__Election |   -22.0083 |
|               B__FSI |    11.2416 |
|             B__Rules |     1.4942 |
|               B__EMU |   -13.6932 |
|               B__SGP |     0.4878 |
|               B__ENL |    -2.8541 |
|                _cons |     4.4337 |
|----------------------+------------|
|        var(_cons[id])|            |
|                _cons |     0.6333 |
|----------------------+------------|
|             var(e.PB)|            |
|                _cons |     6.3011 |
|----------------------+------------|
| Statistics           |            |
|                   ll | -1308.8622 |
|                 chi2 |   433.9111 |
|                    p |     0.0000 |
|                  aic |  2659.7243 |
|                  bic |  2750.3469 |
+-----------------------------------+
Level 1: 553 units. Level 2: 28 units.

One is already positive... Is it possible that including the lagged dependent variable as an independent variable can cause a distortion of the results? If yes, how can I solve it?


Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#5

12 Oct 2019, 10:59

Is it possible that including the lagged dependent variable as an independent variable can cause a distortion of the results? If yes, how can I solve it?

Well, it is definitely possible for inclusion of the lagged dependent variable as a predictor to change the results, even dramatically. Whether it is appropriate to call that a "distortion" is a different question. It depends on whether the real world data generating process depends on the lagged outcome or not. If it does, then including it leads to improving the results, not distorting them. On the other hand, if the real world data generating process is independent of the lagged outcome, then including it in the analysis would be properly called a distortion.

As for which of those scenarios applies, that is a substantive question of economics, and you will need to consult an economist about that.
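
What you can do with the data alone is see how sensitive the estimates are to the lagged outcome, by fitting the model with and without it on the same observations. A sketch, assuming the variable names from your output:

Code:

xtset id Year

* model with the lagged dependent variable
xtreg PB PB1 Debt1 Gap1 EXPDEC Election FSI Rules EMU SGP ENL, fe vce(robust)
estimates store with_lag

* same model without PB1, restricted to the same estimation sample
xtreg PB Debt1 Gap1 EXPDEC Election FSI Rules EMU SGP ENL if e(sample), fe vce(robust)
estimates store no_lag

* coefficients and standard errors side by side
estimates table with_lag no_lag, b(%9.4f) se(%9.4f) stats(N)

This only documents how much the results move; it does not tell you which specification reflects the real data generating process.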

Added: By the way, looking at the output of -xthybrid-, it is very clear that the within and between effects in your data are dramatically different. So you really need to be very clear about which effects are the appropriate ones for your purposes.

Also, when posting results from Stata, it is better to post the direct output of the regression command (as you did for -xtreg, fe-), rather than results that have been laundered through -estout- or -esttab- or some other pretty-print program. Often to really understand your results you need to see the standard errors or confidence intervals, not just the coefficients. For example, it could be the case that the confidence interval for the coefficient you are concerned about is so wide that it actually includes the "correct" value you were hoping to see. In that case, your problem is just that your data provide very imprecise estimates for that parameter.
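
A quick way to check that last point is to test your estimate against the value reported in the literature. In this sketch the value 0.2 is purely hypothetical; substitute the coefficient from the study you are comparing with:

Code:

xtreg PB PB1 Debt1 Gap1 EXPDEC Election FSI Rules EMU SGP ENL, fe vce(robust)

* Wald test of H0: coefficient on Rules equals 0.2
* (0.2 is a made-up placeholder, not a value from any study)
test Rules = 0.2

A large p-value here would mean the data cannot distinguish your estimate from the published one, which is the same thing as the published value lying inside your confidence interval.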

Last edited by Clyde Schechter; 12 Oct 2019, 11:06.

David Coelho

Join Date: Oct 2019
Posts: 16

12 Oct 2019, 11:00

An update...

What forces the tests to recommend the use of fixed effects is the presence of the lagged dependent variable... If I remove it and run a random-effects regression I get the following results:

Code:

Random-effects GLS regression                   Number of obs     =        553
Group variable: id                              Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.2823                                         min =         12
     between = 0.2004                                         avg =       19.8
     overall = 0.2501                                         max =         23

                                                Wald chi2(9)      =     203.28
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
          PB |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       Debt1 |   .0211899    .006601     3.21   0.001     .0082522    .0341276
        Gap1 |   .2703571   .0388877     6.95   0.000     .1941385    .3465756
      EXPDEC |   .0565655   .0164415     3.44   0.001     .0243408    .0887902
    Election |  -.1235842   .2525402    -0.49   0.625    -.6185538    .3713855
         FSI |  -13.76216   1.351895   -10.18   0.000    -16.41183   -11.11249
       Rules |   .1978891    .146105     1.35   0.176    -.0884714    .4842496
         EMU |   2.130496   .5166952     4.12   0.000     1.117792      3.1432
         SGP |   .0923995   .3879298     0.24   0.812     -.667929    .8527279
         ENL |   .0563332   .3571908     0.16   0.875    -.6437479    .7564143
       _cons |  -1.277455   .7057384    -1.81   0.070    -2.660677    .1057667
-------------+----------------------------------------------------------------
     sigma_u |  1.0552278
     sigma_e |  2.5329955
         rho |  .14788433   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Despite the statistical insignificance of the result, all the coefficients match the previous studies... However, they used the lagged dependent variable PB1 and a fixed-effects model to obtain the same result.


Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#7

12 Oct 2019, 11:16

Well, again, let me emphasize that reliance on a statistical test to decide between fixed and random effects modeling here is wrong-headed and you should not do it. The -xthybrid- results are quite unambiguous: the within- and between- effects are radically different for most of your predictors. This means that the random effects model is reliant on an assumption that isn't even a reasonable approximation of the truth for your data. So you just can't use random effects here. But, you also need to be clear about whether you want the within- or between- effects. If the former, use -xtreg, fe-. If the latter, use the B_ outputs from -xthybrid- (or use -xtreg, be- for yet another approach).
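
To put the two sets of estimates next to each other, something along these lines should work (a sketch, assuming the variable names from your output and that the data are already -xtset-):

Code:

* within-panel effects
xtreg PB PB1 Debt1 Gap1 EXPDEC Election FSI Rules EMU SGP ENL, fe vce(robust)
estimates store fe_within

* between-panel effects (regression on the panel means)
xtreg PB PB1 Debt1 Gap1 EXPDEC Election FSI Rules EMU SGP ENL, be
estimates store be_between

* compare the coefficient of interest, with standard errors
estimates table fe_within be_between, keep(Rules) b(%9.4f) se(%9.4f)

Whichever column answers your research question is the one to report; the other is then just context.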

As for your data set being a reasonable replication of the earlier studies' data, have you actually checked that you have the same values for all the variables for the countries and years that are common to both data sets? Or, if the earlier data sets are not available to you, can you at least reproduce the earlier studies' published summary statistics (means, standard deviations and ranges) of the variables when you restrict your data set to the common country-year pairs? If so, the implication is that the additional four years (and additional countries if there are any) really are different from what happened in the earlier study (or there are serious data errors in your data for those four years).
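
For the summary-statistics check, a sketch along these lines may help. The year range and the excluded country below are hypothetical placeholders; replace them with the coverage actually reported in the earlier study:

Code:

* HYPOTHETICAL coverage of the earlier study: 1990-2012, without country id 28
local common "inrange(Year, 1990, 2012) & id != 28"

* same statistics that papers usually publish, one row per variable
tabstat PB Debt1 Gap1 EXPDEC FSI Rules if `common', ///
    statistics(n mean median sd min max) columns(statistics)

If these numbers line up with the published table, the discrepancy is more likely to come from the added observations or from the model than from the overlapping data.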
David Coelho

Join Date: Oct 2019

Posts: 16
#8

12 Oct 2019, 11:16

I have a question... How is it possible to perform "OLS fixed effects" and 2SLS with panel data? Are the commands that I'm using correct?
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#9

12 Oct 2019, 11:19

-xtreg, fe- is implemented in Stata as OLS applied to group-demeaned data. The term "OLS fixed effects" is presumably a short-hand way of saying that. In any case, all of the ways of implementing fixed-effects regression that I am familiar with ultimately apply OLS regression to something derived from the original data. As for 2SLS with panel data, see -help xtivreg-.
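
To make the "OLS on group-demeaned data" point concrete, here is a small sketch using the -nlswork- example data set that ships with Stata's documentation (not your data; the model is chosen only for illustration):

Code:

webuse nlswork, clear
xtset idcode year

* built-in fixed-effects estimator
xtreg ln_wage tenure age, fe

* mark the observations with no missing values in the model variables
mark touse
markout touse ln_wage tenure age

* subtract each panel's mean from every model variable
foreach v in ln_wage tenure age {
    by idcode, sort: egen double mean_`v' = mean(`v') if touse
    generate double dm_`v' = `v' - mean_`v'
}

* plain OLS on the demeaned variables
regress dm_ln_wage dm_tenure dm_age, noconstant

The slope coefficients from -regress- should match those from -xtreg, fe-. The standard errors will not, because -regress- does not know that the panel means were estimated and so uses the wrong degrees of freedom; that is one reason to use -xtreg, fe- rather than demeaning by hand.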

David Coelho

Join Date: Oct 2019
Posts: 16

#10

12 Oct 2019, 11:25

So, with random effects excluded... I focused on the between effects and I get these results:

Code:

Between regression (regression on group means)  Number of obs     =        552
Group variable: id                              Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.4593                                         min =         12
     between = 0.9929                                         avg =       19.7
     overall = 0.5619                                         max =         23

                                                F(10,17)          =     237.88
sd(u_i + avg(e_i.))=  .1711315                  Prob > F          =     0.0000

------------------------------------------------------------------------------
          PB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         PB1 |   .9876384   .0339807    29.06   0.000     .9159453    1.059332
       Debt1 |  -.0009253    .001789    -0.52   0.612    -.0046998    .0028493
        Gap1 |     .01601   .0600337     0.27   0.793      -.11065      .14267
      EXPDEC |  -.0008328    .003113    -0.27   0.792    -.0074006    .0057349
    Election |   1.056956   1.295901     0.82   0.426    -1.677155    3.791068
         FSI |  -2.595907   1.682728    -1.54   0.141    -6.146153    .9543382
       Rules |  -.1044775   .0972972    -1.07   0.298    -.3097567    .1008017
         EMU |   1.276397   1.335746     0.96   0.353    -1.541781    4.094574
         SGP |   .0614065   .1255529     0.49   0.631    -.2034869    .3262999
         ENL |   .2223172   .2323802     0.96   0.352    -.2679621    .7125966
       _cons |   .0651435   .4467789     0.15   0.886    -.8774774    1.007765
------------------------------------------------------------------------------

Once again they are good, except for the coefficient on the Rules variable...


David Coelho

Join Date: Oct 2019

Posts: 16
#11

12 Oct 2019, 11:28

Ok, so those are exactly the same commands that I was using... Both the -xtreg, fe- and the -xtivreg-
David Coelho

Join Date: Oct 2019

Posts: 16
#12

12 Oct 2019, 11:34

Regarding the data set... For Rules, one of the studies that I'm reading has the following information:

Mean: 0.00; Median: -0.21; Std. Dev: 1.00 and 593 Obs

And this is what I get from Stata for the "same" variable:

Code:

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       Rules |        784     .000908    1.001225   -.949364   3.404152

David Coelho

Join Date: Oct 2019
Posts: 16

#13

12 Oct 2019, 11:42

Here is what I get from -xtivreg-:

Code:

Fixed-effects (within) IV regression            Number of obs     =        543
Group variable: id                              Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.5886                                         min =         12
     between = 0.5395                                         avg =       19.4
     overall = 0.5390                                         max =         23


                                                Wald chi2(10)     =    2355.55
corr(u_i, Xb)  = -0.4393                        Prob > chi2       =     0.0000

                                    (Std. Err. adjusted for 28 clusters in id)
------------------------------------------------------------------------------
             |               Robust
          PB |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        Gap1 |  -.0543808   .0718442    -0.76   0.449    -.1951928    .0864313
         PB1 |   .6530802    .063096    10.35   0.000     .5294143    .7767461
       Debt1 |   .0282966   .0104082     2.72   0.007     .0078968    .0486963
      EXPDEC |   .0995508   .0497539     2.00   0.045     .0020351    .1970666
    Election |  -.0802867   .2262745    -0.35   0.723    -.5237767    .3632032
         FSI |  -11.08375   2.200692    -5.04   0.000    -15.39702   -6.770471
         EMU |    .848432   .3601742     2.36   0.018     .1425037     1.55436
         SGP |  -.3362498     .42773    -0.79   0.432    -1.174585    .5020855
         ENL |   .6529903   .3106763     2.10   0.036      .044076    1.261905
       Rules |  -.2398042   .1106926    -2.17   0.030    -.4567576   -.0228507
       _cons |  -3.065612    1.67161    -1.83   0.067    -6.341907    .2106842
-------------+----------------------------------------------------------------
     sigma_u |  1.4142447
     sigma_e |  1.9580111
         rho |  .34283922   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented:   Gap1
Instruments:    PB1 Debt1 EXPDEC Election FSI EMU SGP ENL Rules Gap2 Gap3
------------------------------------------------------------------------------

Basically the same results, with even higher statistical significance...


David Coelho

Join Date: Oct 2019

Posts: 16
#14

12 Oct 2019, 11:45

One minor detail... All my variables (excluding the dummies, year and id) get the format %10.0g, and Rules is the only one that gets %14.2f.

Does this indicate something that I should be aware of?
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#15

12 Oct 2019, 13:58

One minor detail... All my variables (excluding the dummies, year and id) get the format %10.0g, and Rules is the only one that gets %14.2f.

The display format does not affect the actual values that Stata uses for computing--just how Stata shows them to you.

Now, you might wonder what transpired during the data management that culminated in the creation of this data set: something was different about that rules variable, perhaps in the original source, or perhaps by virtue of some processing that was applied to it. But it seems from #12 that its summary statistics are a pretty good match to the other study. So I wouldn't be worried about that.

However, I do have a question about the other variables. The formatting %10.0g is appropriate for variables that only take on integer values (or where only the integer part is meaningful). Are all of these other variables supposed to be integers? Because it is likely that that is what you have. Do these other variables have value labels attached to them? If any of them are not supposed to be integers, and if they have value labels attached to them, then it would suggest that those variables were initially imported, for who knows what reason, as strings, and then inappropriately -encode-d rather than being fixed with -destring-. In that case, the actual values that Stata is calculating with are the underlying encodings: 1, 2, 3, 4,... and not the values you actually need. That could throw off any kind of regression in any way imaginable. So, again, did you look at the summary statistics for all of these variables? Just because the Rules variable came out as expected doesn't mean you don't have a problem with some other variable(s). And a problem in any variable could affect the results for the Rules variable.
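
To show what the -encode- problem looks like, here is a toy illustration with made-up values (run it in a fresh session, since it clears the data in memory):

Code:

clear
input str5 raw
"2.5"
"10.1"
"-3.2"
end

* -encode- assigns integer codes 1, 2, 3 in alphabetical order of the strings
encode raw, generate(v_encode)

* -destring- converts the strings to the numbers they represent
destring raw, generate(v_destring)

* nolabel shows the values Stata actually computes with
list raw v_encode v_destring, nolabel

Here v_encode holds 3, 2, 1 while v_destring holds 2.5, 10.1, -3.2. On your own data, -describe- followed by the variable names shows the storage type and any attached value label for each variable. Keep in mind that %10.0g is also the default display format Stata gives to variables stored as double, so the format by itself does not settle the question; the value-label column and the summary statistics do.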
