Wrong coefficient in a fixed effects panel regression

David Coelho

Join Date: Oct 2019

Posts: 16
#16

12 Oct 2019, 14:57

I looked into it, and the difference in display format came from the cell format these variables had in the Excel file: Rules was stored as Number, while the others were stored as General...

So the difference in formats is solved, but the problem with the coefficient remains: I re-ran the regressions and it is still negative... I'll take a look at the other variables to check for errors.
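
A minimal sketch of how the import itself could be checked in Stata. It assumes the workbook is called panel.xlsx (a hypothetical name) with variable names in the first row; a column formatted as General can come in as a string if any cell holds text.

```stata
* Hypothetical file name; adjust to the actual workbook.
import excel using "panel.xlsx", firstrow clear

* Show the storage type of every variable.
describe

* List any variables that were imported as strings.
ds, has(type string)

* Example: try to convert one variable to numeric. Without the force
* option, destring leaves a variable untouched if it contains
* non-numeric characters, which helps locate stray text in the sheet.
capture noisily destring Debt1, replace
```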
Comment

David Coelho

Join Date: Oct 2019
Posts: 16

#17

12 Oct 2019, 15:34

Code:

      Source |       SS           df       MS      Number of obs   =       598
-------------+----------------------------------   F(5, 592)       =    149.80
       Model |  3606.65846         5  721.331692   Prob > F        =    0.0000
    Residual |  2850.65543       592  4.81529633   R-squared       =    0.5585
-------------+----------------------------------   Adj R-squared   =    0.5548
       Total |  6457.31389       597  10.8162712   Root MSE        =    2.1944

------------------------------------------------------------------------------
          PB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         PB1 |   1.444544   .3312114     4.36   0.000     .7940514    2.095036
       CAPB1 |  -.7276947   .3377787    -2.15   0.032    -1.391085   -.0643044
       Debt1 |   .0090005    .002973     3.03   0.003     .0031616    .0148395
        Gap1 |   -.391931   .1580371    -2.48   0.013    -.7023127   -.0815493
       Rules |   .2292214   .0926698     2.47   0.014     .0472199    .4112229
       _cons |  -.5448692   .1852369    -2.94   0.003    -.9086707   -.1810677
------------------------------------------------------------------------------

I would like to know why the other estimators give the coefficient that I need, except the one I can use... The output above is an example using -reg- as a check.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#18

12 Oct 2019, 16:12

Have you verified that

1. You are using all and only the same variables as are used in the study you are trying to replicate.

2. The distributions of your variables, restricted to the years that are also represented in the published analysis, are the same as shown in the published analysis.

3. If available, the correlations among the variables in the published analysis match those in your data (restricted to the years that are also in the published analysis).

If so, then the only remaining explanation of the discrepant results is that something is very different in the four years that you have added to the data. I would explore the distributions of all of the variables in those four years, and compare that to the distributions in the original years. I would also look at the correlations among the variables in the original years and compare them to those in the four added years to see if something is different there.
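
A rough sketch of those comparisons, using the variable names from the -reg- output in #17. The time variable name (year) and the cutoff year are assumptions; the cutoff below is only a placeholder and must be set to the last year covered by the published analysis.

```stata
* Assumption: the time variable is named year.
* Placeholder cutoff, not taken from the paper.
local last_orig_year 2000
generate byte added = year > `last_orig_year' if !missing(year)

* Distributions of the model variables, original years vs. added years.
tabstat PB PB1 CAPB1 Debt1 Gap1 Rules, by(added) ///
    statistics(n mean sd min p50 max) columns(statistics)

* Correlations within each period.
correlate PB PB1 CAPB1 Debt1 Gap1 Rules if added == 0
correlate PB PB1 CAPB1 Debt1 Gap1 Rules if added == 1
```

Run this from a do-file, since the `///` line continuation is not accepted in the Command window.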
Comment
David Coelho

Join Date: Oct 2019

Posts: 16
#19

12 Oct 2019, 17:13

I ran the regression for the same time period and I'm still getting a negative coefficient for Rules...

Should I apply some correction because of the lagged dependent variable, since it might cause autocorrelation?
Comment

David Coelho

Join Date: Oct 2019
Posts: 16

#20

12 Oct 2019, 17:19

As a robustness test, I ran the following regression with Primary Expense (PE) as the dependent variable:

Code:

Fixed-effects (within) regression               Number of obs     =        552
Group variable: id                              Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.6284                                         min =         12
     between = 0.9314                                         avg =       19.7
     overall = 0.8004                                         max =         23

                                                F(10,27)          =     187.23
corr(u_i, Xb)  = 0.6217                         Prob > F          =     0.0000

                                    (Std. Err. adjusted for 28 clusters in id)
------------------------------------------------------------------------------
             |               Robust
          PE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         PE1 |   .6617686   .0581502    11.38   0.000     .5424542    .7810829
       Debt1 |  -.0172337   .0110617    -1.56   0.131    -.0399303     .005463
        Gap1 |  -.0460347   .0803094    -0.57   0.571    -.2108161    .1187467
      EXPDEC |  -.0557069   .0332314    -1.68   0.105     -.123892    .0124782
    Election |  -.1151818   .2236321    -0.52   0.611     -.574037    .3436734
         FSI |   11.49403   2.080559     5.52   0.000     7.225076    15.76298
       Rules |   .0845034   .1151699     0.73   0.469    -.1518056    .3208125
         EMU |  -.6582853   .4636416    -1.42   0.167    -1.609599    .2930287
         SGP |   .4841079   .3926119     1.23   0.228    -.3214652    1.289681
         ENL |   -.018583   .2307674    -0.08   0.936    -.4920786    .4549125
       _cons |   15.26699   3.259062     4.68   0.000     8.579951    21.95404
-------------+----------------------------------------------------------------
     sigma_u |  2.4579509
     sigma_e |  2.0219248
         rho |  .59641659   (fraction of variance due to u_i)
------------------------------------------------------------------------------

And once again I get the opposite sign for Rules... With this dependent variable I should expect a negative sign, and I'm getting a positive one!!

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#21

12 Oct 2019, 17:34

Look, if you're using a variable (lagged outcome) in the model that wasn't used in the study you are trying to replicate, then you are not doing a replication and you have no basis for expecting the results to be similar.

Testing the regression for the same time-period and getting results that do not replicate the earlier study adds to the evidence that either your model or your data are different from the study you are trying to replicate. As you've already mentioned that you are using an additional variable, that is a sufficient explanation already. It is also a good idea to check the data for the same time period to see if there are problems there. You should not presume that your data are correct: many data sets contain errors, and large data sets often contain many errors.
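
One check of this kind, sketched under assumptions: whether the lagged variables built in the spreadsheet line up with lags computed by Stata. The panel variable id appears in the -xtreg- output; the time variable name year and the helper variable PB1_check are hypothetical.

```stata
* Any duplicated country-years? (xtset will refuse to run if there are.)
duplicates report id year

* Declare the panel structure; year is an assumed variable name.
xtset id year

* Panel pattern: gaps in the time series can shift hand-built lags.
xtdescribe

* Rebuild the lag with the time-series operator and compare it with
* the lag that came from the spreadsheet.
generate double PB1_check = L.PB
compare PB1 PB1_check
```

If -compare- reports observations where the two differ, or where only one of them is missing, the hand-built lag is worth inspecting row by row.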
Comment
David Coelho

Join Date: Oct 2019

Posts: 16
#22

12 Oct 2019, 17:59

My baseline model contains exactly the same variables as the previous study, so the only differences that I can confirm are the time span and the extra country... I'm not presuming that my data set is correct; however, I've already checked all my data sources twice and have not found a single error so far... That's why I presume that the error might be coming from the model or the estimation method.

Besides, if the error were in my data set, the estimation for another dependent variable that was not used in the baseline model, using the same independent variables (and its own lag), should be correct; however, I get a positive sign there, where it should be negative...
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30066
#23

12 Oct 2019, 19:43

Well, you have said that the original paper used fixed effects regression, and you used that as well. Also two stage least squares. You have not shown your code, only outputs. The outputs appear to be consistent with the analyses you describe, but nobody can vouch for your code without seeing it.

So the possibilities have been narrowed down. You haven't said anything about verifying that your data's descriptive statistics and correlations (when restricted to the same countries and years) match those of the published paper. So even if chasing down the data management of your own data finds it to be correct, that doesn't rule out a discrepancy: the published paper's data could be incorrect, too. Just because it's published doesn't mean it's right. Another possibility is that the descriptions of the published analyses are in some important way different from what you have done, and perhaps they were not well described in the original paper. The best way to chase that down is to contact the authors of the published paper and ask them to be more explicit (ideally, show you their code) about what they did.

Then there is the possibility that the addition of four years of data really changes things. I believe you said you already tried running your model with the estimation restricted to the original years and you still have a difference from the published results. (The thread has gone long, and I can't remember all the details of what you've done so far.) If so, that would make this possibility moot.

Those are the remaining possibilities to pursue.
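
For the comparison restricted to the published countries and years, a sketch along these lines may help. The year bounds and the id code of the added country are placeholders, not values from this thread, and the variable list is the one shown in the -reg- output in #17.

```stata
* Placeholders (not taken from the thread): first and last year of the
* published sample, and the id code of the added country.
local first_year 1990
local last_year  2000
local new_id     99

* Fixed-effects model on the published countries and years only.
xtreg PB PB1 CAPB1 Debt1 Gap1 Rules ///
    if inrange(year, `first_year', `last_year') & id != `new_id', ///
    fe vce(robust)

* Descriptive statistics on exactly the estimation sample, for
* comparison with the table in the published paper.
summarize PB PB1 CAPB1 Debt1 Gap1 Rules if e(sample)
```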
Comment

David Coelho

Join Date: Oct 2019
Posts: 16

#24

13 Oct 2019, 09:05

Although the database is the same, the descriptive statistics show that there are some differences... For instance, in the published study the dependent variable has a mean of 0.3 and a Std. Dev. of 3.06, while I get a mean of 0.158 and a Std. Dev. of 3.09. I guess this difference is due to the new calculation rules for these variables that started being applied recently.

Besides, a new method to compute the Rules index started to be used as well, so everything is now determined by different methods compared with the previous study. I don't have access to the correlations of the previous study, so I cannot conclude anything based on that.

The code that I've been using most often is this one (lagged variables end in 1):

Code:

xtreg PB PB1 Debt1 Gap1 EXPDEC Election FSI Rules EMU SGP ENL, fe vce(r)

Fixed-effects (within) regression               Number of obs     =        552
Group variable: id                              Number of groups  =         28

R-sq:                                           Obs per group:
     within  = 0.5887                                         min =         12
     between = 0.5054                                         avg =       19.7
     overall = 0.5177                                         max =         23

                                                F(10,27)          =     117.69
corr(u_i, Xb)  = -0.4882                        Prob > F          =     0.0000

                                    (Std. Err. adjusted for 28 clusters in id)
------------------------------------------------------------------------------
             |               Robust
          PB |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         PB1 |   .6127309   .0569516    10.76   0.000     .4958758     .729586
       Debt1 |   .0380682   .0092282     4.13   0.000     .0191335    .0570028
        Gap1 |   .0452214   .0578105     0.78   0.441     -.073396    .1638387
      EXPDEC |   .0990221   .0545673     1.81   0.081    -.0129408    .2109849
    Election |    -.05914   .2161574    -0.27   0.786    -.5026584    .3843784
         FSI |  -11.00049   2.082507    -5.28   0.000    -15.27345   -6.727541
       Rules |   -.216667   .1136778    -1.91   0.067    -.4499145    .0165806
         EMU |   .9387213   .3226792     2.91   0.007     .2766382    1.600804
         SGP |  -.3660889   .4234953    -0.86   0.395    -1.235029    .5028516
         ENL |   .3710115   .3068094     1.21   0.237    -.2585093    1.000532
       _cons |  -3.498912   1.735103    -2.02   0.054     -7.05905    .0612257
-------------+----------------------------------------------------------------
     sigma_u |  1.5663862
     sigma_e |  1.9432308
         rho |  .39384898   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Besides, I have found an error in the previous study: it reports 593 obs for the Rules index, while it should be 594 obs for the same sample and time period...
