Panel data regression with high coefficient values

hari venkatesh

Join Date: Feb 2019

Posts: 28
#1

04 Feb 2019, 04:39

Dear All

I have estimated a panel regression with the command 'xtreg, fe cluster(id)' (22 panels and 10 years).

Output: y = -493.30 + 0.72 x1 + 24.58 x2 + 17.94 x3 - 0.08 x4 + 3.54 x5.

Do the high coefficient values mean that the model is wrongly specified?

Note: y is not log-transformed because it takes negative values. The signs and significance of the coefficients are as expected from the literature; F value = 47.6***

Last edited by hari venkatesh; 04 Feb 2019, 04:43.
Tags: data, fixed effects, panel, panel data, regression
Ariel Karlinsky

Join Date: Jun 2015

Posts: 491
#2

04 Feb 2019, 04:51

It would be nice to see the actual output. What are the y's and x's? What are their units? A priori I would say no: high coefficient values are nothing to be afraid of. Just make sure that the variables are on an appropriate scale, for example that x2 is not measured in dollars when thousands of dollars would be more appropriate.
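
A minimal sketch of the scaling point, using Stata's bundled auto dataset rather than the data in this thread (the variable name weight_k is made up for illustration). Dividing a regressor by 1,000 multiplies its coefficient by 1,000, while the t statistic, p-value and R-squared stay the same:

```stata
* Illustration only: auto.dta ships with Stata; weight_k is a hypothetical name
sysuse auto, clear

* weight in pounds: the coefficient is dollars per pound
regress price weight

* same regressor in thousands of pounds: the coefficient is 1,000 times larger,
* but t, P>|t| and R-squared are identical to the first regression
generate weight_k = weight/1000
regress price weight_k
```

The size of a coefficient therefore reflects the units of the regressor and the outcome, not the quality of the specification.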
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#3

04 Feb 2019, 04:58

Hari:
welcome to this forum.
Please see the FAQ on how to post more effectively (especially points 12.2 and 12.3).
As Ariel has already pointed out, high coefficients per se do not tell you anything about model misspecification.
As far as misspecification is concerned, you should be aware of non-linear relationships between regressand and predictor(s) and endogeneity. In an N>T panel dataset like yours, heteroskedasticity and autocorrelation are easily tamed via the -cluster- or -robust- options for standard errors (unlike under -regress-, they do the very same job under -xtreg-).
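
The equivalence of the two options under -xtreg, fe- can be checked with a short sketch. It assumes the nlswork example dataset (downloaded by -webuse-, so an internet connection is needed); the stored-estimate names fe_robust and fe_cluster are arbitrary:

```stata
* Illustration only: example dataset, not the data discussed in this thread
webuse nlswork, clear
xtset idcode year

* robust standard errors under -xtreg, fe-
xtreg ln_wage age tenure, fe vce(robust)
estimates store fe_robust

* standard errors clustered on the panel identifier
xtreg ln_wage age tenure, fe vce(cluster idcode)
estimates store fe_cluster

* the two columns of standard errors should coincide
estimates table fe_robust fe_cluster, se
```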

Kind regards,
Carlo
(Stata 19.0)

hari venkatesh

Join Date: Feb 2019
Posts: 28
#4

04 Feb 2019, 05:56

Ariel Karlinsky Thank you for the prompt response.

y is a composite of variables normalized by either exports or GDP; x1, x2 and x4 are in %; x3 is in log levels (log of GDP); x5 is an index (0 to 1).

I hope this information helps you to understand the model.

Is this model specification okay?

Carlo Lazzaro Sir, I could not find out how to test for endogeneity in a static panel data analysis. I have addressed heteroskedasticity and autocorrelation with robust standard errors. I also found cross-sectional dependence among the panels using the xtcsd command, so I used Driscoll and Kraay robust standard errors. But the model still looks the same.

Do I have to use dynamic panel models for this kind of N>T sample? I could not find any literature in this area that employs dynamic panel regression. Past studies have mostly used pooled regression and, in some cases, panel fixed-effects models.

"As far as misspecification is concerned, you should be aware of non-linear relationships between regressand and predictor(s) and endogeneity"

How can I test for endogeneity and a non-linear relationship in this model?

Code:

xtreg y x1 x2 x3 x4 x5, fe cluster(id)

Fixed-effects (within) regression               Number of obs      =       210
Group variable: id                              Number of groups   =        21

R-sq:  within  = 0.2488                         Obs per group: min =        10
       between = 0.0358                                        avg =      10.0
       overall = 0.0165                                        max =        10

                                                F(5,20)            =      4.44
corr(u_i, Xb)  = -0.8778                        Prob > F           =    0.0070

                                    (Std. Err. adjusted for 21 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .7195986   .2226033     3.23   0.004     .2552562    1.183941
          x2 |   24.57961   12.62724     1.95   0.066    -1.760348    50.91957
          x3 |    17.9403   5.548725     3.23   0.004      6.36586    29.51473
          x4 |  -.0801577   .0900084    -0.89   0.384    -.2679119    .1075966
          x5 |   3.537634   4.275944     0.83   0.418    -5.381828     12.4571
       _cons |  -493.3006   151.1427    -3.26   0.004    -808.5787   -178.0225
-------------+----------------------------------------------------------------
     sigma_u |  25.229544
     sigma_e |  5.7156443
         rho |  .95118251   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. sum y x1 x2 x3 x4 x5

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |       220   -5.912505    14.04037   -55.7827    38.1989
          x1 |       220    5.417655    6.057257     -1.506       30.9
          x2 |       220    .1255516    .0804871          0    .385806
          x3 |       220    26.41601    3.071041   12.87918    29.9336
          x4 |       220    74.28806    40.12442     22.106    176.669
-------------+--------------------------------------------------------
          x5 |       210    .5044124     .324817          0          1
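
The sequence described in this post (fixed effects, a cross-sectional dependence test, then Driscoll-Kraay standard errors) might look roughly like the sketch below. It is run on the grunfeld example dataset as a stand-in, and it assumes the community-contributed commands -xtcsd- and -xtscc- have been installed from SSC:

```stata
* Illustration only: grunfeld is an example dataset, not the data in this thread
* -xtcsd- and -xtscc- are community-contributed; install once with:
*   ssc install xtcsd
*   ssc install xtscc
webuse grunfeld, clear
xtset company year

* fixed-effects fit, followed by Pesaran's test of cross-sectional independence
xtreg invest mvalue kstock, fe
xtcsd, pesaran abs

* same fixed-effects model with Driscoll-Kraay standard errors
xtscc invest mvalue kstock, fe
```

Driscoll-Kraay standard errors change the inference, not the point estimates, so coefficients of similar size before and after are to be expected.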

Last edited by hari venkatesh; 04 Feb 2019, 06:07.

Ariel Karlinsky

Join Date: Jun 2015

Posts: 491
#5

04 Feb 2019, 06:04

x2 seems to be on a different scale from the other variables. Does a mean of 0.12 mean 12% or 0.12%?
hari venkatesh

Join Date: Feb 2019

Posts: 28
#6

04 Feb 2019, 06:34

@Ariel Karlinsky It means 12%.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

04 Feb 2019, 06:55

Hari:
- as far as testing model (mis)specification is concerned, you may want to take a look at this thread: https://www.statalist.org/forums/for...nel-data-model. The significance of the squared term for fitted values denotes misspecification which, in turn, may imply endogeneity.
As an aside, please call me Carlo, as all on (and many more off) the list do. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#8

04 Feb 2019, 07:07

Hari:
see also https://www.stata.com/support/faqs/s...-hausman-test/.

Kind regards,
Carlo
(Stata 19.0)

hari venkatesh

Join Date: Feb 2019
Posts: 28
#9

04 Feb 2019, 07:25

@Carlo Lazzaro Dear Sir, thank you for the information.

I have run the endogeneity test following the link above. For the second regression equation I found that I need a new variable, x6 (analogous to "rent" in the example). I then tested the residuals, and the result shows no endogeneity issue. Please find the output below and let me know of any mistakes.

How can I test for a non-linear relationship in this case?

Code:

. reg y x1 x2 x3 x4 x5

      Source |       SS       df       MS              Number of obs =     210
-------------+------------------------------           F(  5,   204) =    6.23
       Model |  4743.91635     5  948.783271           Prob > F      =  0.0000
    Residual |  31080.2929   204  152.354377           R-squared     =  0.1324
-------------+------------------------------           Adj R-squared =  0.1112
       Total |  35824.2093   209    171.4077           Root MSE      =  12.343

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  -.2503885   .1834717    -1.36   0.174    -.6121324    .1113555
          x2 |   .4264287   .1064768     4.00   0.000     .2164925    .6363649
          x3 |   .0447519   .0269481     1.66   0.098    -.0083807    .0978844
          x4 |  -9.146621   3.752765    -2.44   0.016     -16.5458    -1.74744
          x5 |  -1.523395   .9052064    -1.68   0.094    -3.308155    .2613651
       _cons |   19.82595   19.04705     1.04   0.299    -17.72836    57.38027
------------------------------------------------------------------------------

. predict  y_reg, res
(10 missing values generated)

. reg x6 y x4  y_reg

      Source |       SS       df       MS              Number of obs =     210
-------------+------------------------------           F(  3,   206) =    8.99
       Model |  2081.26172     3  693.753905           Prob > F      =  0.0000
    Residual |  15901.3512   206  77.1910253           R-squared     =  0.1157
-------------+------------------------------           Adj R-squared =  0.1029
       Total |  17982.6129   209  86.0412102           Root MSE      =  8.7858

------------------------------------------------------------------------------
          x6 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |  -.0376187   .1277718    -0.29   0.769    -.2895269    .2142894
          x4 |    8.85666   1.874096     4.73   0.000     5.161793    12.55153
       y_reg |   .1410462   .1371468     1.03   0.305    -.1293451    .4114376
       _cons |   11.51312   1.255193     9.17   0.000     9.038446    13.98779
------------------------------------------------------------------------------

. test  y_reg

 ( 1)  y_reg = 0

       F(  1,   206) =    1.06
            Prob > F =    0.3050

.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712

#10

04 Feb 2019, 09:06

Hari:
the problem with your approach is that you use -regress- when in fact you have panel data.
You can use -hausman- to test whether the IV estimator fits your data better due to an endogeneity issue (as usual, instrumental variable estimation requires choosing instruments, which should be selected according to the existing literature in your research field).
That said, you can perform something along the lines of the following toy-example:

Code:

use http://www.stata-press.com/data/r15/nlswork
. xtreg ln_w c.age##c.age tenure not_smsa union south, fe

Fixed-effects (within) regression               Number of obs     =     19,007
Group variable: idcode                          Number of groups  =      4,134

R-sq:                                           Obs per group:
     within  = 0.1333                                         min =          1
     between = 0.2375                                         avg =        4.6
     overall = 0.2031                                         max =         12

                                                F(6,14867)        =     381.19
corr(u_i, Xb)  = 0.2074                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0311984   .0033902     9.20   0.000     .0245533    .0378436
             |
 c.age#c.age |  -.0003457   .0000543    -6.37   0.000    -.0004522   -.0002393
             |
      tenure |   .0176205   .0008099    21.76   0.000     .0160331    .0192079
    not_smsa |  -.0972535   .0125377    -7.76   0.000    -.1218289    -.072678
       union |   .0975672   .0069844    13.97   0.000     .0838769    .1112576
       south |  -.0620932    .013327    -4.66   0.000    -.0882158   -.0359706
       _cons |   1.091612   .0523126    20.87   0.000     .9890729    1.194151
-------------+----------------------------------------------------------------
     sigma_u |   .3910683
     sigma_e |  .25545969
         rho |  .70091004   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4133, 14867) = 8.31                 Prob > F = 0.0000

. estimate store fe

. xtivreg ln_w age c.age#c.age not_smsa (tenure = grade collgrad ), fe

Fixed-effects (within) IV regression            Number of obs     =     28,091
Group variable: idcode                          Number of groups  =      4,697

R-sq:                                           Obs per group:
     within  = 0.1144                                         min =          1
     between = 0.1487                                         avg =        6.0
     overall = 0.1253                                         max =         15

                                                Wald chi2(3)      =  876742.33
corr(u_i, Xb)  = 0.1014                         Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      tenure |          0  (omitted)
         age |   .0542657   .0028195    19.25   0.000     .0487396    .0597918
             |
 c.age#c.age |  -.0006001   .0000467   -12.85   0.000    -.0006916   -.0005086
             |
    not_smsa |  -.1034768   .0098272   -10.53   0.000    -.1227378   -.0842159
       _cons |   .6626665   .0412026    16.08   0.000     .5819108    .7434221
-------------+----------------------------------------------------------------
     sigma_u |  .39543286
     sigma_e |  .30071649
         rho |  .63358467   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F  test that all u_i=0:     F(4696,23391) =     6.89      Prob > F    = 0.0000
------------------------------------------------------------------------------
Instrumented:   tenure
Instruments:    age c.age#c.age not_smsa grade collgrad
------------------------------------------------------------------------------

. estimate store IV_fe

. hausman fe IV_fe

                 ---- Coefficients ----
             |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
             |       fe         IV_fe        Difference          S.E.
-------------+----------------------------------------------------------------
         age |    .0311984     .0542657       -.0230672        .0018825
 c.age#c.age |   -.0003457    -.0006001        .0002543        .0000277
    not_smsa |   -.0972535    -.1034768        .0062234        .0077859
------------------------------------------------------------------------------
                           b = consistent under Ho and Ha; obtained from xtreg
          B = inconsistent under Ha, efficient under Ho; obtained from xtivreg

    Test:  Ho:  difference in coefficients not systematic

                  chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                          =      494.63
                Prob>chi2 =      0.0000

As per -hausman- output, there's no evidence of endogeneity.

To test for misspecification, you may want to try something along this second toy-example:

Code:

. xtreg ln_w c.age##c.age tenure not_smsa union south, fe

Fixed-effects (within) regression               Number of obs     =     19,007
Group variable: idcode                          Number of groups  =      4,134

R-sq:                                           Obs per group:
     within  = 0.1333                                         min =          1
     between = 0.2375                                         avg =        4.6
     overall = 0.2031                                         max =         12

                                                F(6,14867)        =     381.19
corr(u_i, Xb)  = 0.2074                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0311984   .0033902     9.20   0.000     .0245533    .0378436
             |
 c.age#c.age |  -.0003457   .0000543    -6.37   0.000    -.0004522   -.0002393
             |
      tenure |   .0176205   .0008099    21.76   0.000     .0160331    .0192079
    not_smsa |  -.0972535   .0125377    -7.76   0.000    -.1218289    -.072678
       union |   .0975672   .0069844    13.97   0.000     .0838769    .1112576
       south |  -.0620932    .013327    -4.66   0.000    -.0882158   -.0359706
       _cons |   1.091612   .0523126    20.87   0.000     .9890729    1.194151
-------------+----------------------------------------------------------------
     sigma_u |   .3910683
     sigma_e |  .25545969
         rho |  .70091004   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4133, 14867) = 8.31                 Prob > F = 0.0000

. predict fitted, xb
(9,527 missing values generated)

. gen fitted_sq=fitted^2
(9,527 missing values generated)

. xtreg ln_w fitted fitted_sq , fe

Fixed-effects (within) regression               Number of obs     =     19,007
Group variable: idcode                          Number of groups  =      4,134

R-sq:                                           Obs per group:
     within  = 0.1343                                         min =          1
     between = 0.2359                                         avg =        4.6
     overall = 0.2035                                         max =         12

                                                F(2,14871)        =    1153.74
corr(u_i, Xb)  = 0.2078                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fitted |   2.345366   .3260557     7.19   0.000     1.706257    2.984475
   fitted_sq |  -.3770241   .0911857    -4.13   0.000    -.5557594   -.1982889
       _cons |  -1.192915   .2908455    -4.10   0.000    -1.763008   -.6228221
-------------+----------------------------------------------------------------
     sigma_u |   .3909034
     sigma_e |  .25527864
         rho |  .70103046   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4133, 14871) = 8.66                 Prob > F = 0.0000

. test fitted_sq

 ( 1)  fitted_sq = 0

       F(  1, 14871) =   17.10
            Prob > F =    0.0000

.

As -test- performed on -fitted_sq- reaches statistical significance, the model is misspecified, although (as per first code outcome) there's no evidence of endogeneity.
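
A complementary check for non-linearity in one specific regressor is to add its squared term directly and test it. The sketch below does this for tenure in the same example dataset; the choice of tenure and of the values 0, 5 and 10 is arbitrary and only for illustration:

```stata
* Illustration only: example dataset, not the data discussed in this thread
webuse nlswork, clear
xtset idcode year

* c.tenure##c.tenure enters tenure and its square
xtreg ln_wage c.tenure##c.tenure age not_smsa south, fe vce(cluster idcode)

* a significant squared term points to a non-linear effect of tenure
test c.tenure#c.tenure

* marginal effect of tenure evaluated at a few tenure values
margins, dydx(tenure) at(tenure=(0 5 10))
```

Unlike the fitted-values test above, this approach says which regressor is responsible, at the cost of having to test each candidate in turn.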

Kind regards,
Carlo
(Stata 19.0)

hari venkatesh

Join Date: Feb 2019
Posts: 28

#11

05 Feb 2019, 22:47

Carlo: In the toy example above, the Hausman statistic shows that there is no endogeneity, although the instrumented variable "tenure" is omitted from the regression output. Is that fine?

In my research area, I have not found any literature on endogenous variables (or instrumental variables), so I was not able to run the Hausman test for endogeneity. However, I have run the model misspecification test (the second test in the example above), and the results show that the model is correctly specified. Please find the output below.

Existing literature in my research field has addressed endogeneity by lagging all the explanatory variables except log GDP by one year in a panel fixed-effects model. Is this a correct way to address endogeneity? Others have used pooled regression without any lagged variables. Can you please let me know what you think?

Do I still have to check for endogeneity even if the model is correctly specified?

Code:

xtscc y x1 x2 x3 x4 x5, fe

Regression with Driscoll-Kraay standard errors   Number of obs     =       210
Method: Fixed-effects regression                 Number of groups  =        21
Group variable (i): id                           F(  5,     9)     =     57.87
maximum lag: 2                                   Prob > F          =    0.0000
                                                 within R-squared  =    0.2223

------------------------------------------------------------------------------
             |             Drisc/Kraay
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         x1 |   .6091653   .1922201     3.17   0.011     .1743331    1.043997
         x2 |   .2327197    .059194     3.93   0.003     .0988135    .3666259
         x3 |   9.468494   1.145278     8.27   0.000     6.877696    12.05929
         x4 |  -.1098678    .052901    -2.08   0.068    -.2295382    .0098027
         x5 |   3.731616   2.361148     1.58   0.148    -1.609671    9.072904
       _cons |  -130.3726   18.41059    -7.08   0.000    -172.0202   -88.72492
------------------------------------------------------------------------------



. predict fitted, xb
(10 missing values generated)

. gen fitted_sq=fitted^2
(10 missing values generated)

. xtscc y fitted fitted_sq , fe

Regression with Driscoll-Kraay standard errors   Number of obs     =       210
Method: Fixed-effects regression                 Number of groups  =        21
Group variable (i): id                           F(  2,     9)     =     20.94
maximum lag: 2                                   Prob > F          =    0.0004
                                                 within R-squared  =    0.2224

------------------------------------------------------------------------------
             |             Drisc/Kraay
          y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      fitted |   .9983357   .2080298     4.80   0.001     .5277396    1.468932
   fitted_sq |   -.000279   .0037591    -0.07   0.942    -.0087827    .0082246
       _cons |   .0445582   .7932967     0.06   0.956    -1.750004     1.83912
------------------------------------------------------------------------------


. test fitted_sq

 ( 1)  fitted_sq = 0

       F(  1,     9) =    0.01
            Prob > F =    0.9425

.

Last edited by hari venkatesh; 05 Feb 2019, 22:59.

hari venkatesh

Join Date: Feb 2019
Posts: 28

#12

08 Feb 2019, 02:57

Please comment @Carlo Lazzaro

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#13

08 Feb 2019, 09:58

Hari:
your model seems OK and I would not test for endogeneity anymore.
On using lagged predictors as instruments, see: http://personal.rhul.ac.uk/uhte/006/...%2016_2sls.pdf.
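
For reference, one-year lags of the regressors, as described earlier in the thread, can be written with Stata's time-series operators once the data are -xtset-. This is only a sketch on the nlswork example dataset, and the choice of which variables to lag is arbitrary:

```stata
* Illustration only: example dataset, not the data discussed in this thread
webuse nlswork, clear
xtset idcode year

* L. takes the previous year's value within each panel;
* observations without a record for the previous year drop out of the sample
xtreg ln_wage L.tenure L.union age, fe vce(cluster idcode)
```

Whether lagging actually deals with endogeneity depends on the dynamics of the regressors and the errors, which this sketch does not address.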

Kind regards,
Carlo
(Stata 19.0)
hari venkatesh

Join Date: Feb 2019

Posts: 28
#14

10 Feb 2019, 10:57

Carlo:
Thank you for the great information and valuable suggestion.