  • Panel data regression with high coefficient values

    Dear All

    I have estimated a panel regression with the command -xtreg, fe cluster(id)- (22 panels and 10 years).

    Output: y = -493.30 + 0.72 x1 + 24.58 x2 + 17.94 x3 - 0.08 x4 + 3.54 x5

    Do the high coefficient values mean that the model is misspecified?

    Note: y is not log-transformed because it takes negative values. The signs and significance of the coefficients are good and in line with the literature; F value = 47.6***
    Last edited by hari venkatesh; 04 Feb 2019, 04:43.

  • #2
    It would be nice to see the actual output. What are the y's and x's? What are their units? A priori I would say no: high coefficient values are nothing to be afraid of. Just make sure that the variables are on an appropriate scale, for example that x2 is not in dollars when thousands of dollars would be more appropriate.
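A minimal NumPy sketch of this point on simulated data (the dollar-valued regressor, the 1,000 rescaling factor and the helper `ols` are illustrative assumptions, not from the thread): rescaling a regressor rescales its coefficient by the same factor, but the t-statistic, the residuals and the fit stay exactly the same, so a large coefficient by itself says nothing about misspecification.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients, conventional standard errors and residuals.

    X must already contain a column of ones for the constant.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * XtX_inv))
    return beta, se, resid


rng = np.random.default_rng(0)
n = 200
spend_dollars = rng.normal(500.0, 100.0, n)      # hypothetical regressor, in dollars
y = 1.0 + 0.004 * spend_dollars + rng.normal(0.0, 1.0, n)

spend_thousands = spend_dollars / 1000.0         # same regressor, in thousands of dollars

b_d, se_d, r_d = ols(np.column_stack([np.ones(n), spend_dollars]), y)
b_k, se_k, r_k = ols(np.column_stack([np.ones(n), spend_thousands]), y)

# The slope is 1,000 times larger in thousands, but t and residuals are identical
print("dollars:   slope =", b_d[1], "t =", b_d[1] / se_d[1])
print("thousands: slope =", b_k[1], "t =", b_k[1] / se_k[1])
```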



    • #3
      Hari:
      welcome to this forum.
      Please see the FAQ on how to post more effectively (especially points 12.2 and 12.3).
      As Ariel has already pointed out, high coefficients per se do not tell you anything about model misspecification.
      As far as misspecification is concerned, you should be aware of non-linear relationships between the regressand and the predictor(s), and of endogeneity. In an N>T panel dataset like yours, heteroskedasticity and autocorrelation are easily tamed via the -cluster- or -robust- options for standard errors (unlike with -regress-, under -xtreg- the two options do the very same job).
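To make the point about clustered standard errors concrete, here is a rough NumPy sketch of the fixed-effects (within) estimator with standard errors clustered by panel, on a simulated 22 x 10 panel (all names are hypothetical). Under -xtreg- the -robust- option is treated as clustering on the panel variable, which is why the two options coincide there. The G/(G-1) small-sample factor below is an assumption; Stata applies its own correction, so the numbers need not match -xtreg- exactly.

```python
import numpy as np


def within_transform(v, groups):
    """Subtract each panel's mean: the fixed-effects (within) transformation."""
    out = np.asarray(v, dtype=float).copy()
    for g in np.unique(groups):
        idx = groups == g
        out[idx] -= out[idx].mean(axis=0)
    return out


def fe_cluster(y, X, groups):
    """FE (within) coefficients with standard errors clustered by panel.

    Sandwich: (X'X)^-1 [sum_g X_g' u_g u_g' X_g] (X'X)^-1, times G/(G-1).
    The G/(G-1) factor is an assumption; Stata's own small-sample
    correction may differ.
    """
    yd = within_transform(y, groups)
    Xd = within_transform(X, groups)
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta = XtX_inv @ Xd.T @ yd
    u = yd - Xd @ beta
    panels = np.unique(groups)
    meat = np.zeros((Xd.shape[1], Xd.shape[1]))
    for g in panels:
        idx = groups == g
        score = Xd[idx].T @ u[idx]       # panel-level sum of x_it * u_it
        meat += np.outer(score, score)
    G = len(panels)
    V = G / (G - 1) * XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))


# Simulated balanced panel: 22 panels x 10 years, first regressor correlated with the FE
rng = np.random.default_rng(1)
n_panels, n_years = 22, 10
groups = np.repeat(np.arange(n_panels), n_years)
alpha = rng.normal(0.0, 5.0, n_panels)[groups]
X = np.column_stack([rng.normal(0.0, 1.0, groups.size) + 0.5 * alpha,
                     rng.normal(0.0, 1.0, groups.size)])
y = alpha + X @ np.array([0.7, 2.0]) + rng.normal(0.0, 1.0, groups.size)

beta, se = fe_cluster(y, X, groups)
print("FE coefficients:", beta)
print("cluster-robust SEs:", se)
```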
      Kind regards,
      Carlo
      (Stata 18.0 SE)



      • #4
        Ariel Karlinsky: thank you for the prompt response.

        y is a composite of variables normalized by either exports or GDP; x1, x2 and x4 are in %; x3 is in log levels (log of GDP); and x5 is an index (0 to 1).

        I hope this information helps you understand the model.

        Is this model specification okay?

        Carlo Lazzaro: Sir, I could not find out how to test for endogeneity in a static panel data analysis. I have addressed heteroskedasticity and autocorrelation with robust standard errors. I also detected cross-sectional dependence among the panels using the -xtcsd- command, so I then used Driscoll-Kraay robust standard errors. But the model still looks the same.
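-xtcsd- (a user-written command) offers, among others, Pesaran's CD test. As a rough sketch of what that statistic computes for a balanced panel, assuming the residuals are arranged as a T x N array with one column per panel (the layout and function name are illustrative): CD = sqrt(2T / (N(N-1))) times the sum of the pairwise residual correlations rho_ij over i < j, approximately standard normal under the null of no cross-sectional dependence. Unbalanced panels need more care than this sketch takes.

```python
import numpy as np


def pesaran_cd(resid):
    """Pesaran's CD statistic for a balanced panel.

    resid: array of shape (T, N), one column of residuals per panel.
    CD = sqrt(2T / (N(N-1))) * sum_{i<j} rho_ij, approximately N(0, 1)
    under the null of no cross-sectional dependence.
    """
    T, N = resid.shape
    rho = np.corrcoef(resid, rowvar=False)    # N x N pairwise correlations
    pairs = rho[np.triu_indices(N, k=1)]      # rho_ij with i < j
    return np.sqrt(2.0 * T / (N * (N - 1))) * pairs.sum()


rng = np.random.default_rng(2)
T, N = 10, 22
independent = rng.normal(size=(T, N))
# A common shock hitting all panels in the same year induces dependence
common_shock = 2.0 * rng.normal(size=(T, 1))
dependent = common_shock + rng.normal(size=(T, N))

print("CD, independent panels:", pesaran_cd(independent))
print("CD, common shock:      ", pesaran_cd(dependent))
```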

        Do I have to use dynamic panel models for this kind of N>T sample? I could not find any literature in this area that employed dynamic panel regression. Most past studies have used pooled regression and, in some cases, panel fixed-effects models.

        "As far as misspecification is concerned, you should be aware of non-linear relationships between regressand and predictor(s) and endogeneity"

        How can I test for endogeneity and a nonlinear relationship in this model?





        Code:
        . xtreg y x1 x2 x3 x4 x5, fe cluster(id)
        
        Fixed-effects (within) regression               Number of obs      =       210
        Group variable: id                              Number of groups   =        21
        
        R-sq:  within  = 0.2488                         Obs per group: min =        10
               between = 0.0358                                        avg =      10.0
               overall = 0.0165                                        max =        10
        
                                                        F(5,20)            =      4.44
        corr(u_i, Xb)  = -0.8778                        Prob > F           =    0.0070
        
                                            (Std. Err. adjusted for 21 clusters in id)
        ------------------------------------------------------------------------------
                     |               Robust
                   y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                  x1 |   .7195986   .2226033     3.23   0.004     .2552562    1.183941
                  x2 |   24.57961   12.62724     1.95   0.066    -1.760348    50.91957
                  x3 |    17.9403   5.548725     3.23   0.004      6.36586    29.51473
                  x4 |  -.0801577   .0900084    -0.89   0.384    -.2679119    .1075966
                  x5 |   3.537634   4.275944     0.83   0.418    -5.381828     12.4571
               _cons |  -493.3006   151.1427    -3.26   0.004    -808.5787   -178.0225
        -------------+----------------------------------------------------------------
             sigma_u |  25.229544
             sigma_e |  5.7156443
                 rho |  .95118251   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        
        . sum y x1 x2 x3 x4 x5
        
            Variable |       Obs        Mean    Std. Dev.       Min        Max
        -------------+--------------------------------------------------------
                   y |       220   -5.912505    14.04037   -55.7827    38.1989
                  x1 |       220    5.417655    6.057257     -1.506       30.9
                  x2 |       220    .1255516    .0804871          0    .385806
                  x3 |       220    26.41601    3.071041   12.87918    29.9336
                  x4 |       220    74.28806    40.12442     22.106    176.669
        -------------+--------------------------------------------------------
                  x5 |       210    .5044124     .324817          0          1
        .
        Last edited by hari venkatesh; 04 Feb 2019, 06:07.



        • #5
          x2 seems to be on a different scale from the other variables. Does a mean of 0.12 mean 12% or 0.12%?



          • #6
            Ariel Karlinsky: it means 12%.
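Because x2 is stored as a fraction (0.12 = 12%), its coefficient measures the effect of a change of 1.0 in x2, that is, of 100 percentage points. A quick conversion of the estimate from the -xtreg, fe cluster(id)- output in post #4 to a per-percentage-point effect (plain arithmetic; rescaling leaves the t-statistic and p-value unchanged):

```python
# Numbers taken from the -xtreg, fe cluster(id)- output in post #4,
# where x2 is stored as a fraction (0.12 = 12%)
coef_per_unit = 24.57961
ci_per_unit = (-1.760348, 50.91957)

# One percentage point is 0.01 on the fraction scale
coef_per_pp = coef_per_unit * 0.01
ci_per_pp = tuple(bound * 0.01 for bound in ci_per_unit)

print(f"effect of +1 percentage point in x2: {coef_per_pp:.4f}")
print(f"95% CI: [{ci_per_pp[0]:.4f}, {ci_per_pp[1]:.4f}]")
```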



            • #7
              Hari:
              - as far as testing model (mis)specification is concerned, you may want to take a look at this thread: https://www.statalist.org/forums/for...nel-data-model. A statistically significant coefficient on the squared fitted values denotes misspecification which, in turn, may imply endogeneity.
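A rough NumPy analogue of this check, similar in spirit to Stata's -linktest-: fit the FE model, form the linear prediction and its square, rerun the FE regression of the outcome on both, and look at the t-statistic on the squared term. The data are simulated (names hypothetical), once from a correctly specified linear model and once with an x^2 term the first-step model omits; the standard errors here are conventional within-estimator ones, not clustered.

```python
import numpy as np


def demean(v, groups):
    """Within transformation: subtract each panel's mean."""
    out = np.asarray(v, dtype=float).copy()
    for g in np.unique(groups):
        idx = groups == g
        out[idx] -= out[idx].mean(axis=0)
    return out


def fe_ols(y, X, groups):
    """FE (within) estimates with conventional SEs, df = n - G - k."""
    yd, Xd = demean(y, groups), demean(X, groups)
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta = XtX_inv @ Xd.T @ yd
    u = yd - Xd @ beta
    n, k = Xd.shape
    G = len(np.unique(groups))
    sigma2 = u @ u / (n - G - k)
    return beta, np.sqrt(np.diag(sigma2 * XtX_inv))


def fe_link_test(y, X, groups):
    """t-statistic on the squared linear prediction in a second FE regression."""
    beta, _ = fe_ols(y, X, groups)
    fitted = X @ beta                      # linear prediction (cf. -predict, xb-)
    Z = np.column_stack([fitted, fitted ** 2])
    gamma, se = fe_ols(y, Z, groups)
    return gamma[1] / se[1]


rng = np.random.default_rng(3)
groups = np.repeat(np.arange(50), 10)
alpha = rng.normal(0.0, 2.0, 50)[groups]
x = rng.normal(0.0, 1.0, groups.size) + 0.3 * alpha
e = rng.normal(0.0, 1.0, groups.size)

y_linear = alpha + 1.0 * x + e                     # linear model is correct
y_quadratic = alpha + 1.0 * x + 0.5 * x ** 2 + e   # linear model omits x^2

X = x.reshape(-1, 1)
print("t on fitted^2, correct model:     ", fe_link_test(y_linear, X, groups))
print("t on fitted^2, misspecified model:", fe_link_test(y_quadratic, X, groups))
```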
              As an aside, please call me Carlo, as all on (and many more off) the list do. Thanks.
              Kind regards,
              Carlo
              (Stata 18.0 SE)



              • #8
                Hari:
                see also https://www.stata.com/support/faqs/s...-hausman-test/.
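For reference, -hausman- compares two sets of estimates through H = (b - B)'(V_b - V_B)^(-1)(b - B), where b is consistent under both the null and the alternative and B is efficient under the null; H is referred to a chi-squared distribution, and a small p-value means rejecting the null that the differences in coefficients are not systematic. A minimal NumPy sketch of the formula, with made-up inputs purely to exercise it; it assumes V_b - V_B is invertible, which in practice need not hold (Stata's -hausman- deals with that case more carefully).

```python
import numpy as np


def hausman(b, B, V_b, V_B):
    """Hausman statistic H = (b - B)' (V_b - V_B)^(-1) (b - B) and its df.

    b, V_b: estimator consistent under both H0 and Ha;
    B, V_B: estimator efficient under H0.
    Assumes V_b - V_B is invertible.
    """
    d = np.asarray(b, dtype=float) - np.asarray(B, dtype=float)
    V_diff = np.asarray(V_b, dtype=float) - np.asarray(V_B, dtype=float)
    H = float(d @ np.linalg.solve(V_diff, d))
    return H, d.size


# Made-up inputs, only to exercise the formula
b = [1.0, 0.5]
B = [0.8, 0.5]
V_b = np.diag([0.05, 0.02])
V_B = np.diag([0.01, 0.01])
H, df = hausman(b, B, V_b, V_B)
# Compare with the chi2(2) 5% critical value, about 5.99
print(f"chi2({df}) = {H:.2f}")
```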
                Kind regards,
                Carlo
                (Stata 18.0 SE)



                • #9
                  Carlo Lazzaro: Dear Sir, thank you for the information.

                  I have performed the endogeneity test described at the above link. While estimating the second regression equation, I found that I had to use a new variable, x6 (like "rent" in the example). I then estimated the residuals, and the test shows that there is no endogeneity issue. Please find the output below and let me know of any mistakes.

                  How can I test for a non-linear relationship in this case?



                  Code:
                   reg y x1 x2 x3 x4 x5
                  
                        Source |       SS       df       MS              Number of obs =     210
                  -------------+------------------------------           F(  5,   204) =    6.23
                         Model |  4743.91635     5  948.783271           Prob > F      =  0.0000
                      Residual |  31080.2929   204  152.354377           R-squared     =  0.1324
                  -------------+------------------------------           Adj R-squared =  0.1112
                         Total |  35824.2093   209    171.4077           Root MSE      =  12.343
                  
                  ------------------------------------------------------------------------------
                             y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                            x1 |  -.2503885   .1834717    -1.36   0.174    -.6121324    .1113555
                            x2 |   .4264287   .1064768     4.00   0.000     .2164925    .6363649
                            x3 |   .0447519   .0269481     1.66   0.098    -.0083807    .0978844
                            x4 |  -9.146621   3.752765    -2.44   0.016     -16.5458    -1.74744
                            x5 |  -1.523395   .9052064    -1.68   0.094    -3.308155    .2613651
                         _cons |   19.82595   19.04705     1.04   0.299    -17.72836    57.38027
                  ------------------------------------------------------------------------------
                  
                  . predict  y_reg, res
                  (10 missing values generated)
                  
                  . reg x6 y x4  y_reg
                  
                        Source |       SS       df       MS              Number of obs =     210
                  -------------+------------------------------           F(  3,   206) =    8.99
                         Model |  2081.26172     3  693.753905           Prob > F      =  0.0000
                      Residual |  15901.3512   206  77.1910253           R-squared     =  0.1157
                  -------------+------------------------------           Adj R-squared =  0.1029
                         Total |  17982.6129   209  86.0412102           Root MSE      =  8.7858
                  
                  ------------------------------------------------------------------------------
                            x6 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                             y |  -.0376187   .1277718    -0.29   0.769    -.2895269    .2142894
                            x4 |    8.85666   1.874096     4.73   0.000     5.161793    12.55153
                         y_reg |   .1410462   .1371468     1.03   0.305    -.1293451    .4114376
                         _cons |   11.51312   1.255193     9.17   0.000     9.038446    13.98779
                  ------------------------------------------------------------------------------
                  
                  . test  y_reg
                  
                   ( 1)  y_reg = 0
                  
                         F(  1,   206) =    1.06
                              Prob > F =    0.3050
                  
                  .
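For comparison, a sketch of the regression-based (Durbin-Wu-Hausman) form of the test on simulated cross-sectional data (NumPy; all variable names are hypothetical). In this form the first regression has the suspected endogenous regressor of the model of interest as its dependent variable, regressed on all exogenous regressors plus the excluded instruments; the second regression is the model of interest for y itself, augmented with the first-stage residual. A significant t-statistic on that residual signals endogeneity. The sketch ignores panel structure, which real panel data would require.

```python
import numpy as np


def ols_t(y, X):
    """OLS with a constant added; returns coefficients, t-statistics, residuals."""
    X = np.column_stack([np.ones(len(y)), X])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    t = beta / np.sqrt(np.diag(sigma2 * XtX_inv))
    return beta, t, resid


def dwh_t(y, x_endog, w, z):
    """Regression-based (Durbin-Wu-Hausman) endogeneity test.

    y: outcome of the model of interest; x_endog: suspected endogenous
    regressor; w: exogenous regressors; z: excluded instruments.
    Returns the t-statistic on the first-stage residual.
    """
    # Step 1: suspected endogenous regressor on ALL exogenous variables
    _, _, v_hat = ols_t(x_endog, np.column_stack([w, z]))
    # Step 2: model of interest for y, augmented with the first-stage residual
    _, t, _ = ols_t(y, np.column_stack([x_endog, w, v_hat]))
    return t[-1]


rng = np.random.default_rng(4)
n = 2000
z = rng.normal(size=n)                   # instrument
w = rng.normal(size=n)                   # exogenous regressor
v = rng.normal(size=n)
x_endog = z + 0.5 * w + v
u_exog = rng.normal(size=n)              # error unrelated to v
u_endog = 0.8 * v + rng.normal(size=n)   # error correlated with v

y_exog = 1.0 + 0.5 * x_endog + 0.3 * w + u_exog
y_endog = 1.0 + 0.5 * x_endog + 0.3 * w + u_endog

print("t on residual, exogenous case: ", dwh_t(y_exog, x_endog, w, z))
print("t on residual, endogenous case:", dwh_t(y_endog, x_endog, w, z))
```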



                  • #10
                    Hari:
                    the problem with your approach is that you use -regress- when in fact you have panel data.
                    You can use -hausman- to test whether the IV estimator fits your data better because of an endogeneity issue (as usual, instrumental-variable estimation requires picking instruments, which should be chosen according to the existing literature in your research field).
                    That said, you can perform something along the lines of the following toy example:
                    Code:
                    . use http://www.stata-press.com/data/r15/nlswork
                    . xtreg ln_w c.age##c.age tenure not_smsa union south, fe
                    
                    Fixed-effects (within) regression               Number of obs     =     19,007
                    Group variable: idcode                          Number of groups  =      4,134
                    
                    R-sq:                                           Obs per group:
                         within  = 0.1333                                         min =          1
                         between = 0.2375                                         avg =        4.6
                         overall = 0.2031                                         max =         12
                    
                                                                    F(6,14867)        =     381.19
                    corr(u_i, Xb)  = 0.2074                         Prob > F          =     0.0000
                    
                    ------------------------------------------------------------------------------
                         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                             age |   .0311984   .0033902     9.20   0.000     .0245533    .0378436
                                 |
                     c.age#c.age |  -.0003457   .0000543    -6.37   0.000    -.0004522   -.0002393
                                 |
                          tenure |   .0176205   .0008099    21.76   0.000     .0160331    .0192079
                        not_smsa |  -.0972535   .0125377    -7.76   0.000    -.1218289    -.072678
                           union |   .0975672   .0069844    13.97   0.000     .0838769    .1112576
                           south |  -.0620932    .013327    -4.66   0.000    -.0882158   -.0359706
                           _cons |   1.091612   .0523126    20.87   0.000     .9890729    1.194151
                    -------------+----------------------------------------------------------------
                         sigma_u |   .3910683
                         sigma_e |  .25545969
                             rho |  .70091004   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    F test that all u_i=0: F(4133, 14867) = 8.31                 Prob > F = 0.0000
                    
                    . estimate store fe
                    
                    . xtivreg ln_w age c.age#c.age not_smsa (tenure = grade collgrad ), fe
                    
                    Fixed-effects (within) IV regression            Number of obs     =     28,091
                    Group variable: idcode                          Number of groups  =      4,697
                    
                    R-sq:                                           Obs per group:
                         within  = 0.1144                                         min =          1
                         between = 0.1487                                         avg =        6.0
                         overall = 0.1253                                         max =         15
                    
                                                                    Wald chi2(3)      =  876742.33
                    corr(u_i, Xb)  = 0.1014                         Prob > chi2       =     0.0000
                    
                    ------------------------------------------------------------------------------
                         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                          tenure |          0  (omitted)
                             age |   .0542657   .0028195    19.25   0.000     .0487396    .0597918
                                 |
                     c.age#c.age |  -.0006001   .0000467   -12.85   0.000    -.0006916   -.0005086
                                 |
                        not_smsa |  -.1034768   .0098272   -10.53   0.000    -.1227378   -.0842159
                           _cons |   .6626665   .0412026    16.08   0.000     .5819108    .7434221
                    -------------+----------------------------------------------------------------
                         sigma_u |  .39543286
                         sigma_e |  .30071649
                             rho |  .63358467   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    F  test that all u_i=0:     F(4696,23391) =     6.89      Prob > F    = 0.0000
                    ------------------------------------------------------------------------------
                    Instrumented:   tenure
                    Instruments:    age c.age#c.age not_smsa grade collgrad
                    ------------------------------------------------------------------------------
                    
                    . estimate store IV_fe
                    
                    . hausman fe IV_fe
                    
                                     ---- Coefficients ----
                                 |      (b)          (B)            (b-B)     sqrt(diag(V_b-V_B))
                                 |       fe         IV_fe        Difference          S.E.
                    -------------+----------------------------------------------------------------
                             age |    .0311984     .0542657       -.0230672        .0018825
                     c.age#c.age |   -.0003457    -.0006001        .0002543        .0000277
                        not_smsa |   -.0972535    -.1034768        .0062234        .0077859
                    ------------------------------------------------------------------------------
                                               b = consistent under Ho and Ha; obtained from xtreg
                              B = inconsistent under Ha, efficient under Ho; obtained from xtivreg
                    
                        Test:  Ho:  difference in coefficients not systematic
                    
                                      chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                                              =      494.63
                                    Prob>chi2 =      0.0000
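                    As per the -hausman- output (Prob>chi2 = 0.0000), the null hypothesis that the difference in coefficients is not systematic is rejected. Note, however, that -tenure- was omitted from the -xtivreg- estimates, so this toy comparison is not really informative about the endogeneity of -tenure-.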
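-tenure- is reported as omitted in the -xtivreg, fe- output above. A likely reason (an inference from the output, not verified here) is that grade and collgrad do not vary within idcode in nlswork: the within transformation turns such instruments into columns of zeros, leaving nothing to identify the effect of -tenure-. A NumPy sketch of fixed-effects 2SLS on simulated data (all names hypothetical) illustrates the difference between a time-varying and a time-invariant instrument.

```python
import numpy as np


def demean(v, groups):
    """Within transformation: subtract each panel's mean."""
    out = np.asarray(v, dtype=float).copy()
    for g in np.unique(groups):
        idx = groups == g
        out[idx] -= out[idx].mean(axis=0)
    return out


def fe_2sls(y, x_endog, z, groups):
    """Within (FE) 2SLS with one endogenous regressor and instruments z."""
    yd = demean(y, groups)
    xd = demean(x_endog, groups).reshape(-1, 1)
    zd = demean(z, groups).reshape(len(yd), -1)
    if np.allclose(zd, 0.0):
        raise ValueError("instruments have no within-panel variation")
    # 2SLS: beta = (X'Pz X)^-1 X'Pz y with Pz = Z (Z'Z)^-1 Z'
    ZtZ_inv = np.linalg.inv(zd.T @ zd)
    XtPz = xd.T @ zd @ ZtZ_inv @ zd.T
    return np.linalg.solve(XtPz @ xd, XtPz @ yd)


rng = np.random.default_rng(5)
n_panels, n_years = 200, 5
groups = np.repeat(np.arange(n_panels), n_years)
alpha = rng.normal(0.0, 1.0, n_panels)[groups]
v = rng.normal(size=groups.size)
z_varying = rng.normal(size=groups.size)               # changes over time
z_fixed = rng.normal(0.0, 1.0, n_panels)[groups]        # constant within panel
x = alpha + z_varying + 0.5 * z_fixed + v
y = alpha + 0.5 * x + 0.8 * v + rng.normal(size=groups.size)

print("FE-2SLS with time-varying instrument:", fe_2sls(y, x, z_varying, groups))
try:
    fe_2sls(y, x, z_fixed, groups)
except ValueError as err:
    print("time-invariant instrument:", err)
```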

                    To test for misspecification, you may want to try something along the lines of this second toy example:
                    Code:
                    . xtreg ln_w c.age##c.age tenure not_smsa union south, fe
                    
                    Fixed-effects (within) regression               Number of obs     =     19,007
                    Group variable: idcode                          Number of groups  =      4,134
                    
                    R-sq:                                           Obs per group:
                         within  = 0.1333                                         min =          1
                         between = 0.2375                                         avg =        4.6
                         overall = 0.2031                                         max =         12
                    
                                                                    F(6,14867)        =     381.19
                    corr(u_i, Xb)  = 0.2074                         Prob > F          =     0.0000
                    
                    ------------------------------------------------------------------------------
                         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                             age |   .0311984   .0033902     9.20   0.000     .0245533    .0378436
                                 |
                     c.age#c.age |  -.0003457   .0000543    -6.37   0.000    -.0004522   -.0002393
                                 |
                          tenure |   .0176205   .0008099    21.76   0.000     .0160331    .0192079
                        not_smsa |  -.0972535   .0125377    -7.76   0.000    -.1218289    -.072678
                           union |   .0975672   .0069844    13.97   0.000     .0838769    .1112576
                           south |  -.0620932    .013327    -4.66   0.000    -.0882158   -.0359706
                           _cons |   1.091612   .0523126    20.87   0.000     .9890729    1.194151
                    -------------+----------------------------------------------------------------
                         sigma_u |   .3910683
                         sigma_e |  .25545969
                             rho |  .70091004   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    F test that all u_i=0: F(4133, 14867) = 8.31                 Prob > F = 0.0000
                    
                    . predict fitted, xb
                    (9,527 missing values generated)
                    
                    . gen fitted_sq=fitted^2
                    (9,527 missing values generated)
                    
                    . xtreg ln_w fitted fitted_sq , fe
                    
                    Fixed-effects (within) regression               Number of obs     =     19,007
                    Group variable: idcode                          Number of groups  =      4,134
                    
                    R-sq:                                           Obs per group:
                         within  = 0.1343                                         min =          1
                         between = 0.2359                                         avg =        4.6
                         overall = 0.2035                                         max =         12
                    
                                                                    F(2,14871)        =    1153.74
                    corr(u_i, Xb)  = 0.2078                         Prob > F          =     0.0000
                    
                    ------------------------------------------------------------------------------
                         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                          fitted |   2.345366   .3260557     7.19   0.000     1.706257    2.984475
                       fitted_sq |  -.3770241   .0911857    -4.13   0.000    -.5557594   -.1982889
                           _cons |  -1.192915   .2908455    -4.10   0.000    -1.763008   -.6228221
                    -------------+----------------------------------------------------------------
                         sigma_u |   .3909034
                         sigma_e |  .25527864
                             rho |  .70103046   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    F test that all u_i=0: F(4133, 14871) = 8.66                 Prob > F = 0.0000
                    
                    . test fitted_sq
                    
                     ( 1)  fitted_sq = 0
                    
                           F(  1, 14871) =   17.10
                                Prob > F =    0.0000
                    
                    .
                    As -test- performed on -fitted_sq- reaches statistical significance, the model is misspecified. This is a separate issue from endogeneity, which the first toy example is meant to address.
                    Kind regards,
                    Carlo
                    (Stata 18.0 SE)



                    • #11
                      Carlo: in the above toy example, the Hausman statistic shows that there is no endogeneity, although the instrumented variable "tenure" is omitted from the regression output. Is that fine?

                      In my research model, I have not found any literature on endogenous variables (instrumental variables), so I was not able to run the Hausman test for endogeneity. However, I have run the model misspecification test (the second test in the above example), and the results show that the model is correctly specified. Please find the output below.

                      The existing literature in my research field has addressed endogeneity by lagging all the explanatory variables except log GDP by one year in a panel fixed-effects model. Is this a correct way to address endogeneity? Others used pooled regression without any lagged variables. Can you please let me know what you think?

                      Do I still have to check for endogeneity even if the model is correctly specified?

                      Code:
                      xtscc y x1 x2 x3 x4 x5, fe
                      
                      Regression with Driscoll-Kraay standard errors   Number of obs     =       210
                      Method: Fixed-effects regression                 Number of groups  =        21
                      Group variable (i): id                           F(  5,     9)     =     57.87
                      maximum lag: 2                                   Prob > F          =    0.0000
                                                                       within R-squared  =    0.2223
                      
                      ------------------------------------------------------------------------------
                                   |             Drisc/Kraay
                                 y|      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                               x1 |   .6091653   .1922201     3.17   0.011     .1743331    1.043997
                               x2 |   .2327197    .059194     3.93   0.003     .0988135    .3666259
                               x3 |   9.468494   1.145278     8.27   0.000     6.877696    12.05929
                               x4 |  -.1098678    .052901    -2.08   0.068    -.2295382    .0098027
                               x5 |   3.731616   2.361148     1.58   0.148    -1.609671    9.072904
                             _cons |  -130.3726   18.41059    -7.08   0.000    -172.0202   -88.72492
                      ------------------------------------------------------------------------------
                      
                      
                      
                      . predict fitted, xb
                      (10 missing values generated)
                      
                      . gen fitted_sq=fitted^2
                      (10 missing values generated)
                      
                      . xtscc y fitted fitted_sq , fe
                      
                      Regression with Driscoll-Kraay standard errors   Number of obs     =       210
                      Method: Fixed-effects regression                 Number of groups  =        21
                      Group variable (i): id                           F(  2,     9)     =     20.94
                      maximum lag: 2                                   Prob > F          =    0.0004
                                                                       within R-squared  =    0.2224
                      
                      ------------------------------------------------------------------------------
                                   |             Drisc/Kraay
                                y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                            fitted |   .9983357   .2080298     4.80   0.001     .5277396    1.468932
                         fitted_sq |   -.000279   .0037591    -0.07   0.942    -.0087827    .0082246
                             _cons |   .0445582   .7932967     0.06   0.956    -1.750004     1.83912
                      ------------------------------------------------------------------------------
                      
                      
                      . test fitted_sq
                      
                       ( 1)  fitted_sq = 0
                      
                             F(  1,     9) =    0.01
                                  Prob > F =    0.9425
                      
                      .
                      Last edited by hari venkatesh; 05 Feb 2019, 22:59.



                      • #12
                        Carlo Lazzaro, could you please comment on my questions in post #11?




                        • #13
                          Hari:
                          your model seems OK and I would not test for endogeneity anymore.
                          On using lagged predictors as instruments, see: http://personal.rhul.ac.uk/uhte/006/...%2016_2sls.pdf.
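Whatever one concludes about lagged regressors as a remedy for endogeneity (lags used as instruments typically require assumptions such as no serial correlation in the errors), the lags themselves must be built within each panel, so that a panel's first year never borrows the previous panel's last value. In Stata, the L. operator does this after -xtset-. A minimal NumPy sketch of the same idea, assuming the data are sorted by panel and year with no gaps (the function name and data are hypothetical):

```python
import numpy as np


def panel_lag(x, ids, lag=1):
    """Lag x within each panel; the first `lag` rows of a panel become NaN.

    Assumes rows are sorted by panel id and then by year, with no gaps
    (Stata's L. operator after -xtset- also handles gaps; this sketch does not).
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    for g in np.unique(ids):
        idx = np.flatnonzero(ids == g)
        out[idx[lag:]] = x[idx[:-lag]]
    return out


ids = np.array([1, 1, 1, 2, 2, 2])
x1 = np.array([10.0, 11.0, 12.0, 20.0, 21.0, 22.0])
# The first year of each panel has no lag and is set to missing
print(panel_lag(x1, ids))
```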
                          Kind regards,
                          Carlo
                          (Stata 18.0 SE)



                          • #14
                            Carlo:
                            Thank you for the great information and valuable suggestion.
