  • Linearity panel data

    Hi everyone.
    I am working on a regression on the overestimation of companies, using panel data. I want to test for linearity, but none of the scatter plots I have made are of much help. The graph matrix gives me this outcome:
    [Image: graph matrix output (screenshot)]

    And this is the regression I ran:
    [Image: regression output (screenshot)]
    Does anyone know why it doesn't show linearity? What am I doing wrong, or is there another way to test for it?

  • #2
    Femke:
    -graph matrix- is good at showing correlation among variables (among other interesting features).
    That said, I would interpret your aim of checking linearity as a way to test whether non-linear relationships between the regressors and the regressand exist.
    The first thing that springs to my mind is to test whether a squared age term makes sense in your regression model:
    Code:
    xtreg overestimation c.age##c.age entrybarrier index i.region i.cagr
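    To make the squared-age check concrete, here is a minimal sketch on Stata's nlswork example data, since Femke's own variables are not available (assumptions: internet access for -webuse-, and the age grid passed to -margins- is chosen only for illustration). A significant -test- of the squared term suggests that the age effect is curved rather than linear, and -marginsplot- shows the shape of the implied age profile:

```stata
* Minimal sketch on Stata's nlswork example data (assumption: internet access for -webuse-)
webuse nlswork, clear
xtset idcode year

* c.age##c.age enters both age and age squared in the random-effects model
xtreg ln_wage c.age##c.age i.south, re

* Wald test of H0: coefficient on age squared = 0
* (rejecting H0 suggests that the age effect is not linear)
test c.age#c.age

* Linear predictions at selected ages (grid chosen only for illustration)
margins, at(age=(15(5)45))
marginsplot
```

    The same two lines (-test- on the interaction term, then -margins-/-marginsplot-) can be run after the -xtreg- command above on your own data.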
    The second step would be to test whether squared fitted values highlight missing predictors and/or interactions on the right-hand side of your regression equation.
    You can take a look at the following toy example:
    Code:
    . use "http://www.stata-press.com/data/r15/nlswork.dta"
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . xtreg ln_wage age i.year i.south
    
    Random-effects GLS regression                   Number of obs     =     28,502
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1071                                         min =          1
         between = 0.1371                                         avg =        6.1
         overall = 0.1194                                         max =         15
    
                                                    Wald chi2(16)     =    3533.31
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0139268   .0018431     7.56   0.000     .0103144    .0175392
                 |
            year |
             69  |   .0748568   .0124895     5.99   0.000     .0503779    .0993358
             70  |   .0449056   .0120106     3.74   0.000     .0213653    .0684459
             71  |   .0823852   .0124649     6.61   0.000     .0579545     .106816
             72  |   .0836341   .0134961     6.20   0.000     .0571823     .110086
             73  |    .085007   .0141998     5.99   0.000      .057176     .112838
             75  |   .0716501   .0164903     4.34   0.000     .0393297    .1039705
             77  |   .1038729   .0193532     5.37   0.000     .0659413    .1418046
             78  |   .1296735   .0210804     6.15   0.000     .0883567    .1709904
             80  |   .1093764   .0242817     4.50   0.000     .0617851    .1569676
             82  |   .0993104   .0274684     3.62   0.000     .0454734    .1531475
             83  |   .1117414   .0291919     3.83   0.000     .0545264    .1689563
             85  |   .1368575   .0325795     4.20   0.000     .0730028    .2007121
             87  |    .126616   .0360617     3.51   0.000     .0559364    .1972956
             88  |   .1638151   .0384291     4.26   0.000     .0884955    .2391347
                 |
         1.south |   -.132044   .0082467   -16.01   0.000    -.1482073   -.1158808
           _cons |   1.209158   .0370906    32.60   0.000     1.136462    1.281854
    -------------+----------------------------------------------------------------
         sigma_u |  .35537854
         sigma_e |  .30272588
             rho |  .57949767   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . predict fitted, xb
    (32 missing values generated)
    
    . g sq_fitted=fitted^2
    (32 missing values generated)
    
    . xtreg ln_wage fitted sq_fitted
    
    Random-effects GLS regression                   Number of obs     =     28,502
    Group variable: idcode                          Number of groups  =      4,710
    
    R-sq:                                           Obs per group:
         within  = 0.1085                                         min =          1
         between = 0.1413                                         avg =        6.1
         overall = 0.1222                                         max =         15
    
                                                    Wald chi2(2)      =    3599.09
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
         ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          fitted |    3.33779   .3087343    10.81   0.000     2.732682    3.942898
       sq_fitted |  -.7015439   .0925002    -7.58   0.000    -.8828409   -.5202469
           _cons |  -1.933331   .2565439    -7.54   0.000    -2.436148   -1.430514
    -------------+----------------------------------------------------------------
         sigma_u |  .35642505
         sigma_e |  .30252041
             rho |  .58126061   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    . test sq_fitted
    
     ( 1)  sq_fitted = 0
    
               chi2(  1) =   57.52
             Prob > chi2 =    0.0000
    
    .
    *As the -test- outcome reaches statistical significance, the regression model is ill-specified.
    Maybe non-linearity is an issue deserving further investigation.*
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo, thank you very much for your explanation. The only part I am unsure about is the second-to-last sentence. Does Prob > chi2 = 0.0000 mean that the result is insignificant, so the model is not ill-specified?



      • #4
        Femke:
        not quite.
        Prob > chi2 = 0.0000 means that the H0 (sq_fitted = 0) is rejected at any conventional significance level. As the H0 is rejected, the -test- outcome warns us about possible non-linearity and/or omitted predictors in the original regression model. Put differently, the -test- outcome supports the evidence that the original regression model is ill-specified.
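        A minimal sketch of how to read this decision directly from the results stored by -test-, re-running the toy example on nlswork (assumptions: internet access for -webuse-, and the 5% level used as the cut-off is a conventional choice, not something the test imposes):

```stata
* Minimal sketch re-running the fitted-values check on nlswork
webuse nlswork, clear
quietly xtreg ln_wage age i.year i.south
predict fitted, xb
generate sq_fitted = fitted^2
quietly xtreg ln_wage fitted sq_fitted

* H0: coefficient on sq_fitted = 0 (no sign of misspecification)
test sq_fitted

* -test- stores the p-value in r(p); a small p-value means H0 is rejected
display "p-value = " %6.4f r(p)
if r(p) < 0.05 {
    display "H0 rejected: possible non-linearity and/or omitted predictors"
}
else {
    display "H0 not rejected: this check finds no evidence of misspecification"
}
```

        In other words, a small p-value is a significant result here, and a significant result is what points to misspecification.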
        Kind regards,
        Carlo
        (Stata 19.0)
