  • xtreg: r-squared between & r-squared overall decline when quadratic terms are added to model

    Stata users:

    I am using the "xtreg" command to run random-effects models on panel data (unbalanced panel).
    My unit of analysis is Korean government agencies.

    My question concerns the "r-squared between" and "r-squared overall".
    In model 1, there are no quadratic terms. In model 2, I include 4 quadratic terms.
    Though one of these quadratic terms is statistically significant, "r-squared between" and "r-squared overall" decline.

    As far as I know, r-squared (not adjusted r-squared) is supposed to never decrease when variables are added to a model.
    Does this apply differently to panel r-squared?
    I know that xtreg, fe calculates r-squared differently from areg. But my model is RE.

    I would appreciate any help regarding this issue.


    #delimit ;
    xtreg reput c.c_z_coder12_r c.c_z_sv_amb_di_r c.c_z_amb_ev c.c_z_amb_pri
    c_sq_age2 c_ln_fi_total c_ln_size_full c_sq_up5_r c_fi_ex_ratio c_factor_cen c_factor_pbase
    i.org_type i.year
    , re vce(cl org) theta;
    # delimit cr

    #delimit ;
    xtreg reput c.c_z_coder12_r##c.c_z_coder12_r c.c_z_sv_amb_di_r##c.c_z_sv_amb_di_r c.c_z_amb_ev##c.c_z_amb_ev c.c_z_amb_pri##c.c_z_amb_pri
    c_sq_age2 c_ln_fi_total c_ln_size_full c_sq_up5_r c_fi_ex_ratio c_factor_cen c_factor_pbase
    i.org_type i.year
    , re vce(cl org) theta;
    # delimit cr




    Random-effects GLS regression                   Number of obs     =        228
    Group variable: org                             Number of groups  =         44

    R-sq:                                           Obs per group:
         within  = 0.0994                                         min =          1
         between = 0.3990                                         avg =        5.2
         overall = 0.3377                                         max =          7

                                                    Wald chi2(19)     =      70.29
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

    ------------------- theta --------------------
      min      5%       median        95%      max
    0.3668   0.4993     0.7046     0.7046   0.7046

    (Std. Err. adjusted for 44 clusters in org)
    ---------------------------------------------------------------------------------
    | Robust
    reput | Coef. Std. Err. z P>|z| [95% Conf. Interval]
    ----------------+----------------------------------------------------------------
    c_z_coder12_r | -.1582868 .121043 -1.31 0.191 -.3955268 .0789532
    c_z_sv_amb_di_r | .1300482 .0996248 1.31 0.192 -.0652129 .3253093
    c_z_amb_ev | .2304708 .0934306 2.47 0.014 .0473501 .4135914
    c_z_amb_pri | .1374347 .1141718 1.20 0.229 -.0863379 .3612073
    c_sq_age2 | .0527441 .0909437 0.58 0.562 -.1255023 .2309906
    c_ln_fi_total | -.1303679 .0615055 -2.12 0.034 -.2509165 -.0098193
    c_ln_size_full | -.5278446 .2200759 -2.40 0.016 -.9591855 -.0965037
    c_sq_up5_r | -.7677916 1.11694 -0.69 0.492 -2.956955 1.421371
    c_fi_ex_ratio | 3.959733 5.413107 0.73 0.464 -6.649762 14.56923
    c_factor_pbase | .2466075 .1336372 1.85 0.065 -.0153165 .5085316
    c_factor_cen | -.1580916 .1283419 -1.23 0.218 -.4096371 .0934538
    |
    org_type |
    2 | .3448319 .5811616 0.59 0.553 -.794224 1.483888
    3 | -.0150424 .4863613 -0.03 0.975 -.968293 .9382083
    |
    year |
    2012 | .5635269 .2108262 2.67 0.008 .1503152 .9767386
    2013 | .5536817 .260991 2.12 0.034 .0421486 1.065215
    2014 | .6628218 .3493855 1.90 0.058 -.0219612 1.347605
    2015 | .7331962 .3281427 2.23 0.025 .0900483 1.376344
    2016 | .4634112 .3310233 1.40 0.162 -.1853825 1.112205
    2017 | .753503 .2953417 2.55 0.011 .1746438 1.332362
    |
    _cons | -.4609324 .4146254 -1.11 0.266 -1.273583 .3517184
    ----------------+----------------------------------------------------------------
    sigma_u | 1.0696963
    sigma_e | .8750723
    rho | .59908334 (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------



    Random-effects GLS regression                   Number of obs     =        228
    Group variable: org                             Number of groups  =         44

    R-sq:                                           Obs per group:
         within  = 0.1197                                         min =          1
         between = 0.3884                                         avg =        5.2
         overall = 0.3240                                         max =          7

                                                    Wald chi2(23)     =      78.44
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

    ------------------- theta --------------------
      min      5%       median        95%      max
    0.3962   0.5278     0.7247     0.7247   0.7247

    (Std. Err. adjusted for 44 clusters in org)
    ---------------------------------------------------------------------------------------------
    | Robust
    reput | Coef. Std. Err. z P>|z| [95% Conf. Interval]
    ----------------------------+----------------------------------------------------------------
    c_z_coder12_r | -.3011932 .1001225 -3.01 0.003 -.4974297 -.1049568
    |
    c.c_z_coder12_r#|
    c.c_z_coder12_r | .1389741 .0609769 2.28 0.023 .0194616 .2584866
    |
    c_z_sv_amb_di_r | .1418791 .1084855 1.31 0.191 -.0707486 .3545068
    |
    c.c_z_sv_amb_di_r#|
    c.c_z_sv_amb_di_r | -.0106288 .0501871 -0.21 0.832 -.1089937 .0877362
    |
    c_z_amb_ev | .2514924 .1086162 2.32 0.021 .0386086 .4643762
    |
    c.c_z_amb_ev#c.c_z_amb_ev | -.0153864 .0526041 -0.29 0.770 -.1184885 .0877157
    |
    c_z_amb_pri | .1215835 .1303769 0.93 0.351 -.1339506 .3771175
    |
    c.c_z_amb_pri#c.c_z_amb_pri | .0300072 .0432316 0.69 0.488 -.0547252 .1147395
    |
    c_sq_age2 | .0359767 .0931003 0.39 0.699 -.1464964 .2184499
    c_ln_fi_total | -.1343288 .0637824 -2.11 0.035 -.2593401 -.0093176
    c_ln_size_full | -.5118246 .2192068 -2.33 0.020 -.9414621 -.0821871
    c_sq_up5_r | -.4459308 1.165331 -0.38 0.702 -2.729937 1.838075
    c_fi_ex_ratio | 4.136125 5.672302 0.73 0.466 -6.981382 15.25363
    c_factor_cen | -.1714061 .1405665 -1.22 0.223 -.4469115 .1040992
    c_factor_pbase | .2743009 .1450345 1.89 0.059 -.0099614 .5585632
    |
    org_type |
    2 | .376366 .5685421 0.66 0.508 -.7379561 1.490688
    3 | .1222509 .4832368 0.25 0.800 -.8248757 1.069378
    |
    year |
    2012 | .5509363 .2144268 2.57 0.010 .1306675 .9712051
    2013 | .6018072 .2458745 2.45 0.014 .119902 1.083712
    2014 | .7005422 .3292841 2.13 0.033 .0551572 1.345927
    2015 | .7299815 .2984586 2.45 0.014 .1450134 1.31495
    2016 | .4727062 .3103182 1.52 0.128 -.1355062 1.080919
    2017 | .8041507 .2874125 2.80 0.005 .2408324 1.367469
    |
    _cons | -.6848513 .4418399 -1.55 0.121 -1.550842 .181139
    ----------------------------+----------------------------------------------------------------
    sigma_u | 1.1598991
    sigma_e | .87865805
    rho | .63538406 (fraction of variance due to u_i)
    ---------------------------------------------------------------------------------------------
    Last edited by Hayoung Lee; 09 Nov 2018, 00:57.

  • #2
    Assuming you are using up-to-date Stata version 15.1, on page 443 of the [XT] volume of the PDF documentation you will find the formulas for the three R² values (within, between, and overall) reported by -xtreg, re-.* None of them is really quite analogous to R² from a single-level regression. In the context of your question, as you add terms to the model, the amount of variance explained by the random intercepts may increase (which actually happened here: look at sigma_u in both models), but this part of variation is not included in any of the R² formulas.

    R² in ordinary linear regression is a really nice statistic: it's easy to calculate and it can be interpreted in many useful ways. Unfortunately, other linear models, including panel regression, don't have fully analogous statistics. The various variations on "R²" in other models often share some of the nice properties of OLS R², but they never share all of them. And the intuitions that we develop and become comfortable with about OLS R² often fail us with other models.

    *I didn't copy/paste them here because the symbols involved just don't display properly in the Forum editor.
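
    One way to see what these three statistics measure is to rebuild them by hand. The sketch below assumes it is run directly after either of the -xtreg, re- commands in #1, and that each R-squared is the squared correlation between the outcome and the fixed-part linear prediction (raw values for overall, agency means for between, deviations from agency means for within). The variable names in_est, xb_hat, y_bar, xb_bar, y_dev, xb_dev, and one_per_org are hypothetical, and taking the between correlation over one mean per agency is an assumption; the side-by-side display with e(r2_o), e(r2_b), and e(r2_w) shows whether the reconstruction matches.

    ```stata
    * Sketch only: run right after -xtreg reput ..., re-.
    * All variables created here have hypothetical names.
    generate byte in_est = e(sample)

    * Fixed-part linear prediction (the random intercept u_i is NOT included)
    predict double xb_hat if in_est, xb

    * Agency means and deviations from agency means
    bysort org: egen double y_bar  = mean(reput)  if in_est
    bysort org: egen double xb_bar = mean(xb_hat) if in_est
    generate double y_dev  = reput  - y_bar
    generate double xb_dev = xb_hat - xb_bar

    * Flag one observation per agency (assumption: between R-sq uses one mean per group)
    egen byte one_per_org = tag(org) if in_est

    * Overall: outcome versus prediction, all observations
    quietly correlate reput xb_hat if in_est
    display "overall: rebuilt = " %6.4f r(rho)^2 "   reported = " %6.4f e(r2_o)

    * Between: agency mean of outcome versus agency mean of prediction
    quietly correlate y_bar xb_bar if one_per_org == 1
    display "between: rebuilt = " %6.4f r(rho)^2 "   reported = " %6.4f e(r2_b)

    * Within: deviations from agency means
    quietly correlate y_dev xb_dev if in_est
    display "within:  rebuilt = " %6.4f r(rho)^2 "   reported = " %6.4f e(r2_w)
    ```

    Because the random-effects coefficients are GLS estimates rather than values chosen to maximize any of these three correlations, nothing forces any of them to rise when terms are added to the model.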


    • #3
      Thanks Clyde, this is very helpful. But I am still left confused.

      If the R-sq values reported by -xtreg, re- are not analogous to OLS R-sq, does that mean I cannot compare the two models (model 1 with only the linear terms, model 2 with the quadratic terms added) based on R-sq?

      If I want to argue that model 2 is better than model 1, how can I compare the two models?
      Could the significance of the quadratic terms (without mentioning R-sq) be a reason for choosing model 2?

      I'm so sorry for these basic questions.


      • #4
        Hayoung:
        you can consider the between R-sq that -xtreg, re- reports (the higher, the better).
        If the square term is part of the data generating process and reaches statistical significance (that is, there's evidence of a quadratic relationship between the predictor and the regressand) and the turning point falls within the range of the values collected for that predictor, other things being equal I would go for that model.
        Kind regards,
        Carlo
        (Stata 15.1 SE)
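
        Both checks can be sketched in a few lines, assuming they are run directly after the model 2 command in #1. The joint Wald test uses -testparm- on the four squared terms, and the turning point of the quadratic in c_z_coder12_r is -b1/(2*b2), estimated with -nlcom-; the label turning_point is just a name chosen here. With the coefficients posted above (-.3011932 and .1389741), the point estimate works out to about 1.08 on the scale of that predictor.

        ```stata
        * Sketch only: run right after the model 2 -xtreg ..., re vce(cl org)- command.

        * Joint Wald test that all four squared terms are zero
        testparm c.c_z_coder12_r#c.c_z_coder12_r     ///
                 c.c_z_sv_amb_di_r#c.c_z_sv_amb_di_r ///
                 c.c_z_amb_ev#c.c_z_amb_ev           ///
                 c.c_z_amb_pri#c.c_z_amb_pri

        * Turning point of the quadratic in c_z_coder12_r: -b1 / (2*b2),
        * with a delta-method standard error and confidence interval
        nlcom (turning_point: -_b[c_z_coder12_r] / (2 * _b[c.c_z_coder12_r#c.c_z_coder12_r]))

        * Compare the turning point with the observed range in the estimation sample
        summarize c_z_coder12_r if e(sample)
        ```

        If the estimated turning point lies between the minimum and maximum shown by -summarize-, the curvature is supported by the data rather than being an extrapolation.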


        • #5
          Dear Carlo, thank you for your help.

          In my case, there is evidence of a quadratic relationship between the predictor and the regressand, and the turning point falls within the range of the values collected for that predictor.
          So, following your recommendation, I would go for model 2 (with quadratic terms).
          However, in model 2, "r-squared between" declines (though quadratic terms are statistically significant).
          Isn't that a problem?


          • #6
            Hayoung:
            no, it is not.
            Go with the model that includes the quadratic term.
            Kind regards,
            Carlo
            (Stata 15.1 SE)
