
  • XSMLE: "r-squared within" declines when variables are added to model

    Stata users:

    I am using the "xsmle" command to run fixed effects models on panel data, taking into account spatial autocorrelation ("sac" option); my unit of analysis is U.S. counties. My dataset has missing data, so I am using 25 imputed datasets to estimate the model using "mi estimate". For these models, I calculate r-squared by applying Rubin’s rules to imputed datasets as described in:

    https://www.stata.com/support/faqs/s...-imputed-data/
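
    As I read that FAQ, the pooling recipe amounts to treating R = sqrt(R2) as a correlation, Fisher-z-transforming it, averaging the z's across imputations (the Rubin's rules point estimate), and back-transforming. A rough sketch of that recipe in Python/numpy (the helper name is mine, and this is my reading of the FAQ, not Stata's code):

```python
import numpy as np

def pool_r2(r2_list):
    # Hypothetical helper (my naming). Treat R = sqrt(R2) as a correlation,
    # Fisher-z transform it, average the z's across imputations (the Rubin's
    # rules point estimate), then back-transform and square.
    z = np.arctanh(np.sqrt(np.asarray(r2_list, dtype=float)))
    return float(np.tanh(z.mean()) ** 2)

# e.g. within R2 values from three imputed datasets (made-up numbers)
pooled = pool_r2([0.478, 0.481, 0.483])
```

    Since atanh and tanh are monotone, the pooled value always lands between the smallest and largest per-imputation R2.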

    My question concerns the "r-squared" measures of model fit, and especially “r-squared within”. In Model 1, there are no interaction terms. In Model 2, I interact an important explanatory variable with the period dummies (to see if the relationship between this variable and the dependent variable changes over time); these period interaction terms are all statistically significant. In Model 3, I also include interaction terms between this main explanatory variable and dummy variables for different regions of the country; none of these regional interaction terms are statistically significant.

    Here is a table that describes the r-squared values for each model:

    r-squared within: .481 (Model 1); .480 (Model 2); .480 (Model 3)
    r-squared between: .776 (Model 1); .804 (Model 2); .791 (Model 3)
    r-squared overall: .755 (Model 1); .777 (Model 2); .765 (Model 3)

    My understanding has always been that “r-squared” measures (as opposed to adjusted r-squared) are supposed to never decrease when variables are added to a model. Does this rule apply differently to within, between, and overall r-squared? Even so, this wouldn’t necessarily explain Model 3, where all of the r-squared measures are lower than in Model 2.

    Other possibilities that seem plausible… I am aware that xtreg, fe calculates r-squared differently from areg:

    https://www.stata.com/support/faqs/s...rsus-xtreg-fe/

    And I wonder if this is part of the issue with xsmle, fe. Another possibility is that the r-squared measures for xsmle are calculated such that they behave more like adjusted r-squared measures, which can decrease when new variables are added if the new variables do not add significant explanatory power to the model. This would make sense given, for instance, that none of the interaction terms added to Model 3 are statistically significant.

    I would appreciate any help, methodological or theoretical, that could be provided regarding this issue.

  • #2
    Well, I don't know anything about -xsmle- and its way of calculating R2.

    Speaking in more general terms, your understanding is correct: when you add new variables to a regression (provided the additional variables don't shrink the estimation sample through new missing values), the R2 cannot decrease. The same applies to the within and between R2s, as these are themselves the R2s of ordinary regressions carried out on transformations of the original data. The transformations do not alter the estimation sample, and adding new variables necessarily leads to an increase or no change in R2, never a decrease.
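
    To see that general principle outside Stata, here is a quick numerical sketch in Python/numpy (simulated data, names are mine, purely for illustration): adding a regressor, even pure noise, cannot lower the R2 of a least-squares fit on the same sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                  # extra regressor, pure noise here
y = 1.0 + 2.0 * x1 + rng.normal(size=n)  # same estimation sample throughout

def ols_r2(y, cols):
    # R2 from an OLS fit with an intercept
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (e @ e) / tss

r2_small = ols_r2(y, [x1])
r2_big = ols_r2(y, [x1, x2])  # adding x2 cannot lower R2
```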

    I am going to speculate a little bit now, as I'm not really sure of this. But I think the problem is with Rubin's rules. When you use Rubin's rules you are, in effect, taking a weighted average of the R2s derived from the regressions in the different data sets. It is true that when you modify the analysis by adding variables, the R2 from each data set's regression will either increase or stay the same, so you might think the resulting MI R2 should do likewise. But it's not that simple: when you add the additional variables, the weighting that each imputed sample gets when applying Rubin's rules may also change, so the direction of change of the resulting weighted average cannot be predicted in general. It might be that the imputations that produce lower R2 values when the interaction terms are included get more weight than those that produce higher R2 values. As I say, I'm not really sure about this, but it's my best guess as to what's going on here.

    Added: If my reasoning here is incorrect, I would appreciate a correction from someone who knows MI better than I do.



    • #3
      Hi Clyde -- That's helpful, but here's another wrinkle... even if you look at the various r-squared measures for individual imputed datasets, the same thing sometimes appears.

      Here are the r-squared values for each model, but this time just for the 3rd imputed dataset (so Rubin's rules not applied yet):

      r-squared within: .478 (Model 1); .477 (Model 2); .477 (Model 3)
      r-squared between: .774 (Model 1); .807 (Model 2); .796 (Model 3)
      r-squared overall: .752 (Model 1); .778 (Model 2); .769 (Model 3)



      • #4
        I am at a loss to explain that. Based on my understanding, that shouldn't happen. Maybe you can show the code and complete output for the third imputed data set for all three models?



        • #5
          Happy to post the complete code, but I will wait and see if requests for anything else come in first (so I can do it all at once). In the meantime, though, I thought I'd share two of the articles I've come across which also use fixed effects models, and whose regression outputs record something similar. For Griffin et al. 2015 (attached here): on p. 133 you can see that the r-squareds go up and down, and quite a bit more than they do in my case. For Jorgenson & Clark 2012: on p. 23, r-squared overall is lower in the final, saturated model, compared to the prior models where interaction effects are not included.
          Attached Files
          Last edited by Ethan Schoolman; 19 Oct 2017, 17:41. Reason: Edited to add another article.



          • #6
            I should point out that the Jorgenson and Clark article does not illustrate this phenomenon. Where you highlight the overall R2 decreasing between the second and third models, we do not simply have the addition of interaction terms: we have that but also the deletion of a different interaction term. The results shown in Griffin's Table 3, however, do seem to clearly exhibit the phenomenon you are concerned about.

            On further exploration, here is what [XT] says about the calculation of the three R2s reported by -xtreg, fe-:

            [Screenshot: Capture.PNG, an excerpt from the [XT] manual giving the formulas for the within, between, and overall R2 reported by -xtreg, fe-]
            I apologize for doing a screenshot, but in this case the typography is important and I cannot get the Forum editor to replicate it. I do hope it's readable to everybody. (It's a .png file.)

            So what I said in #2 about the between and overall R2 values being R2s from actual regressions done on transformations of the original data is incorrect. They look like that superficially, but the beta-hat coefficients are actually the beta-hat coefficients from the mean-deviated regression; they are different from the beta-hats you would get from regressing the between-transformed data or the overall data, respectively. They are squares of correlations, but they are not bona fide R2s because the first term in each of those correlations is not in fact the predicted values of any actual regression on the averaged data or the overall data, respectively.

            Nevertheless, for the within R2, we are talking about an actual R2 from an actual regression on de-meaned data, so I am still left with no explanation for the decrease you describe in #1. It really seems it should not happen; in your results it is a very small decrease, and I wonder if it represents some kind of rounding error. FWIW, I note that in Griffin's Table 3, while the overall and between R2s decrease, the within R2, as I expect, does not.
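
            To make those definitions concrete, here is how I read the [XT] formulas, sketched in Python/numpy on simulated data (this is my reading of the manual, not Stata's code): all three R2s are squared correlations, each computed with the same within-estimator beta.

```python
import numpy as np

rng = np.random.default_rng(1)
G, T = 10, 20                              # balanced panel: 10 groups, 20 periods
g = np.repeat(np.arange(G), T)             # group index per observation
x = rng.normal(size=G * T)
y = 0.5 * x + np.repeat(rng.normal(size=G), T) + rng.normal(size=G * T)

def group_means(v):
    return np.array([v[g == i].mean() for i in range(G)])

# within (fixed-effects) slope from OLS on group-demeaned data
xd = x - group_means(x)[g]
yd = y - group_means(y)[g]
beta = (xd @ yd) / (xd @ xd)

corr2 = lambda a, b: np.corrcoef(a, b)[0, 1] ** 2
r2_within  = corr2(xd * beta, yd)                          # demeaned data, within beta
r2_between = corr2(group_means(x) * beta, group_means(y))  # group means, but still the within beta
r2_overall = corr2(x * beta, y)                            # raw data, within beta
```

            The between and overall numbers use the within beta rather than the beta you would get from regressing the group means or the raw data, which is the point about them not being bona fide R2s.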



            • #7
              Really helpful, Clyde. Given the very small decrease we're talking about, rounding error seems as good an explanation as any, though I wish I could nail it down for certain. It has also been suggested to me that I upgrade to Stata 15 and try the new -sp- suite of commands.

              p.s. re: the Jorgenson & Clark article, if I'm reading the table correctly, Model C retains all the terms from Model A and then adds interaction terms (Model B includes a different interaction term, which is not in Model A or C). So the decrease in R2 in Model C, compared to Model A, does seem like the same phenomenon as in Griffin 2015. But I would be happy to stand corrected.
              Last edited by Ethan Schoolman; 19 Oct 2017, 20:11.



              • #8
                the Jorgenson & Clark article, if I'm reading the table correctly, Model C retains all the terms from Model A and then adds interaction terms (Model B includes a different interaction term, which is not in Model A or C). So the decrease in R2 in Model C, compared to Model A, does seem like the same phenomenon as in Griffin 2015. But I would be happy to stand corrected.
                No, I am the one who stands corrected. I was looking at the dropoff between Model B and Model C, but didn't pay attention to the comparison of Models A and C, which does indeed exhibit the phenomenon.



                • #9
                  Before I provide my explanation of what is going on here, regarding #5: I will remind you to be careful about uploading copyrighted work in a public forum without permission from the publisher. It is sufficient to provide a link to the paper.

                  The discussion relating to multiple imputation above is a red herring. The overall R-squared statistic in OLS or fixed effects is the squared correlation between the outcome and the predicted outcome. Once you see this, the next question to ask is how FE computes fitted values and how this differs from OLS. The general point still holds that including extra variables cannot decrease R2, so why is a decreasing R2 observed when using the FE estimator?

                  There are two main differences. First, with fixed effects you do not have coefficients on the firm dummies, whereas these are explicit with least squares dummy variables (LSDV). Second, the constant in fixed effects is the average of the fixed effects, whereas in OLS the constant is estimated directly. This implies that fixed effects will under- or overestimate the fitted values to the extent that the excluded firm dummies and the difference between the constant terms matter. Therefore, including the coefficients on the interaction terms while excluding the coefficients on the firm dummies (plus the effect of the difference in the constant terms) can decrease the correlation between the predicted outcome and the outcome itself, leading to a lower R2 statistic. In other words, you will never have a lower overall R2 in LSDV, but this can happen with the fixed-effects estimator in Stata. The following example illustrates:

                  Code:
                  . webuse grunfeld
                  
                  . reg invest mvalue kstock i.company c.kstock#i.year
                  
                        Source |       SS           df       MS      Number of obs   =       200
                  -------------+----------------------------------   F(30, 169)      =    117.64
                         Model |  8932200.08        30  297740.003   Prob > F        =    0.0000
                      Residual |  427743.834       169   2531.0286   R-squared       =    0.9543
                  -------------+----------------------------------   Adj R-squared   =    0.9462
                         Total |  9359943.92       199  47034.8941   Root MSE        =    50.309
                  
                  -------------------------------------------------------------------------------
                         invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  --------------+----------------------------------------------------------------
                         mvalue |   .0877503   .0145886     6.01   0.000      .058951    .1165496
                         kstock |   .8425061   .2094117     4.02   0.000     .4291064    1.255906
                                |
                        company |
                             2  |   128.5063   36.70679     3.50   0.001     56.04342    200.9692
                             3  |  -214.6438   37.10318    -5.79   0.000    -287.8892   -141.3984
                             4  |  -19.32583   54.27329    -0.36   0.722    -126.4668    87.81509
                             5  |  -142.7647   61.08671    -2.34   0.021     -263.356   -22.17336
                             6  |  -20.03022   57.79344    -0.35   0.729    -134.1203    94.05984
                             7  |  -84.88736   61.35487    -1.38   0.168     -206.008    36.23333
                             8  |   -47.4304   54.47182    -0.87   0.385    -154.9632    60.10244
                             9  |   -103.314   58.77129    -1.76   0.081    -219.3344    12.70642
                            10  |  -6.735831   62.80364    -0.11   0.915    -130.7165    117.2449
                                |
                  year#c.kstock |
                          1936  |  -.1099621   .2359412    -0.47   0.642    -.5757338    .3558096
                          1937  |  -.3024478   .2228971    -1.36   0.177    -.7424689    .1375734
                          1938  |  -.4969772   .2040648    -2.44   0.016    -.8998217   -.0941327
                          1939  |     -.6371   .2025571    -3.15   0.002    -1.036968    -.237232
                          1940  |   -.440159   .2024035    -2.17   0.031    -.8397238   -.0405942
                          1941  |  -.3275316   .2005217    -1.63   0.104    -.7233815    .0683183
                          1942  |  -.3831942   .1984406    -1.93   0.055    -.7749359    .0085475
                          1943  |  -.4536472   .1985109    -2.29   0.024    -.8455275   -.0617668
                          1944  |  -.4641242    .198669    -2.34   0.021    -.8563167   -.0719317
                          1945  |  -.4899179   .1987759    -2.46   0.015    -.8823214   -.0975143
                          1946  |  -.4001617   .1990092    -2.01   0.046    -.7930257   -.0072976
                          1947  |  -.4790339   .1989003    -2.41   0.017    -.8716831   -.0863846
                          1948  |  -.5142672   .1996569    -2.58   0.011    -.9084099   -.1201245
                          1949  |  -.5740237   .2002121    -2.87   0.005    -.9692624    -.178785
                          1950  |  -.5605841   .2005968    -2.79   0.006    -.9565823   -.1645859
                          1951  |  -.5388423   .2014615    -2.67   0.008    -.9365476    -.141137
                          1952  |  -.5185865   .2023363    -2.56   0.011    -.9180187   -.1191544
                          1953  |  -.4669786   .2039159    -2.29   0.023    -.8695291   -.0644281
                          1954  |  -.4724842   .2045572    -2.31   0.022    -.8763006   -.0686678
                                |
                          _cons |   1.280555   62.78707     0.02   0.984    -122.6674    125.2285
                  -------------------------------------------------------------------------------
                  
                  . predict ylsdv, xb
                  
                  . corr invest ylsdv
                  (obs=200)
                  
                               |   invest    ylsdv
                  -------------+------------------
                        invest |   1.0000
                         ylsdv |   0.9769   1.0000
                  
                  
                  . di 0.9769^2
                  .95433361
                  
                  
                  . xtreg invest mvalue kstock c.kstock#i.year , fe
                  
                  Fixed-effects (within) regression               Number of obs     =        200
                  Group variable: company                         Number of groups  =         10
                  
                  R-sq:                                           Obs per group:
                       within  = 0.8094                                         min =         20
                       between = 0.7815                                         avg =       20.0
                       overall = 0.7881                                         max =         20
                  
                                                                  F(21,169)         =      34.18
                  corr(u_i, Xb)  = 0.0327                         Prob > F          =     0.0000
                  
                  -------------------------------------------------------------------------------
                         invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  --------------+----------------------------------------------------------------
                         mvalue |   .0877503   .0145886     6.01   0.000      .058951    .1165496
                         kstock |   .8425061   .2094117     4.02   0.000     .4291064    1.255906
                                |
                  year#c.kstock |
                          1936  |  -.1099621   .2359412    -0.47   0.642    -.5757338    .3558096
                          1937  |  -.3024478   .2228971    -1.36   0.177    -.7424689    .1375734
                          1938  |  -.4969772   .2040648    -2.44   0.016    -.8998217   -.0941327
                          1939  |     -.6371   .2025571    -3.15   0.002    -1.036968    -.237232
                          1940  |   -.440159   .2024035    -2.17   0.031    -.8397238   -.0405942
                          1941  |  -.3275316   .2005217    -1.63   0.104    -.7233815    .0683183
                          1942  |  -.3831942   .1984406    -1.93   0.055    -.7749359    .0085475
                          1943  |  -.4536472   .1985109    -2.29   0.024    -.8455275   -.0617668
                          1944  |  -.4641242    .198669    -2.34   0.021    -.8563167   -.0719317
                          1945  |  -.4899179   .1987759    -2.46   0.015    -.8823214   -.0975143
                          1946  |  -.4001617   .1990092    -2.01   0.046    -.7930257   -.0072976
                          1947  |  -.4790339   .1989003    -2.41   0.017    -.8716831   -.0863846
                          1948  |  -.5142672   .1996569    -2.58   0.011    -.9084099   -.1201245
                          1949  |  -.5740237   .2002121    -2.87   0.005    -.9692624    -.178785
                          1950  |  -.5605841   .2005968    -2.79   0.006    -.9565823   -.1645859
                          1951  |  -.5388423   .2014615    -2.67   0.008    -.9365476    -.141137
                          1952  |  -.5185865   .2023363    -2.56   0.011    -.9180187   -.1191544
                          1953  |  -.4669786   .2039159    -2.29   0.023    -.8695291   -.0644281
                          1954  |  -.4724842   .2045572    -2.31   0.022    -.8763006   -.0686678
                                |
                          _cons |  -49.78202    17.3924    -2.86   0.005    -84.11637   -15.44768
                  --------------+----------------------------------------------------------------
                        sigma_u |  93.001228
                        sigma_e |  50.309329
                            rho |  .77361624   (fraction of variance due to u_i)
                  -------------------------------------------------------------------------------
                  F test that all u_i=0: F(9, 169) = 52.02                     Prob > F = 0.0000
                  
                  . predict yfe, xb
                  
                  . corr invest yfe
                  (obs=200)
                  
                               |   invest      yfe
                  -------------+------------------
                        invest |   1.0000
                           yfe |   0.8878   1.0000
                  
                  
                  . di 0.8878^2
                  .78818884
                  Here, we get the same coefficients for mvalue and kstock using FE and LSDV. We are mostly not interested in the dummy variable coefficients; these are usually referred to as "nuisance parameters" in the model. However, when it comes to determining what fitted values we end up with, they play a very big role. In the FE model you have 22 coefficients (mvalue, kstock, 19 interactions, and the constant), whereas you have 31 coefficients in LSDV (mvalue, kstock, 9 firm dummies, 19 interactions, and the constant). You should notice that the coefficients on mvalue, kstock, and the interactions are exactly the same, so the difference is the extra 9 firm dummies in LSDV and the different intercepts. Let us look at the effect of this on the fitted values for Firm 1 in the dataset.


                  Code:
                  . list company year ylsdv yfe in 1/10
                  
                       +--------------------------------------+
                       | company   year      ylsdv        yfe |
                       |--------------------------------------|
                    1. |       1   1935   273.7788   222.7162 |
                    2. |       1   1936   448.8779   397.8153 |
                    3. |       1   1937   558.7352   507.6727 |
                    4. |       1   1938   318.5815    267.519 |
                    5. |       1   1939   421.5447   370.4821 |
                       |--------------------------------------|
                    6. |       1   1940   492.1504   441.0878 |
                    7. |       1   1941   532.0712   481.0086 |
                    8. |       1   1942   425.4443   374.3817 |
                    9. |       1   1943   459.6915   408.6289 |
                   10. |       1   1944   461.8471   410.7846 |
                       +--------------------------------------+
                  Here, you see that the fixed effects coefficients underestimate the fitted values (or the LSDV coefficients overestimate!), and this affects the correlation because the outcome is the same in both cases. Because Firm 1 is the base (omitted) firm, its dummy coefficient in LSDV is 0, so the difference in fitted values (FE minus LSDV) is simply the difference in the intercepts, i.e., (-49.78202 - 1.280555) = -51.062575. Comparing with the regressions without the interactions, the LSDV R2 is higher with interactions than without (0.9543 vs. 0.9441), whereas the FE overall R2 is lower with interactions than without (0.7881 vs. 0.8060), which is exactly the phenomenon you report. So if you think that this will confuse the reader, you may want to report the LSDV overall R2.
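
                  The mechanism is easy to see in a toy example (Python/numpy with made-up numbers, not the Grunfeld data): correlation is unchanged by one common shift of the fitted values, but dropping firm-specific constants, which is in effect what the FE xb prediction does, changes it.

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms, T = 5, 20
firm = np.repeat(np.arange(n_firms), T)
alpha = np.array([0.0, 120.0, -200.0, 50.0, -90.0])  # made-up firm intercepts

base = 40.0 * rng.normal(size=n_firms * T)
yhat_lsdv = base + alpha[firm]                 # fitted values including firm dummies
y = yhat_lsdv + 10.0 * rng.normal(size=n_firms * T)

# FE-style xb: drop the firm-specific part, keep one common constant
yhat_fe = base + alpha.mean()

corr = lambda a, b: np.corrcoef(a, b)[0, 1]
# a single uniform shift leaves the correlation untouched ...
same = np.isclose(corr(y, yhat_lsdv), corr(y, yhat_lsdv + 100.0))
# ... but dropping *different* constants per firm changes it
r_lsdv, r_fe = corr(y, yhat_lsdv), corr(y, yhat_fe)
```

                  In this toy setup the firm effects carry much of y's variation, so the FE-style prediction correlates less with y than the LSDV prediction does.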


                  Code:
                  . reg invest mvalue kstock i.company
                  
                        Source |       SS           df       MS      Number of obs   =       200
                  -------------+----------------------------------   F(11, 188)      =    288.50
                         Model |   8836465.8        11  803315.073   Prob > F        =    0.0000
                      Residual |  523478.114       188  2784.45805   R-squared       =    0.9441
                  -------------+----------------------------------   Adj R-squared   =    0.9408
                         Total |  9359943.92       199  47034.8941   Root MSE        =    52.768
                  
                  ------------------------------------------------------------------------------
                        invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                        mvalue |   .1101238   .0118567     9.29   0.000     .0867345    .1335131
                        kstock |   .3100653   .0173545    17.87   0.000     .2758308    .3442999
                               |
                       company |
                            2  |   172.2025   31.16126     5.53   0.000     110.7319    233.6732
                            3  |  -165.2751   31.77556    -5.20   0.000    -227.9576   -102.5927
                            4  |    42.4874   43.90987     0.97   0.334    -44.13197    129.1068
                            5  |  -44.32013   50.49225    -0.88   0.381    -143.9243    55.28406
                            6  |   47.13539   46.81068     1.01   0.315    -45.20629    139.4771
                            7  |   3.743212   50.56493     0.07   0.941    -96.00433    103.4908
                            8  |   12.75103   44.05263     0.29   0.773    -74.14994      99.652
                            9  |  -16.92558   48.45326    -0.35   0.727    -112.5075    78.65636
                           10  |   63.72884   50.33023     1.27   0.207    -35.55572    163.0134
                               |
                         _cons |  -70.29669   49.70796    -1.41   0.159    -168.3537    27.76035
                  ------------------------------------------------------------------------------
                  
                  . xtreg invest mvalue kstock, fe
                  
                  Fixed-effects (within) regression               Number of obs     =        200
                  Group variable: company                         Number of groups  =         10
                  
                  R-sq:                                           Obs per group:
                       within  = 0.7668                                         min =         20
                       between = 0.8194                                         avg =       20.0
                       overall = 0.8060                                         max =         20
                  
                                                                  F(2,188)          =     309.01
                  corr(u_i, Xb)  = -0.1517                        Prob > F          =     0.0000
                  
                  ------------------------------------------------------------------------------
                        invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                        mvalue |   .1101238   .0118567     9.29   0.000     .0867345    .1335131
                        kstock |   .3100653   .0173545    17.87   0.000     .2758308    .3442999
                         _cons |  -58.74393   12.45369    -4.72   0.000    -83.31086     -34.177
                  -------------+----------------------------------------------------------------
                       sigma_u |  85.732501
                       sigma_e |  52.767964
                           rho |  .72525012   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
                  F test that all u_i=0: F(9, 188) = 49.18                     Prob > F = 0.0000
                  Last edited by Andrew Musau; 20 Oct 2017, 06:09.



                  • #10
                    Andrew, this is also very helpful, and in fact certainly seems to solve the problem, at least where FE R2-overall is concerned... I have one question: Would the reasons you give above also explain why FE R2-within can decline when interaction terms are added to the model?

                    p.s. It appears that I can't delete the attached articles from my earlier post, but I will be more careful in the future.
                    Last edited by Ethan Schoolman; 20 Oct 2017, 07:26.



                    • #11
                      Would the reasons you give above also explain why FE R2-within can decline when interaction terms are added to the model?
                      No, the within R2 in -xtreg, fe- can never be lower in a model with extra parameters. The simple reason is that this is the R2 statistic from applying OLS to the mean-deviated variables, so it is a comparison of an OLS regression with the same regression including additional variables. I illustrate the computation, while making a different point, in #9 of the following link

                      https://www.statalist.org/forums/for...ted-as-missing

                      Again, note that this is StataCorp's definition of a within R2. I have never used XSMLE, but if you provide a complete example, including a dataset, that shows this phenomenon, I can have a look at what may be going on. This may take time depending on my other commitments.
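
                      Here is a quick sketch of that claim (Python/numpy rather than Stata, simulated data): the within R2, computed as the R2 of OLS on group-demeaned variables, cannot decrease when a covariate is added, because the two fits are nested OLS problems on the same sample.

```python
import numpy as np

rng = np.random.default_rng(3)
G, T = 8, 15
g = np.repeat(np.arange(G), T)
x1 = rng.normal(size=G * T)
x2 = rng.normal(size=G * T)   # extra covariate, pure noise
y = x1 + np.repeat(rng.normal(size=G), T) + rng.normal(size=G * T)

def within_r2(cols):
    # R2 of OLS on group-demeaned variables (StataCorp's within R2, as I read it)
    dm = lambda v: v - np.array([v[g == i].mean() for i in range(G)])[g]
    X = np.column_stack([dm(c) for c in cols])
    yd = dm(y)
    b, *_ = np.linalg.lstsq(X, yd, rcond=None)
    e = yd - X @ b
    return 1.0 - (e @ e) / (yd @ yd)

r2_1 = within_r2([x1])
r2_2 = within_r2([x1, x2])   # nested OLS problems: r2_2 >= r2_1, always
```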



                      • #12
                        Thanks so much, Andrew. I think before I take up more of your time, I am going to upgrade to Stata 15 and try the -sp- commands.



                        • #13
                          Hi All,

                          -xsmle- computes the within, between and overall R2 as in -xtreg-. While I fully agree with Andrew that, by construction, the within R2 cannot decrease when extra covariates are added in a linear panel data model estimated using the within-group estimator, I think that in Ethan's case the very small reduction in the within R2 is due to the impact of the added covariates on the estimated -rho- coefficient. Indeed, in the fixed-effects SAC model, the within R2 is the squared correlation between (y_it - ybar_i) and yhat_it = B*[(x_it - xbar_i)*beta], the reduced-form predictor for the fixed-effects SAC model, where B = (I - rho*W)^(-1), ybar_i = 1/T * sum_t(y_it) and xbar_i = 1/T * sum_t(x_it).
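
                          A toy sketch of why this matters (Python/numpy, with an arbitrary weight matrix and made-up data, not -xsmle-'s code): holding beta fixed, the within R2 of the reduced-form predictor moves with rho, so a shift in the estimated rho when covariates are added can push the within R2 either way.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 6, 10
# row-normalised circular contiguity weights (each unit has two neighbours)
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = W[i, (i + 1) % N] = 0.5

x = rng.normal(size=(T, N))
y = x + rng.normal(size=(T, N))   # toy panel; no claim of a real SAC DGP
xd = x - x.mean(axis=0)           # within transformation (demean over t)
yd = y - y.mean(axis=0)

def sac_within_r2(rho, beta=1.0):
    # squared corr of demeaned y with B @ (demeaned x * beta), B = (I - rho*W)^-1
    B = np.linalg.inv(np.eye(N) - rho * W)
    pred = (B @ (xd * beta).T).T  # apply B cross-sectionally in each period
    return np.corrcoef(yd.ravel(), pred.ravel())[0, 1] ** 2

r2_rho0, r2_rho4 = sac_within_r2(0.0), sac_within_r2(0.4)  # rho moves the R2
```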

                          Best,

                          Last edited by Federico Belotti; 20 Oct 2017, 10:02.
                          Federico



                          • #14
                            Thanks, Federico -- this is very helpful. If -xsmle- computes r2 the same way as -xtreg- (and presumably the same way as the new -sp- commands in Stata 15), then it makes sense that the strange phenomenon of the (very slightly) declining r2-within would have to do with the interaction (in a non-technical sense!) of the added covariates with what is distinctive about -xsmle-, namely its accounting for spatial relationships.

                            Barring further commentary, I think this case can be declared "closed". Thanks to everyone who participated. This forum is an invaluable resource.

