Stata users:
I am using the "xsmle" command to run fixed effects models on panel data, taking into account spatial autocorrelation ("sac" option); my unit of analysis is U.S. counties. My dataset has missing data, so I am using 25 imputed datasets to estimate the model using "mi estimate". For these models, I calculate r-squared by applying Rubin’s rules to imputed datasets as described in:
https://www.stata.com/support/faqs/s...-imputed-data/
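If I follow that FAQ correctly, the pooling step works on the square root of each imputation's r-squared via a Fisher z-transform, averages on the z scale, then back-transforms and squares. As a sanity check on my own calculation, here is a minimal sketch of that procedure in Python (the three r-squared values are made up for illustration; the real workflow uses all 25 imputations in Stata):

```python
import math

def pool_r2(r2_list):
    """Pool R-squared across imputed datasets, per the Stata FAQ's
    approach (as I understand it): take sqrt of each R-squared,
    Fisher z-transform, average the z's (Rubin's rules for the point
    estimate), then back-transform and square."""
    zs = [math.atanh(math.sqrt(r2)) for r2 in r2_list]
    z_bar = sum(zs) / len(zs)
    return math.tanh(z_bar) ** 2

# Hypothetical within-r-squared values from 3 imputations:
pooled = pool_r2([0.478, 0.481, 0.484])
print(round(pooled, 3))
```

The pooled value lands between the smallest and largest per-imputation values, which is a useful check that the transform is being applied in the right direction.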
My question concerns the "r-squared" measures of model fit, and especially “r-squared within”. In Model 1, there are no interaction terms. In Model 2, I interact an important explanatory variable with the period dummies (to see if the relationship between this variable and the dependent variable changes over time); these period interaction terms are all statistically significant. In Model 3, I also include interaction terms between this main explanatory variable and dummy variables for different regions of the country; none of these regional interaction terms are statistically significant.
Here is a table that describes the r-squared values for each model:
r-squared    Model 1   Model 2   Model 3
within       .481      .480      .480
between      .776      .804      .791
overall      .755      .777      .765
My understanding has always been that plain r-squared (as opposed to adjusted r-squared) can never decrease when variables are added to a model. Does this rule apply differently to within, between, and overall r-squared? Even if it does, that would not necessarily explain Model 3, where all three r-squared measures are lower than in Model 2.
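For what it's worth, the never-decreases rule does hold mechanically for ordinary least squares: adding a regressor cannot raise the residual sum of squares. A small self-contained Python/NumPy demonstration with simulated data (not my actual county panel):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)  # a pure-noise regressor

def r2(cols, y):
    """Plain (unadjusted) R-squared from an OLS fit of y on the
    given regressors plus a constant."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1 - (resid @ resid) / tss

r2_small = r2([x1], y)
r2_big = r2([x1, junk], y)
# Even a useless regressor cannot lower plain OLS r-squared:
assert r2_big >= r2_small
```

The puzzle, then, is that xsmle's within r-squared is not an OLS r-squared in this sense, so the mechanical guarantee need not carry over.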
Other possibilities seem plausible. I am aware that xtreg, fe calculates r-squared differently from areg:
https://www.stata.com/support/faqs/s...rsus-xtreg-fe/
I wonder whether something similar is at play with xsmle, fe. Another possibility is that the r-squared measures reported by xsmle are calculated so that they behave more like adjusted r-squared measures, which can decrease when new variables add little explanatory power to the model. That would fit with the fact that none of the interaction terms added in Model 3 are statistically significant.
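To make the adjusted-r-squared possibility concrete: because the adjustment penalizes the regressor count k, a variable that barely moves plain r-squared can lower the adjusted measure. A quick illustration with the standard formula (the specific numbers are hypothetical, chosen near my models' magnitudes):

```python
def adj_r2(r2_val, n, k):
    """Standard adjusted R-squared:
    1 - (1 - R2) * (n - 1) / (n - k - 1),
    where k is the number of regressors excluding the constant."""
    return 1 - (1 - r2_val) * (n - 1) / (n - k - 1)

# n = 200 observations; a sixth regressor nudges plain R2 from
# .7500 to .7501, yet the adjusted measure falls:
before = adj_r2(0.7500, 200, 5)
after = adj_r2(0.7501, 200, 6)
assert after < before
```

So if xsmle's reported measures carry any such degrees-of-freedom penalty, the pattern in Model 3 would not be surprising. Whether it actually does is exactly what I am hoping someone can confirm.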
I would appreciate any help, methodological or theoretical, that could be provided regarding this issue.