
  • RMSE: why does Stata adjust for degrees of freedom?

    I need to calculate RMSE a number of times, where I have saved data and various predictions. In checking the stats, I noticed my own calculations and Stata's results (via cnsreg) were off. It seems Stata adjusts for the degrees of freedom in all RMSE calculations. I find this odd, as the textbooks I've looked at do not adjust for degrees of freedom. For example, Wooldridge writes "This is essentially the sample standard deviation of the forecast errors (without any degrees of freedom adjustment)", and Greene also implies no adjustment. Is this a well known issue? Is it normally such a small effect that people don't mind? Are the textbooks out of date? I can find Stata forum posts confidently asserting Stata's approach is correct/normal, but nothing acknowledging the discrepancy between econometrics textbooks and Stata's implementation.

    My own code was running cnsreg with a constraint of 1 on the only RHS variable (the model prediction), and without a constant, as a quick way of calculating the RMSE. That approach means cnsreg actually adds one to the sample size, as there is one constraint.
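
    A minimal sketch of that approach, assuming the outcome variable is y and the saved model prediction is yhat (hypothetical names):

    Code:
    constraint 1 yhat = 1                        // fix the coefficient on the prediction at 1
    cnsreg y yhat, constraints(1) noconstant     // no constant, so the residuals are y - yhat
    display e(rmse)                              // Stata's reported Root MSE for this fit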

    Is there a way, other than coding the whole calculation, to opt for an unadjusted RMSE? And is this adjustment standard outside of my favourite textbooks?

    Many thanks.

  • #2
    I don’t know off the top of my head, but have you tried running the same problem in two different programs, e.g. Stata and SPSS? Or taken a textbook problem that includes the data needed to replicate the example and seen what Stata says? I’d want to make sure there is a discrepancy first, and that you aren’t misunderstanding things.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 19.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://academicweb.nd.edu/~rwilliam/



    • #3
      Originally posted by Paul Clist View Post
      It seems Stata adjusts for the degrees of freedom in all RMSE calculations.
      This is not a Stata thing! There are different ways to estimate the error variance, but let's first consider the linear regression model before presenting these approaches. We have:

      \[
      y = X\beta + u,
      \]
      with \(n\) observations and \(k\) parameters (including the intercept). We can define the vector of OLS residuals as:

      \[
      e = y - X\hat{\beta},
      \]

      which is an \(n \times 1\) vector. The quadratic form:

      \[
      e'e = \sum_{i=1}^n e_i^2
      \]

      is the sum of squared residuals (SSR). Now, because the population error variance \(\sigma^2 = \mathbb{E}[u_i^2]\) is unknown, we need to estimate it. There are two common approaches:

      (1) Maximum likelihood estimate (MLE):

      \[
      \hat{\sigma}^2_{\text{MLE}} = \frac{e'e}{n},
      \]
      which divides by \(n\), but is biased downward.

      (2) Unbiased estimate:

      \[
      s^2 = \frac{e'e}{n-k}.
      \]
      This divides by \(n-k\) (degrees of freedom), correcting the bias. Under the classical OLS assumptions,

      \[
      \frac{e'e}{\sigma^2} \sim \chi^2_{n-k},
      \]

      so that

      \[
      \mathbb{E}\!\left[\frac{e'e}{\sigma^2}\right] = n-k.
      \]

      Thus dividing \(e'e\) by \(n-k\) gives an unbiased estimator of \(\sigma^2\). The Root Mean Squared Error (RMSE) is then defined as

      \[
      \text{RMSE} = \sqrt{\frac{e'e}{n-k}} = \sqrt{s^2}.
      \]

      For large \(n\), the difference between dividing by \(n\) and by \(n-k\) is negligible.
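
      For concreteness, here is a quick sketch (not part of the original reply) that computes both estimates from the stored results of regress, where e(rss) is the sum of squared residuals, e(N) is \(n\), and e(df_r) is \(n-k\):

      Code:
      sysuse auto, clear
      quietly regress mpg weight displacement
      display "MLE (divide by n):        " sqrt(e(rss) / e(N))
      display "unbiased (divide by n-k): " sqrt(e(rss) / e(df_r)) "  (this is e(rmse))"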

      Originally posted by Paul Clist View Post
      Is there a way, other than coding the whole calculation, to opt for an unadjusted RMSE?
      Given that the RMSE is an estimate of the standard deviation of the regression residuals, just predict the regression residuals and compute their standard deviation.


      Code:
      sysuse auto, clear
      regress mpg weight displacement
      predict res, res
      sum res
      di `r(sd)'
      Result:

      Code:
      . regress mpg weight displacement
      
            Source |       SS           df       MS      Number of obs   =        74
      -------------+----------------------------------   F(2, 71)        =     66.79
             Model |  1595.40969         2  797.704846   Prob > F        =    0.0000
          Residual |  848.049768        71  11.9443629   R-squared       =    0.6529
      -------------+----------------------------------   Adj R-squared   =    0.6432
             Total |  2443.45946        73  33.4720474   Root MSE        =    3.4561
      
      ------------------------------------------------------------------------------
               mpg | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
            weight |  -.0065671   .0011662    -5.63   0.000    -.0088925   -.0042417
      displacement |   .0052808   .0098696     0.54   0.594    -.0143986    .0249602
             _cons |   40.08452    2.02011    19.84   0.000     36.05654    44.11251
      ------------------------------------------------------------------------------
      
      .
      . predict res, res
      
      .
      . sum res
      
          Variable |        Obs        Mean    Std. dev.       Min        Max
      -------------+---------------------------------------------------------
               res |         74    4.51e-09     3.40839  -6.965423    13.8371
      
      .
      . di `r(sd)'
      3.4083896



      • #4
        Thank you both. For future readers...

        1. Yes, it is possible I've misunderstood.

        2. I'm interested in the RMSE because of a paper called 'Disguising Lies—Image Concerns and Partial Lying in Cheating Games' by Kiryl Khalmetski and Dirk Sliwka, published in American Economic Journal: Microeconomics in 2019. They calculate the RMSE manually in Mathematica without adjusting for the degrees of freedom (code below). There, where k=5 (not shown), they merely average the squared error in predictions over six points. The model has 2 parameters, but they don't adjust for these.

        [Attachment: ks code.png (screenshot of the authors' Mathematica calculation)]


        To me, this appears to be in line with Greene and Wooldridge, but not in line with Stata or Andrew.

        3. I worked out a simple workaround to give the unadjusted RMSE:
        Code:
        local rmse =  e(rmse) * sqrt(e(df_r) / e(N))
        This rescales Stata's Root MSE using the residual degrees of freedom Stata thinks you have (e(df_r)) and the sample size (e(N)), which undoes the adjustment.
        (Andrew provides another approach above, but that doesn't seem to lead to the same result as I wish to get; see the code below to compare.)

        Code:
        sysuse auto, clear
        regress mpg weight displacement
        predict res, res                     // residuals
        sum res
        di `r(sd)'                           // Andrew's suggestion: SD of the residuals
        
        gen se = res^2
        sum se
        di sqrt(`r(mean)')                   // unadjusted RMSE: root of the average squared residual
        di e(rmse) * sqrt(e(df_r) / e(N))    // Root MSE rescaled by the workaround above
        4. I imagine this is yet another example of the same statistical term being used to mean different things in stats/econometrics, and people (or me at least) being unaware of the variety of definitions.

        Many thanks for the replies, and hope this helps someone.






        • #5
          Richard - I forgot to post that the unadjusted RMSE appears to be used in at least some R packages, e.g. metrics just uses y and yhat: https://www.r-bloggers.com/2021/07/h...ror-rmse-in-r/
          But you can't tempt me to venture into SPSS to find out their approach!



          • #6
            A detail that reconciles any apparent contradiction here is that summarize also uses an unbiased estimator of the variance: its divisor is the sample size MINUS 1.

            This is documented, but here is a simple demonstration. Consider the values 1 2 3 4 5: the mean is 3, the deviations from the mean are -2 -1 0 1 2, their squares are 4 1 0 1 4, and so the sum of squared deviations is 10. If you use maximum likelihood you divide by 5 and get 2 for the variance; if you use the unbiased estimator you divide by 4 and get 2.5. To estimate the SD you take the square root in either case.

            Here is the whole kit and caboodle, especially because I can't do most square roots in my head.

            Code:
            . clear 
            
            . set obs 5 
            number of observations (_N) was 0, now 5
            
            . gen x = _n
            
            . list 
            
                 +---+
                 | x |
                 |---|
              1. | 1 |
              2. | 2 |
              3. | 3 |
              4. | 4 |
              5. | 5 |
                 +---+
            
            . su x, d 
            
                                          x
            -------------------------------------------------------------
                  Percentiles      Smallest
             1%            1              1
             5%            1              2
            10%            1              3       Obs                   5
            25%            2              4       Sum of Wgt.           5
            
            50%            3                      Mean                  3
                                    Largest       Std. Dev.      1.581139
            75%            4              2
            90%            5              3       Variance            2.5
            95%            5              4       Skewness              0
            99%            5              5       Kurtosis            1.7
            
            . 
            . mata
            ------------------------------------------------- mata (type end to exit) --------------------
            : x = (1::5)
            
            : sum((x :- 3):^2) / 5
              2
            
            : sum((x :- 3):^2) / 4
              2.5
            
            : sqrt(2)
              1.414213562
            
            : sqrt(2.5)
              1.58113883
            
            : end
            Tactical tip: To try out simple examples, create simple variables in Stata -- or simple vectors in Mata.

            Technical detail: The square root of the variance, even calculated this way, is not an unbiased estimator of the SD, although sloppy texts will claim or imply that to be true, yet the bias is usually slight for reasonable sample sizes (and that circularly defines "reasonable" and "slight").
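
            A quick hedged check of that point, assuming normally distributed data: for a normal sample of size n, E(s) = sigma * sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), which is a little below sigma. For the n = 5 example above:

            Code:
            * expected value of the sample SD relative to the true SD, normal data, n = 5
            display sqrt(2/(5-1)) * exp(lngamma(5/2) - lngamma((5-1)/2))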



            • #7
              Code:
              sysuse auto, clear
              regress mpg weight displacement
              predict res, res
              predict yhat, xb
              sum res
              di `r(sd)'
              
              gen se = res^2
              sum se
              di sqrt(`r(mean)')
              di e(rmse) * sqrt(e(df_r) / e(N))
              rmse mpg yhat , raw
              
              capture drop rmse
              g rmse = (mpg - yhat)^2 
              summ rmse
              di sqrt(r(sum) / 74)



              • #8
                Originally posted by Nick Cox View Post
                A detail that reconciles any apparent contradiction here is that summarize also uses an unbiased estimator for variance, namely it uses in the divisor the sample size MINUS 1
                Good catch, Nick! That should have been apparent to me, since it follows the same convention as the calculation of the RMSE.
