Randomized quantile residuals (Dunn-Smyth residuals) for GLM diagnostics

Lili Bulfone

Join Date: Apr 2021

Posts: 15
#1

Randomized quantile residuals (Dunn-Smyth residuals) for GLM diagnostics

09 Mar 2026, 00:28

I am a student undertaking my Master's in Biostatistics and am working through GLM diagnostics and want to compute randomised quantile residuals (Dunn-Smyth residuals) after fitting GLMs in Stata.

My understanding is that randomised quantile residuals are considered the gold standard diagnostic tool for GLMs - they are approximately standard normal if the model is correct, regardless of the outcome distribution, which makes them much more useful than deviance or Pearson residuals for non-normal outcomes.

In R, these are straightforwardly available via qresid() from the statmod package. However, I prefer to work in Stata and am not aware of an official or SSC-validated command that computes these residuals after glm.

I found a GitHub package (https://github.com/psotob91/qresid) by Percy Soto-Becerra, which appears to implement randomised quantile residuals for normal linear regression, Gaussian GLMs, binomial/Bernoulli models (via both glm and logit/logistic commands), Poisson models (via both glm and poisson commands), and negative binomial models (via both glm and nbreg commands). Since I'm not expert enough to know if code is correct, and the package is pre-release and has not been submitted to SSC, can anyone point me to a validated Stata command for computing randomised quantile residuals after glm?
Tags: None
George Ford

Join Date: Aug 2014

Posts: 3337
#2

09 Mar 2026, 09:06

I suspect you have searched tireless for such programs, and have found but 1 (coded 4 years ago).

It looks like the qresid includes a Monte Carlo simulation.

You could use the MC or some other data to estimate in R and Stata and compare the results.

The ado file does not look that complicated, so you could probably check it against the theory.
Comment
Lili Bulfone

Join Date: Apr 2021

Posts: 15
#3

10 Mar 2026, 05:08

Thanks for the suggestion, George. Given my limited knowledge and time constraints, I'm wondering if I'd be better off not being so precious about my preference for Stata and learn to code in R. Thanks again.
Comment
George Ford

Join Date: Aug 2014

Posts: 3337
#4

10 Mar 2026, 07:11

I've given that a try; did not work out.

I've had some success having Claude rewrite R code to run in Stata. It takes a little tweaking sometimes, but works remarkably well.

Or, you could drop the ado file into the Claude/ChatGPT and get an explanation and check for comparability across the Stata and R versions.
Comment
Lili Bulfone

Join Date: Apr 2021

Posts: 15
#5

10 Mar 2026, 08:29

Thanks again, George. I'll give your suggestions a try.
Comment
George Ford

Join Date: Aug 2014

Posts: 3337
#6

12 Mar 2026, 11:13

There were a few bugs in the Stata version. Here's a fix. No guarantee but matches R's qresid. Models must be estimated by GLM (Gaussian, Bernoulli/Binomial, Poisson, or Negative Binomial supported).

Here's some MC to see it work.

Code:

qresid.ado
Comment

George Ford

Join Date: Aug 2014
Posts: 3337

12 Mar 2026, 11:13

Code:

*-------------------------------------------------------------------------------
**# Bernoulli example simulation:
clear all
set seed 123
set obs 100
gen y = rbinomial(1, 0.3)
gen x = _n

**## command: glm
glm y x, family(binomial) link(logit)
predict dev, dev

qresid res1
qresid res2, nqres(1)
qresid res3, standardized nqres(1)

qnorm dev, name(qq1, replace) nodraw
qnorm res11, name(qq2, replace) nodraw
qnorm res21, name(qq3, replace) nodraw
qnorm res3_std1, name(qq4, replace) nodraw
graph combine qq1 qq2 qq3 qq4

*-------------------------------------------------------------------------------
**# Binomial example simulation:
clear all
set seed 123
set obs 30
*gen n = runiformint(100, 200)
gen n = 20
gen y = rbinomial(n, 0.05)
gen x = _n

**## command: glm
glm y x, family(binomial n) link(logit)
predict dev, dev

qresid res1
qresid res2, nqres(1)
qresid res3, standardized nqres(1)

qnorm dev, name(qq1, replace) nodraw
qnorm res11, name(qq2, replace) nodraw
qnorm res21, name(qq3, replace) nodraw
qnorm res3_std1, name(qq4, replace) nodraw
graph combine qq1 qq2 qq3 qq4

*-------------------------------------------------------------------------------
**# Poisson example simulation:
**## Simulation 
clear all 
set seed 123
set obs 50
gen y = rpoisson(1)
gen x = _n

**## command: glm
glm y x, family(poisson)
predict dev, dev

qresid res1
qresid res2, nqres(1)
qresid res3, standardized nqres(1)

qnorm dev, name(qq1, replace) nodraw
qnorm res11, name(qq2, replace) nodraw
qnorm res21, name(qq3, replace) nodraw
qnorm res3_std1, name(qq4, replace) nodraw
graph combine qq1 qq2 qq3 qq4

*------------------------------------------------------------------------------
**# Negative Binomial example simulation:
**## Simulation 
clear all 
set seed 123
set obs 50
gen y = rnbinomial(2,0.5)
gen x = _n

**## command: glm
glm y x, family(nbinomial ml)
predict dev, dev

qresid res1
qresid res2, nqres(1)
qresid res3, standardized nqres(1)

qnorm dev, name(qq1, replace) nodraw
qnorm res11, name(qq2, replace) nodraw
qnorm res21, name(qq3, replace) nodraw
qnorm res3_std1, name(qq4, replace) nodraw
graph combine qq1 qq2 qq3 qq4

Comment

Percy Soto-Becerra MD

Join Date: Oct 2019

Posts: 2
#8

09 May 2026, 01:21

Thank you very much for your interest in the package, and especially to Lili for raising the question that motivated me to revisit and resume the project after this discussion.

Due to personal circumstances — and also because during the last few years I have been working mainly with R — development of the package remained unfinished for some time. However, this thread motivated me to prioritize the project again.

I am currently updating and restructuring the codebase to prepare a first official public release before the end of this month. The initial version will focus on the original independent-response regression models described in the Dunn & Smyth framework, including continuous and count GLMs.

After that, I am planning a second development stage extending support to more complex count-data settings such as zero-inflated, hurdle, and related mixture models. A later major version is planned for multilevel and correlated-data models.

The repository will remain fully open-source, and contributions, suggestions, validation efforts, and collaborations are very welcome.

Thanks again for the interest and discussion.
1 like
Comment
Percy Soto-Becerra MD

Join Date: Oct 2019

Posts: 2
#9

10 May 2026, 23:33

Thanks again for the interest and for the earlier discussion around randomized quantile residuals.

I have now released the first public version of qresid, and the package has been submitted to SSC for review.

GitHub repository:
https://github.com/psotob91/qresid

The current release focuses on supported regression models for non-correlated data. You will find interesting implementations in count data models. Additional extensions for correlated-data modls are planned for future versions.

Feedback and suggestions are very welcome.
Comment
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2291
#10

11 May 2026, 07:36

I was interested to read about Dunn-Smyth residuals, as I hadn't heard of them before this thread. Perhaps this is a difference across fields. I do wonder how they're applied. For example, if Y is a binary response I know it's distribution is Bernoulli. Therefore, any diagnostic is necessarily a functional form test for the response probability, P(Y = 1|X). I think using the D-S residuals must be inferior to other direct methods, such as a RESET-type test that puts in higher powers of X_i*b^. Or, just include squares and cross products directly.

In other cases, such as when Y is a count variable and I use, say, Poisson regression, using D-S gives an omnibus test that can reject for many reasons. But different misspecifications have different levels of importance. For estimating mean effects, Poisson regression is very well suited because the estimator is completely robust for the mean parameters regardless of the actual distribution. If I conclude, using D-S, that there's some sort of model misspecification, what should I do? It could be that the standardized fourth moment in my data doesn't match up with the Poisson distribution, something I wouldn't care much about. If the conditional mean is misspecified then using D-S to move to, say, a negative binomial distribution is counterproductive because it could be that the Poisson distribution is correct but the mean is incorrect.

When applied to roughly continuous outcomes after OLS estimation, again, the D-S residuals are sensitive to a misspecified mean, heteroskedasticity, and nonnormality. But, again, these "misspecifications" have different levels of importance. OLS works very well if the mean is correctly specified, and fully robust inference is available provided N is not too small. Why should we reject the OLS estimation if, say, the distribution is not normal? One reason would be that if we need a full distribution to make statements about probabilities, and I wonder if that's the main motivation for using D-S residuals.
3 likes
Comment
Lili Bulfone

Join Date: Apr 2021

Posts: 15
#11

16 May 2026, 07:03

Closing the loop on this thread and wanting to very much thank both George and Percy.

George - thank you for the practical suggestions and for taking the time to identify and fix the bugs in the original code. That was a generous contribution.

Percy - I'm delighted you found the time to revisit the qresid package and submit it to SSC.

For everyone's benefit, it is now available to install directly in Stata with: ssc install qresid

The current release supports randomised quantile residuals for continuous and count GLMs including Gaussian, Bernoulli/Binomial, Poisson, and Negative Binomial models.

Thanks again to everyone who contributed to this thread - the Statalist community in action is a beautiful thing!

Lili
1 like
Comment

Announcement

Randomized quantile residuals (Dunn-Smyth residuals) for GLM diagnostics

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment