Unfamiliar error message while bootstrapping -mlogit-

Mike Lacy

Join Date: Apr 2014

Posts: 2421
#1

09 Nov 2023, 15:15

I encountered the following error message while trying to bootstrap the -mlogit- command, and I'd like to know the exact problem in the bootstrap samples that causes it:

"collinearity in replicate sample is not the same as the full sample, posting missing values"
(I have an example to produce it below.)

While I understand that this message reflects some problem in estimating the model, I can't seem to find any documentation about it. I presume it reflects sparsity in the data, but as in the example below, I'm not necessarily getting this message in all bootstrap samples with sparse distributions. If I knew better what was going on, I'd feel more confident about choosing to ignore bootstrap samples that occasioned this message. Any thoughts about figuring out what it means?

Code:

cap prog drop test
prog test
    tab rep78 foreign // inserted to display sparsity
    mlogit rep78 i.foreign
end
//
sysuse auto
bootstrap, noisily seed(74558) reps(5): test

Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#2

09 Nov 2023, 15:45

I've never personally encountered this before, but I don't think it's necessarily related to sparsity per se. I ran this (adding -baseoutcome(5)- to the -mlogit- command because -bootstrap- would not run it without a specified -baseoutcome()-) for a larger number of reps and I observed that this message occurs exactly in those reps where the bootstrap sample lacks any observations with rep78 == 1. So I think this is what the error message is telling us. This isn't colinearity in the usual sense we think of in regression, but in a more general sense it indicates that one of the matrices involved has insufficient rank.
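
For anyone wanting to reproduce this, the modified run might look something like the sketch below. It is the program from #1 with -baseoutcome(5)- added. The reps(50) value and the -clear- option on -sysuse- are assumptions made here, since the number of reps actually used is not stated above.

```stata
cap prog drop test
prog test
    tab rep78 foreign // shows whether rep78 == 1 is present in this replicate
    mlogit rep78 i.foreign, baseoutcome(5) // base outcome fixed, as described above
end
sysuse auto, clear
bootstrap, noisily seed(74558) reps(50): test // reps(50) is an assumed value
```

Comparing each replicate's -tab- output with the warning should show whether the message lines up with replicates that have no rep78 == 1 observations.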

Since -bootstrap- samples with replacement, it isn't surprising that a sample with some outcome level(s) unrepresented can arise. This would be more likely to happen with sparse data than with a more balanced full data set, but, in principle, it could happen with any full data set. In fact, in theory, with any data set, if you ran a sufficiently large number of reps, the probability that it would eventually happen would approach 1.
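
A rough calculation makes this concrete. It is only a sketch under simplifying assumptions: each replicate draws n observations with replacement from the n in memory, and k of those n carry the outcome level in question. The symbols n, k, p, and R are notation introduced here, not anything -bootstrap- reports.

```latex
p = \Pr(\text{level absent from one replicate})
  = \left(1 - \frac{k}{n}\right)^{n} \approx e^{-k} \quad (k \ll n),
\qquad
\Pr(\text{absent from at least one of } R \text{ replicates})
  = 1 - (1 - p)^{R} \;\longrightarrow\; 1 \quad \text{as } R \to \infty .
```

As an illustration, taking k = 2 and n = 74 (which I believe matches rep78 == 1 in the auto data, but treat that as an assumption) gives p = (72/74)^74, or about 0.13, so roughly one replicate in eight would be expected to lack that level.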

And, yes, I think it is safe to ignore the removal of those samples from the final summary calculations. I would, however, increase the number of reps so that the number of surviving, usable reps is large enough for your purposes.
John Mullahy

Join Date: Dec 2016

Posts: 752
#3

10 Nov 2023, 06:46

I agree that collinearity is perhaps not the best terminology here. (By the way, why two Ls in collinearity?) But on the plus side I'll give credit to the program developers for anticipating that such a problem could arise during a bootstrap exercise.

I turned the problem around to see what happens with other commands. Here's a -probit- example where the "perfect prediction" issue in probit, logit, etc. gives the same warning Mike encountered with -mlogit-:

collinearity in replicate sample is not the same as the full sample, posting missing values

Code:

cap preserve
cap drop _all
cap prog drop test
prog test
    tab rep78 foreign // inserted to display sparsity
    probit foreign rep5
end
//
sysuse auto
gen rep5=rep78==5
bootstrap, noisily seed(74558) reps(20): test
cap restore

So it may be that the "collinearity" warning arises generally when parameters that are estimable in the original sample are not estimable in one or more of the bootstrap samples. For instance, using ordered probit or ordered logit estimation (oprobit, ologit) instead of -mlogit- in Mike's original code gives the same warning since the ordered regression models can't estimate all the original cut parameters if an outcome category originally present does not appear in a bootstrap sample.
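
The ordered-model variant just described might be sketched as follows. This is Mike's program from #1 with -ologit- swapped in for -mlogit-. The -clear- option and the reuse of the seed and reps(20) from the examples above are assumptions made here, not a record of the exact run.

```stata
cap prog drop test
prog test
    tab rep78 foreign // inserted to display sparsity
    ologit rep78 i.foreign // oprobit could be substituted here
end
sysuse auto, clear
bootstrap, noisily seed(74558) reps(20): test // seed and reps reused from the examples above
```

Whether the warning shows up in a given run depends on which replicates happen to drop an outcome category, so more reps may be needed to see it.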

Last edited by John Mullahy; 10 Nov 2023, 06:53.
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#4

10 Nov 2023, 08:37

Originally posted by John Mullahy

By the way, why two Ls in collinearity?

Some thoughts: https://stats.stackexchange.com/ques...inear-colinear
Mike Lacy

Join Date: Apr 2014

Posts: 2421
#5

10 Nov 2023, 08:42

Thanks to both of you. For whatever interest it might have: The situation that occasioned my wanting to bootstrap here was a tabular analysis with a 4-category and relatively unevenly distributed response variable for which I wanted to examine an association with a binary predictor, while controlling for a confounder. While for a binary response there is the so-called combined odds ratio for a categorical confounder, and methods to get an exact test/CI for it, I couldn't think of anything for a 4-category response, so I thought I'd just try using -mlogit- with -bootstrap- or -permute-.
John Mullahy

Join Date: Dec 2016

Posts: 752
#6

10 Nov 2023, 08:58

Re Andrew Musau in #4. Thanks for the investigative work, Andrew. You've perhaps accomplished for co*inearity what McCullough did years ago for heteros*edasticity.
https://www.econometricsociety.org/p...erosedasticity Perhaps consider an Econometrica submission?

P.S. The always-wise Arthur Goldberger advised against multico*inearity, preferring instead micronumerosity.
Andrew Musau

Join Date: Oct 2014

Posts: 10254
#7

10 Nov 2023, 09:24

Econometrica remains above my pay grade!