  • Unfamiliar error message while bootstrapping -mlogit-

    I encountered the following error message while trying to bootstrap the -mlogit- command, and I'd like to know the exact problem in the bootstrap samples that causes it:

    "collinearity in replicate sample is not the same as the full sample, posting missing values"
    (I have an example to produce it below.)

    While I understand that this message reflects some problem in estimating the model, I can't seem to find any documentation about it. I presume it reflects sparsity in the data, but as in the example below, I'm not necessarily getting this message in all bootstrap samples with sparse distributions. If I knew better what was going on, I'd feel more confident about choosing to ignore bootstrap samples that occasioned this message. Any thoughts about figuring out what it means?
    Code:
    cap prog drop test
    prog test
    tab rep78 foreign  // inserted to display sparsity
    mlogit rep78 i.foreign
    end
    //
    sysuse auto
    bootstrap, noisily seed(74558) reps(5): test

  • #2
    I've never personally encountered this before, but I don't think it's necessarily related to sparsity per se. I ran this (adding -baseoutcome(5)- to the -mlogit- command because -bootstrap- would not run it without a specified -baseoutcome()-) for a larger number of reps and I observed that this message occurs exactly in those reps where the bootstrap sample lacks any observations with rep78 == 1. So I think this is what the error message is telling us. This isn't colinearity in the usual sense we think of in regression, but in a more general sense it indicates that one of the matrices involved has insufficient rank.

    Since -bootstrap- samples with replacement, it isn't surprising that a sample with some outcome level(s) unrepresented can arise. This would be more likely to happen with sparse data than with a more balanced full data set, but, in principle, it could happen with any full data set. In fact, in theory, with any data set, if you ran a sufficiently large number of reps, the probability that it would eventually happen would approach 1.
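
    To put a rough number on this: if an outcome level has k of the n observations being resampled, each draw misses it with probability 1 - k/n, so a replicate of n independent draws contains none of it with the probability sketched here (the symbols k, n, and R are introduced only for this illustration):

    ```latex
    \[
    P(\text{level absent}) = \left(1 - \frac{k}{n}\right)^{n} \approx e^{-k},
    \qquad
    P(\text{at least one such replicate in } R \text{ reps})
      = 1 - \left[\,1 - \left(1 - \frac{k}{n}\right)^{n}\right]^{R}.
    \]
    ```

    For illustration, rep78 == 1 has k = 2 observations in auto.dta. Assuming the resampling is over the n = 69 observations with nonmissing rep78 (an assumption about which observations -bootstrap- keeps), this gives (67/69)^69 ≈ 0.13 per replicate, and roughly a 50% chance of at least one affected replicate in R = 5 reps.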

    And, yes, I think it is safe to ignore the removal of those samples from the final summary calculations. The one adjustment I would make is to increase the number of reps so that the number of surviving, usable reps is large enough for your purposes.
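
    A minimal sketch of how one might check how many replicates survive, assuming -bootstrap- leaves the counts of complete and incomplete replications in e(N_reps) and e(N_misreps) (my reading of its stored results, worth confirming in -help bootstrap-). The reps(500) value is arbitrary, and baseoutcome(5) follows the choice described above:

    ```stata
    * Sketch only: rerun the bootstrap with more reps, then inspect how many were usable
    sysuse auto, clear
    bootstrap, reps(500) seed(74558): mlogit rep78 i.foreign, baseoutcome(5)

    * Assumed stored results: complete vs. incomplete (missing-valued) replications
    display "usable replicates:  " e(N_reps)
    display "dropped replicates: " e(N_misreps)
    ```

    If the usable count falls short of what you need, raise reps() and rerun.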


    • #3
      I agree that collinearity is perhaps not the best terminology here. (By the way, why two Ls in collinearity?) But on the plus side I'll give credit to the program developers for anticipating that such a problem could arise during a bootstrap exercise.

      I turned the problem around to see what happens with other commands. Here's a -probit- example where the "perfect prediction" issue in probit, logit, etc. gives the same warning Mike encountered with -mlogit-:

      collinearity in replicate sample is not the same as the full sample, posting missing values
      Code:
      cap preserve
      cap drop _all
      
      cap prog drop test
      prog test
      tab rep78 foreign  // inserted to display sparsity
      probit foreign rep5
      end
      //
      sysuse auto
      gen rep5=rep78==5
      bootstrap, noisily seed(74558) reps(20): test
      
      cap restore
      So it may be that the "collinearity" warning arises generally when parameters that are estimable in the original sample are not estimable in one or more of the bootstrap samples. For instance, using ordered probit or ordered logit estimation (oprobit, ologit) instead of -mlogit- in Mike's original code gives the same warning since the ordered regression models can't estimate all the original cut parameters if an outcome category originally present does not appear in a bootstrap sample.
      Last edited by John Mullahy; 10 Nov 2023, 06:53.


      • #4
        Originally posted by John Mullahy:
        By the way, why two Ls in collinearity?
        Some thoughts: https://stats.stackexchange.com/ques...inear-colinear


        • #5
          Thanks to both of you. For whatever interest it might have: The situation that occasioned my wanting to bootstrap here was a tabular analysis with a 4-category and relatively unevenly distributed response variable for which I wanted to examine an association with a binary predictor, while controlling for a confounder. While for a binary response there is the so-called combined odds ratio for a categorical confounder, and methods to get an exact test/CI for it, I couldn't think of anything for a 4-category response, so I thought I'd just try using -mlogit- with -bootstrap- or -permute-.


          • #6
            Re Andrew Musau in #4. Thanks for the investigative work, Andrew. You've perhaps accomplished for co*inearity what McCulloch did years ago for heteros*edasticity:
            https://www.econometricsociety.org/p...erosedasticity
            Perhaps consider an Econometrica submission?

            P.S. The always-wise Arthur Goldberger advised against multico*inearity, preferring instead micronumerosity.


            • #7
              Econometrica remains above my pay grade!
