r(2000) error: Attempting to build multivariate logistic regression

Patrick Albright

Join Date: Aug 2018

Posts: 28
#1

29 Aug 2018, 10:50

I am trying to build a multivariate regression based on variables identified as significant in bivariate analysis. I have already assessed for collinearity and dropped any collinear variables.
Most of the variables are binary categorical variables; distance and lat_transdegree are continuous. Each variable has some missing data, which I cannot do anything about: this is simply the nature of the survey input and dataset.

My code:
xi: logistic clin1yr education1 support6 sitting side lat_transdegree nocoronangulation winquist_s op_type interlock_dist cont_up work_status distance primary_reop //Attempt 1
logit clin1yr education1 support6 sitting side lat_transdegree nocoronangulation winquist_s op_type interlock_dist cont_up work_status distance primary_reop //Attempt 2

Alternatively, the following code gives me an r(101) "o. operator not allowed" error:

local p_cov education1 support6 sitting side lat_transdegree nocoronangulation winquist_s op_type interlock_dist cont_up work_status distance primary_reop
xi: logit clin1yr `p_cov', asis vce(robust)
swaic, m b

I am not really sure what to do. Please advise.

Clyde Schechter

Join Date: Apr 2014

Posts: 30113
#2

29 Aug 2018, 11:36

Your post is quite confusing. The title refers to an r(2000) error, but in the body of the post you say you are getting r(101). These are completely different things.

Please show the exact code that produces your error and the exact error message(s). Do this by copying directly from your Results window or log file and pasting into the Forum editor without any editing whatsoever. Then wrap that between code delimiters for maximum readability. (If you are not familiar with code delimiters, please read Forum FAQ #12 for instructions.)

Patrick Albright

Join Date: Aug 2018

Posts: 28
#3

29 Aug 2018, 11:48

Thanks for the reply and the direction. I am still new to Stata and this forum.

Code:

. /***
> ##Patrick's multivariate and stepwise regression models
> ***/
. recode op_type (2=0)
(op_type: 106 changes made)

. label define optype 1 "Intramedullary nail" 0 "External Fixation"

. label values op_type optype

. recode interlock_dist (2 = 0)
(interlock_dist: 91 changes made)

. label define id 1 "1 nail" 0 "2 nail"

. label values interlock_dist id

. xi: logistic clin1yr  education1 support6 sitting side lat_transdegree nocoronangulation winquist_s op_type interlock_dist cont_up work_status distance primary_reop
outcome does not vary; remember:
                                  0 = negative outcome,
        all other nonmissing values = positive outcome
r(2000);

local p_cov education1 support6 sitting side lat_transdegree nocoronangulation winquist_s op_type interlock_dist cont_up work_status distance primary_reop

. xi: logit clin1yr `p_cov', asis vce(robust)

note: education1 omitted because of collinearity
note: support6 omitted because of collinearity
note: op_type omitted because of collinearity
note: cont_up omitted because of collinearity
Iteration 0:   log pseudolikelihood = -9.0109133  
Iteration 1:   log pseudolikelihood =          0  
Iteration 2:   log pseudolikelihood =          0  

Logistic regression                             Number of obs     =         13
                                                Wald chi2(0)      =          .
Log pseudolikelihood =          0               Prob > chi2       =          .

-----------------------------------------------------------------------------------
                  |               Robust
          clin1yr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
       education1 |          0  (omitted)
         support6 |          0  (omitted)
          sitting |   3.00e-14          .        .       .            .           .
             side |  -3.75e-14          .        .       .            .           .
  lat_transdegree |   7.49e-15          .        .       .            .           .
nocoronangulation |   7.49e-14          .        .       .            .           .
       winquist_s |   3.00e-14          .        .       .            .           .
          op_type |          0  (omitted)
   interlock_dist |  -1.20e-13          .        .       .            .           .
          cont_up |          0  (omitted)
      work_status |   2.40e-13          .        .       .            .           .
         distance |   3.75e-15          .        .       .            .           .
     primary_reop |   2.40e-13          .        .       .            .           .
            _cons |      33.75          .        .       .            .           .
-----------------------------------------------------------------------------------
Note: 0 failures and 13 successes completely determined.

. swaic, m b
Stepwise Model Selection by AIC
logit regression.
number of obs = 13
------------------------------------------------------------------------------
clin1yr             |  Df     Chi2     P>Chi2  -2*ll    Df Res.  AIC
--------------------+---------------------------------------------------------
o. operator not allowed
r(101);

Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#4

29 Aug 2018, 11:54

you clearly have several problems here but the "worst" appears to be a sizable amount of missing data; see

Code:

help misstable

for some ways to investigate that and see if there is any possibility of at least estimating a model with fewer parameters

in addition, an important question is why there is so much missing data as it will be easier to deal with if you have some idea of what is causing the problem
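As a rough illustration of that kind of check, the sketch below runs -misstable- on the variables from the model in #3. The local macro name modelvars is only a convenience introduced here, and the `///` line continuations assume the code is run from a do-file rather than typed interactively.

```stata
* Outcome plus predictors, copied from the command in #3
local modelvars clin1yr education1 support6 sitting side lat_transdegree ///
    nocoronangulation winquist_s op_type interlock_dist cont_up ///
    work_status distance primary_reop

* Missing-value counts for each variable that has any missing values
misstable summarize `modelvars'

* Which variables tend to be missing together, shown as frequencies
misstable patterns `modelvars', frequency
```

If one or two variables account for most of the missingness, dropping them from the model may recover a usable number of complete observations.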

on a much smaller point - there is generally no reason to use "xi"; see

Code:

help fvvarlist

Patrick Albright

Join Date: Aug 2018

Posts: 28
#5

29 Aug 2018, 12:00

Thanks for that. Unfortunately, there is not much that I can do about the missing data. The data collection instrument was an extensive series of questions across multiple surveys into which data was previously input. There is no opportunity to rectify the missing data.

Clyde Schechter

Join Date: Apr 2014

Posts: 30113
#6

29 Aug 2018, 12:18

With your first logistic command, Stata told you exactly what is wrong:

Code:

outcome does not vary; remember:
                                  0 = negative outcome,
        all other nonmissing values = positive outcome

Rather than heed Stata's cogent warning, you decided to sweep the problem under the rug using the -asis- option. I won't say that the particular subsequent error you got from -swaic- is your just punishment for that, but it's as good a way as any of stopping you from continuing to work with the meaningless output you got. (You are bold, I will say: the help file and documentation do point out that the use of the -asis- option is an invitation for trouble.)

So let's deal with that "outcome does not vary" problem. First, your outcome variable should be coded as 0 = negative, and any other non-missing value = positive. If you have it coded as 1/2 (no/yes) or something like that, this will cause Stata to interpret your result as not varying. The best practice is typically to code dichotomous variables as 0 = no, 1 = yes. If that's not what you have, recode the variable and try again.
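A minimal sketch of that coding check follows. The 1/2 coding and the new variable name clin1yr01 are purely hypothetical; nothing in this thread shows how clin1yr is actually coded.

```stata
* Show the raw numeric codes of the outcome, including missing values
tabulate clin1yr, missing nolabel

* Hypothetical case: if the codes turn out to be 1 = no, 2 = yes,
* build a 0/1 copy instead of overwriting the original variable
recode clin1yr (1 = 0) (2 = 1), generate(clin1yr01)

* Cross-check the new variable against the old one
tabulate clin1yr clin1yr01, missing
```

Generating a new variable rather than recoding in place keeps the original available if the assumed coding turns out to be wrong.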

Assuming it already is coded as 0/1, the other possibility is that your outcome really is a constant. If the outcome doesn't vary, then nothing can predict it: all the regression coefficients will necessarily be 0 (to within very small rounding errors). Now, it may be that if you run -tab clin1yr-, you will see both 0's and 1's. Here we have to remember that when running any estimation command, Stata looks at only those observations that have non-missing values for all of the variables mentioned in the command. So after the regression runs, you need to run -tab clin1yr if e(sample)- and you will see that for these observations the outcome variable is, in fact, a constant. Most likely this represents some kind of error in your data. It may also be the result of trying to model a rare outcome in a data set that is too small to have any. I notice that in your -asis- version, the sample size is only 13. So if the probability of clin1yr is around 1 in 13 or less, it's not surprising that it will always be 0 (or always 1, whichever is the less rare version) in that sample.
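The same check can be sketched without relying on e(sample), which is convenient when the estimation command stops with an error. The variable name nmiss and the macro name modelvars are introduced here only for illustration, and the `///` continuations assume a do-file.

```stata
* Outcome plus predictors, copied from the command in #3
local modelvars clin1yr education1 support6 sitting side lat_transdegree ///
    nocoronangulation winquist_s op_type interlock_dist cont_up ///
    work_status distance primary_reop

* Number of model variables that are missing in each observation
egen nmiss = rowmiss(`modelvars')

* Complete cases: the only observations an estimation command would use
count if nmiss == 0

* Distribution of the outcome in the full data versus the complete cases
tabulate clin1yr, missing
tabulate clin1yr if nmiss == 0, missing
```

The count of complete cases should correspond to the "Number of obs" that the estimation command reports; if the second table shows a single value of clin1yr, that is the "outcome does not vary" condition.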

Your -asis- version actually confirms all of this reasoning. Look at those coefficients. They are all within rounding error of zero except the constant. The constant term is 33.75, which is an absurdly large constant term for any realistic situation, and its implication is that clin1yr is always non-zero.

So the first thing you have to do is fix all of that up: either you are trying to model an outcome that never actually varies--which is both pointless and impossible, or you are trying to model an outcome that varies only rarely and using a data set far too small to observe that variation. Or you are trying to model this outcome but your data are incorrect.

A couple of other points, assuming you fix up the major problem above: -xi- does nothing in the commands you have used it with. It is also an almost-obsolete command. It has been superseded, for the most part, by factor-variable notation. (-help fvvarlist-) Depending on your variables, you may or may not need that here. The situations where -xi- is really needed in modern Stata are fairly few and far between, so you should probably try to nearly forget that you ever knew it.
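For illustration only, a command written with factor-variable notation might look like the sketch below. It assumes, hypothetically, that education1 and work_status have more than two categories and that the other predictors shown are 0/1 indicators; only a subset of the variables is included to keep the example short.

```stata
* i. marks a categorical predictor, c. marks a continuous one
* (the assignment of i. here is an assumption, not taken from the data)
logit clin1yr i.education1 i.work_status sitting side ///
    c.distance c.lat_transdegree
```

Predictors that are already coded 0/1 can be entered as they are; the i. prefix matters mainly for variables with three or more levels.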

I can't comment on the -swaic- command's error message. It is not part of official Stata, and I don't know anything about it. My guess is that it's an older command that was written before Stata had factor variable notation and that it is unable to cope with things like the o. operator that come with that.

Finally, many will agree that pre-screening variables for bivariate significance as a way of choosing variables to include in a multi-variable model is a bad idea. In addition, nearly all the experienced analysts who participate in this forum agree that stepwise variable selection is a particularly egregious form of statistical malpractice. While some might find doing it with information criteria instead of p-values slightly less problematic, it's still very far from an ideal approach. This is not the place to have a tutorial about best practices for building models: that is a very lengthy topic and several excellent textbooks about it are available.

Added: Crossed with #4 and #5.

Patrick Albright

Join Date: Aug 2018

Posts: 28
#7

29 Aug 2018, 12:46

This is excellent feedback. Thank you very much for the thorough reply. Of the three possible issues you outlined, I think mine is most likely that the outcome itself is fine but the data are incorrect. I will investigate this further.

Thank you for the feedback on the 'xi' and fvvarlist commands as well as on the (in)appropriate use of stepwise regression.

This has been very helpful for my general Stata knowledge.