Help with r(2000) error message for logistic regression

Alice Richardson

Join Date: Nov 2020

Posts: 13
#1

10 Dec 2020, 10:56

Hi,
I've been running a logistic regression with 5 categorical explanatory variables to see which factors predict confidence in police (cjspolb2cat). Today I tried to add some continuous variables, but Stata returned the error 'no observations', r(2000). Can anyone please help me understand what this means?

Here is my code (quallife is the new continuous variable I have added, but I have also tried a few others, which produced the same error message):

logistic cjspolb2cat quallife ib1.persinc4cat ib2.age3cat ib1.sex ib1.ethgrp2a, allbaselevels
margins, dydx(*) allbaselevels

I'm a real beginner to STATA and really appreciate any help anyone can offer!

Thank you,
Alice
Nick Cox

Join Date: Mar 2014

Posts: 35692
#2

10 Dec 2020, 11:03

no observations to do that with

meaning problems of

missing values

AND/OR

string variables where numeric variables are needed

Try

Code:

summarize cjspolb2cat quallife persinc4cat age3cat ex ethgrp2a
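A minimal sketch of checking for both problems named above. It uses Stata's shipped auto dataset purely as a stand-in so that it runs anywhere (that choice is an assumption, not part of the original question); substitute the six variables from #1 when working with the real data.

```stata
* Stand-in data: auto.dta ships with Stata; replace with your own dataset.
sysuse auto, clear

* 1. Storage types: anything shown as str# is a string variable.
describe make price mpg rep78

* 2. Number of missing values in each numeric variable.
misstable summarize price mpg rep78

* 3. Observations with no missing value in any listed variable.
*    missing() returns 1 if any of its arguments is missing.
count if !missing(price, mpg, rep78)
```

If the final count is 0, a model using all of those variables has nothing to fit.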
Alice Richardson

Join Date: Nov 2020

Posts: 13
#3

10 Dec 2020, 11:19

Ah ok, thank you!
Sorry, what does the 'ex' mean in the code?
Nick Cox

Join Date: Mar 2014

Posts: 35692
#4

10 Dec 2020, 11:25

Code:

sex

NB Please don't cite this post out of context.
Alice Richardson

Join Date: Nov 2020

Posts: 13
#5

10 Dec 2020, 11:42

Oh god sorry, I thought it was meant to exclude the variable after it or something!

OK great, I've done that, but I'm not sure how to interpret my results now. I can see that 'quallife' has far fewer observations, so does that mean I should just avoid using this variable in this model?
Also, I'm not sure why, but it has put ethgrp2a on a separate line from the others.

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
 cjspolb2cat |     17,727    .3153382    .4646635          0          1
    quallife |      3,892    3.062179    2.191016          1         10
 persinc4cat |     29,932    1.858446    .9239072          1          4
     age3cat |     35,253    2.212833    .5707147          1          3
         sex |     35,371    1.542676    .4981825          1          2
-------------+---------------------------------------------------------
    ethgrp2a |     35,338    1.230517    .7559222          1          5

Thanks again
Nick Cox

Join Date: Mar 2014

Posts: 35692
#6

10 Dec 2020, 11:49

By default, summarize draws a separator line after every 5 variables. This is documented in the help:

separator(#) draw separator line after every # variables; default is separator(5)

It's nothing to worry about.

The summarize results don't resolve the issue yet. All the variables are numeric (good) but there are missing values (not so good). The issue may be (should be) that there are no observations in which all of the variables are non-missing.

So, count non-missings across observations

Code:

egen nOK = rownonmiss(cjspolb2cat quallife persinc4cat age3cat sex ethgrp2a)
tab nOK

You need this to be 6 some of the time.
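To see why summarize can look fine while the model still fails, here is a small self-contained sketch on made-up data. The variables y, x1 and x2 and the scalar rc_logit are hypothetical names for illustration only. Each variable has some non-missing values, but never in the same observation.

```stata
* Toy data (hypothetical): x1 and x2 are never both non-missing.
clear
input y x1 x2
1 1 .
0 2 .
1 . 3
0 . 4
end

* Each variable on its own has observations ...
summarize y x1 x2

* ... but no observation has all 3 variables non-missing.
egen nOK = rownonmiss(y x1 x2)
tabulate nOK

* logistic uses only complete observations, so it stops with r(2000).
capture noisily logistic y x1 x2
scalar rc_logit = _rc
display "return code: " rc_logit
```

Here nOK is 2 in every observation, never 3, which is the same situation as nOK never reaching 6 with six variables.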
Alice Richardson

Join Date: Nov 2020

Posts: 13
#7

10 Dec 2020, 11:56

This is the result, but I don't understand what each row represents

        nOK |      Freq.     Percent        Cum.
------------+-----------------------------------
          2 |         27        0.08        0.08
          3 |      2,206        6.24        6.31
          4 |     14,849       41.98       48.29
          5 |     18,289       51.71      100.00
------------+-----------------------------------
      Total |     35,371      100.00
Nick Cox

Join Date: Mar 2014

Posts: 35692
#8

11 Dec 2020, 02:07

Do look at

Code:

help egen

for more on the result.

It is the number of variables that are not missing in each observation. Your model fit fails whenever it is not 6, which is always. However, the fact that there are many observations with 5 variables non-missing doesn't mean it is always the same 5 variables.

Perhaps there is some structural reason why values are missing so much.

To make progress, something like this should help.

Code:

gen pattern = ""
foreach v in cjspolb2cat quallife persinc4cat age3cat sex ethgrp2a {
    replace pattern = pattern + string(!missing(`v'))
}
tab pattern

So your ideal is 111111 (not missing on all variables). It doesn't exist in the dataset.

There are 64 possible patterns, although from #7 neither 111111 nor 000000 occurs, and no pattern with only one 1 occurs. What is essential is a pattern that starts with 1 -- not missing on the outcome or response variable. So

Code:

tab pattern if substr(pattern, 1, 1) == "1"

focuses on better patterns.

Your choice is of a pattern that is common but includes all the really important predictors. (If they are really important, the bad news was already implicit in your first post.)
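As an alternative to building the pattern string by hand, Stata's official misstable patterns command tabulates missing-value patterns directly. This is a different technique from the loop above, sketched here on made-up data; y, x1 and x2 are hypothetical names.

```stata
* Toy data (hypothetical): 2 of the 5 observations are complete.
clear
input y x1 x2
1 1 .
0 2 .
1 . 3
0 4 5
1 6 7
end

* In the output, 1 means not missing and 0 means missing,
* the same coding as in the pattern variable.
* frequency reports counts rather than percentages.
misstable patterns y x1 x2, frequency

* Number of observations a model using y, x1 and x2 could use.
count if !missing(y, x1, x2)
```

With the real data, list the outcome and all predictors after misstable patterns to see which combinations of variables are jointly available.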
Alice Richardson

Join Date: Nov 2020

Posts: 13
#9

11 Dec 2020, 02:34

Hi,
Thank you for this. I tried putting your code in but I don't think I understand what you mean/it does, as STATA just said that 'pattern' isn't a variable. I've tried reading up on it but I just don't understand!
Sadly perhaps I should just take this variable out
Nick Cox

Join Date: Mar 2014

Posts: 35692
#10

11 Dec 2020, 02:38

Did you start with

Code:

gen pattern = ""

You can't replace what does not exist.
Alice Richardson

Join Date: Nov 2020

Posts: 13
#11

11 Dec 2020, 02:54

I did, should the following line be in the speech marks? Or is the code exactly as you wrote it?
Nick Cox

Join Date: Mar 2014

Posts: 35692
#12

11 Dec 2020, 03:08

Hmm, yes: the code should be exactly as I wrote it unless you can see that it is wrong and know how to correct it.

The point is to generate a variable that starts out empty. Only then can you replace it, which makes it useful.
Alice Richardson

Join Date: Nov 2020

Posts: 13
#13

11 Dec 2020, 08:11

Sorry I really don't understand! I think I'm going to just abandon using that variable. Really appreciate your help though, thank you!

Nick Cox

Join Date: Mar 2014
Posts: 35692

#14

11 Dec 2020, 09:57

Proof of concept

Code:

. webuse nlswork, clear 

. gen pattern = ""
(28,534 missing values generated)

. foreach v in union wks_ue wks_work tenure ind_code occ_code {
  2. replace pattern = pattern + string(!missing(`v'))
  3. }
(28,534 real changes made)
variable pattern was str1 now str2
(28,534 real changes made)
variable pattern was str2 now str3
(28,534 real changes made)
variable pattern was str3 now str4
(28,534 real changes made)
variable pattern was str4 now str5
(28,534 real changes made)
variable pattern was str5 now str6
(28,534 real changes made)

. tab pattern

    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
     000011 |          2        0.01        0.01
     000100 |          1        0.00        0.01
     000101 |         21        0.07        0.08
     000110 |          1        0.00        0.09
     000111 |        231        0.81        0.90
     001011 |          2        0.01        0.90
     001101 |          1        0.00        0.91
     001111 |         62        0.22        1.12
     010011 |          1        0.00        1.13
     010101 |          5        0.02        1.15
     010110 |          1        0.00        1.15
     010111 |         60        0.21        1.36
     011001 |          3        0.01        1.37
     011010 |          1        0.00        1.37
     011011 |        196        0.69        2.06
     011100 |         14        0.05        2.11
     011101 |        210        0.74        2.85
     011110 |         30        0.11        2.95
     011111 |      8,454       29.63       32.58
     100011 |          5        0.02       32.60
     100101 |          1        0.00       32.60
     100110 |          2        0.01       32.61
     100111 |        286        1.00       33.61
     101011 |         39        0.14       33.75
     101100 |          3        0.01       33.76
     101101 |         18        0.06       33.82
     101110 |         18        0.06       33.88
     101111 |      5,011       17.56       51.44
     110001 |          1        0.00       51.45
     110011 |          1        0.00       51.45
     110101 |          1        0.00       51.45
     110110 |          1        0.00       51.46
     110111 |         82        0.29       51.75
     111011 |        182        0.64       52.38
     111100 |         15        0.05       52.44
     111101 |         47        0.16       52.60
     111110 |         34        0.12       52.72
     111111 |     13,491       47.28      100.00
------------+-----------------------------------
      Total |     28,534      100.00

. count
  28,534

So if you wanted to use all those 6 variables in a model, you can do it -- with 13491 observations out of 28534.

Code you can copy and paste into a do-file editor window:

Code:

webuse nlswork, clear

gen pattern = ""
foreach v in union wks_ue wks_work tenure ind_code occ_code {
    replace pattern = pattern + string(!missing(`v'))
}
tab pattern
count
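A follow-up sketch, assuming the same nlswork example: estimation commands drop incomplete observations automatically, and e(sample) marks the observations that were actually used. The model itself (union as outcome, the two code variables entered as factor variables) is only an illustration and is not taken from the thread.

```stata
webuse nlswork, clear

* Illustrative model only: logistic silently omits any observation
* with a missing value in the outcome or in a predictor.
logistic union wks_ue wks_work tenure i.ind_code i.occ_code

* Number of observations used in the fit. It cannot exceed the
* 13,491 complete cases tabulated above, and may be smaller if
* Stata drops perfectly predicted observations.
display e(N)

* e(sample) is 1 for the observations used, so this count matches e(N).
count if e(sample)
```

Comparing e(N) with the total from count shows how much of the dataset the model is really based on.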
