  • Help with r(2000) error message for logistic regression

    Hi,
    I've been running a logistic regression with 5 categorical explanatory variables, to see which factors predict confidence in police (cjspolb2cat). Today I tried to add some continuous variables, but STATA returned the error message 'no observations, r(2000)'. Can anyone please help me understand what this means?

    here is my code (quallife is the new continuous variable I have added in, but I have also tried a few others which produced the same error message):

    logistic cjspolb2cat quallife ib1.persinc4cat ib2.age3cat ib1.sex ib1.ethgrp2a, allbaselevels
    margins, dydx(*) allbaselevels

    I'm a real beginner to STATA and really appreciate any help anyone can offer!

    Thank you,
    Alice

  • #2
    no observations to do that with

    meaning problems of

    missing values

    AND/OR

    string variables where numeric variables are needed

    Try

    Code:
    summarize cjspolb2cat quallife persinc4cat age3cat ex ethgrp2a

    • #3
      Ah ok , thank you!
      Sorry what does the 'ex' mean in the code?

      • #4
        Code:
        sex

        NB Please don't cite this post out of context.

        • #5
          Oh god sorry, I thought it was meant to exclude the variable after it or something!

          Ok great I've done that, but I'm not sure how to interpret my results now - I can see that 'quallife' has far fewer observations, so does that mean I should just avoid using this variable in this model?
          Also I'm not sure why but it has put ethgrp2a on a separate line to the others

              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
           cjspolb2cat |     17,727    .3153382    .4646635          0          1
              quallife |      3,892    3.062179    2.191016          1         10
           persinc4cat |     29,932    1.858446    .9239072          1          4
               age3cat |     35,253    2.212833    .5707147          1          3
                   sex |     35,371    1.542676    .4981825          1          2
          -------------+---------------------------------------------------------
              ethgrp2a |     35,338    1.230517    .7559222          1          5



          Thanks again

          • #6
            summarize by default draws separation lines every 5 variables. This is documented in the help.

            separator(#) draw separator line after every # variables; default is separator(5)
            It's nothing to worry about.

            The summarize results don't resolve the issue yet. All the variables are numeric (good) but there are missing values (not so good). The issue may be (should be) that there are no observations in which all of the variables are non-missing.

            So, count non-missings across observations


            Code:
            egen nOK = rownonmiss(cjspolb2cat quallife persinc4cat age3cat sex ethgrp2a)
            
            tab nOK
            You need this to be 6 some of the time.
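
A more direct check is to count the observations that are complete on every model variable. The following is a minimal sketch on made-up toy data; `y`, `x1` and `x2` are hypothetical names, not the variables in this thread. A count of 0 here would be consistent with the r(2000) message.

```stata
* Toy data (hypothetical names and values): y is the outcome, x1 and x2 are predictors
clear
input y x1 x2
1 2 .
0 . 3
1 5 4
. 1 2
0 3 1
end

* missing() with several arguments returns 1 if any argument is missing,
* so !missing(...) selects observations that are complete on all variables
count if !missing(y, x1, x2)
```

With your own data, the variable names from the model would go inside `missing()`, separated by commas.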

            • #7
              This is the result, but I don't understand what each row represents


                      nOK |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                        2 |         27        0.08        0.08
                        3 |      2,206        6.24        6.31
                        4 |     14,849       41.98       48.29
                        5 |     18,289       51.71      100.00
              ------------+-----------------------------------
                    Total |     35,371      100.00

              • #8
                Do look at

                Code:
                help egen 
                for more on the result.

                It is the number of variables that are not missing in each observation. Your model fit fails whenever it is not 6, which is always. However, the fact that many observations have 5 variables non-missing doesn't mean it is always the same 5 variables.

                Perhaps there is some structural reason why values are missing so much.

                To make progress, something like this should help.


                Code:
                gen pattern = "" 
                
                foreach v in cjspolb2cat quallife persinc4cat age3cat sex ethgrp2a {
                       replace pattern = pattern + string(!missing(`v'))  
                } 
                
                tab pattern
                So your ideal is 111111 (not missing on all variables). It doesn't exist in the dataset.

                There are 64 possible patterns, although from #7 neither 111111 nor 000000 occurs, and no pattern with only one 1 occurs. What is essential is a pattern that starts with 1, that is, not missing on the outcome or response variable. So

                Code:
                tab pattern if substr(pattern, 1, 1) == "1"
                focuses on better patterns.

                Your choice is of a pattern that is common but includes all the really important predictors. (If they are really important, the bad news was already implicit in your first post.)
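
As an alternative to building the pattern variable by hand, Stata's built-in `misstable patterns` command reports much the same information. The following is only a sketch on made-up toy data (`y`, `x1` and `x2` are hypothetical names); in its output 1 means not missing and 0 means missing, matching the coding used above.

```stata
* Toy data (hypothetical names and values)
clear
input y x1 x2
1 2 .
0 . 3
1 5 4
. 1 2
0 3 1
1 . .
end

* Tabulate missing-value patterns:
*   asis      keeps the variables in the order listed
*   frequency reports counts rather than percentages
misstable patterns y x1 x2, asis frequency
```

The number of fully complete observations is also left behind in `r(N_complete)`, which is the number an estimation command could use if all the listed variables were in the model.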

                • #9
                  Hi,
                  Thank you for this. I tried putting your code in but I don't think I understand what you mean/it does, as STATA just said that 'pattern' isn't a variable. I've tried reading up on it but I just don't understand!
                  Sadly perhaps I should just take this variable out

                  • #10
                    Did you start with


                    Code:
                    gen pattern = "" 
                    You can't replace what does not exist.

                    • #11
                      I did, should the following line be in the speech marks? Or is the code exactly as you wrote it?

                      • #12
                        Hmm, yes: the code should be exactly as I wrote it unless you can see that it is wrong and know how to correct it.

                        The point is to generate a variable that starts out empty. Only then can you replace it, which makes it useful.

                        • #13
                          Sorry I really don't understand! I think I'm going to just abandon using that variable. Really appreciate your help though, thank you!

                          • #14
                            Proof of concept


                            Code:
                            . webuse nlswork, clear 
                            
                            . gen pattern = ""
                            (28,534 missing values generated)
                            
                            . foreach v in union wks_ue wks_work tenure ind_code occ_code {
                              2. replace pattern = pattern + string(!missing(`v'))
                              3. }
                            (28,534 real changes made)
                            variable pattern was str1 now str2
                            (28,534 real changes made)
                            variable pattern was str2 now str3
                            (28,534 real changes made)
                            variable pattern was str3 now str4
                            (28,534 real changes made)
                            variable pattern was str4 now str5
                            (28,534 real changes made)
                            variable pattern was str5 now str6
                            (28,534 real changes made)
                            
                            . tab pattern
                            
                                pattern |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                                 000011 |          2        0.01        0.01
                                 000100 |          1        0.00        0.01
                                 000101 |         21        0.07        0.08
                                 000110 |          1        0.00        0.09
                                 000111 |        231        0.81        0.90
                                 001011 |          2        0.01        0.90
                                 001101 |          1        0.00        0.91
                                 001111 |         62        0.22        1.12
                                 010011 |          1        0.00        1.13
                                 010101 |          5        0.02        1.15
                                 010110 |          1        0.00        1.15
                                 010111 |         60        0.21        1.36
                                 011001 |          3        0.01        1.37
                                 011010 |          1        0.00        1.37
                                 011011 |        196        0.69        2.06
                                 011100 |         14        0.05        2.11
                                 011101 |        210        0.74        2.85
                                 011110 |         30        0.11        2.95
                                 011111 |      8,454       29.63       32.58
                                 100011 |          5        0.02       32.60
                                 100101 |          1        0.00       32.60
                                 100110 |          2        0.01       32.61
                                 100111 |        286        1.00       33.61
                                 101011 |         39        0.14       33.75
                                 101100 |          3        0.01       33.76
                                 101101 |         18        0.06       33.82
                                 101110 |         18        0.06       33.88
                                 101111 |      5,011       17.56       51.44
                                 110001 |          1        0.00       51.45
                                 110011 |          1        0.00       51.45
                                 110101 |          1        0.00       51.45
                                 110110 |          1        0.00       51.46
                                 110111 |         82        0.29       51.75
                                 111011 |        182        0.64       52.38
                                 111100 |         15        0.05       52.44
                                 111101 |         47        0.16       52.60
                                 111110 |         34        0.12       52.72
                                 111111 |     13,491       47.28      100.00
                            ------------+-----------------------------------
                                  Total |     28,534      100.00
                            
                            . count
                              28,534
                            So if you wanted to use all those 6 variables in a model, you can do it -- with 13491 observations out of 28534.

                            Code you can copy and paste into a do-file editor window:


                            Code:
                            webuse nlswork, clear
                            
                            gen pattern = ""
                            foreach v in union wks_ue wks_work tenure ind_code occ_code {
                                replace pattern = pattern + string(!missing(`v'))
                            }
                            tab pattern
                            count
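
To see the same idea from the model's side, here is a minimal sketch using the auto dataset that ships with Stata. The dataset and model are chosen purely for illustration and are not from this thread. Estimation commands drop any observation with a missing value on any model variable, and `e(sample)` marks the observations that were actually used.

```stata
* auto.dta ships with Stata; rep78 has some missing values
sysuse auto, clear

* The fit uses only observations complete on foreign, mpg and rep78
logit foreign mpg rep78

* e(sample) is 1 for observations used in the most recent fit
count if e(sample)

* The same number, counted directly from the data
count if !missing(foreign, mpg, rep78)
```

If the two counts agree and are well below the total number of observations, missing values are what is shrinking the estimation sample. If the second count is 0, the model cannot be fitted at all, which is the situation in #1.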
