  • Dropping multiple missing observations

    I'm working on a project that uses item response theory. I have 8 variables, observed for a large number of cases, from which I would like to estimate the latent variable. The problem is that quite a lot of data are missing. I have already worked out how to impute some of the values, but I now want to fit another model in which I drop every observation that has missing values on more than 4 of the 8 variables, because imputing these could be seen as unreliable. I've looked through several manuals but can't find what I'm looking for.

  • #2
    Code:
    egen count= rowmiss(var1 var2 var3... var8)
    drop if count>4
    See

    Code:
    help egen



    • #3
      See
      Code:
      help egen
      to learn about the rowmiss() and rownonmiss() functions. It doesn't matter which you use. Similarly, drop or keep according to taste or whim.
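
      As a minimal sketch of that equivalence, assume the eight variables are numeric, are named var1 to var8 (hypothetical names), and sit next to each other in the dataset, so that the range var1-var8 picks up exactly those eight. Under those assumptions either option below should leave the same observations; pick one.

      Code:
      * Option A: count missing values per observation, then drop
      egen nmiss = rowmiss(var1-var8)
      drop if nmiss > 4

      * Option B: count nonmissing values per observation, then keep
      * (more than 4 of 8 missing is the same as fewer than 4 nonmissing)
      egen nvalid = rownonmiss(var1-var8)
      keep if nvalid >= 4
      If any of the variables are string, rownonmiss() needs the strok option.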



      • #4
        Thank you very much!



        • #5
          Hi,

          Why doesn't this work as an alternative to egen rowmiss?

          local vlist v1 v2 v3 v4

          //Generating variable to capture the number of variables missing for
          //a given observation
          gen vars_missing =.
          gen tot_vars =.
          local i =1
          local j =1

          foreach ob of local vlist {
          //Counting the total no of variables
          replace tot_vars =`i'

          //Counting the total no of variables missing for a given observation
          replace vars_missing =`j' if `ob' ==""
          local i =`i'+1
          if `ob' ==""{
          local j =`j'+1
          }

          }

          drop if var_missing ==tot_vars

          Thanks!



          • #6
            To get good answers to questions on code that isn't working, you should:

            1. Provide an example dataset (FAQ Advice #12). I can debug this without a data example, but a data example helps with every problem.

            2. Display code nicely using CODE delimiters (FAQ Advice #12). That makes it easier to read.

            3. Explain what is not working rather than just reporting that it doesn't work, which is explicitly advised against (FAQ Advice #12).

            Notice the repetition here: every point refers to FAQ Advice #12, https://www.statalist.org/forums/help#stata

            Here I copy your code and first simplify the code for the number of variables, which is just a constant you can get directly. Then I skip to a cleaned-up version and comment on the problems in your code.

            Code:
            * version 1
            
            local vlist v1 v2 v3 v4
            
            //Generating variable to capture the number of variables missing for
            //a given observation
            gen vars_missing =.
            gen tot_vars =.
            local i =1
            local j =1
            
            foreach ob of local vlist {
            //Counting the total no of variables
            replace tot_vars =`i'
            
            //Counting the total no of variables missing for a given observation
            replace vars_missing =`j' if `ob' ==""
            local i =`i'+1
            if `ob' ==""{
            local j =`j'+1
            }
            
            }
            
            drop if var_missing ==tot_vars
            
            * version 2: the total number of variables is a constant, which we can calculate directly
            
            local vlist v1 v2 v3 v4
            
            //Generating variable to capture the number of variables missing for
            //a given observation
            gen vars_missing =.
            local j =1
            
            foreach ob of local vlist {
            
            //Counting the total no of variables missing for a given observation
            replace vars_missing =`j' if `ob' ==""
            if `ob' ==""{
            local j =`j'+1
            }
            
            }
            
            drop if var_missing == `: word count `vlist''
            
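            * version 3: get the logic right
            
            local vlist v1 v2 v3 v4
            
            * start the count at zero in every observation
            gen vars_missing = 0
            
            foreach ob of local vlist {
                * missing() returns 1 if the value is missing and 0 otherwise;
                * it accepts both numeric and string variables
                replace vars_missing = vars_missing + missing(`ob')
            }
            
            * drop observations in which every variable in the list is missing
            drop if vars_missing == `: word count `vlist''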
            Error. Your code tests variables against string missing "", so it is limited to string variables: applied to any numeric variable, it will fail with a type mismatch.

            Error. Your loop sets up a local macro j, which seems intended to run from 1 to the number of variables and which you then use in the result. But that confuses two different questions: whether (e.g.) the third variable looked at is missing, and whether (e.g.) three variables have missing values.

            Error. You advance the local macro j using this test

            Code:
            if `ob' ==""
            but that is wrong in two different ways. If you need a counter for which variable you are looking at (you don't, but put that on one side), you should be advancing it unconditionally. Also, and more subtly, the command above looks only at the first observation. For more on that, see https://www.stata.com/support/faqs/p...-if-qualifier/
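
            To illustrate the second point with a rough sketch, assume a string variable named v1, as in the code above. The if command evaluates its expression once, and a bare variable name there is taken to mean its value in the first observation. The if qualifier is evaluated separately for every observation.

            Code:
            * if command: one decision for the whole dataset; v1 here means v1[1]
            if v1 == "" {
                display "v1 is missing in observation 1"
            }

            * if qualifier: applied observation by observation
            count if v1 == ""
            The count command reports how many observations have v1 missing, which is the kind of per-observation test the loop needed.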

            Error. You define vars_missing but then refer to var_missing -- which is a typo.

            I have not tested the last version of the code above, but I think it's more nearly right than the code you posted.
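
            One way to check version 3 would be to run it on a small made-up dataset. The sketch below assumes toy values invented purely for illustration, with two numeric and two string variables so that missing() is exercised on both types.

            Code:
            clear
            * toy data (hypothetical): row 1 complete, row 2 partly missing, row 3 all missing
            input v1 v2 str1 v3 str1 v4
            1 2 "c" "d"
            1 . "c" ""
            . . ""  ""
            end

            local vlist v1 v2 v3 v4
            gen vars_missing = 0
            foreach ob of local vlist {
                replace vars_missing = vars_missing + missing(`ob')
            }

            * the third row should have all four variables missing
            assert vars_missing == 4 in 3

            drop if vars_missing == `: word count `vlist''

            * only the all-missing row should have been dropped
            assert _N == 2
            If both assert commands pass silently, the loop is counting missing values per observation as intended, at least on this toy example.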


            Last edited by Nick Cox; 30 Jun 2020, 02:45.

