  • Screening variables for excess missing data

    I have a large data set in which various missing codes have been established, from -8 to -2. Examining variables in the browser shows that very few have actual responses. Tallying them with tabulate gives me frequencies such as 37 good responses in 4900 cases. The missing values mostly indicate that the question was not asked. Can I write a routine that would cycle through all variables, compute the mean of each, and discard those whose mean is below some cutoff, perhaps -7.5? I know how to set up the "for each" procedure, but am not sure how to feed the mean (perhaps from the command "mean") to the loop.

    for each var in varlist v1-v3000
    drop if [MEAN OF] `var' <-7.5

    I do not know the syntax well enough to write this test successfully.

    Thanks!

  • #2
    I'm not sure I completely understand, and there are several typos in what you offer, but I think this is what you want:
    Code:
    foreach var of varlist v1-v3000 {
       qui sum `var', meanonly
       drop `var' if r(mean)<-7.5
    }
    Since there was no data sample, the code is untested; please read the Forum FAQ.
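For readers new to Stata: the idea here rests on summarize, meanonly leaving the mean in r(mean), where it can be tested. A minimal sketch on made-up data (the variables q1 and q3 are hypothetical) shows the stored results, including what happens with a variable that is entirely blank (system missing):

```stata
clear
set obs 100
* hypothetical q1: 5 valid answers (1), the other 95 coded -8
generate q1 = cond(_n <= 5, 1, -8)
* hypothetical q3: blank (system missing) throughout
generate q3 = .

quietly summarize q1, meanonly
* the codes -8 count as ordinary numbers, so they pull the mean down
display "q1: N = " r(N) ", mean = " r(mean)

quietly summarize q3, meanonly
* no nonmissing values: r(N) is 0 and r(mean) evaluates to missing
display "q3: N = " r(N) ", mean = " r(mean)
```

Because Stata treats missing as larger than any number, a test such as r(mean) < -7.5 is false for an all-blank variable, so such a variable would be kept by a mean-based rule.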



    • #3
      Thanks Rich! The code sounds right, but...
      Here is my test sample:
      foreach var of varlist plb0591-plb0593 {
      qui sum `var', meanonly
      drop `var' if r(mean)<(-7.6)
      }

      I run this on three variables. The first two have 2000 records of -8 and 2300 blank fields. The third has 4300 records, of which 3969 are -8 and 142 are either +1 or +2.

      This is the error statement:


      . foreach var of varlist plb0591-plb0593 {
      2. qui sum `var', meanonly
      3. drop `var' if r(mean)<(-7.6)
      4. }
      invalid syntax
      r(198);

      If I comment out the "drop" line, the error disappears, but, of course, I do not drop these three variables.

      Reversing the ordering of the "if" statement gives me this:
      foreach var in plb0591-plb0593 {
      sum `var', meanonly
      if r(mean)<(-7.6) {
      drop `var'
      }
      }
      No error is generated, but the drop statement is also not executed. So, I am a bit stuck. TIA
      Larry Nitz



      • #4
        The response to TIA is RTFH! Namely, (please) read the fine help. That's just OTOH, BTW, FYI and so forth.

        The help for drop explains. You can drop variables, or you can drop observations by specifying if or in. You can't combine the two. Rich Goldstein slipped there, as we all do occasionally.
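A minimal sketch of the two separate forms of drop, using the auto dataset that ships with Stata (the variables are only stand-ins for the survey items):

```stata
sysuse auto, clear
* drop observations: rows meeting the condition go, all variables stay
drop if foreign == 1
* drop variables: whole columns go, all remaining rows stay
drop headroom trunk
* "drop trunk if foreign == 1" mixes the two forms and is not legal syntax
```

Inside a loop, the decision about a whole variable therefore belongs to the if command (if condition drop varname), not to the if qualifier, which is what produced the invalid syntax r(198) error reported above.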

        Your last syntax is legal but not what you want. With in, the list is taken literally as one item, so it boils down to a single pass through the loop, which implies

        Code:
        su plb05951-plb0593, meanonly 
        after which the r(mean) accessible refers to the last variable named, i.e. plb0593, and its mean evidently does not qualify.
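A small sketch of the point about r(mean), using the auto dataset that ships with Stata (price and mpg stand in for the plb variables): after summarizing two variables in one call, the stored mean is that of the second.

```stata
sysuse auto, clear
* several variables in one call: r() keeps results for the last one only
quietly summarize price mpg, meanonly
scalar mean_both = r(mean)
* the same statistic for mpg alone, for comparison
quietly summarize mpg, meanonly
display "after price mpg: " scalar(mean_both) "   mpg alone: " r(mean)
```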

        Elements of your earlier syntax help out.

        Code:
        foreach var of varlist plb0591-plb0593 {
            sum `var', meanonly
            if r(mean)<(-7.6)  drop `var'
        }
        The difference between foreach ... of ... and foreach ... in ... is subtle, but bites hard if you get it the wrong way round. It is documented.
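A minimal sketch of that difference, again with the auto dataset shipped with Stata, where price, mpg and rep78 are consecutive variables:

```stata
sysuse auto, clear
* "in": the list is literal text, so one pass with `v' = "price-rep78"
foreach v in price-rep78 {
    display "in: `v'"
}
* "of varlist": the range is expanded first, so passes for price, mpg, rep78
foreach v of varlist price-rep78 {
    display "of varlist: `v'"
}
```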

        Like Rich, I can't test this, and I am not confident that your criterion using the mean is ideal, but I hope this helps.
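The reservation about the mean can be made concrete: with 5 answers of 1 and 95 of -8 in 100 observations, the mean is (5 - 760)/100 = -7.55, which survives a cutoff of -7.6 although only 5 answers are real. One alternative, sketched here under stated assumptions rather than offered as the thread's solution, is to count valid answers directly. It assumes -8 to -2 are the only missing-value codes, treats blanks as not valid, skips string variables, and uses made-up variables q1 and q2 with a hypothetical cutoff of 50 valid answers; all of these would need adjusting to the real data.

```stata
clear
set obs 100
* hypothetical data: q1 has 5 valid answers, q2 has 90
generate q1 = cond(_n <= 5, 1, -8)
generate q2 = cond(_n <= 10, -2, 3)

* hypothetical cutoff: keep a variable only with at least this many valid answers
local minvalid 50

foreach var of varlist q1-q2 {
    * skip string variables, for which the numeric test below would fail
    capture confirm numeric variable `var'
    if _rc continue
    * valid = not missing and outside the missing codes -8 to -2
    quietly count if !missing(`var') & !inrange(`var', -8, -2)
    if r(N) < `minvalid' drop `var'
}
describe, short
```

With the real data the loop would run over v1-v3000 as in the question. Displaying r(N) for each variable before switching on the drop is a cautious first pass.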



        • #5
          There is a typo in my reference to your last syntax, which I hope is not distracting. I am confident about the explanation of principle.



          • #6
            Thank you Rich and Nick! That last procedure works. Sorry I am so dense; I am a bit of a convert from SAS and JMP. Larry Nitz
