  • Drop Variables with all Missing / Negative Values

    Dear Colleagues,

    I am analyzing a large panel survey dataset of family members with over 5000 variables. Since my analysis includes only children and youth, I dropped the records for adults. However, many variables that are asked only of adults, and so are not relevant to my analysis of children and youth, are still in the data. Most of these variables have invalid values (unknown, not applicable) for youth/children. I would like to find and drop these variables; however, since there are so many of them, it is impractical for me to identify and drop them using the "drop var list" command.

    I know there are commands like "dropmiss" that can be used to drop variables with all missing values. But the variables I would like to drop have negative values such as -1 or -8 to indicate unknown or not applicable values, instead of missing ".". Besides, I don't want to recode the negative values of -1 and -8 to missing for all variables (doing so could affect the variables for children and youth).

    Would you please suggest any commands or code I can use to identify and drop these variables with all negative and/or missing values?

    Thanks in advance.

    Lijun





  • #2
    Code:
    foreach v of varlist _all {
        egen to_drop = min(`v' < 0 | missing(`v'))
        drop `v' if to_drop[1]
        drop to_drop
    }



    • #3
      Perhaps you can simply keep the ones that you need for your analysis?

      Or
      Code:
      use var1 var2 var3 using "bigfile.dta"
      ...
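
      The same command can also filter observations while loading, so the adult records never need to be read in. A minimal sketch, assuming hypothetical variable names (var1, var2, age) and the placeholder file name used above; the age cutoff is only an illustration:

      ```stata
      * Peek at the variable names in the file without loading any data
      describe using "bigfile.dta", varlist
      display "`r(varlist)'"

      * Load only the chosen variables and only the rows meeting the condition
      * (age is a hypothetical variable name; adjust to your own data)
      use var1 var2 age if age < 18 using "bigfile.dta", clear
      ```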



      • #4
        Try this one

        Code:
        foreach var of varlist * {
            summ `var', meanonly
            if missing(r(mean)) | r(min)<=-1 drop `var'
        }



        • #5
          This was nothing short of a spontaneous explosion of (good? not so good?) ideas :-).

          Hi Clyde Schechter, good to see you again!



          • #6
            I do not think your code is going to work, Clyde. I tried something like this first, and it did not work:

            Code:
            . sysuse auto, clear
            (1978 Automobile Data)
            
            . gen todrop = 1
            
            . drop price if todrop
            invalid syntax
            r(198);
            So -drop- works in either of two ways, but not in both at once:

            Code:
            drop varname
            Code:
            drop if expression
            But there is no
            Code:
            drop var if expression



            • #7
              You are quite correct. My error. Instead of -drop `v' if to_drop[1]-, I should have written

              Code:
              if to_drop[1] {
                  drop `v'
                  drop to_drop
              }
              I think I like your approach in #4 better, though, anyway. But I think you mean `r(max)' <= -1, not `r(min)'.
              Last edited by Clyde Schechter; 24 Jul 2020, 18:21.



              • #8
                Code:
                if to_drop[1] {
                    drop `v'
                    drop to_drop
                }
                In that case, shouldn't -drop to_drop- be outside the if-branch?
                Otherwise -egen- will bump into an existing variable at the next iteration, right?



                • #9
                  Right!
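
                  Putting #7 and #8 together, the whole loop might look like the sketch below. It assumes no variable named to_drop already exists in the data, and it restricts the loop to numeric variables with -ds-, because comparing a string variable with 0 would stop the loop with a type mismatch error.

                  ```stata
                  * Collect the numeric variables only (assumption: string variables are kept as is)
                  ds, has(type numeric)

                  foreach v of varlist `r(varlist)' {
                      * to_drop is 1 in every row only if all values of `v' are negative or missing
                      egen byte to_drop = min(`v' < 0 | missing(`v'))
                      if to_drop[1] {
                          drop `v'
                      }
                      * Always remove the helper so -egen- can recreate it on the next pass
                      drop to_drop
                  }
                  ```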



                  • #10
                    Yes indeed, I meant -max-. (In this case it is not going to make a difference: as far as I understand the data, the bad variables that are to be dropped are either all missing or constant at -1 or -8. Therefore both the -mean- and the -max- below can be replaced by any of -min-, -mean- or -max-.)

                    Code:
                    foreach var of varlist * {
                        summ `var', meanonly
                        if missing(r(mean)) | r(max)<=-1 drop `var'
                    }
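
                    One caveat with this loop: -summarize- reports zero observations for a string variable, so the missing(r(mean)) test is also true for string variables and they would be dropped too. A hedged variant, assuming string variables should be kept, restricts the loop to numeric variables with -ds-. It also tests r(N) directly and uses r(max) < 0 rather than <= -1, which is an assumption that any negative value, integer or not, is an invalid code.

                    ```stata
                    * Collect numeric variables only, so string variables are left alone
                    ds, has(type numeric)

                    foreach var of varlist `r(varlist)' {
                        summarize `var', meanonly
                        * r(N) == 0: every value is missing
                        * r(max) < 0: the largest nonmissing value is negative, so all are
                        if r(N) == 0 | r(max) < 0 {
                            drop `var'
                        }
                    }
                    ```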





                    • #11
                      As yet another variation, findname (Stata Journal) can be used first:


                      Code:
                      findname, all(@ < 0 | missing(@)) 
                      drop `r(varlist)'
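
                      findname is community-contributed and has to be installed before use; -search findname, sj- should locate it. Also, -drop- with an empty variable list is an error, so a guard may be worth adding in case nothing matches. A small sketch (the local macro name bad is just an illustration):

                      ```stata
                      * Locate and install findname first if needed:
                      * search findname, sj

                      findname, all(@ < 0 | missing(@))
                      local bad `r(varlist)'

                      * Show what is about to be dropped, and drop only if anything matched
                      if "`bad'" != "" {
                          display "`bad'"
                          drop `bad'
                      }
                      ```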



                      • #12
                        Dear Colleagues,
                        Thanks a lot for your good suggestions. The findname command served my purpose well. Now the number of variables has been greatly reduced.

