  • Dropping observations based on multiple conditions

    Hi,

    I am trying to correct for small cell sizes in my dataset, with some exceptions. Specifically, some cells of a variable I am interested in are small, so I need to drop the cells that have fewer than 20 observations. I do this with the code below:
    Code:
    gen one=1
    egen cellsize=count(one), by (variableofinterest)
    drop if cellsize<20
    However, I want to keep a selection of cells that have fewer than 20 observations, so I need to tell Stata to drop the small cells except the ones I specify. I tried the code below, but it does not do what I want: it drops everything in my dataset.

    Code:
    drop if cellsize<20 &variableofinterest!=5 |variableofinterest!=35|variableofinterest!=84
    Any suggestions on how I can make this work?

    Thanks in advance.

  • #2
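    Stata evaluates & before |, so your condition is read as (cellsize < 20 & variableofinterest != 5) | variableofinterest != 35 | variableofinterest != 84. Every observation differs from at least one of 35 and 84, so the condition is always true and everything is dropped. Use parentheses to control the evaluation, and join the exclusions with & rather than |, because an observation should be dropped only if it is in none of the cells you want to keep. Try the following and see whether it does what you want.
    Code:
    drop if cellsize < 20 & (variableofinterest != 5 & variableofinterest != 35 & variableofinterest != 84)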
    or
    Code:
    drop if cellsize < 20 & !inlist(variableofinterest, 5, 35, 84)
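To see the difference before dropping anything, one option is to build indicator variables on a toy dataset and compare them. This is only a sketch: the values below and the threshold of 3 (standing in for 20) are made up for illustration, and drop_orig and drop_fixed are hypothetical variable names.

```stata
* Toy data, hypothetical values for illustration only
clear
input variableofinterest
5
5
35
7
7
7
9
end

* Cell size per value of variableofinterest
bysort variableofinterest : gen cellsize = _N

* Threshold of 3 instead of 20 to keep the example small.
* Original condition: & binds tighter than |, so this is 1 in every row
gen byte drop_orig = cellsize < 3 & variableofinterest != 5 | variableofinterest != 35 | variableofinterest != 84

* inlist() version: 1 only for small cells outside the keep list (here, 9)
gen byte drop_fixed = cellsize < 3 & !inlist(variableofinterest, 5, 35, 84)

list, sepby(variableofinterest)
```

Once the indicator marks the rows you expect, the same condition can go straight into drop if.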


    • #3
      The second code works perfectly for what I wanted to do.

      Thank you very much


      • #4
        Note that

        Code:
        bysort variableofinterest : gen cellsize = _N
        is a direct alternative to

        Code:
        gen one=1
        egen cellsize=count(one), by(variableofinterest)
        Here's yet another way to do it:

        Code:
        egen cellsize=count(1), by(variableofinterest)
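A quick way to check that the three approaches agree is to compute all of them side by side and assert equality. The sketch below assumes the auto dataset shipped with Stata, with rep78 standing in for variableofinterest; n1, n2 and n3 are hypothetical names. Note that egen's count() counts nonmissing values of its argument, while _N counts all observations in the group, so the results coincide only because one and 1 are never missing.

```stata
* Example data shipped with Stata; rep78 stands in for variableofinterest
sysuse auto, clear

* Approach 1: count a constant variable within each cell
gen one = 1
egen n1 = count(one), by(rep78)

* Approach 2: group size via _N
bysort rep78 : gen n2 = _N

* Approach 3: count a constant expression directly
egen n3 = count(1), by(rep78)

* Stops with an error if any observation disagrees
assert n1 == n2 & n2 == n3
```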
