  • selecting and sorting based on various conditions

    Hello everyone, I have been stuck on this task for too long. Could you help me through it, please?



    For id 928, I want to keep all observations, because the proportion of positives (prop1) on the same day is 0.5.
    For id 205209, within each day I want to drop the observations whose prop1 is smaller than the largest prop1 on that day, while keeping every observation with rep1 == . (missing).
    The same applies to ids 842938 and 1359840, again keeping every observation with rep1 == .

    After that, I want to deal with ids 217851 and 755961: I want to keep only one observation from each set of identical repeated measures (duplicates drop id testdate testresult, force), but still keep every observation with rep1 == .




    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double id long testdate byte testresult float(startdate enddate rep countbyindv countsameday num1 prop1 rep1) byte tag_dup
        928 18359 0 18291 18581 0 1 1 0        0 . 0
        928 18459 0 18291 18581 0 2 1 0        0 . 0
        928 18581 0 18291 18581 0 3 1 0        0 1 0
        928 18581 1 18291 18581 1 4 2 1       .5 1 0
     205209 18444 1 18328 18588 0 1 1 1        1 . 0
     205209 18501 1 18328 18588 0 2 1 1        1 . 0
     205209 18585 0 18328 18588 0 3 1 0        0 1 0
     205209 18585 1 18328 18588 1 5 3 2 .6666667 1 1
     217851 19058 0 18989 19257 0 1 1 0        0 . 0
     217851 19220 1 18989 19257 0 2 1 1        1 . 0
     217851 19221 0 18989 19257 0 3 1 0        0 1 1
     217851 19221 0 18989 19257 1 4 2 0        0 1 1
     755961 18938 0 18853 19150 0 1 1 0        0 . 0
     755961 19026 0 18853 19150 0 2 1 0        0 . 0
     755961 19149 0 18853 19150 0 3 1 0        0 1 1
     755961 19149 0 18853 19150 1 4 2 0        0 1 1
     842938 17310 0 17132 17313 0 1 1 0        0 . 0
     842938 17311 0 17132 17313 0 2 1 0        0 1 0
     842938 17311 1 17132 17313 1 4 3 2 .6666667 1 1
    1359840 18266 0 18150 18401 0 1 1 0        0 . 0
    1359840 18401 0 18150 18401 0 2 1 0        0 1 0
    1359840 18401 1 18150 18401 1 4 3 2 .6666667 1 1
    end
    format %tdD_m_Y testdate
    format %tdD_m_Y startdate
    format %tdD_m_Y enddate
    label values testresult posneg
    label def posneg 0 "negative", modify
    label def posneg 1 "positive", modify

  • #2
    For id 928, I'm thinking you mean "drop any observations for which the proportion is not 0.5," in which case you could do this:
    Code:
    // prop1 is stored as a float, so compare it against float(0.5) rather than the bare literal.
    // 0.5 happens to be exactly representable, so it would not bite here, but the habit is worth keeping.
    drop if (id == 928) & (prop1 != float(0.5))
    For id 205209, I'm guessing you mean "drop any of its observations for which prop1 is less than the biggest observed value for the day for this id *unless* rep1 == . for that observation."
    Code:
    // The maximum has to be taken within each id and day, not over the whole id.
    egen daymax = max(prop1), by(id testdate)
    drop if (id == 205209) & (prop1 < daymax) & (rep1 != .)
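    The same idea could probably be extended to the other ids in the question. What follows is only a minimal sketch under my reading of the request: it uses a reduced copy of the posted example (only id, testdate, testresult, prop1 and rep1), and the variable names maxprop_day and extra are made up for illustration. It leaves id 928 untouched, treats 842938 and 1359840 like 205209, and for 217851 and 755961 drops the surplus copies by hand instead of with -duplicates drop-, so that observations with rep1 == . are never removed.

```stata
* Reduced copy of the posted example data (assumption: the other variables are not needed here)
clear
input double id long testdate byte testresult float(prop1 rep1)
    928 18359 0        0 .
    928 18459 0        0 .
    928 18581 0        0 1
    928 18581 1       .5 1
 205209 18444 1        1 .
 205209 18501 1        1 .
 205209 18585 0        0 1
 205209 18585 1 .6666667 1
 217851 19058 0        0 .
 217851 19220 1        1 .
 217851 19221 0        0 1
 217851 19221 0        0 1
 755961 18938 0        0 .
 755961 19026 0        0 .
 755961 19149 0        0 1
 755961 19149 0        0 1
 842938 17310 0        0 .
 842938 17311 0        0 1
 842938 17311 1 .6666667 1
1359840 18266 0        0 .
1359840 18401 0        0 1
1359840 18401 1 .6666667 1
end

* Largest prop1 within each id and day (maxprop_day is an illustrative name)
egen float maxprop_day = max(prop1), by(id testdate)

* For the three listed ids, drop same-day observations below that day's maximum,
* but never drop an observation with rep1 == .
drop if inlist(id, 205209, 842938, 1359840) & prop1 < maxprop_day & !missing(rep1)

* For the two listed ids, flag every copy after the first within identical
* id/testdate/testresult groups; rep1 is sorted within group so missing values come last
bysort id testdate testresult (rep1): gen byte extra = _n > 1

* Drop the flagged copies, again protecting observations with rep1 == .
drop if inlist(id, 217851, 755961) & extra & !missing(rep1)

drop maxprop_day extra
```

    On the example data this should remove one observation from each of the five listed ids and nothing from id 928. If your real data contain other ids that need the same treatment, the inlist() conditions would have to be widened or removed.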
