  • keep one observation within a group

    Dear All, Suppose that I have a dataset
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id str1 typrep str3 indu float(roa roe bsdt)
    1 "A" "J66" .008948 .294976 17622
    1 "A" "J66"       .       . 17713
    1 "B" "J66"  .00971 .297848 17713
    1 "A" "J66" .010469 .306175 17805
    1 "B" "J66"       .       . 17805
    1 "A" "J66" .001485 .041761 17897
    1 "A" "J66" .001572 .047049 17987
    2 "A" "K70" .069677 .203744 17622
    2 "B" "K70" .042736 .090677 17622
    2 "A" "K70"       .       . 17713
    2 "B" "K70" .088329 .181697 17713
    2 "A" "K70" .054147 .176788 17805
    2 "B" "K70" .075659 .152047 17805
    end
    format %td bsdt
    There are two kinds of reports, `typrep' = A or B. I wish to keep only one report, A or B, for each `id' and `bsdt'.
    1. If `roa' is available for both A and B, choose A.
    2. If `roa' is missing for B, keep A.
    3. if `roa' is missing for A, Keep A.
    Any suggestions? Thanks.
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    Code:
    bys id bsdt (typrep): keep if _n==1



    • #3
      Dear Jorrit, my bad, I had a typo. Rule 3 should read: if `roa' is missing for A, keep B.

      Ho-Chuan (River) Huang
      Stata 19.0, MP(4)



      • #4
        Yeah, that makes more sense actually.
        Here's code, with example data that covers two extra situations: A has a missing `roa' but is the only observation for that id and date, and both A and B have a missing `roa' for the same id and date.
        In both cases, A is kept with a missing value for roa.
        Is that what you would want?

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long id str1 typrep str3 indu float(roa roe bsdt)
        1 "A" "J66" . .294976 17622
        1 "A" "J66"       .       . 17713
        1 "B" "J66"  . .297848 17713
        1 "A" "J66" .010469 .306175 17805
        1 "B" "J66"       .       . 17805
        1 "A" "J66" .001485 .041761 17897
        1 "A" "J66" .001572 .047049 17987
        2 "A" "K70" .069677 .203744 17622
        2 "B" "K70" .042736 .090677 17622
        2 "A" "K70"       .       . 17713
        2 "B" "K70" .088329 .181697 17713
        2 "A" "K70" .054147 .176788 17805
        2 "B" "K70" .075659 .152047 17805
        end
        format %td bsdt
        
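        * roamax is missing only when every roa in the id-bsdt group is missing
        egen roamax = max(roa), by(id bsdt)
        * sort by typrep so that _n==1 is the A report; drop A if its roa is missing but B's is not
        bys id bsdt (typrep): drop if roa==. & roamax!=roa & _n==1
        bys id bsdt (typrep): keep if _n==1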
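
        The same selection rule can also be sketched without a helper maximum, by sorting each id-bsdt group so that non-missing roa comes first and A comes before B, then keeping the first row. This is only a sketch: roamiss is a made-up helper variable, the rows are a subset of the example data above with just the variables needed, and it assumes the rule is decided by roa alone.

        ```stata
        clear
        input long id str1 typrep float(roa bsdt)
        1 "A"       . 17622
        1 "A"       . 17713
        1 "B"       . 17713
        1 "A" .010469 17805
        1 "B"       . 17805
        2 "A"       . 17713
        2 "B" .088329 17713
        2 "A" .054147 17805
        2 "B" .075659 17805
        end
        format %td bsdt

        * roamiss is 0 when roa is available and 1 when it is missing (hypothetical helper)
        gen byte roamiss = missing(roa)
        * within each id-bsdt group: available roa first, then A before B; keep the first row
        bysort id bsdt (roamiss typrep): keep if _n == 1
        drop roamiss
        list, sepby(id)
        ```

        With these rows, B survives only for id 2 on bsdt 17713, where A's roa is missing and B's is not; every other group keeps A, including the groups where roa is missing throughout.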



        • #5
          Dear Jorrit, Thanks for the suggestion. Similar results can be obtained by the following code as well.
          Code:
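          * n = number of non-missing roa values within each id-bsdt group
          bysort id bsdt (typrep): egen n = count(roa)
          * both reports have roa: drop B
          drop if typrep == "B" & n == 2
          * the default mean skips missing values; typrep and indu are not kept
          collapse roa roe, by(id bsdt)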
          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)
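
          A variant of the count-based idea that avoids collapse, so that typrep and indu stay in the data. This is only a sketch: it assumes at most one A and one B report per id and bsdt, and the rows are a subset of the example data from #1.

          ```stata
          clear
          input long id str1 typrep str3 indu float(roa roe bsdt)
          1 "A" "J66" .008948 .294976 17622
          1 "A" "J66"       .       . 17713
          1 "B" "J66"  .00971 .297848 17713
          1 "A" "J66" .010469 .306175 17805
          1 "B" "J66"       .       . 17805
          2 "A" "K70" .069677 .203744 17622
          2 "B" "K70" .042736 .090677 17622
          end
          format %td bsdt

          * n = number of non-missing roa values within each id-bsdt group
          bysort id bsdt: egen n = count(roa)
          * both reports have roa: drop B
          drop if typrep == "B" & n == 2
          * exactly one report has roa: drop the one without it
          drop if missing(roa) & n == 1
          * no report has roa: keep A if it exists, otherwise B
          bysort id bsdt (typrep): keep if _n == 1
          drop n
          * stops with an error unless id and bsdt now identify the rows uniquely
          isid id bsdt
          ```

          Because rows are dropped rather than averaged, roe always comes from the same report as the retained roa.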
