  • collapse string?

    Dear All, I found this question here (https://bbs.pinggu.org/forum.php?mod...=1#pid58387315). The data set is
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(patid clmid) str10 fst_dt str30(diag1 diag2 diag3 diag4 diag5 diag6 diag7 diag8)
    33003282913  147531993 "2018-09-17" "I739" "I2111" "Z955"  "I10"   "I2510" "" "" ""
    33003283151 4962568716 "2018-08-06" "I480" "I2510" "I2102" "I10"   "E782"  "" "" ""
    33003284508 4959930691 "2018-08-29" "I10"  "I214"  "R079"  "R9431" ""      "" "" ""
    33003284508 4960456903 "2018-08-25" ""     ""      ""      ""      "I2510" "" "" ""
    33003284508 4960456903 "2018-08-25" ""     "I10"   ""      "R0602" ""      "" "" ""
    33003284508 4960456903 "2018-08-25" "I214" ""      "R9431" ""      ""      "" "" ""
    33003284508 4966580498 "2018-09-12" "I10"  "I214"  ""      ""      ""      "" "" ""
    end
    As you can see, `patid', `clmid', and `fst_dt' are the same for observations 4, 5, and 6. I'd like to collapse them into one row. Any suggestions? Thanks.
    Ho-Chuan (River) Huang
    Stata 19.0, MP(4)

  • #2
    This can only be done if there are never two conflicting values for the same diag* variable among the observations for a given patid clmid fst_dt combination. So I start off by verifying that. If that goes through, -collapse- does it.

    Code:
    //    VERIFY THAT THERE ARE NO CLASHING VALUES OF DIAG* WITHIN
    //    ANY COMBINATION OF PATID CLMID FST_DT
    foreach v of varlist diag* {
        by patid clmid fst_dt (`v'), sort: assert `v' == `v'[_N] | missing(`v')
    }
    
    format patid clmid %11.0f
    collapse (firstnm) diag*, by(patid clmid fst_dt)
    list, noobs clean
    Note: the -format- command is just to make the display of patid and clmid in the output listing show enough digits that you can see the differences among the different values of patid and clmid. It is not necessary for the calculations, only for the results display.
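
    If the -assert- in the loop above fails, it stops at the first offending variable without showing which claims are involved. A minimal sketch of one way to flag the conflicting groups instead, assuming the example data are in memory and this runs before -collapse-; the names clash, bad, and anybad are made up for illustration:

    ```stata
    * Flag every patid/clmid/fst_dt group in which some diag* variable
    * holds two different non-missing values.
    gen byte clash = 0
    foreach v of varlist diag* {
        * Within a group sorted on `v', empty strings come first and the
        * largest value is last, so any non-missing value that differs
        * from the last one is a conflict.
        bysort patid clmid fst_dt (`v'): gen byte bad = !missing(`v') & `v' != `v'[_N]
        by patid clmid fst_dt: egen byte anybad = max(bad)
        replace clash = 1 if anybad
        drop bad anybad
    }
    * Show the flagged groups so they can be inspected by hand.
    format patid clmid %11.0f
    list patid clmid fst_dt diag* if clash, noobs sepby(clmid)
    ```

    Whether conflicting rows should then be fixed, dropped, or kept as separate observations is a data question this sketch does not answer.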



    • #3
      Dear Clyde, Thanks so much. I didn't know that -collapse- can be applied to string variables.
      Ho-Chuan (River) Huang
      Stata 19.0, MP(4)



      • #4
        Well, for obvious reasons, you can't do things like mean or sum or percentile. Oddly, although simple logic would suggest you should be able to do count of non-missing values [-collapse (count) string_var-], you can't.

        It isn't immediately obvious whether logic suggests that (min) and (max) should be applicable to strings--they do have an ordering, but we don't typically think about them that way. Strangest of all, (min) works with strings but (max) does not!!!

        (first), (last), (firstnm), and (lastnm) work with strings.
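
        Since this behavior may differ across Stata versions, here is a minimal sketch for checking which statistics -collapse- accepts for a string variable in your own installation. The toy data and the names id and s are hypothetical, and the sketch clears the data in memory:

        ```stata
        * Toy data: one string variable with some empty values.
        clear
        input byte id str5 s
        1 ""
        1 "a"
        2 "b"
        2 ""
        end

        * Try each statistic on the string variable and report whether
        * -collapse- accepts it, without stopping on an error.
        foreach stat in first last firstnm lastnm min max count {
            preserve
            capture collapse (`stat') s, by(id)
            local rc = _rc
            restore
            display as text "`stat': " cond(`rc' == 0, "accepted", "rejected, rc = `rc'")
        }
        ```

        -capture- suppresses the error and leaves the return code in _rc, and -preserve-/-restore- brings the toy data back for the next statistic.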



        • #5
          Dear Clyde, Thanks again for the additional useful information.

          Ho-Chuan (River) Huang
          Stata 19.0, MP(4)
