Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Subset data based on groups of observations if group meets 1 condition

    Hello,

    I have a dataset containg hospital billing data. 257000 observations to be exact. I have a variable called, ID. This variable uniquely identifies patients. Each row is a billed charge so there are mutliple rows per ID. I have another variable called, Department. This variable contains a string value denoting the type of service the patient received and was charged for.

    Example data:

    ID Department
    1 MRI
    1 PT/OT
    1 Surgery
    2 Office
    2 PTOT
    2 MRI
    3 PTOT
    3 PTOT
    4 Surgery
    5 MRI
    5 PTOT
    5 PTOT

    I need to subset my dataset to all patients that have received an MRI and analyze the other types of services (denoted in Department) that they have received. I would like to keep all same IDs if one observation is Department=="MRI". So, for example, In the example dataset, I would keep all of the observatrions for IDs: 1, 2, & 5. This would include keeping the observations where ID 1=="PT/OT", as example.

    My elementary approach has been:

    gen tagger="1" if Department=="MRI"
    sort ID
    replace tagger="1" if ID==ID[_n-1] & tag=="1"[_n-1]

    I usually repeat this until all IDs have been tagged. I know this is not the most efficient way to do this. Please advise.

  • #2
    Code:
    bysort ID: egen tokeep= max(Department=="MRI")
    keep if tokeep
    See https://www.stata.com/support/faqs/d...ble-recording/

    Comment


    • #3
      Hi Andrew,

      Thank you for the response. That unfortunately deleted all observations. The tokeep variable contained only 0s.

      Comment


      • #4
        Drew Nelson I don't think the problem is with the solution in #2.

        Code:
        clear
        input byte ID str10 Department
        1 "MRI"
        1 "PT/OT"
        1 "Surgery"
        2 "Office"
        2 "PTOT"
        2 "MRI"
        3 "PTOT"
        3 "PTOT"
        4 "Surgery"
        5 "MRI"
        5 "PTOT"
        5 "PTOT"
        end
        
        bysort ID: egen tokeep= max(Department=="MRI")
        keep if tokeep
        produces:

        Code:
          +------------------------+
          | ID   Depart~t   tokeep |
          |------------------------|
          |  1        MRI        1 |
          |  1      PT/OT        1 |
          |  1    Surgery        1 |
          |------------------------|
          |  2     Office        1 |
          |  2       PTOT        1 |
          |  2        MRI        1 |
          |------------------------|
          |  5        MRI        1 |
          |  5       PTOT        1 |
          |  5       PTOT        1 |
          +------------------------+
        If it deleted all your observations, there's something about the strings that is different. Are there leading/trailing spaces, perhaps? If so, you might want to amend the code slightly to

        Code:
        bysort ID: egen tokeep= max(trim(Department)=="MRI")
        keep if tokeep
        Last edited by Hemanshu Kumar; 21 Oct 2022, 08:53.

        Comment


        • #5
          #3 contradicts #1 as you're saying that you used the condition

          Code:
          Department=="MRI"
          with success. From my reading of #1 Andrew Musu's code in #2 should work fine. Did you do exactly what you said in #1?

          Comment


          • #6
            My apologies. Ran it on the wrong dataset. Worked perfectly. Thank you!

            Comment

            Working...
            X