Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Summarize specific values recorded at similar time-points

    Dear Statalist,

    I am currently working with large datasets containing prescription-based drugs in which I want to summarize certain values recorded on the same date.
    Below, I have created an example of what the dataset looks like (unfortunately I cannot show you the exact data due to GDPR rules).
    ID disp_date pack_size strength
    1 21.mar.2008 100 2
    1 21.mar.2008 100 2
    2 12.jan.2011 250 8
    2 12.jan.2011 250 8
    2 06.jun.2016 1000 2

    This contains the dispensed date for the prescription (disp_date), how many pills each prescription contained (pack_size), and the strength of the medication in mg. Some individuals have filled several similar prescriptions on the same date.

    What I would like is to summarize the value of pack_size which the prescriptions have been filled on the same date, e.g.:

    For ID 1:
    - pack_size should be 200 for disp_date==21mar2008

    For ID 2:
    - pack_size should be 500 for disp_date==12jan2011
    - pack_size should remain 1000 for disp_date==06jun2016

    For some IDs, there are several hundred dates in which this medication has been filled, as they are followed for many years.

    Kind regards,
    Håkon
    Last edited by Haakon Vegrim; 13 May 2022, 04:52.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id str11 disp_date int pack_size byte strength
    1 "21.mar.2008"  100 2
    1 "21.mar.2008"  100 2
    2 "12.jan.2011"  250 8
    2 "12.jan.2011"  250 8
    2 "06.jun.2016" 1000 2
    end
    
    frame put id disp_date pack_size, into(stats)
    frame stats{
        collapse (sum) pack_size, by(id disp_date)
        list, sepby(id)
    }
    frame drop stats
    Res.:

    Code:
    . frame stats{
    . 
    .     collapse (sum) pack_size, by(id disp_date)
    . 
    .     list, sepby(id)
    
         +-----------------------------+
         | id     disp_date   pack_s~e |
         |-----------------------------|
      1. |  1   21.mar.2008        200 |
         |-----------------------------|
      2. |  2   06.jun.2016       1000 |
      3. |  2   12.jan.2011        500 |
         +-----------------------------+
    . 
    . }
    Last edited by Andrew Musau; 13 May 2022, 05:19.

    Comment


    • #3
      Frames as suggested by Andrew Musau are an excellent way to approach this. Here is another way to think about it:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte id str11 disp_date int pack_size byte strength
      1 "21.mar.2008"  100 2
      1 "21.mar.2008"  100 2
      2 "12.jan.2011"  250 8
      2 "12.jan.2011"  250 8
      2 "06.jun.2016" 1000 2
      end
      
      egen total_pack_size = total(pack_size), by(id disp_date)
      egen tag = tag(id disp_date)
      
      list id disp_date total if tag, noobs
      
        +-----------------------------+
        | id     disp_date   total_~e |
        |-----------------------------|
        |  1   21.mar.2008        200 |
        |  2   12.jan.2011        500 |
        |  2   06.jun.2016       1000 |
        +-----------------------------+
      
      .
      Last edited by Nick Cox; 13 May 2022, 05:46.

      Comment


      • #4
        Dear Andrew and Nick,

        Both of these approaches solved this issue perfectly! Thank you for your quick, professional, and nice replies.

        Sincerely,
        Håkon

        Comment

        Working...
        X