  • Subsetting data

    Hello,
    I'd appreciate it if you could help me with the Stata syntax for the following analysis.

    I've got an HCV treatment dataset. I need to run an analysis on the subset of patients who have received approval to start treatment. The treatment committee may have issued an authorization more than once for the same patient during the period, so there can be multiple records for the same patient ID.
    For this purpose:
    a) I need to calculate the number of unique patients who have received approval to initiate the treatment course at least once.
    b) If approval was received more than once, only the record with the most recent approval date should be included in the subset analysis.

    NOTE: the database is not time-ordered (this is a bummer).

    Here are the variables:
    patient_ID;
    Approval_date (the date when the approval to start treatment was issued);
    Approval (coded "0" for YES and "1" for NO)

    For part a), I'm doing the following:
    by patient_ID: egen cmt_approved = total(Approval == 0)
    replace cmt_approved = 1 if cmt_approved > 0

    Then, to figure out the number of unique patients with at least one approval, I run these commands:
    bysort patient_ID Approval : gen ncommittee = _n == 1
    tab Approval ncommittee , miss
    and I generate the sample that includes the records with "Approval==0 & ncommittee==1".
    But I need to ensure that the record with the date of the last approval is the one included in the sample (something like: if April > March, then keep the April record within the same ID),
    and this is the point where I am bogged down.

    Thank you in advance for your assistance,
    Regards,
    Lia Gvinjilia


  • #2
    What is the format of the approval date? If it is a Stata date or other numeric, then the following will flag the last (most current) entry (or the only entry if there is one):

    Code:
    bysort patient_id (Approval_date): gen flag=1 if _n==_N
    keep if flag
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1



    • #3
      Thank you, Carole, for your response. The approval date looks like this, e.g. "07sep2015".



      • #4
        I've tried this code:
        bysort patient_id (Approval_date): gen flag=1 if _n==_N
        keep if flag
        But it says (0 observations deleted).

        What should I do?



        • #5
          I think I forgot to write
          keep if flag==1
          Isn't this right?



          • #6
            Either way should work. What is the result of: desc Approval_date


            • #7
              We shouldn't really be asking for the format, because that sounds to many people like "how is it shown". However, "07sep2015" could be either a string value or the result of a display format for a numeric date.

              You need to show us the storage type (or, informatively, the display format) as reported by describe.

              If it's numeric, then (as a small twist on Carole's code)

              Code:
              bysort patient_id (Approval_date): keep if _n==_N
              keeps the last observation for each patient.
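
              The question in #1 asks for the last approval rather than the last record of any kind, so one possible variation is to drop the non-approved records first. This is only a sketch on made-up toy data: it assumes Approval is numeric with 0 meaning approved (as described in #1) and that Approval_date is a numeric daily date. datestr is a hypothetical helper variable used only to build the example.

              ```stata
              * toy data, invented for illustration only
              clear
              input patient_id str9 datestr Approval
              1 "07sep2015" 0
              1 "02mar2015" 0
              1 "15oct2015" 1
              2 "11jan2015" 1
              3 "20may2015" 0
              end
              gen Approval_date = date(datestr, "DMY")
              format Approval_date %td

              * keep approved records only (0 = YES in this dataset)
              keep if Approval == 0

              * within each patient, sort by date and keep the latest approval
              bysort patient_id (Approval_date): keep if _n == _N

              * one observation per approved patient remains, so count answers part a)
              count
              list patient_id Approval_date, noobs
              ```

              If a patient has two approvals on the same date, which of the tied records survives is arbitrary, so a tie-breaking variable would need to be added to the sort order in that case.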

              On distinct observations, and why "unique" is not a good word, see e.g. http://www.stata-journal.com/sjpdf.h...iclenum=dm0042



              • #8
                int %td is the result of desc Approval_date



                • #9
                  Note that

                  Code:
                  gen flag=1 if _n==_N
                  keep if flag
                  won't work here to change the dataset. flag is either 1 or missing and both count as "true". You do need

                  Code:
                  keep if flag == 1
                  if you go down this road (but see #7).
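
                  The point about missing values counting as "true" can be checked on a throwaway dataset. This is a minimal sketch with five empty observations. flag01 is a hypothetical name for a 0/1 indicator, shown as one possible alternative for which if flag01 and if flag01 == 1 agree.

                  ```stata
                  clear
                  set obs 5

                  * 1 in the last observation, missing in the other four
                  gen flag = 1 if _n == _N

                  * missing is nonzero, hence true: this counts all 5 observations
                  count if flag

                  * this counts only the flagged observation
                  count if flag == 1

                  * a 0/1 indicator has no missings, so both forms give the same answer
                  gen byte flag01 = _n == _N
                  count if flag01
                  ```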



                  • #10
                    It is in Stata date format.
                    Thanks for the reminder, Nick! Nick's point above is right: use keep if flag==1.
                    Code:
                    sort patient_id Approval_date
                    list patient_id Approval_date flag, sepby(patient_id)
                    Last edited by Carole J. Wilson; 13 Apr 2016, 08:15.


                    • #11
                      I don't know what has happened, but after using
                      gen flag=1 if _n==_N
                      keep if flag==1
                      only 1 record was kept. The rest were deleted.



                      • #12
                        Nick is correct that you can keep in a single command. I generally prefer to flag observations with more complex conditions (complex for me!) before keeping or dropping so that I can double-check that my code did what I wanted before moving on. sum if flag or various assert commands would have hopefully alerted me that something was wrong with my construction.
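
                        As an illustration of the flag-then-check habit, here is one possible set of checks, sketched on made-up toy data. The variables datestr, nflag and last_date are hypothetical helpers, and the flag is built as a 0/1 indicator so that if flag is safe to use.

                        ```stata
                        * toy data, invented for illustration only
                        clear
                        input patient_id str9 datestr
                        1 "07sep2015"
                        1 "02mar2015"
                        2 "11jan2015"
                        end
                        gen Approval_date = date(datestr, "DMY")
                        format Approval_date %td

                        * flag the latest record within each patient (0/1, never missing)
                        bysort patient_id (Approval_date): gen byte flag = _n == _N

                        * check 1: exactly one flagged record per patient
                        bysort patient_id: egen nflag = total(flag)
                        assert nflag == 1

                        * check 2: the flagged record carries the patient's latest date
                        bysort patient_id: egen last_date = max(Approval_date)
                        assert Approval_date == last_date if flag

                        * only now change the data, then confirm one row per patient
                        keep if flag
                        isid patient_id
                        ```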


                        • #13
                          You did not use the full command:

                          Code:
                          bysort patient_id (Approval_date): gen flag=1 if _n==_N


                          • #14
                            Code:
                            gen flag=1 if _n==_N
                            keep if flag==1
                            without the bysort prefix applies to the whole dataset: _n and _N then refer to the entire data, so only its very last observation is kept, which is the result you report.
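
                            The difference the prefix makes can be seen side by side in a small sketch. The toy data and the names visit, last_overall and last_in_group are made up for illustration.

                            ```stata
                            * toy data, invented for illustration only
                            clear
                            input patient_id visit
                            1 1
                            1 2
                            2 1
                            2 2
                            2 3
                            end

                            * no prefix: _N is the number of observations in the whole dataset
                            gen byte last_overall = _n == _N

                            * with the prefix: _n and _N are evaluated within each patient
                            bysort patient_id (visit): gen byte last_in_group = _n == _N

                            * one observation flagged overall, one per patient within groups
                            count if last_overall
                            count if last_in_group
                            ```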



                            • #15
                              Yes! You're right, it was an incomplete command :-) Thank you, Nick and Carole, for your help!
                              To recap: I now get the subset containing the distinct last observation for each patient, right?
