
  • how to deal with duplicates when creating proportions

    Dear all,
    I need your valuable help and advice with the following, please:
    I have the following data, in which each subject-id may have several visits, each with a date and a visit type (screening, sampling or repeat sampling), and the disease stage may be the same or differ between visits.
    I want to answer the following question: how many subjects had disease stage 1 at their first visit?
    (So basically, I want Stata to consider only the first visit when calculating the proportion in a given disease category, without having to drop the other observations.)
    Subject-id  visit-date  visit-name    disease-stage
    001         01/01/2017  screening     1
    001         10/02/2018  sampling      1
    001         01/01/2019  screening     1
    001         05/06/2019  rep sampling  1
    002         13/11/2016  screening     4
    002         20/12/2016  sampling      4
    003         09/04/2016  screening     3
    003         10/04/2016  sampling      4
    004         11/05/2019  screening     2


    TIA
    Last edited by Yara Hassan; 14 Oct 2019, 09:37.

  • #2
    What you are showing contains column headers that are not legal Stata variable names (hyphens are not allowed in names), so this code, which uses subject_id, visit_date and disease_stage instead, may not run without modification. Please note FAQ Advice #12 on how better to give example data.

    Code:
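    * flag each subject's earliest visit
    bysort subject_id (visit_date) : gen first = _n == 1

    * number of subjects at disease stage 1 on their first visit
    count if first & disease_stage == 1

    * counts and percentages of each disease stage at first visit
    tab disease_stage if first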
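A minimal, self-contained sketch of the same approach, assuming the example data are typed in with `input` under hypothetical legal variable names (subject_id, visit_str, visit_name, disease_stage), and assuming that "206" in subject 003's screening date is a typo for 2016. Because the dates are DD/MM/YYYY strings, they are converted with daily() first; sorting the raw strings would order them by day of the month rather than chronologically. The proportion asked about is then the mean of a 0/1 indicator defined only on first visits.

```stata
* example data; variable names are hypothetical, 206 assumed to be 2016
clear
input str3 subject_id str10 visit_str str12 visit_name byte disease_stage
"001" "01/01/2017" "screening"    1
"001" "10/02/2018" "sampling"     1
"001" "01/01/2019" "screening"    1
"001" "05/06/2019" "rep sampling" 1
"002" "13/11/2016" "screening"    4
"002" "20/12/2016" "sampling"     4
"003" "09/04/2016" "screening"    3
"003" "10/04/2016" "sampling"     4
"004" "11/05/2019" "screening"    2
end

* convert day/month/year strings to numeric daily dates so sorting is chronological
generate visit_date = daily(visit_str, "DMY")
format visit_date %td

* flag each subject's earliest visit
bysort subject_id (visit_date) : generate first = _n == 1

* 0/1 indicator for stage 1, nonmissing only on first visits;
* its mean is the proportion of subjects at stage 1 on their first visit
generate stage1_first = disease_stage == 1 if first
summarize stage1_first
display "Proportion with stage 1 at first visit: " r(mean)
```

With these data the proportion should be 0.25 (1 of the 4 subjects). Using `if first` rather than `drop` keeps all the other visits in memory, as the question asked. Storing subject_id as a string keeps the leading zeros.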



    • #3
      This is brilliant. I applied the code to my variables and it sorted everything.
      I greatly appreciate your help, and I apologise for posting the data example in the wrong way; as you suggested, I will look into how to give example data better in any future enquiry.
      Many thanks again for your great help.
