How to deal with duplicates when creating proportions

Yara Hassan

Join Date: Jul 2019
Posts: 15

14 Oct 2019, 09:30

Dear all,
I would appreciate your help and advice with the following.
In my data, each subject-id may have several visits, with different dates and types (screening, sampling or repeat sampling), and the disease stage may be the same or different at each visit.
I want to answer the following question: how many subjects had disease stage 1 at their first visit?
(So basically I want Stata to consider only the first visit when calculating the proportion in a given disease category, without having to drop the other observations.)

Subject-id	visit-date	visit-name	disease-stage
001	01/01/2017	screening	1
001	10/02/2018	sampling	1
001	01/01/2019	screening	1
001	05/06/2019	rep sampling	1
002	13/11/2016	screening	4
002	20/12/2016	sampling	4
003	09/04/206	screening	3
003	10/04/2016	sampling	4
004	11/05/2019	screening	2

TIA

Last edited by Yara Hassan; 14 Oct 2019, 09:37.

Nick Cox

Join Date: Mar 2014

Posts: 35641
#2

14 Oct 2019, 09:35

What you are showing contains column headers that could not be legal Stata variable names. So this code may not run without modification. Please note FAQ Advice #12 on how better to give example data.

Code:

bysort subject_id (visit_date) : gen first = _n == 1
count if first & disease_stage == 1
tab disease_stage if first
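
If visit_date was imported as a day/month/year string, sorting on it would order visits alphabetically rather than chronologically. A minimal sketch of the same approach with a conversion step added, assuming that string format and the variable names subject_id, visit_date and disease_stage used above (vdate is a hypothetical new variable):

```stata
* Assumption: visit_date is a string such as "01/01/2017" (day/month/year)
gen vdate = daily(visit_date, "DMY")
format vdate %td

* Check for dates that failed to convert (they become missing)
count if missing(vdate)

* Flag each subject's earliest visit
bysort subject_id (vdate) : gen byte first = _n == 1

* Distribution of disease stage at the first visit, with percentages
tab disease_stage if first
```

Missing values sort last, so a visit whose date failed to convert would be flagged as first only when it is that subject's sole dated record; the count above shows whether that needs attention. If visit_date is already a numeric daily date, the conversion step is unnecessary.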
Yara Hassan

Join Date: Jul 2019

Posts: 15
#3

14 Oct 2019, 09:45

This is brilliant. I used the code, applied it to my variables, and it sorted everything out.
I greatly appreciate your help, and I apologise for posting the data example in the wrong way. I will look into how to give example data better, as you suggested, for any future enquiry.
Many thanks again for your great help.