Syntax for calculating and charting trend of prevalence

Abdul-Kareem Abdul-Rahman

Join Date: Sep 2015

Posts: 24
#1

02 May 2018, 18:23

Dear All,

I have a primary care patient population whose members have been assigned to categories of alcohol use. I would like to assess how the prevalence of each category (which can also be thought of as its proportion of the total population) changes over time (in years).

An example of the dataset is as follows:

Code:

clear
input long patid float(cohort expdate exitdate death)
1015 1 18898 19204 .
1018 4 13236 20324 .
1020 1 15465 16033 1
1025 2 19732 20310 .
1029 2 15111 19617 .
1050 4 13892 18507 .
1070 1 15108 15433 .
1071 6 14959 16149 1
1090 2 19930 20264 .
1092 2 19563 20248 .
1099 2 18927 19895 .
end
format %td expdate
format %td exitdate
label values cohort cohortlab
label def cohortlab 1 "no alcohol data", modify
label def cohortlab 2 "indeterminate", modify
label def cohortlab 4 "low_risk", modify
label def cohortlab 6 "alcohol_use_disorder", modify

with

'expdate': date the patient was categorised into the alcohol category
'exitdate': date the patient exited the study
'death': 1 if the patient died, missing otherwise

I have not been able to find, and have been struggling to develop, suitable syntax to generate the prevalence of the various alcohol categories across time and then chart it (x axis: time in years; y axis: prevalence), as in Figure 3 of this paper: https://www.bmj.com/content/358/bmj.j3984

I would be grateful for any help with this.

Thank you.
Tags: prevalence
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

02 May 2018, 20:56

I worry that I am missing something, because death does not seem to be relevant here. For purposes of calculating prevalence, it does not matter whether somebody has died or exited the cohort for some other reason. Did you have something different in mind?

Assuming I am correct in ignoring death, I believe the following will do it for you:

Code:

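* Confirm one observation per patient, then go long: one row per entry/exit event
isid patid
reshape long @date, i(patid) j(event) string
format date %td
sort date

* Running count of patients under observation (entries minus exits so far)
gen population = sum((event == "exp")) - sum((event == "exit"))

* Same running count within each alcohol category, divided by the population
levelsof cohort, local(cohorts)
foreach c of local cohorts {
    gen numerator_`c' = sum(cond(cohort == `c', event == "exp", .)) ///
        - sum(cond(cohort == `c', event == "exit", .))
    gen prevalence_`c' = numerator_`c'/population
}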

You can then use standard Stata graphing commands to plot the various prevalence_* variables against date.
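
As a minimal sketch of that plotting step, and not a definitive recipe, something like the following could be run directly after the code above. It assumes the prevalence_* variables exist and that the data are still in the order produced by sort date. The variable obs_order is a hypothetical helper introduced only for this illustration. The second graph keeps one point per calendar year, taken at the last event of that year, so years in which nobody entered or exited will not appear.

```stata
* Sketch only: assumes the prevalence_* variables created above exist and
* that the data are still in the order produced by -sort date-.

* Step plot of each category's prevalence against calendar date
twoway line prevalence_* date, connect(stairstep) ///
    ytitle("Prevalence (proportion of population)") xtitle("Date")

* Optional: one point per calendar year (the last event in each year)
preserve
gen long obs_order = _n        // hypothetical helper: remembers the current order
gen year = year(date)
bysort year (obs_order): keep if _n == _N
twoway connected prevalence_* year, ///
    ytitle("Prevalence at last event of year") xtitle("Year")
restore
```

The default legend shows the variable names (prevalence_1, prevalence_2, and so on); the legend() option can be used to substitute the category labels.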
Nick Cox

Join Date: Mar 2014

Posts: 35641
#3

03 May 2018, 00:39

Cross-posted at https://stackoverflow.com/questions/...-of-prevalence Please note our policy about cross-posting, which is that you are asked to tell us about it.
Abdul-Kareem Abdul-Rahman

Join Date: Sep 2015

Posts: 24
#4

03 May 2018, 00:49

Clyde,

Thanks for the reply. Will test this now.

My bad about the 'death' variable. Yes, it doesn't matter how the patient exits.

Thanks
Abdul-Kareem Abdul-Rahman

Join Date: Sep 2015

Posts: 24
#5

03 May 2018, 00:50

Nick,

Ah, sorry, I was not aware. Yes, I will inform all platforms whenever I cross-post from now on.

Thanks.
Nick Cox

Join Date: Mar 2014

Posts: 35641
#6

03 May 2018, 02:34

Everyone here is reminded to read the FAQ before posting.

https://www.statalist.org/forums/help#crossposting

Please do read it all before your next post.