  • Syntax for calculating and charting trend of prevalence

    Dear All,

    I have a primary care patient population in which each patient has been assigned to a category of alcohol use. I would like to assess how the prevalence of each category (equivalently, its proportion of the total population) changes over time (in years).

    An example of the dataset is as follows:

    Code:
    clear
    input long patid float(cohort expdate exitdate death)
    1015 1 18898 19204 .
    1018 4 13236 20324 .
    1020 1 15465 16033 1
    1025 2 19732 20310 .
    1029 2 15111 19617 .
    1050 4 13892 18507 .
    1070 1 15108 15433 .
    1071 6 14959 16149 1
    1090 2 19930 20264 .
    1092 2 19563 20248 .
    1099 2 18927 19895 .
    end
    format %td expdate
    format %td exitdate
    label values cohort cohortlab
    label def cohortlab 1 "no alcohol data", modify
    label def cohortlab 2 "indeterminate", modify
    label def cohortlab 4 "low_risk", modify
    label def cohortlab 6 "alcohol_use_disorder", modify
    where

    'expdate': date on which the patient was categorised into the alcohol category
    'exitdate': date on which the patient exited the study
    'death': 1 if the patient died, missing otherwise

    I have not been able to find, and have been struggling to develop, suitable syntax to generate the prevalence of each alcohol category over time and then chart it (x axis: time in years; y axis: prevalence), as in Figure 3 of this paper: https://www.bmj.com/content/358/bmj.j3984

    I would be grateful for any help with this.

    Thank you.

  • #2
    I worry that I am missing something, because death does not seem to be relevant here. For purposes of calculating prevalence, it does not matter whether somebody has died or exited the cohort for some other reason. Did you have something different in mind?

    Assuming I am correct in ignoring death, I believe the following will do it for you:

    Code:
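    * Confirm there is one observation per patient before reshaping.
    isid patid

    * One row per event: expdate and exitdate become date, with event "exp" or "exit".
    reshape long @date, i(patid) j(event) string
    format date %td
    sort date
    * Running count of patients under observation: entries so far minus exits so far.
    gen population = sum((event == "exp")) - sum((event == "exit"))

    * The same running count within each alcohol category, divided by the total.
    levelsof cohort, local(cohorts)
    foreach c of local cohorts {
        gen numerator_`c' = sum(cond(cohort == `c', event == "exp", .)) ///
            - sum(cond(cohort == `c', event == "exit", .))
        gen prevalence_`c' = numerator_`c'/population
    }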
    You can then use standard Stata graphing commands to plot the various prevalence_* variables against date.
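
    As a rough sketch of that plotting step, and not a tested solution: one option is to keep the last event in each calendar year as an end-of-year snapshot and draw one line per category. The variable names seq and year are introduced here for illustration, the end-of-year snapshot is only one assumption about what "prevalence in a year" should mean, and the legend labels assume the four categories in the example data.

    ```stata
    * Assumes the code above has just been run, so the data are still in the
    * event order used by sum() and the prevalence_* variables exist.
    gen long seq = _n              // remember that event order (hypothetical helper)
    gen int year = year(date)      // calendar year of each event

    * Keep the last event in each year: an end-of-year snapshot (an assumption).
    bysort year (seq): keep if _n == _N

    * One line per category. Legend keys follow the order prevalence_1, _2, _4, _6.
    twoway line prevalence_* year, sort ///
        ytitle("Prevalence (proportion of population)") xtitle("Year") ///
        legend(order(1 "no alcohol data" 2 "indeterminate" ///
                     3 "low_risk" 4 "alcohol_use_disorder"))
    ```

    Years in which nobody enters or exits have no row, so the lines simply connect across them. Prevalence is missing wherever the population under observation is zero.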


    • #3
      Cross-posted at https://stackoverflow.com/questions/...-of-prevalence

      Please note our policy on cross-posting, which is that you are asked to tell us about it.


      • #4
        Clyde,

        Thanks for the reply. Will test this now.

        My bad about the 'death' variable. Yes, it doesn't matter how the patient exits.

        Thanks


        • #5
          Nick,

          Ah, sorry, I was not aware. Yes, from now on I will disclose any cross-posting on all platforms.

          Thanks.


          • #6
            Everyone here is reminded to read the FAQ before posting.

            https://www.statalist.org/forums/help#crossposting

            Please do read it all before your next post.
