  • population distribution for stsplit

    I have a dataset where I calculated incidence by age groups using the following command:

    stset censor, failure(event==1) exit(end) enter(start) id(id) scale(365.25) origin(dob)
    stsplit agegroup, at(0(10)80)
    strate agegroup, per(100000)

    I also need the distribution of people in _D (number of events) and _Y (person-years).
    How do I get the number of people in the group instead of the person-years when using strate in Stata?

    Thank you very much

  • #2
    You should be able to reproduce the values of _D and _Y that you get from strate using the collapse command. You will probably first need to create a person-time variable as _y=_t-_t0. You would then sum _d to get _D and sum _y to get _Y. To get the number of people you would count rather than sum. If you provide a data example, then I or someone else might show you some code.


    • #3
      Dear Paul,

      Thank you very much. Please find a data example below.


      id  event  censor      start       end         dob
      12  1      12/12/2011  01/01/2010  01/01/2020  07/08/1993
      13  1      12/12/2014  07/02/2011  02/02/2019  01/03/2005
      14  0      12/04/2011  06/03/2010  03/03/2018  02/07/2002
      15  1      12/08/2012  05/04/2012  06/04/2017  03/03/1990
      16  1      12/12/2011  04/05/2013  07/05/2016  04/03/2003
      17  0      12/09/2013  03/06/2014  08/06/2016  05/04/1996
      18  1      12/12/2011  01/07/2015  09/07/2019  06/05/2002
      19  0      12/12/2011  08/01/2010  11/08/2020  07/06/2001



      I calculated the incidence by age groups using the following command:

      stset censor, failure(event==1) exit(end) enter(start) id(id) scale(365.25) origin(dob)
      stsplit agegroup, at(0(10)80)
      strate agegroup, per(100000)

      I need the distribution of people in _D (number of events) and _Y (person-years).
      How do I get the number of people in the group instead of the person-years when using strate in Stata?

      Thank you very much


      • #4
        Expanding on my previous response, I think the following should do what you want:

        Code:
        generate y=_t-_t0
        collapse (sum) _D=_d (count) N=y (sum) _Y=y, by(agegroup)
        list
        It should give you the same values for _D and _Y as strate, but will also give you the number of individuals contributing person-time in each group (N).

        NOTE: This code is untested. There may be both syntax errors and logic errors. If you provide code that can be copied into Stata and run (i.e., use -dataex- to present data) then you'll get much better answers.
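
        To try the approach end to end, here is a minimal sketch on hypothetical toy data (three made-up people, not the poster's data). It assumes the dates are day/month/year strings, writes the stset options in their explicit enter(time ...) and origin(time ...) forms, treats censor as the exit date (so exit() is omitted), and adds "if _st == 1" so that only records used by stset are collapsed. preserve and restore keep the split data in memory.

        ```stata
        * Hypothetical toy data (not the poster's): dates typed as day/month/year
        clear
        input id event str10 censor_s str10 start_s str10 dob_s
        1 1 "15/06/2015" "01/01/2010" "01/03/1985"
        2 0 "31/12/2018" "01/07/2011" "20/09/1998"
        3 1 "10/10/2013" "01/02/2012" "05/05/1972"
        end

        * Convert the strings to Stata daily dates
        foreach v in censor start dob {
            generate `v' = date(`v'_s, "DMY")
            format `v' %td
        }

        * Analysis time is age in years; censor is assumed to be the exit date
        stset censor, failure(event==1) enter(time start) origin(time dob) id(id) scale(365.25)
        stsplit agegroup, at(0(10)80)
        strate agegroup, per(100000)

        * Rebuild the totals by age group and count the contributing records.
        * After stsplit each person has at most one record per age band,
        * so the record count N is also a count of people in that band.
        preserve
        generate y = _t - _t0
        collapse (sum) _D=_d (count) N=y (sum) _Y=y if _st == 1, by(agegroup)
        list
        restore
        ```

        Each toy person crosses exactly one 10-year age boundary during follow-up, so each should appear in two age groups, and N should sum to 6 across groups even though there are only 3 people.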


        • #5
          Thank you very much, Paul. It worked.


          • #6
            Dear Paul,
            I have used the following code:

            generate y=_t-_t0
            collapse (sum) _D=_d (count) N=y (sum) _Y=y, by(agegroup)
            list

            However, the number of people at baseline (N) has doubled. There are duplicates. Is there a way of getting around this?


            • #7
              Please post a fully worked example illustrating the problem. That is, code I can copy and run directly in Stata without changing anything. I have no idea what the issue might be.


              • #8
                Dear Paul,

                I have 15,000 people in my dataset.

                When I used the following code:

                stset censor, failure(event==1) exit(end) enter(start) id(id) scale(365.25) origin(dob)
                stsplit agegroup, at(0(10)80)
                strate agegroup, per(100000)

                generate y=_t-_t0
                collapse (sum) _D=_d (count) N=y (sum) _Y=y, by(agegroup)
                list


                the number of people at baseline (N) doubled to over 30,000. The baseline population (N) should be equal to 15,000. There are duplicates due to stsplit. Is there a way of getting around this?
                I need the distribution of people in _D and _Y.

                An example dataset:


                id  event  censor      start       end         dob
                12  1      12/12/2011  01/01/2010  01/01/2020  07/08/1993
                13  1      12/12/2014  07/02/2011  02/02/2019  01/03/2005
                14  0      12/04/2011  06/03/2010  03/03/2018  02/07/2002
                15  1      12/08/2012  05/04/2012  06/04/2017  03/03/1990
                16  1      12/12/2011  04/05/2013  07/05/2016  04/03/2003
                17  0      12/09/2013  03/06/2014  08/06/2016  05/04/1996
                18  1      12/12/2011  01/07/2015  09/07/2019  06/05/2002
                19  0      12/12/2011  08/01/2010  11/08/2020  07/06/2001

                Thank you very much


                • #9
                  Originally posted by Paul Dickman
                  Please post a fully worked example illustrating the problem. That is, code I can copy and run directly in Stata without changing anything. I have no idea what the issue might be.
                  The code you provided will not run. If you provide a worked example then it's much easier to provide help.

                  Your original question was
                  How do I get the number of people in the group instead of the person-years when using strate in Stata?
                  By 'in the group' I assumed age group and since you ran -strate- after -stsplit- then I assumed you meant in the split data.

                  Do you just want the distribution of age at baseline? That is, before splitting? That's very easy to get, but I would suggest caution in reporting those numbers together with the results from strate.


                  • #10
                    Yes. I want the distribution of the population by age at baseline.


                    Thank you very much.


                    • #11
                      Why not create a variable containing age group at baseline and use tabulate? I'm sorry, but I'm struggling to understand what you want.
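
                      A minimal sketch of that suggestion, on hypothetical toy data with one record per person (that is, before stsplit). The variable names age_entry and agegrp_base are illustrative, and the day/month/year date format is an assumption.

                      ```stata
                      * Hypothetical toy data: one record per person, before stsplit
                      clear
                      input id str10 start_s str10 dob_s
                      1 "01/01/2010" "01/03/1985"
                      2 "01/07/2011" "20/09/1998"
                      3 "01/02/2012" "05/05/1972"
                      end

                      * Convert the strings to Stata daily dates
                      generate start = date(start_s, "DMY")
                      generate dob   = date(dob_s, "DMY")
                      format start dob %td

                      * Age in years at entry, then the lower bound of its 10-year band
                      generate age_entry   = (start - dob) / 365.25
                      generate agegrp_base = 10 * floor(age_entry / 10)

                      * One row per person, so the frequencies are counts of people
                      tabulate agegrp_base
                      ```

                      Because each person appears once here, the total of this table equals the number of people in the data, unlike a count taken on the split records.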


                      • #12
                        Yes, I have done that but realised the number of events (_d) is different for some of the age groups when compared with _D from the strate command. What I want is the number of people in the numerator (number of events) and denominator (people, not person-years). I want to know the number of people who contributed person-years for the denominator. For the number of people in the numerator, I can get that from the strate command.

                        Thank you very much.


                        • #13
                          This really would be much easier with an example. This is my last contribution to the thread without a data example.

                          Originally posted by Naa Naadu
                          Yes, I have done that but realised the number of events (_d) is different for some of the age groups when compared with _D from the strate command.
                          Yes, this is because the distribution of deaths by age at diagnosis will be different to the distribution of deaths by age at death.

                          There are two tables you can produce:

                          age_group_at_diagnosis _d
                          attained_age_group _d

                          Attained age is age during follow-up. It is the agegroup variable you have after using stsplit. An individual can contribute to multiple age groups for attained age. As such, the total number of individuals in these two tables will differ. It seems you want to produce these two tables where the total number of individuals is the same in each table. That is, you want each individual to contribute to just one category of attained age. You could, of course, do that - create the table where you restrict to just the last category of attained age to which each individual contributed - but that would be an unusual analytic approach.

                          Originally posted by Naa Naadu
                          I want to know the number of people who contributed person-years for the denominator.
                          That's what you get from the code above. When you sum the numbers across age groups it is expected that the total will be greater than the number of individuals in your study.
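
                          The difference between the two tables, and between records and people, can be seen in a small sketch. The toy data are hypothetical, the day/month/year date format and the explicit enter(time ...) and origin(time ...) forms are assumptions, and agegrp_base, one_per_group and one_per_id are illustrative names.

                          ```stata
                          * Hypothetical toy data (not the poster's)
                          clear
                          input id event str10 censor_s str10 start_s str10 dob_s
                          1 1 "15/06/2015" "01/01/2010" "01/03/1985"
                          2 0 "31/12/2018" "01/07/2011" "20/09/1998"
                          3 1 "10/10/2013" "01/02/2012" "05/05/1972"
                          end
                          foreach v in censor start dob {
                              generate `v' = date(`v'_s, "DMY")
                              format `v' %td
                          }
                          stset censor, failure(event==1) enter(time start) origin(time dob) id(id) scale(365.25)

                          * Before splitting, _t0 is age at entry; stsplit copies this to every record
                          generate agegrp_base = 10 * floor(_t0 / 10)
                          stsplit agegroup, at(0(10)80)

                          * Events by age group at entry versus attained age group
                          tabulate agegrp_base if _d == 1
                          tabulate agegroup if _d == 1

                          * People contributing person-time to each attained age group
                          egen byte one_per_group = tag(id agegroup)
                          tabulate agegroup if one_per_group

                          * Distinct people overall: one tagged record per id
                          egen byte one_per_id = tag(id)
                          count if one_per_id
                          ```

                          In this toy example both events occur one age band later than the band at entry, so the two event tables differ, and the per-group counts of people add up to more than the 3 distinct ids.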



                          • #14
                            Dear Paul,

                            Thank you very much. It worked.
