  • Direct standardization of variables (dstdize) based on epidemiology

    Regarding direct standardization of variables in epidemiology, I was wondering what the correct way is to use the command dstdize in the following situation:

    1) I need to age/sex-standardize a variable Z.

    2) To simplify, I have an unbalanced panel data set, by hospital ID.
    3) One variable gives the average number of users for each hospital (U), and four more variables break the number of users down by age category (U1, U2, U3, U4). The same is done for users’ sex, as percentages (S1 and S2). I also have two more variables, for ownership (O) and integration (I).

    4)
    Code:
    dstdize Z U (??? how to include U1-U4 + S), by(O)
    I have not been able to extend the examples in Stata's help file for this command to my case, so I’d appreciate some help.

    Thanks,
    Maria

  • #2
    Sorry to insist, but should I try a different approach? Any suggestion?


    • #3
      Can you get data on individuals at each hospital? If so, you can operate on the dataset of individuals, as in this example in the Help.

      Code:
              . webuse hbp
              . generate pop = 1
              . dstdize hbp pop age race sex, by(city year)

      One other thing: your word "insist" is unfortunate and will be off-putting to many. To echo Nick Cox's 2014 advice:

      It is important to remember that Statalist is a discussion list, not a help line. The distinction might seem a little obscure or subtle, so let's spell it out: On a help line, someone is obliged to reply....
      On a discussion list, no one is obliged to reply.
      Last edited by Steve Samuels; 23 Oct 2015, 10:10.
      Steve Samuels
      Statistical Consulting

      Stata 14.2


      • #4
        First of all, I apologize for any inconvenience. I didn't intend to force an answer, only to find out whether the example wasn't clear enough.

        As for the suggested approach, the problem is that I do not have a dataset of individuals for each hospital. For each hospital, what I have is the population and four variables that break the number of users down by age category.

        Thanks,
        Maria


        • #5


          Direct standardization is impossible with this data. I suggest binreg, followed by margins. First you'll have to create variables that represent the total number of events (ev) and the total number of users for each hospital (nu) (your U is the average number of users). Then, something like:

          Code:
          binreg ev  U2 U3 U4  S2 i.O, n(nu) vce(robust) link(logit)
          
          margins O, atmeans
          margins r.O  // contrast (difference) of the margins for O
          You have to drop one of the U's and one of the S's, because the percentages within each set add to 100.

          You'll also want to check the fit of the model and try other link options. You can of course include integration (I) in the model, and even get predictions for each combination of I and O by including their interaction (i.O##i.I).
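
          A minimal sketch of the interaction model, assuming ev and nu have already been created as described above. It uses binreg's or option to request the logit link; the variable names are the ones used in this thread, and nothing here has been run on the actual data.

          ```stata
          * Assumes ev (events) and nu (users) already exist for each hospital record.
          * U1 and S1 are left out as the reference groups.
          binreg ev U2 U3 U4 S2 i.O##i.I, n(nu) or vce(robust)

          * Predictive margins for every ownership-by-integration combination,
          * with the other covariates held at their means.
          margins O#I, atmeans
          ```

          With an unbalanced panel, clustering the standard errors on the hospital identifier may be worth considering in place of vce(robust).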

          If you have questions about checking binreg fit or about margins, please start another topic.

          Thanks for the apology. I was sure you didn't see the implication of what you said.

          Good luck!
          Last edited by Steve Samuels; 23 Oct 2015, 11:22.
          Steve Samuels


          • #6
            Actually, I would probably not use the U's as predictors, but their logits (treating them as proportions):
            Code:
            gen l2 = logit(U2/100)
            gen l3 = logit(U3/100)
            gen l4 = logit(U4/100)
            The problem with proportions as predictors is that their range is very limited. However, I really don't know what best practice is for these kinds of predictors, since I rarely encounter them.
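
            As a rough sketch of how the logits could enter the model, assuming l2, l3 and l4 were generated as above and that ev and nu exist as event and user counts:

            ```stata
            * logit() returns missing when its argument is 0 or 1,
            * so check how many records would be lost before fitting.
            count if missing(l2, l3, l4)

            * Same model as before, with the logits replacing the raw age percentages.
            binreg ev l2 l3 l4 S2 i.O, n(nu) or vce(robust)
            ```

            Records with an age category at exactly 0% or 100% would drop out of the estimation sample, which is one reason to compare this against the untransformed version.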
            Steve Samuels


            • #7
              Trying different paths, I'm following the first suggestion, but replacing "link(logit)" with "or" to avoid an error message.

              However, two main questions come up:
              1) My dependent variable is a percentage, while the Us (total population broken down by age category, omitting the first category as suggested) and nu (total population) are absolute numbers. Is that correct, or should I instead set nu=1 and convert the Us to percentages?
              2) What's the reason to use binreg rather than poisson?

              Thanks,
              Maria


              • #8
                1. binreg for grouped data requires a count of events as the outcome (as would poisson).

                2. The denominator for poisson would be person-years. You could use poisson if you had a length of stay for each user or amount of exposure.
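
                For comparison, a hedged sketch of what the poisson alternative might look like if exposure were available. Here pyears is a hypothetical variable (total person-years of exposure per hospital record) that does not exist in the data described in this thread:

                ```stata
                * pyears is hypothetical: total person-years of exposure per record.
                poisson ev U2 U3 U4 S2 i.O, exposure(pyears) vce(robust)

                * Predicted incidence rates by ownership, other covariates at their means.
                margins O, predict(ir) atmeans
                ```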

                You said after speaking of the Us:
                The same happens for users’ sex in terms of percentage (S1 and S2).
                I naturally took the "same thing" to mean that the Us are percentages too. If that's not true, then convert to proportions (or logits of proportions). In any case, the proportions describe the age distributions, not the absolute numbers.
                Steve Samuels


                • #9
                  You haven't told us much about the data, but if the "users" are in fact "admissions" to the hospital, binreg would still be the method of choice.
                  Steve Samuels


                  • #10
                    Users are in fact patients who have used the hospital at least once. I'm going to convert everything to percentages, then, to match nu=1.

                    Thank you for the clarification between binreg vs poisson.


                    • #11
                      Keep nu = the actual number of users and "Z" the actual number of users with events. Convert just the U's and the S's to percentages or proportions.
                      Last edited by Steve Samuels; 25 Oct 2015, 15:04.
                      Steve Samuels


                      • #12
                        My Z variable is the share of users who, after being seen in general (internal) medicine, are then followed by a specialist doctor because of acute problems.

                        Next time I'll describe my variables much more clearly. Sorry for the inconvenience!


                        • #13
                          Thanks for the information. dstdize also requires counts, so your original Z would not have worked for that either. I've changed my post above: for the U's and S's, percentages are OK; you'd only divide by 100 if you were using the logit transformation.
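
                          Putting the pieces together, a minimal sketch under two loudly stated assumptions: Z is stored as a percentage between 0 and 100, and nusers is a hypothetical variable holding the actual number of users for each hospital record (U itself is only an average):

                          ```stata
                          * Assumption: nusers (hypothetical name) is the actual number of users,
                          * and Z is the percentage of users with the event (0-100).
                          generate nu = nusers
                          generate ev = round(Z/100 * nu)   // approximate count of users with events
                          assert ev <= nu if !missing(ev, nu)

                          * U2-U4 and S2 stay as percentages; U1 and S1 are the reference groups.
                          binreg ev U2 U3 U4 S2 i.O, n(nu) or vce(robust)

                          * Margins by ownership, then their contrast.
                          margins O, atmeans
                          margins r.O
                          ```

                          Rounding recovers the event count only approximately if Z was itself rounded when it was recorded.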
                          Steve Samuels