Direct standardization of variables (dstdize) based on epidemiology

Maria Ribeiro

Join Date: Apr 2015

Posts: 42
#1

21 Oct 2015, 07:54

Regarding direct standardization of variables based on epidemiology, I was wondering what the correct way is to use the command dstdize in the following situation:

1) I need to age/sex-standardize a variable Z.

2) To simplify, I have an unbalanced panel data set, by hospital ID.
3) There is one variable that gives the average number of users for each hospital (U), and four more variables that break the number of users down by age category (U1, U2, U3, U4). The same is done for users' sex, in terms of percentages (S1 and S2). Then I have two more variables, for ownership (O) and integration (I).

4)

Code:

dstdize Z U (??? how to include U1-U4 + S), by(O)

I have not been able to adapt the examples in the Stata help file for this command to my case, so I would appreciate some help.

Thanks,
Maria
Maria Ribeiro

Join Date: Apr 2015

Posts: 42
#2

23 Oct 2015, 04:37

Sorry to insist, but should I try a different approach? Any suggestion?
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#3

23 Oct 2015, 09:48

Can you get data on individuals at each hospital? If so, you can operate on the dataset of individuals, as in this example in the Help.

Code:

. webuse hbp
. generate pop = 1
. dstdize hbp pop age race sex, by(city year)
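
For comparison, dstdize can also run on aggregated rather than individual-level data, provided there is one row per stratum within each group, holding a count of events and a population size. A minimal sketch, assuming hypothetical variables hospital, agegrp, events and users with made-up values (none of these come from the dataset discussed in this thread):

```stata
* Hypothetical aggregated data: one row per hospital-by-age-group stratum.
* All variable names and numbers are invented for illustration.
clear
input byte hospital byte agegrp int events int users
1 1 10 100
1 2 30 100
2 1  5 150
2 2 20  50
end

* charvar = events, popvar = users, stratum variable = agegrp.
* With no using() or base() option, the standard population
* is taken from the pooled data.
dstdize events users agegrp, by(hospital)
```

Hospital-level totals plus an age breakdown of users, without event counts for each age stratum, do not fit this layout.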

One other thing: your word "insist" is unfortunate and will be off-putting to many. To echo Nick Cox's 2014 advice:

It is important to remember that Statalist is a discussion list, not a help line. The distinction might seem a little obscure or subtle, so let's spell it out: On a help line, someone is obliged to reply....

On a discussion list, no one is obliged to reply.

Last edited by Steve Samuels; 23 Oct 2015, 10:10.

Steve Samuels
Statistical Consulting
Stata 14.2
Maria Ribeiro

Join Date: Apr 2015

Posts: 42
#4

23 Oct 2015, 10:07

First of all, I apologize for any inconvenience. I didn't intend to force an answer, only to find out whether my example was unclear.

In terms of the suggested approach, the problem is that I do not have the dataset of individuals for each hospital. For each hospital what I have is the population, and then 4 variables to decompose the number of users by age-category.

Thanks,
Maria
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#5

23 Oct 2015, 11:19

Direct standardization is impossible with this data. I suggest binreg, followed by margins. First you'll have to create variables that represent the total number of events (ev) and the total number of users for each hospital (nu) (your U is the average number of users). Then, something like:

Code:

binreg ev U2 U3 U4 S2 i.O, n(nu) vce(robust) link(logit)
margins O, atmeans
margins r.O // margins contrast (difference) of rates

You have to drop one of the U's and one of the S's, because the percentages within each set add up to 100 (proportions to 1), so the full set would be collinear with the constant.

You'll also want to check the fit of the model and other link options. You can of course include integration I in the model, and even get predictions for each I-by-O combination by including the I and O interaction (i.O##i.I).

If you have questions about checking binreg fit or about margins, please start another topic.

Thanks for the apology. I was sure you didn't see the implication of what you said.

Good luck!

Last edited by Steve Samuels; 23 Oct 2015, 11:22.

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#6

23 Oct 2015, 13:55

Actually, I would probably not use the U's as predictors, but their logits (treating them as proportions):

Code:

gen l2 = logit(U2/100)
gen l3 = logit(U3/100)
gen l4 = logit(U4/100)

The problem with proportions as predictors is that their range is very limited. However, I really don't know what the best practice is for these kinds of predictors, since I rarely encounter them.
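
One practical point about the transformation: Stata's logit() function returns missing for arguments of exactly 0 or 1, so a hospital with 0% or 100% in an age category would get a missing predictor and drop out of the estimation sample. A minimal sketch on made-up percentages (the values and the l_ prefix are assumptions for illustration):

```stata
* Made-up age-category percentages for three hypothetical hospitals.
clear
input float(U2 U3 U4)
25 40 10
 0 60 15
30 30 30
end

* logit(p) = ln(p/(1-p)); percentages are rescaled to proportions first.
foreach v of varlist U2 U3 U4 {
    gen l_`v' = logit(`v'/100)
}

* The second hospital has U2 = 0, so l_U2 is missing there.
list
count if missing(l_U2)
```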

Maria Ribeiro

Join Date: Apr 2015

Posts: 42
#7

24 Oct 2015, 09:20

Trying different paths, I'm following the first suggestion, but replacing "link(logit)" by "or" to avoid an error message.

However two main questions come up:
1) My dependent variable is a percentage, while the Us (total population decomposed by age category, omitting the first category as suggested) and nu (total population) are in absolute terms. Is that correct, or should I instead set nu=1 and transform the Us into percentages?
2) What's the reason to adopt binreg rather than poisson?

Thanks,
Maria
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#8

24 Oct 2015, 15:48

1. binreg for grouped data requires a count of events as the outcome (so would poisson).

2. The denominator for poisson would be person-years. You could use poisson if you had a length of stay for each user or amount of exposure.

You said after speaking of the Us:

The same happens for users’ sex in terms of percentage (S1 and S2).

I naturally took the "same thing" to mean that the Us are percentages too. If that's not true, then convert to proportions (or logits of proportions). In any case, the proportions describe the age distributions, not the absolute numbers.

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#9

24 Oct 2015, 19:23

You haven't told us much about the data, but if the "users" are in fact "admissions" to the hospital, binreg would still be the method of choice.

Maria Ribeiro

Join Date: Apr 2015

Posts: 42
#10

25 Oct 2015, 14:26

Users are in fact patients who have used the hospital at least once. I'm going to convert everything to percentages, then, to match nu=1.

Thank you for the clarification between binreg vs poisson.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#11

25 Oct 2015, 14:37

Keep nu = the actual number of users and "Z" the actual number of users with events. Convert just the U's and the S's to percentages or proportions.

Last edited by Steve Samuels; 25 Oct 2015, 15:04.

Maria Ribeiro

Join Date: Apr 2015

Posts: 42
#12

25 Oct 2015, 14:54

My Z variable is the share of users from general (internal) medicine who are then followed by a specialist doctor because of acute problems.

Next time I'll be much clearer in describing the variables that I have. Sorry for the inconvenience!
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#13

25 Oct 2015, 15:08

Thanks for the information. dstdize also requires counts, so your original Z would not have worked for that either. I've changed my post above: for the U's and S's, percentages are OK; you'd only divide by 100 if you were using the logit transformation.
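
Since both dstdize and binreg with the n() option work with counts, a share has to be turned back into a number of events before modelling. A minimal sketch, assuming Z is stored as a percentage and nu is the actual number of users; the three rows are made-up values:

```stata
* Made-up data: Z is the percentage of users with the event,
* nu is the actual number of users at each hypothetical hospital.
clear
input int hospital float Z int nu
1 12.5 400
2  8.0 250
3 20.0 150
end

* Recover the event count from the percentage share.
* round() guards against fractional results from rounded percentages.
gen long ev = round(Z/100 * nu)

* Sanity check: events can never exceed the number of users.
assert ev <= nu
list
```

The count ev, rather than Z, would then be the outcome in the binreg call, with n(nu) supplying the denominator.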

