Indirect standardization using -margins- command

Victoria Soriano

Join Date: Mar 2020

Posts: 4
#1


17 Sep 2020, 19:30

Hi all,

I am working with two appended observational datasets collected at different timepoints.
Dataset 0 is set in region 0 with approx 5000 participants - baseline region
Dataset 1 is set in region 1 with approx 2000 participants - comparator region
Outcome is binary 0/1
Factors are all binary 0/1 or categorical

I am trying to standardize the prevalence of a binary outcome in dataset 1 to be able to compare it to dataset 0 without the effect of covariates involved. To do this, I am trying to predict a standardized outcome for region 1 (the comparator), taking away changes that have occurred in the factors between the regions by basing these on the baseline region 0.

For this example I am using the example dataset margex and assuming the binary "treatment" variable is equivalent to my regions: (I am new to these examples and couldn't find another one with 2 groups and multiple covariates)

use http://www.stata-press.com/data/r15/margex.dta

logistic outcome i.treatment i.arm i.yc i.agegroup
preserve
keep if treatment == 0
margins, noesample at(treatment=(1))
restore

Alternatively, I have the code:
logistic outcome i.treatment i.arm i.yc i.agegroup
margins if treatment == 0, noesample at(treatment=(1))
I'm not sure how this differs from the above margins, but it gives me the same results in fewer lines of code.

I am not sure:
if this code is the correct way to standardize. I find it counter-intuitive and thought it should be:
margins if treatment == 1, noesample at(treatment=(0)) //with the 0/1 switched.

What does this command line mean when the 0/1 are switched?

if it is, how do I interpret the output of this code? Is this output the expected prevalence of treatment 0 or treatment 1? The logistic model is run on the whole dataset, so I feel like the estimates then used in margins are those of the whole dataset, not of the treatment 0 baseline.

Also, if I want to add in an interaction term, do I have to specify the term in the "at()" function separately, or is it enough to have it in the regression as so:
logistic outcome i.treatment##i.arm i.yc i.agegroup
preserve
keep if treatment == 0
margins, noesample at(treatment=(1))
restore

Essentially, I am in graduate school and have been told to run these commands without enough explanation. After spending weeks scouring the internet, the margins code book, and Statalist, I have not been able to find an answer about why this command should be run the way I have described above, nor how to interpret the output. Other people talk about using dydx()...

Thank you in advance for your time.

Kind regards,
Victoria

Last edited by Victoria Soriano; 17 Sep 2020, 19:35. Reason: margins; standardization
Tags: margins, regression, standardization
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

17 Sep 2020, 20:29

Code:

logistic outcome i.treatment i.arm i.yc i.agegroup
preserve
keep if treatment == 0
margins, noesample at(treatment=(1))
restore

calculates a counterfactual: it is the model's predictions of the overall probability of outcome among the treatment = 0 group (region) if they were in treatment = 1.

Code:

logistic outcome i.treatment i.arm i.yc i.agegroup
margins if treatment == 0, noesample at(treatment=(1))

is entirely equivalent: it's just a different way of doing it. In the first code you restrict to treatment == 0 observations by dropping the other observations from the data set. In this version of the code you use an -if- condition to tell Stata to ignore the other observations. Either way, it's the same thing.
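
To make the counterfactual concrete, here is a minimal sketch of the same point estimate computed by hand on the margex example data. The variable names orig_treatment and p_if_treated are made up for illustration. The mean reported by -summarize- should match the margin reported by -margins-, although only -margins- gives you a standard error.

```stata
use http://www.stata-press.com/data/r15/margex.dta, clear
logistic outcome i.treatment i.arm i.yc i.agegroup

* margins version, for reference
margins if treatment == 0, noesample at(treatment=(1))

* by-hand version: set everyone's treatment to 1, predict, then average
* over the observations that were originally in treatment == 0
preserve
generate byte orig_treatment = treatment   // hypothetical helper variable
replace treatment = 1
predict double p_if_treated, pr            // predicted probability of outcome
summarize p_if_treated if orig_treatment == 0
restore
```

Note that the coefficients come from the model fitted on the whole dataset; restricting to treatment == 0 only determines whose covariate values are averaged over.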

Code:

margins if treatment == 1, noesample at(treatment=(0))

gives you a different counterfactual: it calculates the predicted probability of outcome among the observations with treatment (region) == 1 if they were instead in treatment == 0.

Now, there is no way to say that one of these is correct and the other is incorrect. They are both ways of standardizing an outcome. In the first (two) approach(es) you are standardizing to the distributions of arm, yc, and agegroup in the treatment == 0 group. That is, you are saying: suppose the distributions of arm, yc, and agegroup in treatment == 1 were what we observed in treatment == 0. Then what would their expected outcome be, given the other differences between treatments 0 and 1? You are using treatment == 0 as the standard population here.

In the way you propose, which reverses the 0 and 1, you are using treatment == 1 as the standard population.

Either way is a reasonable way to do things. A more common approach would just be to do

Code:

margins treatment

This would give you the expected outcomes in both treatment groups (regions) under the counterfactual condition that their values of arm, yc, and agegroup are exactly what was found in the combined sample. You would be using the combined sample as the standard population and standardizing the results in both treatment groups (regions) to that. Usually in epidemiology when we standardize results to facilitate comparisons across groups, we either use the combined groups as the standard, or some independent standard (e.g. comparing results from Florida and New York, standardizing both to the 2010 US Census population demographics).
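
As a rough illustration of how the choice of standard population maps onto -margins- syntax, the sketch below runs the three standards side by side on the margex example data and then requests the standardized difference between groups. It assumes the same model as above and is meant to be run from a do-file, since the // comments are not accepted at the interactive command line.

```stata
use http://www.stata-press.com/data/r15/margex.dta, clear
logistic outcome i.treatment i.arm i.yc i.agegroup

margins treatment                     // combined sample as the standard
margins treatment if treatment == 0   // covariate mix of treatment 0 as the standard
margins treatment if treatment == 1   // covariate mix of treatment 1 as the standard

* difference in standardized probabilities (treatment 1 vs 0),
* using the combined sample as the standard
margins r.treatment
```

In each -margins treatment- call the first row is the expected outcome with everyone in that standard population set to treatment 0, and the second row with everyone set to treatment 1.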

But there is no "right" standard to choose. It's a matter of convenience. The results will be different with different standards, but the general magnitude and direction of the comparisons of groups will usually be about the same regardless of the standard chosen (within reason).

The presence of an interaction term would not change any of this.

Added: By the way, this procedure is direct standardization, not indirect.
Victoria Soriano

Join Date: Mar 2020

Posts: 4
#3

17 Sep 2020, 21:33

Thank you so much Clyde. I really appreciate you taking the time to clarify my queries.