
  • Adjusted Risk Difference (ARD) with svy

    Hi all,

    I'm trying to get adjusted risk differences (ARD) for several analyses I'm doing with complex survey data, including a few binary logistic regressions and a multinomial logistic regression, and I'm having some issues. I stumbled on the Norton, Miller and Kleinman paper "Computing adjusted risk ratios and risk differences in Stata" in the Stata Journal from 2013, describing their ADJRR command, which seems to work fine but only for binary predictors.
    I also found Clyde Schechter's helpful answer here, but couldn't figure out how to replicate it with a command such as:

    svy linearized : logit treatment age gender race

    ...which is trying to predict whether treatment is provided (binary) adjusting for age (continuous), gender (binary, for these purposes) and race (4 categories, for these purposes).

    If anyone has figured out how to get ARDs for non-binary covariates using ADJRR, or knows any other way to get ARDs for binomial or multinomial logistic regressions on svy data, I would greatly appreciate it!

    Thanks,
    John

  • #2
    My day-to-day work doesn't involve survey data. However, if you check the help file for svy postestimation, it lists the margins command. Maybe margins didn't work with survey data when adjrr was developed, but right now, adjrr doesn't seem to be necessary.

    Anyway, the generalized linear model command can and will spit out risk differences or risk ratios depending on your choice of link function. Clyde stated this in the thread you linked. If you know what a GLM is or you are able to teach yourself, you can start there. With the binomial family, the canonical logit link is equivalent to logistic regression. However, a log link will produce relative risks once you exponentiate the coefficients. An identity link will produce risk differences. So, that's one way where you don't even need to touch margins.

    In any case, I can show you that margins can produce risk differences equivalent to adjrr. Here, I'll have both commands set everyone's age to the sample mean, to show that they work with continuous covariates.

    Code:
    webuse nhanes2d
    quietly svy: logistic highbp height weight age c.age#c.age i.female i.race, baselevels
    adjrr female, at(age = 47.579654)
    R1  = 0.3603 (0.0178)    95% CI  (0.3254, 0.3951)
    R0  = 0.4577 (0.0210)    95% CI  (0.4165, 0.4990)
    ARR = 0.7871 (0.0247)    95% CI  (0.7401, 0.8370)
    ARD = -0.0975 (0.0131)    95% CI  (-0.1231, -0.0718)
    p-value (R0 = R1):  0.0000
    p-value (ln(R1/R0) = 0):  0.0000
    
    margins, dydx(female) vce(unconditional) at(age = 47.579654)
    
    Predictive margins                              Number of obs     =     10,351
    Model VCE    : Linearized
    
    Expression   : Pr(highbp), predict()
    at           : age             =    47.57965
    
    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          female |
              0  |   .4577216    .021142    21.65   0.000     .4146021     .500841
              1  |   .3602696    .018118    19.88   0.000     .3233178    .3972214
    ------------------------------------------------------------------------------
    First, the predicted means for both commands are the same. The SEs are different, but because I don't do survey research, I don't know which variance estimator is optimal, or which one adjrr defaults to. They are very close, however. You could have specified margins, dydx(female) vce(whatever) to get the risk difference directly. If you know the symbolic names for the coefficients, you can ask margins to calculate them, e.g.

    Code:
    margins, dydx(female) vce(unconditional) at(age = 47.579654) post coefl
    ------------------------------------------------------------------------------
                 |     Margin  Legend
    -------------+----------------------------------------------------------------
          female |
              0  |   .4577216  _b[0bn.female]
              1  |   .3602696  _b[1.female]
    ------------------------------------------------------------------------------
    
    lincom _b[1.female] - _b[0bn.female]
    
     ( 1)  - 0bn.female + 1.female = 0
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   -.097452   .0130318    -7.48   0.000    -.1240305   -.0708735
    ------------------------------------------------------------------------------
    
    nlcom _b[1.female] / _b[0bn.female]
    
           _nl_1:  _b[1.female] / _b[0bn.female]
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _nl_1 |   .7870933   .0247805    31.76   0.000     .7385244    .8356622
    ------------------------------------------------------------------------------
    You have to use margins' post option for the linear combination and non-linear combination commands to work. lincom gets you the risk difference. nlcom gets you the RR. Compare the point estimates and SEs to adjrr. They should look essentially identical.

    Basically, adjrr appears to work fine with continuous predictors, but it also seems to have been superseded by margins, which now works with survey data. Or you can use GLMs:

    Code:
    *Risk difference
    svy: glm highbp height weight age c.age#c.age i.female i.race, family(binomial) link(identity)
    *Relative risk:
    svy: glm highbp height weight age c.age#c.age i.female i.race, family(binomial) link(log)
    One interesting wrinkle happens if you try those bits of code with the stock dataset I provided: they keep running forever. If you omit the svy prefix, you'll see that they are refusing to converge. I am not sure why. This should not occur in all samples - I don't use this technique day to day, but I was taught it in my program, and I believe it's general knowledge, so it has to work most of the time. If it happens not to work for you, you have margins as a backup. Be aware that the results from margins will differ slightly from what you get from a GLM: in margins with a non-linear model, it matters what you set the independent variables to, e.g. do you set everyone's covariates to the sample means (the atmeans option), do you set them to specific values of interest, do you leave them as observed, etc.
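    As a sketch of those averaging choices, using the same model as above (both are legitimate; they just answer slightly different questions):

    Code:
    webuse nhanes2d
    quietly svy: logistic highbp height weight age c.age#c.age i.female i.race
    * default: average the marginal effect over the observed covariate values
    margins, dydx(female)
    * alternative: fix the covariates at their sample means instead
    margins, dydx(female) atmeans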
    Please use the code delimiters to show code and results - use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    Please use the command -dataex- to show a representative sample of data; it is installed already if you have Stata 14.2 or 15.1, else you can install it by typing

    Code:
    ssc install dataex



    • #3
      Thank you so much, Weiwen Ng, for this very detailed answer.

      As you suggested, I am able to get covariates in risk difference form using, for instance, margins, dydx(age) vce(unconditional). This works great, since age is a continuous predictor, and it solves the issue I was having with adjrr, which forced me to contrast two specific values of a continuous predictor.
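      For instance, with the model from #2:

      Code:
      webuse nhanes2d
      quietly svy: logistic highbp height weight age c.age#c.age i.female i.race
      margins, dydx(age) vce(unconditional)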

      One final question I have concerns getting risk difference estimates from a multinomial logistic regression. When the predicted variable is categorical with, say, 3 levels, I am used to the multinomial model omitting one level and presenting the estimates for the other two levels relative to the omitted one. However, when I run svy linearized : mlogit with a 3-level predicted variable and then run margins, I get three predicted outcomes, which I am not sure how to interpret.

      See here, using the example you provided, with race as the 3-level categorical predicted variable. My question is: what is the base level these risk differences are computed against? Is there a way to get risk differences with one base level omitted?

      Code:
      webuse nhanes2d
      quietly svy linearized : mlogit race highbp height weight age i.female
      margins, dydx(highbp height weight age i.female) vce(unconditional)
      
      Average marginal effects                        Number of obs     =     10,351
      
      dy/dx w.r.t. : highbp height weight age 1.female
      1._predict   : Pr(race==White), predict(pr outcome(1))
      2._predict   : Pr(race==Black), predict(pr outcome(2))
      3._predict   : Pr(race==Other), predict(pr outcome(3))
      
      ------------------------------------------------------------------------------
                   |             Linearized
                   |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      highbp       |
          _predict |
                1  |    -.03933   .0133034    -2.96   0.006    -.0664626   -.0121975
                2  |    .029054   .0103249     2.81   0.008     .0079962    .0501118
                3  |    .010276   .0069088     1.49   0.147    -.0038147    .0243667
      -------------+----------------------------------------------------------------
      height       |
          _predict |
                1  |   .0041691   .0013261     3.14   0.004     .0014645    .0068737
                2  |  -.0011915   .0006619    -1.80   0.082    -.0025415    .0001584
                3  |  -.0029775   .0011462    -2.60   0.014    -.0053152   -.0006399
      -------------+----------------------------------------------------------------
      weight       |
          _predict |
                1  |  -.0007393   .0004112    -1.80   0.082     -.001578    .0000994
                2  |   .0015172   .0003151     4.82   0.000     .0008747    .0021598
                3  |  -.0007779   .0002338    -3.33   0.002    -.0012548    -.000301
      -------------+----------------------------------------------------------------
      age          |
          _predict |
                1  |   .0019821   .0003341     5.93   0.000     .0013008    .0026635
                2  |  -.0014267   .0003038    -4.70   0.000    -.0020464    -.000807
                3  |  -.0005554   .0000898    -6.18   0.000    -.0007386   -.0003722
      -------------+----------------------------------------------------------------
      0.female     |  (base outcome)
      -------------+----------------------------------------------------------------
      1.female     |
          _predict |
                1  |   .0594523   .0286762     2.07   0.047     .0009667    .1179378
                2  |   .0187135   .0105438     1.77   0.086    -.0027907    .0402176
                3  |  -.0781657   .0283667    -2.76   0.010    -.1360199   -.0203116
      ------------------------------------------------------------------------------
      Note: dy/dx for factor levels is the discrete change from the base level.



      • #4
        Quoting myself:
        My question is: what is the base level these risk differences are computed against? Is there a way to get risk differences with one base level omitted?
        So after giving it some more thought, and helped by John Mullahy's answer here, I think I figured it out. Because the probabilities of all 3 levels of the outcome must sum to 1, the three average marginal effects for each predictor must sum to 0. I assume that, when presenting the average marginal effects (or risk differences), one does not have to present all three levels, since any one can be inferred from the other two. Hope I'm not wrong!
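        A quick check, using the age row from the margins output in #3 - the three marginal effects cancel out:

        Code:
        * AMEs of age on Pr(White), Pr(Black), Pr(Other), copied from #3
        display .0019821 + (-.0014267) + (-.0005554)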



        • #5
          Originally posted by John Kazowski:
          ...
          One final question I have has to do with getting the risk difference estimates from a multinomial logistic regression. When the predicted variable is categorical with 3 levels, say, I am used to having the multinomial model typically omit one level, and present the estimates for the other two levels' predictors, compared with the omitted one. However, when I run svy linearized : mlogit, with the predicted variable having 3 levels, and then run margins, I am getting three predicted outcomes, which I am not sure how to interpret.
          ...

          So after giving it some more thought, and helped by John Mullahy's answer here, I think I figured it out. Because the probabilities of all 3 levels of the outcome must sum to 1, the three average marginal effects for each predictor must sum to 0. I assume that, when presenting the average marginal effects (or risk differences), one does not have to present all three levels, since any one can be inferred from the other two. Hope I'm not wrong!
          Hmm... you are correct about the fact that the average marginal effects for each predictor sum to 0 across all 3 categories of the outcome.

          However, I would still present the marginal effects on all categories of the outcome. This reduces the amount of cognitive processing your readers have to engage in. The more processing they need to do to understand the results, the less able they will be to follow along. I think this is especially important if you have a relatively complex or less familiar model, and I suspect that multinomial logit is probably less familiar to many disciplines - exceptions might be economics or public policy, but I know I very rarely use multinomial logit in non-econ health services research. Also, some people may get cranky at seeing one category missing.



          • #6
            I think you are absolutely right. Thanks again for the helpful feedback.

