Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Using atcontrasts in margins for a mixed effect model

    Hi everyone

    I am currently working on a project (using Stata 15) to analyse changes in kidney function (ckd) after switching from one medication to another (med1 to med2). I fitted a complex mixed effects model as follows:

    Code:
    mixed ckd_value c.time##c.time##i.med1_med2##i.ckdgroup_baseline2 c.time##i.med1_med2##i.ckdgroup_baseline2 age10 i.sex i.ethnicity_simple i.cd4_group2 i.rtv i.dtg i.bactrim i.cobi i.comorb_cvd i.comorb_aht i.comorb_dm i.comorb_dyslip i.hcv_chronic i.hbs_ag, || id:
    - output omitted -

    ckd_value: kidney function
    time: time from baseline to ckd_value measurement (in years)
    med1_med2: defines which medication used (0=med1 1=med2)
    ckdgroup_baseline2: stratifies data further into 3 groups of ckd_value at baseline
    rest are demographic variables / confounders

    I was able to describe changes over time with this model very nicely. To analyse this effect for subgroups, I would like to predict the change in ckd_value after 1 year for the demographic variables (age10=continuous, sex, ethnicity_simple, comorb_dm etc) when patients are switched to med2.

    I used following command which gives me ckd_values at baseline (time=0) and after 1 year:

    Code:
    margin sex ethnicity_simple cd4_group2 rtv dtg bactrim cobi comorb_cvd comorb_aht comorb_dm hcv_chronic hbs_ag, at( time=(0 1) med1_med2=1) atmeans
    
    
    
    1._at        : time            =           0
                   med1_med2         =           1
                  ***REST omitted***
    
    2._at        : time            =           1
                   med1_med2         =           1
                   ***REST omitted***
    
    
    --------------------------------------------------------------------------------------
                         |            Delta-method
                         |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
                 _at#sex |
                 1#male  |   93.47407   .2998836   311.70   0.000     92.88631    94.06183
               1#female  |     91.788   .3897248   235.52   0.000     91.02416    92.55185
                 2#male  |   92.21797   .2493154   369.88   0.000     91.72932    92.70662
               2#female  |    90.5319   .3523787   256.92   0.000     89.84125    91.22255
                         |
    _at#ethnicity_simple |
                1#White  |   91.33833   .2988618   305.62   0.000     90.75257    91.92409
                1#Black  |   102.4775   .4830898   212.13   0.000     101.5306    103.4243
                1#Other  |   91.64493   .5553936   165.01   0.000     90.55638    92.73348
                2#White  |   90.08222   .2478723   363.42   0.000      89.5964    90.56804
                2#Black  |   101.2214   .4536119   223.15   0.000     100.3323    102.1104
                2#Other  |   90.38882   .5305386   170.37   0.000     89.34899    91.42866
    
    *** Rest of output omitted
    What I want to is to calculate the difference between 1._at and 2._at for each demographic value. I was able to do it overall:

    Code:
    margins, at(time=(0 1) med1_med2=1) atmeans contrast(atcontrast(r._at))
    
    
    
    ------------------------------------------------
                 |         df        chi2     P>chi2
    -------------+----------------------------------
             _at |          1       14.47     0.0001
    ------------------------------------------------
    
    --------------------------------------------------------------
                 |            Delta-method
                 |   Contrast   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
             _at |
       (2 vs 1)  |  -1.256105   .3302356     -1.903355   -.6088551
    --------------------------------------------------------------
    Which gives me the difference between time=1 and time=0 overall. When I try to calculate this for any given covariate (such as sex) in the margin, it doesn't work as expected:


    Code:
    margins sex, at(time=(0 1) med1_med2=1) atmeans contrast(atcontrast(r._at))
    
    
    
    ------------------------------------------------
                 |         df        chi2     P>chi2
    -------------+----------------------------------
         _at#sex |          1        0.00     1.0000
    ------------------------------------------------
    Is there a way to achieve this with margins? And how would you incude age10 (continuous)? do I have to specify a dummy_variable for age above/below 50 years?

    Any help in achieving this is warmly appreciated. Thanks in advance!


  • #2
    The simplest way to do a contrast in the expected outcome for the levels of a dichotomous variable is to ask -margins- for the marginal effect of that variable. So something like
    Code:
    margins, dydx(sex) at(time = (0 1) med1_med2 = 1) atmeans
    To include age = 10 as a condition, just add it to the -at()- option.

    And I don't understand why you would use an indicator for age above vs below 50 years to get information about age = 10. What am I missing here?

    Comment


    • #3
      Actually, the conditions specified in the -at- option are that time1 = 0 and 1, and Bernard wants the difference between those two, i.e. the marginal effect of a one-year change in follow-up time over some demographic groups. Hence, I think the syntax would be something like:

      Code:
      margins sex ethnicity, dydx(time) at(age10 = (10) med1_med2 = 1) atmeans
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

      Comment


      • #4
        Thanks a lot for your response. I added some sample data to replicate the problem:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input long id float(ckd_value time) byte(med1_med2 ckdgroup_baseline2) float age10 byte(sex ethnicity_simple cd4_group2 rtv dtg bactrim cobi comorb_cvd comorb_aht comorb_dm comorb_dyslip hcv_chronic hbs_ag)
        10042 101.15215 -1.6931506 0 2 5.2 2 1 2 1 0 0 0 0 0 0 1 1 0
        10042  94.95408 -1.1068493 0 2 5.2 2 1 2 0 1 0 0 0 0 0 1 1 0
        10042   94.2894 -.58356166 0 2 5.2 2 1 2 0 1 0 0 0 0 0 1 1 0
        10042  64.48558 .005479452 0 2 5.2 2 1 2 0 1 0 0 0 0 0 1 1 0
        10042  59.17748  .52328765 0 2 5.2 2 1 2 0 1 0 0 0 0 0 1 1 0
        10042  81.37051   .9479452 0 2 5.2 2 1 2 0 1 0 0 0 0 0 1 1 0
        10042  70.20575  1.4657534 0 2 5.2 2 1 2 0 1 0 0 0 0 0 1 1 0
        10080 105.45656  -1.879452 0 1 6.1 1 1 1 0 0 0 0 0 1 1 1 0 0
        10080  91.20609  -1.358904 0 1 6.1 1 1 1 0 0 0 0 0 1 1 1 0 0
        10080  100.2141   -.769863 0 1 6.1 1 1 1 0 0 0 0 0 1 1 1 0 0
        10080   96.1601   -.309589 0 1 6.1 1 1 1 0 0 0 0 0 1 1 1 0 0
        10080   95.1256  .26849315 1 1 6.1 1 1 1 0 0 0 0 0 1 1 1 0 0
        10080  99.41071   .8054795 1 1 6.1 1 1 1 0 0 0 0 0 1 1 1 0 0
        10082  91.73024 -1.8684932 0 2 5.5 1 1 1 1 0 0 0 0 0 0 1 0 0
        10082  90.44225  -1.621918 0 2 5.5 1 1 1 1 0 0 0 0 0 0 1 0 0
        10082  89.18694 -1.4054794 0 2 5.5 1 1 1 1 0 0 0 0 0 0 1 0 0
        10082  83.87803  -1.139726 0 2 5.5 1 1 1 1 0 0 0 0 0 0 1 0 0
        10082  81.70213   -.890411 0 2 5.5 1 1 1 1 0 0 0 0 0 0 1 0 0
        10082  81.70213  -.6410959 0 2 5.5 1 1 1 1 0 0 0 0 0 0 1 0 0
        10082  91.08813  -.3726027 0 2 5.5 1 1 1 1 0 0 0 0 0 0 1 0 0
        end
        label values ckdgroup_baseline2 ckdgroup_baseline2
        label def ckdgroup_baseline2 1 "90+ml/min", modify
        label def ckdgroup_baseline2 2 "60-89 ml/min", modify
        label values sex sex
        label def sex 1 "male", modify
        label def sex 2 "female", modify
        label values ethnicity_simple ethnicity_simple
        label def ethnicity_simple 1 "White", modify
        label values cd4_group2 cd4_group2
        label def cd4_group2 1 "500+ cp/ml", modify
        label def cd4_group2 2 "below 500 cp/ml", modify
        label values comorb_cvd yesno
        label values comorb_aht yesno
        label values comorb_dm yesno
        label values comorb_dyslip yesno
        label values hbs_ag yesno
        label values hcv_chronic yesno
        label def yesno 0 "No", modify
        label def yesno 1 "Yes", modify


        As Weiwen pointed out, I am interested in the difference of time which I specified in the at-operator. However, Weiwens code suggestion did not result in a useful result since all estimates are the same:


        Code:
        margins sex ethnicity_simple, dydx(time) at(med1_med2=1) atmeans
        
        Conditional marginal effects                    Number of obs     =     36,112
        
        Expression   : Linear prediction, fixed portion, predict()
        dy/dx w.r.t. : time
        at           : time            =   -.3281331 (mean)
                       med1_med2       =           1
        
        
        *** Rest of output omitted****
        
        ----------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -----------------+----------------------------------------------------------------
        time             |
                     sex |
                   male  |  -1.364217   .7298945    -1.87   0.062    -2.794784    .0663502
                 female  |  -1.364217   .7298945    -1.87   0.062    -2.794784    .0663502
                         |
        ethnicity_simple |
                  White  |  -1.364217   .7298945    -1.87   0.062    -2.794784    .0663502
                  Black  |  -1.364217   .7298945    -1.87   0.062    -2.794784    .0663502
                  Other  |  -1.364217   .7298945    -1.87   0.062    -2.794784    .0663502
        ----------------------------------------------------------------------------------

        I am sorry I didn't provide enough detail for the age10 variable. It is a continuous variable with age/10. That means 3.56 means 35.6 years old. I am not interested in at(age=10) but to be able to provide insight in what is happening for those above or below the age of 50 (that means age10=5). My question is, do I need to change age10 from continuous into a dummy variable (with loss of information) in the model do the same predictions as for sex or ethnicity, or can I use margins to do that for me?

        Thanks for your help!



        Comment


        • #5
          However, Weiwens code suggestion did not result in a useful result since all estimates are the same
          Yes, of course, they are all the same because sex and ethnicity do not participate in any interactions with time in your original -mixed- command, so there is no reason to expect that the time 0 vs time1 difference will change with different values of sex and ethnicity. The levels of the outcome will change, but not the time 0 vs time 1 difference.

          I am not interested in at(age=10) but to be able to provide insight in what is happening for those above or below the age of 50 (that means age10=5). My question is, do I need to change age10 from continuous into a dummy variable (with loss of information) in the model do the same predictions as for sex or ethnicity, or can I use margins to do that for me?
          If there is some scientific basis for thinking that things will differ in a more or less discrete way between those over and under 50, then feel free to create an indicator (dummy) variable for that and use it in the model. Alternatively, if nothing special really happens at age 50, then you might just use -margins- specify a range of ages under and over 50 in the -at()- option. But there is no way that -margins- can give you something like a predictive margin corresponding to "any age over 50" when the underlying regression represents age as a continuous variable.

          Comment


          • #6
            I did not realize that age and ethnicity didn't get interacted with time. Anyway, as Clyde pointed out, this is a linear model, and it will estimate the average effect of one year's change on everyone, and it will be the same for everyone (except that you interacted year with treatment and with baseline CKD group.

            Originally posted by Bernard Surial View Post

            Code:
            mixed ckd_value c.time##c.time##i.med1_med2##i.ckdgroup_baseline2 c.time##i.med1_med2##i.ckdgroup_baseline2 age10 i.sex i.ethnicity_simple i.cd4_group2 i.rtv i.dtg i.bactrim i.cobi i.comorb_cvd i.comorb_aht i.comorb_dm i.comorb_dyslip i.hcv_chronic i.hbs_ag, || id:
            That said, I just noticed something puzzling in your code. You have c.time##c.time##medication##CKD baseline, then you have a linear version of the same expression immediately preceding that. c.time##c.time will generate both linear and squared terms for time. I believe that what you typed above is not the correct way to specify time and time^2 interacting with Tx and baseline. Stata may have omitted duplicate terms automatically, but I'd check to ensure it did so.
            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

            When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

            Comment


            • #7
              The levels of the outcome will change, but not the time 0 vs time 1 difference.
              Ah I see the problem! Thank you very much, this provides me with some more insight in these models...

              That said, I just noticed something puzzling in your code. You have c.time##c.time##medication##CKD baseline, then you have a linear version of the same expression immediately preceding that. c.time##c.time will generate both linear and squared terms for time. I believe that what you typed above is not the correct way to specify time and time^2 interacting with Tx and baseline. Stata may have omitted duplicate terms automatically, but I'd check to ensure it did so.
              Thank your for pointing this out to me. I find the notation with # and ## confusing and difficult to understand how Stata performs depending on which I use. I reran both models:

              Code:
              mixed ckd_value c.time##c.time##i.med1_med2##i.ckdgroup_baseline2 c.time##i.med1_med2##i.ckdgroup_baseline2 age10 sex i.ethnicity_simple i.cd4_group2 rtv dtg bactrim cobi comorb_cvd comorb_aht comorb_dm comorb_dyslip hcv_chronic hbs_ag, || id:
              and

              Code:
              mixed ckd_value c.time##c.time##i.med1_med2##i.ckdgroup_baseline2 age10 sex i.ethnicity_simple i.cd4_group2 rtv dtg bactrim cobi comorb_cvd comorb_aht comorb_dm comorb_dyslip hcv_chronic hbs_ag, || id:
              I get exactly the same estimates, Stata omitted the duplicate terms.

              I really appreciated your careful and insightful answers, thank you very much. I will need to reflect a little bit more on how I can achieve some posthoc subgroup analysis and will post on how I solved my problem.

              Thank you!

              Comment


              • #8
                Originally posted by Bernard Surial View Post

                Ah I see the problem! Thank you very much, this provides me with some more insight in these models...



                Thank your for pointing this out to me. I find the notation with # and ## confusing and difficult to understand how Stata performs depending on which I use. I reran both models:

                Code:
                mixed ckd_value c.time##c.time##i.med1_med2##i.ckdgroup_baseline2 c.time##i.med1_med2##i.ckdgroup_baseline2 age10 sex i.ethnicity_simple i.cd4_group2 rtv dtg bactrim cobi comorb_cvd comorb_aht comorb_dm comorb_dyslip hcv_chronic hbs_ag, || id:
                and

                Code:
                mixed ckd_value c.time##c.time##i.med1_med2##i.ckdgroup_baseline2 age10 sex i.ethnicity_simple i.cd4_group2 rtv dtg bactrim cobi comorb_cvd comorb_aht comorb_dm comorb_dyslip hcv_chronic hbs_ag, || id:
                I get exactly the same estimates, Stata omitted the duplicate terms.

                I really appreciated your careful and insightful answers, thank you very much. I will need to reflect a little bit more on how I can achieve some posthoc subgroup analysis and will post on how I solved my problem.

                Thank you!
                You are welcome, we try to be helpful here. I would recommend just sticking with the ## syntax, as it will generate all the two-way interactions you need. Ignore the single # operator.

                It seems that the bolded bit is the main effect you're interested in. For post-hoc subgroup analysis, I think you can just chain other characteristics of interest to that string of factor variables. You already did with baseline CKD group.

                Code:
                mixed ckd_value c.time##c.time##i.med1_med2##i.ckdgroup_baseline2 age10 sex i.ethnicity_simple i.cd4_group2 rtv dtg bactrim cobi comorb_cvd comorb_aht comorb_dm comorb_dyslip hcv_chronic hbs_ag, || id:
                Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                Comment

                Working...
                X