  • Margins Command after Mixed Linear Model with Interaction Term / Guidelines to Marginal Effects

    I'm using mixed in Stata 15.1 to analyze a change in my variables at two time points. The data were generated in a randomized controlled medical study.

    I am using these commands:
    Code:
    mixed small i.trt##i.time || id:, vce(robust)
    margins, dydx(trt)
    Although I'm paying close attention to the magnitude of the average marginal effect and the confidence intervals generated, I'm required to report p-values. In previous posts, Stata forum guides have suggested that if the p-values found 1) on the interaction term are quite different than 2) those generated by the margins, dydx command, some additional thought is required about what may be going on in the data.

    My understanding is margins, dydx(var) generates a partial derivative with respect to var, measuring change in one variable while the others are held at their observed values.

    Question1: Previous posts warned against relying on p-values from average adjusted predictions generated by margins. Is this also true of average marginal effects?
    Question2: Could you please help me understand why certain variables (like "small" and "large" in my sample data) are significant by p-value on the interaction term (full derivative) but not in the margins command (partial derivative)? What can I learn about my data from these results?

    A sample of my data is included below. Thank you in advance for your help and patience with my questions.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int id byte trt float(time small large)
    1001 1 0 103.62028  2057.409
    1001 1 2  83.89479  1673.689
    1002 2 0  53.92493 1494.9586
    1002 2 2  47.69698  1342.497
    1003 1 0  95.35509 1537.0225
    1003 1 2 74.401726 1095.4296
    1004 2 0  61.61027 1367.5326
    1004 2 2  49.44355 1145.2207
    1005 2 0  94.76392  2612.909
    1005 2 2  51.50049 1128.3427
    1006 1 0  86.49782  1456.245
    1006 1 2  65.66128 1411.7137
    1007 2 0 121.33897 2947.7104
    1007 2 2  101.6294  2035.673
    1008 1 0  91.95323 1224.9602
    1008 1 2  79.91485  1342.048
    1010 2 0  78.80621 2073.3826
    1010 2 2  62.52343   1607.94
    1012 1 0  84.50602  1766.792
    1012 1 2   86.0701  1849.264
    1013 1 0  62.16099 1512.2836
    1013 1 2  97.44263  1719.307
    1015 2 0  83.27814 1706.0673
    1015 2 2  60.52657  933.7055
    1016 2 0 64.726036 1280.3625
    1016 2 2  40.03102  733.2215
    1017 2 0 104.22501 1412.9833
    1017 2 2   81.8728 1217.6375
    1018 2 0    81.716 1719.5686
    1018 2 2 68.464676  1478.304
    1020 2 0  143.5548 2186.3103
    1020 2 2 140.04892  1548.019
    1021 1 0 107.65491 2073.9907
    1021 1 2         .         .
    1022 1 0  76.36583  1707.781
    1022 1 2  65.04529  1766.207
    1023 2 0  57.13372  996.4772
    1023 2 2  52.33152  905.7919
    1024 1 0  61.39893   1060.22
    1024 1 2  56.26431  802.0148
    1025 1 0  127.1972   2630.19
    1025 1 2         .         .
    1026 2 0  79.88757 1726.9504
    1026 2 2  54.34812 1155.5836
    1027 1 0  60.56401  1684.059
    1027 1 2  85.44836 2059.1775
    1028 2 0  78.68946  1698.101
    1028 2 2 66.658676 1222.4415
    1029 2 0  97.76624  1751.821
    1029 2 2  98.43185 1980.2466
    end
    label values trt diet_lab
    label def diet_lab 1 "Treat1", modify
    label def diet_lab 2 "Treat2", modify
    label values time visitlab
    label def visitlab 0 "Baseline", modify
    label def visitlab 2 "6 Mos", modify
    label var id "ID"
    label var trt "Treatment"
    label var time "Time"
    label var small "Small"
    label var large "Large"

  • #2
    You do not describe the design of the study, but based on what you show, the most common situation would be one in which there is an intervention group and a control group, which are distinguished by the variable you call trt, and there is a pre-treatment assessment at time = 0 and a post-treatment assessment at time = 2 (what happened to 1?). And you wish to identify the average treatment effect of the intervention. If that is the case, -margins, dydx(trt)- is not the right thing to look at: it is a hodge-podge of the baseline outcome difference in the treatment groups and their post-intervention difference. What you should be looking at is the coefficient of 2.time#2.trt in the -mixed- output. That is the estimator of the treatment effect.

    I do not understand what you mean when you speak of "small" or "large" being "significant by p-value" since small, at least, is an outcome variable in your model and has no coefficient or p-value attached to it.

    The reason for not paying attention to the p-values of adjusted predictions from -margins- is that the null hypothesis that the expected value of the outcome variable equals 0 is almost never of interest, nor even remotely realistic. By contrast, to the extent that null hypothesis significance testing is ever meaningful, it applies most directly to marginal effects. So, if you believe in p-values, or are being forced to use them regardless of your beliefs, the ones you get from -margins, dydx()- are usually the ones of interest. But, as I've already indicated, in your case, -margins, dydx(trt)- is not a useful statistic for what I believe is your study design.
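
    A minimal sketch of how the baseline difference, the 6-month difference, and the treatment effect could be pulled apart after the model in #1. It assumes the variable names and coding from the -dataex- example (trt coded 1/2, time coded 0/2) and has not been run on the full data set:

    ```stata
    * Fit the model from #1
    mixed small i.trt##i.time || id:, vce(robust)

    * Treat2 - Treat1 difference, reported separately at each time point
    margins time, dydx(trt)

    * The same 6-month difference built by hand from the coefficients
    lincom 2.trt + 2.trt#2.time

    * The difference in change over time: the interaction coefficient itself
    lincom 2.trt#2.time
    ```

    The first row of the -margins- table is the baseline difference between arms and the second is the 6-month difference. Their difference is the interaction coefficient, whereas plain -margins, dydx(trt)- averages the two rows over the observed time points.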

    In general, when you want help interpreting results, it is best to show the results. If you need more specific advice, please include them when you post back.



    • #3
      Additional note1:

      I am using a simple example to post my question. The same issue emerges when this more complex mixed linear model is used with additional variables/covariates in the primary analysis. My questions here are primarily a question about what margins is doing in the case of average marginal effects in this context. If you'd like me to post the more complicated model, I can do that.

      Additional note2:

      I have compared my results using the mixed model above to those I get using a simple linear model clustered on ID.

      Code:
      regress small trt, cluster(id)
      I did this because I was told by a former professor that I could use one of two techniques to avoid the assumptions about the variance-covariance matrix used by mixed: 1) use the robust error option on mixed; or 2) given this is a randomized controlled study, I could use the approach of simple linear regression clustered on the individual.

      When I did this comparison, I saw the results from the linear regression clustered on the individual closely resembled the results from the average marginal effects.
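
      A sketch of one way this resemblance could be checked, assuming the same variable names as above. The regression in the code above has no time term, so its trt coefficient pools both visits, much as the average marginal effect does; keeping the interaction in the clustered regression should make the comparison with -mixed- more direct:

      ```stata
      * Pooled model: a single trt coefficient averaged over both visits
      regress small i.trt, vce(cluster id)

      * Same clustered approach, but with the interaction kept in the model
      regress small i.trt##i.time, vce(cluster id)
      margins, dydx(trt)
      ```

      The two need not agree exactly when the arms have different amounts of missing 6-month data, which may be why the results were close rather than identical.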
      Last edited by Lois Fisher; 02 Nov 2021, 10:55.



      • #4
        Thank you, Clyde, for this response and all of the other helpful remarks you have made on this topic in other posts.

        You surmised the design of the study correctly. It was a diet study with two treatments. Results on most measures were taken at baseline (0), 3 months (1), 6 months (2), and 12 months (3). Certain measures were made only at baseline and 6 months (0 and 2), and these measures are what I am analyzing now.

        I thought I could not take the coefficient of the interaction term as an estimate of the treatment effect given the presence of its two components: 1) trt; and 2) trt#time. I thought I needed to use margins (or lincom) or first principles to derive the full treatment effect. I did understand from previous learnings and posts here that if one must report a p-value, the one to use is the p-value on the interaction term. I also recognize the fixed effect for "trt" compares the two at baseline. (Although there is much I am still learning!)

        Could you please confirm that I understood you correctly about the coefficient of the interaction term? Would this still be the case if I added another covariate that was also entered as an interaction with time (covariate##time)?

        My error on the language used when I commented about significance: the treatment effect on the variable small was significant on the interaction term with time in the model, but not in the output of margins.

        By the way, my apologies for not posting the results. Here they are for the benefit of others.
        Code:
        mixed large i.trt##i.time || id:, vce(robust)
        
        Performing EM optimization:
        
        Performing gradient-based optimization:
        
        Iteration 0:   log pseudolikelihood = -662.89191  
        Iteration 1:   log pseudolikelihood = -662.89191  
        
        Computing standard errors:
        
        Mixed-effects regression                        Number of obs     =         91
        Group variable: id                              Number of groups  =         50
        
                                                        Obs per group:
                                                                      min =          1
                                                                      avg =        1.8
                                                                      max =          2
        
                                                        Wald chi2(3)      =      31.77
        Log pseudolikelihood = -662.89191               Prob > chi2       =     0.0000
        
                                             (Std. Err. adjusted for 50 clusters in id)
        -------------------------------------------------------------------------------
                      |               Robust
                large |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        --------------+----------------------------------------------------------------
                  trt |
              Treat2  |   240.4776   121.1293     1.99   0.047     3.068594    477.8866
                      |
                 time |
               6 Mos  |  -110.6883   69.73721    -1.59   0.112    -247.3708    25.99408
                      |
             trt#time |
        Treat2#6 Mos  |   -292.531   102.1459    -2.86   0.004    -492.7333   -92.32863
                      |
                _cons |   1558.548   85.51283    18.23   0.000     1390.946     1726.15
        -------------------------------------------------------------------------------
        
        ------------------------------------------------------------------------------
                                     |               Robust          
          Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
        -----------------------------+------------------------------------------------
        id: Identity                 |
                          var(_cons) |   101630.2   27359.49      59961.72    172254.9
        -----------------------------+------------------------------------------------
                       var(Residual) |   56506.89   15891.93      32561.96     98060.1
        ------------------------------------------------------------------------------
        
        . margins, dydx(trt)
        
        Average marginal effects                        Number of obs     =         91
        Model VCE    : Robust
        
        Expression   : Linear prediction, fixed portion, predict()
        dy/dx w.r.t. : 2.trt
        
        ------------------------------------------------------------------------------
                     |            Delta-method
                     |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 trt |
             Treat2  |   108.6779   104.4851     1.04   0.298    -96.10918     313.465
        ------------------------------------------------------------------------------
        Note: dy/dx for factor levels is the discrete change from the base level.
        Last edited by Lois Fisher; 02 Nov 2021, 11:20.



        • #5
          I think I see the point more clearly now. Since the fixed effect of "trt" is the difference in the baseline values for small for Treat2 compared to Treat1, the coefficient for the interaction term is the differential treatment effect (different slope) relative to "control". The first derivative of the interaction term is, as the math would suggest, the differential. Margins and other techniques for calculating confidence intervals for [Treat2 + Treat2*Time] are needed to generate a prediction about the absolute value of "small" at the second time point.

          Is this correct?

          You mentioned the margins, dydx(trt) was a hodgepodge of the baseline difference and the later differential over time. Could you please give an example of a design where this is helpful?

          I think I see a need for a book of examples with margins, organized first by study design and second by type of data. Does that exist?

          Thank you again for your time and patience.



          • #6
            I think I see the point more clearly now. Since the fixed effect of "trt" is the difference in the baseline values for small for Treat2 compared to Treat1, the coefficient for the interaction term is the differential treatment effect (different slope) relative to "control". The first derivative of the interaction term is, as the math would suggest, the differential. Margins and other techniques for calculating confidence intervals for [Treat2 + Treat2*Time] are needed to generate a prediction about the absolute value of "small" at the second time point.

            Is this correct?
            Yes, that's correct.
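
            As a rough check on that reading, the numbers posted in #4 can be combined by hand. The sketch below copies the two coefficients from the -mixed large- output and assumes that 41 of the 91 observations are at 6 months (an inference from 50 ids and 91 observations, with every id measured at baseline):

            ```stata
            * Coefficients copied from the -mixed large- output in #4
            * b_trt: Treat2 vs Treat1 at baseline
            scalar b_trt = 240.4776
            * b_int: Treat2#6 Mos interaction
            scalar b_int = -292.531

            * Treat2 vs Treat1 difference at 6 months
            display b_trt + b_int

            * Average marginal effect of trt, weighting the interaction by the
            * assumed share of observations at 6 months (41/91)
            display b_trt + b_int*(41/91)
            ```

            The first expression is about -52.05, the difference between arms at 6 months. The second is about 108.68, which matches the 108.6779 that -margins, dydx(trt)- reported in #4: a blend of the baseline difference and the 6-month difference rather than an estimate of either.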

            You mentioned the margins, dydx(trt) was a hodgepodge of the baseline difference and the later differential over time. Could you please give an example of a design where this is helpful?
            In a model where trt is not interacted with anything, -margins, dydx(trt)- would be that model's estimate of the marginal effect of trt. The approach you mentioned in #3 is an example of that. But, in that context, the output of -margins, dydx(trt)- will be the same as the coefficient of trt in the original model. Where -margins, dydx(trt)- would actually be useful is if the outcome were a dichotomous variable and the model used logistic regression. In that case, the logistic regression model provides the marginal effect as an odds ratio, whereas -margins, dydx(trt)- provides the marginal effect as a difference in outcome probability. Although it is largely a matter of taste, I tend to prefer to show probability differences rather than odds ratios because they are easier for most people to understand. (In particular, most people do not grasp that if you are starting from an extremely low or high probability, even a massive odds ratio may correspond to a minuscule difference in outcome probability.)
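
            A minimal sketch of the logistic case, where event is a hypothetical 0/1 outcome that is not in the posted data and one observation per person is assumed. The second half is plain arithmetic illustrating the point about extreme probabilities, with a baseline probability of 0.001 and an odds ratio of 5 chosen only for illustration:

            ```stata
            * Hypothetical logistic model: event is an assumed 0/1 outcome variable
            logit event i.trt, or

            * Treatment effect expressed as a difference in outcome probability
            margins, dydx(trt)

            * Arithmetic only: baseline probability 0.001 with an odds ratio of 5
            scalar p0    = 0.001
            scalar odds1 = 5 * p0/(1 - p0)
            scalar p1    = odds1/(1 + odds1)
            display p1 - p0
            ```

            The displayed difference is about 0.004, so an odds ratio of 5 here moves the outcome probability by less than half a percentage point.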

            Finally, rather than recommending a book, I suggest you take a look at Richard Williams' excellent explanation of interaction models at https://www3.nd.edu/~rwilliam/stats2/l53.pdf . After that, there are other PDFs that go deeper into the subject matter at his website. They are, in my opinion, the clearest explanations of this material around.
            Last edited by Clyde Schechter; 02 Nov 2021, 13:40.



            • #7
              Thank you Clyde, for these additional comments. Your explanation on the best use of margins dydx was particularly helpful. I have a presentation by Richard Williams on margins stored away under this topic and refer to it periodically. I agree his explanations are excellent! I hadn't realized there was more on a website. I'll be sure to review his other materials.

              Little by little some of these subtle differences are finally getting through, and weaving into prior lessons and understanding. While I do make an effort to continue learning from all sources, I benefit greatly from reading your many posts, Clyde, and those of others, such as Richard Williams. Thank you for all your time and patient tutoring!



              • #8
                Hello, I'm coming back with a question on a more complex version of the model discussed above. I've been reading Williams and other materials and I'm still not sure on this one. As a reminder, I'm working with a subset of a diet RCT that contains data at 2 timepoints (baseline & 6 months). I've been given the objective of estimating the association between changes in the dependent variable lipid (continuous, DVL) between baseline & 6 months with changes in a protein level (continuous, "protein") between baseline & 6 months. Additionally, I'm trying to adjust for the diet effect at 6 months, changes in BMI between baseline & 6 months, and age (continuous). My preference is to stay with a mixed model that allows for a random intercept for each individual.

                I approached this using interaction terms with time for: protein, bmi, and diet. I ran the command margins, dydx(protein) to get the marginal effects of the change in the DVL associated with protein at baseline compared to the change in the DVL associated with protein at 6 months. (I think I'm interpreting that correctly, even though I recognize it's not really meeting my objective.)

                Code:
                mixed dvl i.diet##i.visit c.bmi##i.visit c.protein##i.visit || study_id:, vce(robust)
                
                Performing EM optimization: 
                
                Performing gradient-based optimization: 
                
                Iteration 0:   log likelihood = -5446.5888  
                Iteration 1:   log likelihood = -5446.5878  
                Iteration 2:   log likelihood = -5446.5878  
                
                Computing standard errors:
                
                Mixed-effects ML regression                     Number of obs     =      1,071
                Group variable: study_id                        Number of groups  =        609
                
                                                                Obs per group:
                                                                              min =          1
                                                                              avg =        1.8
                                                                              max =          2
                
                                                                Wald chi2(7)      =    1048.53
                Log likelihood = -5446.5878                     Prob > chi2       =     0.0000
                
                ---------------------------------------------------------------------------------
                            dvl |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                ----------------+----------------------------------------------------------------
                           diet |
                      diet1  |  -3.831638   3.462925    -1.11   0.269    -10.61885     2.95557
                                |
                          visit |
                         6 Mos  |   .8784078   18.42015     0.05   0.962    -35.22442    36.98124
                                |
                     diet#visit |
                diet1#6 Mos  |   1.793699   3.651821     0.49   0.623    -5.363738    8.951136
                                |
                            bmi |  -1.404073   .5058258    -2.78   0.006    -2.395473   -.4126726
                                |
                    visit#c.bmi |
                         6 Mos  |   1.474908   .5507176     2.68   0.007     .3955213    2.554295
                                |
                        protein |   73.56325   2.731188    26.93   0.000     68.21022    78.91628
                                |
                visit#c.protein |
                         6 Mos  |   -15.6686   3.263648    -4.80   0.000    -22.06523   -9.271968
                                |
                          _cons |  -82.84185   17.51202    -4.73   0.000    -117.1648   -48.51892
                ---------------------------------------------------------------------------------
                
                ------------------------------------------------------------------------------
                  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                -----------------------------+------------------------------------------------
                study_id: Identity           |
                                  var(_cons) |   1053.065   92.16197      887.0736    1250.116
                -----------------------------+------------------------------------------------
                               var(Residual) |   769.9998   50.72554      676.7305    876.1239
                ------------------------------------------------------------------------------
                LR test vs. linear model: chibar2(01) = 183.64        Prob >= chibar2 = 0.0000
                
                . margins, dydx(protein)
                
                Average marginal effects                        Number of obs     =      1,071
                
                Expression   : Linear prediction, fixed portion, predict()
                dy/dx w.r.t. : protein
                
                ------------------------------------------------------------------------------
                             |            Delta-method
                             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                     protein |   66.78962   2.410916    27.70   0.000     62.06431    71.51493
                ------------------------------------------------------------------------------
                As I said, I don't think this is meeting my objective as stated above, but I'm not sure how to proceed. Any ideas? Is there a better approach? And if so, what would be the best margins command to use afterward?

                Thank you in advance for your help and suggestions!



                • #9
                  Well, your description of your goal is a bit vague and doesn't seem to correspond all that closely with your code. In fact, in your code, you speak only of adjusting for effects of other variables, so it isn't clear why you need any interaction terms at all here if that is the goal. But then, I think you probably didn't really mean only to adjust for effects of other variables. You probably want a model in which some of those variables actually modify the effect of protein on dvl. But you haven't spelled out just which those variables are, and how they do that.

                  According to your code, you believe that the marginal effect of protein on dvl is different at the baseline and 6 month visits, but is independent of the diet. You also assume that the marginal effect of bmi on dvl differs at the two visits, but is also independent of diet. Those assumptions seem odd to me, though perhaps they are correct. Are they? Have you thought about this in terms of a presumed mechanism whereby the protein influences the dvl? Is that mechanism going to work differently at the two times? Does diet modify how that mechanism works? Does bmi modify how that mechanism works?
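
                  A sketch of how those assumptions could be examined, using the variable names from #8. The second model is a hypothetical extension, not something you have fitted, in which diet is also allowed to modify the protein slope:

                  ```stata
                  * Model as posted in #8
                  mixed dvl i.diet##i.visit c.bmi##i.visit c.protein##i.visit || study_id:, vce(robust)

                  * Slope of dvl on protein, reported separately at each visit
                  margins visit, dydx(protein)

                  * Hypothetical extension: let diet modify the protein slope as well
                  mixed dvl c.protein##i.visit##i.diet c.bmi##i.visit || study_id:, vce(robust)
                  margins diet#visit, dydx(protein)
                  ```

                  From the coefficients posted in #8, the protein slope is about 73.56 at baseline and about 57.89 at 6 months (73.56325 - 15.6686). The 66.79 from -margins, dydx(protein)- lies between them because it averages the two slopes over the observed visits.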



                  • #10
                    Clyde, thank you for your insightful comments. Indeed, in this context of diet, fat, cholesterol and BMI, many of the variables impact each other and some exert effect modification. The same is true at the next level down -- that of the lipid fractions. It's one large, interconnected system. As the economists would say, there's a high level of endogeneity. From this larger system we are trying to tease out the activity of a single subsystem in isolation. There certainly is an exploratory aspect to the work.

                    I think the primary investigator's idea was that given a perturbation in the larger system (diet intervention), one could observe the shifts in the composition of the lipid fractions. To do this, we adjust for larger, more powerful variables such as diet and BMI. Adjustments for things like age were considered nuisance adjustments because age was slightly imbalanced at baseline in this RCT.

                    The investigator specifically requested an analysis that isolated the changes in selected lipid fractions from one time to another, adjusted for the changes in other measures. By looking at correlations in the changes over time, he thought he might gain new insights into the sometimes subtle roles of the various lipid fractions. I was trying to accommodate this precise request -- looking at the differences only -- without resorting to an analysis of deltas from the absolute values. Of course any results we have might be eliminated once an interaction in one of the larger, more powerful variables is entered into the equation.

                    From my perspective, I think a principal components analysis might be the best way to go given the interconnectedness of everything in this problem. We had been pursuing this prior to a flash of excitement that came after an analysis of the deltas on the absolute values.

                    I'm sorry to be vague. Because I'm not the principal investigator I did not feel at liberty to discuss the topic in too much detail.

                    If you have any other suggestions based on this additional information, I'm all ears.

                    I genuinely appreciate your feedback on this question. Your words served to ground me again in the discipline of knowing the research question. Thank you for the time you spent reviewing my questions - very helpful!
