Margins Command after Mixed Linear Model with Interaction Term / Guidelines to Marginal Effects

Lois Fisher

Join Date: Nov 2016
Posts: 23
#1

01 Nov 2021, 22:57

I'm using mixed in Stata 15.1 to analyze a change in my variables at two time points. The data were generated in a randomized controlled medical study.

I am using these commands:

Code:

mixed small i.trt##i.time || id:, vce(robust)
margins, dydx(trt)

Although I'm paying close attention to the magnitude of the average marginal effect and the confidence intervals generated, I'm required to report p-values. In previous posts, Stata forum guides have suggested that if the p-values found 1) on the interaction term are quite different than 2) those generated by the margins, dydx command, some additional thought is required about what may be going on in the data.

My understanding is margins, dydx(var) generates a partial derivative with respect to var, measuring change in one variable while the others are held at their observed values.

Question1: Previous posts warned against relying on p-values from average adjusted predictions generated by margins. Is this also true of average marginal effects?
Question2: Could you please help me understand why certain variables (like "small" and "large" in my sample data) are significant by p-value on the interaction term (full derivative) but not in the margins command (partial derivative)? What can I learn about my data from these results?

A sample of my data is included below. Thank you in advance for your help and patience with my questions.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id byte trt float(time small large)
1001 1 0 103.62028  2057.409
1001 1 2  83.89479  1673.689
1002 2 0  53.92493 1494.9586
1002 2 2  47.69698  1342.497
1003 1 0  95.35509 1537.0225
1003 1 2 74.401726 1095.4296
1004 2 0  61.61027 1367.5326
1004 2 2  49.44355 1145.2207
1005 2 0  94.76392  2612.909
1005 2 2  51.50049 1128.3427
1006 1 0  86.49782  1456.245
1006 1 2  65.66128 1411.7137
1007 2 0 121.33897 2947.7104
1007 2 2  101.6294  2035.673
1008 1 0  91.95323 1224.9602
1008 1 2  79.91485  1342.048
1010 2 0  78.80621 2073.3826
1010 2 2  62.52343   1607.94
1012 1 0  84.50602  1766.792
1012 1 2   86.0701  1849.264
1013 1 0  62.16099 1512.2836
1013 1 2  97.44263  1719.307
1015 2 0  83.27814 1706.0673
1015 2 2  60.52657  933.7055
1016 2 0 64.726036 1280.3625
1016 2 2  40.03102  733.2215
1017 2 0 104.22501 1412.9833
1017 2 2   81.8728 1217.6375
1018 2 0    81.716 1719.5686
1018 2 2 68.464676  1478.304
1020 2 0  143.5548 2186.3103
1020 2 2 140.04892  1548.019
1021 1 0 107.65491 2073.9907
1021 1 2         .         .
1022 1 0  76.36583  1707.781
1022 1 2  65.04529  1766.207
1023 2 0  57.13372  996.4772
1023 2 2  52.33152  905.7919
1024 1 0  61.39893   1060.22
1024 1 2  56.26431  802.0148
1025 1 0  127.1972   2630.19
1025 1 2         .         .
1026 2 0  79.88757 1726.9504
1026 2 2  54.34812 1155.5836
1027 1 0  60.56401  1684.059
1027 1 2  85.44836 2059.1775
1028 2 0  78.68946  1698.101
1028 2 2 66.658676 1222.4415
1029 2 0  97.76624  1751.821
1029 2 2  98.43185 1980.2466
end
label values trt diet_lab
label def diet_lab 1 "Treat1", modify
label def diet_lab 2 "Treat2", modify
label values time visitlab
label def visitlab 0 "Baseline", modify
label def visitlab 2 "6 Mos", modify
label var id "ID"
label var trt "Treatment"
label var time "Time"
label var small "Small"
label var large "Large"

Clyde Schechter

Join Date: Apr 2014

Posts: 29998
#2

02 Nov 2021, 10:31

You do not describe the design of the study, but based on what you show, the most common situation would be one in which there is an intervention group and a control group, which are distinguished by the variable you call trt, and there is a pre-treatment assessment at time = 0 and a post-treatment assessment at time = 2 (what happened to 1?). And you wish to identify the average treatment effect of the intervention. If that is the case, -margins, dydx(trt)- is not the right thing to look at: it is a hodge-podge of the baseline outcome difference in the treatment groups and their post-intervention difference. What you should be looking at is the coefficient of 2.time#2.trt in the -mixed- output. That is the estimator of the treatment effect.

I do not understand what you mean when you speak of "small" or "large" being "significant by p-value" since small, at least, is an outcome variable in your model and has no coefficient or p-value attached to it.

The reason for not paying attention to the p-values of adjusted predictions from -margins- is that the null hypothesis that the expected value of the outcome variable equals 0 is almost never of interest, nor even remotely realistic. By contrast, to the extent that null hypothesis significance testing is ever meaningful, it applies most directly to marginal effects. So, if you believe in p-values, or are being forced to use them regardless of your beliefs, the ones you get from -margins, dydx()- are usually the ones of interest. But, as I've already indicated, in your case, -margins, dydx(trt)- is not a useful statistic for what I believe is your study design.
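The distinction drawn above can be checked directly in Stata. The following is only a sketch: it assumes the -dataex- example from the top of the thread is in memory and reuses the poster's own model. The comments describe what each command is expected to report, not actual output.

```stata
* Assumes the -dataex- example data from the top of the thread are in memory.
mixed small i.trt##i.time || id:, vce(robust)

* Averaged over all observations: blends the baseline and 6-month differences
margins, dydx(trt)

* Treat2 minus Treat1 difference, reported separately at each time point
margins time, dydx(trt)

* Difference-in-differences: the interaction coefficient itself
lincom 2.trt#2.time
```

The two rows from -margins time, dydx(trt)- should differ by exactly the amount that -lincom- reports, which is the treatment-effect estimator described above.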

In general, when you want help interpreting results, it is best to show the results. If you need more specific advice, please include them when you post back.

Lois Fisher

Join Date: Nov 2016

Posts: 23
#3

02 Nov 2021, 10:36

Additional note1:

I am using a simple example to post my question. The same issue emerges when this more complex mixed linear model is used with additional variables/covariates in the primary analysis. My questions here are primarily a question about what margins is doing in the case of average marginal effects in this context. If you'd like me to post the more complicated model, I can do that.

Additional note2:

I have compared my results using the mixed model above to those I get using a simple linear model clustered on ID.

Code:

regress small trt, cluster(id)

I did this because I was told by a former professor that I could use one of two techniques to avoid the assumptions about the variance-covariance matrix used by mixed: 1) use the robust error option on mixed; or 2) given this is a randomized controlled study, I could use the approach of simple linear regression clustered on the individual.

When I did this comparison, I saw the results from the linear regression clustered on the individual closely resembled the results from the average marginal effects.
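A possible reason for the resemblance: a regression of the outcome on trt alone pools both time points, so its trt coefficient is also an average of the baseline and 6-month group differences. A minimal sketch, assuming the -dataex- example from the top of the thread is in memory (i.trt and vce(cluster id) are used here in place of the original trt and cluster(id) so that -margins- treats trt as a factor):

```stata
* Assumes the -dataex- example data from the top of the thread are in memory.
* No interaction with time: the trt coefficient pools both visits
regress small i.trt, vce(cluster id)

* With no interaction in the model, this reproduces the trt coefficient
margins, dydx(trt)
```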

Last edited by Lois Fisher; 02 Nov 2021, 10:55.

Lois Fisher

Join Date: Nov 2016
Posts: 23
#4

02 Nov 2021, 11:10

Thank you, Clyde, for this response and all of the other helpful remarks you have made on this topic in other posts.

You surmised the design of the study correctly. It was a diet study with two treatments. Results on most measures were taken at baseline (0), 3 months (1), 6 months (2), and 12 months (3). Certain measures were made only at baseline and 6 months (0 and 2), and these measures are what I am analyzing now.

I thought I could not take the coefficient of the interaction term as an estimate of the treatment effect given the presence of its two components: 1) trt; and 2) trt#time. I thought I needed to use margins (or lincom) or first principles to derive the full treatment effect. I did understand from previous learnings and posts here that if one must report a p-value, the one to use is the p-value on the interaction term. I also recognize the fixed effect for "trt" compares the two at baseline. (Although there is much I am still learning!)

Could you please confirm that I understood you correctly about the coefficient of the interaction term? Would this still be the case if I added another covariate that was also entered as an interaction with time (covariate##time)?

My error on the language used when I commented about significance: the treatment effect on the variable small was significant on the interaction term with time in the model, but not in the output of margins.

By the way, my apologies for not posting the results. Here they are for the benefit of others.

Code:

mixed large i.trt##i.time || id:, vce(robust)

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log pseudolikelihood = -662.89191  
Iteration 1:   log pseudolikelihood = -662.89191  

Computing standard errors:

Mixed-effects regression                        Number of obs     =         91
Group variable: id                              Number of groups  =         50

                                                Obs per group:
                                                              min =          1
                                                              avg =        1.8
                                                              max =          2

                                                Wald chi2(3)      =      31.77
Log pseudolikelihood = -662.89191               Prob > chi2       =     0.0000

                                     (Std. Err. adjusted for 50 clusters in id)
-------------------------------------------------------------------------------
              |               Robust
        large |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
          trt |
      Treat2  |   240.4776   121.1293     1.99   0.047     3.068594    477.8866
              |
         time |
       6 Mos  |  -110.6883   69.73721    -1.59   0.112    -247.3708    25.99408
              |
     trt#time |
Treat2#6 Mos  |   -292.531   102.1459    -2.86   0.004    -492.7333   -92.32863
              |
        _cons |   1558.548   85.51283    18.23   0.000     1390.946     1726.15
-------------------------------------------------------------------------------

------------------------------------------------------------------------------
                             |               Robust          
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   101630.2   27359.49      59961.72    172254.9
-----------------------------+------------------------------------------------
               var(Residual) |   56506.89   15891.93      32561.96     98060.1
------------------------------------------------------------------------------

. margins, dydx(trt)

Average marginal effects                        Number of obs     =         91
Model VCE    : Robust

Expression   : Linear prediction, fixed portion, predict()
dy/dx w.r.t. : 2.trt

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         trt |
     Treat2  |   108.6779   104.4851     1.04   0.298    -96.10918     313.465
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
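
The average marginal effect above can be reproduced by hand from the -mixed- coefficients. The worked arithmetic below assumes that the 91 observations are 50 baseline records and 41 six-month records; the output reports only 50 subjects and 91 observations, so that split is an inference, although it reproduces the reported value to four decimals.

```latex
\[
\widehat{\mathrm{AME}}_{\text{trt}}
  = \hat\beta_{2.\text{trt}}
    + \hat\beta_{2.\text{trt}\#2.\text{time}} \cdot \frac{n_{6\,\text{mo}}}{N}
  = 240.4776 + (-292.531)\cdot\frac{41}{91}
  \approx 240.4776 - 131.7997
  = 108.6779
\]
```

In other words, -margins, dydx(trt)- weights the baseline difference (240.48) and the 6-month difference (240.48 - 292.53 = -52.05) by the share of observations at each visit.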

Last edited by Lois Fisher; 02 Nov 2021, 11:20.

Lois Fisher

Join Date: Nov 2016

Posts: 23
#5

02 Nov 2021, 12:22

I think I see the point more clearly now. Since the fixed effect of "trt" is the difference in the baseline values for small for Treat2 compared to Treat1, the coefficient for the interaction term is the differential treatment effect (different slope) relative to "control". The first derivative of the interaction term is, as the math would suggest, the differential. Margins and other techniques for calculating confidence intervals for [Treat2 + Treat2*Time] are needed to generate a prediction about the absolute value of "small" at the second time point.

Is this correct?
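
For reference, the quantities mentioned here can be requested separately. This is a sketch only, assuming the -dataex- example from the top of the thread is in memory. Note that the sum 2.trt + 2.trt#2.time is the between-group difference at 6 months, while the predicted level of the outcome in each group at each visit comes from -margins trt#time-.

```stata
* Assumes the -dataex- example data from the top of the thread are in memory.
mixed small i.trt##i.time || id:, vce(robust)

* Predicted mean of small in each trt-by-time cell (fixed portion)
margins trt#time

* Treat2 minus Treat1 at 6 months: main effect plus interaction
lincom 2.trt + 2.trt#2.time

* Change from baseline to 6 months within Treat2
lincom 2.time + 2.trt#2.time
```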

You mentioned the margins, dydx(trt) was a hodgepodge of the baseline difference and the later differential over time. Could you please give an example of a design where this is helpful?

I think I see a need for a book of examples with margins, organized first by study design and second by type of data. Does that exist?

Thank you again for your time and patience.

Clyde Schechter

Join Date: Apr 2014

Posts: 29998
#6

02 Nov 2021, 13:37

Originally posted by Lois Fisher: "I think I see the point more clearly now. Since the fixed effect of "trt" is the difference in the baseline values for small for Treat2 compared to Treat1, the coefficient for the interaction term is the differential treatment effect (different slope) relative to "control". The first derivative of the interaction term is, as the math would suggest, the differential. Margins and other techniques for calculating confidence intervals for [Treat2 + Treat2*Time] are needed to generate a prediction about the absolute value of "small" at the second time point. Is this correct?"

Yes, that's correct.

Originally posted by Lois Fisher: "You mentioned the margins, dydx(trt) was a hodgepodge of the baseline difference and the later differential over time. Could you please give an example of a design where this is helpful?"

In a model where trt is not interacted with anything, -margins, dydx(trt)- would be that model's estimate of the marginal effect of trt. The approach you mentioned in #3 is an example of that. But, in that context, the output of -margins, dydx(trt)- will be the same as the coefficient of trt in the original model. Where -margins, dydx(trt)- would actually be useful is if the outcome were a dichotomous variable and the model used logistic regression. In that case, the logistic regression model provides the marginal effect as an odds ratio, whereas -margins, dydx(trt)- provides the marginal effect as a difference in outcome probability. Although it is largely a matter of taste, I tend to prefer to show probability differences rather than odds ratios because they are easier for most people to understand. (In particular, most people do not grasp that if you are starting from an extremely low or high probability, even a massive odds ratio may correspond to a minuscule difference in outcome probability.)
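
To illustrate the logistic case, here is a minimal sketch using Stata's lbw example data (loaded with -webuse-, which needs an internet connection). The data set and the choice of smoke and age as predictors are assumptions made for illustration and have nothing to do with the diet study. The second part works through the odds-ratio point with made-up numbers: a baseline probability of 0.001 and an odds ratio of 10.

```stata
* Illustration only: lbw is a Stata example data set, unrelated to this thread's study
webuse lbw, clear

* Reports the effect of smoking on low birthweight as an odds ratio
logistic low i.smoke age

* Reports the same effect as an average difference in predicted probability
margins, dydx(smoke)

* Hypothetical numbers: baseline probability 0.001, odds ratio 10
local p0 = 0.001
local odds1 = 10 * `p0' / (1 - `p0')
local p1 = `odds1' / (1 + `odds1')
display "p0 = " %6.4f `p0' "  p1 = " %6.4f `p1' "  difference = " %6.4f (`p1' - `p0')
```

With those hypothetical numbers, a tenfold odds ratio moves the probability by less than one percentage point.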

Finally, rather than recommending a book, I suggest you take a look at Richard Williams' excellent explanation of interaction models at https://www3.nd.edu/~rwilliam/stats2/l53.pdf . After that, there are other PDFs that go deeper into the subject matter at his website. They are, in my opinion, the clearest explanations of this material around.

Last edited by Clyde Schechter; 02 Nov 2021, 13:40.

Lois Fisher

Join Date: Nov 2016

Posts: 23
#7

04 Nov 2021, 10:11

Thank you Clyde, for these additional comments. Your explanation on the best use of margins dydx was particularly helpful. I have a presentation by Richard Williams on margins stored away under this topic and refer to it periodically. I agree his explanations are excellent! I hadn't realized there was more on a website. I'll be sure to review his other materials.

Little by little some of these subtle differences are finally getting through, and weaving into prior lessons and understanding. While I do make an effort to continue learning from all sources, I benefit greatly from reading your many posts, Clyde, and those of others, such as Richard Williams. Thank you for all your time and patient tutoring!

Lois Fisher

Join Date: Nov 2016
Posts: 23
#8

09 Nov 2021, 02:50

Hello, I'm coming back with a question on a more complex version of the model discussed above. I've been reading Williams and other materials and I'm still not sure on this one. As a reminder, I'm working with a subset of a diet RCT that contains data at 2 timepoints (baseline & 6 months). I've been given the objective of estimating the association between changes in the dependent variable lipid (continuous, DVL) between baseline & 6 months with changes in a protein level (continuous, "protein") between baseline & 6 months. Additionally, I'm trying to adjust for the diet effect at 6 months, changes in BMI between baseline & 6 months, and age (continuous). My preference is to stay with a mixed model that allows for a random intercept for each individual.

I approached this using interaction terms with time for: protein, bmi, and diet. I ran the command margins, dydx(protein) to get the marginal effects of the change in the DVL associated with protein at baseline compared to the change in the DVL associated with protein at 6 months. (I think I'm interpreting that correctly, even though I recognize it's not really meeting my objective.)

Code:

mixed dvl i.diet##i.visit c.bmi##i.visit c.protein##i.visit || study_id:, vce(robust)

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -5446.5888  
Iteration 1:   log likelihood = -5446.5878  
Iteration 2:   log likelihood = -5446.5878  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =      1,071
Group variable: study_id                        Number of groups  =        609

                                                Obs per group:
                                                              min =          1
                                                              avg =        1.8
                                                              max =          2

                                                Wald chi2(7)      =    1048.53
Log likelihood = -5446.5878                     Prob > chi2       =     0.0000

---------------------------------------------------------------------------------
            dvl |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           diet |
         diet1  |  -3.831638   3.462925    -1.11   0.269    -10.61885     2.95557
                |
          visit |
         6 Mos  |   .8784078   18.42015     0.05   0.962    -35.22442    36.98124
                |
     diet#visit |
   diet1#6 Mos  |   1.793699   3.651821     0.49   0.623    -5.363738    8.951136
                |
            bmi |  -1.404073   .5058258    -2.78   0.006    -2.395473   -.4126726
                |
    visit#c.bmi |
         6 Mos  |   1.474908   .5507176     2.68   0.007     .3955213    2.554295
                |
        protein |   73.56325   2.731188    26.93   0.000     68.21022    78.91628
                |
visit#c.protein |
         6 Mos  |   -15.6686   3.263648    -4.80   0.000    -22.06523   -9.271968
                |
          _cons |  -82.84185   17.51202    -4.73   0.000    -117.1648   -48.51892
---------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
study_id: Identity           |
                  var(_cons) |   1053.065   92.16197      887.0736    1250.116
-----------------------------+------------------------------------------------
               var(Residual) |   769.9998   50.72554      676.7305    876.1239
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 183.64        Prob >= chibar2 = 0.0000

. margins, dydx(protein)

Average marginal effects                        Number of obs     =      1,071

Expression   : Linear prediction, fixed portion, predict()
dy/dx w.r.t. : protein

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     protein |   66.78962   2.410916    27.70   0.000     62.06431    71.51493
------------------------------------------------------------------------------

As I said, I don't think this is meeting my objective as stated above, but I'm not sure how to proceed. Any ideas? Is there a better approach? And if so, what would be the best margins command to use afterward?

Thank you in advance for your help and suggestions!

Clyde Schechter

Join Date: Apr 2014

Posts: 29998
#9

09 Nov 2021, 09:18

Well, your description of your goal is a bit vague and doesn't seem to correspond all that closely with your code. In fact, in your code, you speak only of adjusting for effects of other variables, so it isn't clear why you need any interaction terms at all here if that is the goal. But then, I think you probably didn't really mean only to adjust for effects of other variables. You probably want a model in which some of those variables actually modify the effect of protein on dvl. But you haven't spelled out just which those variables are, and how they do that.

According to your code, you believe that the marginal effect of protein on dvl is different at the baseline and 6 month visits, but is independent of the diet. You also assume that the marginal effect of bmi on dvl differs at the two visits, but is also independent of diet. Those assumptions seem odd to me, though perhaps they are correct. Are they? Have you thought about this in terms of a presumed mechanism whereby the protein influences the dvl? Is that mechanism going to work differently at the two times? Does diet modify how that mechanism works? Does bmi modify how that mechanism works?
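
The assumptions described above can be made visible with -margins-. The sketch below assumes the poster's data set (variables dvl, diet, visit, bmi, protein, study_id) is in memory. The second model is a hypothetical alternative shown only to illustrate what "diet modifies the protein effect" would look like in code, and it is not a recommendation.

```stata
* Assumes the poster's data (dvl, diet, visit, bmi, protein, study_id) are in memory.
mixed dvl i.diet##i.visit c.bmi##i.visit c.protein##i.visit || study_id:, vce(robust)

* Slope of dvl on protein at each visit; the model forces it to be the same in both diets
margins visit, dydx(protein)

* Hypothetical alternative: let the protein slope vary by diet as well as by visit
mixed dvl i.diet##i.visit##c.protein c.bmi##i.visit || study_id:, vce(robust)
margins diet#visit, dydx(protein)
```

From the coefficients posted in #8, the first -margins- call should report a slope of about 73.6 at baseline and about 57.9 (73.56 - 15.67) at 6 months. The single figure of 66.79 from -margins, dydx(protein)- is an average of those two slopes.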
Lois Fisher

Join Date: Nov 2016

Posts: 23
#10

12 Nov 2021, 14:35

Clyde, thank you for your insightful comments. Indeed, in this context of diet, fat, cholesterol and BMI, many of the variables impact each other and some exert effect modification. The same is true at the next level down -- that of the lipid fractions. It's one large, interconnected system. As the economists would say, there's a high level of endogeneity. From this larger system we are trying to tease out the activity of a single subsystem in isolation. There certainly is an exploratory aspect to the work.

I think the primary investigator's idea was that given a perturbation in the larger system (diet intervention), one could observe the shifts in the composition of the lipid fractions. To do this, we adjust for larger, more powerful variables such as diet and BMI. Adjustments for things like age were considered nuisance adjustments because age was slightly imbalanced at baseline in this RCT.

The investigator specifically requested an analysis that isolated the changes in selected lipid fractions from one time to another, adjusted for the changes in other measures. By looking at correlations in the changes over time, he thought he might gain new insights into the sometimes subtle roles of the various lipid fractions. I was trying to accommodate this precise request -- looking at the differences only -- without resorting to an analysis of deltas from the absolute values. Of course any results we have might be eliminated once an interaction in one of the larger, more powerful variables is entered into the equation.

From my perspective, I think a principal components analysis might be the best way to go given the interconnectedness of everything in this problem. We had been pursuing this prior to a flash of excitement that came after an analysis of the deltas on the absolute values.

I'm sorry to be vague. Because I'm not the principal investigator I did not feel at liberty to discuss the topic in too much detail.

If you have any other suggestions based on this additional information, I'm all ears.

I genuinely appreciate your feedback on this question. Your words served to ground me again in the discipline of knowing the research question. Thank you for the time you spent reviewing my questions. It was very helpful!