
  • Interaction terms in mixed-effects regression

    Hi statalisters ... I need some help interpreting the coefficients of some mixed-effects models for my master's thesis. I'll explain briefly. I ran three experimental treatments (G, C, CG). My dataset is a balanced panel in which 30 subjects per treatment are observed for 7 periods. I implemented a mixed-effects model like the following:

    y = C + CG + period + C#period + CG#period + L1.M

    where C and CG are the dummy variables that identify my treatments (I omitted G because it is my baseline treatment), period is my trend, C#period and CG#period are interaction terms to evaluate differences in trends across treatments, and L1.M is the lag of another variable of interest. The most important treatment comparison for me is between CG and G, so I am interested in the CG coefficient. The code entered in Stata is as follows:

    xtmixed ce ib4.treatment##c.period L1.mpcr || group:

    Now, my first model is

    y = C + CG + L1.M

    the code being used is:

    Code:
    xtmixed ce ib4.treatment L1.mpcr || group:

    here the CG coefficient is positive and highly significant. To evaluate differences in trends across treatments, I add period and the two interaction terms, thus running the complete model mentioned above. Here the CG coefficient is no longer significant, but the two interaction terms are significant. What does this mean? Maybe treatment CG is not statistically different from G? But the interaction terms seem to reveal differences between the trends in CG and G... I would appreciate any clarification. I attached a Word file in which I pasted my Stata output.

    Thank you in advance! [Attachment: output stata.docx]

  • #2
    To evaluate differences in trends across treatments, I add period and the two interaction terms, thus running the complete model mentioned above. Here the CG coefficient is no longer significant, but the two interaction terms are significant. What does this mean? Maybe treatment CG is not statistically different from G? But the interaction terms seem to reveal differences between the trends in CG and G... I would appreciate any clarification.
    Testing the CG and G coefficients in this interaction model does not test the difference between CG and G. In fact, by using an interaction model you have stipulated that there is no such thing as the difference between CG and G. Instead, CG and G may differ by different amounts in each period. What you found by testing the CG and G coefficients is that the difference between the expected outcomes in the CG and G groups is not statistically significant in the baseline period. It says nothing more general than that.
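This point can be sketched numerically. The following Python snippet (fabricated data, not the poster's; added for illustration) fits a two-group interaction model and shows that the coefficient on the treatment dummy is the CG-vs-G difference at period = 0 only, while the difference at period p is that coefficient plus p times the interaction coefficient:

```python
import numpy as np

# Fabricated data mimicking the setup: baseline group G (cg = 0) and
# treatment CG (cg = 1), observed over periods 1..7.
rng = np.random.default_rng(0)
period = np.tile(np.arange(1, 8), 20)              # 140 observations
cg = np.repeat([0, 1], 70)
y = 0.5 + 0.2*cg - 0.01*period + 0.05*cg*period + rng.normal(0, 0.01, 140)

# Interaction model: y = b0 + b1*CG + b2*period + b3*(CG x period)
X = np.column_stack([np.ones(140), cg, period, cg*period])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# b[1] is the CG-vs-G difference at period = 0 ONLY; at period p the
# difference is b[1] + b[3]*p, so it changes from period to period.
for p in (0, 1, 7):
    print(f"period {p}: CG - G difference = {b[1] + b[3]*p:.3f}")
```

So a non-significant treatment dummy alongside significant interactions simply means the groups are indistinguishable at the (extrapolated) baseline but diverge over time.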

    I attached a Word file in which I pasted my Stata output.
    Please read the Forum FAQ for excellent advice about the most helpful ways to post information here. Attachments in general are deprecated, and Word documents are particularly discouraged. Attaching a Word document limits the range of your potential responders to those who a) use Microsoft Word and b) are willing to risk downloading a file that can contain active malware from a complete stranger. The way to show Stata output so that everybody on the Forum can readily see it is to place it between code delimiters here in the Forum editor. FAQ #12 explains how to use code delimiters if you are not familiar with them.



    • #3
      Thank you so much, Clyde! How could I test the differences in time trends between my three treatments? That's my main objective. This is the graphical representation of my dependent variable. I want to show that the time trends in CG and C are different from those in G.

      [Attached image: imagine.png]



      • #4
        So, assuming you are using a relatively modern version of Stata, i.e. version 13 or later, I would start by using modern coding, and then rely on the -margins- command. From the graph it is clear that the G and CG groups start out at different points in period 1. It seems that the parameter of interest here is the slope of ce as a function of period. So you can get those slopes for each of the treatment groups, and then get pairwise comparisons for all of them with:

        Code:
        mixed ce ib4.treatment##c.period L1.mpcr || group:
        margins treatment, dydx(period) pwcompare
        I don't grasp how you have coded the treatment variable. It seems you have three treatment groups, but your base level for the regression is 4. Now, you are free to use any three distinct non-negative integers you like for the treatment variable, but conventionally one uses 0, 1, 2 or 1, 2, 3. You used, what? 1, 2, 4? something like that?
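For readers unfamiliar with -margins, dydx(period) pwcompare-, its logic (a slope of the outcome on period within each treatment group, then pairwise differences of those slopes) can be sketched in Python. The data below are fabricated, with slopes loosely echoing the kind of estimates discussed later in this thread:

```python
import numpy as np

rng = np.random.default_rng(3)
n_per, periods = 30, np.arange(1, 8)
# (slope, intercept) per group -- fabricated illustration values
groups = {"G": (-0.01, 0.0), "C": (0.025, 0.2), "CG": (0.022, 0.1)}

# Slope of y on period within each group, like margins, dydx(period)
slopes = {}
for name, (slope, intercept) in groups.items():
    period = np.tile(periods, n_per)
    y = intercept + slope*period + rng.normal(0, 0.01, period.size)
    X = np.column_stack([np.ones_like(period, dtype=float), period])
    slopes[name] = np.linalg.lstsq(X, y, rcond=None)[0][1]

# Pairwise contrasts of the slopes, like the pwcompare option
names = list(slopes)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = names[i], names[j]
        print(f"{b} vs {a}: {slopes[b] - slopes[a]:+.4f}")
```

This is only the point-estimate logic, of course; -margins- additionally supplies delta-method standard errors and confidence intervals for each contrast.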

        Anyway, if you want an omnibus test contrasting treatment G with the other two, then assuming that treatment G is numerically coded as 1 in the variable treatment, and the others are coded 2 and 3, you can do this:
        Code:
        mixed ce ib1.treatment##c.period L1.mpcr || group:
        test 2.treatment#period 3.treatment#period



        • #5
          In the first version of my dataset I had four treatments. That's why I have 4 labels for the treatments. I imported the dataset with the four treatments, then I created factor variables with the egen command, and afterwards I dropped one treatment because its data were not valid due to a software crash during the experimental session. Now, I have run the two codes you suggested, labelling G as 3, and I obtain this:

          Code:
          mixed ce ib3.treatment##c.period L1.mpcr || group:
          
          Performing EM optimization:
          
          Performing gradient-based optimization:
          
          Iteration 0:   log likelihood =  79.408916  
          Iteration 1:   log likelihood =  79.408916  
          
          Computing standard errors:
          
          Mixed-effects ML regression                     Number of obs     =        630
          Group variable: group                           Number of groups  =         30
          
                                                          Obs per group:
                                                                        min =         21
                                                                        avg =       21.0
                                                                        max =         21
          
                                                          Wald chi2(6)      =     143.98
          Log likelihood =  79.408916                     Prob > chi2       =     0.0000
          
          ------------------------------------------------------------------------------------
                          ce |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------------+----------------------------------------------------------------
                   treatment |
                          C  |   .1968572   .0822962     2.39   0.017     .0355596    .3581548
                         CG  |   .1049144   .0822961     1.27   0.202    -.0563829    .2662118
                             |
                      period |  -.0076461   .0069473    -1.10   0.271    -.0212627    .0059704
                             |
          treatment#c.period |
                          C  |   .0348058    .009825     3.54   0.000     .0155491    .0540624
                         CG  |   .0316959    .009825     3.23   0.001     .0124392    .0509526
                             |
                        mpcr |
                         L1. |   .7456328   .0824537     9.04   0.000     .5840265    .9072392
                             |
                       _cons |  -.0599661   .0763795    -0.79   0.432    -.2096671     .089735
          ------------------------------------------------------------------------------------
          
          ------------------------------------------------------------------------------
            Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
          -----------------------------+------------------------------------------------
          group: Identity              |
                            var(_cons) |   .0198663    .005629      .0114007    .0346179
          -----------------------------+------------------------------------------------
                         var(Residual) |   .0405429   .0023407      .0362052    .0454004
          ------------------------------------------------------------------------------
          LR test vs. linear model: chibar2(01) = 178.51        Prob >= chibar2 = 0.0000
          
          . margins treatment, dydx(period) pwcompare
          
          Pairwise comparisons of average marginal effects
          
          Expression   : Linear prediction, fixed portion, predict()
          dy/dx w.r.t. : period
          
          --------------------------------------------------------------
                       |   Contrast Delta-method         Unadjusted
                       |      dy/dx   Std. Err.     [95% Conf. Interval]
          -------------+------------------------------------------------
          period       |
             treatment |
              CG vs C  |  -.0031099    .009825     -.0223665    .0161468
               G vs C  |  -.0348058    .009825     -.0540624   -.0155491
              G vs CG  |  -.0316959    .009825     -.0509526   -.0124392
          --------------------------------------------------------------
          According to these results it seems that the slopes are different in G vs C and G vs CG (the p-values should be < 0.01). Is this the correct interpretation?
          Then I ran the second code and obtained this:

          Code:
          mixed ce ib3.treatment##c.period L1.mpcr || group:
          
          (estimation output omitted -- identical to the results shown above)
          
          . test 1.treatment#period 2.treatment#period
          
           ( 1)  [ce]1.treatment#c.period = 0
           ( 2)  [ce]2.treatment#c.period = 0
          
                     chi2(  2) =   15.37
                   Prob > chi2 =    0.0005
          Here it seems the test is showing that CG and C are different from G. But what I do not understand is what it is testing. Is it testing jointly that the CG and C terms equal 0?
          Moreover, I do not understand the meaning of the coefficient on treatment CG (the second coefficient in the regression). Is it telling me that CG is not statistically different from G?



          • #6
            According to these results it seems that the slopes are different in G vs C and G vs CG (the p-values should be < 0.01). Is this the correct interpretation?
            Well, allowing for the common abuse of language that "are different" is used to mean "the difference is statistically significant", yes, this is the correct interpretation. Of course, that is not what statistically significant actually means--but I don't want to digress to my rant about that.

            Here it seems the test is showing that CG and C are different from G. But what I do not understand is what it is testing. Is it testing jointly that the CG and C terms equal 0?
            Again, this is correct if I am charitable in my assumptions about what you mean by your language here. It is testing the joint null hypothesis that the extent to which CG moderates the ce vs period slope and the extent to which C moderates the ce vs period slope are both zero.

            Moreover, I do not understand the meaning of the coefficient on treatment CG (the second coefficient in the regression). Is it telling me that CG is not statistically different from G?
            No. This is an interaction model. So the coefficient of CG here tells you about the expected difference in ce between CG and G when period = 0. Now, it appears from your graph that period is never 0 in your data; it starts at 1. If that is correct, then the coefficient of CG (and the coefficient of C as well), although an algebraically essential component of the regression equation, is, at best, an extrapolation beyond the range of the data. So it should not be regarded as having any real-world interpretation.
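One common remedy for this (a suggestion added here for illustration, not something proposed in the thread) is to center period at its first observed value, so that the treatment dummy's coefficient becomes the in-sample group difference at period 1 rather than an extrapolation to period 0. A Python sketch with fabricated data:

```python
import numpy as np

rng = np.random.default_rng(1)
period = np.tile(np.arange(1, 8), 20)    # periods start at 1; 0 never occurs
cg = np.repeat([0, 1], 70)
y = 0.1 + 0.3*cg + 0.02*period + 0.04*cg*period + rng.normal(0, 0.01, 140)

def fit(p):
    """OLS fit of the interaction model with period variable p."""
    X = np.column_stack([np.ones(140), cg, p, cg*p])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_raw = fit(period)        # CG coefficient = extrapolated difference at period 0
b_ctr = fit(period - 1)    # CG coefficient = difference at period 1 (in-sample)

# Centering is an exact reparametrization: b_ctr[1] == b_raw[1] + b_raw[3]
print(b_raw[1], b_ctr[1])
```

Either way, the interaction coefficient itself is unchanged; centering only moves the point at which the dummy's coefficient is evaluated.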



            • #7
              Infinite thanks Clyde!



              • #8
                Hi everyone, I am studying the effects of maternal HIV status on in-utero fetal growth using serial ultrasound measurements of different fetal biometry parameters. I am using linear mixed models, and I have included polynomial terms of gestational age as fixed effects in my model to account for the non-linearity of fetal growth during gestation. In this question, I am showing my analyses for abdominal circumference (AC). I am testing for an interaction between gestational age and HIV status on AC (AC_mm).

                I tested a three-way interaction term using the likelihood ratio test as follows:

                Code:
                mixed AC_mm i.hivpositive ///
                age_mat ib1.bmicat_bas_mat i.smoke_mat i.alcohol_mat i.nullipara_cat i.maristat ///
                i.sex_foet i.ironpreg i.folpreg i.wealth_index3 i.occup i.apo_bas ///
                GA_div10 GA_div10_cb || ///
                id: ga_at_usd_visits_wks_, ///
                cov(unstructured) nolog
                est store a


                mixed AC_mm i.hivpositive##c.GA_div10##c.GA_div10_cb ///
                age_mat ib1.bmicat_bas_mat i.smoke_mat i.alcohol_mat i.nullipara_cat i.maristat ///
                i.sex_foet i.ironpreg i.folpreg i.wealth_index3 i.occup i.apo_bas ///
                || ///
                id: ga_at_usd_visits_wks_, ///
                cov(unstructured) nolog
                est store b

                lrtest a b
                The Stata output was as follows:
                1. LMM output for model with interaction term
                [Attached image: Interaction_AC.png]



                2. LRT output

                [Attached image: LRT_IntAC.png]



                3. Used margins to predict the mean AC at different gestational ages, as follows:

                [Attached image: Interaction plot_AC.png]



                Can you please help me interpret these results?

                Can I say that, per unit increase in GA, AC is higher in the HIV-positive group compared to the HIV-negative group?


                Thank you!



                • #9
                  I don't quite understand what you did, and I'm not convinced by what is shown in #1 that it's what you say you did.

                  You refer to having included polynomial terms in your model. But the code does not exhibit any. It does include, in the interaction model, a three-way interaction i.hivpositive##c.GA_div10##c.GA_div10_cb. But that doesn't create any polynomial terms because c.GA_div10 and c.GA_div10_cb are different variables. So no variable appears in a higher power than 1. If GA_div10_cb is a clone of GA_div10, you will have succeeded in creating a quadratic regression on GA_div10, but then the -margins- output will be wrong because -margins- will not know that GA_div10_cb is supposed to be the same thing as GA_div10, and will treat them as if they are unrelated variables, not as being the same. So before we really try to interpret the results, we need to be clear on what the model is.
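This warning can be illustrated numerically. In the Python sketch below (fabricated data, added for illustration), x_cb is a manually created cube of x, standing in for GA_div10_cb: the correct marginal effect must apply the chain rule through x^3, whereas treating x_cb as an unrelated covariate held fixed (which is all -margins- can do with a separately created variable) returns only the linear coefficient:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1.3, 3.9, 300)           # stand-in for GA_div10
x_cb = x**3                              # manually created power, like GA_div10_cb
y = 1.0 + 0.5*x - 0.02*x_cb + rng.normal(0, 0.01, 300)

X = np.column_stack([np.ones(300), x, x_cb])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

x0 = 2.5
true_slope = b[1] + 3*b[2]*x0**2   # correct dy/dx: chain rule through x^3
naive_slope = b[1]                 # what you get if x_cb is "held fixed"
print(true_slope, naive_slope)
```

With factor-variable notation (c.x##c.x##c.x in Stata) the software knows the powers are functions of the same variable and computes the chain-rule derivative automatically; with hand-made power variables it cannot.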

                  Then, I see you have a random slope specified for a variable ga_at_usd_visits_wks_, but this variable does not appear in the fixed part of the model. This is almost always a model mis-specification. What it does is estimate variation among women in the slope of the outcome:ga_at_usd_visits_wks_ relationship while constraining the mean slope to be zero. If that is what you intended, then proceed with it. But it usually isn't what is intended with these models. If you do not want to constrain the mean slope to be zero, then the same variable must also appear in the fixed part of the model. I'm also somewhat baffled by the appearance of three separate ga_* variables in the model: there is, at any time, only one gestational age, so I'm wondering what these three different variables represent. If they are all clones, then the model is multiply mis-specified, not only for the already mentioned reasons but also because it is almost always a mis-specification to have a random slope for the linear term but not the (implicit) quadratic.

                  Next, even assuming your code is correct (even though it does not reflect your word description of the model you want), the likelihood ratio test you have done is not helpful for determining the effect of hiv positivity on the outcome. To do that would require using a base model that makes no mention at all of hiv positivity--but your base model does include it. What that likelihood ratio test is really testing is the joint significance of all the interaction terms. That is, it is testing whether or not the effect of hiv positivity varies according to the values of GA_div10 and GA_div10_cb. Which may be an interesting question in its own right but is not the one you say you are trying to answer. Actually, I think the most effective way to examine the effect of hiv positivity on the outcome would be to break it down at different time points, as you did in your graph, but specifically plot its marginal effect:

                  Code:
                  margins, dydx(hivpositive) at(GA_div10 = (13(6)39)) // OR APPROPRIATE VALUES OF GA_div10
                  marginsplot, xdimension(GA_div10)
                  That would focus in on the hiv positivity effect and would also show how it varies with gestational age.

                  One other thought: I notice the variable nullipara_cat is dropped from the model due to collinearity. There are no other variables in the regression that seem likely candidates to form a collinear relationship with nullipara_cat. So while I can't entirely rule that possibility out without seeing the data, it seems more likely that the collinearity arises because everybody in the sample is nulliparous (or maybe nobody in the sample is). If that is the case, there is no point in even having that variable. The other possibility that worries me, though, is that in the data every woman is coded as nulliparous, but that is not the case in real life--in other words, a data error. Whenever a variable is unexpectedly omitted from a regression, the cause should be investigated and, if errors are found, they should be fixed.




                  • #10
                    Hi Clyde, thanks for the feedback. This is very helpful. Please let me elaborate on some of the decisions behind this model:

                    You refer to having included polynomial terms in your model. But the code does not exhibit any. It does include, in the interaction model, a three-way interaction i.hivpositive##c.GA_div10##c.GA_div10_cb. But that doesn't create any polynomial terms because c.GA_div10 and c.GA_div10_cb are different variables. So no variable appears in a higher power than 1. If GA_div10_cb is a clone of GA_div10, you will have succeeded in creating a quadratic regression on GA_div10, but then the -margins- output will be wrong because -margins- will not know that GA_div10_cb is supposed to be the same thing as GA_div10, and will treat them as if they are unrelated variables, not as being the same. So before we really try to interpret the results, we need to be clear on what the model is.
                    I derived the polynomial terms of gestational age using the -xrigls- (Reference Interval Estimation by Generalized Least Squares) command, with this code:
                    Code:
                    xrigls AC_mm ga_at_usd_visits_wks_, fp(m:df 4,s:df 2) centile(3  50  97) detail nogr
                    Where AC_mm is the abdominal circumference, and ga_at_usd_visits_wks_ is the gestational age in weeks.

                    The output of the xrigls is:
                    Click image for larger version

Name:	xrigls_ac.png
Views:	1
Size:	60.4 KB
ID:	1759383




                    I generated GA_div10 (X) and GA_div10_cb (X^3) to represent the terms highlighted in the output above. Thereafter I specified the LMM as follows:

                    Code:
                    mixed AC_mm i.hivpositive ///
                    age_mat ib1.bmicat_bas_mat i.smoke_mat i.alcohol_mat i.nullipara_cat i.maristat ///
                    i.sex_foet i.ironpreg i.folpreg i.wealth_index3 i.occup i.apo_bas ///
                    GA_div10 GA_div10_cb || ///
                    id: ga_at_usd_visits_wks_, ///
                    cov(unstructured) nolog
                    Is this clearer (and correct)?


                    Then, I see you have a random slope specified for a variable ga_at_usd_visits_wks_, but this variable does not appear in the fixed part of the model. This is almost always a model mis-specification. What it does is estimate variation among women in the slope of the outcome:ga_at_usd_visits_wks_ relationship while constraining the mean slope to be zero. If that is what you intended, then proceed with it. But it usually isn't what is intended with these models. If you do not want to constrain the mean slope to be zero, then the same variable must also appear in the fixed part of the model. I'm also somewhat baffled by the appearance of three separate ga_* variables in the model: there is, at any time, only one gestational age, so I'm wondering what these three different variables represent. If they are all clones, then the model is multiply mis-specified, not only for the already mentioned reasons but also because it is almost always a mis-specification to have a random slope for the linear term but not the (implicit) quadratic.
                    I wanted to specify gestational age as the random slope, hence I used the ga_at_usd_visits_wks_ variable there. My goal is to model variation across women (id) in the effect of gestational age on AC.

                    Next, even assuming your code is correct (even though it does not reflect your word description of the model you want), the likelihood ratio test you have done is not helpful for determining the effect of hiv positivity on the outcome. To do that would require using a base model that makes no mention at all of hiv positivity--but your base model does include it. What that likelihood ratio test is really testing is the joint significance of all the interaction terms. That is, it is testing whether or not the effect of hiv positivity varies according to the values of GA_div10 and GA_div10_cb. Which may be an interesting question in its own right but is not the one you say you are trying to answer.
                    Yes, I am specifying an interaction term to assess whether the effect of HIV positivity on AC_mm varies according to gestational age (this is the hypothesis). To test it, I specified one model with hivpositive and the two gestational age fixed-effects terms (GA_div10 and GA_div10_cb) entered separately, and another model with a 3-way interaction term among all 3 variables, i.e., i.hivpositive##c.GA_div10##c.GA_div10_cb.


                    One other thought: I notice the variable nullipara_cat is dropped from the model due to collinearity. There are no other variables in the regression that seem likely candidates to form a collinear relationship with nullipara_cat. So while I can't entirely rule that possibility out without seeing the data, it seems more likely that the collinearity arises because everybody in the sample is nulliparous (or maybe nobody in the sample is). If that is the case, there is no point in even having that variable. The other possibility that worries me, though, is that in the data every woman is coded as nulliparous, but that is not the case in real life--in other words, a data error. Whenever a variable is unexpectedly omitted from a regression, the cause should be investigated and, if errors are found, they should be fixed.
                    Thanks for pointing that out. I noticed it too and am investigating it. About 16% of the women in my data are nulliparous, so I don't think it's a case of all or none. Do you suggest any other investigations?
                    Last edited by Dhruv Darji; 21 Jul 2024, 11:14.



                    • #11
                      Thanks for the clarification.
                      1. If you want to use -margins-, which I strongly recommend, you cannot use the variables created with -xrigls- because -margins- has no way of knowing that they are related to each other as powers, and treats them as unrelated variables, leading to spurious results.
                      2. It is unusual to model a polynomial with a linear and cubic term but no quadratic. Not illegal, but strange and highly constraining in a way that would require a substantive justification as to why there can be no quadratic term. (Constraining the quadratic coefficient to zero constrains the model to a very unnatural family of cubic polynomials--those whose three roots sum to zero. It would be surprising to find a real world phenomenon that is constrained by such a law.)
                      3. Even if problems 1 and 2 are solved, the likelihood ratio test you did still tests only for whether there is some interaction between hiv_positive and gestational age. It says nothing about whether hiv_positive itself has significant effects at any point in time. So whether the likelihood ratio test is really helpful to you depends on the exact wording of your research question.
                      4. It is still almost certainly a mis-specified model to have a random slope on one power of gestational age and not the others.
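As a side note on point 2: the "roots sum to zero" property follows from Vieta's formulas (standard algebra, added here for completeness):

```latex
a x^3 + b x^2 + c x + d \;=\; a\,(x - x_1)(x - x_2)(x - x_3)
\quad\Longrightarrow\quad x_1 + x_2 + x_3 = -\frac{b}{a}
```

so constraining the quadratic coefficient to b = 0 forces the three roots to sum to zero.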
                      Here's what I would do. Get rid of the extra gestational age variables: you should have one and only one such variable. For brevity, let's just call it GA. I would run the model as:
                      Code:
                      mixed AC_mm i.hivpositive##c.GA##c.GA##c.GA ///
                      age_mat ib1.bmicat_bas_mat i.smoke_mat i.alcohol_mat i.nullipara_cat i.maristat ///
                      i.sex_foet i.ironpreg i.folpreg i.wealth_index3 i.occup i.apo_bas ///
                      || ///
                      id: c.GA##c.GA##c.GA, ///
                      cov(unstructured) nolog
                      
                      test 1.hivpositive#c.GA 1.hivpositive#c.GA#c.GA 1.hivpositive#c.GA#c.GA#c.GA
                      
                      margins, dydx(hivpositive) at(GA = (13(6)39))
                      marginsplot, xdimension(GA)
                      The -test- command will give you a clean and simple test of whether hivpositive interacts with any or all of the first three powers of GA. The margins plot will show you the marginal effect of hivpositive on abdominal circumference at the specified gestational ages--which is probably more useful than the -test- results.

                      Concerning nullipara_cat, perhaps it is coded 0 for every woman in the regression estimation sample. Remember that if an observation has missing values for any of the variables mentioned in the -mixed- command, it will be omitted. It may be that when all such incomplete observations are omitted, all the remaining ones happen to have nullipara_cat == 0. That would explain the odd behavior of that variable. So after you run the model (with i.nullipara_cat included) you can run -tab nullipara_cat if e(sample)- and I suspect it will turn up all zeroes. Then you have to figure out whether the null parity of all women with complete data is real or represents coding error(s). It also raises a question: how much missing data do you have? The output shown in #1 says that 469 women were included in your regression. Is that a substantial fraction of the total women in the study, or have you lost a large number of women to incomplete data? Perhaps there are coding errors in other variables that are causing things to be coded as missing when they are not? Or, if the missing codes really reflect missing data, this might be a problem you need to attend to.

                      As an aside, -xrigls- is not a part of official Stata. Nothing wrong with using that, but when referring to such programs in this Forum it is helpful to identify what it is and where it comes from.





                      • #12
                        Hi Clyde,

                        Thanks a lot for this information as well as the code you have provided me.

                        Regarding the code, I noticed you have included the same gestational age terms as the random slope for the model. I have previously done this, and also ran the exact code which you provided, but the model did not converge. Consequently, I changed the covariance structure to exchangeable and ran the model as follows (this converges):

                        Code:
                         mixed AC_mm i.hivpositive##c.GA##c.GA##c.GA ///
                        age_mat ib1.bmicat_bas_mat i.smoke_mat i.alcohol_mat i.nullipara_cat i.maristat ///
                        i.sex_foet i.ironpreg i.folpreg i.wealth_index3 i.occup i.apo_bas ///
                        || ///
                        id: c.GA##c.GA##c.GA, ///
                        cov(exchangeable) nolog
                        
                        test 1.hivpositive#c.GA 1.hivpositive#c.GA#c.GA 1.hivpositive#c.GA#c.GA#c.GA
                        
                        margins, dydx(hivpositive) at(GA = (13(6)39))
                        marginsplot, xdimension(GA)
                        Does this impact the robustness of the results?

                        As an aside, -xrigls- is not a part of official Stata. Nothing wrong with using that, but when referring to such programs in this Forum it is helpful to identify what it is and where it comes from.
                        Sure, please see here the source of xrigls

                        HTML Code:
                        https://www.researchgate.net/publication/4922139_XRIGLS_Stata_module_to_calculate_reference_intervals_via_generalised_least_squares

                        Comment


                        • #13
                          Consequently, I changed the covariance structure to exchangeable, and ran the model as follows: (this converges)
                          The unstructured covariance matrix is pretty large: a covariance must be estimated between each pair among _cons, GA, GA#GA, and GA#GA#GA, plus a variance for each of those. So that's a total of C(4,2) + 4 = 6 + 4 = 10 parameters. By contrast, with exchangeable, only a single variance and a single covariance need to be estimated. Now clearly the exchangeable structure is overly constrained, and the common variance and covariance estimated cannot be taken seriously. But the non-convergence of the unstructured covariance model suggests that your data simply lack sufficient information to identify all of the 10 separate variances and covariances. That might be a reflection of having only, on average, 4.3 usable observations per woman, and a range from 1 to 6. (That, by the way, is also a skimpy amount of data for trying to fit cubic polynomials at all.)
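
                          The parameter counts above can be verified directly in Stata; a quick sketch:

```stata
* Unstructured G matrix for 4 random effects (_cons, GA, GA^2, GA^3):
* one variance per effect plus one covariance per pair of effects.
display comb(4,2) + 4    // pairs + variances

* Exchangeable: a single common variance and a single common covariance.
display 1 + 1
```

The gap between 10 and 2 free parameters is why the exchangeable model converges while the unstructured one does not: far less has to be identified from the same sparse data.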

                          How serious a problem this is depends on what your research question requires. The coefficients of the fixed part of the regression are not affected by this, but their standard errors are. Evidently the variance components and their covariances are estimated very differently under the two structures. If the coefficients are all you need, then there is no problem. But if, as is likely, you also need their standard errors, then there is. There is certainly a problem if you need those variance components and their covariances, though usually these are not really important in themselves and are included in the model for the purpose of getting everything else right.

                          Assuming it is a problem, the only solution I know of is to simplify the model. The cubic model entails more parameters than your data can support. Without knowing more about the substance of growth curves during gestation, I can't really propose a specific alternative model with fewer parameters. But looking back at the graph you show in #1, you can see that, notwithstanding your high-order polynomial model, over the range of values of GA actually studied the graph is practically a perfectly straight line for both the HIV-positive and HIV-negative groups; there is only a slight hint of curvature at the far right end. Now, for the reasons we have already gone over, that model is not valid. But it would not surprise me to learn that the corrected model (disregarding the exchangeable covariance problem) also produces a graph that is, for practical purposes, a straight line. If that is so, I would not hesitate to go all the way back to a linear growth curve model, and I would expect that model to converge with cov(unstructured) and solve the problem.
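
                          A minimal sketch of that simplified linear growth-curve model, reusing the covariates from the code in #12 (whether a linear trend in GA is in fact adequate must be judged from your data):

```stata
* Linear growth-curve model: one fixed GA slope, its interaction with
* HIV status, and a random intercept and slope per woman. The 2x2
* unstructured G matrix now needs only 3 (co)variance parameters.
mixed AC_mm i.hivpositive##c.GA ///
    age_mat ib1.bmicat_bas_mat i.smoke_mat i.alcohol_mat ///
    i.nullipara_cat i.maristat i.sex_foet i.ironpreg i.folpreg ///
    i.wealth_index3 i.occup i.apo_bas ///
    || id: GA, cov(unstructured) nolog

* The treatment contrast at selected gestational ages, as before
margins, dydx(hivpositive) at(GA = (13(6)39))
marginsplot, xdimension(GA)
```

With only an intercept and a slope in the random part, the unstructured covariance requires just two variances and one covariance, which is far more likely to be identifiable from 4.3 observations per woman.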

                          Comment


                          • #14
                            Thanks Clyde

                            Comment
