
  • mixed models, continuous variables

    Hi!
    I'm new to Stata, and I apologize in advance for trivial questions.
    I'm working on a mixed-model analysis.

    Given the example command:
    Code:
     mixed edd preg age || id:
    edd is a measure of heart dimensions
    preg is the number of pregnancies
    age is the patient's age
    I wonder when it is necessary to use "c.age" and/or "c.preg" in the command to specify that I'm using a continuous variable. I've seen this change my results significantly.

    Thank you in advance!
    Isotta

  • #2
    In a regression command, if you have no interaction terms, you do not need to specify the c. prefix. By default, Stata interprets unprefixed variables as continuous in that circumstance.

    It works the other way when you have interaction terms: in an interaction term, the default is to assume it is a discrete variable, so you must use the c. prefix when specifying a continuous variable.

    I think, as a matter of style, it is better to always specify each variable with either c. or i. so there is never a problem of misremembering how Stata interprets unprefixed variables, but if you follow the two rules I just mentioned, things will work out correctly.
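A minimal sketch of the two rules above, using simulated data. The variable names mirror the original post, but the data and coefficients are made up for illustration and are not the poster's.

```stata
* Hypothetical simulated data: 100 patients with 3 visits each (not real data)
clear
set seed 2020
set obs 300
generate int id = ceil(_n/3)
by id, sort: generate double u = rnormal(0, 2) if _n == 1
by id: replace u = u[1]                  // patient-level random intercept
generate double age = 20 + 40*runiform()
generate byte preg = floor(5*runiform())
generate double edd = 45 + 0.1*age + 0.5*preg + u + rnormal(0, 2)

* No interaction: unprefixed variables are treated as continuous,
* so these two commands fit the same model
mixed edd preg age || id:
scalar ll_plain = e(ll)
mixed edd c.preg c.age || id:
scalar ll_prefixed = e(ll)

* With an interaction, the c. prefixes are required. Writing age##preg
* would ask Stata to treat both as categorical (i.) variables; with this
* simulated age it would stop with an error because age is not integer-valued
mixed edd c.age##c.preg || id:
```

The two log likelihoods stored in `ll_plain` and `ll_prefixed` should agree, because the design matrices are identical.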

    It definitely should not be the case that -mixed edd preg age || id:- produces different results from -mixed edd c.preg c.age || id:-. I suspect that that is not exactly what you did. If it is, and it really gave you different results, then something is wrong.

    In that case, I recommend you post back showing example data and the exact commands you used, along with the output you got from Stata. To show the example data, be sure to use the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it.

    -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data. When showing your code and output, please be sure to surround them with code delimiters so they will align readably. (If you are unfamiliar with code delimiters, please read the Forum FAQ, with special attention to #12.)



    • #3
      Dear Clyde, thank you very much for your fast and detailed answer.
      As you suspected, I was not explaining correctly what I did. In fact, I was not just adding the c. prefix but also trying to explore the interaction between my variables. Let me show you what I mean with these three variables (one differs from my previous post because I got even more interesting results!):

      pqK: PQ interval on ECG
      preg: number of pregnancies
      age: age of patients

      With the command:
      Code:
       mixed pqK preg age||id:
      I get the following results:


      [Screenshot of Stata output: stataHelp1.png]


      Here both age and preg seem to be predictors of pqK, with highly significant p-values.

      Instead, if I use the following command, which includes the interaction between c.age and c.preg:
      Code:
       mixed pqK c.age##c.preg ||id:
      I get these results:
      [Screenshot of Stata output: stataHelp2.png]




      My understanding is that here the results show an interaction between age and preg (c.preg#c.age, with p-value 0.017), but it seems to me that age and preg "alone" are no longer predictors of pqK (their p-values changed from above). I suspect these results are saying something different from my interpretation, so I kindly ask for help in reading them.
      It seems I bought the wrong Stata book, because its chapter on multilevel analysis is poor, and I'm feeling a little lost.

      I hope I'm not abusing your kindness; again, thank you for your help!

      Best
      Isotta



      • #4
        Originally posted by Anna Isotta Castrini
        I suspect these results are saying something different from my interpretation so I kindly ask to help me in the reading.
        Maybe the first question to ask yourself is: do they make biological sense? Would there be a biological basis for an interaction of woman's age and parity here? And if so, would it manifest as a flip in the sign of the coefficient for the latter?

        Also, what kind of ranges do you have for your two predictors? (They strongly covary, I suppose.) Are you looking at an age range that is restricted to child-bearing years, or is the age extending way beyond that when cardiac pathology may be expected regardless?

        Above you were looking at end-diastolic (ventricular?) diameter and here atrioventricular conduction. Are they two dimensions of the same underlying pathology?

        As an aside, for continuous-variable × continuous-variable interactions, you might want to consider centering both of the main effects variables beforehand. It can help with numerical stability and in some cases with interpretability, as well.



        • #5
          Originally posted by Joseph Coveney
          Are they two dimensions of the same underlying pathology?
          And if so, are the results that you obtained with your interaction model for the one consistent with those here for the other? [Accidentally left out the point.]



          • #6
            I think Joseph Coveney raises very important questions for Ms. Castrini to think about.

            But his responses do not address her statistical misinterpretation of the interaction model.
            but it seems to me that age and preg "alone" are not more predictor for pqK (changed p value from above).
            This is wrong in two fundamental ways. One is that the difference between statistically significant and not statistically significant is, itself, not statistically significant, or even meaningful at all. But the more important one is that in an interaction model you cannot speak of either of the variables "alone" as a predictor. When you set up an interaction model, you are specifically stating that there is no such thing as "the effect of age" or "the effect of preg." Rather you are specifically saying that there are infinitely many effects of age, depending on the value of preg, and vice versa. In particular, in the interaction model, the coefficient of age is not "the effect of age." It is the effect of age when preg == 0. And, similarly, the coefficient of preg is not the effect of preg. It is the effect of preg when age == 0. Notice that the latter is not even a possible value of age in this context. To get a better sense of how the interaction model works, run it again and follow it with:

            Code:
            margins, at(age = (20(5)50) preg = (0(1)5))
            marginsplot, xdimension(age)
            You will get a graph with six curves (straight lines, actually, in this case) on it, each showing the relationship between pqK and age for a different number of pregnancies (0 through 5). In particular, these curves will not be parallel, showing that in the model, the relationship between pqK and age differs depending on the value of preg. That is the whole point of doing an interaction model.
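The same point can be checked numerically rather than graphically. In the model with c.age##c.preg, the slope of pqK on age at a given number of pregnancies is _b[age] + _b[c.age#c.preg] × preg, so the age coefficient by itself equals that slope only at preg == 0. A sketch on simulated data (hypothetical values, not the poster's results):

```stata
* Hypothetical simulated data with an age-by-pregnancy interaction (not real data)
clear
set seed 4321
set obs 300
generate int id = ceil(_n/3)
by id, sort: generate double u = rnormal(0, 5) if _n == 1
by id: replace u = u[1]                  // patient-level random intercept
generate double age = 20 + 40*runiform()
generate byte preg = floor(6*runiform())
generate double pqK = 150 + 0.5*age + 4*preg - 0.1*age*preg + u + rnormal(0, 10)

mixed pqK c.age##c.preg || id:

* Slope of pqK on age at each number of pregnancies, 0 through 5
margins, dydx(age) at(preg = (0(1)5))

* The same slope at preg == 2, built by hand from the coefficients
lincom _b[age] + 2*_b[c.age#c.preg]
```

The -lincom- estimate should match the preg = 2 row of the -margins- table; each row of that table is the slope of one of the lines drawn by -marginsplot-.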

            Finally, I'll just point out that the difficulties and misunderstandings you are encountering here actually have nothing to do with multi-level analysis: they all arise from the use of interactions and you could encounter the same thing in a single-level model with interaction.



            • #7
              Dear Mr. Schechter and Mr. Coveney,
              thank you very much for your important questions, suggestions and comments.

              I will start with Mr. Coveney's questions:

              Above you were looking at end-diastolic (ventricular?) diameter and here atrioventricular conduction. Are they two dimensions of the same underlying pathology?
              Both end-diastolic diameter and atrioventricular conduction have been shown to increase in patients with this disease, which affects both the muscular tissue and the electrical system of the heart. We know that this genetic disease has age-related penetrance and that both end-diastolic diameter and atrioventricular conduction increase with age. Age is always a strong predictor in our analysis, and this is expected. The effect of pregnancy, considered as "cardiovascular work," is not known, but we do know that exercise (cardiovascular work) is a predictor for this disease.

              Also, what kind of ranges do you have for your two predictors? (They strongly covary, I suppose.) Are you looking at an age range that is restricted to child-bearing years, or is the age extending way beyond that when cardiac pathology may be expected regardless?
              We are not restricting age to the child-bearing years; the range extends beyond them. I tried the command:
              Code:
              mixed pqK preg age if age > 25 & age < 35 || id:
              to explore whether pregnancy is still a predictor within the child-bearing range alone, and it seems so. Of course, we always meet one difficulty: women with more pregnancies are older. That brings me to the next question:

              Maybe the first question to ask yourself is: do they make biological sense? Would there be a biological basis for an interaction of woman's age and parity here? And if so, would it manifest as a flip in the sign of the coefficient for the latter?
              Because women with more pregnancies are older, I used the interaction analysis to try to "separate" the predictive effects of pregnancy and age.

              Does all this make sense to you?
              Another important point is that most patients come under our observation after they have already given birth, sometimes 20 years later. So for most patients the pregnancy variable is unchanged across the repeated observations. I wonder whether this adds any bias to my analysis.

              I'm not sure I understand what you mean by
              As an aside, for continuous-variable × continuous-variable interactions, you might want to consider centering both of the main effects variables beforehand.
              How can I do this?


              Mr. Schechter, thank you very much for your advice. I certainly have a long way to go to improve my statistical skills, but I think I'm getting started thanks to this interesting conversation.

              Now I understand what was wrong in my interpretation. I did use the suggested "margins" command and got the graph.

              I still have a question. I was analyzing the effect of pregnancy and age on a survival outcome. Using the command:
              Code:
              melogit outDe preg age||id:
              outDe is a yes/no (0/1) variable representing a combined survival outcome that includes death and heart transplant. Stata keeps repeating the following iterations, and I have to stop it after a while, so I don't get any results:

              [Screenshot of Stata iteration log: stataHelp3.png]


              Do you know what I did wrong?



              • #8
                Originally posted by Anna Isotta Castrini
                Does all this make sense to you?
                Yes, thank you for taking the time to reply.

                [Regarding centering] I'm not sure I understand what you mean by . . . How can I do this?
                You can subtract the mean (or median) from all values of the variable, and then use the mean-centered (or median-centered) variable in the regression model. You can Google multilevel model mean-centering for some background on the two basic approaches and some discussion on how centering affects interpretation of the fitted model.
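A sketch of centering in Stata. The two approaches most often discussed for multilevel models are grand-mean centering (subtract the overall mean) and group-mean centering (subtract each patient's own mean). The data below are simulated placeholders, and the new variable names (age_gm, preg_gm, age_md, age_pm, age_wc) are made up for illustration.

```stata
* Hypothetical simulated data standing in for the poster's (not real data)
clear
set seed 99
set obs 300
generate int id = ceil(_n/3)
generate double age = 20 + 40*runiform()
generate byte preg = floor(5*runiform())

* Grand-mean centering: subtract the overall sample mean
summarize age, meanonly
generate double age_gm = age - r(mean)
summarize preg, meanonly
generate double preg_gm = preg - r(mean)

* Median centering instead: -summarize, detail- stores the median in r(p50)
summarize age, detail
generate double age_md = age - r(p50)

* Group-mean (within-patient) centering: subtract each patient's own mean;
* the patient means are usually kept as a separate between-patient predictor
bysort id: egen double age_pm = mean(age)
generate double age_wc = age - age_pm

* The centered variables then replace the originals in the model, e.g.
* mixed pqK c.age_gm##c.preg_gm || id:
```

After grand-mean centering, the coefficient of age_gm in the interaction model is the slope of age at the average number of pregnancies rather than at preg == 0.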

                I still have a question. I was analyzing the effect of pregnancy and age on survival outcome. . . . Do you know what I did wrong?
                Perhaps nothing. You have only 54 patients' data with an average of only three observations each, and that might not be sufficient in order to successfully fit a multilevel logistic regression model with the timecourse of failures that you observe.

                You could try a couple of alternatives, such as
                Code:
                xtset id
                xtgee outDe c.(age preg), family(binomial) link(logit) corr(independent) vce(robust)
                or
                Code:
                * flag each patient's last (oldest-age) observation
                bysort id (age): generate byte last = _n == _N
                logit outDe c.(age preg) if last
                to see whether you can get convergence in either of these.



                • #9
                  I have a couple of thoughts about your analysis of the OutDe outcome.

                  One is that you may have given up prematurely. When Stata iterates repeatedly in a non-concave region and the log likelihood is not changing, you need to break the calculation and fix something. But from what you show, the log likelihood was still changing. It is possible that given more time, Stata would have found its way out of the non-concave region of the likelihood and found a solution. The more you work with multi-level models, the more you will be impressed with both the complexity and pathology that their likelihoods can offer, and also with the computational intensity of all known algorithms for estimating their parameters. In short, if you are going to use these models, you have to be extremely patient. I have run mixed models where I have had to let them run for weeks on end before they finally converge.

                  That said, the log-likelihood showing in your estimates does seem to be running in near-circles, so I suspect it is probably trapped. So let me ask you about this outcome variable. I have the intuition that you have defined it in such a way that it has the same value for a given patient throughout that patient's data. That is, you have defined it as 0 for all observations of a patient who is still alive/untransplanted at the end of data collection and 1 for all observations of a patient who ended up dead or with a transplant. If that is what you have done, there is no hope of estimating a mixed model like this. That outcome, by construction, has no within-person variance: its variance component is zero. But the logistic distribution, which is part of the multi-level logistic regression model, has, mathematically, variance π²/3. So Stata is trying to fit this data into the Procrustean bed of a model whose key feature is incompatible with it. That won't stop Stata from trying. Sometimes you can get convergence in this situation, but the results will usually be obviously wrong, with some astronomical variance component whose confidence interval runs from nearly zero to nearly infinity at the group level, and usually some regression coefficients that are orders of magnitude away from anything plausible. You were luckier: you got non-convergence. Anyway, if you have defined outDe in this way, it is simply not suitable for this model and you need a different approach. Perhaps a survival analysis of time to (or age at) death or transplant, or something like that.
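One way such a survival analysis might be set up in Stata, sketched under stated assumptions: age is used as the time scale, each patient enters the risk set at the first observed visit (delayed entry), and outDe is 0 at every visit except a final visit where the event is recorded. The simulated data and the helper variables (age0, entry_age) are hypothetical. With this coding the event is dated at the last visit, not at the actual date of death or transplant, and a patient observed only once contributes no follow-up time.

```stata
* Hypothetical simulated data: 60 patients, 3 visits each (not real data)
clear
set seed 777
set obs 60
generate int id = _n
generate byte preg = floor(4*runiform())
generate double age0 = 25 + 20*runiform()
expand 3
* Visit ages strictly increase within each patient
bysort id: generate double age = age0 + 2*(_n - 1) + runiform()
* Event (death or transplant) flagged only at some patients' last visit
bysort id (age): generate byte outDe = (_n == _N) & (runiform() < 0.3)

* Age as analysis time; risk starts at the first visit, not at birth
bysort id (age): generate double entry_age = age[1]
stset age, id(id) failure(outDe == 1) enter(time entry_age)

* Cox model for pregnancies; age is the time scale, so it is not a covariate
stcox preg
```

Using age as the time scale handles the fact that women with more pregnancies are older by comparing each failure only with women at risk at the same age.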



                  • #10
                    Mr. Coveney and Mr. Schechter, thank you again for your kind answers and suggestions. I'm very grateful.

                    About my outDe variable, Mr. Schechter: I didn't define it as 0 or 1 for all the observations. All patients start with outDe == 0, and it becomes 1 at the observation when the decision to transplant is made or at the last observation before death. But maybe, as you said, I just have to wait a little longer.

