  • Help interpreting mixed model with cubic term

    Hi all,

    I have run the below model (code shown below) and have some questions about the first 3 independent variables – dar, dar_sq, and dar_cub.

    Description of variables:
    • dar = continuous
    • dar_sq = dar^2
    • dar_cub = dar^3
    A few questions:
    • I am trying to decide how to model the relationship between dar and the dependent variable. Dar^3 is statistically significant. Does that mean it must be included in the model along with dar and dar^2? I understand including dar^3 means that there are 2 changes in direction, something I have a hard time making sense of here.
    • I looked at a simple scatterplot of lex against dar (shown below). There does not seem to be a clear relationship. Also, given that the data structure requires a mixed model, I am unsure whether a simple scatterplot is appropriate. I also drew a spaghetti plot using one of the random-effects variables (bor) as the id, shown below.
    • I’m having a hard time interpreting what the coefficients on dar, dar_sq, and dar_cub mean. Is there a way, after the mixed command, to plot what the relationship looks like?
    Thanks in advance for any comments!


    Code:
    
    . mixed lex dar dar_sq dar_cub alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, reml
    
    Performing EM optimization:
    
    Performing gradient-based optimization:
    
    Iteration 0:   log restricted-likelihood =  -55055.98  
    Iteration 1:   log restricted-likelihood =  -55055.98  
    
    Computing standard errors:
    
    Mixed-effects REML regression                   Number of obs     =     12,040
    Group variable: _all                            Number of groups  =          1
    
                                                    Obs per group:
                                                                  min =     12,040
                                                                  avg =   12,040.0
                                                                  max =     12,040
    
                                                    Wald chi2(7)      =     558.36
    Log restricted-likelihood =  -55055.98          Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
             lex |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             dar |  -.1442413   .0679903    -2.12   0.034    -.2774999   -.0109827
          dar_sq |   .0039578   .0017025     2.32   0.020      .000621    .0072946
         dar_cub |  -.0000359   .0000113    -3.19   0.001     -.000058   -.0000138
             alp |    7.56537   1.792743     4.22   0.000     4.051658    11.07908
             mup |   18.59015   2.898774     6.41   0.000     12.90866    24.27164
             zol |   11.27925   .6055326    18.63   0.000     10.09243    12.46607
        mupbyzol |   -6.30276   .8396218    -7.51   0.000    -7.948388   -4.657131
           _cons |   70.63327    2.85485    24.74   0.000     65.03787    76.22868
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    _all: Identity               |
                      var(R.bor) |   81.68924   16.43907      55.06427    121.1881
    -----------------------------+------------------------------------------------
    _all: Identity               |
                      var(R.mol) |   47.87758   14.45347      26.49525    86.51596
    -----------------------------+------------------------------------------------
    _all: Identity               |
                      var(R.pop) |   42.53744   8.664709      28.53546       63.41
    -----------------------------+------------------------------------------------
                   var(Residual) |    528.271    6.84953      515.0153    541.8679
    ------------------------------------------------------------------------------
    LR test vs. linear model: chi2(3) = 2653.43               Prob > chi2 = 0.0000
    
    Note: LR test is conservative and provided only for reference.
    
    scatter lex dar
    
    spagplot lex dar, id(bor)
    [Image: Graph11.png]
    [Image: Graph22.png]
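
    On the first question (what two changes of direction would mean here): one way to see what the fitted cubic implies is to locate its turning points from the reported coefficients. The derivation below is only a sketch. It uses the rounded coefficients from the output above, with b_1, b_2, b_3 as shorthand (not in the original post) for the coefficients on dar, dar_sq, and dar_cub. Because the discriminant is small, the locations are sensitive to rounding and should be read as approximate.

    ```latex
    \begin{aligned}
    f(\mathrm{dar}) &= b_1\,\mathrm{dar} + b_2\,\mathrm{dar}^2 + b_3\,\mathrm{dar}^3,
      \qquad b_1 = -0.1442413,\; b_2 = 0.0039578,\; b_3 = -0.0000359 \\
    f'(\mathrm{dar}) &= b_1 + 2 b_2\,\mathrm{dar} + 3 b_3\,\mathrm{dar}^2 = 0 \\
    \mathrm{dar}^{*} &= \frac{-b_2 \pm \sqrt{b_2^2 - 3 b_1 b_3}}{3 b_3}
      \approx \frac{-0.0039578 \pm 0.00036}{-0.0001077}
      \approx 33 \ \text{and} \ 40 \\
    f''(\mathrm{dar}) &= 2 b_2 + 6 b_3\,\mathrm{dar}
      \quad (> 0 \text{ near } 33:\ \text{local minimum}; \quad < 0 \text{ near } 40:\ \text{local maximum})
    \end{aligned}
    ```

    Under these rounded values, the fitted curve falls until about dar = 33, is nearly flat (rising very slightly) until about dar = 40, and then falls again. Both changes of direction sit in a narrow band where the curve barely moves, so the cubic here describes a decline that pauses in the middle of the range rather than two pronounced reversals.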

  • #2
    I suggest that you (a) avoid creating 3 separate variables, since you can use the # (factor-variable) notation instead, and (b) use margins and marginsplot for the interpretation.
    Best regards,

    Marcos


    • #3
      Code:
      mixed lex c.dar##c.dar##c.dar alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, reml
      margins, at(dar = (0(5)100))
      marginsplot
      Like Marcos said, you can use the factor variable notation above to create squared and cubed terms, and these will work perfectly with -margins- and -marginsplot-, and everyone should use them. Manually generating interactions is a pretty common thing we see here; you very rarely need to do this.
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
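
      To illustrate why the factor-variable notation matters for -margins-, here is a minimal sketch contrasting the two specifications. It assumes the variables from #1 are in memory; the comments describe documented -margins- behavior rather than output from this dataset.

      ```stata
      * Manually generated terms: margins cannot tell that dar_sq and dar_cub depend on dar
      mixed lex dar dar_sq dar_cub alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, reml
      margins, at(dar = (0(5)100))    // only dar is moved; dar_sq and dar_cub keep their observed values

      * Factor-variable notation: the squared and cubed terms are recomputed at each value of dar
      mixed lex c.dar##c.dar##c.dar alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, reml
      margins, at(dar = (0(5)100))    // traces the full fitted cubic
      marginsplot
      ```

      The coefficient estimates are the same under both specifications; only the post-estimation behavior differs, which is why the second form is the one to use before -margins-.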


      • #4
        Thank you Marcos for suggesting margins and marginsplot. Also, thanks Weiwen in particular for showing the exact code to use - that was EXTREMELY helpful!


        • #5
          Just had a follow-up question. Below I show the results from margins and also marginsplot. In the plot, the confidence intervals seem to overlap in several cases. Could this mean that the cubic relationship is potentially the wrong fit here? Thank you!!!

          Code:
          . mixed lex c.dar##c.dar##c.dar alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, reml
          
          Performing EM optimization:
          
          Performing gradient-based optimization:
          
          Iteration 0:   log restricted-likelihood =  -55055.98  
          Iteration 1:   log restricted-likelihood =  -55055.98  
          
          Computing standard errors:
          
          Mixed-effects REML regression                   Number of obs     =     12,040
          Group variable: _all                            Number of groups  =          1
          
                                                          Obs per group:
                                                                        min =     12,040
                                                                        avg =   12,040.0
                                                                        max =     12,040
          
                                                          Wald chi2(7)      =     558.36
          Log restricted-likelihood =  -55055.98          Prob > chi2       =     0.0000
          
          -----------------------------------------------------------------------------------
                        lex |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ------------------+----------------------------------------------------------------
                        dar |  -.1442413   .0679903    -2.12   0.034    -.2774999   -.0109827
                            |
                c.dar#c.dar |   .0039578   .0017025     2.32   0.020      .000621    .0072946
                            |
          c.dar#c.dar#c.dar |  -.0000359   .0000113    -3.19   0.001     -.000058   -.0000138
                            |
                        alp |    7.56537   1.792743     4.22   0.000     4.051658    11.07908
                        mup |   18.59015   2.898774     6.41   0.000     12.90866    24.27164
                        zol |   11.27925   .6055326    18.63   0.000     10.09243    12.46607
                   mupbyzol |   -6.30276   .8396218    -7.51   0.000    -7.948388   -4.657131
                      _cons |   70.63327    2.85485    24.74   0.000     65.03787    76.22868
          -----------------------------------------------------------------------------------
          
          ------------------------------------------------------------------------------
            Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
          -----------------------------+------------------------------------------------
          _all: Identity               |
                            var(R.bor) |   81.68924   16.43907      55.06427    121.1881
          -----------------------------+------------------------------------------------
          _all: Identity               |
                            var(R.mol) |   47.87758   14.45347      26.49525    86.51596
          -----------------------------+------------------------------------------------
          _all: Identity               |
                            var(R.pop) |   42.53744   8.664709      28.53546       63.41
          -----------------------------+------------------------------------------------
                         var(Residual) |    528.271    6.84953      515.0153    541.8679
          ------------------------------------------------------------------------------
          LR test vs. linear model: chi2(3) = 2653.43               Prob > chi2 = 0.0000
          
          Note: LR test is conservative and provided only for reference.
          
          . margins, at(dar = (0(5)100))
          
          Predictive margins                              Number of obs     =     12,040
          
          Expression   : Linear prediction, fixed portion, predict()
          
          1._at        : dar             =           0
          
          2._at        : dar             =           5
          
          3._at        : dar             =          10
          
          4._at        : dar             =          15
          
          5._at        : dar             =          20
          
          6._at        : dar             =          25
          
          7._at        : dar             =          30
          
          8._at        : dar             =          35
          
          9._at        : dar             =          40
          
          10._at       : dar             =          45
          
          11._at       : dar             =          50
          
          12._at       : dar             =          55
          
          13._at       : dar             =          60
          
          14._at       : dar             =          65
          
          15._at       : dar             =          70
          
          16._at       : dar             =          75
          
          17._at       : dar             =          80
          
          18._at       : dar             =          85
          
          19._at       : dar             =          90
          
          20._at       : dar             =          95
          
          21._at       : dar             =         100
          
          ------------------------------------------------------------------------------
                       |            Delta-method
                       |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   _at |
                    1  |   88.10403   2.107859    41.80   0.000      83.9727    92.23536
                    2  |   87.47728   2.079284    42.07   0.000     83.40196     91.5526
                    3  |   87.02151   2.080809    41.82   0.000      82.9432    91.09982
                    4  |   86.70979   2.092001    41.45   0.000     82.60954    90.81003
                    5  |   86.51521    2.10145    41.17   0.000     82.39644    90.63397
                    6  |   86.41084   2.104481    41.06   0.000     82.28614    90.53555
                    7  |   86.36978    2.10081    41.11   0.000     82.25227    90.48729
                    8  |   86.36511   2.092665    41.27   0.000     82.26356    90.46666
                    9  |    86.3699   2.083348    41.46   0.000     82.28661    90.45318
                   10  |   86.35724   2.076117    41.60   0.000     82.28812    90.42635
                   11  |   86.30021   2.073358    41.62   0.000      82.2365    90.36391
                   12  |   86.17189   2.076051    41.51   0.000     82.10291    90.24088
                   13  |   85.94537   2.083605    41.25   0.000     81.86158    90.02916
                   14  |   85.59373   2.094085    40.87   0.000      81.4894    89.69806
                   15  |   85.09005   2.104807    40.43   0.000      80.9647     89.2154
                   16  |   84.40741   2.113253    39.94   0.000     80.26551    88.54931
                   17  |    83.5189   2.118269    39.43   0.000     79.36717    87.67063
                   18  |   82.39759   2.121572    38.84   0.000     78.23939     86.5558
                   19  |   81.01658   2.129608    38.04   0.000     76.84262    85.19053
                   20  |   79.34893   2.155586    36.81   0.000     75.12406     83.5738
                   21  |   77.36775   2.220925    34.84   0.000     73.01481    81.72068
          ------------------------------------------------------------------------------
          
          . marginsplot





          • #6
            The graph doesn't show groups to be compared, so you should not think about, let alone look for, overlapping intervals. What you have are the predictive margins, with 95% CIs, "at" the selected values of a continuous X variable.

            As for the squared and cubic terms, they seem to improve the model somewhat, but you should verify this with postestimation checks.

            Hopefully that helps.
            Best regards,

            Marcos
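
            A few postestimation checks of the polynomial terms could look like the sketch below. This is one possible set of checks, not necessarily what was meant above. The stored-estimate names linear and cubic are placeholders, and the comparison models are refit with the mle option because REML likelihoods are not comparable across models with different fixed effects.

            ```stata
            * Refit the cubic model from #5 (REML, as in the thread)
            mixed lex c.dar##c.dar##c.dar alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, reml

            * Joint Wald test that the squared and cubic terms are both zero
            testparm c.dar#c.dar c.dar#c.dar#c.dar

            * Slope of the fitted curve, with confidence intervals, at selected values of dar
            margins, dydx(dar) at(dar = (0(10)100))

            * Likelihood-ratio test and information criteria: linear versus cubic, both fit by ML
            mixed lex c.dar alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, mle
            estimates store linear
            mixed lex c.dar##c.dar##c.dar alp mup zol mupbyzol || _all: R.bor || _all: R.mol || _all: R.pop, mle
            estimates store cubic
            lrtest linear cubic
            estimates stats linear cubic
            ```

            A confidence interval for the slope that covers zero over part of the range means the curve is indistinguishable from flat there; it does not by itself say that the cubic is the wrong specification.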


            • #7
              Very helpful Marcos. Thank you!!!!!!!


              • #8
                I'm sorry, I have another question about this. When creating the plot with marginsplot, is it best for both the Y and X axes to show the full range of the variables? For example, in the original plot, the full range of the X variable is shown but not the full range of Y values (first plot below). In the second plot below, the full range of the Y variable is shown (although ticks are not added at all values), which makes the plots look quite different. Is one of these presentations considered more accurate?


                [Image: Graph1.png]
                [Image: Graph2.png]




                • #9
                  I'm wondering how I could give you an insightful answer if you didn't explain much about the model's DV, aka "lex".

                  That said, it seems preposterous to me (if not some sort of manipulation to mislead observers) to present the Y-axis on a scale where more than 90% of the axis range contains no predicted values.

                  What would be the purpose of such a strategy, if not to lead people to think that there is a straight line, when in fact you yourself saw significant p-values for the X variable, including its squared and cubic terms?

                  Put otherwise, I fail to understand what on Earth would be the purpose of doing the whole modeling exercise just to hide the best-fitting model's results afterwards.

                  P.S: also, I feel the model would be better presented with the confidence intervals as shown in #5. Taking the CIs away would give me the impression there is "something wrong" with the model and somebody decided to hide it.
                  Last edited by Marcos Almeida; 13 Dec 2018, 10:52.
                  Best regards,

                  Marcos
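
                  For anyone who wants to compare the two presentations side by side, a sketch along these lines could be used. The 0 to 100 range for lex is an assumption made purely for illustration (the thread does not state the range of lex), and the graph names zoomed and fullrange are placeholders. It assumes the cubic model from #5 is the active estimation result.

                  ```stata
                  * Predictive margins over dar from the cubic model in #5
                  quietly margins, at(dar = (0(5)100))

                  * Default y-axis: scaled to the predictions and their confidence intervals
                  marginsplot, recast(line) recastci(rarea) name(zoomed, replace)

                  * Y-axis forced to an assumed full range of lex; replace 0 and 100 with the actual limits
                  quietly margins, at(dar = (0(5)100))
                  marginsplot, recast(line) recastci(rarea) yscale(range(0 100)) ylabel(0(20)100) name(fullrange, replace)
                  ```

                  Both versions keep the confidence intervals visible, so the only thing that changes between the two graphs is the vertical scale.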
