  • Interpreting random effects: can statistical significance of variance components be determined?

    I use Stata 14.1 on Windows 10

    Using this code:
    Code:
    mixed tolind c.age##c.age Inspired Fables female Liberal Moderate white ceduc marry kids south urban rural suburban crelcom mainline blackprot catholic nofaith tolindsd [fweight=wtssall]|| _all:R.cohort5 || year:
    I get these results:
    HTML Code:
    -------------------------------------------------------------
                    |     No. of       Observations per Group
     Group Variable |     Groups    Minimum    Average    Maximum
    ----------------+--------------------------------------------
               _all |          1     14,604   14,604.0     14,604
               year |         18        348      810.8      1,733
    -------------------------------------------------------------
    
                                                    Wald chi2(21)     =    8042.40
    Log likelihood =  -39253.37                     Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
          tolind |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |  -.0024984   .0133355    -0.19   0.851    -.0286355    .0236387
                 |
     c.age#c.age |  -.0003943   .0001274    -3.10   0.002    -.0006439   -.0001446
                 |
        Inspired |   1.680803    .072354    23.23   0.000     1.538992    1.822614
          Fables |   1.882974   .1039909    18.11   0.000     1.679155    2.086792
          female |   .1422362   .0607907     2.34   0.019     .0230886    .2613838
         Liberal |   .2627643   .0785938     3.34   0.001     .1087233    .4168053
        Moderate |   .1032633   .0708411     1.46   0.145    -.0355828    .2421094
           white |   .6327955   .0920592     6.87   0.000     .4523628    .8132283
           ceduc |   .1892372   .0084649    22.36   0.000     .1726463    .2058281
           marry |  -.0779974   .0683676    -1.14   0.254    -.2119955    .0560006
            kids |  -.2239905   .0805381    -2.78   0.005    -.3818422   -.0661387
           south |  -.5834785   .0644605    -9.05   0.000    -.7098187   -.4571383
           urban |   .0869833   .0832515     1.04   0.296    -.0761867    .2501532
           rural |  -.7469693   .1015166    -7.36   0.000    -.9459382   -.5480003
        suburban |   .3433786   .0730651     4.70   0.000     .2001736    .4865836
         crelcom |  -.0815977   .0096197    -8.48   0.000     -.100452   -.0627435
        mainline |   .4905731   .0900192     5.45   0.000     .3141387    .6670074
       blackprot |   .0190671   .1368865     0.14   0.889    -.2492256    .2873598
        catholic |   .2306323   .0786859     2.93   0.003     .0764108    .3848538
         nofaith |   .4159865   .1095781     3.80   0.000     .2012174    .6307556
        tolindsd |  -8.500621   .1499065   -56.71   0.000    -8.794432    -8.20681
           _cons |   11.76895   .3746199    31.42   0.000     11.03471    12.50319
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    _all: Identity               |
                  var(R.cohort5) |   .1133522   .0581664      .0414609    .3099001
    -----------------------------+------------------------------------------------
    year: Identity               |
                      var(_cons) |   .3264224   .1205952      .1582368    .6733679
    -----------------------------+------------------------------------------------
                   var(Residual) |   12.57375   .1473513      12.28824    12.86589
    ------------------------------------------------------------------------------
    LR test vs. linear model: chi2(2) = 0.00                  Prob > chi2 = 1.0000
    
    Note: LR test is conservative and provided only for reference.
    My question concerns the random-effects parameters: can one say that the variance components (.113 and/or .326) are statistically significant? In some MLM studies I see these variance components reported as significant; in others I see no mention of it. I thought the LR test compares two models using their log likelihoods, but it is not clear to me whether that says anything about whether the variance components themselves can be called statistically significant.

    Help is appreciated.
    Thanks,
    Marie

  • #2
    I think I figured it out. I believe I can determine the statistical significance of the variance components by dividing the variance estimate by its standard error (producing a number, say, X), and then using the following code to obtain a p-value.
    Code:
    di 2*(1 - normal(abs(X)))
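
    For instance, plugging in the cohort5 variance estimate and its standard error from the output above, the proposed calculation would look something like this:
    Code:
    * sketch of the proposed (naive) calculation: variance estimate divided
    * by its reported standard error, then a two-sided normal p-value
    di 2*(1 - normal(abs(.1133522/.0581664)))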

    • #3
      That is not quite correct. The problem is that your null hypothesis is "on the boundary of the parameter space": a variance cannot be negative, so the possible values a variance can take are 0 or more. Your null hypothesis is that the variance is 0, so it lies on the boundary of the possible values a variance could take.

      When we do testing we look at the sampling distribution of the parameter; that is, we imagine repeatedly drawing a sample from the population, estimating our model in each of these samples, and collecting the parameters. These parameters will differ from sample to sample, and the distribution of these parameters is the sampling distribution. We obviously have only one sample from the population, but we know quite a bit about how sampling works, and we can use that to compute an approximation.

      When you estimate a model with maximum likelihood (as you did when you typed mixed), an approximation that works most of the time is that the sampling distribution is a Gaussian/normal distribution. That is where your formula came from. However, the Gaussian distribution assumes that all positive and negative values are possible, which is not the case for a variance. The normal distribution is only an approximation, so it does not have to be exactly true, but your null hypothesis assumes the worst possible case: the population parameter is as close to the impossible values as it can be. This is where the approximation breaks down.

      Also see help j_chibar or Gutierrez, Carter, and Drukker (2001).

      Roberto G. Gutierrez, Shana Carter, and David M. Drukker (2001). "On boundary-value likelihood-ratio tests." Stata Technical Bulletin 60: 15-18. http://www.stata.com/products/stb/journals/stb60.pdf
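
      As a rough sketch of what this means in practice, assuming the full model has been stored as Both (a name chosen here for illustration) and the reduced model with the variance component of interest dropped is the active fit, the boundary-corrected test compares the two models and halves the usual chi-square tail probability, because under the null the LR statistic follows a 50:50 mixture of chi2(0) and chi2(1), i.e. chibar2(01):
      Code:
      * Sketch only: Both = stored full model, "." = reduced model just fit.
      * Halve the chi2(1) tail probability to get the chibar2(01) p-value.
      lrtest Both .
      display 0.5*chi2tail(1, r(chi2))    // equivalent to r(p)/2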
      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------

      • #4
        Maarten,
        Thank you for this feedback. Essentially, whatever p-value I get from the code I referenced, di 2*(1 - normal(abs(X))), is invalid.

        This still leaves me stumped as to how I can get Stata to provide me with a valid calculation for the statistical significance of my period and cohort variances.
        Marie

        • #5
          I'm confused: isn't what you want in r(table)? After estimating your model, type "mat li r(table)".

          • #6
            Hi Rich - I typed in "mat li r(table)" but (unless I missed it) that does not provide me with the statistical significance of my random-effects parameter estimates.

            • #7
              If you show your results (within CODE blocks, please, so they will be legible - see the FAQ for instructions), I can point to what I am talking about and you can decide whether it is what you mean. It will be easier, I think, if you use the "stddeviations" option on your command line, as sketched below.
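
              (A sketch of where the option goes, using the command from the original post: stddeviations is specified after the comma that follows the last random-effects equation, so the random-effects table is reported as standard deviations rather than variances.)
              Code:
              * Sketch: same model as in the original post, with random effects
              * displayed as standard deviations instead of variances
              mixed tolind c.age##c.age Inspired Fables female Liberal Moderate white ///
                  ceduc marry kids south urban rural suburban crelcom mainline ///
                  blackprot catholic nofaith tolindsd [fweight=wtssall] ///
                  || _all:R.cohort5 || year:, stddeviations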

              • #8
                Rich,
                Thanks for your help. Below are my results (sorry for the delay, I was on the radio discussing election results).
                HTML Code:
                . mat li r(table)
                
                r(table)[9,24]
                            tolind:     tolind:     tolind:     tolind:     tolind:     tolind:     tolind:     tolind:
                                         c.age#                                                                        
                               age       c.age    Inspired      Fables      female     Liberal    Moderate       white
                     b    .0403631  -.00080222   1.8998337   2.4372588   .02386721   .45475009   .05881294   1.0598132
                    se   .01490936   .00014156   .07980834   .11436071   .06710728   .08673279   .07824571   .10134008
                     z   2.7072317   -5.666973   23.804954   21.312029   .35565751   5.2431161    .7516443   10.457987
                pvalue   .00678469   1.453e-08   2.97e-125   8.78e-101   .72209707   1.579e-07     .452265   1.347e-25
                    ll   .01114129  -.00107968   1.7434123   2.2131159  -.10766064   .28475694  -.09454583   .86119034
                    ul   .06958492  -.00052477   2.0562552   2.6614016   .15539505   .62474323   .21217171   1.2584362
                    df           .           .           .           .           .           .           .           .
                  crit    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964
                 eform           0           0           0           0           0           0           0           0
                
                            tolind:     tolind:     tolind:     tolind:     tolind:     tolind:     tolind:     tolind:
                                                                                                                       
                             ceduc       marry        kids       south       urban       rural    suburban     crelcom
                     b   .24286389  -.06616891  -.38847293  -.58764655   .01769834  -.85394901   .30845838  -.07799438
                    se   .00929203   .07551756   .08890605   .07120256   .09194642   .11211284   .08070164   .01062578
                     z   26.136804  -.87620572  -4.3694771  -8.2531663   .19248534  -7.6168706    3.822207  -7.3401114
                pvalue   1.39e-150   .38091821   .00001245   1.543e-16   .84736205   2.599e-14   .00013226   2.134e-13
                    ll   .22465185   -.2141806  -.56272558    -.727201  -.16251334  -1.0736861   .15028606  -.09882051
                    ul   .26107593   .08184278  -.21422028   -.4480921   .19791002  -.63421188   .46663069  -.05716824
                    df           .           .           .           .           .           .           .           .
                  crit    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964
                 eform           0           0           0           0           0           0           0           0
                
                            tolind:     tolind:     tolind:     tolind:     tolind:   lns1_1_1:   lns2_1_1:    lnsig_e:
                                                                                                                       
                          mainline   blackprot    catholic     nofaith       _cons       _cons       _cons       _cons
                     b   .46705153   .08739747   .01724263   .60193139   7.7892717  -.86818339  -.53699042   1.3652601
                    se   .09943144   .15119415   .08681632   .12098629     .415639   .23282169   .19097622   .00585857
                     z    4.697222   .57804796   .19861043   4.9752034   18.740473  -3.7289627  -2.8118182   233.03633
                pvalue   2.637e-06   .56323174   .84256749   6.518e-07   2.316e-78   .00019227   .00492623           0
                    ll    .2721695  -.20893761  -.15291424   .36480262   6.9746342  -1.3245055  -.91129694   1.3537775
                    ul   .66193357   .38373255   .18739949   .83906016   8.6039092  -.41186127   -.1626839   1.3767427
                    df           .           .           .           .           .           .           .           .
                  crit    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964    1.959964
                 eform           0           0           0           0           0           0           0           0
                
                My apologies, but I am not sure how you wanted me to use the "stddeviations" option.
                Thanks,
                Marie

                • #9
                  r(table) is not the solution here. It has the same problem as the earlier suggestion of using di 2*(normal(-abs(b/se))), as discussed in #3. For example, if you look at lns2_1_1 you can see that it uses the exact same formula:

                  Code:
                  . di 2*(normal(-abs(-.53699042/.19097622 )))
                  .00492623
                  That is not surprising: r(table) is a generic matrix created for many different commands. It is only within the command itself that a specific command like mixed filters out the exceptions and chooses not to display p-values that would be wrong.
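
                  As an aside, the lns1_1_1, lns2_1_1, and lnsig_e rows in r(table) are the random-effects parameters on the log-standard-deviation scale that mixed estimates internally; a sketch like the following recovers the variances from them (using the equation names shown in the r(table) output above), and the boundary problem described in #3 applies to those parameters just the same:
                  Code:
                  * Sketch: back-transform the internal log-standard-deviation
                  * parameters to variances
                  display exp(2*_b[lns1_1_1:_cons])   // var(R.cohort5)
                  display exp(2*_b[lns2_1_1:_cons])   // var(_cons) at the year level
                  display exp(2*_b[lnsig_e:_cons])    // var(Residual)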
                  Last edited by Maarten Buis; 09 Nov 2016, 01:24.
                  ---------------------------------
                  Maarten L. Buis
                  University of Konstanz
                  Department of history and sociology
                  box 40
                  78457 Konstanz
                  Germany
                  http://www.maartenbuis.nl
                  ---------------------------------

                  • #10
                    Maarten, I see.......the r(table) provides me with the exact same results as the di 2*(normal(-abs(b/se ))). How do I determine accurate p-values for my variance components? I have looked at the documentation you suggested previously, and my apologies that my statistical skills are severely lacking, but I have not been able to figure out how to do it.
                    Marie

                    • #11
                      Or ... is my question a fool's errand? Is it simply not possible to calculate a valid statistical significance for variance components, and is that why the "mixed" command does not provide p-values for those coefficients?

                      • #12
                        Originally posted by Marie Eisenstein:
                        Or ... is my question a fool's errand? Is it simply not possible to calculate a valid statistical significance for variance components, and is that why the "mixed" command does not provide p-values for those coefficients?
                        Try the following:
                        Code:
                        local predictors c.age##c.age Inspired Fables female Liberal Moderate ///
                            white ceduc marry kids south urban rural suburban ///
                            crelcom mainline blackprot catholic nofaith tolindsd
                        
                        mixed tolind `predictors' [fweight=wtssall] || _all:R.cohort5 || year:
                        
                        *
                        * Begin here
                        *
                        estimates store Both
                        
                        // Variance component for cohort5
                        quietly mixed tolind `predictors' [fweight=wtssall] || year:
                        lrtest Both .
                        display in smcl as text r(p) / 2
                        
                        // Variance component for year
                        mixed tolind `predictors' [fweight=wtssall] || _all:R.cohort5
                        lrtest Both .
                        display in smcl as text r(p) / 2
                        Why isn't i.year in your list of explanatory variables (fixed effects)?

                        • #13
                          Joseph, I have not yet tried the code; I will do so shortly. Thank you.

                          Are you asking why I have not included year as both a fixed and a random effect? I am most interested in the societal-level effect of cohort and period (year), so I did not think year belonged as a fixed effect.

                          • #14
                            Yes, I was asking why you didn't include year in your fixed-effects equation in addition to the random-effects equation. I'm afraid that I don't understand your rationale.

                            • #15
                              Hi Joseph, I do not think year (or period) is an individual-level influence, so I didn't include it as a fixed effect. Okay, I tried your code. Below is how I used it (or at least how I thought it should be used) and the results.
                               Code:
                              mixed "predictors" [fweight=wtssall] || _all:R.cohort5 || year:
                              
                              estimates store Both.
                              
                              quietly mixed "predictors" [fweight=wtssall] || year:
                              
                              lrtest Both
                              Results:
                              HTML Code:
                              lrtest Both
                              observations differ: 14604.835 vs. 14636.068
                              r(498);
                              Ugh.
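
                               A hedged sketch of how the code from #12 might be adjusted, assuming the r(498) error arises because the two models were fit on different estimation samples (for example, through missing values in cohort5): define the predictors local, store the full model under a name with no trailing period, and restrict the reduced model to the full model's e(sample) before running lrtest.
                               Code:
                               * Sketch only: keep both models on the same observations before lrtest
                               local predictors c.age##c.age Inspired Fables female Liberal Moderate ///
                                   white ceduc marry kids south urban rural suburban ///
                                   crelcom mainline blackprot catholic nofaith tolindsd
                               
                               mixed tolind `predictors' [fweight=wtssall] || _all:R.cohort5 || year:
                               estimates store Both                // no trailing period in the name
                               
                               * refit the reduced model on the full model's estimation sample
                               quietly mixed tolind `predictors' if e(sample) [fweight=wtssall] || year:
                               lrtest Both .                       // "." is the model just fit
                               display r(p) / 2                    // halved p-value (boundary test)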
