bayes, ... mixed: ... crashes for large burn-in periods

Alexander Koplenig

Join Date: Jul 2014
Posts: 39

bayes, ... mixed: ... crashes for large burn-in periods

30 May 2023, 01:10

Dear Statalisters,

I would like to point out a problem that occurs when running various rather complex Bayesian multilevel models using the bayes prefix. The problem occurs in both Stata 17/MP and Stata 18/MP for Unix (Linux 64-bit x86-64). A test data set for such a case is attached. Here I try to run a crossed effects model with two ('fixed') covariates, two crossed effects and random coefficients ('slopes') in both levels (using Isabel’s suggestions presented here Including covariates in crossed-effects models).
Here’s the code I’m using

Code:

*--------------------------------------------------*
cap log close
log using investigate_bayes_mixed, replace
use test_data, clear
 
*Including covariates in crossed-effects models as described by Isabel Canette
*https://blog.stata.com/2010/12/22/including-covariates-in-crossed-effects-models/
 
qui tab Group2, gen(id_Group2)
unab idvar: id_Group2*
foreach v of local idvar {
    gen inter`v' =covariate2*`v'
}
 
*--------------------------------------------------*
*Standard frequentist model
mixed outcome covariate1 covariate2 || _all: R.Group2 || _all: inter*, cov(identity) nocons || Group1: covariate2
 
*--------------------------------------------------*
 
bayes, burnin(200000) dots(1000)  rseed(42) saving(investigate_bayes_mixed_simulation, replace): ///
    mixed outcome covariate1 covariate2 || _all: R.Group2 || _all: inter*, cov(identity) nocons || Group1: covariate2
    estimates save investigate_bayes_mixed, replace
 
*--------------------------------------------------*
log close

The code crashes Stata after estimating the model, i.e. Stata exits without saving the estimates, here is the log-file.

Code:

. use test_data, clear
 
.
. *Including covariates in crossed-effects models as described by Isabel Canette
. *https://blog.stata.com/2010/12/22/including-covariates-in-crossed-effects-models/
.
. qui tab Group2, gen(id_Group2)
 
. unab idvar: id_Group2*
 
. foreach v of local idvar {
  2.     gen inter`v' =covariate2*`v'
  3. }
 
.
. *--------------------------------------------------*
. *Standard frequentist model
. mixed outcome covariate1 covariate2 || _all: R.Group2 || _all: inter*, cov(identity) nocons || Group1: covariate2
 
Performing EM optimization ...
 
Performing gradient-based optimization:
Iteration 0:  Log likelihood =  37.598276
Iteration 1:  Log likelihood =  38.546812
Iteration 2:  Log likelihood =  38.905076
Iteration 3:  Log likelihood =  38.952799
Iteration 4:  Log likelihood =  38.954077
Iteration 5:  Log likelihood =  38.954091
Iteration 6:  Log likelihood =  38.954091
 
Computing standard errors ...
 
Mixed-effects ML regression                             Number of obs =     97
 
        Grouping information
        -------------------------------------------------------------
                        |     No. of       Observations per group
         Group variable |     groups    Minimum    Average    Maximum
        ----------------+--------------------------------------------
                   _all |          1         97       97.0         97
                 Group1 |         24          1        4.0         20
        -------------------------------------------------------------
 
                                                        Wald chi2(2)  =  14.75
Log likelihood =  38.954091                             Prob > chi2   = 0.0006
 
------------------------------------------------------------------------------
     outcome | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
  covariate1 |  -.0130288    .004734    -2.75   0.006    -.0223072   -.0037503
  covariate2 |  -.2171314    .066152    -3.28   0.001    -.3467869   -.0874759
       _cons |   .7379007   .0734132    10.05   0.000     .5940134     .881788
------------------------------------------------------------------------------
 
------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
_all: Identity               |
               var(R.Group2) |   .0233182   .0101985      .0098949    .0549513
-----------------------------+------------------------------------------------
_all: Identity               |
  var(inter~21..inte~219)(1) |   2.66e-11          .             .           .
-----------------------------+------------------------------------------------
Group1: Independent          |
             var(covariate2) |   .0042044   .0217997      1.62e-07    108.9365
                  var(_cons) |   1.68e-16   1.27e-12             0           .
-----------------------------+------------------------------------------------
               var(Residual) |   .0187106   .0032308      .0133385    .0262462
------------------------------------------------------------------------------
LR test vs. linear model: chi2(4) = 49.66                 Prob > chi2 = 0.0000
(1) interid_Group21 interid_Group22 interid_Group23 interid_Group24 interid_Group25 interid_Group26 interid_Group27 interid_Group28 interid_Group29 interid_Group210 interid_Group211 interid_Group212 interid_Group213
    interid_Group214 interid_Group215 interid_Group216 interid_Group217 interid_Group218 interid_Group219
 
Note: LR test is conservative and provided only for reference.
 
.
. *--------------------------------------------------*
. *Bayesian model, with burnin(200000)
. bayes, burnin(200000) dots(1000)  rseed(42) saving(investigate_bayes_mixed_simulation, replace): ///
>     mixed outcome covariate1 covariate2 || _all: R.Group2 || _all: inter*, cov(identity) nocons || Group1: covariate2
note: Gibbs sampling is used for regression coefficients and some variance components.
 
Burn-in 200000 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa done
Simulation 10000 .......... done
 
file investigate_bayes_mixed_simulation.dta not found; file saved.
 
Multilevel structure
------------------------------------------------------------------------------
Group2
    {U0}: random intercepts
 
_all
    {V0}: random intercepts
    {V1}: random coefficients for interid_Group21
    {V2}: random coefficients for interid_Group22
    {V3}: random coefficients for interid_Group23
    {V4}: random coefficients for interid_Group24
    {V5}: random coefficients for interid_Group25
    {V6}: random coefficients for interid_Group26
    {V7}: random coefficients for interid_Group27
    {V8}: random coefficients for interid_Group28
    {V9}: random coefficients for interid_Group29
    {V10}: random coefficients for interid_Group210
    {V11}: random coefficients for interid_Group211
    {V12}: random coefficients for interid_Group212
    {V13}: random coefficients for interid_Group213
    {V14}: random coefficients for interid_Group214
    {V15}: random coefficients for interid_Group215
    {V16}: random coefficients for interid_Group216
    {V17}: random coefficients for interid_Group217
    {V18}: random coefficients for interid_Group218
    {V19}: random coefficients for interid_Group219
 
Group1
    {W0}: random intercepts
    {W1}: random coefficients for covariate2
------------------------------------------------------------------------------
 
Model summary
------------------------------------------------------------------------------
Likelihood:
  outcome ~ normal(xb_outcome,{e.outcome:sigma2})
 
Priors:
  {outcome:covariate1 covariate2 _cons} ~ normal(0,10000)                  (1)
                                   {U0} ~ normal(0,{U:sigma2})             (1)
                                   {V0} ~ normal(0,{V:sigma2})             (1)
                                   {V1} ~ normal(0,{V:sigma2})             (1)
                                   {V2} ~ normal(0,{V:sigma2})             (1)
                                   {V3} ~ normal(0,{V:sigma2})             (1)
                                   {V4} ~ normal(0,{V:sigma2})             (1)
                                   {V5} ~ normal(0,{V:sigma2})             (1)
                                   {V6} ~ normal(0,{V:sigma2})             (1)
                                   {V7} ~ normal(0,{V:sigma2})             (1)
                                   {V8} ~ normal(0,{V:sigma2})             (1)
                                   {V9} ~ normal(0,{V:sigma2})             (1)
                                  {V10} ~ normal(0,{V:sigma2})             (1)
                                  {V11} ~ normal(0,{V:sigma2})             (1)
                                  {V12} ~ normal(0,{V:sigma2})             (1)
                                  {V13} ~ normal(0,{V:sigma2})             (1)
                                  {V14} ~ normal(0,{V:sigma2})             (1)
                                  {V15} ~ normal(0,{V:sigma2})             (1)
                                  {V16} ~ normal(0,{V:sigma2})             (1)
                                  {V17} ~ normal(0,{V:sigma2})             (1)
                                  {V18} ~ normal(0,{V:sigma2})             (1)
                                  {V19} ~ normal(0,{V19:sigma2})           (1)
                                   {W0} ~ normal(0,{W0:sigma2})            (1)
                                   {W1} ~ normal(0,{W1:sigma2})            (1)
                     {e.outcome:sigma2} ~ igamma(.01,.01)
 
Hyperpriors:
    {U:sigma2} ~ igamma(.01,.01)
    {V:sigma2} ~ igamma(.01,.01)
  {V19:sigma2} ~ igamma(.01,.01)
   {W0:sigma2} ~ igamma(.01,.01)
   {W1:sigma2} ~ igamma(.01,.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_outcome.
 
Bayesian multilevel regression                   MCMC iterations  =    210,000
Metropolis–Hastings and Gibbs sampling           Burn-in          =    200,000
                                                 MCMC sample size =     10,000
 
-------------------------------------------------------------
                |     No. of       Observations per group
 Group variable |     groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           _all |          1         97       97.0         97
         Group1 |          1         97       97.0         97
-------------------------------------------------------------
 
                                                 Number of obs    =         97
                                                 Acceptance rate  =      .5045
                                                 Efficiency:  min =    .002255
                                                              avg =     .06496
Log marginal-likelihood                                       max =      .2841
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
outcome      |
  covariate1 | -.0110874   .0059492   .000344  -.0109953  -.0228833   .0004529
  covariate2 | -.1876556   .1218701   .006668  -.1864473  -.4312361     .05925
       _cons |  .6759434   .1800871   .037924   .6768317   .3181374   1.030745
-------------+----------------------------------------------------------------
Group2       |
    U:sigma2 |  .0267817   .0142745   .001079   .0236498   .0091008   .0632229
-------------+----------------------------------------------------------------
_all         |
    V:sigma2 |  .0328675   .0291046   .003738   .0227667   .0049685   .1147533
  V19:sigma2 |  648.4501   21147.34   396.721   .0976229   .0059268   45.86536
-------------+----------------------------------------------------------------
Group1       |
   W0:sigma2 |  .0076228    .005156   .000306   .0063366   .0020867   .0204744
   W1:sigma2 |  .0304866   .0299592   .002302    .020928   .0038986   .1133497
-------------+----------------------------------------------------------------
e.outcome    |
      sigma2 |   .019182   .0033781   .000083   .0188363   .0135053   .0268277
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.
Note: There is a high autocorrelation after 500 lags.

My guess is that the problem must have to do with the size of the burn-in period: I have tried
burnin(20000), burnin(100000) and burnin(150000).
For all three sizes, everything works fine.
Has anyone else encountered similar problems and successfully fixed them?

Just in case: at the moment, I’m using Nikolay’s workaround to retrieve estimation results (described here Retrieving estimation results from "bayes [...] saving(xyz)" command):

Code:

.
. *Make temporary directory to store simulation dataset
. capture mkdir "temp_sim"
 
.
. copy "investigate_bayes_mixed_simulation.dta" "temp_sim/investigate_bayes_mixed_simulation.dta"
 
.
. *Re-run Bayesian model with burnin(1)
. use test_data, clear
 
.
.
. qui tab Group2, gen(id_Group2)
 
. unab idvar: id_Group2*
 
. foreach v of local idvar {
  2.     gen inter`v' =covariate2*`v'
  3. }
 
.
.
. *--------------------------------------------------*
. bayes, burnin(1) rseed(42) saving(investigate_bayes_mixed_simulation, replace): ///
>     mixed outcome covariate1 covariate2 || _all: R.Group2 || _all: inter*, cov(identity) nocons || Group1: covariate2
note: Gibbs sampling is used for regression coefficients and some variance components.
 
Burn-in 1  done
Simulation 10000 aaaaaaaaa1000aaaaaaaaa2000aaaaa....3000.........4000.........5000.........6000.........7000.........8000.........9000.........10000 done
 
file investigate_bayes_mixed_simulation.dta saved.
 
Multilevel structure
------------------------------------------------------------------------------
Group2
    {U0}: random intercepts
 
_all
    {V0}: random intercepts
    {V1}: random coefficients for interid_Group21
    {V2}: random coefficients for interid_Group22
    {V3}: random coefficients for interid_Group23
    {V4}: random coefficients for interid_Group24
    {V5}: random coefficients for interid_Group25
    {V6}: random coefficients for interid_Group26
    {V7}: random coefficients for interid_Group27
    {V8}: random coefficients for interid_Group28
    {V9}: random coefficients for interid_Group29
    {V10}: random coefficients for interid_Group210
    {V11}: random coefficients for interid_Group211
    {V12}: random coefficients for interid_Group212
    {V13}: random coefficients for interid_Group213
    {V14}: random coefficients for interid_Group214
    {V15}: random coefficients for interid_Group215
    {V16}: random coefficients for interid_Group216
    {V17}: random coefficients for interid_Group217
    {V18}: random coefficients for interid_Group218
    {V19}: random coefficients for interid_Group219
 
Group1
    {W0}: random intercepts
    {W1}: random coefficients for covariate2
------------------------------------------------------------------------------
 
Model summary
------------------------------------------------------------------------------
Likelihood:
  outcome ~ normal(xb_outcome,{e.outcome:sigma2})
 
Priors:
  {outcome:covariate1 covariate2 _cons} ~ normal(0,10000)                  (1)
                                   {U0} ~ normal(0,{U:sigma2})             (1)
                                   {V0} ~ normal(0,{V:sigma2})             (1)
                                   {V1} ~ normal(0,{V:sigma2})             (1)
                                   {V2} ~ normal(0,{V:sigma2})             (1)
                                   {V3} ~ normal(0,{V:sigma2})             (1)
                                   {V4} ~ normal(0,{V:sigma2})             (1)
                                   {V5} ~ normal(0,{V:sigma2})             (1)
                                   {V6} ~ normal(0,{V:sigma2})             (1)
                                   {V7} ~ normal(0,{V:sigma2})             (1)
                                   {V8} ~ normal(0,{V:sigma2})             (1)
                                   {V9} ~ normal(0,{V:sigma2})             (1)
                                  {V10} ~ normal(0,{V:sigma2})             (1)
                                  {V11} ~ normal(0,{V:sigma2})             (1)
                                  {V12} ~ normal(0,{V:sigma2})             (1)
                                  {V13} ~ normal(0,{V:sigma2})             (1)
                                  {V14} ~ normal(0,{V:sigma2})             (1)
                                  {V15} ~ normal(0,{V:sigma2})             (1)
                                  {V16} ~ normal(0,{V:sigma2})             (1)
                                  {V17} ~ normal(0,{V:sigma2})             (1)
                                  {V18} ~ normal(0,{V:sigma2})             (1)
                                  {V19} ~ normal(0,{V19:sigma2})           (1)
                                   {W0} ~ normal(0,{W0:sigma2})            (1)
                                   {W1} ~ normal(0,{W1:sigma2})            (1)
                     {e.outcome:sigma2} ~ igamma(.01,.01)
 
Hyperpriors:
    {U:sigma2} ~ igamma(.01,.01)
    {V:sigma2} ~ igamma(.01,.01)
  {V19:sigma2} ~ igamma(.01,.01)
   {W0:sigma2} ~ igamma(.01,.01)
   {W1:sigma2} ~ igamma(.01,.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_outcome.
 
Bayesian multilevel regression                   MCMC iterations  =     10,001
Metropolis–Hastings and Gibbs sampling           Burn-in          =          1
                                                 MCMC sample size =     10,000
 
-------------------------------------------------------------
                |     No. of       Observations per group
 Group variable |     groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           _all |          1         97       97.0         97
         Group1 |          1         97       97.0         97
-------------------------------------------------------------
 
                                                 Number of obs    =         97
                                                 Acceptance rate  =      .5247
                                                 Efficiency:  min =    .001728
                                                              avg =      .1486
Log marginal-likelihood                                       max =          1
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
outcome      |
  covariate1 |  -.010756   .0057898   .000262  -.0107464   -.022353   .0008853
  covariate2 | -.1865147   .1113828   .005497  -.1863413  -.3999516   .0360584
       _cons |   .589891   .2360603    .05678   .6385582  -.1050936   .9248694
-------------+----------------------------------------------------------------
Group2       |
    U:sigma2 |  .0252251   .0130876   .000589   .0224263   .0083923   .0581508
-------------+----------------------------------------------------------------
_all         |
    V:sigma2 |  .0413304   .0492257    .00763   .0251147   .0051311   .1700203
  V19:sigma2 |  363.2888    14164.7   141.647   .1094709   .0061633    61.7817
-------------+----------------------------------------------------------------
Group1       |
   W0:sigma2 |  .0078951   .0050638   .000273   .0065598   .0022338   .0212309
   W1:sigma2 |  .0275205    .027656   .002792   .0185549   .0036315   .1026532
-------------+----------------------------------------------------------------
e.outcome    |
      sigma2 |  .0192395   .0035545   .000092   .0187985   .0135397   .0274202
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.
Note: Adaptation continues during simulation.
Note: There is a high autocorrelation after 500 lags.
Note: Adaptation tolerance is not met in at least one of the blocks.
 
.     estimates save investigate_bayes_mixed, replace
file investigate_bayes_mixed.ster saved
 
.
. *--------------------------------------------------*
.
. *Copy simulation dataset back to current directory
. copy "temp_sim/investigate_bayes_mixed_simulation.dta" "investigate_bayes_mixed_simulation.dta", replace
 
.
. *Re-run model based on initial run
.
. bayes
 
Multilevel structure
------------------------------------------------------------------------------
Group2
    {U0}: random intercepts
 
_all
    {V0}: random intercepts
    {V1}: random coefficients for interid_Group21
    {V2}: random coefficients for interid_Group22
    {V3}: random coefficients for interid_Group23
    {V4}: random coefficients for interid_Group24
    {V5}: random coefficients for interid_Group25
    {V6}: random coefficients for interid_Group26
    {V7}: random coefficients for interid_Group27
    {V8}: random coefficients for interid_Group28
    {V9}: random coefficients for interid_Group29
    {V10}: random coefficients for interid_Group210
    {V11}: random coefficients for interid_Group211
    {V12}: random coefficients for interid_Group212
    {V13}: random coefficients for interid_Group213
    {V14}: random coefficients for interid_Group214
    {V15}: random coefficients for interid_Group215
    {V16}: random coefficients for interid_Group216
    {V17}: random coefficients for interid_Group217
    {V18}: random coefficients for interid_Group218
    {V19}: random coefficients for interid_Group219
 
Group1
    {W0}: random intercepts
    {W1}: random coefficients for covariate2
------------------------------------------------------------------------------
 
Model summary
------------------------------------------------------------------------------
Likelihood:
  outcome ~ normal(xb_outcome,{e.outcome:sigma2})
 
Priors:
  {outcome:covariate1 covariate2 _cons} ~ normal(0,10000)                  (1)
                                   {U0} ~ normal(0,{U:sigma2})             (1)
                                   {V0} ~ normal(0,{V:sigma2})             (1)
                                   {V1} ~ normal(0,{V:sigma2})             (1)
                                   {V2} ~ normal(0,{V:sigma2})             (1)
                                   {V3} ~ normal(0,{V:sigma2})             (1)
                                   {V4} ~ normal(0,{V:sigma2})             (1)
                                   {V5} ~ normal(0,{V:sigma2})             (1)
                                   {V6} ~ normal(0,{V:sigma2})             (1)
                                   {V7} ~ normal(0,{V:sigma2})             (1)
                                   {V8} ~ normal(0,{V:sigma2})             (1)
                                   {V9} ~ normal(0,{V:sigma2})             (1)
                                  {V10} ~ normal(0,{V:sigma2})             (1)
                                  {V11} ~ normal(0,{V:sigma2})             (1)
                                  {V12} ~ normal(0,{V:sigma2})             (1)
                                  {V13} ~ normal(0,{V:sigma2})             (1)
                                  {V14} ~ normal(0,{V:sigma2})             (1)
                                  {V15} ~ normal(0,{V:sigma2})             (1)
                                  {V16} ~ normal(0,{V:sigma2})             (1)
                                  {V17} ~ normal(0,{V:sigma2})             (1)
                                  {V18} ~ normal(0,{V:sigma2})             (1)
                                  {V19} ~ normal(0,{V19:sigma2})           (1)
                                   {W0} ~ normal(0,{W0:sigma2})            (1)
                                   {W1} ~ normal(0,{W1:sigma2})            (1)
                     {e.outcome:sigma2} ~ igamma(.01,.01)
 
Hyperpriors:
    {U:sigma2} ~ igamma(.01,.01)
    {V:sigma2} ~ igamma(.01,.01)
  {V19:sigma2} ~ igamma(.01,.01)
   {W0:sigma2} ~ igamma(.01,.01)
   {W1:sigma2} ~ igamma(.01,.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_outcome.
 
Bayesian multilevel regression                   MCMC iterations  =     10,001
Metropolis–Hastings and Gibbs sampling           Burn-in          =          1
                                                 MCMC sample size =     10,000
 
-------------------------------------------------------------
                |     No. of       Observations per group
 Group variable |     groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           _all |          1         97       97.0         97
         Group1 |          1         97       97.0         97
-------------------------------------------------------------
 
                                                 Number of obs    =         97
                                                 Acceptance rate  =      .5247
                                                 Efficiency:  min =    .002255
                                                              avg =     .06496
Log marginal-likelihood                                       max =      .2841
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
outcome      |
  covariate1 | -.0110874   .0059492   .000344  -.0109953  -.0228833   .0004529
  covariate2 | -.1876556   .1218701   .006668  -.1864473  -.4312361     .05925
       _cons |  .6759434   .1800871   .037924   .6768317   .3181374   1.030745
-------------+----------------------------------------------------------------
Group2       |
    U:sigma2 |  .0267817   .0142745   .001079   .0236498   .0091008   .0632229
-------------+----------------------------------------------------------------
_all         |
    V:sigma2 |  .0328675   .0291046   .003738   .0227667   .0049685   .1147533
  V19:sigma2 |  648.4501   21147.34   396.721   .0976229   .0059268   45.86536
-------------+----------------------------------------------------------------
Group1       |
   W0:sigma2 |  .0076228    .005156   .000306   .0063366   .0020867   .0204744
   W1:sigma2 |  .0304866   .0299592   .002302    .020928   .0038986   .1133497
-------------+----------------------------------------------------------------
e.outcome    |
      sigma2 |   .019182   .0033781   .000083   .0188363   .0135053   .0268277
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.
Note: There is a high autocorrelation after 500 lags.
 
.
. log close

Best,
Alex

Attached Files

test_data.dta (5.9 KB, 1 view)

Last edited by Alexander Koplenig; 30 May 2023, 01:16.

Tags: None

Alexander Koplenig

Join Date: Jul 2014
Posts: 39

30 May 2023, 02:06

PS: Further tests show that Stata doesn't crash with a burn-in period of 200,000 if any of the random coefficients are removed, e.g. running

Code:

mixed outcome covariate1 covariate2 || _all: R.Group2 || _all: inter*, cov(identity) nocons || Group1:

Code:

mixed outcome covariate1 covariate2 || _all: R.Group2 || Group1: covariate

instead of

Code:

mixed outcome covariate1 covariate2 || _all: R.Group2 || _all: inter*, cov(identity) nocons || Group1: covariate2

doesn't lead to a crash.

Comment

Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#3

30 May 2023, 21:28

Originally posted by Alexander Koplenig View Post

PS: Further tests show that Stata doesn't crash with a burn-in period of 200,000 if any of the random coefficients are removed . . .

I take that you really do intend to have the same covariate as the random slope for both grouping factors.

It's not entirely clear why V19:sigma2 is broken out from all of the other V's variances, which are pooled under V:sigma2. I suppose that it has to do with the linear dependence of the parameters, coupled with the noconstant option, but its value seems anomalous, poorly estimated and grossly inconsistent with the maximum likelihood estimate (which has collapsed to zero). You might want to look into constraining V19's to be equal to that pooled for the others.

Have you considered weakly informative priors, say, N(0, 1), for all parameters (variances included)? That might obviate the need for extraordinary burn-in. Stata's defaults are really just placeholders, I think.
Comment
Alexander Koplenig

Join Date: Jul 2014

Posts: 39
#4

30 May 2023, 23:42

Thanks Joseph for your reply.

I probably should have made it clearer that the test dataset is just a sample of a larger dataset I'm working with and that I wanted to keep the model specification to the bare minimum for illustration purposes, so I left out everything from choosing different priors (thanks for the valuable link!) to simulating more than one chain.

The problem I wanted to point out is that for such models, on the machine I'm working on, everything works fine for shorter burn-in periods (here 150,000 and below), but for longer burn-in periods (here 200,000 and above) Stata crashes. By "crashes" I mean that Stata exits without any error message.
Thanks again,
Alex

Last edited by Alexander Koplenig; 30 May 2023, 23:47.
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#5

31 May 2023, 00:30

Yeah, it's probably a bug, but an extreme edge case: burn-ins of hundreds of thousands shouldn't be necessary in any model, and long before reaching that point I'd have stepped back and reassessed things.

The bigger problem to me is how Stata's handling the hyperparameterization of the Vs. It doesn't make sense to me that Stata is factoring V19:sigma2 out from V:sigma2, and when it does it gives rise to nonsensical results. If I were to report anything to StataCorp, it would be that.
Comment

Alexander Koplenig

Join Date: Jul 2014
Posts: 39

31 May 2023, 01:31

Good point.
The following example shows that the same happens when using bayes: for Isabel’s original example.
Once again, the parameterizations of the Vs seems strange

Code:

.
. *Get data via: net from http://www.stata-press.com/data/mlmus2/
.
. use homework, clear
 
.
. tab region, gen(id_reg)
 
     region |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        100       19.27       19.27
          2 |        254       48.94       68.21
          3 |        165       31.79      100.00
------------+-----------------------------------
      Total |        519      100.00
 
.
. unab idvar: id_reg*
 
. foreach v of local idvar{
  2.     gen inter`v' = meanses*`v'
  3. }
 
.
. *--------------------------------------------*
. *Run frequentist mixed model
. mixed math meanses parented public || _all: R.region ||  _all:inter*, cov(identity) nocons || urban: public
 
Performing EM optimization ...
 
Performing gradient-based optimization:
Iteration 0:  Log likelihood = -1871.5973 
Iteration 1:  Log likelihood = -1871.5973 
 
Computing standard errors ...
 
Mixed-effects ML regression                             Number of obs =    519
 
        Grouping information
        -------------------------------------------------------------
                        |     No. of       Observations per group
         Group variable |     groups    Minimum    Average    Maximum
        ----------------+--------------------------------------------
                   _all |          1        519      519.0        519
                  urban |          3        122      173.0        242
        -------------------------------------------------------------
 
                                                        Wald chi2(3)  =  66.24
Log likelihood = -1871.5973                             Prob > chi2   = 0.0000
 
------------------------------------------------------------------------------
        math | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     meanses |   3.243719   1.749818     1.85   0.064    -.1858614    6.673299
    parented |   2.536147   .3635308     6.98   0.000      1.82364    3.248654
      public |   1.456702   2.323287     0.63   0.531    -3.096857     6.01026
       _cons |   42.12556   2.868475    14.69   0.000     36.50345    47.74766
------------------------------------------------------------------------------
 
------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
_all: Identity               |
               var(R.region) |   5.905095   6.292358      .7314804    47.67066
-----------------------------+------------------------------------------------
_all: Identity               |
  var(interi~1..interi~3)(1) |   3.910939   5.800271      .2137338    71.56305
-----------------------------+------------------------------------------------
urban: Independent           |
                 var(public) |   10.70444   11.65825      1.266243    90.49221
                  var(_cons) |    11.6372   12.13898      1.506416    89.89848
-----------------------------+------------------------------------------------
               var(Residual) |   75.94031   4.780382      67.12587     85.9122
------------------------------------------------------------------------------
LR test vs. linear model: chi2(4) = 21.77                 Prob > chi2 = 0.0002
(1) interid_reg1 interid_reg2 interid_reg3
 
Note: LR test is conservative and provided only for reference.
 
.
. *--------------------------------------------*
. *Run bayesian mixed model
. bayes, burnin(10000) dots(1000) rseed(42) saving(test_simulation, replace): mixed math parented meanses public || _all: R.region ||  _all:inter*, cov(identity) nocons || urban: public
note: Gibbs sampling is used for regression coefficients and some variance components.
 
Burn-in 10000 aaaaaaaaaa done
Simulation 10000 .......... done
 
file test_simulation.dta not found; file saved.
 
Multilevel structure
------------------------------------------------------------------------------
region
    {U0}: random intercepts
 
_all
    {V0}: random intercepts
    {V1}: random coefficients for interid_reg1
    {V2}: random coefficients for interid_reg2
    {V3}: random coefficients for interid_reg3
 
urban
    {W0}: random intercepts
    {W1}: random coefficients for public
------------------------------------------------------------------------------
 
Model summary
------------------------------------------------------------------------------
Likelihood:
  math ~ normal(xb_math,{e.math:sigma2})
 
Priors:
  {math:parented meanses public _cons} ~ normal(0,10000)                   (1)
                                  {U0} ~ normal(0,{U:sigma2})              (1)
                            {V0 V1 V2} ~ normal(0,{V:sigma2})              (1)
                                  {V3} ~ normal(0,{V3:sigma2})             (1)
                                  {W0} ~ normal(0,{W0:sigma2})             (1)
                                  {W1} ~ normal(0,{W1:sigma2})             (1)
                       {e.math:sigma2} ~ igamma(.01,.01)
 
Hyperpriors:
   {U:sigma2} ~ igamma(.01,.01)
   {V:sigma2} ~ igamma(.01,.01)
  {V3:sigma2} ~ igamma(.01,.01)
  {W0:sigma2} ~ igamma(.01,.01)
  {W1:sigma2} ~ igamma(.01,.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math.
 
Bayesian multilevel regression                   MCMC iterations  =     20,000
Metropolis–Hastings and Gibbs sampling           Burn-in          =     10,000
                                                 MCMC sample size =     10,000
 
-------------------------------------------------------------
                |     No. of       Observations per group
 Group variable |     groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           _all |          1        519      519.0        519
          urban |          1        519      519.0        519
-------------------------------------------------------------
 
                                                 Number of obs    =        519
                                                 Acceptance rate  =      .6513
                                                 Efficiency:  min =    .001324
                                                              avg =      .2068
Log marginal-likelihood                                       max =      .8525
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math         |
    parented |  2.541958   .3708861   .004017   2.544392   1.822191   3.270409
     meanses |  1.348668   4.698861   1.00379   1.995458    -11.182   8.596084
      public |  .0095209   12.89427   3.45928   1.249603  -30.65934   25.40547
       _cons |  36.88891   14.84543    4.0792   37.47589   9.191595   62.45532
-------------+----------------------------------------------------------------
region       |
    U:sigma2 |  198.0477   868.0009    55.214    37.0407   1.510005   1280.511
-------------+----------------------------------------------------------------
_all         |
    V:sigma2 |  56.34161   81.79127   20.4278   14.34443   .3528027   282.8328
   V3:sigma2 |    616315   2.44e+07    383397   5.581477   .0124682    8683.06
-------------+----------------------------------------------------------------
urban        |
   W0:sigma2 |   223.604   1060.853   62.7861   48.30993   1.778735   1303.457
   W1:sigma2 |  471.6207   1965.668   162.069   90.02992   1.610921   3102.505
-------------+----------------------------------------------------------------
e.math       |
      sigma2 |  78.48072   6.199874   .072292   77.83189    68.3175   92.34401
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.
Note: There is a high autocorrelation after 500 lags.
 
. estimates save test_bayes, replace
file test_bayes.ster saved
 
. log close

Comment

Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#7

31 May 2023, 06:34

Originally posted by Alexander Koplenig View Post

The following example shows that the same happens when using bayes: for Isabel’s original example.
Once again, the parameterizations of the Vs seems strange

Agreed. It's spewing junk.

I recommend contacting StataCorp's technical support about this.
Comment

Alexander Koplenig

Join Date: Jul 2014
Posts: 39

31 May 2023, 08:32

Will do!
Out of curiosity, I tried out your N(0, 1) suggestion for the priors. Below are the results for a burn-in period of 10,000 and 100,000.
10,000

Code:

. bayes, prior({math:} {V:sigma2} {U:sigma2} {V3:sigma2} {W0:sigma2} {W1:sigma2} , normal(0,1)) burnin(10000) dots(1000) rseed(42) saving(test_simulation, replace): mixed math parented meanses public || _all: R.region ||  _a
> ll:inter*, cov(identity) nocons || urban: public
note: Gibbs sampling is used for some variance components.
 
Burn-in 10000 aaaaaaaaaa done
Simulation 10000 .......... done
 
file test_simulation.dta saved.
 
Multilevel structure
------------------------------------------------------------------------------
region
    {U0}: random intercepts
 
_all
    {V0}: random intercepts
    {V1}: random coefficients for interid_reg1
    {V2}: random coefficients for interid_reg2
    {V3}: random coefficients for interid_reg3
 
urban
    {W0}: random intercepts
    {W1}: random coefficients for public
------------------------------------------------------------------------------
 
Model summary
------------------------------------------------------------------------------
Likelihood:
  math ~ normal(xb_math,{e.math:sigma2})
 
Priors:
  {math:parented meanses public _cons} ~ normal(0,1)                       (1)
                                  {U0} ~ normal(0,{U:sigma2})              (1)
                            {V0 V1 V2} ~ normal(0,{V:sigma2})              (1)
                                  {V3} ~ normal(0,{V3:sigma2})             (1)
                                  {W0} ~ normal(0,{W0:sigma2})             (1)
                                  {W1} ~ normal(0,{W1:sigma2})             (1)
                       {e.math:sigma2} ~ igamma(.01,.01)
 
Hyperpriors:
   {V:sigma2} ~ normal(0,1)
   {U:sigma2} ~ normal(0,1)
  {V3:sigma2} ~ normal(0,1)
  {W0:sigma2} ~ normal(0,1)
  {W1:sigma2} ~ normal(0,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math.
 
Bayesian multilevel regression                   MCMC iterations  =     20,000
Metropolis–Hastings and Gibbs sampling           Burn-in          =     10,000
                                                 MCMC sample size =     10,000
 
-------------------------------------------------------------
                |     No. of       Observations per group
 Group variable |     groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           _all |          1        519      519.0        519
          urban |          1        519      519.0        519
-------------------------------------------------------------
 
                                                 Number of obs    =        519
                                                 Acceptance rate  =      .4091
                                                 Efficiency:  min =    .003519
                                                              avg =     .05733
Log marginal-likelihood                                       max =      .3529
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math         |
    parented |  4.227816   .3049179    .03058   4.215109    3.65802   4.863543
     meanses |  .0491451   .8224896   .059018   .0223693  -1.491569   1.776059
      public |  2.123705   .7945473   .058257   2.116638   .5409869   3.643668
       _cons |  4.746302   .9495195   .069709   4.774811    2.76222   6.617982
-------------+----------------------------------------------------------------
region       |
    U:sigma2 |  3.746065   .8538998   .076514   3.722601   2.143132   5.500148
-------------+----------------------------------------------------------------
_all         |
    V:sigma2 |   5.28936   .5071606   .085495    5.20253   4.428834    6.24269
   V3:sigma2 |  .8836466   .6373397   .022307   .7524564     .04531   2.435369
-------------+----------------------------------------------------------------
urban        |
   W0:sigma2 |  3.765399   .8039043   .065444   3.748756   2.219402   5.369989
   W1:sigma2 |  1.446508   .7196328   .035482   1.369089   .2560512   2.996493
-------------+----------------------------------------------------------------
e.math       |
      sigma2 |  83.14786   6.124751   .103099   82.75787   72.33765   96.36645
------------------------------------------------------------------------------
Note: Default priors are used for some model parameters.
Note: There is a high autocorrelation after 500 lags.
Note: Adaptation tolerance is not met in at least one of the blocks.

100,000

Code:

. bayes, prior({math:} {V:sigma2} {U:sigma2} {V3:sigma2} {W0:sigma2} {W1:sigma2} , normal(0,1)) burnin(100000) dots(1000) rseed(42) saving(test_simulation, replace): mixed math parented meanses public || _all: R.region ||  _
> all:inter*, cov(identity) nocons || urban: public
note: Gibbs sampling is used for some variance components.
 
Burn-in 100000 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa done
Simulation 10000 .......... done
 
file test_simulation.dta not found; file saved.
 
Multilevel structure
------------------------------------------------------------------------------
region
    {U0}: random intercepts
 
_all
    {V0}: random intercepts
    {V1}: random coefficients for interid_reg1
    {V2}: random coefficients for interid_reg2
    {V3}: random coefficients for interid_reg3
 
urban
    {W0}: random intercepts
    {W1}: random coefficients for public
------------------------------------------------------------------------------
 
Model summary
------------------------------------------------------------------------------
Likelihood:
  math ~ normal(xb_math,{e.math:sigma2})
 
Priors:
  {math:parented meanses public _cons} ~ normal(0,1)                       (1)
                                  {U0} ~ normal(0,{U:sigma2})              (1)
                            {V0 V1 V2} ~ normal(0,{V:sigma2})              (1)
                                  {V3} ~ normal(0,{V3:sigma2})             (1)
                                  {W0} ~ normal(0,{W0:sigma2})             (1)
                                  {W1} ~ normal(0,{W1:sigma2})             (1)
                       {e.math:sigma2} ~ igamma(.01,.01)
 
Hyperpriors:
   {V:sigma2} ~ normal(0,1)
   {U:sigma2} ~ normal(0,1)
  {V3:sigma2} ~ normal(0,1)
  {W0:sigma2} ~ normal(0,1)
  {W1:sigma2} ~ normal(0,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math.
 
Bayesian multilevel regression                   MCMC iterations  =    110,000
Metropolis–Hastings and Gibbs sampling           Burn-in          =    100,000
                                                 MCMC sample size =     10,000
 
-------------------------------------------------------------
                |     No. of       Observations per group
 Group variable |     groups    Minimum    Average    Maximum
----------------+--------------------------------------------
           _all |          1        519      519.0        519
          urban |          1        519      519.0        519
-------------------------------------------------------------
 
                                                 Number of obs    =        519
                                                 Acceptance rate  =      .4389
                                                 Efficiency:  min =    .002596
                                                              avg =      .0235
Log marginal-likelihood                                       max =     .08252
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math         |
    parented |  4.111427    .332615   .065279   4.100525    3.52727   4.874083
     meanses |  .0550377   .8415122   .085535   .0217303  -1.568142   1.792977
      public |  2.029672   .8472618   .082827   2.071854   .1924822   3.618146
       _cons |  4.724996   .9894112   .149306   4.772051   2.661189   6.558499
-------------+----------------------------------------------------------------
region       |
    U:sigma2 |  3.922214   .8262815   .072049    3.92789   2.309534   5.510978
-------------+----------------------------------------------------------------
_all         |
    V:sigma2 |  5.178442   .6120028   .093315   5.191006   3.970948   6.365479
   V3:sigma2 |  .8555653   .6095647    .02122   .7479953   .0509334   2.308674
-------------+----------------------------------------------------------------
urban        |
   W0:sigma2 |  3.851325   .8386632   .084985   3.830769   2.264572   5.528421
   W1:sigma2 |  1.481297   .7378747   .039616   1.438336   .2123395   3.057861
-------------+----------------------------------------------------------------
e.math       |
      sigma2 |  82.06462    5.77319   .229178   81.67425   71.84723   94.25149
------------------------------------------------------------------------------
Note: Default priors are used for some model parameters.
Note: There is a high autocorrelation after 500 lags.

So you're right, with weakly informative priors, longer burn-ins don't help that much.
Thanks!
Alex

Comment

Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#9

31 May 2023, 18:41

Originally posted by Alexander Koplenig View Post

Will do!

Thank you. I was regrettably harsh in my choice of terms above, but it will help everyone to bring these findings to the attention of the company so that future users interested in Bayesian analysis won't have to blaze the trail.
Comment
Mario Ferri

Join Date: Jul 2019

Posts: 190
#10

01 Jun 2023, 11:59

II founded that while running a BVAR, the -bayes:var- command, it triggered a -conformability error-. It cannot handle the so-called Large VARs.
Furthermore, on -bayes , prior assumption is a key. Is there a way to have Stata adopt priors for BVARs in line to Domenico Giannone, Michele Lenza, and Giorgio E. Primiceri , REVSTAT 2015?
https://hedibert.org/wp-content/uplo...15-REVSTAT.pdf
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#11

01 Jun 2023, 19:25

Originally posted by Mario Ferri View Post

II founded that while running a BVAR, the -bayes:var- command, it triggered a -conformability error-.

You'll get better odds of a reply if you start a dedicated thread with a descriptive title. It'll help, too, if you follow the other recommendations in the forum's FAQ regarding example dataset, what you typed & got back etc.
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4423
#12

01 Jun 2023, 21:18

Originally posted by Alexander Koplenig View Post

I would like to point out a problem that occurs when running various rather complex Bayesian multilevel models using the bayes prefix. . . .I try to run a crossed effects model with two ('fixed') covariates, two crossed effects and random coefficients ('slopes') in both levels . .

If I’ve read the user’s manual correctly, then it’s easy in bayesmh to specify a cross-classified random effects model with random slopes for each grouping variable—actually easier in bayesmh natively than with bayes: mixed. See below for an example fitting your sought-after model using your attached dataset. (I’ve shortened the variable names for readability.)

That’s the good news. The bad news is that you’ll discover an ugly secret about the corrugated surface of the likelihoods of such models, especially with relatively few members of the grouping factors with which to define their variances. In explorations with various simulated datasets, I’ve seen traces of multiple chains running in parallel across the graph, each chain happily ensconced in its own local maximum, and I’ve seen cases where the second chain flatlines, scarcely budging from initial values.

So, you can try something along the lines below, but unless you have a lot of well-behaved data, expect the traces to jump from one local maximum to another giving bimodal posterior density histograms like those seen when you run the code below. And expect that the model fit to be relatively sensitive to choice of priors and initial values. I know that what you attached to your original post is only a subset of your dataset, but given that three of your four variance components there are not estimable and have collapsed to zero in the maximum likelihood fit with mixed, you might find that a more parsimonious model will be required in order to get a solid fit even with the full dataset.

Code:

version 18.0 clear * use test_data rename (covariate# Group# outcome) (co# gr# out) quietly tabulate gr2, generate(c) local r `r(r)' foreach var of varlist c1-c`r' { generate double g`var' = `var' * co2 } /* mixed outcome covariate1 covariate2 || _all: R.Group2 || _all: inter*, cov(identity) nocons || Group1: covariate2 */ mixed out c.(co?) /// || _all: R.gr2 || _all:gc1-gc`r', covariance(identity) noconstant /// || gr1: co2, nogroup nolrtest nolog * * Begin here * bayesmh out c.co1 c.co2 U0[gr1] U1[gr2] c.co2#U2[gr1] c.co2#U3[gr2], /// likelihood(normal({Var})) /// prior({out:}, normal(1)) /// prior({var_U0 var_U1 var_U2 var_U3 Var}, igamma(0.1, 0.1)) /// rseed(466245242) thinning(5) set more on foreach graph in trace ac histogram { foreach parm in co1 co2 _cons var_U0 var_U1 var_U2 var_U3 Var { bayesgraph `graph' {`parm'} more } } set more off exit
3 likes
Comment

Announcement

bayes, ... mixed: ... crashes for large burn-in periods

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment