So I am working my way through the [BAYES] bayesmh examples, and here we go with the mixed-effects (random-intercept) logistic regression:
Code:
webuse bangladesh, clear
set seed 14
bayesmh c_use urban age child*, likelihood(logit) reffects(district)  ///
    prior({c_use:i.district}, normal(0,{district:var}))               ///
    prior({c_use:urban age child* _cons}, normal(0, 100))             ///
    prior({district:var}, igamma(0.01,0.01))                          ///
    block({district:var}, gibbs) dots mcmcsize(2000)
Well, that blew up spectacularly, with a mean efficiency of 0.017. It turns out that the version of the data set currently available on the Stata website has an additional variable, children, that is a linear combination of child1, child2, and child3:
Code:
assert children == child1 + 2*child2 + 3*child3
When you run
Code:
meglm
on these data, it of course drops that variable. So the combination of the code in the manual and the data actually available does not deliver the expected result.
But that's OK, we can use a different wildcard, and let us also try several chains so as to get the Gelman-Rubin Rhat statistic (which Stata reports as Rc):
Code:
set seed 255
bayesmh c_use urban age child?, likelihood(logit) reffects(district)  ///
    prior({c_use:i.district}, normal(0,{district:var}))               ///
    prior({c_use:urban age child? _cons}, normal(0, 100))             ///
    prior({district:var}, igamma(0.01,0.01))                          ///
    block({district:var}, gibbs) dots mcmcsize(2000)                  ///
    nchains(5) burnin(5000)

(output omitted)
Avg efficiency:  min =   .008525
                 avg =    .04285
                 max =     .2125
Avg log marginal-likelihood = -1351.5279
Max Gelman-Rubin Rc = 48.4
(output omitted)
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
district     |
         var |  144.8097   303.6842    6.58754   6.613569   .1411951   830.7021
Wait what? Rc of 48??? How is that even technically possible?
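For context, nothing in the definition of the statistic caps it near 1: it compares a pooled variance estimate with the average within-chain variance, so chains stuck at wildly different levels can push it arbitrarily high. A sketch of the usual definition (my notation, not taken from the Stata documentation):

$$ \hat{R}_c \;=\; \sqrt{\frac{\hat{V}}{W}}, \qquad \hat{V} \;=\; \frac{n-1}{n}\,W \;+\; \frac{1}{n}\,B, $$

where $W$ is the average within-chain variance, $B$ is $n$ times the variance of the chain means, and $n$ is the number of draws per chain. When the five chains settle at completely different levels, $B$ dwarfs $W$, and an Rc of 48 follows.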
Well, let's look at bayesgraph diagnostics {district:var}, sepchains for that parameter -- chain 3 did not converge, producing values between 5 and 10; chain 4 is even worse, with apparently stable values between 20 and 80; and chain 5 hit it way out of the field, with values between 400 and 1000. If anything, chains 4 and 5 may look to have the best convergence, with autocorrelations below 0.2 in absolute value. Except that they produce nonsensical values: this is logistic regression, so a coefficient or a random effect of 10 pushes the linear predictor to around ±10, which translates to probabilities of invlogit(-10) ≈ exp(-10) = .0000454 or invlogit(10) ≈ 1 - exp(-10) = .9999546. The likelihood is essentially flat in that region (because all predicted probabilities are either 0 or 1), so the chain learns nothing, and the posterior is probably the same as the prior.
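(You can check that arithmetic directly in Stata; invlogit() is the built-in inverse-logit function, and at ±10 it is numerically indistinguishable from the exp(-10) shortcut above.)

Code:
* probabilities implied by a linear predictor of -10 and +10
display invlogit(-10)    // approx .0000454
display invlogit(10)     // approx .9999546
display exp(-10)         // approx .0000454, the shortcut used above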
A way to knock some sense into this model is to specify moderately informative priors. We should not expect the regression coefficients to be much greater than 1 in absolute value, nor the random effects to have a variance much greater than 1, so this should work:
Code:
set seed 255
bayesmh c_use urban age child?, likelihood(logit) reffects(district)  ///
    prior({c_use:i.district}, normal(0,{district:var}))               ///
    prior({c_use:urban age child? _cons}, normal(0, 5))               ///
    prior({district:var}, igamma(0.25,0.125))                         ///
    block({district:var}, gibbs) dots mcmcsize(2000)                  ///
    nchains(5) burnin(5000)
And it does.
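By the way, if you want to see what that igamma(0.25, 0.125) prior actually implies for the random-effect variance (its mode is β/(α+1) = 0.1), a quick prior simulation helps. This is just an illustrative sketch, not part of the manual example; it uses the fact that if G ~ gamma(a, scale 1), then b/G ~ igamma(a, b):

Code:
* simulate 10,000 draws from the igamma(0.25, 0.125) prior on {district:var}
clear
set obs 10000
set seed 1
generate double v = 0.125 / rgamma(0.25, 1)   // inverse-gamma(0.25, 0.125) draw
summarize v, detail                           // inspect the implied prior on the variance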
HTH the Bayesian modelers out there.