Hi Statalist,
I would like to ask whether Stata's Bayesian analysis command bayesmh can perform data augmentation, either to handle missing data or to make estimation easier.
For the sake of demonstration, let's take the probit model as an example. Using Stata's built-in command, we can estimate the probit model by:
Code:
use "http://www.stata-press.com/data/r14/auto", clear
set seed 1234
bayesmh foreign mpg, likelihood(probit)        ///
    prior({foreign:mpg  }, normal(0,10000))    ///
    prior({foreign:_cons}, normal(100,0.5)) dots

Bayesian probit regression                       MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =      .2111
                                                 Efficiency:  min =     .08724
                                                              avg =     .08743
Log marginal likelihood = -7390.0051                          max =     .08762

------------------------------------------------------------------------------
             |                                                Equal-tailed
     foreign |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
         mpg | -1.012607   .0132892    .00045   -1.01248   -1.03923  -.9864686
       _cons |  20.43789   .1596249   .005393   20.43533   20.12213    20.7582
------------------------------------------------------------------------------
However, I would like to learn how to use a bayesmh evaluator to program the data-augmentation algorithm.
If I understand correctly, data augmentation for the probit model proceeds in two steps:
1. Simulate the latent dependent variable y* from truncated normal distributions, given the observed binary-choice dependent variable y, the independent variables X, and the model parameters B.
2. Simulate the model parameters B from the posterior density of a linear regression that uses y*, instead of y, as the dependent variable.
For more details about data augmentation, please refer to Simon Jackman's Bayesian Analysis for the Social Sciences, pp. 381-2 (https://books.google.co.uk/books?id=...page&q&f=false).
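To make the two steps concrete, here is a minimal sketch of the data-augmentation Gibbs sampler for the probit model, written in Python/NumPy rather than Stata purely for illustration. The function name, priors, and defaults are my own choices, not part of bayesmh:

```python
# Sketch of two-step data augmentation for the probit model:
#   step 1: draw y* | y, X, B from truncated normals
#   step 2: draw B | y*, X from the linear-regression posterior
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(y, X, n_iter=2000, b0=None, B0_inv=None, seed=0):
    """Gibbs sampler for probit regression via data augmentation.

    b0, B0_inv are the normal prior mean and prior precision for B;
    the default B0_inv = 0 corresponds to a flat prior.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape
    if b0 is None:
        b0 = np.zeros(k)
    if B0_inv is None:
        B0_inv = np.zeros((k, k))
    beta = np.zeros(k)
    XtX = X.T @ X
    draws = np.empty((n_iter, k))
    for it in range(n_iter):
        # Step 1: y* ~ N(X B, 1) truncated to (0, inf) if y=1, (-inf, 0) if y=0.
        # truncnorm takes bounds in standardized units: (bound - mu) / sigma.
        mu = X @ beta
        lo = np.where(y == 1, -mu, -np.inf)
        hi = np.where(y == 1, np.inf, -mu)
        ystar = mu + truncnorm.rvs(lo, hi, random_state=rng)
        # Step 2: B | y*, X ~ N(b1, B1), the usual conjugate linear-regression
        # posterior with known error variance 1.
        B1 = np.linalg.inv(B0_inv + XtX)
        b1 = B1 @ (B0_inv @ b0 + X.T @ ystar)
        beta = rng.multivariate_normal(b1, B1)
        draws[it] = beta
    return draws
```

Note that the truncation point of the latent draw is 0 (standardized to -mu), not the linear prediction itself; conditioning on the observed y is what forces y* to the correct side of zero.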
Here is my code for doing data augmentation, but it apparently produces the wrong results, and I don't know how to fix it. Could anyone help me sort out this problem? Thank you very much.
Code:
cap prog drop ProbitLnP
prog ProbitLnP
    version 14
    args lnp xb
    /* simulate ystar from the truncated normal(0,+inf) if y=1 */
    tempvar ystar
    qui gen double `ystar' = invnormal(0.5+runiform()*(1-0.5))*1 + `xb' ///
        if ${MH_touse} & ${MH_y}==1
    /* simulate ystar from the truncated normal(-inf,0) if y=0 */
    qui replace `ystar' = invnormal(0 +runiform()*(0.5-0))*1 + `xb' ///
        if ${MH_touse} & ${MH_y}==0
    /* compute log likelihood of the simple linear regression
       using y* as the dep. var. */
    tempvar lnfj
    qui gen double `lnfj' = lnnormalden(`ystar', `xb', 1) if ${MH_touse}
    qui summ `lnfj', meanonly
    if r(N) < $MH_n {
        scalar `lnp' = .
        exit
    }
    tempname lnf
    scalar `lnf' = r(sum)
    /* compute log prior of the parameters */
    tempname lnprior
    scalar `lnprior' = lnnormalden($MH_b[1,1], 0, sqrt(10000)) + ///
        lnnormalden($MH_b[1,2], 100, sqrt(0.5))
    /* compute log posterior */
    scalar `lnp' = `lnf' + `lnprior'
end

set seed 1234
bayesmh foreign mpg, evaluator(ProbitLnP) dots

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =    .007199
Log marginal likelihood =         .                  Efficiency   =          1

------------------------------------------------------------------------------
             |                                                Equal-tailed
     foreign |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
         mpg |  2.234445          0        0   2.234445   2.234445   2.234445
       _cons |  101.6031          0        0   101.6031   101.6031   101.6031
------------------------------------------------------------------------------
---
Cheers,
Kirin