I encountered a surprising phenomenon while trying out the `oprobit` and `ologit` commands. Many texts (including the Stata documentation) relate these to a data-generating process in which a latent continuous variable y* is binned into observed categories y. My understanding was that these models are intended to estimate:
1) The parameters of the latent continuous variable y*
2) The cutpoints separating the bins of y* into y
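For concreteness, the latent-variable formulation I have in mind can be written roughly as follows. The symbols \beta, \varepsilon, and \kappa_j are my own notation, following the usual textbook presentation, and the outcome is assumed to take J ordered values:

```latex
\begin{aligned}
y^* &= x\beta + \varepsilon,
  \qquad \varepsilon \sim N(0,1) \ \text{for oprobit (standard logistic for ologit)},\\
y &= j \quad \text{if } \kappa_{j-1} < y^* \le \kappa_j,
  \qquad j = 1,\dots,J, \quad \kappa_0 = -\infty,\ \kappa_J = +\infty .
\end{aligned}
```

In this notation, item 1 corresponds to \beta and item 2 to the cutpoints \kappa_1, \dots, \kappa_{J-1}.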
The following Stata code produced unusual results:
The surprising thing was that the cutpoints were not -0.5 and +0.5, as I would have expected. Furthermore, the estimated cutpoints are systematically biased away from, rather than towards, the unconditional mean. To check that this is not just sampling noise, I ran the same code again with more observations and got the following:
I verified that similar results obtain with `rlogistic` and `ologit`. As a final check, I verified that the estimated cutpoints are still not at -0.5 and +0.5 even when using the `offset` option:
This seems like a straightforward example, so I'm a bit stuck as to what is causing this.
Code:
clear *
local N=999
set obs `=`N'*3'
gen grp = mod(_n, 3)
gen grp_mean = .
replace grp_mean = -1 if grp == 0
replace grp_mean = +0 if grp == 1
replace grp_mean = +1 if grp == 2
gen x = grp_mean + rnormal()
oprobit grp x
Code:
Iteration 0:  Log likelihood =  -3292.541
Iteration 1:  Log likelihood = -2566.0424
Iteration 2:  Log likelihood = -2562.1137
Iteration 3:  Log likelihood = -2562.1062
Iteration 4:  Log likelihood = -2562.1062

Ordered probit regression                              Number of obs =   2,997
                                                       LR chi2(1)    = 1460.87
                                                       Prob > chi2   =  0.0000
Log likelihood = -2562.1062                            Pseudo R2     =  0.2218

------------------------------------------------------------------------------
         grp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .7311482   .0209857    34.84   0.000     .6900169    .7722795
-------------+----------------------------------------------------------------
       /cut1 |  -.5895484   .0275614                     -.6435677   -.5355291
       /cut2 |   .6169286   .0277134                      .5626113    .6712458
------------------------------------------------------------------------------
Code:
Ordered probit regression                            Number of obs =   299,997
                                                     LR chi2(1)    = 151649.88
                                                     Prob > chi2   =    0.0000
Log likelihood = -253755.45                          Pseudo R2     =    0.2301

------------------------------------------------------------------------------
         grp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   .7549289   .0021425   352.35   0.000     .7507297    .7591282
-------------+----------------------------------------------------------------
       /cut1 |  -.6116182   .0027802                     -.6170672   -.6061691
       /cut2 |   .6096994   .0027812                      .6042484    .6151504
------------------------------------------------------------------------------
Code:
. oprobit grp, offset(x)

Iteration 0:  Log likelihood = -272298.18
Iteration 1:  Log likelihood = -259882.53
Iteration 2:  Log likelihood = -259871.17
Iteration 3:  Log likelihood = -259871.17

Ordered probit regression                              Number of obs = 299,997
Log likelihood = -259871.17

------------------------------------------------------------------------------
         grp | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |          1  (offset)
-------------+----------------------------------------------------------------
       /cut1 |  -.6813364    .002864                     -.6869497   -.6757231
       /cut2 |   .6785018   .0028682                      .6728803    .6841234
------------------------------------------------------------------------------
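For comparison, here is a minimal sketch of a simulation that follows the latent-variable description literally: a covariate is drawn first, a latent `ystar` is built from it, and the observed outcome is obtained by binning `ystar` at -0.5 and +0.5. The variable names `ystar` and `y`, the seed, and the sample size are illustrative choices and are not part of the code above.

```stata
clear *
* arbitrary seed, used only so the sketch is reproducible
set seed 12345
set obs 300000
* observed covariate
gen x = rnormal()
* latent variable: coefficient 1 on x plus a standard normal error
gen ystar = x + rnormal()
* bin the latent variable at -0.5 and +0.5, giving y in {0, 1, 2}
gen y = (ystar > -0.5) + (ystar > 0.5)
oprobit y x
```

Since the outcome here really is a binned latent variable with a unit-variance normal error, I would expect the coefficient on `x` to be close to 1 and the cutpoints to be close to -0.5 and +0.5. One difference worth noting is that in my original code `grp` is assigned first and `x` is then drawn conditional on it, whereas in this sketch `y` is derived from `x` through `ystar`.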