The new fracreg command in Stata 14 lets you estimate logit models for fractional outcomes, i.e. models where the dependent variable is a proportion that varies between 0 and 1. I am trying to simulate data for fracreg, but I seem to be doing something wrong. No matter what seed I use, the fracreg estimates are attenuated, i.e. smaller than the values I set them up to be. I generally don't have such problems when I do simulations, so I don't know whether I am doing something wrong, whether the estimates are inherently attenuated, or something else. Here is a simple example:
Here is selected output:
As you can see, regress gives the estimates I would expect: both coefficients are close to the true value of 1. But fracreg gives estimates about a third smaller than the values I set. It doesn't seem to be a sampling fluke, as I get similar results with other seeds.
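To check that this is not a fluke, a loop like the following reruns the same setup under several seeds. This is only a sketch: the seed values are arbitrary choices, and the coefficients are hard-coded to the b0 = 1 and b1 = 1 used in my example.

Code:
clear all
set rng kiss32
* Seeds are arbitrary; any list of integers would do
foreach s of numlist 125 126 127 128 129 {
    quietly {
        clear
        set obs 1000
        set seed `s'
        gen x1 = rnormal()
        * e1 has a standard logistic distribution
        gen e1 = rlogistic()
        * True values: b0 = 1, b1 = 1
        gen ystar = 1 + 1*x1 + e1
        gen yprob = invlogit(ystar)
        fracreg logit yprob x1, nolog
    }
    * Compare each estimate with the true value of 1
    display "seed `s': b1 = " %6.4f _b[x1] "  b0 = " %6.4f _b[_cons]
}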
Is my approach inherently flawed? Am I generating yprob and/or ystar wrong?
Code:
clear all
set obs 1000
set rng kiss32
set seed 125
gen x1 = rnormal()
* e1 has a standard logistic distribution
gen e1 = rlogistic()
local b1 = 1
local b0 = 1
gen ystar = `b0' + `b1'*x1 + e1
reg ystar x1
gen yprob = invlogit(ystar)
fracreg logit yprob x1, nolog
Code:
. gen ystar = `b0' + `b1'*x1 + e1
. reg ystar x1
      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(1, 998)       =    328.64
       Model |    961.8941         1    961.8941   Prob > F        =    0.0000
    Residual |  2921.02103       998  2.92687478   R-squared       =    0.2477
-------------+----------------------------------   Adj R-squared   =    0.2470
       Total |  3882.91513       999  3.88680193   Root MSE        =    1.7108

------------------------------------------------------------------------------
       ystar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    .990443   .0546346    18.13   0.000     .8832311    1.097655
       _cons |   .9569887   .0541051    17.69   0.000     .8508158    1.063162
------------------------------------------------------------------------------
. gen yprob = invlogit(ystar)
. fracreg logit yprob x1, nolog
Fractional logistic regression                  Number of obs     =      1,000
                                                Wald chi2(1)      =     301.46
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -602.71384               Pseudo R2         =     0.0751

------------------------------------------------------------------------------
             |               Robust
       yprob |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .7081039   .0407833    17.36   0.000     .6281701    .7880376
       _cons |   .6438433   .0384121    16.76   0.000      .568557    .7191295
------------------------------------------------------------------------------