The new fracreg command in Stata 14 lets you estimate logit models for fractional outcomes, i.e. models where the dependent variable is a proportion that varies between 0 and 1. I am trying to simulate data for fracreg, but I seem to be doing something wrong. No matter what seed I use, the fracreg estimates are attenuated, i.e. smaller than the values I set them up to be. I generally don't have such problems when I do simulations, so I don't know whether I am doing something wrong, whether the estimates are inherently attenuated, or what. Here is a simple example:
Here is selected output:
As you can see, regress gives me numbers like I would expect. But fracreg gives estimates about a third smaller than I should have gotten. It doesn't seem to be a sampling fluke, as I get similar results with other seeds.
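A more systematic way to check the "not a sampling fluke" claim would be a small Monte Carlo run over many draws instead of a few hand-picked seeds. The following is only a minimal sketch, assuming the same data-generating process as in my example (b0 = b1 = 1, 1000 observations); the program name simfrac and the choice of 200 replications are arbitrary and purely illustrative.

```stata
* Hypothetical helper: simfrac is an illustrative name, not a built-in command
capture program drop simfrac
program define simfrac, rclass
    * drop the data only (clear all would also drop this program)
    drop _all
    set obs 1000
    gen x1 = rnormal()
    * e1 has a standard logistic distribution
    gen e1 = rlogistic()
    * same latent index as in the example: b0 = 1, b1 = 1
    gen ystar = 1 + 1*x1 + e1
    gen yprob = invlogit(ystar)
    quietly fracreg logit yprob x1, nolog
    * hand the two coefficients back to simulate
    return scalar b1 = _b[x1]
    return scalar b0 = _b[_cons]
end

set rng kiss32
set seed 125
* 200 replications is an arbitrary choice for illustration
simulate b1=r(b1) b0=r(b0), reps(200) nodots: simfrac
summarize b1 b0
```

If the means of b1 and b0 across replications sit well below 1 with a small standard deviation, that would support the view that the attenuation is systematic and not tied to any one seed.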
Is my approach inherently flawed? Am I generating yprob and/or ystar wrong?
Code:
clear all
set obs 1000
set rng kiss32
set seed 125
gen x1 = rnormal()
* e1 has a standard logistic distribution
gen e1 = rlogistic()
local b1 = 1
local b0 = 1
gen ystar = `b0' + `b1'*x1 + e1
reg ystar x1
gen yprob = invlogit(ystar)
fracreg logit yprob x1, nolog
Code:
. gen ystar = `b0' + `b1'*x1 + e1

. reg ystar x1

      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(1, 998)       =    328.64
       Model |    961.8941         1    961.8941   Prob > F        =    0.0000
    Residual |  2921.02103       998  2.92687478   R-squared       =    0.2477
-------------+----------------------------------   Adj R-squared   =    0.2470
       Total |  3882.91513       999  3.88680193   Root MSE        =    1.7108

------------------------------------------------------------------------------
       ystar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    .990443   .0546346    18.13   0.000     .8832311    1.097655
       _cons |   .9569887   .0541051    17.69   0.000     .8508158    1.063162
------------------------------------------------------------------------------

. gen yprob = invlogit(ystar)

. fracreg logit yprob x1, nolog

Fractional logistic regression                  Number of obs     =      1,000
                                                Wald chi2(1)      =     301.46
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -602.71384               Pseudo R2         =     0.0751

------------------------------------------------------------------------------
             |               Robust
       yprob |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .7081039   .0407833    17.36   0.000     .6281701    .7880376
       _cons |   .6438433   .0384121    16.76   0.000      .568557    .7191295
------------------------------------------------------------------------------