Covariate measurement error with binary outcomes

wbuchanan

#1

30 Mar 2019, 17:52

It is possible to get parameter estimates corrected for measurement error using -sem-, like this:

Code:

use http://www.stata-press.com/data/r15/sem_rel.dta, clear
sem (y <- x1)
sem (x1 <- X)(y <- X), reliability(x1 0.5)

However, it isn't possible to use the reliability option with -gsem- for a binary outcome. In addition, the -eivreg- command is known to estimate the standard errors for the parameter(s) of interest incorrectly (see Lockwood, J. R., McCaffrey, D. F., & Savage, C. (2017). Errors-in-variables: Why Stata's -eivreg- is wrong and what to do instead. Presented at the 2017 Society for Research on Educational Effectiveness Conference. Retrieved from: https://www.sree.org/conferences/201...pdf&item=slide):

Code:

eivreg y x1, r(x1 0.5)
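For intuition about what the reliability correction does to the point estimate, here is a minimal Python sketch (stdlib only) of the single-covariate linear case. It assumes classical measurement error, x = T + e with e independent of T, so the naive OLS slope is attenuated by the reliability ρ = var(T)/var(x), and dividing by ρ recovers the slope on the true score. It says nothing about the standard-error problem raised above, and with several covariates the correction involves the full covariance matrix rather than a single division. The sample size, seed, β = 1, and reliability 0.5 are arbitrary choices.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20000, reliability=0.5, beta=1.0, seed=1):
    """Return (naive, corrected) slopes under classical measurement error."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # var(T) = 1
    # Error SD chosen so that var(T) / (var(T) + var(e)) equals `reliability`
    err_sd = ((1.0 - reliability) / reliability) ** 0.5
    x_obs = [t + rng.gauss(0.0, err_sd) for t in true_x]
    y = [beta * t + rng.gauss(0.0, 1.0) for t in true_x]
    naive = ols_slope(x_obs, y)
    # Dividing by the reliability undoes the attenuation of the slope
    return naive, naive / reliability


naive, corrected = simulate_attenuation()
print(f"naive slope: {naive:.3f}, reliability-corrected slope: {corrected:.3f}")
```

With these settings the naive slope lands near 0.5 and the corrected slope near the true value of 1.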

The reliability option does generalize to the multilevel context, as long as the model uses an identity link and Gaussian family:

Code:

g byte school = rbinomial(4, 0.5)
// This model will fail to converge nearly immediately, but it at least illustrates that it is
// possible to deal with covariate measurement error in the context of mixed effects
gsem (x1 <- X)(y <- X RE[school]), reliability(x1 0.65) link(identity) family(gaussian) difficult

So, I am wondering if anyone is aware of any ways to deal with covariate measurement error when the outcome is binary and the model includes random intercepts and a random coefficient? If context helps, we are trying to estimate the probability of students being retained in grade conditional on test scores (which are measured with a non-trivial amount of error), demographics, school-level random intercepts, and prior schooling indicators (random coefficients). I imagine it might be possible to deal with this in a Bayesian framework (as well as being able to specify priors that could keep estimates away from the boundaries), but don't have much experience with Bayesian modeling.
Joseph Coveney

#2

31 Mar 2019, 00:27

Are you saying that you have only a single pupil-level measure—one test score (x1)—and that you've determined that its measurement reliability is 0.65? Perhaps you could flesh out the measurement submodel with at least two other indicators of X from among, say, demographic information, teacher comments in report cards, disciplinary events during the school year. You might then be able to obviate the need for specifying a measurement reliability value, and could proceed with gsem using a binomial distribution family and suitable link function.
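One way to see why additional indicators can stand in for an externally supplied reliability value: under classical test theory, if two indicators measure the same true score with independent errors, x1 = T + e1 and x2 = T + e2, then cov(x1, x2) = var(T), so the reliability of x1 is identified from the data as cov(x1, x2)/var(x1). The Python sketch below assumes only that simplest case (equal loadings, independent normal errors with made-up variances 0.5 and 0.8); heterogeneous indicators such as demographics or teacher comments would instead call for a factor model with estimated loadings, which is what gsem would fit.

```python
import random
import statistics


def sample_cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def simulate_two_indicators(n=20000, err_var1=0.5, err_var2=0.8, seed=2):
    """Two indicators of one true score T ~ N(0, 1) with independent errors."""
    rng = random.Random(seed)
    true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x1 = [t + rng.gauss(0.0, err_var1 ** 0.5) for t in true]
    x2 = [t + rng.gauss(0.0, err_var2 ** 0.5) for t in true]
    return x1, x2


x1, x2 = simulate_two_indicators()
# cov(x1, x2) estimates var(T); dividing by var(x1) estimates x1's reliability
rel_x1 = sample_cov(x1, x2) / statistics.variance(x1)
print(f"estimated reliability of x1: {rel_x1:.3f} (population value {1 / 1.5:.3f})")
```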
wbuchanan

#3

31 Mar 2019, 06:29

Joseph Coveney
In the full use case, there are 3 separate subscales that would be used in one model, and a total score derived from the same measure that would be used in a second model. The reliabilities differ depending on the age of the child at the time they take the test (5 years 0 to 5 months versus 5 years 6 to 11 months).

The goal is partly to illustrate how much measurement error can bias parameter estimates, because it has some fairly substantial policy implications that are not always explicitly discussed in the education policy world. Ideally, we’d like to be able to identify the value of the test score that best predicts whether or not students are retained in grade, in order to compare it with the thresholds defined in state policy regarding school readiness. If we specify a measurement model, it would be much more challenging to get at that type of interpretation; do you have any ideas whether or not it is possible to constrain a latent indicator to basically take on the same scale or easily transformed version of the underlying scale? There are two other related studies, but in both of those cases the outcome can be modeled as a Gaussian family with an identity link function.
Joseph Coveney

#4

31 Mar 2019, 09:30

Originally posted by wbuchanan:

do you have any ideas whether or not it is possible to constrain a latent indicator to basically take on the same scale or easily transformed version of the underlying scale?

Not off the top of my head. I'm not familiar enough with measurement reliability values à la eivreg to be able to translate them into corresponding sets of constraints (e.g., on loadings of indicator variables on a latent factor) in gsem.
Joseph Coveney

#5

01 Apr 2019, 02:49

It looks as if you can fit a generalized linear model that is analogous to a linear errors-in-variables model. The first output below explores an errors-in-variables GLM using gsem. It first sets up a true predictor (w) to generate the binomial response variable (y1). It then derives the corresponding observed predictor (x) with a reliability of 0.65, chosen from what you state. It then plows ahead and fits a logistic errors-in-variables model using gsem and compares it to the logistic regression model fitted to the true predictor. Below that, it uses the same true and observed predictors analogously in linear models (normal response variable y2) for comparison.

The logistic regression coefficient for the measured-with-error variable (1.1 ± 0.2) is similar to that for the true (1.0 ± 0.1), and the linear model gives analogous results for the same two variables (1.0 ± 0.1 and 0.9 ± 0.1, respectively).

.
. version 15.1

.
. clear *

.
. set seed `=strreverse("1490996")'

.
. quietly set obs 500

.
. generate double w = rnormal() // True predictor

.
. quietly summarize w, detail

. tempname sigma2n

. scalar define `sigma2n' = 0.35 / (r(Var) - 0.35)  // Reliability = 0.65

. display in smcl as text "variance of the noise = " %04.2f `sigma2n'
variance of the noise = 0.51

. generate double x = w + rnormal(0, sqrt(`sigma2n')) // Observed, measured-with-error, predictor

.
. quietly summarize x, detail

. display in smcl as text "Reliability = " %04.2f 1 - `sigma2n' / r(Var)
Reliability = 0.64

.
. generate byte y1 = rbinomial(1, invlogit(w))

.
. gsem (x <- X) (y1 <- X, logit), reliability(x 0.65) nocnsreport nodvheader nolog

Generalized structural equation model           Number of obs     =        500
Log likelihood = -1120.2249

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x            |
           X |   1.065426   .1824374     5.84   0.000     .7078555    1.422997
       _cons |   .0217106   .0535163     0.41   0.685    -.0831795    .1266007
-------------+----------------------------------------------------------------
y1           |
           X |          1  (constrained)
       _cons |  -.0232402   .1052725    -0.22   0.825    -.2295706    .1830902
-------------+----------------------------------------------------------------
       var(X)|   .8191065   .2748801                      .4243162    1.581216
-------------+----------------------------------------------------------------
     var(e.x)|   .5022045  (constrained)
------------------------------------------------------------------------------

. logit y1 c.w, nolog

Logistic regression                             Number of obs     =        500
                                                LR chi2(1)        =      89.01
                                                Prob > chi2       =     0.0000
Log likelihood = -302.05198                     Pseudo R2         =     0.1284

------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |   .9585408   .1158036     8.28   0.000     .7315699    1.185512
       _cons |  -.0351204   .0979405    -0.36   0.720    -.2270802    .1568394
------------------------------------------------------------------------------

.
. generate double y2 = w + rnormal(0, sqrt(2))

.
. eivreg y2 c.x, reliab(x 0.65)

                   assumed                      Errors-in-variables regression
    variable     reliability
----------------------------                    Number of obs     =        500
           x       0.6500                       F(  1,       498) =     188.96
           *       1.0000                       Prob > F          =     0.0000
                                                R-squared         =     0.3686
                                                Root MSE          =    1.28376

------------------------------------------------------------------------------
          y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.014609   .0738097    13.75   0.000     .8695927    1.159626
       _cons |    .025738   .0574338     0.45   0.654    -.0871043    .1385804
------------------------------------------------------------------------------

. sem (x <- X) (y2 <- X), reliability(x 0.65) nocnsreport noheader nolog

Endogenous variables

Measurement:  x y2

Exogenous variables

Latent:       X
------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Measurement  |
  x          |
           X |          1  (constrained)
       _cons |   .0217106   .0535164     0.41   0.685    -.0831795    .1266007
  -----------+----------------------------------------------------------------
  y2         |
           X |   1.014609   .0879124    11.54   0.000     .8423042    1.186915
       _cons |   .0477658    .072106     0.66   0.508    -.0935594     .189091
-------------+----------------------------------------------------------------
     var(e.x)|   .5012001  (constrained)
    var(e.y2)|   1.641443   .1364456                      1.394663     1.93189
       var(X)|   .9308002   .0905677                      .7691904    1.126365
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .

. tempname B

. matrix define `B' = e(b)

. gsem (x <- X) (y2 <- X), reliability(x 0.65) from(`B') nocnsreport nodvheader nolog

Generalized structural equation model           Number of obs     =        500
Log likelihood =  -1679.078

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
x            |
           X |          1  (constrained)
       _cons |   .0217106   .0535164     0.41   0.685    -.0831795    .1266007
-------------+----------------------------------------------------------------
y2           |
           X |   1.015705   .0880494    11.54   0.000     .8431311    1.188278
       _cons |   .0477658    .072106     0.66   0.508    -.0935593     .189091
-------------+----------------------------------------------------------------
       var(X)|   .9297954   .0905675                      .7682018    1.125381
-------------+----------------------------------------------------------------
     var(e.x)|   .5022045  (constrained)
    var(e.y2)|   1.640408   .1365162                      1.393523    1.931033
------------------------------------------------------------------------------

. regress y2 c.w

      Source |       SS           df       MS      Number of obs   =       500
-------------+----------------------------------   F(1, 498)       =    265.35
       Model |  451.833844         1  451.833844   Prob > F        =    0.0000
    Residual |   847.98572       498  1.70278257   R-squared       =    0.3476
-------------+----------------------------------   Adj R-squared   =    0.3463
       Total |  1299.81956       499  2.60484883   Root MSE        =    1.3049

------------------------------------------------------------------------------
          y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           w |   .9357658   .0574457    16.29   0.000     .8229001    1.048632
       _cons |   .0369235    .058361     0.63   0.527    -.0777407    .1515876
------------------------------------------------------------------------------

.
. exit

end of do-file

.
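The first simulation's data-generating steps can be mirrored in Python as a rough sketch (stdlib only; Python's random stream differs from Stata's, so the draws and estimates will not match the log). One detail worth making explicit: for a target reliability ρ, the measurement-error variance that gives var(w)/(var(w) + σ²) = ρ is σ² = var(w)(1 − ρ)/ρ, which coincides with the do-file's 0.35/(r(Var) − 0.35) when var(w) = 1. The helper name noise_variance is illustrative, not from the thread.

```python
import math
import random
import statistics


def noise_variance(var_true, reliability):
    """Error variance such that var_true / (var_true + noise) == reliability."""
    return var_true * (1.0 - reliability) / reliability


rng = random.Random(1490996)  # arbitrary seed; not Stata's random-number stream
n = 500
w = [rng.gauss(0.0, 1.0) for _ in range(n)]                 # true predictor
sigma2n = noise_variance(statistics.variance(w), 0.65)
x = [wi + rng.gauss(0.0, math.sqrt(sigma2n)) for wi in w]   # observed, with error
# Binary outcome: Pr(y1 = 1) = invlogit(w)
y1 = [1 if rng.random() < 1.0 / (1.0 + math.exp(-wi)) else 0 for wi in w]
# Normal outcome for the linear comparison: w plus noise with variance 2
y2 = [wi + rng.gauss(0.0, math.sqrt(2.0)) for wi in w]

realized = 1.0 - sigma2n / statistics.variance(x)
print(f"noise variance = {sigma2n:.2f}, realized reliability = {realized:.2f}")
```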

And this seems to be extendable to the case that you're actually interested in: pupils in schools, where a random intercept is included for schools. Here, I create true test scores (scr) to generate the binomial outcome (rtd, retained-in-grade) and corresponding observed test scores (sco) with the 0.65 reliability value that you give, along with a socioeconomic status variable (ses) that is taken as measured without error, and finally a random intercept for school (sid).

Coefficients from the hierarchical logistic regression on the true scores are similar to those from the errors-in-variables model (the constraint for the error variance is computed using the formula in the slideshow that you linked to). For the observed test score, the logistic regression coefficient from the errors-in-variables model (1.03 ± 0.06) is very similar to that for the true score in the mixed model (1.00 ± 0.04). The regression coefficients for the socioeconomic status variable, which is measured without error (0.9 ± 0.1 versus 0.9 ± 0.1), and the random intercept variance (1.2 ± 0.3 versus 1.2 ± 0.3) are essentially the same between the two, as well.
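The error-variance constraint in the do-file, (1 − 0.65) × r(Var) × (N − 1)/N, is the assumed error share (1 − ρ) of the observed score's variance, rescaled from summarize's N − 1 denominator to an N denominator (presumably to match the maximum-likelihood variance that gsem works with). A small Python version, with a hypothetical function name and a toy list of scores:

```python
import statistics


def error_variance_constraint(observed, reliability):
    """(1 - reliability) times the N-denominator variance of the observed score."""
    n = len(observed)
    s2 = statistics.variance(observed)  # n - 1 denominator, like summarize's r(Var)
    return (1.0 - reliability) * s2 * (n - 1) / n


scores = [1.0, 2.0, 3.0, 4.0]
value = error_variance_constraint(scores, 0.65)
# Equivalent to (1 - reliability) * statistics.pvariance(scores)
print(round(value, 4))  # 0.4375
```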

. version 15.1

.
. clear *

.
. set seed `=strreverse("1491017")'

.
. // Schools
. quietly set obs 50

. generate byte sid = _n

. generate double sid_u = rnormal()

.
. // Pupils
. quietly expand 100

.
. generate double scr = rnormal() // True score

. quietly summarize scr, detail

. tempname sigma2n

. scalar define `sigma2n' = 0.35 / (r(Var) - 0.35)  // Reliability = 0.65

. display in smcl as text "variance of the noise = " %04.2f `sigma2n'
variance of the noise = 0.55

. generate double sco = scr + rnormal(0, sqrt(`sigma2n')) // Observed score, measured-with-error

.
. quietly summarize sco, detail

. display in smcl as text "Reliability = " %04.2f 1 - `sigma2n' / r(Var)
Reliability = 0.64

.
. generate double ses = runiform() - 0.5

.
. generate double xb = sid_u + scr + ses

.
. generate byte rtd = rbinomial(1, invlogit(xb)) // Retained in grade

.
. *
. * Begin here
. *
. // Regression on true (unobserved) test scores
. melogit rtd c.(scr ses) || sid: , nolrtest nolog

Mixed-effects logistic regression               Number of obs     =      5,000
Group variable:             sid                 Number of groups  =         50

                                                Obs per group:
                                                              min =        100
                                                              avg =      100.0
                                                              max =        100

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(2)      =     640.00
Log likelihood =  -2715.186                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------
         rtd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         scr |   .9989565   .0403324    24.77   0.000     .9199065    1.078006
         ses |   .8987155   .1174644     7.65   0.000     .6684895    1.128942
       _cons |  -.1140601   .1579919    -0.72   0.470    -.4237185    .1955983
-------------+----------------------------------------------------------------
sid          |
   var(_cons)|   1.188085   .2574207                      .7769961     1.81667
------------------------------------------------------------------------------

.
. // Computing variance constraint for errors-in-variable generalized SEM
. tempname sigma2n_hat

. quietly summarize sco, detail

. scalar define `sigma2n_hat' = (1 - 0.65) * r(Var) * (r(N) - 1) / r(N)

.
. // Setting up constraints
. constraint define 1 _b[/:var(e.sco)] = `sigma2n_hat'

. constraint define 2 _b[sco:F] = 1

.
. // Fitting model with random intercept for school, SES predictor (measured without error) and observed test scores
. gsem (sco <- F) (rtd <- F c.ses RE[sid], logit), constraints(1/2) nocnsreport nodvheader nolog

Generalized structural equation model           Number of obs     =      5,000
Log likelihood = -10978.448

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
sco          |
           F |          1  (constrained)
       _cons |  -.0092571   .0174071    -0.53   0.595    -.0433744    .0248603
-------------+----------------------------------------------------------------
rtd          |
         ses |   .8788477   .1226328     7.17   0.000     .6384918    1.119204
             |
     RE[sid] |          1  (constrained)
             |
           F |   1.033574   .0568857    18.17   0.000     .9220798    1.145068
       _cons |  -.1221873   .1613105    -0.76   0.449      -.43835    .1939754
-------------+----------------------------------------------------------------
 var(RE[sid])|   1.229252   .2681414                      .8016134    1.885023
       var(F)|   .9847755   .0303007                      .9271425    1.045991
-------------+----------------------------------------------------------------
   var(e.sco)|   .5302651  (constrained)
------------------------------------------------------------------------------

.
. exit

end of do-file

.
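For readers who want to reproduce the pupils-in-schools design outside Stata, here is a rough Python sketch of the same data-generating process: 50 schools with a standard-normal random intercept, 100 pupils each, true score scr, observed score sco with reliability about 0.65, ses uniform on (−0.5, 0.5), and rtd drawn from a logistic model. It generates data only; fitting the errors-in-variables model still requires gsem or another SEM/mixed-model tool. The seed, the dict-per-pupil layout, and the use of var(T)(1 − ρ)/ρ for the error variance (rather than the do-file's expression) are my own choices.

```python
import math
import random
import statistics

rng = random.Random(1491017)  # arbitrary seed; draws will not match Stata's
n_schools, pupils_per_school, reliability = 50, 100, 0.65

pupils = []
for sid in range(1, n_schools + 1):
    sid_u = rng.gauss(0.0, 1.0)  # school random intercept
    for _ in range(pupils_per_school):
        pupils.append({
            "sid": sid,
            "sid_u": sid_u,
            "scr": rng.gauss(0.0, 1.0),  # true (unobserved) test score
            "ses": rng.random() - 0.5,   # SES, treated as measured without error
        })

var_scr = statistics.variance([p["scr"] for p in pupils])
# Error variance giving var(scr) / (var(scr) + sigma2n) == reliability
sigma2n = var_scr * (1.0 - reliability) / reliability

for p in pupils:
    p["sco"] = p["scr"] + rng.gauss(0.0, math.sqrt(sigma2n))  # observed score
    xb = p["sid_u"] + p["scr"] + p["ses"]
    p["rtd"] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-xb)) else 0  # retained in grade

print(len(pupils), "pupils in", len({p["sid"] for p in pupils}), "schools")
```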

I don't know why the documentation for gsem restricts the reliability() option to normal linear models without censoring. Maybe it's because of fragility (I had to manually compute the predictor's error variance and feed it to gsem as a constraint; specifying the reliability value directly in the option gave rise to difficulties in convergence). Perhaps it's because it's difficult to get a good handle on the relative magnitude of the proposed reliability value to the total explained variance (R²) except in conventional linear regression models. StataCorp typically disallows things for cause, and it may be worthwhile to ask about their reasons and see whether they would be fatal to the analysis you intend for your project.
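The R² point can be made concrete for the single-covariate linear model with classical error (x = T + e, with e independent of T and of the outcome's error). Because var(T) = ρ·var(x) while cov(x, y) = cov(T, y), the R² implied after correction is the observed R² divided by ρ, so an assumed reliability below the observed R² implies a corrected R² above 1, which is impossible. The sketch below only encodes that arithmetic with made-up R² values; it illustrates one possible difficulty, is not StataCorp's stated reason, and does not carry over directly to logit or other GLM links.

```python
def corrected_r2(observed_r2, reliability):
    """R^2 on the true score implied by a simple regression on an error-prone x."""
    return observed_r2 / reliability


def reliability_is_admissible(observed_r2, reliability):
    """An assumed reliability is only compatible with the data if corrected R^2 <= 1."""
    return 0.0 < reliability <= 1.0 and corrected_r2(observed_r2, reliability) <= 1.0


print(corrected_r2(0.20, 0.65))               # roughly 0.31
print(reliability_is_admissible(0.20, 0.65))  # True
print(reliability_is_admissible(0.70, 0.65))  # False: would need R^2 > 1
```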

Good luck, and I hope that the state does well by the kids.
wbuchanan

#6

01 Apr 2019, 06:47

Joseph Coveney
Thanks for your thoughts and help with this. I came across a different paper that used the same CTT-grounded approach in a Bayesian context (https://journals.sagepub.com/doi/ful...62280216667764) and was thinking that the Bayesian approach might be useful to prevent the parameter estimates from lying too close to the boundary space (I’m assuming that will be an issue already). I also started looking into the internals of the eivtools/eivreg function that Lockwood developed. It seems their approach involves a modification to the VCE based on the full model instead of just adding constraints on a specific variable. Unfortunately, the reliability issue is only one of several issues with the measure being used; the model fit statistics that they report for their CFA include contradictory information and, in a few cases, consistently indicate that their model does not fit the data.

That all said, I’m actually pretty surprised that the difference in the coefficients wasn’t larger. There’s a David Card study cited in the Cameron and Trivedi book on microeconometrics that suggested bias of 20-30%. If nothing else, maybe this could provide decent grounds for a simulation study paper.
Joseph Coveney

#7

01 Apr 2019, 17:05

Originally posted by wbuchanan:

the Bayesian approach might be useful to prevent the parameter estimates from lying too close to the boundary space (I’m assuming that will be an issue already).

The primary parameter in these models that has a boundary is the variance of the captive latent factor, and I never encountered any problem with its collapsing. So, I think that a regularizing prior on it might be unnecessary.

I’m actually pretty surprised that the difference in the coefficients wasn’t larger. There’s a David Card study cited in the Cameron and Trivedi book on microeconometrics that suggested bias of 20-30%.

I'm not familiar with the David Card precedent, but it didn't strike me as surprising that, once the measurement error is properly taken into account, the corrected regression coefficient is very close to (essentially identical with) the true. I think that the major issue—and maybe that's what David Card was exploring—is getting an accurate estimate of the measurement reliability of the covariate. If that's under- or overestimated, then I can see where there would be under- or overcorrection of the coefficient by 20 to 30%.
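A quick arithmetic sketch of that last point, for the single-covariate linear case only: if the naive slope is attenuated by the true reliability (here 0.65) and the correction divides by an assumed reliability, the corrected slope is off by the factor true/assumed. Assuming 0.80 when the truth is 0.65 leaves the estimate about 19% too small, and assuming 0.50 makes it 30% too large. The values are illustrative, and the sketch ignores sampling error.

```python
def corrected_slope(naive_slope, assumed_reliability):
    """Reliability correction for a single error-prone covariate."""
    return naive_slope / assumed_reliability


true_beta, true_reliability = 1.0, 0.65
naive = true_beta * true_reliability  # large-sample attenuated slope

# Percent bias of the corrected slope for several assumed reliabilities
bias = {
    assumed: 100.0 * (corrected_slope(naive, assumed) - true_beta) / true_beta
    for assumed in (0.50, 0.65, 0.80)
}
for assumed, pct in bias.items():
    print(f"assumed reliability {assumed:.2f}: bias {pct:+.1f}%")
```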