  • Covariate measurement error with binary outcomes

    It is possible to get parameter estimates corrected for measurement error using -sem-, for example:

    Code:
    use http://www.stata-press.com/data/r15/sem_rel.dta, clear
    sem (y <- x1)
    sem (x1 <- X)(y <- X), reliability(x1 0.5)
    However, the reliability() option of -gsem- is restricted to models with a Gaussian family and identity link, so it cannot be used directly with a binary outcome. In addition, the command -eivreg- is known to estimate the standard errors for the parameter(s) of interest incorrectly (see Lockwood, J. R., McCaffrey, D. F., & Savage, C. (2017). Errors-in-variables: Why Stata's -eivreg- is wrong and what to do instead. Presented at the 2017 Society for Research on Educational Effectiveness Conference. Retrieved from: https://www.sree.org/conferences/201...pdf&item=slide):

    Code:
    eivreg y x1, r(x1 0.5)
    Additionally, the reliability option generalizes to the multilevel context as long as the model specified uses an identity link and gaussian family:

    Code:
    g byte school = rbinomial(4, 0.5)
    // This model will fail to converge nearly immediately, but at least illustrates that it is possible to deal with covariate measurement error in the context of mixed effects
    gsem (x1 <- X)(y <- X RE[school]), reliability(x1 0.65) link(identity) family(gaussian) difficult
    So, I am wondering if anyone is aware of any ways to deal with covariate measurement error when the outcome is binary and the model includes random intercepts and a random coefficient? If context helps, we are trying to estimate the probability of students being retained in grade conditional on test scores (which are measured with a non-trivial amount of error), demographics, school-level random intercepts, and prior schooling indicators (random coefficients). I imagine it might be possible to deal with this in a Bayesian framework (as well as being able to specify priors that could keep estimates away from the boundaries), but don't have much experience with Bayesian modeling.

  • #2
    Are you saying that you have only a single pupil-level measure—one test score (x1)—and that you've determined that its measurement reliability is 0.65? Perhaps you could flesh out the measurement submodel with at least two other indicators of X from among, say, demographic information, teacher comments in report cards, disciplinary events during the school year. You might then be able to obviate the need for specifying a measurement reliability value, and could proceed with gsem using a binomial distribution family and suitable link function.
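
    The multiple-indicator idea above could be sketched as follows. This is only an illustration on simulated data, not code from the thread: the variable names (truex, x1-x3, y), the seed, the sample size, and the noise standard deviation of 0.7 are all assumptions, and the loading of x1 is fixed at 1 explicitly so that the latent variable X takes the scale of that indicator.

```stata
version 15.1
clear
set seed 13579                          // arbitrary seed (assumption)
set obs 1000
generate double truex = rnormal()       // latent trait; unobserved in real data

// Three hypothetical indicators: the trait plus independent noise
forvalues j = 1/3 {
    generate double x`j' = truex + rnormal(0, 0.7)
}
generate byte y = rbinomial(1, invlogit(truex))   // binary outcome

// Measurement submodel for X plus a logistic equation for y.
// The loading of x1 is fixed at 1 so X is on the scale of x1;
// no reliability value needs to be supplied.
gsem (x1 <- X@1) (x2 x3 <- X) (y <- X, logit), nolog
```

    With three indicators the error variances are estimated from the data, so the result depends on how well the indicators actually measure a common trait.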



    • #3
      Joseph Coveney
      In the full use case there are 3 separate sub-scales that would be used in one model and a total score derived from the same measure that would be used in a second model. There are different reliabilities depending on the age of the child at the time they take the test (5yrs 0-5 months and 5yrs 6-11 months).

      The goal is partly to illustrate how much measurement error can bias parameter estimates, because it has some fairly substantial policy implications that are not always explicitly discussed in the Ed policy world. Ideally, we’d like to be able to identify the value of the test score that best predicts whether or not students are retained in grade in order to compare it with thresholds defined in state policy regarding school readiness. If we specify a measurement model, it would be much more challenging to get at that type of interpretation; do you have any ideas whether or not it is possible to constrain a latent indicator to basically take on the same scale or easily transformed version of the underlying scale? There are two other related studies, but in both of those cases the outcome can be modeled with a Gaussian family and an identity link function.



      • #4
        Originally posted by wbuchanan
        do you have any ideas whether or not it is possible to constrain a latent indicator to basically take on the same scale or easily transformed version of the underlying scale?
        Not off the top of my head. I'm not familiar enough with measurement reliability values à la eivreg to be able to translate them into corresponding sets of constraints (e.g., on loadings of indicator variables on a latent factor) in gsem.



        • #5
          It looks as if you can fit a generalized linear model that is analogous to a linear errors-in-variables model. The first output below explores an errors-in-variables GLM using gsem. It first sets up a true predictor (w) to generate the binomial response variable (y1). It then derives the corresponding observed predictor (x) with a reliability of 0.65, the value you state. It then plows ahead and fits a logistic errors-in-variables model using gsem and compares it to the logistic regression model fitted to the true predictor. Below that, it uses the same true and observed predictors analogously in linear models (normal response variable y2) for comparison.

          The logistic regression coefficient for the measured-with-error variable (1.1 ± 0.2) is similar to that for the true (1.0 ± 0.1), and the linear model gives analogous results for the same two variables (1.0 ± 0.1 and 0.9 ± 0.1, respectively).

          .
          . version 15.1

          .
          . clear *

          .
          . set seed `=strreverse("1490996")'

          .
          . quietly set obs 500

          .
          . generate double w = rnormal() // True predictor

          .
          . quietly summarize w, detail

          . tempname sigma2n

          . scalar define `sigma2n' = 0.35 / (r(Var) - 0.35)  // Reliability = 0.65

          . display in smcl as text "variance of the noise = " %04.2f `sigma2n'
          variance of the noise = 0.51

          . generate double x = w + rnormal(0, sqrt(`sigma2n')) // Observed, measured-with-error, predictor

          .
          . quietly summarize x, detail

          . display in smcl as text "Reliability = " %04.2f 1 - `sigma2n' / r(Var)
          Reliability = 0.64

          .
          . generate byte y1 = rbinomial(1, invlogit(w))

          .
          . gsem (x <- X) (y1 <- X, logit), reliability(x 0.65) nocnsreport nodvheader nolog

          Generalized structural equation model           Number of obs     =        500
          Log likelihood = -1120.2249

          ------------------------------------------------------------------------------
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          x            |
                     X |   1.065426   .1824374     5.84   0.000     .7078555    1.422997
                 _cons |   .0217106   .0535163     0.41   0.685    -.0831795    .1266007
          -------------+----------------------------------------------------------------
          y1           |
                     X |          1  (constrained)
                 _cons |  -.0232402   .1052725    -0.22   0.825    -.2295706    .1830902
          -------------+----------------------------------------------------------------
                 var(X)|   .8191065   .2748801                      .4243162    1.581216
          -------------+----------------------------------------------------------------
               var(e.x)|   .5022045  (constrained)
          ------------------------------------------------------------------------------

          . logit y1 c.w, nolog

          Logistic regression                             Number of obs     =        500
                                                          LR chi2(1)        =      89.01
                                                          Prob > chi2       =     0.0000
          Log likelihood = -302.05198                     Pseudo R2         =     0.1284

          ------------------------------------------------------------------------------
                    y1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                     w |   .9585408   .1158036     8.28   0.000     .7315699    1.185512
                 _cons |  -.0351204   .0979405    -0.36   0.720    -.2270802    .1568394
          ------------------------------------------------------------------------------

          .
          . generate double y2 = w + rnormal(0, sqrt(2))

          .
          . eivreg y2 c.x, reliab(x 0.65)

                             assumed                      Errors-in-variables regression
              variable     reliability
          ----------------------------                    Number of obs     =        500
                     x       0.6500                       F(  1,       498) =     188.96
                     *       1.0000                       Prob > F          =     0.0000
                                                          R-squared         =     0.3686
                                                          Root MSE          =    1.28376

          ------------------------------------------------------------------------------
                    y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                     x |   1.014609   .0738097    13.75   0.000     .8695927    1.159626
                 _cons |    .025738   .0574338     0.45   0.654    -.0871043    .1385804
          ------------------------------------------------------------------------------

          . sem (x <- X) (y2 <- X), reliability(x 0.65) nocnsreport noheader nolog

          Endogenous variables

          Measurement:  x y2

          Exogenous variables

          Latent:       X
          ------------------------------------------------------------------------------
                       |                 OIM
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          Measurement  |
            x          |
                     X |          1  (constrained)
                 _cons |   .0217106   .0535164     0.41   0.685    -.0831795    .1266007
            -----------+----------------------------------------------------------------
            y2         |
                     X |   1.014609   .0879124    11.54   0.000     .8423042    1.186915
                 _cons |   .0477658    .072106     0.66   0.508    -.0935594     .189091
          -------------+----------------------------------------------------------------
               var(e.x)|   .5012001  (constrained)
              var(e.y2)|   1.641443   .1364456                      1.394663     1.93189
                 var(X)|   .9308002   .0905677                      .7691904    1.126365
          ------------------------------------------------------------------------------
          LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .

          . tempname B

          . matrix define `B' = e(b)

          . gsem (x <- X) (y2 <- X), reliability(x 0.65) from(`B') nocnsreport nodvheader nolog

          Generalized structural equation model           Number of obs     =        500
          Log likelihood =  -1679.078

          ------------------------------------------------------------------------------
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          x            |
                     X |          1  (constrained)
                 _cons |   .0217106   .0535164     0.41   0.685    -.0831795    .1266007
          -------------+----------------------------------------------------------------
          y2           |
                     X |   1.015705   .0880494    11.54   0.000     .8431311    1.188278
                 _cons |   .0477658    .072106     0.66   0.508    -.0935593     .189091
          -------------+----------------------------------------------------------------
                 var(X)|   .9297954   .0905675                      .7682018    1.125381
          -------------+----------------------------------------------------------------
               var(e.x)|   .5022045  (constrained)
              var(e.y2)|   1.640408   .1365162                      1.393523    1.931033
          ------------------------------------------------------------------------------

          . regress y2 c.w

                Source |       SS           df       MS      Number of obs   =       500
          -------------+----------------------------------   F(1, 498)       =    265.35
                 Model |  451.833844         1  451.833844   Prob > F        =    0.0000
              Residual |   847.98572       498  1.70278257   R-squared       =    0.3476
          -------------+----------------------------------   Adj R-squared   =    0.3463
                 Total |  1299.81956       499  2.60484883   Root MSE        =    1.3049

          ------------------------------------------------------------------------------
                    y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                     w |   .9357658   .0574457    16.29   0.000     .8229001    1.048632
                 _cons |   .0369235    .058361     0.63   0.527    -.0777407    .1515876
          ------------------------------------------------------------------------------

          .
          . exit

          end of do-file


          .
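
          The log above compares the corrected fit with the fit on the true predictor, but it does not show the uncorrected (naive) fit on the observed predictor. A minimal sketch of that extra comparison follows. It is not part of the original do-file: the seed is arbitrary, and the noise variance 0.35/0.65 assumes Var(w) = 1 in the population, so the estimates will differ from those in the log.

```stata
version 15.1
clear
set seed 12345                      // arbitrary seed (assumption), not the thread's
set obs 500
generate double w = rnormal()       // true predictor
// Noise variance 0.35/0.65 gives reliability 0.65 when Var(w) = 1
generate double x = w + rnormal(0, sqrt(0.35/0.65))   // observed predictor
generate byte y1 = rbinomial(1, invlogit(w))

logit y1 c.w, nolog                 // benchmark: true predictor
logit y1 c.x, nolog                 // naive: measurement error ignored
```

          Setting the coefficient on x from the second command beside the corrected estimate from gsem shows how much of the attenuation the correction recovers in a given sample.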


          And this seems to be extendable to the case that you're actually interested in: pupils in schools, where a random intercept is included for schools. Here, I create true test scores (scr) to generate the binomial outcome (rtd, retained-in-grade) and corresponding observed test scores (sco) with the 0.65 reliability value that you give, along with a socioeconomic status variable (ses) that is taken as measured without error, and finally a random intercept for school (sid).

          Coefficients from hierarchical logistic regression on the true scores are similar to those from the errors-in-variables model (the constraint for the error variance is computed using the formula in the slideshow that you linked to). For the observed test score, the logistic regression coefficient from the errors-in-variables model (1.0 ± 0.1) is very similar to that for the true score in the mixed model (1.0 ± 0.0). Regression coefficients for the socioeconomic status variable, which is measured without error (0.9 ± 0.1 versus 0.9 ± 0.1), and the random intercept variance (1.2 ± 0.3 versus 1.2 ± 0.3) are essentially the same between the two, as well.

          . version 15.1

          .
          . clear *

          .
          . set seed `=strreverse("1491017")'

          .
          . // Schools
          . quietly set obs 50

          . generate byte sid = _n

          . generate double sid_u = rnormal()

          .
          . // Pupils
          . quietly expand 100

          .
          . generate double scr = rnormal() // True score

          . quietly summarize scr, detail

          . tempname sigma2n

          . scalar define `sigma2n' = 0.35 / (r(Var) - 0.35)  // Reliability = 0.65

          . display in smcl as text "variance of the noise = " %04.2f `sigma2n'
          variance of the noise = 0.55

          . generate double sco = scr + rnormal(0, sqrt(`sigma2n')) // Observed score, measured-with-error

          .
          . quietly summarize sco, detail

          . display in smcl as text "Reliability = " %04.2f 1 - `sigma2n' / r(Var)
          Reliability = 0.64

          .
          . generate double ses = runiform() - 0.5

          .
          . generate double xb = sid_u + scr + ses

          .
          . generate byte rtd = rbinomial(1, invlogit(xb)) // Retained in grade

          .
          . *
          . * Begin here
          . *
          . // Regression on true (unobserved) test scores
          . melogit rtd c.(scr ses) || sid: , nolrtest nolog

          Mixed-effects logistic regression               Number of obs     =      5,000
          Group variable:             sid                 Number of groups  =         50

                                                          Obs per group:
                                                                        min =        100
                                                                        avg =      100.0
                                                                        max =        100

          Integration method: mvaghermite                 Integration pts.  =          7

                                                          Wald chi2(2)      =     640.00
          Log likelihood =  -2715.186                     Prob > chi2       =     0.0000
          ------------------------------------------------------------------------------
                   rtd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   scr |   .9989565   .0403324    24.77   0.000     .9199065    1.078006
                   ses |   .8987155   .1174644     7.65   0.000     .6684895    1.128942
                 _cons |  -.1140601   .1579919    -0.72   0.470    -.4237185    .1955983
          -------------+----------------------------------------------------------------
          sid          |
             var(_cons)|   1.188085   .2574207                      .7769961     1.81667
          ------------------------------------------------------------------------------

          .
          . // Computing variance constraint for errors-in-variable generalized SEM
          . tempname sigma2n_hat

          . quietly summarize sco, detail

          . scalar define `sigma2n_hat' = (1 - 0.65) * r(Var) * (r(N) - 1) / r(N)

          .
          . // Setting up constraints
          . constraint define 1 _b[/:var(e.sco)] = `sigma2n_hat'

          . constraint define 2 _b[sco:F] = 1

          .
          . // Fitting model with random intercept for school, SES predictor (measured without error) and observed test scores
          . gsem (sco <- F) (rtd <- F c.ses RE[sid], logit), constraints(1/2) nocnsreport nodvheader nolog

          Generalized structural equation model           Number of obs     =      5,000
          Log likelihood = -10978.448

          ------------------------------------------------------------------------------
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          sco          |
                     F |          1  (constrained)
                 _cons |  -.0092571   .0174071    -0.53   0.595    -.0433744    .0248603
          -------------+----------------------------------------------------------------
          rtd          |
                   ses |   .8788477   .1226328     7.17   0.000     .6384918    1.119204
                       |
               RE[sid] |          1  (constrained)
                       |
                     F |   1.033574   .0568857    18.17   0.000     .9220798    1.145068
                 _cons |  -.1221873   .1613105    -0.76   0.449      -.43835    .1939754
          -------------+----------------------------------------------------------------
           var(RE[sid])|   1.229252   .2681414                      .8016134    1.885023
                 var(F)|   .9847755   .0303007                      .9271425    1.045991
          -------------+----------------------------------------------------------------
             var(e.sco)|   .5302651  (constrained)
          ------------------------------------------------------------------------------

          .
          . exit

          end of do-file


          .
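
          The original question also asked about a random coefficient, whereas the log above has only a random intercept. The following is a hedged sketch of how the same constrained model might be extended with a school-level random slope. It is not from the thread and has not been checked for convergence: the indicator prk (a stand-in for a prior-schooling indicator), the latent names RE0 and RE1, the seed, and all simulated effect sizes are assumptions. Fitting may be slow or may fail to converge.

```stata
version 15.1
clear all
set seed 2468                               // arbitrary seed (assumption)

// Schools: hypothetical random intercept and random slope
set obs 50
generate byte sid = _n
generate double u0 = rnormal()
generate double u1 = rnormal(0, 0.5)

// Pupils
expand 100
generate double scr = rnormal()                           // true score
generate double sco = scr + rnormal(0, sqrt(0.35/0.65))   // observed score
generate byte prk = runiform() < 0.5                      // hypothetical indicator
generate byte rtd = rbinomial(1, invlogit(u0 + scr + (0.5 + u1)*prk))

// Error-variance constraint computed as in the log above
tempname s2e
quietly summarize sco
scalar define `s2e' = (1 - 0.65) * r(Var) * (r(N) - 1) / r(N)
constraint define 1 _b[/:var(e.sco)] = `s2e'
constraint define 2 _b[sco:F] = 1

// Random intercept RE0[sid] plus random slope on prk via c.prk#RE1[sid]
gsem (sco <- F) (rtd <- F c.prk RE0[sid] c.prk#RE1[sid], logit), ///
    constraints(1/2) nolog
```

          The c.prk#RE1[sid] term is what makes the coefficient on prk vary across schools; everything else mirrors the random-intercept model in the log.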


          I don't know why the documentation for gsem restricts the reliability() option to normal linear models without censoring. Maybe it's because of fragility (I had to manually compute the predictor's error variance and feed it to gsem as a constraint; specifying the reliability value directly in the option gave rise to difficulties in convergence). Perhaps it's because it's difficult to get a good handle on the relative magnitude of the proposed reliability value to the total explained variance (R2) except in conventional linear regression models. StataCorp typically disallows things for cause, and maybe it would be worthwhile to inquire into their reasons and see whether they would be fatal to your intended analysis for your project.
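
          The manually computed error variance mentioned above follows from the classical test theory definition of reliability. As a short sketch, write the observed score as x, the latent true score as X, the error as e (assumed independent of X), the assumed reliability as ρ, and the sample variance of x as s_x^2. These symbols are introduced here for illustration and are not used in the thread.

```latex
\begin{align*}
x &= X + e, \qquad \operatorname{Var}(x) = \operatorname{Var}(X) + \operatorname{Var}(e) \\
\rho &= \frac{\operatorname{Var}(X)}{\operatorname{Var}(x)}
  \;\Longrightarrow\;
  \operatorname{Var}(e) = (1 - \rho)\,\operatorname{Var}(x) \\
\hat{\sigma}^2_e &= (1 - \rho)\, s_x^2\, \frac{N - 1}{N}
\end{align*}
```

          The last line is the expression used in the do-file for the constraint on var(e.sco). As a rough consistency check against the multilevel output, with ρ = 0.65 the fitted variances give 0.35 × (0.9848 + 0.5303) ≈ 0.5303, which agrees with the constrained value of var(e.sco).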

          Good luck, and I hope that the state does well by the kids.



          • #6
            Joseph Coveney
            Thanks for your thoughts and help with this. I came across a different paper that used the same CTT-grounded approach in a Bayesian context (https://journals.sagepub.com/doi/ful...62280216667764) and was thinking that the Bayesian approach might be useful to prevent the parameter estimates from lying too close to the boundary space (I’m assuming that will be an issue already). I also started looking into the internals of the eivtools/eivreg function that Lockwood developed. It seems their approach involves a modification to the VCE based on the full model instead of just adding constraints on a specific variable. Unfortunately, the reliability issue is only one of several issues with the measure being used; the model fit statistics that they report for their CFA tend to include contradictory information and in a few cases provide consistent information that their model does not fit the data.

            That all said, I’m actually pretty surprised that the difference in the coefficients wasn’t larger. There’s a David Card study cited in the Cameron and Trivedi book on microeconometrics that suggested bias of 20-30%. If nothing else, maybe this could provide decent grounds for a simulation study paper.



            • #7
              Originally posted by wbuchanan
              the Bayesian approach might be useful to prevent the parameter estimates from lying too close to the boundary space (I’m assuming that will be an issue already).
              The primary parameter in these models that has a boundary is the variance of the captive latent factor, and I never encountered any problem with its collapsing. So, I think that a regularizing prior on it might be unnecessary.

              Originally posted by wbuchanan
              I’m actually pretty surprised that the difference in the coefficients wasn’t larger. There’s a David Card study cited in the Cameron and Trivedi book on microeconometrics that suggested bias of 20-30%.
              I'm not familiar with the David Card precedent, but it didn't strike me as surprising that, once the measurement error is properly taken into account, the corrected regression coefficient is very close to (essentially identical with) the true. I think that the major issue—and maybe that's what David Card was exploring—is getting an accurate estimate of the measurement reliability of the covariate. If that's under- or overestimated, then I can see where there would be under- or overcorrection of the coefficient by 20 to 30%.
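
              The point about a misestimated reliability can be made concrete for the simplest case. The sketch below assumes a linear model with a single covariate and classical measurement error; β is the true slope, ρ the true reliability, and ρ_a the reliability assumed in the correction (symbols introduced here for illustration). The logistic and multilevel models discussed in this thread will not follow it exactly.

```latex
\begin{align*}
\operatorname{plim}\, \hat{\beta}_{\text{naive}} &= \rho\,\beta \\
\operatorname{plim}\, \hat{\beta}_{\text{corrected}} &= \frac{\rho}{\rho_a}\,\beta \\
\rho = 0.65,\ \rho_a = 0.80 &: \quad \frac{0.65}{0.80}\,\beta = 0.8125\,\beta \\
\rho = 0.65,\ \rho_a = 0.50 &: \quad \frac{0.65}{0.50}\,\beta = 1.30\,\beta
\end{align*}
```

              In this simple setting, overstating the reliability as 0.80 leaves the slope about 19% too small, and understating it as 0.50 makes it 30% too large, which is the order of magnitude mentioned in the quoted post.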
