Regression with latent trait

Nanco Bonnevieri

Join Date: Jul 2022

Posts: 9
#1

30 Aug 2022, 03:42

To avoid biased confidence intervals and p-values, I want to include a latent trait with its standard error (a scale score from an assessment test) in a regression model. Here the trait is the dependent variable, but it could equally have been an independent variable:
.irt pcm item1-item20
.predict Theta, latent se(ThetaSE)
.regress Theta X1 X2
Does this syntax sufficiently treat the outcome as a latent variable?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

30 Aug 2022, 09:39

Good question. Your syntax is not, in my opinion, the theoretically best thing to do. It definitely ignores the standard error of theta. Some have suggested that coefficient estimates produced this way are biased. In practice, I don't know how much bias there is. It may be acceptable to just do things this way, and to report this as a limitation.

Bartosz Kondratek wrote a package called uirt that would enable you to fit an IRT model and then draw plausible values for the latent trait. Estimates based on these draws can be combined with Rubin's rules, as in multiple imputation. In the link, he described the syntax for the package, complete with a regression specification for the plausible values. He and I then showed how to manually import the imputations and use the mi prefix commands to estimate the model. I realize that the MI syntax can be tricky to use.

The other theoretically correct way to do this is to use an explanatory IRT model in gsem. However, this raises an additional question. I'm most familiar with the graded response model. The PCM is based on a multinomial logit model with a set of constraints, some relatively restrictive, and I don't know how to specify those constraints. If you fit the PCM and then type gsem, the list of constraints is shown. I'm not sure how to copy these automatically, as they aren't returned in r() or e(). They could be copied manually.

You would use a syntax something like this, based on the GRM:

Code:

gsem (Theta -> item1-item20, ologit) (Theta <- X1 X2), var(Theta@1) nocapslatent latent(Theta)

That is, Theta causes ordered logit responses to items 1-20, and Theta is regressed on X1 and X2. The variance of Theta is constrained to 1. If you run this code, you will get an error. If you run the code below, you won't:

Code:

gsem (Theta -> item1-item20, ologit) (Theta <- X1 X2), var(e.Theta@1) nocapslatent latent(Theta)

That is, the error variance of Theta, not Theta's own variance, is constrained to 1. I typically interpret the error variance as the variance after accounting for the fixed effects of X1 and X2. Intuitively, it seems like this is not what we should be doing, and yet the syntax I consider intuitively correct returns an error (I forget what it is).
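To make the distinction concrete, here is a short sketch of the structural part of the model, assuming the usual setup in which the error term is independent of X1 and X2 (my assumption, not something checked against the gsem output). The symbols beta_1, beta_2 and epsilon are just labels for the two regression coefficients and the error of Theta:

```latex
\begin{align*}
\theta_i &= \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i,
  \qquad \operatorname{Var}(\varepsilon_i) = 1 \\
\operatorname{Var}(\theta_i)
  &= \operatorname{Var}(\beta_1 X_{1i} + \beta_2 X_{2i}) + \operatorname{Var}(\varepsilon_i) \\
  &= \operatorname{Var}(\beta_1 X_{1i} + \beta_2 X_{2i}) + 1
\end{align*}
```

So under var(e.Theta@1), the total variance of Theta is at least 1, and it exceeds 1 whenever X1 and X2 explain some of the trait. A likely reason the first syntax fails is that, once Theta has predictors, it is endogenous and only its error variance is a free parameter, but I have not confirmed that this is what the error message says.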

Now, the R package mirt, while it is a bit clunky to use, does allow for IRT regression models like the one above. I have tried it on my own data. It will a) specify that the variance of Theta is 1, not the error variance, and b) in one dataset, it returned coefficients basically equal to the ones I got from Stata (not exactly equal, because mirt uses a different optimizer than Stata, but I'd consider them to be asymptotically equivalent). If you understand what you're doing with multiple imputation, you could try both syntaxes and compare coefficients. I may pose this question to Stata's tech support.

At present, I'd prefer one of the two theoretically correct approaches if you understand what you're doing. As I mentioned, I don't know how meaningful the bias from your proposed approach (I'll call it a two-step IRT model) is. In my dissertation work, I am likely to be able to show that, at least for my data, the two-step model produced betas similar to those from the explanatory IRT model for all but one variable (out of ~10).
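If you want to check this on your own data, a minimal sketch of the comparison might look like the following. It uses the GRM rather than the PCM in the first step so that both approaches rest on the same measurement model. The names ThetaHat, ThetaHatSE, twostep and explanatory are placeholders I made up, and the two fits do not put Theta on exactly the same scale (irt fixes the variance of Theta at 1, while the gsem call fixes the error variance at 1), so compare the coefficients with that in mind.

```stata
* Two-step approach: fit the GRM, predict the trait, then regress it.
* ThetaHat and ThetaHatSE are hypothetical variable names.
irt grm item1-item20
predict ThetaHat, latent se(ThetaHatSE)
regress ThetaHat X1 X2
estimates store twostep

* Explanatory IRT model: measurement and regression in one step.
gsem (Theta -> item1-item20, ologit) (Theta <- X1 X2), var(e.Theta@1) nocapslatent latent(Theta)
estimates store explanatory

* Replay both sets of results to compare the coefficients on X1 and X2.
estimates replay twostep
estimates replay explanatory
```

Note that regress never uses ThetaHatSE, which is the limitation of the two-step approach discussed above.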

Last edited by Weiwen Ng; 30 Aug 2022, 09:42.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.