Explanatory IRT Models in gllamm/Stata

Saul Gabriel

Join Date: May 2014

Posts: 44
#1

Explanatory IRT Models in gllamm/Stata

15 Oct 2018, 13:39

Hello,
Is anyone familiar with applications of the latent regression Rasch model in gllamm (Stata), equivalent to the NLMIXED PROC in SAS?
I understand that Zheng & Rabe-Hesketh as well as Rabe-Hesketh have some papers on IRT models in gllamm.
That is, can you point me to papers that have compared results for various Rasch models or explanatory IRT models in gllamm with NLMIXED?
Thanks,
Saul
Tags: None
Weiwen Ng

Join Date: Jun 2015

Posts: 1240
#2

15 Oct 2018, 15:49

Saul,

In my experience, people here tend not to be very familiar with SAS. I can do some SAS programming, but I'm not familiar with PROC NLMIXED. I have also not used Stata's user-written -gllamm- command, either. From what I can tell, Stata's stock -gsem- command seems to have acquired the capabilities that -gllamm- has. I don't know if -gsem- can fit all the models that -gllamm- possibly can, but -gsem- can definitely fit IRT models in Stata 14 and 15. In fact, Stata's stock IRT commands are wrappers for -gsem- (that is, the IRT commands call -gsem-). From there, it is easy to include some sort of covariate to explain the latent trait (I assume this is what you mean by explanatory IRT; if it's not, please clarify). You will have to write the code in -gsem-, but I don't think you need to fit the models in -gllamm-.

For example, slide 10.4 onwards in this presentation by Rebecca Pope go over an IRT model with one covariate, then a multilevel IRT model, all in the -gsem- syntax.

Also, Rafal Raciborski of StataCorp pointed out, in post 3 at this thread, that after you fit a regular IRT model, you can get it to display the exact -gsem- syntax used. For example:

Code:

webuse charity irt grm ta1-ta5 display e(cmdline2)

For the graded response and the 2-parameter logistic IRT models, the syntax is quite simple anyway.

To demonstrate IRT with a covariate in actual data, you could fit that GRM to the charity data, then type:

Code:

predict theta, latent gen x = 0.5 * theta + rnormal() gsem (Theta -> ta1-ta5, ologit) (x -> Theta), variance(e.Theta@1) latent(Theta)

You must include the e. in front of Theta in the variance option (it indicates the error of Theta). You'll note, if you compared gsem code, that I omitted the from and constraint options. There aren't any constraints in the GRM, and the from option doesn't affect the final solution here.

If you want to compare results to SAS, I suppose you could download the charity data, then have SAS import the data and fit the model. I can't help with syntax.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
1 like
Comment
Saul Gabriel

Join Date: May 2014

Posts: 44
#3

18 Oct 2018, 16:05

Thank you Weiwen. I think I might be able to implement many of the models I am interested in--and I am interested in work by McCullagh & Nelders as well as De Boeck & Wilson but more precisely Bauer & Hussong on moderated non linear factor analysis. I am indeed interested in predicting item responses but need different link functions given differently measured items.

So it appears that gllamm and gsem allow for models that are quite close to this, but not exactly the same...

Best regards,
S
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1240
#4

18 Oct 2018, 20:25

I glanced at this paper by Bauer and Hussong. I am pretty sure gsem would be able to fit this type of model. gsem can definitely fit IRT models with mixed item types. -irt hybrid- is the stock IRT command, from which you could extract the -gsem- syntax. Or, something like those two authors described might look like

Code:

gsem (Theta -> alcohol_use_disorder, logit) (Theta -> consequences, poisson) (Theta -> heavy_use use_frequency, ologit) (Theta -> expectancies, gaussian), variance(Theta@1)

I don't agree with the authors that the last item is a censored normal model. Censoring only happens if values below zero do exist but are coded as zero due to, say, instrument detection limits or bottom coding or something else. Nonetheless, if you do, gsem should be able to treat the variable as censored. Search the -gsem- help for family and link options.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Saul Gabriel

Join Date: May 2014

Posts: 44
#5

20 Oct 2018, 16:45

Thanks again! I will try to implement these models in Stata-gsem. B&H and their colleagues did introduce this procedure, MNLFA, in SAS originally so I may encounter some challenges. I do wonder if Stata would allow me to forgo the censored specification for items whose values are limited...
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1240
#6

20 Oct 2018, 18:49

Originally posted by Saul Gabriel View Post

Thanks again! I will try to implement these models in Stata-gsem. B&H and their colleagues did introduce this procedure, MNLFA, in SAS originally so I may encounter some challenges. I do wonder if Stata would allow me to forgo the censored specification for items whose values are limited...

To be clear, the pseudocode I wrote above does not treat the variable expectancies as censored. If you want to treat it as censored, you can read -gsem-'s family and link options, and that should be easy to implement (but beware convergence issues). I do not think that that it is substantively meaningful to treat that variable as censored. I should also point out, it would be much better to have the individual components of that variable (I think it was 3 ordinal variables).

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment

Weiwen Ng

Join Date: Jun 2015
Posts: 1240

13 Mar 2019, 16:17

I'm returning to this old post. The original question was, has anyone compared the results of Stata fitting an explanatory IRT model to SAS PROC NLMIXED? From a brief search of the web, said procedure needs you to reshape the data into a bit of a strange format, which I'm unwilling to do. (For the record, this article says you have to reshape it into some strange combo of long and wide data, as described in their appendix.)

I can, however, compare the results of Stata's output to SAS PROC IRT and the R package mirt (Phil Chalmers, available on CRAN or his Github site). PROC IRT can't seem to fit an explanatory IRT model, but it can fit a regular 2PL model.

First, let's simulate some data. I'm simulating 10 questions, with discrimination parameters coming from a log-normal distribution with mean 0.3, variance 1 (referred to as a), and difficulty parameters from a standard normal distribution. Then, I'm simulating 2 groups of 500 people each. Group 1's Theta scores are drawn from a standard normal distribution, but group 2's scores are drawn from a normal distribution with mean 0.2, variance 1. Thus, in an explanatory IRT model, where we are decomposing Theta = XB + epsilon, we'd expect the beta for group to be equal to 0.2.

Code:

set seed 4142
matrix a = J(1,10,.)
matrix d = J(1,10,.)
forvalues i = 1/10 {
    matrix a[1,`i'] = exp(rnormal(0.3, 1))
    matrix d[1,`i'] = rnormal()
    }
clear
set obs 1000
gen group = _n > 500
gen theta = rnormal() if group == 0
replace theta = rnormal() + 0.2 if group == 1
forvalues i = 1/10 {
    gen q`i' = runiform() < invlogit(a[1,`i'] * (theta - d[1,`i']))
    *gen p_q`i' = invlogit(a[1,`i'] * (theta - d[1,`i']))
    }

Fitting 2PL model in Stata with the standard commands:

Code:

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
q1           |
     Discrim |   5.106961   .7990473     6.39   0.000     3.540857    6.673065
        Diff |  -.1660727    .040855    -4.06   0.000     -.246147   -.0859984
-------------+----------------------------------------------------------------
q2           |
     Discrim |    1.22011   .1210252    10.08   0.000     .9829049    1.457315
        Diff |  -1.156868   .1048785   -11.03   0.000    -1.362426   -.9513096
-------------+----------------------------------------------------------------
q3           |
     Discrim |   3.353441   .5230281     6.41   0.000     2.328325    4.378558
        Diff |  -1.581332   .0891136   -17.75   0.000    -1.755991   -1.406672
-------------+----------------------------------------------------------------
q4           |
     Discrim |   1.261461   .1165501    10.82   0.000     1.033027    1.489895
        Diff |   .3887097   .0702344     5.53   0.000     .2510527    .5263666
-------------+----------------------------------------------------------------
q5           |
     Discrim |   .1239904   .0717811     1.73   0.084    -.0166981    .2646788
        Diff |  -.5192549   .5925015    -0.88   0.381    -1.680537    .6420268
-------------+----------------------------------------------------------------
q6           |
     Discrim |   2.097894   .1833799    11.44   0.000     1.738476    2.457312
        Diff |  -.5817061   .0566207   -10.27   0.000    -.6926806   -.4707316
-------------+----------------------------------------------------------------
q7           |
     Discrim |   1.244643   .1221168    10.19   0.000     1.005299    1.483988
        Diff |  -1.110511   .1005592   -11.04   0.000    -1.307604   -.9134186
-------------+----------------------------------------------------------------
q8           |
     Discrim |   2.332568    .206318    11.31   0.000     1.928192    2.736944
        Diff |   .1629517   .0492375     3.31   0.001      .066448    .2594554
-------------+----------------------------------------------------------------
q9           |
     Discrim |   2.030126    .172732    11.75   0.000     1.691578    2.368675
        Diff |  -.0583262   .0512112    -1.14   0.255    -.1586983     .042046
-------------+----------------------------------------------------------------
q10          |
     Discrim |   1.760748   .2483608     7.09   0.000     1.273969    2.247526
        Diff |  -2.330548   .1993862   -11.69   0.000    -2.721338   -1.939758
------------------------------------------------------------------------------

Using SAS PROC IRT:

q1 Difficulty	-0.1671	0.04191
q2 Slope	5.00903	0.77077
q2 Difficulty	-1.159	0.10539
q2 Slope	1.21696	0.12066
q3 Difficulty	-1.5825	0.08931
q3 Slope	3.35981	0.52659
q4 Difficulty	0.39093	0.07057
q4 Slope	1.25768	0.11632
q5 Difficulty	-0.5201	0.59479
q5 Slope	0.12357	0.07164
q6 Difficulty	-0.5827	0.05725
q6 Slope	2.09254	0.18274
q7 Difficulty	-1.1125	0.10106
q7 Slope	1.24149	0.12174
q8 Difficulty	0.16491	0.04986
q8 Slope	2.31988	0.20476
q9 Difficulty	-0.0575	0.05195
q9 Slope	2.02157	0.17157
q10 Difficulty	-2.3327	0.19975
q10 Slope	1.7597	0.24854

(Note: slope = discrimination)

And in the mirt package:
$`q1`
a b g u
par 5.003 -0.167 0 1

$q2
a b g u
par 1.217 -1.159 0 1

$q3
a b g u
par 3.365 -1.582 0 1

$q4
a b g u
par 1.258 0.391 0 1

$q5
a b g u
par 0.124 -0.52 0 1

$q6
a b g u
par 2.093 -0.583 0 1

$q7
a b g u
par 1.242 -1.113 0 1

$q8
a b g u
par 2.32 0.165 0 1

$q9
a b g u
par 2.022 -0.058 0 1

$q10
a b g u
par 1.76 -2.333 0 1

(a is discrimination, b is difficulty)

I am sorry this is a bit hard to read, but I can't really control the formatting from SAS or R to be more conducive to this forum. Anyway, all 3 packages produce results that are in close agreement. Comparing the estimated parameters to the simulated ones, there are a few that are slightly off, but I'd expect this is par for the course.

Using explanatory IRT models, in Stata, I did this:

Code:

gsem (Th -> q1-q10, logit) (Th <- i.group), variance(e.Th@1)
...
Th           |
     1.group |   .2066424   .0696031     2.97   0.003     .0702229     .343062

In the mirt package, the corresponding estimate for group is 0.230, 95% CI (0.056, 0.404). There's less agreement on this score. For the record, in the simulated data, the difference in theta scores is 0.218. I am pretty sure mirt uses a very different estimation algorithm than Stata, and I also think the default convergence criteria is less strict than Stata. Overall, I'd expect that differences in parameter estimates are attributable to differences in the estimation machinery.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

Comment

Saul Gabriel

Join Date: May 2014

Posts: 44
#8

16 Mar 2019, 16:41

Hi Weiwen, thanks for this!

indeed the data restructuring seems odd...even here:
https://www.ncbi.nlm.nih.gov/pmc/art...5/#!po=38.3333, a newer treatment on the topic, it is done. See, for instance: http://supp.apa.org/psycarticles/supplemental/met_14_2_101/met_bauer0079_supp.pdf

Note, the model in the main paper is an extension of explanatory IRT models, but both can be seem as generalized latent variable models.

Both proc nlmixed in SAS and Mplus fit models where all factor parameters and item factors are “explained” by various covariates. I don’t think Stata can estimate all these parameters, even manually. Another issue is exporting the estimates (MAPs). Specifics can be found here:
https://www.ncbi.nlm.nih.gov/pmc/art...pplement-1.pdf

As I have a specific interest in implementing this type of mode in Stata, perhaps we can chat offline?
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1240
#9

17 Mar 2019, 17:26

Saul, feel free to PM me, but I would like us to post any general findings on the thread if it benefits the wider Stata community.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment

Announcement