Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Explanatory IRT Models in gllamm/Stata

    Hello,
    Is anyone familiar with applications of the latent regression Rasch model in gllamm (Stata), equivalent to the NLMIXED PROC in SAS?
    I understand that Zheng & Rabe-Hesketh as well as Rabe-Hesketh have some papers on IRT models in gllamm.
    That is, can you point me to papers that have compared results for various Rasch models or explanatory IRT models in gllamm with NLMIXED?
    Thanks,
    Saul

  • #2
    Saul,

    In my experience, people here tend not to be very familiar with SAS. I can do some SAS programming, but I'm not familiar with PROC NLMIXED. I have also not used Stata's user-written -gllamm- command, either. From what I can tell, Stata's stock -gsem- command seems to have acquired the capabilities that -gllamm- has. I don't know if -gsem- can fit all the models that -gllamm- possibly can, but -gsem- can definitely fit IRT models in Stata 14 and 15. In fact, Stata's stock IRT commands are wrappers for -gsem- (that is, the IRT commands call -gsem-). From there, it is easy to include some sort of covariate to explain the latent trait (I assume this is what you mean by explanatory IRT; if it's not, please clarify). You will have to write the code in -gsem-, but I don't think you need to fit the models in -gllamm-.

    For example, slide 10.4 onwards in this presentation by Rebecca Pope go over an IRT model with one covariate, then a multilevel IRT model, all in the -gsem- syntax.

    Also, Rafal Raciborski of StataCorp pointed out, in post 3 at this thread, that after you fit a regular IRT model, you can get it to display the exact -gsem- syntax used. For example:

    Code:
    webuse charity
    irt grm ta1-ta5
    display e(cmdline2)
    For the graded response and the 2-parameter logistic IRT models, the syntax is quite simple anyway.

    To demonstrate IRT with a covariate in actual data, you could fit that GRM to the charity data, then type:

    Code:
    predict theta, latent
    gen x = 0.5 * theta + rnormal()
    gsem (Theta -> ta1-ta5, ologit) (x -> Theta), variance(e.Theta@1) latent(Theta)
    You must include the e. in front of Theta in the variance option (it indicates the error of Theta). You'll note, if you compared gsem code, that I omitted the from and constraint options. There aren't any constraints in the GRM, and the from option doesn't affect the final solution here.

    If you want to compare results to SAS, I suppose you could download the charity data, then have SAS import the data and fit the model. I can't help with syntax.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    Comment


    • #3
      Thank you Weiwen. I think I might be able to implement many of the models I am interested in--and I am interested in work by McCullagh & Nelders as well as De Boeck & Wilson but more precisely Bauer & Hussong on moderated non linear factor analysis. I am indeed interested in predicting item responses but need different link functions given differently measured items.

      So it appears that gllamm and gsem allow for models that are quite close to this, but not exactly the same...

      Best regards,
      S

      Comment


      • #4
        I glanced at this paper by Bauer and Hussong. I am pretty sure gsem would be able to fit this type of model. gsem can definitely fit IRT models with mixed item types. -irt hybrid- is the stock IRT command, from which you could extract the -gsem- syntax. Or, something like those two authors described might look like

        Code:
        gsem (Theta -> alcohol_use_disorder, logit) (Theta -> consequences, poisson) (Theta -> heavy_use use_frequency, ologit) (Theta -> expectancies, gaussian), variance(Theta@1)
        I don't agree with the authors that the last item is a censored normal model. Censoring only happens if values below zero do exist but are coded as zero due to, say, instrument detection limits or bottom coding or something else. Nonetheless, if you do, gsem should be able to treat the variable as censored. Search the -gsem- help for family and link options.
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

        Comment


        • #5
          Thanks again! I will try to implement these models in Stata-gsem. B&H and their colleagues did introduce this procedure, MNLFA, in SAS originally so I may encounter some challenges. I do wonder if Stata would allow me to forgo the censored specification for items whose values are limited...

          Comment


          • #6
            Originally posted by Saul Gabriel View Post
            Thanks again! I will try to implement these models in Stata-gsem. B&H and their colleagues did introduce this procedure, MNLFA, in SAS originally so I may encounter some challenges. I do wonder if Stata would allow me to forgo the censored specification for items whose values are limited...
            To be clear, the pseudocode I wrote above does not treat the variable expectancies as censored. If you want to treat it as censored, you can read -gsem-'s family and link options, and that should be easy to implement (but beware convergence issues). I do not think that that it is substantively meaningful to treat that variable as censored. I should also point out, it would be much better to have the individual components of that variable (I think it was 3 ordinal variables).
            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

            When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

            Comment


            • #7
              I'm returning to this old post. The original question was, has anyone compared the results of Stata fitting an explanatory IRT model to SAS PROC NLMIXED? From a brief search of the web, said procedure needs you to reshape the data into a bit of a strange format, which I'm unwilling to do. (For the record, this article says you have to reshape it into some strange combo of long and wide data, as described in their appendix.)

              I can, however, compare the results of Stata's output to SAS PROC IRT and the R package mirt (Phil Chalmers, available on CRAN or his Github site). PROC IRT can't seem to fit an explanatory IRT model, but it can fit a regular 2PL model.

              First, let's simulate some data. I'm simulating 10 questions, with discrimination parameters coming from a log-normal distribution with mean 0.3, variance 1 (referred to as a), and difficulty parameters from a standard normal distribution. Then, I'm simulating 2 groups of 500 people each. Group 1's Theta scores are drawn from a standard normal distribution, but group 2's scores are drawn from a normal distribution with mean 0.2, variance 1. Thus, in an explanatory IRT model, where we are decomposing Theta = XB + epsilon, we'd expect the beta for group to be equal to 0.2.

              Code:
              set seed 4142
              matrix a = J(1,10,.)
              matrix d = J(1,10,.)
              forvalues i = 1/10 {
                  matrix a[1,`i'] = exp(rnormal(0.3, 1))
                  matrix d[1,`i'] = rnormal()
                  }
              clear
              set obs 1000
              gen group = _n > 500
              gen theta = rnormal() if group == 0
              replace theta = rnormal() + 0.2 if group == 1
              forvalues i = 1/10 {
                  gen q`i' = runiform() < invlogit(a[1,`i'] * (theta - d[1,`i']))
                  *gen p_q`i' = invlogit(a[1,`i'] * (theta - d[1,`i']))
                  }
              Fitting 2PL model in Stata with the standard commands:

              Code:
              ------------------------------------------------------------------------------
                           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
              q1           |
                   Discrim |   5.106961   .7990473     6.39   0.000     3.540857    6.673065
                      Diff |  -.1660727    .040855    -4.06   0.000     -.246147   -.0859984
              -------------+----------------------------------------------------------------
              q2           |
                   Discrim |    1.22011   .1210252    10.08   0.000     .9829049    1.457315
                      Diff |  -1.156868   .1048785   -11.03   0.000    -1.362426   -.9513096
              -------------+----------------------------------------------------------------
              q3           |
                   Discrim |   3.353441   .5230281     6.41   0.000     2.328325    4.378558
                      Diff |  -1.581332   .0891136   -17.75   0.000    -1.755991   -1.406672
              -------------+----------------------------------------------------------------
              q4           |
                   Discrim |   1.261461   .1165501    10.82   0.000     1.033027    1.489895
                      Diff |   .3887097   .0702344     5.53   0.000     .2510527    .5263666
              -------------+----------------------------------------------------------------
              q5           |
                   Discrim |   .1239904   .0717811     1.73   0.084    -.0166981    .2646788
                      Diff |  -.5192549   .5925015    -0.88   0.381    -1.680537    .6420268
              -------------+----------------------------------------------------------------
              q6           |
                   Discrim |   2.097894   .1833799    11.44   0.000     1.738476    2.457312
                      Diff |  -.5817061   .0566207   -10.27   0.000    -.6926806   -.4707316
              -------------+----------------------------------------------------------------
              q7           |
                   Discrim |   1.244643   .1221168    10.19   0.000     1.005299    1.483988
                      Diff |  -1.110511   .1005592   -11.04   0.000    -1.307604   -.9134186
              -------------+----------------------------------------------------------------
              q8           |
                   Discrim |   2.332568    .206318    11.31   0.000     1.928192    2.736944
                      Diff |   .1629517   .0492375     3.31   0.001      .066448    .2594554
              -------------+----------------------------------------------------------------
              q9           |
                   Discrim |   2.030126    .172732    11.75   0.000     1.691578    2.368675
                      Diff |  -.0583262   .0512112    -1.14   0.255    -.1586983     .042046
              -------------+----------------------------------------------------------------
              q10          |
                   Discrim |   1.760748   .2483608     7.09   0.000     1.273969    2.247526
                      Diff |  -2.330548   .1993862   -11.69   0.000    -2.721338   -1.939758
              ------------------------------------------------------------------------------
              Using SAS PROC IRT:
              q1 Difficulty -0.1671 0.04191
              q2 Slope 5.00903 0.77077
              q2 Difficulty -1.159 0.10539
              q2 Slope 1.21696 0.12066
              q3 Difficulty -1.5825 0.08931
              q3 Slope 3.35981 0.52659
              q4 Difficulty 0.39093 0.07057
              q4 Slope 1.25768 0.11632
              q5 Difficulty -0.5201 0.59479
              q5 Slope 0.12357 0.07164
              q6 Difficulty -0.5827 0.05725
              q6 Slope 2.09254 0.18274
              q7 Difficulty -1.1125 0.10106
              q7 Slope 1.24149 0.12174
              q8 Difficulty 0.16491 0.04986
              q8 Slope 2.31988 0.20476
              q9 Difficulty -0.0575 0.05195
              q9 Slope 2.02157 0.17157
              q10 Difficulty -2.3327 0.19975
              q10 Slope 1.7597 0.24854
              (Note: slope = discrimination)

              And in the mirt package:
              $`q1`
              a b g u
              par 5.003 -0.167 0 1

              $q2
              a b g u
              par 1.217 -1.159 0 1

              $q3
              a b g u
              par 3.365 -1.582 0 1

              $q4
              a b g u
              par 1.258 0.391 0 1

              $q5
              a b g u
              par 0.124 -0.52 0 1

              $q6
              a b g u
              par 2.093 -0.583 0 1

              $q7
              a b g u
              par 1.242 -1.113 0 1

              $q8
              a b g u
              par 2.32 0.165 0 1

              $q9
              a b g u
              par 2.022 -0.058 0 1

              $q10
              a b g u
              par 1.76 -2.333 0 1

              (a is discrimination, b is difficulty)

              I am sorry this is a bit hard to read, but I can't really control the formatting from SAS or R to be more conducive to this forum. Anyway, all 3 packages produce results that are in close agreement. Comparing the estimated parameters to the simulated ones, there are a few that are slightly off, but I'd expect this is par for the course.

              Using explanatory IRT models, in Stata, I did this:

              Code:
              gsem (Th -> q1-q10, logit) (Th <- i.group), variance(e.Th@1)
              ...
              Th           |
                   1.group |   .2066424   .0696031     2.97   0.003     .0702229     .343062
              In the mirt package, the corresponding estimate for group is 0.230, 95% CI (0.056, 0.404). There's less agreement on this score. For the record, in the simulated data, the difference in theta scores is 0.218. I am pretty sure mirt uses a very different estimation algorithm than Stata, and I also think the default convergence criteria is less strict than Stata. Overall, I'd expect that differences in parameter estimates are attributable to differences in the estimation machinery.
              Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

              When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

              Comment


              • #8
                Hi Weiwen, thanks for this!

                indeed the data restructuring seems odd...even here:
                https://www.ncbi.nlm.nih.gov/pmc/art...5/#!po=38.3333, a newer treatment on the topic, it is done. See, for instance: http://supp.apa.org/psycarticles/supplemental/met_14_2_101/met_bauer0079_supp.pdf

                Note, the model in the main paper is an extension of explanatory IRT models, but both can be seem as generalized latent variable models.


                Both proc nlmixed in SAS and Mplus fit models where all factor parameters and item factors are “explained” by various covariates. I don’t think Stata can estimate all these parameters, even manually. Another issue is exporting the estimates (MAPs). Specifics can be found here:
                https://www.ncbi.nlm.nih.gov/pmc/art...pplement-1.pdf

                As I have a specific interest in implementing this type of mode in Stata, perhaps we can chat offline?

                Comment


                • #9
                  Saul, feel free to PM me, but I would like us to post any general findings on the thread if it benefits the wider Stata community.
                  Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                  When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                  Comment

                  Working...
                  X