  • Help with Multidimensional IRT

    Good day StataList.

    I am having a bit of trouble doing Multidimensional IRT. I have already done confirmatory factor analysis and extracted the latent abilities for the model I have below.



    So I would like to ask the community the following:
    1. Is the code I used to extract the latent abilities correct?
    2. Is it possible to create a model with another latent ability that depends on those 3 latent abilities? An overall latent ability, if I may say so.
    3. How can I estimate the item parameters (item difficulty and discrimination) in a Multidimensional IRT model like this one? The items are dichotomous.
    The code I used is as follows:
    Code:
    gsem (L1 -> q1 q2 q3 q4 q6 q7 q9 q10)                  ///
         (L2 -> q11 q12 q13 q15 q16 q17 q19)               ///
         (L3 -> q21 q22 q23 q24 q25 q26 q27 q28 q29 q30),  ///
         covstruct(_lexogenous, diagonal) latent(L1 L2 L3) ///
         cov(L2*L1 L3*L1 L3*L2) nocapslatent
    predict L*, latent
    Thank you to anyone who can shed some light on this.

  • #2
    Perhaps I should check the Stata forum more frequently.

    1) The gsem code above may not technically be an item response theory model, in that it doesn't constrain the variances of the latent traits to 1. IRT is a type of (generalized) SEM, but not all GSEMs are IRT models, and with the code above you won't get results in the IRT parameterization. (Also, since the items are dichotomous, the measurement equations need a logit or probit link; without one, gsem fits linear regressions.) Beyond that, there's an issue that looks like syntax confusion. Emmanuel specified that the latent exogenous variables (L1, L2, L3) have a diagonal covariance structure, which means their covariances are fixed at 0, i.e. they are constrained to be independent of each other, with unrestricted variances. However, the syntax also tells Stata to estimate covariances between L1 and L2, L1 and L3, and L2 and L3. I don't know which option overrides the other, but conceptually you will need to choose one. Assuming Emmanuel meant the traits to be correlated, the code might look like:

    Code:
    gsem (L1 -> q1-q4 q6 q7 q9 q10, logit) (L2 -> q11 q12 q13 q15 q16 q17 q19, logit) (L3 -> q21-q30, logit), variance(L1@1 L2@1 L3@1) covstructure(_LEx, unstructured)
    NB: I think that if you tell Stata that the covariance structure is unstructured but you have constrained the variances to 1, that should freely estimate the covariances between them. You could just use the cov(L2*L1 ...) syntax if not.
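
    For reference, the explicit-covariance version would keep the unit-variance constraints and spell out each pairwise covariance. Here is an untested sketch (the item lists follow post #1, and the logit link is added because the items are dichotomous):
    Code:
    * Untested sketch: same measurement model, with the covariances requested
    * explicitly instead of via covstructure()
    gsem (L1 -> q1-q4 q6 q7 q9 q10, logit)          ///
         (L2 -> q11 q12 q13 q15 q16 q17 q19, logit) ///
         (L3 -> q21-q30, logit),                    ///
         variance(L1@1 L2@1 L3@1) cov(L2*L1 L3*L1 L3*L2)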

    Anyway, I assume that Emmanuel wanted to predict each observation's value of each latent trait. The predict code is correct; it's just that with the original syntax you won't get the latent traits on an IRT metric, and the predictions may not come from the model that was actually intended, given the conflicting covariance options.

    2) I am less familiar with this topic, but this sounds like a second-order factor model. I am not sure how many first-order factors you need to identify the second-order factor; I believe it is at least 3. There's a (linear) SEM example of a second-order factor model in the manual, so I would just adapt that syntax. One thing to note is that in a second-order factor model, the first-order factors' disturbances are constrained to be uncorrelated (their correlation is carried by the second-order factor), and each factor still needs a scale constraint, such as a loading or a variance fixed at 1.
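
    To give a rough idea, an untested sketch follows. The name G for the overall trait is mine, and identification here relies on gsem's default of fixing the first loading of each latent variable at 1, so the results would not be on an IRT metric without further constraints:
    Code:
    * Untested sketch of a second-order model; "G" is a made-up name for the
    * overall latent ability. The first-order factors' disturbances are left
    * uncorrelated (the default), so their association runs through G.
    gsem (L1 -> q1-q4 q6 q7 q9 q10, logit)          ///
         (L2 -> q11 q12 q13 q15 q16 q17 q19, logit) ///
         (L3 -> q21-q30, logit)                     ///
         (G -> L1 L2 L3),                           ///
         latent(L1 L2 L3 G)
    predict Ability*, latent   // one prediction per latent, including the overall G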

    3) Here's a little secret about Stata: all of the irt models call gsem in the background. Now, gsem parameterizes things differently from IRT: run gsem and you get slopes and intercepts. Here's the other little secret: the slope is the same as the discrimination parameter. In binary models, the difficulty parameter is minus the intercept divided by the slope. If you looked at the Stata 13 manual's SEM example 29, you'd see this to be the case (I'm not sure whether that example is still in the current manual). In ordinal models, the difficulties are the cutpoints divided by the slope (no minus sign).

    You can do the calculations in Excel, or you can use nlcom. The manual also has you create a new dataset and use Stata to calculate the difficulty parameters, so you can do that as well. Both the manual's approach and nlcom require the symbolic names of the parameters; if you type gsem, coeflegend after estimation, you'll see every parameter with its symbolic name.

    Hope this helps someone, if not Emmanuel.

    For teaching purposes, note that (if I read the syntax correctly) each trait has unique questions that load on it. In some contexts, you might want to specify some questions loading on multiple traits. For example, in education, my understanding is that some complex word problems might draw on both verbal and math ability, so this sort of structure would make sense there. I am in health services research, and here, a multidimensional IRT model would generally not have this sort of structure; we would have designed each question to measure only one of the latent traits of interest.

    (Unless it's a bifactor IRT model, but that's another story.)
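
    Coming back to the cross-loading idea: you would specify it simply by listing the item under both traits. Here is a sketch with made-up item and trait names:
    Code:
    * Hypothetical example: w5 is a word problem thought to draw on both verbal
    * and math ability, so it appears in both measurement equations
    gsem (Verbal -> v1-v10 w5, logit)  ///
         (Math -> m1-m10 w5, logit),   ///
         variance(Verbal@1 Math@1) cov(Verbal*Math)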

    • #3
      Whoops. Don't use my suggested syntax above. Building off SEM example 31:

      Code:
      use http://www.stata-press.com/data/r13/gsem_cfa
      gsem (MathAb -> q1-q8, logit) (MathAtt -> att1-att5, ologit), variance(MathAb@1 MathAtt@1)
      The code above fits a two-dimensional hybrid IRT model. If you add the option for an unstructured covariance structure, you override the specification that constrains the latent trait variances to 1. Sorry about that! Take note of the estimated covariance between the latent traits: it's 0.4274. In this case, because the variance of each trait is 1, the covariance equals the correlation. If you read through to the footnote of the SEM example where they calculate the correlation from the variances and the covariance, they also get 0.4274.

      Perhaps you are in the education field, and perhaps it's convention that verbal and math ability would be constrained to have 0 covariance. Let's pretend we can do that to the data above:

      Code:
      gsem (MathAb -> q1-q8, logit) (MathAtt -> att1-att5, ologit), variance(MathAb@1 MathAtt@1) covariance(MathAb*MathAtt@0)
      Back to the first model. If you fit 2 separate unidimensional IRT models, you'd see that the discrimination parameters are pretty close to the slopes in our multidimensional IRT model (we wouldn't expect them to be identical, but we would expect them to be close). If you compared the gsem intercepts to the irt difficulties directly, though, things would look off unless the discrimination/slope were close to 1, because difficulty = -intercept/slope. Here's part of the gsem, coeflegend output for the first half of the model, the binary math-ability items.

      Code:
      ------------------------------------------------------------------------------
                   |      Coef.  Legend
      -------------+----------------------------------------------------------------
      q1           |
            MathAb |   1.516799  _b[q1:MathAb]
             _cons |    .044612  _b[q1:_cons]
      -------------+----------------------------------------------------------------
      q2           |
            MathAb |   .5226993  _b[q2:MathAb]
             _cons |  -.4572216  _b[q2:_cons]
      -------------+----------------------------------------------------------------
      Here, the coefficient on MathAb is the slope, i.e. the discrimination parameter. _cons is the intercept, which is mathematically related to the difficulty parameter: in binary models, difficulty = -1 * intercept / slope. Here's how to calculate it with nlcom:

      Code:
      nlcom (- _b[q1:_cons] / _b[q1:MathAb]) (-_b[q2:_cons] / _b[q2:MathAb])
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             _nl_1 |  -.0294119   .0838327    -0.35   0.726    -.1937209     .134897
             _nl_2 |   .8747315   .2708416     3.23   0.001     .3438917    1.405571
      ------------------------------------------------------------------------------
      Fit a 2-parameter logistic (2PL) model to the first dimension, and you get pretty similar difficulty parameters:

      Code:
      irt 2pl q1-q8
      ------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      q1           |
           Discrim |   1.466636   .2488104     5.89   0.000     .9789765    1.954296
              Diff |  -.0254571   .0853625    -0.30   0.766    -.1927645    .1418503
      -------------+----------------------------------------------------------------
      q2           |
           Discrim |   .5597118   .1377584     4.06   0.000     .2897102    .8297134
              Diff |    .824244   .2495516     3.30   0.001     .3351318    1.313356
      -------------+----------------------------------------------------------------
      The problem with nlcom is that, assuming you want to export your results to Excel, it's a lot of typing, especially if you have an ordinal model. You could copy and paste the output table into Excel, or use something like estout (by Ben Jann, available from SSC), and then calculate the difficulties manually, but that's a fair bit of copying and pasting. SEM example 29 demonstrates how to create an output dataset and populate it with the symbolic coefficient names from the model you just fit; you can then export that dataset to Excel. I suspect that is the easiest method.
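
      As one possibility, here is an untested sketch of the nlcom route feeding putexcel (the labels and file name are made up, and this assumes Stata 14 or later for the putexcel set syntax):
      Code:
      * Sketch: label the difficulty calculations, post them as the active results
      * (note this replaces the gsem results in memory), then write the point
      * estimates to Excel; "difficulties.xlsx" and diff_q1/diff_q2 are made-up names
      nlcom (diff_q1: -_b[q1:_cons]/_b[q1:MathAb])  ///
            (diff_q2: -_b[q2:_cons]/_b[q2:MathAb]), post
      putexcel set "difficulties.xlsx", replace
      putexcel A1 = matrix(e(b)), names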