  • Create disability scores using Item Response Theory model

    Hi all,

    I am doing research on functional disability, and I have read some papers that used an Item Response Theory (IRT) model to create disability scores instead of using binary or continuous variables. Here is a quote: "Basically, IRT is used to develop a calibrated disability score (the dependent variable) which is derived using a partial credit model with item calibration. An item calibration is obtained for each item. To determine how well each item contributed to common global health measurement, chi-square fit statistics are calculated. The calibration for each of the health items is taken into account, and the raw scores are transformed through Rasch modeling into a continuous cardinal scale (0, worst health; 100, best health)".

    In fact, I have read instructions from some papers on how to construct disability scores using IRT, but they still confuse me and I do not know how to start. I am posting an example of my data set so that others may take a look; I hope someone familiar with these techniques can point me in the right direction.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(i9a1 i9a2 i9b1 i9b2 i9c1 i9c2 i9d1 i9d2 i9e1 i9e2 i9f1 i9f2 i9g1 i9g2 i10a1 i10a2 i10b1 i10b2 i10c1 i10c2 i10d1 i10d2 i10e1 i10e2)
    2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 .
    2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 .
    2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 .
    1 1 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 .
    1 3 1 3 1 4 1 2 1 4 1 3 1 2 1 2 1 3 1 4 1 4 1 2
    2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 .
    2 . 2 . 2 . 2 . 1 1 2 . 2 . 2 . 2 . 2 . 2 . 2 .
    2 . 2 . 1 2 2 . 1 2 1 2 2 . 2 . 2 . 2 . 2 . 2 .
    1 2 1 3 1 4 2 . 1 1 1 1 2 . 2 . 2 . 2 . 2 . 2 .
    2 . 1 2 1 1 1 1 2 . 1 1 2 . 2 . 2 . 2 . 2 . 2 .
    2 . 2 . 1 1 2 . 1 2 1 1 1 1 2 . 2 . 2 . 2 . 2 .
    1 4 1 2 1 4 2 . 1 2 1 2 2 . 1 3 2 . 1 3 1 2 1 3
    2 . 2 . 1 3 1 1 1 2 2 . 2 . 2 . 2 . 2 . 2 . 2 .
    1 3 1 3 1 2 2 . 1 3 1 1 2 . 2 . 2 . 2 . 2 . 2 .
    2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 . 2 .
    end
    label values i9a1 LABEL_I9A1
    label def LABEL_I9A1 1 "Yes", modify
    label def LABEL_I9A1 2 "No", modify
    label values i9a2 LABEL_I9A2
    label def LABEL_I9A2 1 "Mild", modify
    label def LABEL_I9A2 2 "Moderate", modify
    label def LABEL_I9A2 3 "Severe", modify
    label def LABEL_I9A2 4 "Can not do at all", modify
    label values i9b1 LABEL_I9B1
    label def LABEL_I9B1 1 "Yes", modify
    label def LABEL_I9B1 2 "No", modify
    label values i9b2 LABEL_I9B2
    label def LABEL_I9B2 2 "Moderate", modify
    label def LABEL_I9B2 3 "Severe", modify
    label values i9c1 LABEL_I9C1
    label def LABEL_I9C1 1 "Yes", modify
    label def LABEL_I9C1 2 "No", modify
    label values i9c2 LABEL_I9C2
    label def LABEL_I9C2 1 "Mild", modify
    label def LABEL_I9C2 2 "Moderate", modify
    label def LABEL_I9C2 3 "Severe", modify
    label def LABEL_I9C2 4 "Can not do at all", modify
    label values i9d1 LABEL_I9D1
    label def LABEL_I9D1 1 "Yes", modify
    label def LABEL_I9D1 2 "No", modify
    label values i9d2 LABEL_I9D2
    label def LABEL_I9D2 1 "Mild", modify
    label def LABEL_I9D2 2 "Moderate", modify
    label values i9e1 LABEL_I9E1
    label def LABEL_I9E1 1 "Yes", modify
    label def LABEL_I9E1 2 "No", modify
    label values i9e2 LABEL_I9E2
    label def LABEL_I9E2 1 "Mild", modify
    label def LABEL_I9E2 2 "Moderate", modify
    label def LABEL_I9E2 3 "Severe", modify
    label def LABEL_I9E2 4 "Can not do at all", modify
    label values i9f1 LABEL_I9F1
    label def LABEL_I9F1 1 "Yes", modify
    label def LABEL_I9F1 2 "No", modify
    label values i9f2 LABEL_I9F2
    label def LABEL_I9F2 1 "Mild", modify
    label def LABEL_I9F2 2 "Moderate", modify
    label def LABEL_I9F2 3 "Severe", modify
    label values i9g1 LABEL_I9G1
    label def LABEL_I9G1 1 "Yes", modify
    label def LABEL_I9G1 2 "No", modify
    label values i9g2 LABEL_I9G2
    label def LABEL_I9G2 1 "Mild", modify
    label def LABEL_I9G2 2 "Moderate", modify
    label values i10a1 LABEL_I10A1
    label def LABEL_I10A1 1 "Yes", modify
    label def LABEL_I10A1 2 "No", modify
    label values i10a2 LABEL_I10A2
    label def LABEL_I10A2 2 "Moderate", modify
    label def LABEL_I10A2 3 "Severe", modify
    label values i10b1 LABEL_I10B1
    label def LABEL_I10B1 1 "Yes", modify
    label def LABEL_I10B1 2 "No", modify
    label values i10b2 LABEL_I10B2
    label def LABEL_I10B2 3 "Severe", modify
    label values i10c1 LABEL_I10C1
    label def LABEL_I10C1 1 "Yes", modify
    label def LABEL_I10C1 2 "No", modify
    label values i10c2 LABEL_I10C2
    label def LABEL_I10C2 3 "Severe", modify
    label def LABEL_I10C2 4 "Can not do at all", modify
    label values i10d1 LABEL_I10D1
    label def LABEL_I10D1 1 "Yes", modify
    label def LABEL_I10D1 2 "No", modify
    label values i10d2 LABEL_I10C3
    label def LABEL_I10C3 2 "Moderate", modify
    label def LABEL_I10C3 4 "Can not do at all", modify
    label values i10e1 LABEL_I10E1
    label def LABEL_I10E1 1 "Yes", modify
    label def LABEL_I10E1 2 "No", modify
    label values i10e2 LABEL_I10C4
    label def LABEL_I10C4 2 "Moderate", modify
    label def LABEL_I10C4 3 "Severe", modify
    Thank you.

    Best regards,

    DL

  • #2
    In fact, I have read instructions from some papers on how to construct disability scores using IRT, but they still confuse me and I do not know how to start.
    If you are using Stata 14 or later, have you looked at the Stata Item Response Theory Reference Manual PDF included with your Stata installation and accessible from Stata's Help menu?



    • #3
      Originally posted by William Lisowski View Post

      If you are using Stata 14 or later, have you looked at the Stata Item Response Theory Reference Manual PDF included with your Stata installation and accessible from Stata's Help menu?
      Dear William,

      Thank you for your response. Yes, I have read some of the manuals you mentioned, but they seem abstract to me. Do you have any more detailed hints?

      Best regards,

      Dung Le



      • #4
        Dung,

        It is hard to teach the whole concept of IRT from scratch on a forum. If you aren't familiar with the concept of Guttman scaling, I would suggest familiarizing yourself with that. If you are talking about physical function, then in older adults, the Activities of Daily Living items formulated by Katz arguably form a Guttman scale. For example, bathing and dressing are typically the first ADLs that older adults lose function in, followed by transferring, with eating last. If someone can't eat without assistance, it's a very good bet that they also can't bathe or dress without assistance. But being unable to bathe or dress without assistance is no guarantee that they can't eat without assistance. Math ability is one trait that probably exhibits scalability like this, and the social distance scale in the Wikipedia article probably does as well.

        I think that originally, Guttman conceived of Guttman scales as non-probabilistic - e.g. if you can't eat without assistance, then you definitely can't bathe or dress. IRT requires some level of scalability, but it conceives of things probabilistically (as I phrased it in the paragraph above). Just grasp the concept of Guttman scaling.

        I would assume you understand the concept of dimensionality - ideally, a concept should be as unidimensional as possible before it can be measured by IRT (at least, by unidimensional IRT models; multidimensional IRT is a big step up in difficulty). If you're not familiar with this concept, you should read up on it. For example, mood is generally considered to have at least two dimensions - negative and positive affect. Depression is generally considered to have at least two dimensions as well - those being negative affect and somatic symptoms (i.e. physical symptoms, like tiredness or agitation).

        From there, familiarize yourself with logistic and ordered logistic regression. Here, we predict the log odds of having a 1 vs a 0 (or having a 1 or higher vs a 0, having a 2 or higher vs 1 or lower, etc for ordered logistic) from observed covariates. We basically say that the log odds of P(Y | X) = XB. We're trying to predict P for each person given covariates.
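
        A minimal sketch of this in Stata may help (it uses the fullauto example dataset shipped with Stata rather than your disability data, so the variable names here come from that dataset, not from your survey):

        ```stata
        * Ordered logistic regression: model the log odds of being in each
        * higher category of the 1977 repair record from observed covariates
        webuse fullauto, clear
        ologit rep77 foreign length mpg
        * predicted probability of each of the five rep77 categories
        predict p1 p2 p3 p4 p5, pr
        ```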

        Then, be familiar with the concept of latent variables. We can measure things like blood pressure, or blood glucose, or healthcare spending directly. We can not measure depression directly. But we can estimate the level of depression from the observed indicators - if someone has a certain pattern of responses to a depression questionnaire, we can estimate how depressed they are (if the instrument is scalable and if it is unidimensional enough). It is like a logistic regression, but in the right hand side of the equation, we don't get to directly observe one of the Xs - i.e. the latent trait of depression. At this point, it will probably be better to start working through one of Stata's examples in the IRT manual.
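
        As a concrete starting point, the first worked example in the [IRT] manual fits a one-parameter (Rasch-type) model to nine binary test items; the masc1 dataset and the q1-q9 items below come from that documentation:

        ```stata
        * One-parameter logistic model: all items share one discrimination
        webuse masc1, clear
        irt 1pl q1-q9
        estat report           // item difficulty estimates
        predict theta, latent  // each person's estimated latent trait
        ```
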
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



        • #5
          Dear Weiwen Ng,

          Thank you so much for your informative and concise explanation of the concepts and examples. In general, I have some basic knowledge of those concepts. My problem is that I do not know where to start with Stata. You have given good advice; I think I will work through Stata's examples to get used to IRT models first and then try to apply them to my data.

          Best regards,

          Dung Le



          • #6
            Dung,

            When you finish reading up on the Stata examples, here is some code that will make processing your example data a bit easier. It looks like you have 12 items here, but they are asked in a two-part format: a) do you have any difficulty doing X? Then, if yes, b) how hard is it to do X? In this case, it would probably be easier if you just coded the b) series questions with 0s if the corresponding a) question is a no.

            Given your data example, you can simply type:

            Code:
            recode i9?2 (. = 0)
            recode i10?2 (. = 0)
            irt grm i9?2
            irt grm i10?2
            ? is one, and only one, wildcard character. Then, assuming that i9 and i10 each correspond to a different unidimensional trait, you are ready to run a graded response model (or a generalized partial credit model; they are both suitable for ordered responses, and both allow each item to have its own discrimination parameter, but I honestly can't tell if one is recommended over the other).
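
            One defensible way to choose between the two is to fit both and compare information criteria; this sketch assumes the recode lines above have been run on the full dataset (the 15-row example is likely too small for either model to converge):

            ```stata
            * Fit both candidate models on the i9 items and compare AIC/BIC
            irt grm i9?2
            estimates store grm
            irt gpcm i9?2
            estimates store gpcm
            estimates stats grm gpcm
            ```
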



            • #7
              Dear Weiwen Ng,

              Thank you so much for your suggestions and example code. I have been away for a while and have just gotten back to working with IRT. I will try to follow your instructions and will get back to you soon.

              Best regards,

              Dung Le



              • #8
                Originally posted by Weiwen Ng View Post
                Dear @Weiwen Ng,

                Based on your instructions, I have created raw disability scores. However, I have spent time searching for references on using a chi-squared test to assess the model's goodness of fit and on converting the raw scores to a continuous cardinal scale (0, worst health; 100, best health), but I could not find any. Do you have clues for these two issues? The following are my example data and the code to run partial credit models using the irt command.

                Data
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input long qid float(k1 k2 k3 k4 k5 k6 k7 k8 k9 k10 k11 k12)
                 11 0 0 0 0 0 0 0 0 0 0 0 0
                 13 0 0 0 0 0 0 0 0 0 0 0 0
                 15 3 3 4 2 4 3 2 2 3 4 4 2
                 16 0 0 0 0 0 0 0 0 0 0 0 0
                 18 0 0 2 0 2 2 0 0 0 0 0 0
                 19 2 3 4 0 1 1 0 0 0 0 0 0
                110 0 2 1 1 0 1 0 0 0 0 0 0
                111 0 0 1 0 2 1 1 0 0 0 0 0
                112 4 2 4 0 2 2 0 3 0 3 2 3
                113 0 0 3 1 2 0 0 0 0 0 0 0
                114 3 3 2 0 3 1 0 0 0 0 0 0
                115 0 0 0 0 0 0 0 0 0 0 0 0
                119 3 3 3 2 2 2 3 3 1 2 2 0
                120 1 0 0 0 0 2 0 0 0 0 0 0
                122 2 2 2 0 2 2 0 2 0 0 0 2
                end
                label values k1 k1
                label def k1 0 "No difficulty", modify
                label def k1 1 "Mild", modify
                label def k1 2 "Moderate", modify
                label def k1 3 "Severe", modify
                label def k1 4 "Can not do at all", modify
                label values k2 k2
                label def k2 0 "No difficulty", modify
                label def k2 2 "Moderate", modify
                label def k2 3 "Severe", modify
                label values k3 k3
                label def k3 0 "No difficulty", modify
                label def k3 1 "Mild", modify
                label def k3 2 "Moderate", modify
                label def k3 3 "Severe", modify
                label def k3 4 "Can not do at all", modify
                label values k4 k4
                label def k4 0 "No difficulty", modify
                label def k4 1 "Mild", modify
                label def k4 2 "Moderate", modify
                label values k5 k5
                label def k5 0 "No difficulty", modify
                label def k5 1 "Mild", modify
                label def k5 2 "Moderate", modify
                label def k5 3 "Severe", modify
                label def k5 4 "Can not do at all", modify
                label values k6 k6
                label def k6 0 "No difficulty", modify
                label def k6 1 "Mild", modify
                label def k6 2 "Moderate", modify
                label def k6 3 "Severe", modify
                label values k7 k7
                label def k7 0 "No difficulty", modify
                label def k7 1 "Mild", modify
                label def k7 2 "Moderate", modify
                label def k7 3 "Severe", modify
                label values k8 k8
                label def k8 0 "No difficulty", modify
                label def k8 2 "Moderate", modify
                label def k8 3 "Severe", modify
                label values k9 k9
                label def k9 0 "No difficulty", modify
                label def k9 1 "Mild", modify
                label def k9 3 "Severe", modify
                label values k10 k10
                label def k10 0 "No difficulty", modify
                label def k10 2 "Moderate", modify
                label def k10 3 "Severe", modify
                label def k10 4 "Can not do at all", modify
                label values k11 k11
                label def k11 0 "No difficulty", modify
                label def k11 2 "Moderate", modify
                label def k11 4 "Can not do at all", modify
                label values k12 k12
                label def k12 0 "No difficulty", modify
                label def k12 2 "Moderate", modify
                label def k12 3 "Severe", modify
                My codes
                Code:
                irt pcm k1-k12
                estat report, byparm
                predict score_pcm, latent se(score_pcm_se)
                Apart from this, I also use irtgraph icc (item characteristic curve), iif (item information function), and tif (test information function) to examine the item curves. For example, for variable k1:
                Code:
                irtgraph icc k1, blocation
                irtgraph icc k1, xlabel(-4 .9655477  -.1022154 .7842295 1.425207 4, grid)
                Thank you.

                DL



                • #9
                  Dung,

                  I am not that familiar with goodness of fit statistics for IRT. IRT models are a subclass of generalized structural equation models (the one you chose is built on a multinomial response model). In general, I'm not aware of any absolute goodness of fit statistics for generalized SEMs (they are defined for linear SEMs, but you are not using this class of model). Alternatively, people have defined person- and item-level residual statistics for IRT models, called infit and outfit statistics. One of our members wrote a Stata program to do this, but it actually calls Java. I'm also not sure if those statistics apply only to Rasch models (which are a subset of all IRT models).

                  wbuchanan can you clarify if infit and outfit residuals apply to the partial credit model that Dung chose? And, side note, would they apply outside of Rasch models (e.g. in the graded response model)?

                  Back on topic: relative fit statistics (e.g., AIC and BIC) are always calculated by Stata and can be used to compare models.

                  Then, you asked about converting the scores to something on a cardinal scale. I now realize that I'm not actually 100% sure what that means. In principle, I believe the latent trait measured by IRT is continuous and estimated on an interval metric, but I think cardinal means something else entirely. If I remember right, the numbers that come out of utility-weighted instruments like the EuroQol-5 or the SF-6D have cardinal properties, such that there is a meaningful zero point.

                  In theory, the trait you estimated has a mean of 0 and a standard deviation of 1. The minimum level of the trait possible to measure on whatever your scale is probably does not correspond to 0 on something like a utility scale. If this is what you were trying to do, this is impossible. In theory, I suppose you would want to go and create utility weights for the scale by presenting people with standard gambles or time tradeoffs and eliciting their expected utility - assuming you buy the fundamental assumptions of that type of exercise, and I am not entirely sure I do. Either way, Stata can't help you do that. Some more reading on the definition of ordinal vs cardinal in the context of utility estimation is here.

                  If you just want to transform scores to something like the T-score metric, which has a mean of 50 and a standard deviation of 10, then my intuition says to take the latent trait, multiply by 10, and add 50. If someone has theta = 2, we know they're 2 standard deviations above the mean of theta (because we assume, for better or worse, that theta has a standard normal distribution, i.e. mean 0, variance 1, standard deviation = sqrt(variance) = 1). If we multiply by 10, then their theta = 20. If we add 50, then theta = 70. If someone has theta 0, then on this transformation, their score is now 50.

                  Mathematically, we know that if X is a random variable and k is a constant, then the variance of kX is (k^2) times the variance of X. We multiplied theta, a standard normal variable, by 10, so its variance should now be 10^2 = 100. And because the SD is the square root of the variance, we get an SD of 10. Merely adding a constant to a random variable does not change its variance.

                  Note that a) I have no idea how this affects the standard error of theta, which Stata can otherwise estimate for you and b) this is based on my intuition, but I've never had to convert any IRT score to a T-score before, so this could be wrong.
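
                  The transformation described above is only a few lines of Stata; this sketch assumes score_pcm and score_pcm_se exist from the predict command in #8, and the line rescaling the standard error relies on the fact that a linear transformation scales a standard error by the same constant:

                  ```stata
                  * T-score metric: mean 0, SD 1  ->  mean 50, SD 10
                  gen tscore = 50 + 10*score_pcm
                  * the SE of 50 + 10*theta is 10 times the SE of theta
                  gen tscore_se = 10*score_pcm_se
                  summarize tscore tscore_se
                  ```
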



                  • #10
                    Dear Weiwen Ng,

                    Thank you so much for your informative explanations and intuition regarding my issues. What I am trying to do is based on the WHO and World Bank's guidance on disability scoring, which seems to be becoming the measurement standard for disability in social and health research, especially for comparative studies (e.g., comparing one country to another, or multiple countries, using disability as a dependent variable).

                    However, the guidance and the related studies that used the same techniques do not provide more information than what I have described here. The guidance can be found from page 289 onward, in technical appendix C of the World Report on Disability: http://www.who.int/disabilities/worl...011/report.pdf.

                    Basically, I think the reason they convert the raw scores to a continuous cardinal scale (0, worst health; 100, best health, or vice versa) is to determine a threshold for disability in the next step (e.g., persons with scores of 0-40 would be classified as having a disability, and so on). By the next step, I mean that after converting the raw scores, they take chronic diseases into account to determine the disability threshold, under the strong assumption that disability and chronic diseases are strongly related.

                    Regarding the goodness-of-fit test, aside from the chi-squared test, I have seen some technical papers using pcmtest or gllamm for tests of fit. I will try to apply those techniques to my data.

                    pcmodel: https://ideas.repec.org/a/tsj/stataj...2p464-481.html
                    gllamm: https://www.stata-journal.com/sjpdf....iclenum=st0129

                    Best regards,

                    DL



                    • #11
                      Originally posted by Dung Le View Post
                      Dung,

                      I don't have time to read the WHO technical appendix in detail, but it seems likely that they did a simple transformation to T-scores like I suggested. The latent trait they're estimating is disability, using the items outlined in C1. When I read the items, I would expect most people to report only mild limitations on the 'easier' items. For example, I'm 38, and I would report mild difficulty on bodily aches and pains, concentrating/remembering, and distance vision, followed by no limitations on anything else. In figure C1, nearly everyone reports an IRT disability score of less than 50. (i.e. the mean is driven up by some people with very high disability.) They may have centered the T-score metric on something other than 50, or they may have centered it at 50, but this looks pretty reasonable for something like a T-score on a trait that should have a very, very skewed distribution.

                      Side note: this article has some discussion on alternatives to the traditional standard normal latent trait distribution that IRT typically assumes.

                      Reise, S., Rodriguez, A., Spritzer, K.L., Hays, R.D. (2017) Alternative Approaches to Addressing Non-Normal Distributions in the Application of IRT Models to Personality Measures. Journal of Personality Assessment.

                      (Not a full citation, I haven't entered it into my citation manager yet.)


                      I can't comment on other goodness of fit statistics. I hope that someone else can.



                      • #12
                        Dear Weiwen Ng,

                        Thank you for your responses and the reference paper. I will try some of these approaches and see how far I can get. I will get back to you once I have a final solution.

                        Cheers.

                        DL



                        • #13
                          Weiwen Ng
                          infit/outfit, to the best of my knowledge, are specific to Rasch-derived models. There may be comparable measures that have been developed for other IRT models, but I've not come across any. Broader goodness-of-fit testing is typically a matter of determining which model from the IRT world fits the data best (or, in the Rasch world, confirming that the data fit the model). In Stata, you can use likelihood-ratio tests after fitting models to test whether freeing additional parameters improves the fit of the model to the data. For item-specific fit statistics, I'm not aware of any. There are, however, some goodness-of-fit measures for SEMs that use ordinal, nominal, count, and survival data, but they are not currently implemented in Stata. Mplus definitely implements some of these GoF measures, and I wouldn't be surprised if EQS did as well.
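
                          The likelihood-ratio comparison mentioned above might look like this (a sketch using the k1-k12 items from #8; the partial credit model is nested in the generalized partial credit model, which frees the discrimination parameters):

                          ```stata
                          * Does freeing the discrimination parameters improve fit?
                          irt pcm k1-k12
                          estimates store pcm
                          irt gpcm k1-k12
                          estimates store gpcm
                          lrtest pcm gpcm   // H0: the equal-discrimination PCM is adequate
                          ```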



                          • #14
                            Originally posted by wbuchanan View Post
                            Got it. Dung chose to use Masters's partial credit model, which is a Rasch model (unless I read wrongly), and which constrains all items' discrimination parameters to be equal. Is the generalized partial credit model a Rasch model?



                            • #15
                              Weiwen Ng
                              To the best of my knowledge there aren't any official Stata commands for fitting any type of Rasch model. It isn't just that the discrimination parameters are constrained to be equal, but that they are specifically constrained to equal 1 for a Rasch model.

