
  • Differential Item Functioning (DIF) with irt grm

    Hello all,

    I have scoured the internet for answers and downloaded several Stata packages, without success. I am hoping that someone can guide me on this.

    I am analyzing a 40-item survey instrument that uses a modified 4-point Likert scale, and I have been conducting item response theory analyses with irt grm. I need to test for differential item functioning (DIF) between two groups. I have tried the diflogistic command, but I get an error that the data must be missing, 0, or 1. I have tried uirt, but I get errors there too.

    I saw a post on this from 2018 that got no answers. Could anyone help me with conducting DIF analysis on ordinal items in Stata?

    Thank you so much in advance!

    Elle

  • #2
    You might try coding the scores from 0 to 3. Hopefully that helps.
    Best regards,

    Marcos



    • #3
      Originally posted by Marcos Almeida:
      You might try coding the scores from 0 to 3. Hopefully that helps.

      Thank you for the suggestion. Unfortunately, I still got the same error when I recoded like that.



      • #4
        Please look at this toy example and see how adding 1 to the item coding (so a 0/1 item becomes 1/2) breaks the analysis:

        Code:
        . webuse masc2
        (Data from De Boeck & Wilson (2004))
        
        . diflogistic q1-q9, group(female)
        
        Logistic Regression DIF Analysis
        
              |      Nonuniform       |      Uniform
         Item |      Chi2       Prob. |      Chi2       Prob.
        ------+-----------------------+----------------------
           q1 |      1.03      0.3092 |     13.20      0.0003
           q2 |      1.39      0.2388 |      1.80      0.1793
           q3 |      0.39      0.5316 |      6.90      0.0086
           q4 |      7.25      0.0071 |      4.89      0.0270
           q5 |      2.29      0.1300 |      5.91      0.0150
           q6 |      1.18      0.2780 |      0.43      0.5117
           q7 |      0.04      0.8352 |      2.61      0.1064
           q8 |      0.96      0.3270 |      2.24      0.1347
           q9 |      0.23      0.6285 |      2.23      0.1352
        -----------------------------------------------------
        
        . gen p1 = q1 +1
        
        . diflogistic p1 q2-q9, group(female)
        variable p1 has invalid values;
         requires item variables be coded 0, 1, or missing
        r(198);
        . tab q1 p1
        
                   |          p1
            item 3 |         1          2 |     Total
        -----------+----------------------+----------
                 0 |       579          0 |       579
                 1 |         0        921 |       921
        -----------+----------------------+----------
             Total |       579        921 |     1,500
        Note: diflogistic demands binary variables.
        Best regards,

        Marcos



        • #5
          Actually, the generalized procedure for DIF testing is covered in Raykov and Marcoulides' book on IRT in Stata, and unlike the stock commands it generalizes to graded response models, but it is a bit of a complicated affair. Conceptually, you fit your model in gsem using the multiple-group option (e.g., group(female)), starting with the discrimination and difficulty parameters constrained to be identical between the groups. Then you relax one item's discrimination parameter at a time, and repeat for the difficulty parameters. You can use likelihood ratio tests to compare each unconstrained model with the constrained model (the first one you fit). The book suggests a specific correction for multiple testing (Benjamini-Hochberg).

          Unfortunately, this process isn't automated for the graded response model. I will work up some example code. I actually have to do DIF testing for my dissertation work, but I switched to an R package I found that automates the DIF testing process.

          Note: when using gsem, the loading is the discrimination parameter, and the difficulty parameter is the intercept (cutpoint) divided by the loading. This is alluded to in the GRM methods and examples (see the part on how the GRM is parameterized; this refers to the parameterization under the hood, i.e., in the gsem syntax that the irt command calls). (For binary logistic models, it's minus the intercept divided by the loading.)
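
          As a minimal sketch of that conversion (the coefficient names below are my assumptions for this kind of model; replay your fitted model with the coeflegend option to confirm the names Stata actually uses):

          Code:
          * After something like gsem (Depression -> q1-q5, ologit), convert the
          * gsem parameterization to a GRM difficulty. The names _b[q1:cut1] and
          * _b[q1:Depression] are assumptions -- verify them with coeflegend.
          gsem, coeflegend
          nlcom _b[q1:cut1] / _b[q1:Depression]   // difficulty at q1's first cutpoint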



          • #6
            I've come up with some example data. I'm simulating five 4-point Likert items (coded 0 to 3) for two groups. The groups' means on the latent variable don't differ. Group 1's discrimination coefficients are drawn with a higher mean than group 0's. Items 1-4 share the same cutpoints across groups, but item 5 is more difficult for group 1. With this random seed, the discrimination parameters for items 3-5 differ materially between groups, but those for items 1 and 2 do not.

            The sample code is below. If you want to skip over it, the cutpoints are in matrices q1-q5b at the top of the code box, and the simulated discrimination parameters are listed at the bottom.

            Code:
            clear
            set seed 4142
            * Cutpoints for each item; items 1-4 share cutpoints across groups,
            * while item 5 uses q5a in group 0 and q5b in group 1
            matrix q1 = (-2.0, -1.0, 0)
            matrix q2 = (-1.5, -1.0, -0.5)
            matrix q3 = (-1.0, 0, 1.0)
            matrix q4 = (-0.5, 0, 0.5)
            matrix q5a = (-0.5, 0.5, 1.5)
            matrix q5b = (-0.5, 0.9, 2.1)
            * Draw discrimination parameters: d1 for group 0, d2 for group 1
            matrix d1 = J(1,5,.)
            matrix d2 = J(1,5,.)
            forvalues d = 1/5 {
                matrix d1[1,`d'] = exp(rnormal(0,0.5))
                matrix d2[1,`d'] = exp(rnormal(0.5,0.5))
                }
            
            set obs 5000
            gen group = _n > 2500
            gen depression = rnormal()
            * Generate items 1-4 from the GRM: pcut_t = P(Y >= t) = invlogit(a * (theta - b_t))
            forvalues q = 1/4 {
                tempvar pcut1 pcut2 pcut3
                forvalues t = 1/3 {
                    gen `pcut`t'' = invlogit(d1[1,`q'] * (depression - q`q'[1,`t'])) if group == 0
                    replace `pcut`t'' = invlogit(d2[1,`q'] * (depression - q`q'[1,`t'])) if group == 1
                    }
                gen q`q' = irecode(runiform(),1 - `pcut1',1 - `pcut2',1 - `pcut3')
                }
            * Item 5: cutpoints differ by group (q5a vs q5b)
            tempvar pcut1 pcut2 pcut3
            forvalues t = 1/3 {
                gen `pcut`t'' = invlogit(d1[1,5] * (depression - q5a[1,`t'])) if group == 0
                replace `pcut`t'' = invlogit(d2[1,5] * (depression - q5b[1,`t'])) if group == 1
                }
            gen q5 = irecode(runiform(),1 - `pcut1',1 - `pcut2',1 - `pcut3')
            
            mat list d1
            
            d1[1,5]
                       c1         c2         c3         c4         c5
            r1  1.7426744  1.0571136   1.554108  .92711226   .2888946
            
            mat list d2
            
            d2[1,5]
                       c1         c2         c3         c4         c5
            r1  1.6070999  1.0775936  .80350322  2.2200268  1.3544204



            • #7
              Now, here's some code to test for differences in the discrimination (i.e., loading) parameters. First, we fit a two-group model with all discrimination and difficulty parameters constrained equal, and with the mean and variance of the latent trait (which I'll call Depression; this is a fictitious example) fixed to 0 and 1 in group 0 (they're free to vary in group 1).

              One minor point: I found that you need to issue a pair of extra model statements in the reference model and in each tested model. If you don't, the discrimination parameter for q1 in group 1 gets constrained to 1; this appears to be a Stata default.

              Code:
              gsem (Depression -> q1-q5, ologit) (0: Depression -> q1@a) (1: Depression -> q1@a), group(group) mean(0: Depression@0) variance(0: Depression@1) byparm
              est store m0
              
              gsem (Depression -> q1-q5, ologit) (0: Depression -> q1@a) (1: Depression -> q1@b), group(group) mean(0: Depression@0) variance(0: Depression@1) byparm
              est store md1
              lrtest m0 md1
              
              Likelihood-ratio test                                 LR chi2(1)  =      0.02
              (Assumption: m0 nested in md1)                        Prob > chi2 =    0.8912
              So, the likelihood ratio test statistic is what I expected. Let's go ahead and test the remaining questions.

              Code:
              forval q = 2/5 {
                  quietly gsem (Depression -> q1-q5, ologit) (0: Depression -> q1@c) (1: Depression -> q1@c) (0: Depression -> q`q'@a) (1: Depression -> q`q'@b), ///
                      group(group) mean(0: Depression@0) variance(0: Depression@1) byparm
                  est store md`q'
                  lrtest m0
              }
              
              Likelihood-ratio test                                 LR chi2(1)  =      0.07
              (Assumption: m0 nested in md2)                        Prob > chi2 =    0.7933
              
              Likelihood-ratio test                                 LR chi2(1)  =      2.21
              (Assumption: m0 nested in md3)                        Prob > chi2 =    0.1371
              
              Likelihood-ratio test                                 LR chi2(1)  =      0.68
              (Assumption: m0 nested in md4)                        Prob > chi2 =    0.4109
              
              Likelihood-ratio test                                 LR chi2(1)  =      1.68
              (Assumption: m0 nested in md5)                        Prob > chi2 =    0.1954

              And that's not what I expected at all! The likelihood ratio test should reject for items 3 through 5. This is very strange, because when I fit a separate GRM to each group, I correctly recovered the simulated IRT parameters. I am also pretty sure I'm following Raykov and Marcoulides' instructions correctly (for reference, this is from their book A Course In Item Response Theory And Modeling With Stata, Stata Press, 2018, chapter 10, section 10.5).

              The instructions are: fix the reference group's latent mean and variance at 0 and 1, constrain all items' difficulty and discrimination parameters equal across groups, then release one parameter at a time, keeping all others constrained. I may need to do further research on this, e.g., bring the sample data into R.

              Testing the difficulty parameters is more difficult. I haven't been able to come up with the appropriate syntax along the lines above. I think you have to define the constraints manually; note that when you fit the reference model, the constraints are all listed in the model header, so you could copy those and then impose the appropriate subset in gsem.



              • #8
                Let me actually back up a step.

                There are two classes of methods to test for DIF. In the one I described above, you fit a multiple-group version of your IRT model. You start with all IRT parameters constrained equal between the two groups of interest. You then release parameters one by one, item by item, and conduct likelihood ratio tests. This functionality has been improved a bit in Stata 16, although you will still have to write some code if you want to automate the testing.

                The other class of methods takes advantage of the fact that under a correctly fitting IRT model, the probability of a positive/correct response, or of responding in a higher category, should depend only on the value of the latent trait. If you fit a logistic or ordered logistic model treating each question as the dependent variable and an estimate of ability as the independent variable, you should not see differences by group; if you do, you have DIF. Forgive the lack of proper equation formatting, but if you fit a logistic or ordered logistic model to any one question indexed by i, then in the absence of DIF, model 1 below holds:

                P(Y_i = 1) = invlogit{tau_0 + tau_1 * theta^hat} (model 1)

                If there's uniform DIF, you would instead see that model 2 below fits better than model 1: the log odds of endorsing (a higher category of) the item differ by a constant amount for the focal group.

                P(Y_i = 1) = invlogit{tau_0 + tau_1 * theta^hat + tau_2 * group} (model 2)

                And if there's non-uniform DIF, you'd see that model 3 fits best:

                P(Y_i = 1) = invlogit{tau_0 + tau_1 * theta^hat + tau_2 * group + tau_3 * group * theta^hat} (model 3)

                I hadn't quite registered at first that this is an alternative technique, but there you are. The above is merely a restatement of the help file for the Stata command diflogistic, which implements an automated test for DIF using binary logistic regression. It uses the sum score on all the items involved as its estimate of theta^hat. As we've observed, this command won't run on ordinal items. However, I can show you that the results are identical between home-brewing this analysis with logistic regression and using diflogistic. Note that for non-uniform DIF, diflogistic tests model 3 vs model 2 (my numbering differs from the manual's); for uniform DIF, it tests model 2 vs model 1. Let's demonstrate this using question 1 in a sample Stata dataset. (Note that difmh is a slightly different test, which I'm not addressing.)

                Code:
                webuse masc2
                
                diflogistic q?, group(female)
                Logistic Regression DIF Analysis
                
                      |      Nonuniform       |      Uniform
                 Item |      Chi2       Prob. |      Chi2       Prob.
                ------+-----------------------+----------------------
                   q1 |      1.03      0.3092 |     13.20      0.0003
                   q2 |      1.39      0.2388 |      1.80      0.1793
                   q3 |      0.39      0.5316 |      6.90      0.0086
                   q4 |      7.25      0.0071 |      4.89      0.0270
                   q5 |      2.29      0.1300 |      5.91      0.0150
                   q6 |      1.18      0.2780 |      0.43      0.5117
                   q7 |      0.04      0.8352 |      2.61      0.1064
                   q8 |      0.96      0.3270 |      2.24      0.1347
                   q9 |      0.23      0.6285 |      2.23      0.1352
                -----------------------------------------------------
                
                egen sum = rowtotal(q?)
                quietly logit q1 sum
                est store q1_base
                quietly logit q1 sum i.female
                est store q1_unif
                quietly logit q1 c.sum##i.female
                est store q1_nonunif
                
                lrtest q1_nonunif q1_unif
                Likelihood-ratio test                                 LR chi2(1)  =      1.03
                (Assumption: q1_unif nested in q1_nonunif)            Prob > chi2 =    0.3092
                
                lrtest q1_unif q1_base
                Likelihood-ratio test                                 LR chi2(1)  =     13.20
                (Assumption: q1_base nested in q1_unif)               Prob > chi2 =    0.0003
                The chi-square values and p-values from our manual LR tests look familiar, don't they?

                Thus, to use this method, the original poster simply needs to use ordinal logistic regression. Further, the authors of the R package lordif argue, in effect: since you already buy the assumptions of IRT, why not use the IRT estimate of theta^hat instead of the sum score? It should be a more accurate estimator of theta. You can do either.

                Thomas Frissen, I'm pinging you because you had this question as well.

                SEM example 36 depicts a MIMIC (multiple indicators, multiple causes) model with generalized responses. A MIMIC model is basically explanatory IRT, but that's a topic for another day. Let's investigate DIF in that dataset by sex (female is coded 1).

                Code:
                use http://www.stata-press.com/data/r15/gsem_issp93
                /*Fit an IRT model, predict the value of the latent trait, and
                  calculate the sum score; I'm using the former below*/
                irt grm y?
                predict theta_irt, latent
                egen theta_sum = rowtotal(y?)
                
                /*Loop through items*/
                foreach v of varlist y? {
                    *Model 1
                    qui ologit `v' theta_irt
                    est store `v'_base
                    
                    *Model 2
                    qui ologit `v' c.theta_irt i.sex
                    est store `v'_unif
                    
                    *Model 3
                    qui ologit `v' c.theta_irt##i.sex
                    est store `v'_nonunif
                    
                    lrtest `v'_unif `v'_nonunif
                    lrtest `v'_base `v'_unif
                }
                This will produce a long list of output. I'd anticipate that most people would rather send the results to Excel in some structured format; I'm omitting that for now and may get to it later.

                Also, one should probably apply a correction for multiple testing. From my reading, the Benjamini-Hochberg correction is less conservative than Bonferroni, and it gets positive mention in some IRT circles. It is quite simple to apply: put your p-values in Excel, sort them in ascending order, and implement the correction.

                Readers might want to read through the lordif manual; you won't need to know R to follow the substantive parts. One thing that caught my interest: some authors propose an effect size measure for DIF based on the change in pseudo R^2 from model to model, with 0.13 or greater counting as a moderate effect. This can help you identify DIF that is substantively important rather than merely statistically detectable. ologit returns the scalar e(r2p) after each model, and you can use that scalar to compare pseudo R^2s if you wish.
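
                As a rough sketch of that idea, continuing with the variables from the example above (the 0.13 threshold is the one mentioned in the lordif documentation):

                Code:
                * Sketch: pseudo-R^2 change as a DIF effect size for one item (y1),
                * using e(r2p), which ologit leaves behind after estimation.
                qui ologit y1 theta_irt
                local r2_base = e(r2p)
                qui ologit y1 c.theta_irt i.sex
                local r2_unif = e(r2p)
                qui ologit y1 c.theta_irt##i.sex
                local r2_nonunif = e(r2p)
                display "uniform DIF R2 change:     " %6.4f (`r2_unif' - `r2_base')
                display "non-uniform DIF R2 change: " %6.4f (`r2_nonunif' - `r2_unif')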

                The regression-based method and the multiple-group method (as discussed in my previous posts, and as implemented in Stata 16) should both be acceptable for identifying DIF. One theoretical disadvantage of the regression method as usually implemented is that the sum score is a poorer estimator of theta than the IRT estimate, but that is trivially easy to address. I'm not aware of any other theoretical advantages or disadvantages to either method; search the literature if interested. Practically, the multiple-group method will generally be slower.



                • #9
                  This code will collect the above DIF results into a matrix. It will then write that matrix to an Excel file in your working directory (type cd to see what that is).

                  Code:
                  foreach v of varlist y? {
                      *Model 1
                      qui ologit `v' theta_irt
                      est store `v'_base
                      
                      *Model 2
                      qui ologit `v' c.theta_irt i.sex
                      est store `v'_unif
                      
                      *Model 3
                      qui ologit `v' c.theta_irt##i.sex
                      est store `v'_nonunif
                      
                      matrix temp = J(1, 2, .)
                      matrix rownames temp = `v'
                      quietly lrtest `v'_unif `v'_nonunif
                      matrix temp[1, 1] = r(p)
                      quietly lrtest `v'_base `v'_unif
                      matrix temp[1, 2] = r(p)
                      matrix A = nullmat(A) \ temp
                  }
                  matrix colnames A = non-uniform uniform
                  putexcel set dif.xlsx
                  putexcel A1 = matrix(A), names nformat(.###)
                  matrix drop A
                  If you want to apply Bonferroni's correction, simply change your critical alpha from 0.05 to 0.05 / m, where m is the number of hypotheses you're testing. We tested 2 hypotheses per question above and there are 4 questions, so m = 8 and the critical value is 0.05 / 8 = 0.00625.

                  Say you want to apply the Benjamini-Hochberg correction. This involves sorting your p-values in ascending order and comparing each one to a critical value (i / m) * alpha, where i is the rank of the p-value (1 for the lowest) and m is the total number of comparisons. You could do this manually in Excel (though see the sketch after the next code block for one way to stay in Stata). You can output your p-values in a single-column list with this code:

                  Code:
                  foreach v of varlist y? {
                      *Model 1
                      qui ologit `v' theta_irt
                      est store `v'_base
                      
                      *Model 2
                      qui ologit `v' c.theta_irt i.sex
                      est store `v'_unif
                      
                      *Model 3
                      qui ologit `v' c.theta_irt##i.sex
                      est store `v'_nonunif
                      
                      matrix temp = J(2, 3, .)
                      matrix rownames temp = `v'_unif `v'_nonunif
                      quietly lrtest `v'_unif `v'_nonunif
                      matrix temp[1, 1] = r(p)
                      quietly lrtest `v'_base `v'_unif
                      matrix temp[2, 1] = r(p)
                      matrix A = nullmat(A) \ temp
                  }
                  * p-values are in the first column of A
                  matrix list A
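
                  If you'd rather not round-trip through Excel, here is one possible sketch of the Benjamini-Hochberg step-up rule directly in Stata. It assumes you've gotten the p-values into a variable named p, one observation per test; that variable name and the alpha of 0.05 are my assumptions:

                  Code:
                  * Sketch: Benjamini-Hochberg step-up procedure on a variable p.
                  * Reject all hypotheses up to the largest rank i with p_(i) <= (i/m)*alpha.
                  sort p
                  gen rank = _n
                  gen bh_crit = (rank / _N) * 0.05      // (i/m) * alpha
                  gen below = p <= bh_crit
                  gsort -below -rank                    // largest qualifying rank first
                  local k = cond(below[1] == 1, rank[1], 0)
                  sort rank
                  gen bh_reject = rank <= `k'
                  list p rank bh_crit bh_reject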
                  A closing technical note: I forgot to mention in the previous post that the R package lordif will automatically collapse response categories if any category has too few respondents (the default cutoff is 5). Recall the problem of complete or quasi-complete separation in logistic regression generally; I believe that's the rationale behind the choice. I would check your data and do this manually, as sketched below.
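
                  A quick sketch of that manual check, using the variables from the earlier example (the cutoff of 5 mirrors lordif's default; the category values here are hypothetical):

                  Code:
                  * Inspect response-category counts within each group; sparse cells
                  * can cause (quasi-)separation in the ordinal logistic models.
                  tab y1 sex
                  * If, say, the top category of y1 is sparse in one group, collapse
                  * it into the adjacent category (the values 4 and 3 are illustrative):
                  recode y1 (4 = 3), gen(y1_c)
                  tab y1_c sex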



                  • #10
                    Elle originally asked this question when we had Stata 15. As you can see, I had some trouble coming up with the correct gsem syntax for this task.

                    We now have Stata 16. If you prefer the multiple-group + LR test method over the ordinal logistic regression-based method above, you can now do it fairly easily. The PDF manual doesn't show much detail for graded response models, but here's a worked example using the same stock Stata dataset as in posts #8 and #9.

                    Another way to think of it: say you used the ordinal logistic procedure above and decided there was uniform DIF for question y1. You could fit the IRT model incorporating that information, and you would also get the mean and variance of Theta for the focal group (remember that the reference group's Theta is constrained to mean 0, variance 1). Omitting some extraneous output:

                    Code:
                    use http://www.stata-press.com/data/r15/gsem_issp93
                    irt grm y?, group(sex)
                    est store base
                    
                    irt (grm y2 y3 y4) (0: grm y1, cns(a@a1)) (1: grm y1, cns(a@a1)), group(sex) noheader 
                    est store uniform_y1
                    
                    irt (grm y2 y3 y4) (0: grm y1) (1: grm y1), group(sex)
                    est store nonuniform_y1
                    
                    lrtest base uniform_y1
                    Likelihood-ratio test                                 LR chi2(4)  =      8.59
                    (Assumption: base nested in uniform_y1)               Prob > chi2 =    0.0721
                    
                    lrtest uniform_y1 nonuniform_y1
                    Likelihood-ratio test                                 LR chi2(1)  =      4.97
                    (Assumption: uniform_y1 nested in nonuniform_y1)      Prob > chi2 =    0.0257
                    One issue we see here: without correction for multiple testing, comparing the uniform-DIF model to the base model, we do not reject the null hypothesis that both models explain the data equally well, so you would prefer the base model for parsimony. However, comparing the non-uniform-DIF model to the uniform-DIF model, we would reject the null and prefer the non-uniform model. So you can come to different conclusions than with the ordinal logistic method; I don't have an easy explanation for why.

                    Anyway, not all of us have Stata 16. In theory, you should be able to perform this DIF testing using the gsem command in Stata 15, or perhaps even 14, but I'm pretty sure it involves writing the constraints manually. I have checked, and the gsem syntax differs enough from the irt commands that I get different results when I specify the model with the group option versus manually. If anyone needs this for Stata 14/15, please ask here and I'll try to investigate further, though I can't guarantee success.



                    • #11
                      Hi all

                      I would like to test item invariance using the IRT likelihood ratio (LR) approach described above by Weiwen, and I have a question about anchor item selection. I have one latent variable with 5 items; all items have 5 ordered response categories.
                      I have two questions:
                      1. Is there an anchor selection strategy you recommend to test for DIF using LR for polytomous items?
                      2. Can this be easily implemented in Stata16?
                      To run an LR test, I think anchor items need to be identified (approaches include iterative and rank-based methods; I have seen other programs, such as R and IRTLRDIF, referenced for this). The anchors are used to fit the augmented model (holding the anchor items constant and allowing the parameters of the other items to vary), which is then compared to the base model (all items held constant) using the LR test.

                      In the example above, Weiwen notes an ordinal logistic procedure that influenced the decision about which item to allow to vary. Was this a way to identify the items free to vary in the augmented model? I am a little confused, as it is also noted that the ordinal logistic test and the LR test produced different conclusions, so I'm not sure whether one should precede the other.
                      I am using Stata 16.

                      Any advice is greatly appreciated!
                      Thanks in advance for your time.
                      Jen



                      • #12
                        Originally posted by Jen Walker (quoted from #11 above)
                        I'd been meaning to reply to this for some time. This may be too late for Jen, but hopefully it helps someone.

                        There are two classes of methods for DIF identification. One is the method I outlined in post #8, which is prepackaged in diflogistic for binary items. In this method, there is no anchor: you fit an IRT model, predict theta, fit a series of (binary or ordinal) logistic models with theta as a predictor, then add group, then add an interaction, and see which model fits best.

                        The other is the likelihood-ratio-based method. Here, you start with a two-group model in which all the item parameters are constrained equal between groups. Then, item by item, you let the difficulty, and then the discrimination plus difficulty, parameters vary by group. One thing I had not considered is that when you do this, the anchor is the set of items whose parameters remain constrained equal. In the ideal case, you want to start with an anchor that you know to be free of DIF. While I'm not deeply familiar with the theory here, a 'dirty' anchor may bias your analysis, a bit like the problem with a 'dirty' instrumental variable, if you're familiar with that issue.

                        OK, how do you select an anchor that you know to be free of DIF? This may be one of those things that is difficult or impossible to know a priori, so some IRT researchers have proposed anchor selection strategies. The problem is that no automated anchor selection procedure is implemented in Stata.

                        Below is one article dealing with anchor selection.

                        https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5965509/

                        The R package mirt does implement automated anchor selection procedures, but due to lack of familiarity, I have not used them or investigated their relative merits.



                        • #13
                          This was helpful for me! Thanks!

