Random-effects model: repeated values of within-group variables for each group

Vincent Li

Join Date: Dec 2016
Posts: 63

23 Mar 2026, 06:42

Hi! Thank you for your attention.

I am currently working on a dataset involving 61 individuals who rated 301 pieces of content. This content was provided by either children with special educational needs (SEN) or those without, and it can be categorized into three groups: A, B, and C.
I aim to investigate whether the experience of interacting with SEN children affects the ratings of SEN versus non-SEN content and how it varies among the different categories. To achieve this, I have identified five variables:
id: Identifies the 61 individuals (raters).
exp: Indicates whether the individual has experience (dummy variable).
sen: Shows whether the content was provided by SEN children (dummy variable).
type: Categorizes the content into three types (0, 1, 2).
ratingscore: The rating score, ranging from 1 to 5.
Here is an example of the data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long id byte(exp ratingscore sen type)
2025071 0 4 0 0
2025070 0 1 0 0
2025069 0 1 0 0
2025068 1 1 0 0
2025067 1 2 0 0
2025066 0 1 0 0
2025065 1 1 0 0
2025063 0 2 0 0
2025062 0 1 0 0
2025061 1 1 0 0
2025060 1 1 0 0
2025059 0 2 0 0
2025058 0 1 0 0
2025057 0 1 0 0
2025056 0 1 0 0
2025055 1 1 0 0
2025054 1 1 0 0
2025053 0 2 0 0
2025052 1 2 0 0
2025051 1 1 0 0
2025050 0 1 0 0
2025049 0 1 0 0
2025048 0 1 0 0
2025047 1 1 0 0
2025045 0 1 0 0
2025044 0 5 0 0
2025043 1 1 0 0
2025042 1 1 0 0
2025041 0 2 0 0
2025039 1 1 0 0
2025038 0 4 0 0
2025036 1 1 0 0
2025034 1 1 0 0
2025033 0 1 0 0
2025030 0 1 0 0
2025029 0 1 0 0
2025027 0 1 0 0
2025026 0 1 0 0
2025025 1 1 0 0
2025024 0 1 0 0
2025023 0 2 0 0
2025022 0 1 0 0
2025021 0 1 0 0
2025020 1 1 0 0
2025019 1 1 0 0
2025018 1 1 0 0
2025017 0 1 0 0
2025016 1 1 0 0
2025015 1 1 0 0
2025014 0 1 0 0
2025012 1 1 0 0
2025011 0 3 0 0
2025010 0 1 0 0
2025009 0 1 0 0
2025008 1 5 0 0
2025007 1 1 0 0
2025005 0 1 0 0
2025004 0 1 0 0
2025003 0 1 0 0
2025002 0 1 0 0
2025001 0 1 0 0
2025071 0 2 0 0
2025070 0 5 0 0
2025069 0 4 0 0
2025068 1 5 0 0
2025067 1 2 0 0
2025066 0 5 0 0
2025065 1 5 0 0
2025063 0 4 0 0
2025062 0 3 0 0
2025061 1 3 0 0
2025060 1 1 0 0
2025059 0 4 0 0
2025058 0 3 0 0
2025057 0 3 0 0
2025056 0 2 0 0
2025055 1 5 0 0
2025054 1 4 0 0
2025053 0 4 0 0
2025052 1 3 0 0
2025051 1 4 0 0
2025050 0 3 0 0
2025049 0 4 0 0
2025048 0 2 0 0
2025047 1 3 0 0
2025045 0 3 0 0
2025044 0 4 0 0
2025043 1 3 0 0
2025042 1 1 0 0
2025041 0 2 0 0
2025039 1 1 0 0
2025038 0 3 0 0
2025036 1 5 0 0
2025034 1 5 0 0
2025033 0 4 0 0
2025030 0 2 0 0
2025029 0 5 0 0
2025027 0 3 0 0
2025026 0 3 0 0
2025025 1 4 0 0
end

I plan to fit a random-effects model with the following code:

Code:

xtreg ratingscore i.exp##i.sen##i.type, i(id) re vce(robust)   
margins exp#sen#type
margins sen#type, dydx(exp)

My concern is that the values of the within-group variables, sen and type, are repeated across many observations within each group (I am not sure this is the correct way to describe the issue). For instance, among the 301 observations for ID 2025001, sen may be 0 for 100 observations and 1 for 201 observations. Similarly, type could be 0 for 50 observations, 1 for 150 observations, and 2 for 101 observations. Is a random-effects model appropriate in this situation, or should I consider calculating average scores instead?
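The structure described here can be inspected directly. The following is a minimal sketch, assuming the data are in long form with one row per rater-item pair and the variable names from the -dataex- example; n_obs is a hypothetical helper variable.

```stata
* Count observations per rater (expected: 301 for each of the 61 raters)
bysort id: generate long n_obs = _N
tabulate n_obs

* Cross-tabulate the two within-rater variables for a single rater
tabulate sen type if id == 2025001

* Confirm that exp is constant within rater (a between-rater variable)
bysort id (exp): assert exp[1] == exp[_N]
```

If the assert passes silently, exp does not vary within rater, while sen and type vary across the rows of each rater.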

Thank you!


Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#2

23 Mar 2026, 10:04

Vincent:
which is the -timevar-?

Kind regards,
Carlo
(Stata 19.0)
Vincent Li

Join Date: Dec 2016

Posts: 63
#3

23 Mar 2026, 19:32

Originally posted by Carlo Lazzaro

Vincent:
which is the -timevar-?

There’s no time variable; we’re only focusing on the between-subject and within-subject variables.

Should the 301 observations serve as a time variable? Each rated item could be regarded as a time point.

What if there is no such timevar, and the command is only this:
xtreg ratingscore i.exp##i.sen##i.type, i(id) re vce(robust)
Thanks!

Last edited by Vincent Li; 23 Mar 2026, 19:55.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17854
#4

24 Mar 2026, 01:02

Vincent:
if you do not have a -timevar-, why use -xtreg- instead of -regress-?

Kind regards,
Carlo
(Stata 19.0)
Joseph Coveney

Join Date: Apr 2014

Posts: 4540
#5

24 Mar 2026, 01:15

Originally posted by Vincent Li

. . . 61 individuals who rated 301 pieces of content. This content was provided by either children with special educational needs (SEN) or those without. . . For instance, among the 301 observations for ID 2025001 . .

It reads as if 61 raters scored each of 301 children's contents. So the same set of children? If so, then your dataset ought to have a child ID in addition to the rater ID, and you might be better off fitting a cross-classified random effects model.

Also, a linear model might not be ideal for scores whose values are restricted to a limited discrete set of values (1, 2, 3, 4 and 5). Although it will take longer to converge, you might want to consider fitting an ordered-categorical regression model, maybe ultimately something like the following.

Code:

meoprobit ratingscore i.exp##i.sen##i.type || id: || cid:

(for illustration, I assigned cid as the variable name for child ID), although you might need to build up to that starting with a simpler model and checking whether one or another of the variance components collapses to zero.
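For reference, crossed (as opposed to nested) random effects in Stata's mixed-effects commands are usually written with the `_all: R.varname` notation. Below is a minimal sketch of that notation, assuming a child identifier exists; cid remains a hypothetical variable name that is not in the posted data.

```stata
* Crossed random intercepts for raters (id) and children (cid).
* cid is a hypothetical variable name; it is not in the posted data.
* Placing the factor with fewer levels (the 61 raters) in R. keeps
* the dimension of the estimation problem smaller.
meoprobit ratingscore i.exp##i.sen##i.type || _all: R.id || cid:
```

By contrast, Stata reads `|| id: || cid:` as cid nested within id, so the two specifications are not interchangeable when every rater scores the same set of children.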
Vincent Li

Join Date: Dec 2016

Posts: 63
#6

24 Mar 2026, 01:35

Originally posted by Carlo Lazzaro

Vincent:
if you do not have a -timevar-, why use -xtreg- instead of -regress-?

Carlo:
Since each participant rated 301 identical-format items, I would like to treat the ratings as 301 repeated measures per participant. The variables are then id, rating score (varies across individuals and across items), experience (between-subject variable), SEN (within-subject variable), and type of content (within-subject variable), and I want to fit a two-level model:
xtreg ratingscore i.exp##i.sen##i.type, i(id) re vce(robust)
Does that make sense?
Vincent Li

Join Date: Dec 2016

Posts: 63
#7

24 Mar 2026, 01:43

Originally posted by Joseph Coveney

It reads as if 61 raters scored each of 301 children's contents. So the same set of children? If so, then your dataset ought to have a child ID in addition to the rater ID, and you might be better off fitting a cross-classified random effects model.

Also, a linear model might not be ideal for scores whose values are restricted to a limited discrete set of values (1, 2, 3, 4 and 5). Although it will take longer to converge, you might want to consider fitting an ordered-categorical regression model, maybe ultimately something like the following.

Code:

meoprobit ratingscore i.exp##i.sen##i.type || id: || cid:

(for illustration, I assigned cid as the variable name for child ID), although you might need to build up to that starting with a simpler model and checking whether one or another of the variance components collapses to zero.

Thanks Joseph.

Apologies for the confusion. To clarify, 61 raters scored 301 items, which were provided by both children with and without special educational needs (SEN). Therefore, each of the 61 raters is measured across 301 instances. However, the children's data is not relevant to the analysis; I mentioned it solely to explain the source of the SEN variable (0,1). It is considered just a characteristic of the content rated by the 61 raters. Thus, there are still five variables with two levels (both between and within subjects):
ID: Represents the 61 individuals.
exp: Indicates whether the individual has experience (dummy variable).
sen: Shows whether the content is classified as SEN (dummy variable).
type: Categorizes the content into three types (0, 1, 2).
rating scores: Ranges from 1 to 5.

Thanks for the suggestion on the ratingscore. I'll try meoprobit. So the code could be like this?
meoprobit ratingscore i.exp##i.sen##i.type || id:
Joseph Coveney

Join Date: Apr 2014

Posts: 4540
#8

24 Mar 2026, 02:07

Originally posted by Vincent Li

. . . the children's data is not relevant to the analysis . . . It is considered just a characteristic of the content rated by the 61 raters.

Not sure about that: wouldn't individual characteristics of the child (even if not recorded in your dataset or even manifestly evident) contribute to characteristics of the content that the child generates, which in turn affect the rater's score?

Take a look at the variance components of the simplest model

Code:

meoprobit ratingscore || id: || cid:

and see whether the child's individual (latent) contribution to the contents that raters score can be safely ignored.
Vincent Li

Join Date: Dec 2016

Posts: 63
#9

24 Mar 2026, 02:46

Originally posted by Joseph Coveney

Not sure about that: wouldn't individual characteristics of the child (even if not recorded in your dataset or even manifestly evident) contribute to characteristics of the content that the child generates, which in turn affect the rater's score?

Take a look at the variance components of the simplest model

Code:

meoprobit ratingscore || id: || cid:

and see whether the child's individual (latent) contribution to the contents that raters score can be safely ignored.

Thanks Joseph. I completely understand your concern and believe it's reasonable. However, no information on the children (not even an ID) is included in this dataset at this stage. We'd like to explore this in the future to see how the raters' and children's characteristics contribute to the rating scores.

Would you mind if we return to the initial questions?
If the rating scores are treated as continuous, is it appropriate to use -xtreg-? Is it necessary to include the item number (1 to 301) as a 'timevar'?
Or should I use -mixed- instead? I believe the underlying logic of -mixed- and -xtreg- is similar.

By the way, the -margins- takes a hundred years to run after -meoprobit-....
Erik Ruzek

Join Date: Oct 2017

Posts: 461
#10

24 Mar 2026, 09:44

There is no time variable in your study as it is presently constructed. You can use either the xt commands or mixed/me commands for the analyses. You need to use mixed or me when you are estimating more than two levels of nesting and/or you want to estimate a random slope such that a lower level variable's slope is allowed to vary across higher level units.
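As a sketch of that equivalence for the linear case (assuming ratingscore is treated as continuous), the commands below fit a two-level random-intercept model without any time variable. The last two are both maximum likelihood fits and should agree closely on the coefficients.

```stata
* Declare the panel (grouping) variable only; no time variable is needed
xtset id

* GLS random-effects estimator with cluster-robust standard errors
xtreg ratingscore i.exp##i.sen##i.type, re vce(robust)

* Maximum likelihood random-effects estimator
xtreg ratingscore i.exp##i.sen##i.type, mle

* The same random-intercept model fit with -mixed- (ML by default)
mixed ratingscore i.exp##i.sen##i.type || id:
```

Note that -xtreg- reports the random-intercept standard deviation (sigma_u), whereas -mixed- reports the variance, so the variance components need to be squared or square-rooted before comparing them.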

Speaking of which, you are interacting two lower-level variables with a higher-level variable (exp). This is referred to as a cross-level interaction in multilevel modeling and these types of interactions need special care. I suggest you look at Heisig & Schaeffer's 2019 paper on the topic. In it, they show that the test of the significance of the cross-level interaction is biased when you do not estimate the slope of the lower-level variable involved as randomly varying across higher level groups.

Practically, you would need to estimate two random slopes, one for each of your lower-level variables. I would probably start by estimating them separately. You may find that there is almost no slope variance to speak of, in which case you can treat the slope as fixed/non-varying across clusters (your current model). If one or both have non-trivial slope heterogeneity (use likelihood ratio tests to help determine this), then you should include them in the model along with the interactions.

Code:

* Sequence of testing whether slope heterogeneity is present
* (eststo is from the community-contributed estout package;
*  the built-in equivalent is -estimates store-)
meoprobit ratingscore i.exp i.sen i.type || id:
eststo m0
* Random slope + intercept-slope covariance
meoprobit ratingscore i.exp i.sen i.type || id: sen, cov(unstructured)
eststo m1
* LR test, note that you need to divide the p-value by 2 because the null hypothesis
* is on the boundary of the parameter space (variance components cannot be negative)
* A significant p-value (after dividing by 2) would indicate that there is slope heterogeneity
lrtest m1 m0, stats
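The same sequence can be sketched for type. Because type has three categories, a common approach is to create indicator variables by hand for the random-effects equation. The names type1, type2, t0 and t1 below are illustrative assumptions, not part of the original post.

```stata
* Indicator variables for the two non-reference categories of type
* (type1 and type2 are hypothetical names)
generate byte type1 = (type == 1) if !missing(type)
generate byte type2 = (type == 2) if !missing(type)

* Random intercept only
meoprobit ratingscore i.exp i.sen i.type || id:
estimates store t0

* Random intercept plus random slopes for both type indicators
meoprobit ratingscore i.exp i.sen i.type || id: type1 type2, covariance(unstructured)
estimates store t1

* LR test of the added variance and covariance parameters; the null
* is on the boundary, so the reported p-value is conservative
lrtest t1 t0
```

An unstructured covariance with three random effects adds five parameters over the random-intercept model, so this fit may be slow; starting with covariance(independent) is a simpler first step.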
Vincent Li

Join Date: Dec 2016

Posts: 63
#11

24 Mar 2026, 20:28

Originally posted by Erik Ruzek

There is no time variable in your study as it is presently constructed. You can use either the xt commands or mixed/me commands for the analyses. You need to use mixed or me when you are estimating more than two levels of nesting and/or you want to estimate a random slope such that a lower level variable's slope is allowed to vary across higher level units.

Speaking of which, you are interacting two lower-level variables with a higher-level variable (exp). This is referred to as a cross-level interaction in multilevel modeling and these types of interactions need special care. I suggest you look at Heisig & Schaeffer's 2019 paper on the topic. In it, they show that the test of the significance of the cross-level interaction is biased when you do not estimate the slope of the lower-level variable involved as randomly varying across higher level groups.

Practically, you would need to estimate two random slopes, one for each of your lower-level variables. I would probably start by estimating them separately. You may find that there is almost no slope variance to speak of, in which case you can treat the slope as fixed/non-varying across clusters (your current model). If one or both have non-trivial slope heterogeneity (use likelihood ratio tests to help determine this), then you should include them in the model along with the interactions.

Code:

* Sequence of testing whether slope heterogeneity is present
* (eststo is from the community-contributed estout package;
*  the built-in equivalent is -estimates store-)
meoprobit ratingscore i.exp i.sen i.type || id:
eststo m0
* Random slope + intercept-slope covariance
meoprobit ratingscore i.exp i.sen i.type || id: sen, cov(unstructured)
eststo m1
* LR test, note that you need to divide the p-value by 2 because the null hypothesis
* is on the boundary of the parameter space (variance components cannot be negative)
* A significant p-value (after dividing by 2) would indicate that there is slope heterogeneity
lrtest m1 m0, stats

Hi Erik, thank you for your response. I appreciate the detailed and helpful instructions. I will read the 2019 paper by Heisig and Schaeffer and test the slope variance. Thanks once more!