
  • Random-effects model: repeated values of within-group variables for each group

    Hi! Thank you for your attention.

    I am currently working on a dataset involving 61 individuals who rated 301 pieces of content. This content was provided by either children with special educational needs (SEN) or those without, and it can be categorized into three groups: A, B, and C.
    I aim to investigate whether the experience of interacting with SEN children affects the ratings of SEN versus non-SEN content and how it varies among the different categories. To achieve this, I have identified five variables:
    ID: Represents the 61 individuals.
    exp: Indicates whether the individual has experience (dummy variable).
    sen: Shows whether the content is provided by SEN children (dummy variable).
    type: Categorizes the content into three types (0, 1, 2).
    rating scores: Ranges from 1 to 5.
    Here is an example of the data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id byte(exp ratingscore sen type)
    2025071 0 4 0 0
    2025070 0 1 0 0
    2025069 0 1 0 0
    2025068 1 1 0 0
    2025067 1 2 0 0
    2025066 0 1 0 0
    2025065 1 1 0 0
    2025063 0 2 0 0
    2025062 0 1 0 0
    2025061 1 1 0 0
    2025060 1 1 0 0
    2025059 0 2 0 0
    2025058 0 1 0 0
    2025057 0 1 0 0
    2025056 0 1 0 0
    2025055 1 1 0 0
    2025054 1 1 0 0
    2025053 0 2 0 0
    2025052 1 2 0 0
    2025051 1 1 0 0
    2025050 0 1 0 0
    2025049 0 1 0 0
    2025048 0 1 0 0
    2025047 1 1 0 0
    2025045 0 1 0 0
    2025044 0 5 0 0
    2025043 1 1 0 0
    2025042 1 1 0 0
    2025041 0 2 0 0
    2025039 1 1 0 0
    2025038 0 4 0 0
    2025036 1 1 0 0
    2025034 1 1 0 0
    2025033 0 1 0 0
    2025030 0 1 0 0
    2025029 0 1 0 0
    2025027 0 1 0 0
    2025026 0 1 0 0
    2025025 1 1 0 0
    2025024 0 1 0 0
    2025023 0 2 0 0
    2025022 0 1 0 0
    2025021 0 1 0 0
    2025020 1 1 0 0
    2025019 1 1 0 0
    2025018 1 1 0 0
    2025017 0 1 0 0
    2025016 1 1 0 0
    2025015 1 1 0 0
    2025014 0 1 0 0
    2025012 1 1 0 0
    2025011 0 3 0 0
    2025010 0 1 0 0
    2025009 0 1 0 0
    2025008 1 5 0 0
    2025007 1 1 0 0
    2025005 0 1 0 0
    2025004 0 1 0 0
    2025003 0 1 0 0
    2025002 0 1 0 0
    2025001 0 1 0 0
    2025071 0 2 0 0
    2025070 0 5 0 0
    2025069 0 4 0 0
    2025068 1 5 0 0
    2025067 1 2 0 0
    2025066 0 5 0 0
    2025065 1 5 0 0
    2025063 0 4 0 0
    2025062 0 3 0 0
    2025061 1 3 0 0
    2025060 1 1 0 0
    2025059 0 4 0 0
    2025058 0 3 0 0
    2025057 0 3 0 0
    2025056 0 2 0 0
    2025055 1 5 0 0
    2025054 1 4 0 0
    2025053 0 4 0 0
    2025052 1 3 0 0
    2025051 1 4 0 0
    2025050 0 3 0 0
    2025049 0 4 0 0
    2025048 0 2 0 0
    2025047 1 3 0 0
    2025045 0 3 0 0
    2025044 0 4 0 0
    2025043 1 3 0 0
    2025042 1 1 0 0
    2025041 0 2 0 0
    2025039 1 1 0 0
    2025038 0 3 0 0
    2025036 1 5 0 0
    2025034 1 5 0 0
    2025033 0 4 0 0
    2025030 0 2 0 0
    2025029 0 5 0 0
    2025027 0 3 0 0
    2025026 0 3 0 0
    2025025 1 4 0 0
    end
    I plan to fit a random-effects model with the following code:
    Code:
    xtreg ratingscore i.exp##i.sen##i.type, i(id) re vce(robust)   
    margins exp#sen#type
    margins sen#type, dydx(exp)
    My concern is that the within-group variables, sen and type, take the same value across many observations within each group (I am not sure this is the right way to describe the issue). For instance, among the 301 observations for ID 2025001, sen may be 0 for 100 observations and 1 for 201, and type may be 0 for 50 observations, 1 for 150, and 2 for 101. Is a random-effects model appropriate in this situation, or should I compute average scores instead?

    Thank you!



  • #2
    Vincent:
    which is the -timevar-?
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Originally posted by Carlo Lazzaro View Post
      Vincent:
      which is the -timevar-?
      There’s no time variable; we’re only focusing on the between-subject and within-subject variables.

      Should the 301 observations serve as a time variable? Each rated item could be regarded as a time point.

      What if there is no such timevar, and the command is only this:
      xtreg ratingscore i.exp##i.sen##i.type, i(id) re vce(robust)
      Thanks!
      Last edited by Vincent Li; 23 Mar 2026, 19:55.



      • #4
        Vincent:
        if you do not have a -timevar-, why use -xtreg- instead of -regress-?
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Originally posted by Vincent Li View Post
          . . . 61 individuals who rated 301 pieces of content. This content was provided by either children with special educational needs (SEN) or those without. . . For instance, among the 301 observations for ID 2025001 . .
          It reads as if 61 raters scored each of 301 children's contents. So the same set of children? If so, then your dataset ought to have a child ID in addition to the rater ID, and you might be better off fitting a cross-classified random effects model.

          Also, a linear model might not be ideal for scores whose values are restricted to a limited discrete set of values (1, 2, 3, 4 and 5). Although it will take longer to converge, you might want to consider fitting an ordered-categorical regression model, maybe ultimately something like the following.
          Code:
          meoprobit ratingscore i.exp##i.sen##i.type || id: || cid:
          (for illustration, I assigned cid as the variable name for child ID), although you might need to build up to that starting with a simpler model and checking whether one or another of the variance components collapses to zero.
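
          A note on syntax, offered as a sketch rather than a definitive specification: in Stata's mixed-model commands, chaining equations as || id: || cid: declares cid as nested within id, whereas crossed (cross-classified) effects are usually written with the _all: R.varname notation. The example below assumes that no content identifier exists yet and that every rater's rows are stored in the same item order (the -dataex- listing suggests blocks of 61 raters per item); obs and item are hypothetical variable names, and item stands in for the child/content ID.

          ```stata
          * ASSUMPTION: rows for each rater appear in the same item order,
          * so the within-rater row position identifies the item.
          * "obs" and "item" are hypothetical names, not in the original data.
          generate long obs = _n
          bysort id (obs): generate int item = _n

          * Linear model with crossed random intercepts for raters and items.
          * R.id places one random intercept per rater at the _all level;
          * items then form the "nested" level, which yields crossed effects.
          mixed ratingscore i.exp##i.sen##i.type || _all: R.id || item:

          * Ordered-probit analogue. The Laplace approximation is requested
          * because crossed models are otherwise very slow to integrate.
          meoprobit ratingscore i.exp##i.sen##i.type || _all: R.id || item:, ///
              intmethod(laplace)
          ```

          Either factor can take the R. role without changing the model; putting R. on the factor with fewer levels (here the 61 raters) tends to be faster.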



          • #6
            Originally posted by Carlo Lazzaro View Post
            Vincent:
            if you do not have a -timevar-, why use -xtreg- instead of -regress-?
            Carlo:
            Since each participant rated 301 identical-format items, I would like to treat the ratings as 301 repeated measures per rater. The variables are then ID, rating score (varies across individuals and across items), experience (between-subject variable), SEN (within-subject variable), and type of content (within-subject variable), and I want to fit a two-level model:
            Code:
            xtreg ratingscore i.exp##i.sen##i.type, i(id) re vce(robust)
            Does it make sense?



            • #7
              Thanks Joseph.

              Apologies for the confusion. To clarify, 61 raters scored 301 items, which were provided by both children with and without special educational needs (SEN). Therefore, each of the 61 raters is measured across 301 instances. However, the children's data is not relevant to the analysis; I mentioned it solely to explain the source of the SEN variable (0,1). It is considered just a characteristic of the content rated by the 61 raters. Thus, there are still five variables with two levels (both between and within subjects):
              ID: Represents the 61 individuals.
              exp: Indicates whether the individual has experience (dummy variable).
              sen: Shows whether the content is classified as SEN (dummy variable).
              type: Categorizes the content into three types (0, 1, 2).
              rating scores: Ranges from 1 to 5.

              Thanks for the suggestion on the ratingscore. I'll try meoprobit. So the code could be like this?
              meoprobit ratingscore i.exp##i.sen##i.type || id:



              • #8
                Originally posted by Vincent Li View Post
                . . . the children's data is not relevant to the analysis . . . It is considered just a characteristic of the content rated by the 61 raters.
                Not sure about that: wouldn't individual characteristics of the child (even if not recorded in your dataset or even manifestly evident) contribute to characteristics of the content that the child generates, which in turn affect the rater's score?

                Take a look at the variance components of the simplest model
                Code:
                meoprobit ratingscore || id: || cid:
                and see whether the child's individual (latent) contribution to the contents that raters score can be safely ignored.



                • #9
                  Thanks Joseph. I completely understand your concern and believe it's reasonable. However, no information about the children (not even an ID) is included in this dataset at this stage. We'd like to add it in the future to see how the raters' and children's characteristics contribute to the rating scores.

                  Would you mind if we went back to the initial questions?
                  If the rating scores are treated as continuous, is it appropriate to use -xtreg-? Is it necessary to include the item number of the 301 contents as a 'timevar'?
                  Or should I use -mixed- instead? I believe the underlying logic of -mixed- and -xtreg- is similar.

                  By the way, the -margins- takes a hundred years to run after -meoprobit-....
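
                  As a rough sketch of the two points above (not a definitive recommendation): -xtset- accepts a panel variable without a time variable, and a random-intercept model can be fitted with either -xtreg- or -mixed-. For the slow -margins-, one possible workaround is to request the fixed-portion linear predictor, or the probability of a single outcome, instead of the default predictions for every outcome. The choice of outcome 5 below is purely illustrative.

                  ```stata
                  * Declare the panel structure with the rater ID only; no timevar is needed
                  xtset id

                  * Random-intercept model by GLS (the default -re- estimator)
                  xtreg ratingscore i.exp##i.sen##i.type, re vce(robust)

                  * The same random-intercept model by maximum likelihood, two ways;
                  * the fixed-effect estimates should agree closely
                  xtreg ratingscore i.exp##i.sen##i.type, mle
                  mixed ratingscore i.exp##i.sen##i.type || id:, mle

                  * Ordered-probit version with a random intercept per rater
                  meoprobit ratingscore i.exp##i.sen##i.type || id:

                  * Faster: margins on the linear predictor (fixed portion only)
                  margins exp#sen#type, predict(xb)

                  * Or restrict the probability predictions to one outcome
                  * (outcome 5 chosen only as an example)
                  margins exp#sen#type, predict(pr outcome(5))
                  ```

                  Margins on the linear predictor are on the latent probit scale, so they are not directly comparable to predicted probabilities.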



                  • #10
                    There is no time variable in your study as it is presently constructed. You can use either the xt commands or mixed/me commands for the analyses. You need to use mixed or me when you are estimating more than two levels of nesting and/or you want to estimate a random slope such that a lower level variable's slope is allowed to vary across higher level units.

                    Speaking of which, you are interacting two lower-level variables with a higher-level variable (exp). This is referred to as a cross-level interaction in multilevel modeling and these types of interactions need special care. I suggest you look at Heisig & Schaeffer's 2019 paper on the topic. In it, they show that the test of the significance of the cross-level interaction is biased when you do not estimate the slope of the lower-level variable involved as randomly varying across higher level groups.

                    Practically, you would need to estimate two random slopes, one for each of your lower-level variables. I would probably start by estimating them separately. You may find that there is almost no slope variance to speak of, in which case you can treat the slope as fixed/non-varying across clusters (your current model). If one or both have non-trivial slope heterogeneity (use likelihood ratio tests to help determine this), then you should include them in the model along with the interactions.

                    Code:
                    * Sequence of testing whether slope heterogeneity is present
                    * (eststo is part of the community-contributed estout package: ssc install estout)
                    meoprobit ratingscore i.exp i.sen i.type || id:
                    eststo m0
                    
                    * Random slope + intercept-slope covariance
                    meoprobit ratingscore i.exp i.sen i.type || id: sen, cov(unstructured)
                    eststo m1
                    
                    * LR test, note that you need to divide the p-value by 2 because the null hypothesis
                    *  is on the boundary of the parameter space (variance components cannot be negative)
                    *  A significant p-value (after dividing by 2) would indicate that there is slope heterogeneity
                    lrtest m1 m0, stats
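
                    The same sequence for the second lower-level variable, type, might look like the sketch below. Because type has three categories, its random slope is written here with two indicator variables; type1, type2 and the stored-estimate names m0 and m2 are hypothetical names introduced for illustration. The last command shows one way the final model could combine a random slope with the cross-level interactions, assuming the tests point to slope heterogeneity for sen.

                    ```stata
                    * Indicator variables for type (hypothetical names), used as random slopes
                    generate byte type1 = (type == 1) if !missing(type)
                    generate byte type2 = (type == 2) if !missing(type)

                    * Random intercept only
                    meoprobit ratingscore i.exp i.sen i.type || id:
                    estimates store m0

                    * Random slopes for the two type contrasts, with all covariances estimated
                    meoprobit ratingscore i.exp i.sen i.type || id: type1 type2, covariance(unstructured)
                    estimates store m2

                    * LR test of slope heterogeneity for type (a boundary test, so conservative)
                    lrtest m2 m0, stats

                    * If sen shows non-trivial slope heterogeneity, keep its random slope
                    * when adding the cross-level interactions with exp
                    meoprobit ratingscore i.exp##i.sen##i.type || id: sen, covariance(unstructured)
                    ```

                    The unstructured covariance with two slopes adds five parameters, so convergence may be slow; dropping to covariance(independent) is a simpler starting point.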



                    • #11
                      Hi Erik, thank you for your response. I appreciate the detailed and helpful instructions. I will read the 2019 paper by Heisig and Schaeffer and test the slope variance. Thanks once more!
