  • Predicting multi-level mixed model values using fixed + random effects for out-of-sample records

    Hello friends,

    I have fitted a mixed-effects model with three levels: countries (3 countries), participants (500 participants within each country), and questions (10 questions on different health conditions per participant). I fit the model on data from, say, 200 participants in each country to predict their responses to the questions. To assess the model's performance, I want to use the fitted model to predict responses for the remaining 300 participants in each country.

    My model is:

    Code:
    mixed response _Imo_2 _Imo_3 _Imo_4 _Imo_5 || country: _Imo_2 _Imo_3 _Imo_4 _Imo_5 || id:
    I am using country-specific random intercepts and random slopes for the severity levels of the health conditions described in the questions (5 levels: mo1, mo2, mo3, mo4, mo5; entered as the dummies _Imo_2 to _Imo_5), plus participant-specific random intercepts.

    I am interested in predicting responses using all the fixed effects (the constant and _Imo_2 to _Imo_5) plus the country-specific random intercepts and random slopes, but ignoring the participant-specific intercepts.
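
For readers following along, the model and the requested prediction can be written out explicitly. The notation is assumed here for illustration and does not appear in the original post: c indexes country, j participant, and k question; D_m is the dummy for severity level m (the variable _Imo_m); u denotes country-level and v participant-level random effects.

```latex
% Three-level model as described in the post (notation assumed for illustration)
\[
y_{cjk} = \beta_0 + \sum_{m=2}^{5} \beta_m D_{m,cjk}
        + u_{0c} + \sum_{m=2}^{5} u_{mc} D_{m,cjk}
        + v_{cj} + \varepsilon_{cjk}
\]

% Requested prediction: fixed effects plus country-level random effects,
% with the participant-level intercept v_{cj} left out (set to zero)
\[
\hat{y}_{cjk} = \hat{\beta}_0 + \sum_{m=2}^{5} \hat{\beta}_m D_{m,cjk}
              + \hat{u}_{0c} + \sum_{m=2}^{5} \hat{u}_{mc} D_{m,cjk}
\]
```

Every term in the second expression depends only on the country and the severity level, so in principle it can be evaluated for participants who were not used to fit the model, as long as their country was.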

    I tried the following command in Stata to predict the responses, but it only produces predictions for the 200 participants per country that were used to fit the model. I want predicted values for the remaining 300 participants as well.

    Code:
    predict response_hat, fitted relevel(country)
    I would appreciate it if anyone knows how I can do this. Thanks in advance.

    Mihir
    Last edited by Mihir Gandhi; 15 Mar 2018, 03:50.

  • #2
    You didn't get a quick answer. You'll increase your chances of a helpful answer by following the FAQ on asking questions: provide Stata code (which you do), readable Stata output, and sample data using dataex. If we could duplicate your results on a small subsample, someone might be motivated to delve into solving your problem. As presented, I'm not sure exactly how your data are set up, for example. I'm not a user of mixed, so I can't tell you specifically about this routine.

    Often when Stata doesn't provide things, it suggests there is a legitimate reason to not provide them. I assume you've tried several of the options noted in mixed postestimation. You might be able to use predict to generate a lot of the random effects etc., and then try to do the calculations yourself.
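
The do-it-yourself route suggested above might look like the following. This is a minimal sketch on Stata's shipped pig example data, not on the survey data from the question; the variable names xb_hat, re_week, re_cons, manual_fit, and stata_fit are made up for illustration, and the order in which predict returns the random effects is an assumption that should be checked against the variable labels.

```stata
* Sketch only: Stata's pig example data, not the survey data from the question
webuse pig, clear

* Fit on weeks 1-8; week 9 is held out for every pig
mixed weight week if week < 9 || id: week, covariance(unstructured) reml

* Fixed portion (linear prediction), computed for all observations
predict double xb_hat, xb

* Predicted random effects; ASSUMED order is slope on week, then intercept.
* Confirm with the variable labels that -describe- shows.
predict double re_week re_cons, reffects
describe re_week re_cons

* Hand-built fitted value: fixed portion + random slope * week + random intercept
generate double manual_fit = xb_hat + re_week*week + re_cons

* Compare with Stata's own fitted values on the held-out week
predict double stata_fit, fitted
summarize manual_fit stata_fit if week == 9
```

For the three-level model in post #1, the same idea would add only the country-level terms to the fixed portion and leave the participant-level intercept out. How predict names and orders the country-level variables in that model is not shown here and would need to be verified.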


    • #3
      Did you report the exact command you typed? I ask because the manual says that, by default, -predict- produces predictions for all observations in the dataset, including those left out of the estimation sample, for example by an -if- qualifier on the estimation command.

      Also, it looks like you are using the -xi- syntax, which is no longer necessary. You could have typed:

      Code:
      mixed response i.mo if cross_validation == 1 || country: i.mo || id:
      I don't normally fit random slopes for categorical variables, so I can't be sure the i.mo clause in the country level effect is the right syntax.

      Back to your question. If you have all 500 participants per country in the dataset and you didn't exclude what I assume is your cross-validation sample of 300 with an if clause, then it could be that some of those 300 respondents have missing values in the response variable and/or the independent variables.
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


      • #4
        I asked a very similar question a couple of days ago. Not sure why Stata won't calculate BLUPs out of sample.
        https://www.statalist.org/forums/for...f-sample-blups


        • #5
          Originally posted by Andrea Discacciati
          I asked a very similar question a couple of days ago. Not sure why Stata won't calculate BLUPs out of sample.
          https://www.statalist.org/forums/for...f-sample-blups
          Andrea and Mihir,

          It looks like I misspoke. The default may be to predict random effects out of sample, but if an entire cluster is excluded from the estimation sample, no random effects are predicted for it.

          This is the code Andrea posted:

          Code:
          webuse pig, clear
          mixed weight week if id <= 47 || id: week, cov(unstructured) reml
          predict b*, reffects
          list if id >= 47
          Run the code, and you will see there are no predictions for pig 48 (id 48), which was excluded from the estimation command. However, run this code instead:

          Code:
          mixed weight week if week <9 || id: week, cov(unstructured) reml
          predict c*, reffects
          list if id >= 47
          Here I excluded week 9 for every pig from the estimation command. You'll see that, despite that, -predict- calculated the random effects for the week 9 observations, as the manual implied. My guess is that predicting random effects for a cluster that was left out entirely is not possible with the command as written: the random effects are predicted from a combination of the grand mean and the cluster-specific mean, and Andrea's code left an entire cluster out, so there is no cluster-specific information to use. I am not sure if there is a theoretical issue.

          Andrea and Mihir, I am not sure where this leaves you, unfortunately.
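
One possible fallback, offered as a sketch rather than a definitive answer: for a cluster that was excluded entirely, the model carries no cluster-specific information, so its random effects can reasonably be taken at their mean of zero, which leaves the fixed portion as the prediction. The code below reuses Andrea's setup; the variable names xb_only, fit_all, and yhat are made up for illustration.

```stata
* Sketch only: same pig data and estimation sample as in Andrea's example
webuse pig, clear

* Fit on pigs 1-47, leaving pig 48 out as an entire cluster
mixed weight week if id <= 47 || id: week, cov(unstructured) reml

* Fixed-portion prediction, available for every observation
predict double xb_only, xb

* Fitted values (fixed + random); expected to be missing for pig 48,
* because no random effects are predicted for an excluded cluster
predict double fit_all, fitted

* Use the fitted value where it exists, otherwise fall back to the fixed portion
generate double yhat = cond(missing(fit_all), xb_only, fit_all)

list id week weight xb_only fit_all yhat if id >= 47, sepby(id)
```

This treats pig 48 as a typical pig. It does not recover a pig-specific prediction, which would require that pig's own observations to inform the random effects.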