Out-of-sample BLUPs

Andrea Discacciati

Join Date: Feb 2016
Posts: 194

Out-of-sample BLUPs

28 Feb 2018, 07:01

Is there a reason why Stata won't calculate BLUPs for out-of-sample observations even when it's actually possible to do it?
For example:

Code:

. webuse pig, clear
(Longitudinal analysis of pig weights)

. 
. 
. mixed weight week if id <= 47|| id: week, cov(unstructured) reml

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -853.70475  
Iteration 1:   log restricted-likelihood = -853.70475  

Computing standard errors:

Mixed-effects REML regression                   Number of obs     =        423
Group variable: id                              Number of groups  =         47

                                                Obs per group:
                                                              min =          9
                                                              avg =        9.0
                                                              max =          9

                                                Wald chi2(1)      =    4549.44
Log restricted-likelihood = -853.70475          Prob > chi2       =     0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        week |   6.189894   .0917707    67.45   0.000     6.010026    6.369761
       _cons |   19.32122   .4110553    47.00   0.000     18.51556    20.12687
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Unstructured             |
                   var(week) |   .3689112   .0825625      .2379154     .572033
                  var(_cons) |    7.08907   1.657235      4.483335    11.20927
             cov(week,_cons) |  -.1407323   .2647519     -.6596364    .3781718
-----------------------------+------------------------------------------------
               var(Residual) |   1.614982   .1259171      1.386121     1.88163
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 734.02                Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

. predict b*, reffects
(9 missing values generated)
(9 missing values generated)

. list if inlist(id,47,48), noobs sepby(id)

  +-----------------------------------------+
  | id   week   weight         b1        b2 |
  |-----------------------------------------|
  | 47      1     29.5   1.170386   3.55315 |
  | 47      2       37   1.170386   3.55315 |
  | 47      3       46   1.170386   3.55315 |
  | 47      4     52.5   1.170386   3.55315 |
  | 47      5       60   1.170386   3.55315 |
  | 47      6     67.5   1.170386   3.55315 |
  | 47      7       76   1.170386   3.55315 |
  | 47      8     81.5   1.170386   3.55315 |
  | 47      9       88   1.170386   3.55315 |
  |-----------------------------------------|
  | 48      1     28.5          .         . |
  | 48      2       36          .         . |
  | 48      3     42.5          .         . |
  | 48      4       49          .         . |
  | 48      5       55          .         . |
  | 48      6     63.5          .         . |
  | 48      7       72          .         . |
  | 48      8     78.5          .         . |
  | 48      9     85.5          .         . |
  +-----------------------------------------+

. 
. qui estat recovariance

. 
. 
. // BLUP for observation #48
. mata
------------------------------------------------- mata (type end to exit) ---------------------------------------------------------------------------------------------------------------
: beta = st_matrix("e(b)")

: b = beta[1::2]

: sigma2 = exp(beta[6])^2

: X = st_data((424::432), "week"),J(9,1,1)

: Z = X

: y = st_data((424::432), "weight")

: G = st_matrix("r(Cov2)")

: 
: G * Z' * invsym(Z*G*Z' + sigma2:*I(9)) * (y - X*b')
                 1
    +---------------+
  1 |  .9294846446  |
  2 |  1.750423814  |
    +---------------+

: end
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Tags: None

Andrea Discacciati

Join Date: Feb 2016

Posts: 194
#2

13 Nov 2018, 01:39

Bump.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#3

13 Nov 2018, 08:25

Had pigs 48 and 49 actually been included in the original estimation, the model parameter estimates would likely have been different. Your method of calculating the BLUPs for the random effects assumes that the parameter estimates you got just from pigs 1 through 47 still apply. I think Stata is unwilling to make that assumption. Of course, that logic could be applied to any kind of out-of-sample prediction from any model, so it isn't clear to me why there is a sticking point here.

Last edited by Clyde Schechter; 13 Nov 2018, 08:29.
Comment
Andrea Discacciati

Join Date: Feb 2016

Posts: 194
#4

13 Nov 2018, 08:58

Clyde, thank you very much for your reply.

My understanding is the same. I see no reason why, if one is willing to make some assumptions, this should be illegal.

I would do it for example to:

1) evaluate the *conditional* predictions on hold-out data to assess somehow the predictive ability of my model (simple hold-out, cross-validation etc.).

then, when I am satisfied with my model

2) update individual/conditional predictions on new pigs. Like, a new pig (ie not in the estimation sample) is examined the first week and its weight is recorded. Based on that, I predict the BLUPs and calculate the individual/conditional predicted weight for the next 8 weeks (for that pig). Then the same pig is examined again the second week, and a second weight is recorded. I predict a new pair of BLUPs and I give an updated predicted weight for the remaining 7 weeks for that pig. And so on and so forth.

I am willing to assume that the out-of-sample pigs come from the same population where the estimation sample comes from.

Last edited by Andrea Discacciati; 13 Nov 2018, 09:00.
Comment

Announcement

Out-of-sample BLUPs

Comment

Comment

Comment