Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Out-of-sample BLUPs

    Is there a reason why Stata won't calculate BLUPs for out-of-sample observations even when it's actually possible to do it?
    For example:

    Code:
    . webuse pig, clear
    (Longitudinal analysis of pig weights)
    
    . 
    . 
    . mixed weight week if id <= 47|| id: week, cov(unstructured) reml
    
    Performing EM optimization: 
    
    Performing gradient-based optimization: 
    
    Iteration 0:   log restricted-likelihood = -853.70475  
    Iteration 1:   log restricted-likelihood = -853.70475  
    
    Computing standard errors:
    
    Mixed-effects REML regression                   Number of obs     =        423
    Group variable: id                              Number of groups  =         47
    
                                                    Obs per group:
                                                                  min =          9
                                                                  avg =        9.0
                                                                  max =          9
    
                                                    Wald chi2(1)      =    4549.44
    Log restricted-likelihood = -853.70475          Prob > chi2       =     0.0000
    
    ------------------------------------------------------------------------------
          weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            week |   6.189894   .0917707    67.45   0.000     6.010026    6.369761
           _cons |   19.32122   .4110553    47.00   0.000     18.51556    20.12687
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    id: Unstructured             |
                       var(week) |   .3689112   .0825625      .2379154     .572033
                      var(_cons) |    7.08907   1.657235      4.483335    11.20927
                 cov(week,_cons) |  -.1407323   .2647519     -.6596364    .3781718
    -----------------------------+------------------------------------------------
                   var(Residual) |   1.614982   .1259171      1.386121     1.88163
    ------------------------------------------------------------------------------
    LR test vs. linear model: chi2(3) = 734.02                Prob > chi2 = 0.0000
    
    Note: LR test is conservative and provided only for reference.
    
    . predict b*, reffects
    (9 missing values generated)
    (9 missing values generated)
    
    . list if inlist(id,47,48), noobs sepby(id)
    
      +-----------------------------------------+
      | id   week   weight         b1        b2 |
      |-----------------------------------------|
      | 47      1     29.5   1.170386   3.55315 |
      | 47      2       37   1.170386   3.55315 |
      | 47      3       46   1.170386   3.55315 |
      | 47      4     52.5   1.170386   3.55315 |
      | 47      5       60   1.170386   3.55315 |
      | 47      6     67.5   1.170386   3.55315 |
      | 47      7       76   1.170386   3.55315 |
      | 47      8     81.5   1.170386   3.55315 |
      | 47      9       88   1.170386   3.55315 |
      |-----------------------------------------|
      | 48      1     28.5          .         . |
      | 48      2       36          .         . |
      | 48      3     42.5          .         . |
      | 48      4       49          .         . |
      | 48      5       55          .         . |
      | 48      6     63.5          .         . |
      | 48      7       72          .         . |
      | 48      8     78.5          .         . |
      | 48      9     85.5          .         . |
      +-----------------------------------------+
    
    . 
    . qui estat recovariance
    
    . 
    . 
    . // BLUP for observation #48
    . mata
    ------------------------------------------------- mata (type end to exit) ---------------------------------------------------------------------------------------------------------------
    : beta = st_matrix("e(b)")
    
    : b = beta[1::2]
    
    : sigma2 = exp(beta[6])^2
    
    : X = st_data((424::432), "week"),J(9,1,1)
    
    : Z = X
    
    : y = st_data((424::432), "weight")
    
    : G = st_matrix("r(Cov2)")
    
    : 
    : G * Z' * invsym(Z*G*Z' + sigma2:*I(9)) * (y - X*b')
                     1
        +---------------+
      1 |  .9294846446  |
      2 |  1.750423814  |
        +---------------+
    
    : end
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

  • #2
    Bump.

    Comment


    • #3
      Had pigs 48 and 49 actually been included in the original estimation, the model parameter estimates would likely have been different. Your method of calculating the BLUPs for the random effects assumes that the parameter estimates you got just from pigs 1 through 47 still apply. I think Stata is unwilling to make that assumption. Of course, that logic could be applied to any kind of out-of-sample prediction from any model, so it isn't clear to me why there is a sticking point here.
      Last edited by Clyde Schechter; 13 Nov 2018, 08:29.

      Comment


      • #4
        Clyde, thank you very much for your reply.

        My understanding is the same. I see no reason why, if one is willing to make some assumptions, this should be illegal.

        I would do it for example to:

        1) evaluate the *conditional* predictions on hold-out data to assess somehow the predictive ability of my model (simple hold-out, cross-validation etc.).

        then, when I am satisfied with my model

        2) update individual/conditional predictions on new pigs. Like, a new pig (ie not in the estimation sample) is examined the first week and its weight is recorded. Based on that, I predict the BLUPs and calculate the individual/conditional predicted weight for the next 8 weeks (for that pig). Then the same pig is examined again the second week, and a second weight is recorded. I predict a new pair of BLUPs and I give an updated predicted weight for the remaining 7 weeks for that pig. And so on and so forth.

        I am willing to assume that the out-of-sample pigs come from the same population where the estimation sample comes from.
        Last edited by Andrea Discacciati; 13 Nov 2018, 09:00.

        Comment

        Working...
        X