Calculating predictions from fractional polynomials for values not actually observed

Sophie Hutchinson

Join Date: Jan 2016

Posts: 7
#1

20 Jan 2017, 04:15

Hi Stata Forum

I have been using Patrick Royston's very useful suite of commands to carry out fractional polynomial regressions.

I was wondering if it is possible to use fracpred to calculate predictions for values not actually observed. For example:

Code:

. sysuse auto, clear
(1978 Automobile Data)

. mfp: regress mpg weight displacement foreign
[output deleted]

. fracpred pred, for(displacement)

. sort displacement

. l displacement pred in 1/5

     +---------------------+
     | displa~t       pred |
     |---------------------|
  1. |       79   29.81834 |
  2. |       85    28.3481 |
  3. |       86    28.1325 |
  4. |       86    28.1325 |
  5. |       89   27.52874 |
     +---------------------+

Is it possible to get a predicted value for displacement=80, for example?

Many thanks

Sophie

Sophie Hutchinson

Join Date: Jan 2016
Posts: 7

08 Feb 2017, 03:36

I wrote to Patrick Royston and he was kind enough to send me syntax files for predicting outside the dataset, for mfp and mfpa. I am quoting them below.

-----------------------------------------------

Code:

/*
    Example showing how to predict out of sample for a variable 
    (displacement) following model fitting with mfp.
    
    Patrick Royston, 27jan2017.
*/
clear all
sysuse auto

mfp: regress mpg weight displacement foreign
/*
    We predict for displacement in sample using -fracpred-,
    for later comparison with the recommended alternative, -xpredict-.
*/
fracpred pred_fracpred, for(displacement)
/*
    Note that the result of mfp for displacement is a transformed
    variable, Idisp__1, with details given by -describe-ing this variable:
*/
describe Idisp__1
/*
    We see the transformation is "X^-2-.2568962282: X = displacement/100".
    To predict successfully at chosen values of displacement, we must
    apply the same transformation to the desired values. This is done
    "manually".

    We assign to a new variable, v1, the 4 values 100, 200, 300, 400
    and then transform them to v2 as was done by mfp to displacement.

    [ADVANCED NOTE: The transformation is also made visible by typing
    -char list Idisp__1[fp]-. It can be captured in a local macro, called
    say zz, using the command -local zz: char Idisp__1[fp]-. This could be
    used, with some work, to automate applying the transformation.]
*/
range v1 100 400 4
generate v2 = (v1/100)^-2-.2568962282
/*
    Use -xpredict- to do the out of sample prediction; we include the
    regression constant, _b[_cons], via the -constant- option.
    
    Add prediction of the SE and calculate 95% CIs.
*/
xpredict pred_v1, with(Idisp__1) at(Idisp__1 v2) constant
xpredict pred_v1_se, with(Idisp__1) at(Idisp__1 v2) constant stdp
gen pred_v1_lb = pred_v1 - 1.96*pred_v1_se
gen pred_v1_ub = pred_v1 + 1.96*pred_v1_se
list v1 pred_v1 pred_v1_se pred_v1_lb pred_v1_ub if v1!=.
/*
    For the record, we compare the in-sample predictions from -xpredict-
    with those from -fracpred-.
*/
xpredict pred_xpredict, with(Idisp__1) constant
summarize pred_fracpred pred_xpredict
list displacement pred_fracpred pred_xpredict v1 pred_v1
/*
    All is well; the predictions are the same. Note for example
    that the predictions for displacement = 200
    agree when done in sample and out of sample. As they should.
    
    Note also that these "partial predictions" are done with
    covariates (other than transformed displacement) set to 0.
    For the predictor -weight-, this was centered on its mean,
    yielding predictor Iweig__1, so Iweig__1 = 0 corresponds
    to the mean vehicle weight. The predictor value foreign=0,
    which is also used, denotes domestic cars.
*/
exit
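
For the value asked about in the first post (displacement = 80), the same partial prediction can also be written out for a single value. The following is a minimal sketch and is not part of Patrick Royston's files. It assumes that mfp selects the transformation reported above (power -2, X = displacement/100, centering constant .2568962282) and that lincom can be run on the estimation results mfp leaves behind. The local macro z is a name introduced here for illustration.

```stata
sysuse auto, clear
mfp: regress mpg weight displacement foreign

* Transform the target value exactly as mfp transformed displacement.
* ASSUMPTION: the transformation is X^-2-.2568962282 with X = displacement/100,
* as reported by -describe Idisp__1-; check this against your own output.
local z = (80/100)^-2 - .2568962282

* Partial prediction at displacement = 80 with the other covariates set to 0
* (mean weight, domestic car) and the regression constant included.
display _b[_cons] + _b[Idisp__1]*`z'

* The same quantity with a standard error and a t-based 95% CI.
lincom _cons + `z'*Idisp__1
```

Up to rounding of the centering constant, the point estimate should agree with what xpredict gives when v1 contains 80. The lincom interval uses the residual degrees of freedom rather than 1.96, so its limits may differ slightly from the ones computed above.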

-----------------------------------------------

Code:

/*
    Example showing how to predict out of sample for an acd-transformed
    variable (gear_ratio) following model fitting with mfpa.
    
    Patrick Royston, 07feb2017.
*/
clear all
sysuse auto

mfpa, acd(gear_ratio): regress price gear_ratio displacement foreign
/*
    We predict for gear_ratio in sample using -xfracpred-,
    for later comparison with the recommended alternative, -xpredict-.
*/
xfracpred pred_xfracpred, for(gear_ratio)
/*
    Note that the result of mfpa for gear_ratio is a transformed
    variable, IAgea__1, with details given by -describe-ing this variable:
*/
describe IAgea__1
/*
    which gives Agear_ratio^-2-4. So IAgea__1 is an FP1 transformation
    of Agear_ratio, which itself is
*/
describe Agear_ratio
/*
    that is, "acd(ln(gear_ratio)), b0 = -7.054, b1 = 6.459"

    THE KEY CONCEPT:
    To predict successfully at chosen values of gear_ratio, we must
    apply the identical transformations to the chosen values.
    First we get, and store, accurate values of the constants that
    we need to apply the acd transformation to the out of sample
    values of gear_ratio (see below):
*/
quietly acd Ag = gear_ratio
return list
local b0 = r(b0)
local b1 = r(b1)
local power = r(power)
local shift = r(shift)
/*
    We no longer need Ag. It's a duplicate of Agear_ratio anyway.
*/
    drop Ag
/*
    Now we assign to a new variable, v1, four values, say 2.5 3 3.5 4,
    for out of sample prediction. (Of course, there is nothing special
    about four values, it could be any number.) We transform v1 to Av1
    as was done "discreetly" by acd within mfpa. Note that this use of
    -acd- is effectively out of sample prediction of the ACD transformation
    according to the parameters needed for calculating acd(gear_ratio).
    See -help acd-.
*/
range v1 2.5 4 4
acd Av1 = v1, b(`b0' `b1') power(`power') shift(`shift')
/*
    We now apply to Av1 the FP transformation used for Agear_ratio:
*/
generate v2 = Av1^-2-4
/*
    Use -xpredict- to do the out of sample prediction; we include the
    regression constant, _b[_cons], via the -constant- option.
    
    Add prediction of the SE and calculate 95% CIs.
*/
xpredict pred_v1, with(IAgea__1) at(IAgea__1 v2) constant
xpredict pred_v1_se, with(IAgea__1) at(IAgea__1 v2) constant stdp
gen pred_v1_lb = pred_v1 - 1.96*pred_v1_se
gen pred_v1_ub = pred_v1 + 1.96*pred_v1_se
list v1 pred_v1 pred_v1_se pred_v1_lb pred_v1_ub if v1!=.
/*
    For the record, we compare the in-sample predictions from -xpredict-
    with those from -xfracpred-.
*/
xpredict pred_xpredict, with(IAgea__1) constant
summarize pred_xfracpred pred_xpredict
list gear_ratio pred_xfracpred pred_xpredict v1 pred_v1
/*
    It works; the predictions are the same, whether done in or
    out of sample.

    Note also that these "partial predictions" are done with
    covariates (other than transformed gear_ratio) set to 0.
    The predictor -displacement- was centered on its mean,
    yielding predictor Idisp__1, so Idisp__1 = 0 corresponds
    to the mean vehicle displacement. The predictor value foreign=0,
    which is also used, denotes domestic cars.
*/
exit

Tim Morris

Join Date: Apr 2014

Posts: 92
#3

08 Feb 2017, 04:57

Hi Sophie

Isn't this what the -all- option is for? You generate a new observation with displacement==80 and with weight and foreign set to some reasonable values (but mpg missing), then submit your command as mfp, all: ...

I think after this it bases predictions on the FP variables without checking if they were in e(sample). Does that not work?
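
A minimal sketch of that suggestion, not verified here. It assumes that fracpred fills in predictions for observations outside e(sample) once mfp has been run with -all-, which is exactly the open question. The weight of 3000 is an arbitrary placeholder.

```stata
sysuse auto, clear

* Add one observation carrying the target value. mpg is left missing,
* so the new row cannot enter the estimation sample.
set obs `=_N + 1'
replace displacement = 80 in L
* ASSUMPTION: arbitrary but plausible values for the other covariates.
replace weight = 3000 in L
replace foreign = 0 in L

* -all- asks mfp to create the FP variables for out-of-sample rows too.
mfp, all: regress mpg weight displacement foreign

fracpred pred, for(displacement)
list displacement mpg pred if displacement == 80
```

If this works as hoped, the listed value can be checked against the xpredict result for the same displacement.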

Tim