
  • Calculating predictions from fractional polynomials for values not actually observed

    Hi Stata Forum

    I have been using Patrick Royston's very useful suite of commands to carry out fractional polynomial regressions.

    I was wondering if it is possible to use fracpred to calculate predictions for values not actually observed. For example:

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . mfp: regress mpg weight displacement foreign
    
    [output deleted]
    
    . fracpred pred, for(displacement)
    
    . sort displacement
    
    . l displacement pred in 1/5
    
         +---------------------+
         | displa~t       pred |
         |---------------------|
      1. |       79   29.81834 |
      2. |       85    28.3481 |
      3. |       86    28.1325 |
      4. |       86    28.1325 |
      5. |       89   27.52874 |
         +---------------------+
    Is it possible to get a predicted value for displacement=80, for example?

    Many thanks

    Sophie

  • #2
    I wrote to Patrick Royston and he was kind enough to send me syntax files for predicting outside the dataset, for mfp and mfpa. I am quoting them below.

    -----------------------------------------------
    Code:
    /*
        Example showing how to predict out of sample for a variable 
        (displacement) following model fitting with mfp.
        
        Patrick Royston, 27jan2017.
    */
    clear all
    sysuse auto
    
    mfp: regress mpg weight displacement foreign
    /*
        We predict for displacement in sample using -fracpred-,
        for later comparison with the recommended alternative, -xpredict-.
    */
    fracpred pred_fracpred, for(displacement)
    /*
        Note that the result of mfp for displacement is a transformed
        variable, Idisp__1, with details given by -describe-ing this variable:
    */
    describe Idisp__1
    /*
        We see the transformation is "X^-2-.2568962282: X = displacement/100".
        To predict successfully at chosen values of displacement, we must
        apply the same transformation to the desired values. This is done
        "manually".
    
        We assign to a new variable, v1, the 4 values 100, 200, 300, 400
        and then transform them to v2 as was done by mfp to displacement.
    
        [ADVANCED NOTE: The transformation is also made visible by typing
        -char list Idisp__1[fp]-. It can be captured in a local macro, called
        say zz, using the command -local zz: char Idisp__1[fp]-. This could be
        used, with some work, to automate applying the transformation.]
    */
    range v1 100 400 4
    generate v2 = (v1/100)^-2-.2568962282
    /*
        Use -xpredict- to do the out of sample prediction; we include the
        regression constant, _b[_cons], via the -constant- option.
        
        Add prediction of the SE and calculate 95% CIs.
    */
    xpredict pred_v1, with(Idisp__1) at(Idisp__1 v2) constant
    xpredict pred_v1_se, with(Idisp__1) at(Idisp__1 v2) constant stdp
    gen pred_v1_lb = pred_v1 - 1.96*pred_v1_se
    gen pred_v1_ub = pred_v1 + 1.96*pred_v1_se
    list v1 pred_v1 pred_v1_se pred_v1_lb pred_v1_ub if v1!=.
    /*
        For the record, we compare the in-sample predictions from -xpredict-
        with those from -fracpred-.
    */
    xpredict pred_xpredict, with(Idisp__1) constant
    summarize pred_fracpred pred_xpredict
    list displacement pred_fracpred pred_xpredict v1 pred_v1
    /*
        All is well; the predictions are the same. Note for example
        that the predictions for displacement = 200
        agree when done in sample and out of sample. As they should.
        
        Note also that these "partial predictions" are done with
        covariates (other than transformed displacement) set to 0.
        For the predictor -weight-, this was centered on its mean,
        yielding predictor Iweig__1, so Iweig__1 = 0 corresponds
        to the mean vehicle weight. The predictor value foreign=0,
        which is also used, denotes domestic cars.
    */
    exit
    -----------------------------------------------
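    Applied to the original question (displacement = 80), the script above can be adapted as in the following sketch. It assumes the fitted model and Idisp__1 are still in memory, and that the [fp] characteristic has the form reported above ("X^-2-.2568962282: X = displacement/100"). The names newdisp, newdisp_fp and pred_80 are made up for illustration, and the /100 scaling is typed by hand rather than parsed from the characteristic.

    ```stata
    * Sketch only: run after the mfp model above, while Idisp__1 still exists.
    * Assumes Idisp__1[fp] reads "X^-2-.2568962282: X = displacement/100".
    local zz : char Idisp__1[fp]
    gettoken fpexpr rest : zz, parse(":")     // keep the text before the colon

    * Put the value of interest in a new variable (hypothetical name)
    generate double newdisp = 80 in 1

    * Replace X by the scaled new variable; the scaling is entered manually
    local fpexpr : subinstr local fpexpr "X" "(newdisp/100)", all
    generate double newdisp_fp = `fpexpr'

    * Out-of-sample partial prediction, as in the script above
    xpredict pred_80, with(Idisp__1) at(Idisp__1 newdisp_fp) constant
    list newdisp pred_80 in 1
    ```

    As in the script above, this is a partial prediction with the other covariates set to 0. If mfp selects a different transformation or scaling, the substitution line would need to change accordingly.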
    Code:
    /*
        Example showing how to predict out of sample for an acd-transformed
        variable (gear_ratio) following model fitting with mfpa.
        
        Patrick Royston, 07feb2017.
    */
    clear all
    sysuse auto
    
    mfpa, acd(gear_ratio): regress price gear_ratio displacement foreign
    /*
        We predict for gear_ratio in sample using -xfracpred-,
        for later comparison with the recommended alternative, -xpredict-.
    */
    xfracpred pred_xfracpred, for(gear_ratio)
    /*
        Note that the result of mfpa for gear_ratio is a transformed
        variable, IAgea__1, with details given by -describe-ing this variable:
    */
    describe IAgea__1
    /*
        which gives Agear_ratio^-2-4. So IAgea__1 is an FP1 transformation
        of Agear_ratio, which itself is
    */
    describe Agear_ratio
    /*
        that is, "acd(ln(gear_ratio)), b0 = -7.054, b1 = 6.459"
    
        THE KEY CONCEPT:
        To predict successfully at chosen values of gear_ratio, we must
        apply the identical transformations to the chosen values.
        First we get, and store, accurate values of the constants that
        we need to apply the acd transformation to the out of sample
        values of gear_ratio (see below):
    */
    quietly acd Ag = gear_ratio
    return list
    local b0 = r(b0)
    local b1 = r(b1)
    local power = r(power)
    local shift = r(shift)
    /*
        We no longer need Ag. It's a duplicate of Agear_ratio anyway.
    */
    drop Ag
    /*
        Now we assign to a new variable, v1, four values, say 2.5 3 3.5 4,
        for out of sample prediction. (Of course, there is nothing special
        about four values, it could be any number.) We transform v1 to Av1
        as was done "discreetly" by acd within mfpa. Note that this use of
        -acd- is effectively out of sample prediction of the ACD transformation
        according to the parameters needed for calculating acd(gear_ratio).
        See -help acd-.
    */
    range v1 2.5 4 4
    acd Av1 = v1, b(`b0' `b1') power(`power') shift(`shift')
    /*
        We now apply to Av1 the FP transformation used for Agear_ratio:
    */
    generate v2 = Av1^-2-4
    /*
        Use -xpredict- to do the out of sample prediction; we include the
        regression constant, _b[_cons], via the -constant- option.
        
        Add prediction of the SE and calculate 95% CIs.
    */
    xpredict pred_v1, with(IAgea__1) at(IAgea__1 v2) constant
    xpredict pred_v1_se, with(IAgea__1) at(IAgea__1 v2) constant stdp
    gen pred_v1_lb = pred_v1 - 1.96*pred_v1_se
    gen pred_v1_ub = pred_v1 + 1.96*pred_v1_se
    list v1 pred_v1 pred_v1_se pred_v1_lb pred_v1_ub if v1!=.
    /*
        For the record, we compare the in-sample predictions from -xpredict-
        with those from -xfracpred-.
    */
    xpredict pred_xpredict, with(IAgea__1) constant
    summarize pred_xfracpred pred_xpredict
    list gear_ratio pred_xfracpred pred_xpredict v1 pred_v1
    /*
        It works; the predictions are the same, whether done in or
        out of sample.
    
        Note also that these "partial predictions" are done with
        covariates (other than transformed gear_ratio) set to 0.
        The predictor -displacement- was centered on its mean,
        yielding predictor Idisp__1, so Idisp__1 = 0 corresponds
        to the mean vehicle displacement. The predictor value foreign=0,
        which is also used, denotes domestic cars.
    */
    exit



    • #3
      Hi Sophie

      Isn't this what the -all- option is for? You generate a new observation with displacement==80 and weight and foreign set to some reasonable value (but mpg missing), then submit your command as mfp, all: ...

      I think after this it bases predictions on the FP variables without checking if they were in e(sample). Does that not work?

      Tim
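
      A minimal sketch of the approach suggested in #3 might look like the following. The values chosen for weight and foreign in the new observation are arbitrary placeholders, and whether -fracpred- actually fills in a prediction for an observation outside e(sample) is the open question raised above, so the final -list- is a check rather than a guaranteed result.

      ```stata
      * Sketch of the -all- approach; not verified against fracpred's use of e(sample).
      sysuse auto, clear

      * Add one observation with displacement = 80 and mpg left missing
      local new = _N + 1
      set obs `new'
      replace displacement = 80 in `new'
      replace weight = 2000 in `new'      // placeholder "reasonable" value
      replace foreign = 1 in `new'        // placeholder choice

      * The new row has missing mpg, so it is excluded from estimation;
      * -all- asks mfp to generate the FP variables for it anyway
      mfp, all: regress mpg weight displacement foreign

      fracpred pred, for(displacement)
      list displacement pred in `new'
      ```

      If pred is missing for the new observation, the -xpredict- route shown in #2 remains the alternative.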
