  • out of sample predictions after mkspline or interpolation

    Dear Statalist,

    I need help with the following issue.

    I have a variable named "age_midpoint", which holds the midpoint of each age band:

    age range    label      age_midpoint
    5<=x<15      "5-14"     10
    15<=x<25     "15-24"    20
    25<=x<35     "25-34"    30
    35<=x<45     "35-44"    40
    45<=x<55     "45-54"    50
    55<=x<65     "55-64"    60
    65<=x<75     "65-74"    70
    75<=x<85     "75-84"    80
    85<=x<95     ">85"      90



    I fitted a restricted cubic spline using

    mkspline age_knots_3 3 = age_midpoint, displayknots

    mkspline agesp_3=age_midpoint, cubic knots(36.66667 63.33333 )

    regress service_item_age_gender_10 agesp_3*

    How can I get predictions for the following new values of age_midpoint?

    age range    label      age_midpoint
    5<=x<10      "5-9"      7.5
    10<=x<15     "10-14"    12.5
    15<=x<20     "15-19"    17.5
    20<=x<25     "20-24"    22.5
    25<=x<30     "25-29"    27.5
    30<=x<35     "30-34"    32.5
    35<=x<40     "35-39"    37.5
    40<=x<45     "40-44"    42.5
    45<=x<50     "45-49"    47.5
    50<=x<55     "50-54"    52.5
    55<=x<60     "55-59"    57.5
    60<=x<65     "60-64"    62.5
    65<=x<70     "65-69"    67.5
    70<=x<75     "70-74"    72.5
    75<=x<80     "75-79"    77.5
    80<=x<85     "80-84"    82.5
    85<=x<90     "85-89"    87.5
    90<=x<95     "90-94"    92.5




    Regards,

  • #2
    Look at the margins command. It will let you put in any x values you want.
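
    A minimal sketch of margins with at(), for illustration only. It assumes a quadratic in age_midpoint, which is not the model used in this thread, and a hypothetical short outcome name y holding the service_item_age_gender_10 values posted later in the thread. margins can move every term together here because factor-variable notation tells Stata that both terms are functions of age_midpoint. Variables created by mkspline are ordinary separate variables, so setting one of them in at() does not update the others.

    ```stata
    * Illustration only: quadratic model (an assumption, not the thread's model)
    clear
    input float age_midpoint double y
    10   49.5
    20  503.1
    30 1281.8
    40 2302.1
    50 3599.7
    60 5019.8
    70 5570.6
    80 3750.2
    90 1109
    end

    * c.x##c.x expands to x and x^2, and Stata knows both depend on x
    regress y c.age_midpoint##c.age_midpoint

    * Predictions at the 18 new midpoints 7.5, 12.5, ..., 92.5
    margins, at(age_midpoint = (7.5(5)92.5))
    ```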


    • #3
      Originally posted by Phil Bromiley
      Look at the margins command. It will let you put in any x values you want.
      Thanks for the reply.

      However, after
      regress service_item_age_gender_10 agesp_3*

      I don't know how to specify agesp_31, agesp_32 and agesp_33 in the margins command so that they reflect the new age_midpoint values.

      Considering knots(36.66667 63.33333), how about

      margins, at (at( agesp_31=(7.5(5)32.5))
      for the following age_midpoint values (age_midpoint<36.66667):

      age range    label      age_midpoint
      5<=x<10      "5-9"      7.5
      10<=x<15     "10-14"    12.5
      15<=x<20     "15-19"    17.5
      20<=x<25     "20-24"    22.5
      25<=x<30     "25-29"    27.5
      30<=x<35     "30-34"    32.5

      margins, at (at( agesp_32=(37.5(5)62.5))
      for the following age_midpoint values (36.66667<age_midpoint<63.33333):

      age range    label      age_midpoint
      35<=x<40     "35-39"    37.5
      40<=x<45     "40-44"    42.5
      45<=x<50     "45-49"    47.5
      50<=x<55     "50-54"    52.5
      55<=x<60     "55-59"    57.5
      60<=x<65     "60-64"    62.5

      margins, at (at( agesp_33=(62.7(5)92.5))
      for the following age_midpoint values (age_midpoint>63.33333):

      age range    label      age_midpoint
      65<=x<70     "65-69"    67.5
      70<=x<75     "70-74"    72.5
      75<=x<80     "75-79"    77.5
      80<=x<85     "80-84"    82.5
      85<=x<90     "85-89"    87.5
      90<=x<95     "90-94"    92.5
      Last edited by Masoumeh Sanagou; 07 May 2017, 17:39.


      • #4
        Sorry, I don't know how to revise a post, so I am posting the revised version here:

        Thanks for the reply.

        However, after
        regress service_item_age_gender_10 agesp_3*

        I don't know how to specify agesp_31, agesp_32 and agesp_33 in the margins command so that they reflect the new age_midpoint values.

        Considering knots(36.66667 63.33333), how about

        margins, at( agesp_31=(7.5(5)32.5))
        for the following age_midpoint values (age_midpoint<36.66667):

        age range    label      age_midpoint
        5<=x<10      "5-9"      7.5
        10<=x<15     "10-14"    12.5
        15<=x<20     "15-19"    17.5
        20<=x<25     "20-24"    22.5
        25<=x<30     "25-29"    27.5
        30<=x<35     "30-34"    32.5

        margins, at( agesp_32=(37.5(5)62.5))
        for the following age_midpoint values (36.66667<age_midpoint<63.33333):

        age range    label      age_midpoint
        35<=x<40     "35-39"    37.5
        40<=x<45     "40-44"    42.5
        45<=x<50     "45-49"    47.5
        50<=x<55     "50-54"    52.5
        55<=x<60     "55-59"    57.5
        60<=x<65     "60-64"    62.5

        margins, at( agesp_33=(67.5(5)92.5))
        for the following age_midpoint values (age_midpoint>63.33333):

        age range    label      age_midpoint
        65<=x<70     "65-69"    67.5
        70<=x<75     "70-74"    72.5
        75<=x<80     "75-79"    77.5
        80<=x<85     "80-84"    82.5
        85<=x<90     "85-89"    87.5
        90<=x<95     "90-94"    92.5


        • #5
          This seems possibly over-elaborate. It seems that you have two variables

          Code:
          service_item_age_gender_10 age_midpoint
          and you want to interpolate predicted values for the first variable at different ages. What do the data look like? How many values do you have for each midpoint? Show us the result of

          Code:
          scatter service_item_age_gender_10 age_midpoint


          • #6
            Thanks for the reply.


            Original data:

            age_midpoint   service_item_age_gender   service_item_age_gender_10
            10             495                       49.5
            20             5031                      503.1
            30             12818                     1281.8
            40             23021                     2302.1
            50             35997                     3599.7
            60             50198                     5019.8
            70             55706                     5570.6
            80             37502                     3750.2
            90             11090                     1109

            where service_item_age_gender_10 = service_item_age_gender/10


            I am going to fit several models, compare their R2, MSE and MAD, choose the best-fitting model, and then use it to estimate service_item_age_gender at the new age midpoints.
            Linear regression model
            regress service_item_age_gender_10 age_midpoint
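
            A minimal sketch of how the fit statistics for the linear model could be computed, under two assumptions not stated in the post: MSE is taken as the mean squared residual (dividing by n, not by the residual degrees of freedom) and MAD as the mean absolute deviation of the residuals. The names y, yhat, sq_err and abs_err are hypothetical shorthand.

            ```stata
            clear
            input float age_midpoint double y
            10   49.5
            20  503.1
            30 1281.8
            40 2302.1
            50 3599.7
            60 5019.8
            70 5570.6
            80 3750.2
            90 1109
            end

            regress y age_midpoint
            predict double yhat                  // fitted values (linear prediction)

            gen double sq_err  = (y - yhat)^2    // squared residuals
            gen double abs_err = abs(y - yhat)   // absolute residuals

            quietly summarize sq_err, meanonly
            display "MSE = " r(mean)             // assumed definition: mean squared residual
            quietly summarize abs_err, meanonly
            display "MAD = " r(mean)             // assumed definition: mean absolute deviation
            display "R2  = " e(r2)               // R-squared left behind by regress
            ```

            For a model fitted on a transformed outcome, the predictions would need to be put back on the original scale before these statistics are comparable across models.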


            Exponential model:
            gen service_item_age_gender_10_ln=ln(service_item_age_gender_10)
            regress service_item_age_gender_10_ln age_midpoint

            MSE_Exponential   MAD_Exponential   R2_Exponential
            7147296           2245.663          -0.5659826
            Box–Cox transform
            boxcox service_item_age_gender_10 age_midpoint, model(theta) lrtest

            Test H0            Restricted log likelihood   chi2     Prob > chi2
            theta=lambda=-1    -89.225483                  46.53    0
            theta=lambda=0     -75.594376                  19.26    0
            theta=lambda=1     -78.817321                  25.71    0

            The table shows that the linear (lambda=1, y-1), multiplicative inverse (lambda=-1, 1-1/y) and log (lambda=0, ln(y)) specifications are all strongly rejected.

            predict Y_hat_Box_Cox_theta
            summarize service_item_age_gender_10 Y_hat_Box_Cox_theta

            Variable                     Obs   Mean   Std. Dev.   Min        Max
            service_item_age_gender_10   9     2576   1998.395    49.5       5570.6
            Y_hat_Box_Cox_theta          9     2222   1214.578    49.50001   3122.491

            As the summary table illustrates, the mean of the dependent variable is not close to the mean of the predicted value Y_hat_Box_Cox_theta. This indicates that the theta model does not do a good job of approximating the true value of service_item_age_gender_10.

            boxcox service_item_age_gender_10 age_midpoint, model(rhsonly) nolog lrtest

            Test H0            Restricted log likelihood   LR chi2   Prob > chi2
            theta=lambda=-1    -77.995124                  0.51      0.476
            theta=lambda=0     -77.836883                  0.19      0.661
            theta=lambda=1     -78.817321                  2.15      0.142

            The table shows that the linear (lambda=1, y-1), multiplicative inverse (lambda=-1, 1-1/y) and log (lambda=0, ln(y)) specifications are not rejected.

            predict Y_hat_Box_Cox_rhsonly
            summarize service_item_age_gender_10 Y_hat_Box_Cox_rhsonly

            Variable                     Obs   Mean     Std. Dev.   Min        Max
            service_item_age_gender_10   9     2576.2   1998.395    49.5       5570.6
            Y_hat_Box_Cox_rhsonly        9     2576.2   1377.484    -391.541   3866.652

            As the summary table illustrates, the mean of the dependent variable is close to the mean of the predicted value Y_hat_Box_Cox_rhsonly. This indicates that the rhsonly model does a good job of approximating the true value of service_item_age_gender_10.

            boxcox service_item_age_gender_10 age_midpoint, model(lhsonly) nolog lrtest

            Test H0            Restricted log likelihood   LR chi2   Prob > chi2
            theta=lambda=-1    -96.30057                   38.52     0
            theta=lambda=0     -78.70839                   3.33      0.068
            theta=lambda=1     -78.81732                   3.55      0.059

            The table shows that the multiplicative inverse specification (lambda=-1, 1-1/y) is strongly rejected, while the linear (lambda=1, y-1) and log (lambda=0, ln(y)) specifications are not rejected at the 5% level.

            Number of obs  = 9
            LR chi2(1)     = 4.86
            Prob > chi2    = 0.027
            Log likelihood = -77.040957

            service_item_age_gender_10   Coef.       Std. Err.   z      P>z     [95% Conf. Interval]
            /theta                       0.4156364   0.256971    1.62   0.106   -.0880169   0.91929

            The boxcox output gives theta=0.4156364 as the optimal Box–Cox parameter for transforming the dependent variable in order to linearize its relationship with age_midpoint. The left-hand-side transformation is therefore y^(theta) = (y^0.4156364 - 1)/0.4156364.
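
            As a small worked check of this transformation, the sketch below applies it by hand and then inverts it with y = (theta*z + 1)^(1/theta). The names y, y_bc and y_back are hypothetical shorthand, and theta is simply copied from the output above.

            ```stata
            clear
            input float age_midpoint double y
            10   49.5
            20  503.1
            30 1281.8
            40 2302.1
            50 3599.7
            60 5019.8
            70 5570.6
            80 3750.2
            90 1109
            end

            local theta = 0.4156364

            * Box-Cox transform of the outcome: (y^theta - 1)/theta
            gen double y_bc = (y^`theta' - 1)/`theta'

            * Inverse transform: (theta*z + 1)^(1/theta) recovers y
            gen double y_back = (`theta'*y_bc + 1)^(1/`theta')

            list age_midpoint y y_bc y_back, noobs sep(0)

            * The round trip should reproduce y up to rounding error
            assert reldif(y, y_back) < 1e-8
            ```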

            predict Y_hat_Box_Cox_lhsonly
            summarize service_item_age_gender_10 Y_hat_Box_Cox_lhsonly

            Variable                     Obs   Mean       Std. Dev.   Min        Max
            service_item_age_gender_10   9     2576.2     1998.395    49.5       5570.6
            Y_ha~lhsonly                 9     2590.179   1447.928    896.4871   5038.64

            As the summary table illustrates, the mean of the dependent variable is close to the mean of the predicted value Y_hat_Box_Cox_lhsonly. This indicates that the lhsonly model does a good job of approximating the true value of service_item_age_gender_10.

            MSE_Box_Cox_lhsonly   MAD_Box_Cox_lhsonly   R2_Box_Cox_lhsonly
            3027735               1320.799              0.1470814

            Restricted cubic spline
            mkspline age_knots_5 5 = age_midpoint, displayknots
            mkspline agesp_5=age_midpoint, cubic knots(26 42 58 74 )
            regress service_item_age_gender_10 agesp_5*

            R-squared = 0.9428
            Adj R-squared = 0.9085
            predict Y_hat_restricted_cubic_spline_5

            twoway scatter service_item_age_gender_10 age_midpoint ///
            , graphregion(color(white)) plotregion(lcolor(black) lwidth(small)) ///
            || line Y_hat_restricted_cubic_spline_5 age_midpoint, sort clstyle(solid)
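
            One way to get predictions from this spline fit at the new midpoints, sketched under stated assumptions: append the new ages as extra observations with a missing outcome, build the spline variables once on the combined data with the same knots, and let predict fill in every row. regress uses only the nine complete observations. The name yhat_rcs is hypothetical. Note that 7.5 and 92.5 lie outside the observed range of 10 to 90, so those two values are extrapolations.

            ```stata
            clear
            input float age_midpoint double service_item_age_gender_10
            10   49.5
            20  503.1
            30 1281.8
            40 2302.1
            50 3599.7
            60 5019.8
            70 5570.6
            80 3750.2
            90 1109
            end

            * Append 18 rows for the new midpoints 7.5, 12.5, ..., 92.5 (outcome left missing)
            local n0 = _N
            set obs `= _N + 18'
            replace age_midpoint = 7.5 + 5*(_n - `n0' - 1) if _n > `n0'

            * Build the spline basis on all rows with the knots used above
            mkspline agesp_5 = age_midpoint, cubic knots(26 42 58 74)

            * Rows with a missing outcome are dropped from estimation automatically
            regress service_item_age_gender_10 agesp_5*

            * predict computes fitted values for every row with nonmissing spline variables
            predict double yhat_rcs

            sort age_midpoint
            list age_midpoint service_item_age_gender_10 yhat_rcs, noobs sep(0)
            ```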








            • #7
              Graphs are attached


              • #8
                Working backwards:

                1. I can't read .docx documents (#7). See FAQ Advice #12 for why you are asked not to post them.

                2. There are various images in #6 I can't read. Same advice on what attachments work here.

                The data example is helpful. A simple plot shows a nonlinear relationship with a turning point.

                Therefore,

                3. Straight-line and logarithmic fits are doomed to failure. Box-Cox can't possibly help in any serious manner. No monotonic transformation of either variable will induce a turning point.
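
                The reasoning behind point 3 can be written out in one line. As a sketch, let g and h stand for any strictly monotonic transformations of the response and the predictor, and a and b for the fitted coefficients (all four symbols are notation introduced here, not from the thread):

                ```latex
                g(\hat{y}) = a + b\,h(x)
                \quad\Longrightarrow\quad
                \hat{y}(x) = g^{-1}\bigl(a + b\,h(x)\bigr),
                \qquad
                \frac{d\hat{y}}{dx} = \frac{b\,h'(x)}{g'(\hat{y})}
                ```

                For differentiable g and h, both h' and g' keep a constant sign, so the slope of the fitted curve never changes sign: the fit is monotonic in x (or flat if b = 0) and cannot rise and then fall as the data do.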

                Splines can help, but why treat this as a regression problem, when interpolation will provide a direct answer?

                mipolate (SSC) bundles together various interpolation routines. See http://www.statalist.org/forums/foru...-interpolation

                I used the pchip method, in effect a tweaked spline method. Here are code and results:

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float ge_midpoint long service_item_age_gender
                  10   495
                  20  5031
                  30 12818
                  40 23021
                  50 35997
                  60 50198
                  70 55706
                  80 37502
                  90 11090
                 7.5     .
                12.5     .
                17.5     .
                22.5     .
                27.5     .
                32.5     .
                37.5     .
                42.5     .
                47.5     .
                52.5     .
                57.5     .
                62.5     .
                67.5     .
                72.5     .
                77.5     .
                82.5     .
                87.5     .
                92.5     .
                end
                
                * before use, you must install:
                * ssc inst mipolate
                
                mipolate  service_item_age_gender ge_midpoint , gen(siage_pchip)  pchip
                
                twoway connected siage_pchip ge_midpoint, sort ms(Oh) msize(large) || scatter service_item_age_gender ge_midpoint, ms(+) msize(large) legend(order(2 "data" 1 "pchip")) scheme(s1color) ytitle( service_item_age_gender)
                Note that the graph here just joins observed and interpolated points by line segments. That's not a statement about what any interpolations would be if further points were interpolated.
                [Figure: mipolate.png. Observed data (+) and pchip-interpolated values (open circles, connected by line segments) of service_item_age_gender against age midpoint.]
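
                If mipolate cannot be installed, official Stata's ipolate offers a rough substitute. The sketch below is a different technique from the pchip method used above: ipolate interpolates linearly between neighbouring data points, so it will not reproduce the curvature of the pchip results. The epolate option extends the end segments linearly to cover 7.5 and 92.5, and such extrapolated values can be implausible (a count could even come out negative). The name siage_linear is hypothetical.

                ```stata
                clear
                input float age_midpoint long service_item_age_gender
                10   495
                20  5031
                30 12818
                40 23021
                50 35997
                60 50198
                70 55706
                80 37502
                90 11090
                end

                * Append 18 rows for the new midpoints 7.5, 12.5, ..., 92.5 (outcome left missing)
                local n0 = _N
                set obs `= _N + 18'
                replace age_midpoint = 7.5 + 5*(_n - `n0' - 1) if _n > `n0'

                * Linear interpolation; epolate also extrapolates beyond the observed range
                ipolate service_item_age_gender age_midpoint, generate(siage_linear) epolate

                sort age_midpoint
                list age_midpoint service_item_age_gender siage_linear, noobs sep(0)
                ```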



                • #9
                  Perfect, thanks.


                  • #10
                    Hi again,
                    I've tried to install mipolate, but "ssc inst mipolate" does not work. I got the following message:
                    . ssc inst mipolate
                    connection timed out -- see help r(2) for troubleshooting
                    http://fmwww.bc.edu/repec/bocode/m/ either
                    1) is not a valid URL, or
                    2) could not be contacted, or
                    3) is not a Stata download site (has no stata.toc file).
                    r(2);


                    I thought it might be because of the firewall, so I downloaded the files from the EconPapers website (MIPOLATE: Stata module to interpolate values) and copied them to "C:\Program Files (x86)\Stata14\ado\base" on my computer. The files are mipolate.ado, mipolate.sthlp, stripolate.ado and stripolate.sthlp. But it does not work: I get a "command mipolate is unrecognized" error message.

                    Do you have any suggestions?

                    I really appreciate your help.
                    Last edited by Masoumeh Sanagou; 09 May 2017, 22:51.


                    • #11
                      Don't store anything in ado\base. That directory is reserved for official Stata. Instead, store the files in ado\plus\m\ and ado\plus\s\, depending on the first letter of each file's name.
                      ---------------------------------
                      Maarten L. Buis
                      University of Konstanz
                      Department of history and sociology
                      box 40
                      78457 Konstanz
                      Germany
                      http://www.maartenbuis.nl
                      ---------------------------------


                      • #12
                        Thanks for the help. However, I don't have a "plus" folder in the "ado" folder. I have the following paths:

                        C:\Program Files (x86)\Stata14\ado\base\m
                        C:\Program Files (x86)\Stata14\ado\base\s

                        Is it fine to copy the files into these paths, or should I create the ado\plus\m\ and ado\plus\s\ folders?


                        • #13
                          Run

                          Code:
                          adopath
                          and it should show you a PLUS area even if the directories do not yet exist.

                          Put mipolate.* in the m directory off PLUS and stripolate.* in the s directory.

                          If need be, create these directories.

                          Stata has an often useful

                          Code:
                          mkdir
                          command (which naturally just talks to the operating system, in this case Windows).
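
                          Put together, the manual installation might look like the sketch below. It assumes the four downloaded files sit in a hypothetical folder C:/Downloads, and that c(sysdir_plus), the PLUS location Stata reports, ends with a directory separator. If a directory cannot be created from within Stata, create it by hand at the location that adopath shows for PLUS.

                          ```stata
                          * Show the ado-path; the PLUS entry is where user-written files belong
                          adopath
                          display "`c(sysdir_plus)'"

                          * Try to create PLUS and its m and s subdirectories.
                          * capture suppresses any error, including the one raised
                          * when a directory already exists.
                          capture mkdir "`c(sysdir_plus)'"
                          capture mkdir "`c(sysdir_plus)'m"
                          capture mkdir "`c(sysdir_plus)'s"

                          * Copy the downloaded files (C:/Downloads is a hypothetical location)
                          local src "C:/Downloads"
                          foreach f in mipolate.ado mipolate.sthlp {
                              copy "`src'/`f'" "`c(sysdir_plus)'m/`f'", replace
                          }
                          foreach f in stripolate.ado stripolate.sthlp {
                              copy "`src'/`f'" "`c(sysdir_plus)'s/`f'", replace
                          }

                          * Report where Stata finds the command, or an error if it does not
                          which mipolate
                          ```

                          The point of starting with adopath is that the PLUS directory need not be inside the Stata installation folder, so a plus folder created by guesswork may not be on the search path at all.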


                          • #14
                            Thanks for the help.

                            Just for information:
                            I created
                            C:\Program Files (x86)\Stata14\ado\plus\m
                            C:\Program Files (x86)\Stata14\ado\plus\s

                            and copied the files into them, but it did not work.
                            Then I copied the files into
                            C:\Program Files (x86)\Stata14\ado\base\m
                            C:\Program Files (x86)\Stata14\ado\base\s
                            and it works now.

                            I really appreciate your help.


                            • #15
                              Maarten is right in #11: you should never put user-written programs in \ado\base. Stata won't explode, but you won't be able to use them when you upgrade.

                              Please show us the result of typing

                              Code:
                              adopath
                              and read FAQ Advice #12 to learn that you should never just say "did not work".
