Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating predicted value using coefficients

    I would like to calculate predicted values for each company_ID in my sample by estimating the following regression equation for each company_ID quarterly and excluding firm i:

    Here is an example of the used variables included in the regression. The coefficient estimates, as shown below, will then be used to calculate for each company_ID and for each quarter a predicted value by multiplying the coefficient estimates for the full sample with the actual independent variable observations for each company_ID.

    Code:
     . xtreg CE lCE ATO accruals laccruals changesales negchangesales
    
    Random-effects GLS regression                   Number of obs     =     36,122
    Group variable: company_ID                      Number of groups  =      1,166
    
    R-sq:                                           Obs per group:
         within  = 0.0341                                         min =          1
         between = 0.5489                                         avg =       31.0
         overall = 0.0492                                         max =         53
    
                                                    Wald chi2(6)      =    1869.59
    corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
    
    --------------------------------------------------------------------------------
                CE |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
               lCE |    .088218   .0051389    17.17   0.000     .0781459    .0982902
               ATO |  -5.80e-08   2.13e-06    -0.03   0.978    -4.24e-06    4.12e-06
          accruals |  -.0586675   .0015102   -38.85   0.000    -.0616275   -.0557076
         laccruals |   .0037709   .0015531     2.43   0.015     .0007268     .006815
       changesales |   .0005675   .0002451     2.32   0.021     .0000872    .0010478
    negchangesales |  -4.926075   .7808714    -6.31   0.000    -6.456555   -3.395595
             _cons |  -.1345862   .5538775    -0.24   0.808    -1.220166    .9509939
    ---------------+----------------------------------------------------------------
           sigma_u |          0
           sigma_e |  71.838386
               rho |          0   (fraction of variance due to u_i)
    --------------------------------------------------------------------------------


    I would like to know if the Code I want to use is correct:

    Code:
    egen id=group(company_ID)
    gen predicted_CE=.
    forvalues i=1(1)N {
    xtreg CE lCE ATO accruals laccruals changesales negchangesales if id==`i'
    predict px if id==`i'
    replace predicted_CE = px if id==`i'
    drop px
    }

  • #2
    No, the letter N is undefined. Better is:



    Code:
    egen id=group(company_ID)
    gen predicted_CE=.
    qui sum id, meanonly
    local N= r(N)
    forvalues i=1(1)`N' {
    xtreg CE lCE ATO accruals laccruals changesales negchangesales if id==`i'
    predict px if id==`i'
    replace predicted_CE = px if id==`i'
    drop px
    }
    Last edited by Carole J. Wilson; 09 May 2016, 12:24.
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1

    Comment


    • #3
      Thanks Carole. Stata gives the following error:

      Code:
      . egen id=group(company_ID)
      
      . gen predicted_CE=.
      variable predicted_CE already defined
      r(110);
      
      . qui sum id, meanonly
      
      . local N= r(N)
      
      . forvalues i=1(1)`N' {
        2. xtreg CE lCE ATO accruals laccruals changesales negchangesales if id==`i'
        3. predict px if id==`i'
        4. replace predicted_CE = px if id==`i'
        5. drop px
        6. }
      insufficient observations
      r(2001);
      I don't understand the error because why does it give this error for each company regression but not for the full sample. Because for the full sample I was able to run the regression xtreg CE lCE ATO accruals laccruals changesales negchangesales.

      Comment


      • #4
        Run:

        Code:
        tab id
        If you have too few observations per company_ID, the regression won't run. You'll need to decide if a panel estimator (xtreg) makes sense if you are only running the regression on a single panel.
        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1

        Comment


        • #5
          Code:
          . forvalues i=1(1)`N' {
            2. reg CE lCE ATO accruals laccruals changesales negchangesales if id==`i'
            3. predict px if id==`i'
            4. replace predicted_CE = px if id==`i'
            5. drop px
            6. }
          
                Source |       SS           df       MS      Number of obs   =        51
          -------------+----------------------------------   F(6, 44)        =     22.47
                 Model |  .637936043         6  .106322674   Prob > F        =    0.0000
              Residual |  .208173943        44  .004731226   R-squared       =    0.7540
          -------------+----------------------------------   Adj R-squared   =    0.7204
                 Total |  .846109986        50    .0169222   Root MSE        =    .06878
          
          --------------------------------------------------------------------------------
                      CE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                     lCE |      1.056   .1143598     9.23   0.000     .8255232    1.286477
                     ATO |  -.0000163   .0000743    -0.22   0.828     -.000166    .0001335
                accruals |   .0805628   .0512344     1.57   0.123    -.0226933    .1838189
               laccruals |    .057415   .0497176     1.15   0.254    -.0427843    .1576144
             changesales |  -.1555899   .0419584    -3.71   0.001    -.2401514   -.0710284
          negchangesales |   .0308326   .0327986     0.94   0.352    -.0352686    .0969337
                   _cons |   .0257764   .0273522     0.94   0.351    -.0293484    .0809012
          --------------------------------------------------------------------------------
          (option xb assumed; fitted values)
          (59,246 missing values generated)
          (0 real changes made)
          
                Source |       SS           df       MS      Number of obs   =        20
          -------------+----------------------------------   F(6, 13)        =      3.14
                 Model |  .153217389         6  .025536232   Prob > F        =    0.0398
              Residual |  .105865384        13  .008143491   R-squared       =    0.5914
          -------------+----------------------------------   Adj R-squared   =    0.4028
                 Total |  .259082773        19  .013635935   Root MSE        =    .09024
          
          --------------------------------------------------------------------------------
                      CE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                     lCE |   .8074566   .2431238     3.32   0.006     .2822195    1.332694
                     ATO |   .0000301   .0001156     0.26   0.799    -.0002197    .0002799
                accruals |   .3471477    .119809     2.90   0.012      .088316    .6059794
               laccruals |  -.2448291   .1194714    -2.05   0.061    -.5029314    .0132733
             changesales |   -.065761   .0209147    -3.14   0.008    -.1109445   -.0205775
          negchangesales |   .1109024   .0461477     2.40   0.032     .0112062    .2105985
                   _cons |  -.0034911   .0489053    -0.07   0.944    -.1091447    .1021624
          --------------------------------------------------------------------------------
          (option xb assumed; fitted values)
          (59,276 missing values generated)
          (0 real changes made)
          
                Source |       SS           df       MS      Number of obs   =        21
          -------------+----------------------------------   F(6, 14)        =      1.97
                 Model |  .054808713         6  .009134786   Prob > F        =    0.1392
              Residual |  .064994638        14  .004642474   R-squared       =    0.4575
          -------------+----------------------------------   Adj R-squared   =    0.2250
                 Total |  .119803351        20  .005990168   Root MSE        =    .06814
          
          --------------------------------------------------------------------------------
                      CE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                     lCE |  -.3421595    .245086    -1.40   0.184    -.8678167    .1834977
                     ATO |   .0000729   .0000866     0.84   0.414    -.0001128    .0002587
                accruals |  -.1069667   .0722783    -1.48   0.161    -.2619882    .0480549
               laccruals |   .1558047   .0623149     2.50   0.025     .0221525    .2894568
             changesales |   .0112851   .0814755     0.14   0.892    -.1634625    .1860327
          negchangesales |   .0610164   .0577152     1.06   0.308    -.0627704    .1848032
                   _cons |   .1649147   .0514394     3.21   0.006     .0545882    .2752413
          --------------------------------------------------------------------------------
          (option xb assumed; fitted values)
          (59,276 missing values generated)
          (0 real changes made)
          
                Source |       SS           df       MS      Number of obs   =        47
          -------------+----------------------------------   F(6, 40)        =      6.13
                 Model |   .08211181         6  .013685302   Prob > F        =    0.0001
              Residual |  .089285073        40  .002232127   R-squared       =    0.4791
          -------------+----------------------------------   Adj R-squared   =    0.4009
                 Total |  .171396883        46  .003726019   Root MSE        =    .04725
          
          --------------------------------------------------------------------------------
                      CE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                     lCE |   .6039451   .1286638     4.69   0.000     .3439059    .8639843
                     ATO |    .000304   .0004657     0.65   0.518    -.0006371    .0012452
                accruals |   .0181641   .0206713     0.88   0.385    -.0236142    .0599424
               laccruals |  -.0249186   .0201781    -1.23   0.224    -.0656999    .0158628
             changesales |  -.2552938    .076913    -3.32   0.002    -.4107407   -.0998469
          negchangesales |  -.0191584   .0243527    -0.79   0.436    -.0683771    .0300603
                   _cons |   .1489068   .0481586     3.09   0.004     .0515747    .2462388
          --------------------------------------------------------------------------------
          (option xb assumed; fitted values)
          (59,250 missing values generated)
          (0 real changes made)
          
                Source |       SS           df       MS      Number of obs   =        23
          -------------+----------------------------------   F(6, 16)        =      5.71
                 Model |  1.61232887         6  .268721478   Prob > F        =    0.0024
              Residual |  .752495079        16  .047030942   R-squared       =    0.6818
          -------------+----------------------------------   Adj R-squared   =    0.5625
                 Total |  2.36482395        22  .107491998   Root MSE        =    .21687
          
          --------------------------------------------------------------------------------
                      CE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                     lCE |   .6713845     .24814     2.71   0.016     .1453511    1.197418
                     ATO |  -.0003652   .0015815    -0.23   0.820    -.0037178    .0029875
                accruals |  -.0380346   .0360513    -1.06   0.307    -.1144598    .0383907
               laccruals |   .0195132   .0343873     0.57   0.578    -.0533846     .092411
             changesales |  -.1877619   .0690248    -2.72   0.015    -.3340879    -.041436
          negchangesales |   .2827041   .1713372     1.65   0.118    -.0805145    .6459227
                   _cons |   .0825425    .169223     0.49   0.632    -.2761943    .4412793
          --------------------------------------------------------------------------------
          (option xb assumed; fitted values)
          (59,274 missing values generated)
          (0 real changes made)
          
                Source |       SS           df       MS      Number of obs   =        16
          -------------+----------------------------------   F(6, 9)         =   1492.83
                 Model |  278996.197         6  46499.3662   Prob > F        =    0.0000
              Residual |  280.336152         9  31.1484613   R-squared       =    0.9990
          -------------+----------------------------------   Adj R-squared   =    0.9983
                 Total |  279276.533        15  18618.4355   Root MSE        =    5.5811
          
          --------------------------------------------------------------------------------
                      CE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                     lCE |   .0420694    .350835     0.12   0.907    -.7515744    .8357133
                     ATO |   .1925715   .1845965     1.04   0.324    -.2250148    .6101578
                accruals |  -.4685629   .0053562   -87.48   0.000    -.4806794   -.4564464
               laccruals |  -.1043686   .3282712    -0.32   0.758    -.8469696    .6382323
             changesales |   .7138923   2.004164     0.36   0.730    -3.819841    5.247625
          negchangesales |  -2.126174   3.308608    -0.64   0.536    -9.610765    5.358416
                   _cons |  -1.562012   2.410108    -0.65   0.533    -7.014056    3.890031
          --------------------------------------------------------------------------------
          (option xb assumed; fitted values)
          (59,281 missing values generated)
          (0 real changes made)
          
                Source |       SS           df       MS      Number of obs   =        41
          -------------+----------------------------------   F(6, 34)        =      4.35
                 Model |  .006267934         6  .001044656   Prob > F        =    0.0023
              Residual |  .008170973        34  .000240323   R-squared       =    0.4341
          -------------+----------------------------------   Adj R-squared   =    0.3342
                 Total |  .014438907        40  .000360973   Root MSE        =     .0155
          
          --------------------------------------------------------------------------------
                      CE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                     lCE |   .2404834    .154109     1.56   0.128    -.0727038    .5536705
                     ATO |  -2.53e-06   6.45e-06    -0.39   0.697    -.0000156    .0000106
                accruals |  -.0863498   .0446101    -1.94   0.061    -.1770083    .0043088
               laccruals |   .0451694   .0487813     0.93   0.361    -.0539661     .144305
             changesales |   .0006967   .0010041     0.69   0.493     -.001344    .0027373
          negchangesales |    .018683   .0058163     3.21   0.003     .0068628    .0305032
                   _cons |   .0231053   .0087072     2.65   0.012     .0054101    .0408005
          --------------------------------------------------------------------------------
          (option xb assumed; fitted values)
          (59,253 missing values generated)
          (0 real changes made)
          
                Source |       SS           df       MS      Number of obs   =        50
          -------------+----------------------------------   F(6, 43)        =      9.61
                 Model |  46.3840299         6  7.73067164   Prob > F        =    0.0000
              Residual |  34.5772018        43  .804120973   R-squared       =    0.5729
          -------------+----------------------------------   Adj R-squared   =    0.5133
                 Total |  80.9612317        49  1.65227003   Root MSE        =    .89673
          
          --------------------------------------------------------------------------------
                      CE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                     lCE |   .7130755   .0992428     7.19   0.000     .5129333    .9132177
                     ATO |    .000123   .0007165     0.17   0.865    -.0013221     .001568
                accruals |   .0420172   .0618734     0.68   0.501    -.0827624    .1667969
               laccruals |  -.0681179   .0621417    -1.10   0.279    -.1934386    .0572028
             changesales |   .2477435   .3006302     0.82   0.414    -.3585351    .8540222
          negchangesales |  -.0457732   .4664963    -0.10   0.922    -.9865528    .8950063
                   _cons |   -.183093   .3504218    -0.52   0.604    -.8897859    .5235998
          --------------------------------------------------------------------------------
          (option xb assumed; fitted values)
          (59,246 missing values generated)
          (0 real changes made)
          
                Source |       SS           df       MS      Number of obs   =        11
          -------------+----------------------------------   F(6, 4)         =     88.78
                 Model |  .129491732         6  .021581955   Prob > F        =    0.0003
              Residual |  .000972428         4  .000243107   R-squared       =    0.9925
          -------------+----------------------------------   Adj R-squared   =    0.9814
                 Total |   .13046416        10  .013046416   Root MSE        =    .01559
          
          --------------------------------------------------------------------------------
                      CE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                     lCE |   .6330339    .238196     2.66   0.057    -.0283042    1.294372
                     ATO |  -.0000491   .0000377    -1.30   0.263    -.0001539    .0000556
                accruals |  -.0000116   .0106813    -0.00   0.999    -.0296677    .0296445
               laccruals |   .0184803   .0119313     1.55   0.196    -.0146463    .0516069
             changesales |   -.145816   .0081178   -17.96   0.000    -.1683547   -.1232773
          negchangesales |   .0562505   .0162854     3.45   0.026      .011035    .1014661
                   _cons |   .1920744   .1201751     1.60   0.185    -.1415852     .525734
          --------------------------------------------------------------------------------
          (option xb assumed; fitted values)
          (59,286 missing values generated)
          (0 real changes made)
          
                Source |       SS           df       MS      Number of obs   =        37
          -------------+----------------------------------   F(6, 30)        =      5.61
                 Model |  .034690266         6  .005781711   Prob > F        =    0.0005
              Residual |  .030936889        30   .00103123   R-squared       =    0.5286
          -------------+----------------------------------   Adj R-squared   =    0.4343
                 Total |  .065627155        36  .001822977   Root MSE        =    .03211
          
          --------------------------------------------------------------------------------
                      CE |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                     lCE |   .5823286   .1718264     3.39   0.002     .2314122    .9332449
                     ATO |  -.0000118   7.75e-06    -1.52   0.139    -.0000276    4.03e-06
                accruals |  -.0491576   .0426757    -1.15   0.258    -.1363129    .0379977
               laccruals |   .0205322   .0460622     0.45   0.659    -.0735393    .1146038
             changesales |  -.0531307   .0305738    -1.74   0.093    -.1155708    .0093094
          negchangesales |  -.0011832   .0200026    -0.06   0.953    -.0420339    .0396675
                   _cons |   .0327197   .0160911     2.03   0.051    -.0001428    .0655822
          --------------------------------------------------------------------------------
          (option xb assumed; fitted values)
          (59,260 missing values generated)
          (0 real changes made)
          no observations
          r(2000);

          Comment


          • #6
            I understand the problem. I have missing values (.) in some of the newly generated variables that I have included as independent variables in the regression.

            I have used the following code and predicted_CE includes missing values:

            Code:
            . foreach l of local levels {
              2. cap reg CE lCE ATO accruals laccruals changesales negchangesales if id==`i'
              3. cap predict px if id==`i'
              4. cap replace predicted_CE = px if id==`i'
              5. cap drop px
              6. }
            Since there are missing values in the independent variables that I need for my regression, what do I need to do in order to get rid of the missing values and still be able to run the regression with foreach or forvalues? Is deleting dropping all missing values a correct approach?

            Comment


            • #7
              Stata ignores missing values. However, if there is a missing value on any variable (independent or dependent) Stata excludes the entire observation (row) from analysis.
              Stata/MP 14.1 (64-bit x86-64)
              Revision 19 May 2016
              Win 8.1

              Comment


              • #8
                Actually there is something else going on here. Even if you had no missing data at all, you would still get the insufficient observations message when trying to run -xtreg- on one panel at a time. If you want to run -xtreg, fe- one panel at a time, that will work (if you have enough data in each panel, as you probably do), although it also makes no sense to do that for a single panel: I would just use -regress- for that.

                Remember that -xtreg- with no options specified defaults to -xtreg, re-. The random effects model is estimated by doing a weighted combination of the fixed-effects model and the between-effects model. But when you (covertly) do the between-effects regression on a single panel, you have only one observation. You can't do any kind of regression on a single observation, hence the error message. This is by no means obvious if you haven't seen it before. The only reason I know about it is because it once bit me, and Tech Support finally explained it to me.

                Comment


                • #9
                  Thank you Clyde and Carole. I have dropped company_ID's with missing values for each of the variables I use in my regression. However, when I use:

                  Code:
                  tab id
                  Stata tells me that some id's have only one observation. How can I drop those id's with for example less than 5 observations?

                  Comment


                  • #10
                    Your basic model is

                    Code:
                      
                    reg CE lCE ATO accruals laccruals changesales negchangesales
                    so you are estimating 7 parameters.

                    Precise advice differs, but few experienced researchers would feel comfortable without a sample size >>7. 10 observations per parameter is one rule of thumb,

                    Whether a hyperplane is suitable for this kind of data is of course another question.
                    Last edited by Nick Cox; 10 May 2016, 04:01.

                    Comment


                    • #11
                      Nick Cox, I need to estimate the 6 independent variables given in the regression, to eventually calculate unexpected CE (actual CE - predicted CE) for each unique company_ID. Are you suggesting me to drop id's with less than 7 observations?

                      Comment


                      • #12
                        Nick is suggesting that you eliminate models with id's that have fewer than 70 observations (or somewhere around there). Whatever threshold you choose to use, you can do so this way:

                        You said you already dropped those observations with missing values on your model variables, so this assumes they are all gone:

                        Code:
                        bysort company_ID : gen id_n=_N
                        Then use the following in your loop

                        Code:
                        cap reg CE lCE ATO accruals laccruals changesales negchangesales if id==`i' & id_n>70   // or whatever threshold you choose
                        Stata/MP 14.1 (64-bit x86-64)
                        Revision 19 May 2016
                        Win 8.1

                        Comment

                        Working...
                        X