
  • prediction after estimation by group

    Hi,

    I would like to run estimations by group (industry). Let's say that I have 10 industries and would like to estimate an OLS regression for each of those 10 industries separately. That seems simple enough, as long as I specify the group/industry to which the regression applies. Example:

    Code:
    reg x y z if group==1
    The problem comes when I need to compute predictions, let's say the residuals within group 1. When I use the command:

    Code:
    predict res, residual
    I get predictions for the other groups too. Does anyone know why? What do residual predictions mean for groups to which the regression does not apply in the first place?

    When I specify prediction in the following manner:


    Code:
    predict res if group==1, residual
    I get predictions only for group 1, and they are identical to the group 1 residuals I get when I do not specify the group for which predictions apply.

    Am I doing something wrong?

  • #2
    No, you're not doing anything wrong. As the help file for -predict- says:
    predict calculates the requested statistic for all possible observations, whether they were used in fitting the model or not.
    There are exceptions: some statistics cannot be calculated "out of sample," but most can. So if you want predictions only for your group, you have to restrict -predict- explicitly with an -if- clause.

    This is actually often quite useful, as people frequently want to first fit a model to some of the data and then see what the model predicts in the rest of the data.
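A minimal sketch of that idea, using the auto data with foreign standing in for the group variable (an assumption for illustration; the variable names res_in and res_out are hypothetical). Instead of retyping the -if- condition, -predict- can be restricted with the e(sample) function, which marks the observations used by the most recent estimation command. It also excludes observations the regression dropped, for example because of missing values:

```stata
* Hypothetical setup: foreign plays the role of the group/industry variable
sysuse auto, clear
quietly regress price mpg weight if foreign == 1

* e(sample) is 1 for observations used in the last estimation, 0 otherwise
predict res_in if e(sample), residuals

* Out-of-sample residuals: the foreign == 1 coefficients applied to foreign == 0
predict res_out if !e(sample), residuals

summarize res_in res_out
```

Here res_in holds the 22 in-sample residuals and res_out the 52 out-of-sample ones, which is the "fit on some of the data, predict on the rest" use mentioned above.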



    • #3
      Hi Mina

      So you do
      Code:
      reg x y z if group==1
      and Stata stores the coefficients for the regression that it then uses in
      Code:
      predict res, residual
      This last command applies to all observations, since you are not restricting it to group 1, so it calculates the residuals for all observations using the coefficients estimated only from the group 1 data.

      The command
      Code:
      predict res if group ==1, residual
      will compute the residuals only for the observations that belong to group 1. Therefore, if you look only at observations from group 1, both predict commands give the same values. The first one also predicts for observations that do not belong to group 1, but it still uses the estimates from the regression on group 1 alone.

      To illustrate further, consider the following:
      Code:
      sysuse auto, clear
      
      quiet regress price mpg weight if foreign == 1
      
      predict res1, resid
      predict res2 if foreign == 1, resid
      
      sum res1 res2
      sum res1 res2 if foreign == 1
      The results of the summarize commands are the following:
      Code:
      . sum res1 res2
      
          Variable |       Obs        Mean    Std. Dev.       Min        Max
      -------------+--------------------------------------------------------
              res1 |        74   -3915.555     3530.11     -10211   2150.149
              res2 |        22   -1.73e-06    1214.349  -2625.265   2150.149
      
      . sum res1 res2 if foreign == 1
      
          Variable |       Obs        Mean    Std. Dev.       Min        Max
      -------------+--------------------------------------------------------
              res1 |        22   -1.73e-06    1214.349  -2625.265   2150.149
              res2 |        22   -1.73e-06    1214.349  -2625.265   2150.149
      When we summarize all observations (the first summarize), the two residual variables differ, but when we restrict to the observations of the group (the second summarize), they are identical. Notice that the first summarize reports a different number of observations for the two residuals: res1 has 74, i.e. all observations, while res2 has only the 22 that belong to the group foreign == 1. Notice also that res2 always has 22 observations. This is because Stata computed those residuals only for those 22 observations and set the value for all other observations to missing (.).
      Last edited by Alfonso Sánchez-Peñalver; 04 Oct 2016, 15:33.
      Alfonso Sanchez-Penalver
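For the original question (separate regressions for each of several industries), the same logic can be repeated in a loop. This is a minimal sketch, again assuming the auto data with foreign standing in for the industry variable; res and the temporary variable are hypothetical names. Each pass fits one group's regression and keeps only that group's in-sample residuals:

```stata
* Hypothetical setup: foreign stands in for an industry identifier
sysuse auto, clear
generate double res = .

* Collect the distinct group values (here 0 and 1)
levelsof foreign, local(groups)
foreach g of local groups {
    quietly regress price mpg weight if foreign == `g'
    * Residuals only for the observations used in this group's regression
    tempvar r
    predict double `r' if e(sample), residuals
    quietly replace res = `r' if e(sample)
    drop `r'
}
summarize res
```

With an industry variable taking 10 values, the loop would run 10 separate regressions, and res would end up holding each observation's residual from its own industry's fit.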



      • #4
        Alfonso, Clyde: many, many thanks!

