  • Regression by industry and year and saving predicted values and residuals

    Hi,
    I am trying to obtain predicted values and residuals from a regression of y on x1 x2 x3 run separately by industry (two-digit SIC) and year.
    There must also be at least 10 observations in each industry/year cell; otherwise no estimation should be performed.
    My sample has 29000 observations, the two-digit SIC code goes from 10 to 99, and year runs from 1999 to 2014.

    I have tried the following code and it seems to work. Does it look right to the experts?

    gen y_hat=. // empty variable for predictions
    gen y_res=. // empty variable for residuals
    tempvar y_hat y_res // temporary variable names for each set of predictions and residuals
    levelsof sic_2_digit, local(levels)
    foreach x of local levels {
        foreach z of numlist 1999/2014 {
            capture reg y x1 x2 x3 if sic_2_digit==`x' & year==`z' & sic_year_numerosity>9
            if !_rc {
                predict `y_hat' // predictions are now in temporary variable
                replace y_hat=`y_hat' if e(sample) // transfer predictions from temp variable
                predict `y_res', residuals // residuals are now in temporary variable
                replace y_res=`y_res' if e(sample) // transfer residuals from temp variable
                drop `y_hat' `y_res' // drop temporary variables in preparation for next regression
            }
        }
    }

    Also, as an aside, I would like to extract the average R-squared from the regressions.

    Thanks.
    Kind regards
    Amedeo

  • #2
    Your code seems fine, though quite complicated.
    I'm a bit surprised that it works, especially when you save predictions in a local variable that has the same name as another variable. If I were you, I'd have changed the name, but if it works, that is great.

    Concerning the R-squared, the -reg- command saves its value in e(r2). So, if you want the mean, first save the R2 of each regression in a variable, and then take the mean of that variable.
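
    As a quick illustration of where that value lives, here is a minimal sketch on the auto dataset that ships with Stata. The dataset and the variables price, mpg and weight are stand-ins chosen for the example, not part of the original question.

    ```stata
    sysuse auto, clear
    regress price mpg weight
    display "R-squared: " e(r2)   // e(r2) holds the R-squared of the last regression
    ereturn list                  // lists everything -regress- left behind in e()
    ```

    The value in e(r2) is overwritten by the next estimation command, which is why it has to be copied somewhere inside the loop.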

    So insert the two rsquare lines shown below into your code
    Code:
    .....
    gen y_res=. // empty variable for residuals
    gen rsquare=. // empty variable for the R-squared of each regression
    .....
    replace y_res=`y_res' if e(sample) // transfer residuals from temp variable
    replace rsquare=e(r2) if sic_2_digit==`x' & year==`z' & sic_year_numerosity>9
    drop `y_hat' `y_res' // drop temporary variables in preparation for next regression
    }
    }
    }
    table, c(mean rsquare) /* displays the mean */
    egen mrsq=mean(rsquare) /* generates a new variable with the mean */
    Of course, this gives you a weighted mean (weighted by the number of observations used in each regression). If you don't want that, keep a single R-squared value per regression and then take the simple mean of those values.
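
    One possible way to get the simple (unweighted) mean is to tag one observation per regression and average over the tagged observations only. The sketch below is self-contained and uses the auto data, with foreign standing in for the industry/year cell; those names are assumptions made for the example. In the data from this thread the tag would presumably be taken over both variables, i.e. tag(sic_2_digit year).

    ```stata
    * Stand-in data: foreign plays the role of the industry/year cell.
    sysuse auto, clear
    gen rsquare = .
    levelsof foreign, local(groups)
    foreach g of local groups {
        capture regress price mpg weight if foreign == `g'
        if !_rc {
            replace rsquare = e(r2) if e(sample)   // same R2 on every obs of the group
        }
    }
    * Tag exactly one observation per group so each regression counts once.
    egen byte one_per_group = tag(foreign) if !missing(rsquare)
    summarize rsquare                     // mean weighted by group size
    summarize rsquare if one_per_group    // simple mean across regressions
    display "Unweighted mean R-squared: " r(mean)
    ```

    The two summarize calls differ only in how many times each regression's R-squared enters the average.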

    Hope this helps
    Charlie

    • #3
      Hi Charlie
      thanks a lot for your code.
      I get the logic and it seems to work. My only issue is: what should I table? There is no varlist after table, so it returns an error.

      Thank you

      Amedeo

      • #4
        You are right, there is nothing to table. I guess I was thinking of an industry/year pair variable, in order to display each R2 computed, but this is not what you asked for.

        Instead, to see the mean, try the following command
        Code:
        su rsquare, de
        It will display the mean of the R2 (among other statistics that could be useful).
        Also, the last command I gave you (egen) creates a constant variable equal to the mean of R2.
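
        If the individual values are of interest after all, something along these lines might work. This is only a sketch on the auto data: mpg stands in for rsquare and foreign for the industry/year cell, both chosen purely for illustration.

        ```stata
        * Stand-ins: mpg plays the role of rsquare, foreign the role of the cell.
        sysuse auto, clear
        tabulate foreign, summarize(mpg) means   // mean within each cell, one row per cell
        summarize mpg, detail                    // overall mean, percentiles, etc.
        display "Mean: " r(mean) "   Median: " r(p50)
        ```

        After summarize, the mean is also available in r(mean), so it can be reused in later commands instead of being read off the screen.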


        • #5
          @Charlie Joyez wrote:

          I'm a bit surprised that it works, especially when you save predictions in a local variable that has the same name as another variable. If I were you, I'd have changed the name, but if it works, that is great.
          It is not surprising that temporary variables can have the same name as variables in the dataset if you consider that the name of the temporary variable is only a macro. [P] macro explains this using the example of a temporary variable called sumsq.

          The tempvar sumsq command creates a local macro called sumsq and stores in it a name that is different from any name currently in the data. Subsequently, you then use `sumsq' with single quotes around it rather than sumsq in your calculation, so that rather than naming your temporary variable sumsq, you are naming it whatever Stata wants you to name it.
          Here is an example with the auto data. Note that in the dataset the temporary variable has the name __000000.
          Code:
          . sysuse auto
          . tempvar mpg
          . gen `mpg' = mpg
          . lab var `mpg' "Mileage (mpg) (copy)"
          . d mpg `mpg'
          
                        storage   display    value
          variable name   type    format     label      variable label
          ----------------------------------------------------------------------------------
          mpg             int     %8.0g                 Mileage (mpg)
          __000000        float   %9.0g                 Mileage (mpg) (copy)

          • #6
            Thanks for the details Friedrich.
            I didn't know that. However, to avoid confusion (in my mind, not Stata's), I have never given a temporary variable the same name as a permanent one, and I think I'll continue that way, but it helps to know it's possible.

            • #7
              Thank you guys. Extremely helpful. I think I used the tempvar with the same names just because I was lazy. First time I am lucky with Stata.

              • #8
                Hi Amedeo,

                I want to do the same thing, that is, predict the residuals for each industry. However, I ran your code and it does not work.
                I used this code
                Code:
                tabulate isic2, generate(dsector) // create sector dummy variables
                forvalues x=1/20 {
                    eststo: regress lnva c.lnk##dsector`x' c.lnl##dsector`x' c.lnm##dsector`x'
                    predict TFP`x' if e(sample), resid
                }
                I have a basic question: how many residual values will we obtain? Does each observation get its own residual, or do all observations in one industry share a single residual?
                Thank you so much.
