  • Predicting and saving residuals after running regressions on several sample units

    Dear Statalist,

    I am running regressions on farm economic data that I have set up as panel data: each farm has five years of observations. I am trying to determine whether a short-run linear cost function (TC = a + bQ, where TC is total cost, Q is the quantity produced, and a and b are constants) or a short-run quadratic cost function (TC = a + bQ + cQ^2, where c is a further constant) fits my data better. To do so, I want to compare the sum of squared residuals (SSR) from each farm-level regression. I would like to save the residuals from my regressions as a new variable so that I can calculate the SSRs and compare the two models. So far I have the following code for the quadratic regression, which I will use as an example:

    Code:
    keep if COUNTRY == "XXX"
    xtset ID YEAR
    statsby, by(ID) saving (y.XXX.a.dta): xtreg SE131 c.SE281##c.SE281, fe
    merge m:1 ID using "M:\[...]\y.XXX.a.dta"
    drop _merge
    gen RES1 = .
    quietly bysort ID: xtreg SE131 c.SE281##c.SE281, fe
    predict temp, residuals
    replace RES1 = temp
    drop temp
    gen RES1_SQ = RES1^2
    bysort ID: egen SSR1 = total (RES1_SQ)
    However, I have noticed that the residuals for each farm are far from summing to zero, which leads me to believe that the above code is incorrect. Below is an example from one farm:

    Code:
    . list in 1/5
    
         +-------------------------------------------------------+
         | _b_SE281    _b_cons        RES1    RES1_SQ       SSR1 |
         |-------------------------------------------------------|
      1. | .0812744   149415.2   -28628.63   8.20e+08   8.13e+09 |
      2. | .0812744   149415.2    3330.112   1.11e+07   8.13e+09 |
      3. | .0812744   149415.2   -77954.47   6.08e+09   8.13e+09 |
      4. | .0812744   149415.2      -30600   9.36e+08   8.13e+09 |
      5. | .0812744   149415.2   -16828.83   2.83e+08   8.13e+09 |
         +-------------------------------------------------------+
    Alternatively, I tried code suggested in previous posts with the same aim (such as: https://www.stata.com/statalist/archive/2008-02/msg00296.html ; https://www.statalist.org/forums/forum/general-stata-discussion/general/491152-predicted-values-and-residuals-with-by ; https://www.stata.com/support/faqs/d...ach/index.html), but I am still having problems. I tried the following code, to no avail:

    Code:
    . keep if COUNTRY == "XXX"
    . xtset ID YEAR
    . statsby, by(ID) saving (y.XXX.a.dta): xtreg SE131 c.SE281##c.SE281, fe
    . merge m:1 ID using "M:\[...]\y.XXX.a.dta"
    . drop _merge
    . egen group = group(ID)
    . gen FIT1 = .
    . su group, meanonly
    . forval g = 1\`r(max)' {
      2. xtreg SE131 SE281, fe if group == `g'
      3. predict temp, residuals
      4. replace FIT1 = temp if group == `g'
      5. drop temp
      6. }
    invalid syntax
    r(198);
    I am not sure what is causing the error, or whether there is a better way to do what I want, but I would appreciate any help and advice on the matter.

    Many thanks,

    Guy Low, MSc

  • #2
    The reason your first set of code produced incorrect SSRs is that, although bysort ID: ran a separate regression for each ID, the predict command used only the results from the final regression to compute all the predictions.
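
A minimal sketch of this behavior, using Stata's shipped auto dataset rather than the farm data (the variable name r_last is made up for illustration): after a by-group regression, only the last group's estimates remain in memory, so a single predict applies them to every observation.

```stata
* Illustration on Stata's shipped auto data (an assumption, not the poster's data)
sysuse auto, clear

* Two regressions are run, but only the last one (foreign == 1) is left in e()
quietly bysort foreign: regress price mpg

* predict applies the foreign == 1 coefficients to every observation
predict double r_last, residuals

* Residuals sum to (numerically) zero only in the last group
tabstat r_last, by(foreign) statistics(sum)
```

The nonzero sum in the first group mirrors the symptom reported in #1.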

    The syntax error in your second set of code is
    Code:
    forval g = 1\`r(max)' {
    which should be
    Code:
    forval g = 1/`r(max)' {
    That said, the second set of code can be simplified, because generating groups is not necessary in this case.
    Code:
    keep if COUNTRY == "XXX"
    xtset ID YEAR
    statsby, by(ID) saving (y.XXX.a.dta): xtreg SE131 c.SE281##c.SE281, fe
    merge m:1 ID using "M:\[...]\y.XXX.a.dta"
    drop _merge
    
    gen FIT1 = .
    levelsof ID, local(IDlist)
    foreach id of local IDlist {
        xtreg SE131 SE281, fe if ID == `id'
        predict temp, residuals
        replace FIT1 = temp if ID == `id'
        drop temp
    }

    • #3
      Code:
       
       xtreg SE131 SE281 if ID == `id', fe
      would be my guess here.

      • #4
        Nick found a second error in the original that I had copied into my revised code. The mistaken forval command I found produces
        Code:
        invalid syntax
        r(198);
        Once you get past that, the mistaken xtreg command Nick found produces
        Code:
        option if not allowed
        r(198);

        • #5
          Dear Nick and William,

          Many thanks for your patience and speedy responses. I will rectify my code, though I have to admit I am still rather new to Stata.

          Thanks again,

          Guy Low, MSc

          • #6
            Let me note another shortfall.

            In post #1 your first code was for the quadratic regression, and the same xtreg command, including the squared SE281 term, was used in both places.

            In your second code, the xtreg inside statsby includes the squared SE281 term but the xtreg inside the loop does not, so the loop produces residuals for a different model than the one that produced the coefficient estimates.

            Updating my code from post #2 to correct both problems with the second xtreg command gives the following.
            Code:
            keep if COUNTRY == "XXX"
            xtset ID YEAR
            statsby, by(ID) saving (y.XXX.a.dta): xtreg SE131 c.SE281##c.SE281, fe
            merge m:1 ID using "M:\[...]\y.XXX.a.dta"
            drop _merge
            
            gen FIT1 = .
            levelsof ID, local(IDlist)
            foreach id of local IDlist {
                xtreg SE131 c.SE281##c.SE281 if ID == `id', fe
                predict temp, residuals
                replace FIT1 = temp if ID == `id'
                drop temp
            }
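
Once residuals are stored farm by farm, the SSR comparison asked about in #1 could be sketched as follows. This is only an illustration under stated assumptions: the data are simulated (ID, Q, TC and the RES_* and SSR_* names are hypothetical, not the SE131/SE281 data), and regress is used in place of xtreg, fe because each regression uses a single farm.

```stata
* Simulated panel: 10 hypothetical farms, 5 years each (not the poster's data)
clear
set seed 12345
set obs 50
gen ID = ceil(_n/5)
gen Q  = 100*runiform()
gen TC = 10 + 2*Q + 0.05*Q^2 + rnormal(0, 5)

gen double RES_LIN  = .
gen double RES_QUAD = .
levelsof ID, local(IDlist)
foreach id of local IDlist {
    * Linear cost function, fitted to this farm only
    quietly regress TC Q if ID == `id'
    quietly predict double temp if e(sample), residuals
    quietly replace RES_LIN = temp if ID == `id'
    drop temp

    * Quadratic cost function, fitted to this farm only
    quietly regress TC c.Q##c.Q if ID == `id'
    quietly predict double temp if e(sample), residuals
    quietly replace RES_QUAD = temp if ID == `id'
    drop temp
}

* Per-farm sums of squared residuals for the two models
bysort ID: egen double SSR_LIN  = total(RES_LIN^2)
bysort ID: egen double SSR_QUAD = total(RES_QUAD^2)
```

Because the linear model is nested in the quadratic one, the quadratic SSR can never exceed the linear SSR for the same farm, so a raw SSR comparison will always favor the quadratic form; with five observations per farm, a criterion that accounts for the extra parameter may be worth considering.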

            • #7
              Dear Stata members,
              I have a similar question. In the dataset below (for demo only),
              Code:
              input str1 firm float (cashflow assets sales) int year float industry
              "a" 100 500 300 1991 1
              "a" 125 550 410 1992 1
              "a" 129 550 350 1993 1
              "a" 118 450 216 1994 1
              "a" 96 600 175 1995 1
              "b" 350 1500 600 1991 1
              "b" 560 1675 850 1992 1
              "b" 730 1300 755 1993 1
              "b" 900 1800 1065 1994 1
              "b" 1050 2000 1800 1995 1
              "c"  60 120 155 1991 2
              "c"  -10  120 180 1992 2
              "c"  50 160 168 1993 2
              "c"  200 150 260 1994 2
              "c"  -60 140 200 1995 2
              "d" 155  230 200 1991 2
              "d" 255 398 400 1992 2
              "d" 179 398 268 1993 2
              "d" 196 423 318 1994 2
              "d" 165 300 215 1995 2
              end
              I would like to run a regression with cashflow as the dependent variable and assets as the independent variable, separately for each year and industry combination, and then save the residuals after each regression. For instance, in the data above, I want to run a regression like
              Code:
              reg cashflow assets if year==1991 & industry==1
              and then predict residuals using
              Code:
              predict res if year==1991 & industry==1, residuals
              I also tried to group by industry and year first and then run the regression as follows
              Code:
              egen group=group( year industry)
              bys group:reg cashflow assets
              However, in this case, if I predict residuals, I get wrong results because the prediction is based on the last regression run.
              My question:
              1. How can I run the above code in the most efficient manner? I know loops can help, but I haven't used them so far. Can someone help me build a readily usable command that runs the regression for one year-industry combination, saves the residuals, then proceeds to the next combination, saves its residuals, and so on?

              • #8
                Clyde Schechter, is there a way to tweak your code from https://www.statalist.org/forums/forum/general-stata-discussion/general/1594435-help-required-with-statsby for my case in #7, so that I can run the regressions and store the residuals? Sorry for tagging you; I am not sure, but I think your code could somehow be used in my context.
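
For reference, the loop pattern from #6 might be adapted to the example data in #7 roughly as follows. This is a hedged sketch, not the code from the linked thread; the variable names group and res and the temporary variable temp are chosen for illustration.

```stata
* Example data copied from #7
clear
input str1 firm float(cashflow assets sales) int year float industry
"a" 100 500 300 1991 1
"a" 125 550 410 1992 1
"a" 129 550 350 1993 1
"a" 118 450 216 1994 1
"a" 96 600 175 1995 1
"b" 350 1500 600 1991 1
"b" 560 1675 850 1992 1
"b" 730 1300 755 1993 1
"b" 900 1800 1065 1994 1
"b" 1050 2000 1800 1995 1
"c" 60 120 155 1991 2
"c" -10 120 180 1992 2
"c" 50 160 168 1993 2
"c" 200 150 260 1994 2
"c" -60 140 200 1995 2
"d" 155 230 200 1991 2
"d" 255 398 400 1992 2
"d" 179 398 268 1993 2
"d" 196 423 318 1994 2
"d" 165 300 215 1995 2
end

* One group per year-industry combination
egen group = group(year industry)
gen double res = .

summarize group, meanonly
forvalues g = 1/`r(max)' {
    * Fit this combination only, then store its residuals before moving on
    quietly regress cashflow assets if group == `g'
    quietly predict double temp if e(sample), residuals
    quietly replace res = temp if group == `g'
    drop temp
}
```

With only two firms per year-industry cell in the demo data, each regression fits its two points exactly, so the stored residuals are numerically zero here; data with more firms per cell would give informative residuals.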
