Predicting and saving residuals after running regressions on several sample units

Guy Low

Join Date: Jun 2019

Posts: 11
#1

13 Jul 2019, 08:39

Dear Statalist,

I am running regressions on farm economic data that I have set up as panel data: each farm has five years' worth of observations. I am trying to determine whether a short-run linear cost function (TC = a + bQ, where TC is total cost, a and b are constants, and Q is the quantity produced) or a short-run quadratic cost function (TC = a + bQ + cQ^2, with the same notation and c an additional constant) fits my data better. To do so, I want to compare the sum of squared residuals (SSR) from each farm-level regression. I would like to save the residuals from my regressions as a new variable so that I can then calculate the SSRs and compare the two models. So far I have the following code for the quadratic regression, which I will use as an example:

Code:

keep if COUNTRY == "XXX"
xtset ID YEAR
statsby, by(ID) saving (y.XXX.a.dta): xtreg SE131 c.SE281##c.SE281, fe
merge m:1 ID using "M:\[...]\y.XXX.a.dta"
drop _merge
gen RES1 = .
quietly bysort ID: xtreg SE131 c.SE281##c.SE281, fe
predict temp, residuals
replace RES1 = temp
drop temp
gen RES1_SQ = RES1^2
bysort ID: egen SSR1 = total (RES1_SQ)

However, I have noticed that the sums of the residuals by farm are far from zero, which leads me to believe that the above code is incorrect. Below is an example from one farm:

Code:

. list in 1/5

     +-------------------------------------------------------+
     | _b_SE281    _b_cons        RES1    RES1_SQ       SSR1 |
     |-------------------------------------------------------|
  1. | .0812744   149415.2   -28628.63   8.20e+08   8.13e+09 |
  2. | .0812744   149415.2    3330.112   1.11e+07   8.13e+09 |
  3. | .0812744   149415.2   -77954.47   6.08e+09   8.13e+09 |
  4. | .0812744   149415.2      -30600   9.36e+08   8.13e+09 |
  5. | .0812744   149415.2   -16828.83   2.83e+08   8.13e+09 |
     +-------------------------------------------------------+

Alternatively, I tried to use code as suggested in previous posts for the same aim (such as: https://www.stata.com/statalist/archive/2008-02/msg00296.html ; https://www.statalist.org/forums/forum/general-stata-discussion/general/491152-predicted-values-and-residuals-with-by ; https://www.stata.com/support/faqs/d...ach/index.html), but I seem to still be having problems. I tried to use the following code but to no avail:

Code:

. keep if COUNTRY == "XXX"
. xtset ID YEAR
. statsby, by(ID) saving (y.XXX.a.dta): xtreg SE131 c.SE281##c.SE281, fe
. merge m:1 ID using "M:\[...]\y.XXX.a.dta"
. drop _merge
. egen group = group(ID)
. gen FIT1 = .
. su group, meanonly
. forval g = 1\`r(max)' {
  2.     xtreg SE131 SE281, fe if group == `g'
  3.     predict temp, residuals
  4.     replace FIT1 = temp if group == `g'
  5.     drop temp
  6. }
invalid syntax
r(198);

I am not sure what could be the reason behind the error, or if there is a better way to do what I want, but I will appreciate any and all help and advice on the matter.

Many thanks,

Guy Low, MSc
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

13 Jul 2019, 10:16

The reason your first set of code produced incorrect SSR is that while you ran regressions on each ID separately, the predict command used the results from the final regression to do all the predictions.
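A minimal sketch of that behaviour, using Stata's bundled auto dataset rather than the farm data (the dataset, the grouping variable foreign, and the name r_last are assumptions for illustration only). After a by-prefixed estimation command, only the last group's estimates remain in e(), so a single predict applies those coefficients to every observation:

```stata
* Illustration only: the bundled auto data stand in for the farm panel.
sysuse auto, clear

* One regression per group; only the last group's results remain in e().
bysort foreign: regress price mpg

* predict applies those last-group coefficients to every observation.
predict double r_last, residuals

* Residuals sum to (about) zero only in the last group, foreign == 1.
tabstat r_last, by(foreign) statistics(sum)
```

In the other group the residuals come from a model that was not fitted to those observations, which is why their sum need not be near zero.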

The syntax error in your second set of code is

Code:

forval g = 1\`r(max)' {

which should be

Code:

forval g = 1/`r(max)' {

That said, the second set of code can be simplified, because generating groups is not necessary in this case.

Code:

keep if COUNTRY == "XXX"
xtset ID YEAR
statsby, by(ID) saving (y.XXX.a.dta): xtreg SE131 c.SE281##c.SE281, fe
merge m:1 ID using "M:\[...]\y.XXX.a.dta"
drop _merge
gen FIT1 = .
levelsof ID, local(IDlist)
foreach id of local IDlist {
    xtreg SE131 SE281, fe if ID == `id'
    predict temp, residuals
    replace FIT1 = temp if ID == `id'
    drop temp
}
Nick Cox

Join Date: Mar 2014

Posts: 35493
#3

13 Jul 2019, 11:19

Code:

xtreg SE131 SE281 if ID == `id', fe

would be my guess here.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

13 Jul 2019, 12:38

Nick found a second error in the original that I had copied into my revised code. The mistaken forval command I found produces

Code:

invalid syntax
r(198);

Once you get past that, the mistaken xtreg command Nick found produces

Code:

option if not allowed
r(198);
Guy Low

Join Date: Jun 2019

Posts: 11
#5

13 Jul 2019, 12:43

Dear Nick and William,

Many thanks for your patience and speedy responses. I will rectify my code, though I have to admit I am still rather new to Stata.

Thanks again,

Guy Low, MSc
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

13 Jul 2019, 13:06

Let me note another shortfall.

In post #1 your first code was for quadratic regression, and the same xtreg command squaring SE281 was used in both cases.

Your second code does the initial xtreg squaring SE281 but the second xtreg does not, and thus it is producing residuals for a different model than the one that produced the coefficient estimates.

Updating my code from post #2 to correct both problems with the second xtreg command gives the following.

Code:

keep if COUNTRY == "XXX"
xtset ID YEAR
statsby, by(ID) saving (y.XXX.a.dta): xtreg SE131 c.SE281##c.SE281, fe
merge m:1 ID using "M:\[...]\y.XXX.a.dta"
drop _merge
gen FIT1 = .
levelsof ID, local(IDlist)
foreach id of local IDlist {
    xtreg SE131 c.SE281##c.SE281 if ID == `id', fe
    predict temp, residuals
    replace FIT1 = temp if ID == `id'
    drop temp
}
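Once per-unit residuals are saved this way, the SSR comparison described in post #1 could be sketched as follows. This is only an illustration under stated assumptions: Stata's bundled auto data replace the farm data, foreign stands in for the farm ID, regress stands in for xtreg, and the names res_lin, res_quad, ssr_lin and ssr_quad are hypothetical.

```stata
* Illustration only: bundled auto data, with foreign standing in for the farm ID.
sysuse auto, clear
gen double res_lin  = .
gen double res_quad = .

levelsof foreign, local(groups)
foreach g of local groups {
    * Linear model, fitted to this group only.
    quietly regress price c.mpg if foreign == `g'
    predict double tmp if e(sample), residuals
    quietly replace res_lin = tmp if foreign == `g'
    drop tmp

    * Quadratic model, fitted to the same group.
    quietly regress price c.mpg##c.mpg if foreign == `g'
    predict double tmp if e(sample), residuals
    quietly replace res_quad = tmp if foreign == `g'
    drop tmp
}

* Sum of squared residuals per group, for each model.
gen double res_lin_sq  = res_lin^2
gen double res_quad_sq = res_quad^2
bysort foreign: egen double ssr_lin  = total(res_lin_sq)
bysort foreign: egen double ssr_quad = total(res_quad_sq)
tabstat ssr_lin ssr_quad, by(foreign) statistics(mean)
```

Because the linear model is nested in the quadratic one, the quadratic SSR cannot exceed the linear SSR on the same observations, so a raw SSR comparison will always favour the quadratic form; a criterion that penalises the extra parameter may be more informative.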
lal mohan kumar

Join Date: May 2019

Posts: 265
#7

22 Feb 2021, 00:14

Dear Stata members
I have a similar question. Consider the dataset below (for demo only):

Code:

input str1 firm float (cashflow assets sales) int year float industry
"a"  100  500  300 1991 1
"a"  125  550  410 1992 1
"a"  129  550  350 1993 1
"a"  118  450  216 1994 1
"a"   96  600  175 1995 1
"b"  350 1500  600 1991 1
"b"  560 1675  850 1992 1
"b"  730 1300  755 1993 1
"b"  900 1800 1065 1994 1
"b" 1050 2000 1800 1995 1
"c"   60  120  155 1991 2
"c"  -10  120  180 1992 2
"c"   50  160  168 1993 2
"c"  200  150  260 1994 2
"c"  -60  140  200 1995 2
"d"  155  230  200 1991 2
"d"  255  398  400 1992 2
"d"  179  398  268 1993 2
"d"  196  423  318 1994 2
"d"  165  300  215 1995 2
end

I would like to run a regression with cashflow as my dependent variable and assets as my independent variable based on year and industry and then save the residuals after each regression. For instance, in the data above, I want to run a regression like

Code:

reg cashflow assets if year==1991 & industry==1

and then predict residuals using

Code:

predict res if year==1991 & industry==1, residuals

I also tried to group the industry and year first and then regression as follows

Code:

egen group=group( year industry)
bys group:reg cashflow assets

However, if I then predict residuals, I get wrong results, because the prediction is based on the last regression run.
My question:
1. How can I run the above code in the most efficient manner? I know loops can help, but I have not used them so far. Can someone help me build a readily usable command that runs the regression for each year-industry combination, saves the residuals, then proceeds to the next combination, saves its residuals, and so on?
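One possible way to do this, adapting the loop from post #6 to regress, is sketched below. It is not a tested solution from the thread; the names res and tmp are hypothetical, and the capture guard is an assumption added so that a group that cannot be estimated does not stop the loop.

```stata
* Demo data from post #7.
clear
input str1 firm float(cashflow assets sales) int year float industry
"a"  100  500  300 1991 1
"a"  125  550  410 1992 1
"a"  129  550  350 1993 1
"a"  118  450  216 1994 1
"a"   96  600  175 1995 1
"b"  350 1500  600 1991 1
"b"  560 1675  850 1992 1
"b"  730 1300  755 1993 1
"b"  900 1800 1065 1994 1
"b" 1050 2000 1800 1995 1
"c"   60  120  155 1991 2
"c"  -10  120  180 1992 2
"c"   50  160  168 1993 2
"c"  200  150  260 1994 2
"c"  -60  140  200 1995 2
"d"  155  230  200 1991 2
"d"  255  398  400 1992 2
"d"  179  398  268 1993 2
"d"  196  423  318 1994 2
"d"  165  300  215 1995 2
end

* One group identifier per year-industry combination.
egen group = group(year industry)
gen double res = .

levelsof group, local(groups)
foreach g of local groups {
    * Fit the model on this group only; capture keeps the loop going on failure.
    capture regress cashflow assets if group == `g'
    if _rc == 0 {
        * Residuals from this group's own regression, not the last one run.
        predict double tmp if e(sample), residuals
        quietly replace res = tmp if group == `g'
        drop tmp
    }
}
```

In the demo data each year-industry cell contains only two firms, so a line with an intercept and a slope should fit exactly and the residuals will be essentially zero; real data would need more observations per group than estimated parameters for the residuals to be informative.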
lal mohan kumar

Join Date: May 2019

Posts: 265
#8

23 Feb 2021, 02:03

Clyde Schechter, is there a way to tweak the code you gave in https://www.statalist.org/forums/forum/general-stata-discussion/general/1594435-help-required-with-statsby so that I can use it for my case in #7 to run the regressions and store the residuals? Sorry for tagging; I am not sure, but I think your code could be adapted to my context.