Trouble saving predicted values and residuals

michael joe

Join Date: May 2015

Posts: 50
#1

14 Jul 2016, 15:05

I am stuck on the code below, which was posted before and worked for the OP, but I can't seem to get the same result. The code runs very quickly but creates no values. I tried a few changes, such as removing the sic_year_numerosity restriction, but I still end up with no values being created. I am not sure what's happening.

gen y_hat=. // empty variable for predictions
gen y_res=. // empty variable for residuals
tempvar acc_tot_fitted acc_tot_res // temporary variables for each set of predictions
levelsof sic_2_digit, local(levels)
foreach x of local levels {
foreach z of numlist 1999/2014 {
capture reg y x1 x2 x3 if sic_2_digit==`x' & year==`z' & sic_year_numerosity>9
if !_rc {
predict `y_hat' // predictions are now in temporary variable
replace y_hat=`y_hat' if e(sample) // transfer predictions from temp variable
predict `y_res', residuals // residuals are now in temporary variable
replace y_res=`y_res' if e(sample) // transfer residuals from temp variable
drop `y_hat' `acc_tot_res' // drop temporary variables in preparation for next regression
}
}
}
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#2

14 Jul 2016, 15:42

First, next time please post your code in a code block so that things like indentation will be better preserved. As it is, it's a bit jumbled and hard to read.

I see a couple of things. First, in the code shown, you never declare the tempvars y_hat and y_res; the tempvar command names acc_tot_fitted and acc_tot_res instead. So you should be getting a syntax error at -predict `y_hat'-. But perhaps the declaration happens elsewhere in the code that you did not show us?

If you have declared tempvars y_hat and y_res, then I suspect your problem is coming from the regression. Because you have it -capture-d, you get no warning if something goes wrong there, nor any idea what. Take out the -capture- and Stata will complain about whatever the problem is. In these situations it often turns out that there are combinations of sic_2_digit and year for which there are either no observations at all, or too few observations for a regression to be done. Then you can explore whether that situation means there is a problem in your data set that needs fixing, or whether the data should indeed have those gaps and you need to revise the code to work around them.
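To make the diagnostic step concrete, here is a minimal sketch on made-up data. The toy dataset, the seed, and the single regressor x1 are assumptions for illustration; only the names sic_2_digit, year, y, and x1 echo the first post. It keeps -capture- so the loop does not stop, but reports the cell size and the return code instead of failing silently.

```stata
* Toy data (hypothetical): industry 20 has no observations in 2000
clear
set seed 2016
set obs 40
gen sic_2_digit = cond(_n <= 20, 10, 20)
gen year = cond(mod(_n, 2), 1999, 2000)
replace year = 1999 if sic_2_digit == 20
gen x1 = rnormal()
gen y = 1 + 2*x1 + rnormal()

levelsof sic_2_digit, local(levels)
foreach x of local levels {
    forvalues z = 1999/2000 {
        * How many usable observations are in this industry-year cell?
        quietly count if sic_2_digit == `x' & year == `z' & !missing(y, x1)
        local n = r(N)
        capture regress y x1 if sic_2_digit == `x' & year == `z'
        local rc = _rc   // save the return code before anything else runs
        if `rc' {
            display as error "sic `x', year `z': N = `n', regress returned r(`rc')"
        }
        else {
            display as text "sic `x', year `z': N = `n', regression ran"
        }
    }
}
```

Return code 2000 is Stata's "no observations" and 2001 is "insufficient observations", so the messages show which cells are empty and which are merely too small.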
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

14 Jul 2016, 18:28

Following up on Clyde's advice: to increase the likelihood that Statalist readers will be able to assist you instead of passing over a difficult-to-understand presentation, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. In particular, FAQ #12 describes the use of dataex and CODE delimiters when posting to Statalist.
michael joe

Join Date: May 2015

Posts: 50
#4

15 Jul 2016, 00:25

Thanks, Clyde. It turns out the problem was not a lack of observations but rather that some of the x variables I was using in the regressions contained _n (e.g., ocf[_n-1]), which produced the error message "_n unknown weight type". After creating the variables outside the loop (e.g., gen ocf1=ocf[_n-1]) and using the new names inside the loop, I was able to obtain values. Thanks for the help.
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#5

15 Jul 2016, 04:19

Originally posted by michael joe

Thanks, Clyde. It turns out the problem was not a lack of observations but rather that some of the x variables I was using in the regressions contained _n (e.g., ocf[_n-1]), which produced the error message "_n unknown weight type". After creating the variables outside the loop (e.g., gen ocf1=ocf[_n-1]) and using the new names inside the loop, I was able to obtain values. Thanks for the help.

Just a general word of warning: are you using panel data? If so, you should not use [_n-1] on its own to create lagged values. The problem is that, for the first observation of the second (and every later) panel unit, observation [_n-1] is the last observation of the previous panel unit.

Either xtset your data and then use the lag operator (gen ocf1 = L.ocf), or place your generate command after a by: prefix, e.g. by panelunit (timevar): gen ocf1 = ocf[_n-1].
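A small sketch of the difference, on a made-up two-firm panel (the data values and the names ocf_naive, ocf_by, and ocf_L are hypothetical, chosen only for illustration):

```stata
* Toy panel (hypothetical): two firms, three years each
clear
input firm year ocf
1 2000 10
1 2001 12
1 2002 15
2 2000 50
2 2001 55
2 2002 60
end

sort firm year
gen ocf_naive = ocf[_n-1]               // crosses firm boundaries
by firm (year): gen ocf_by = ocf[_n-1]  // previous observation within firm
xtset firm year
gen ocf_L = L.ocf                       // previous year within firm

list, sepby(firm)
```

In this toy panel, ocf_naive for firm 2 in 2000 picks up firm 1's 2002 value, while ocf_by and ocf_L are missing there. The two safe versions agree here because there are no gaps in year; if a firm skips a year, L.ocf returns missing for the year after the gap, whereas the by: version returns the previous available observation.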
michael joe

Join Date: May 2015

Posts: 50
#6

15 Jul 2016, 14:44

Here is my updated code:

Code:

sort firm year
by firm: gen ocf1=ocf[_n-1]
by firm: gen ocf2=ocf
by firm: gen ocf3=ocf[_n+1]
gen ismissing = missing(tca, ocf1, ocf2, ocf3, change_in_revenues, ppe)
bysort industry year: egen sic_year_numerosity = total(ismissing== 0)
gen y_hat=.
gen y_res=.
tempvar y_hat y_res
levelsof industry, local(levels)
foreach x of local levels {
  foreach z of numlist 2000/2001 {
    xtset firm year
    capture xtreg tca L.ocf ocf F.ocf change_in_revenues ppe if industry==`x' & year==`z' & sic_year_numerosity>9
    if !_rc {
      predict `y_hat'
      replace y_hat=`y_hat' if e(sample)
      predict `y_res', residuals
      replace y_res=`y_res' if e(sample)
      drop `y_hat' `y_res'
    }
  }
}

Now, I am getting the error "insufficient observations" when I take out capture. I have over 4,500 unique firm year observations and many unique industry year observations as well. I'm not exactly sure what I am doing wrong as I am trying to replicate a prior study, for which the observation count is similar to my replication. I basically want to create a cross sectional regression where all firms are pooled in the same year within each industry group that have at least 9 firms in each industry classification. I thought this loop would do that but perhaps I am wrong.

Last edited by michael joe; 15 Jul 2016, 14:47.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

15 Jul 2016, 16:28

The problem, I think, is that the if clause on your xtreg does not prevent the xtreg from running; it just restricts the observations used. So in an industry/year where sic_year_numerosity<=9, no observations will be selected as xtreg attempts to run. I've tried my hand at repairing the logic below; no guarantees, though!

Code:

sort firm year

by firm (year): gen ocf1=ocf[_n-1]
by firm (year): gen ocf2=ocf
by firm (year): gen ocf3=ocf[_n+1]
gen ismissing = missing(tca, ocf1, ocf2, ocf3, change_in_revenues, ppe)

xtset firm year
 
gen y_hat=.
gen y_res=.
tempvar y_hat y_res
levelsof industry, local(levels)
foreach x of local levels {
  foreach z of numlist 2000/2001 {
    count if industry==`x' & year==`z' & ismissing==0
    if `r(N)'>9 {
      capture xtreg tca L.ocf ocf F.ocf change_in_revenues ppe if industry==`x' & year==`z'
      if !_rc {
        predict `y_hat'
        replace y_hat=`y_hat' if e(sample)
        predict `y_res', residuals
        replace y_res=`y_res' if e(sample)
        drop `y_hat' `y_res'
      }
    }
  }
}


michael joe

Join Date: May 2015

Posts: 50
#8

16 Jul 2016, 15:34

Thanks, William. Still not working for me. However, placing the generate command after a by: statement, as recommended, seems to be working for me. Any idea why this way produces results but the other gives me an observation error?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#9

16 Jul 2016, 16:41

However, placing the generate command after a by: statement, as recommended, seems to be working for me.

It would be more helpful were you to copy the code that works (the entire code, not just the changed line) and paste it into a code block.
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#10

17 Jul 2016, 08:27

Originally posted by William Lisowski

The problem, I think, is that the if clause on your xtreg does not prevent the xtreg from running; it just restricts the observations used. So in an industry/year where sic_year_numerosity<=9, no observations will be selected as xtreg attempts to run. I've tried my hand at repairing the logic below; no guarantees, though!

Code:

sort firm year
gen ismissing = missing(tca, ocf1, ocf2, ocf3, change_in_revenues, ppe)

Today I learned you can place multiple variables in a missing() function. Would be nice if "help missing##functions" included a link to help missing()
William Lisowski

Join Date: Dec 2014

Posts: 10150
#11

17 Jul 2016, 09:00

Would be nice if "help missing##functions" included a link to help missing()

Agreed! While missing() is mentioned at the start of the help missing file, it would be better if the name were a clickable link to the complete documentation, just like the command names in the list of useful commands further down in the file. The description in help missing hints that missing() can accept more than one argument, but it is possible to read it as meaning something different.
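For anyone who wants to check the multi-argument behavior, here is a minimal sketch on made-up data (the variables a, b, c, anymiss, and nmiss are hypothetical). With several arguments, missing() returns 1 if any argument is missing and 0 otherwise; the egen rowmiss() count is included only as a cross-check.

```stata
* Toy data (hypothetical): "." marks a missing value
clear
input a b c
1 2 3
. 2 3
1 . 3
1 2 .
. . .
end

gen anymiss = missing(a, b, c)   // 1 if any of a, b, c is missing
egen nmiss = rowmiss(a b c)      // number of missing values in the row
list
```

Only the first row has anymiss equal to 0, which is why a flag like ismissing==0 in the code above selects rows that are complete on every listed variable.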

Last edited by William Lisowski; 17 Jul 2016, 09:29.
