Estimating discretionary accruals using the modified Jones (1991)

Ally Zimmerman

Join Date: Oct 2014

Posts: 13
#16

15 Oct 2014, 12:53

Thank you very much Nick - I am learning a lot through the process of revising this code.
Comment
mike sam

Join Date: Nov 2014

Posts: 2
#17

15 Nov 2014, 15:14

Originally posted by Nick Cox

Now that Clyde has worked to get the code in very decent form, further small improvements can be suggested:

Code:

gen sic_2 = real(substr(sic,1,2))
xtset sic_2 fyear
gen uhat = .
gen ta = (ib - oancf)/L.at
gen x1 = 1/L.at
gen x2 = (d.revt - d.rect)/L.at
gen x3 = ppegt/L.at
forvalues j = 1/`=_N' {
    capture noisily {
        reg ta x1 x2 x3 if sic_2 == sic_2[`j'] & fyear == fyear[`j'] & _n != `j', nocons
        if e(N) >= 10 {
            predict uhat_2 in `j', resid
            replace uhat = uhat_2 in `j'
            drop uhat_2
        }
    }
}

Commentary:

1. The call to destring can be cut. It's more direct just to use the function real(). The extra flexibility and security of destring is not needed here at all.

2. count was used to get the number of observations, but Stata already knows that as _N.

3. We want to use residuals if and only if they are for a regression with at least 10 observations. That decision can be made once and need not be repeated for every observation, including those not in the regression at all. Hence, replace the if qualifier with an if command.

4. predict with in in principle allows Stata to do the calculation just once where it's needed.

5. With a regression this simple, you could also sidestep predict. Instead of

Code:

predict uhat_2 in `j', resid
replace uhat = uhat_2 in `j'
drop uhat_2

you could just go

Code:

replace uhat = ta - (_b[x1] * x1 + _b[x2] * x2 + _b[x3] * x3) in `j'

Warning: None of this is tested.

Naturally, it is supremely rational to spend 10 minutes explaining how to save a few seconds in computation, but some of these issues arise much more widely.

Hello everyone! Thank you for the great posts. I just have one question. The script looks as though it should work, but when the do-file reaches the xtset command I get the following message:

xtset sic_2 fyear
repeated time values within panel
r(451);

What am I doing wrong?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35725
#18

15 Nov 2014, 16:53

We can't see your data, but see e.g. http://www.stata.com/support/faqs/da...d-time-values/

If you have (say) several firms with the same SIC2 code, SIC2 will not serve as a panel identifier.
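
One quick way to confirm this is to count how often each candidate identifier repeats within a year. This is only a sketch, and it assumes your variables are named sic_2, gvkey and fyear as elsewhere in this thread:

```stata
* Sketch only: variable names are assumed from earlier posts in this thread.
* Many surplus copies here mean sic_2 cannot identify panels for -xtset-.
duplicates report sic_2 fyear

* A usable panel identifier should show no surplus copies within a year.
duplicates report gvkey fyear
```

If the second report shows only unique combinations, gvkey and fyear together identify observations, which is what xtset needs.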
Comment
mike sam

Join Date: Nov 2014

Posts: 2
#19

15 Nov 2014, 17:49

Yes, thank you very much. Once more you are absolutely right; this is the case for me. Plenty of firms belong to the same industry (SIC). The gvkey would be a viable choice of panel id, but I cannot think of a way to adjust the script to use it while still running industry-year regressions.

Last edited by mike sam; 15 Nov 2014, 17:52.
Comment
Robson Glasscock

Join Date: Apr 2014

Posts: 25
#20

17 Nov 2014, 11:05

Mike,
As long as you have already changed the gvkey to non-string format in your data, all you have to do is modify the -xtset- statement from:

Code:

xtset sic_2 fyear

to

Code:

xtset gvkey fyear

This allows ta through x3 to be generated at the firm level using the appropriate lagged values, and the loop for the industry-year regressions shouldn't be affected. It will still run over the observation numbers in the dataset.
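
Putting the two pieces together, the revised script might look like the sketch below. It is untested, it assumes gvkey is already numeric and sic is a string, and it uses the variable names (ib, oancf, at, revt, rect, ppegt) from the code quoted earlier in the thread:

```stata
* Sketch only, untested: firm-level lags combined with industry-year regressions.
gen sic_2 = real(substr(sic,1,2))
xtset gvkey fyear              // the firm is the panel, so L. and D. work within firm
gen uhat = .
gen ta = (ib - oancf)/L.at
gen x1 = 1/L.at
gen x2 = (D.revt - D.rect)/L.at
gen x3 = ppegt/L.at
forvalues j = 1/`=_N' {
    capture noisily {
        * industry-year regression that leaves out observation j
        reg ta x1 x2 x3 if sic_2 == sic_2[`j'] & fyear == fyear[`j'] & _n != `j', nocons
        if e(N) >= 10 {
            * out-of-sample residual for observation j only
            replace uhat = ta - (_b[x1]*x1 + _b[x2]*x2 + _b[x3]*x3) in `j'
        }
    }
}
```

The only change to the panel setup is the xtset line. The regressions are still selected by sic_2 and fyear, so several firms in the same industry-year are pooled as before.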
Comment
Ali Ahmed

Join Date: Mar 2015

Posts: 26
#21

08 Apr 2015, 19:52

Originally posted by Clyde Schechter

Well, 7,000 observations is not that large a data set, and regressions on 7,000 observations don't take very long. But you have a "combinatorial explosion" on your hands. Looking at your earlier output you have about 60 values of sic_2, 10 values of year, and 6,581 values of obs, for a total of just under 4,000,000 runs through the loops. But here's the thing: with 7,000 observations, nearly all of those 4,000,000 combinations will never even occur in your data. Now, because you loop over obs to exclude individual observations, I don't see an easy way to take that out. But you can certainly replace the double looping on sic_2 and year with a single loop on the combinations of it that actually exist.

Code:

egen combo = group(sic_2 year)
summarize combo
forvalues k = 1/`r(max)' {
    forvalues j = `=scalar(e)'/`=scalar(f)' {
        capture noisily reg ta x1 x2 x3 if combo == `k' & obs != `j', nocons
        capture noisily predict uhat_2, resid
        capture noisily replace uhat_2 = . if e(N) < 10
        capture noisily replace uhat = uhat_2 if combo == `k' & obs == `j'
        capture noisily drop uhat_2
        display `k', `j'
    }
}

This should speed things up considerably.

Hi Clyde, I also want to run the regression by year and industry (the same as Ally was looking for). In my case the data also cover 7 years and SIC industry sectors. In my data sic is a string variable and the data are an unbalanced panel. First I tried the code from your post #10, and it gave me a syntax error. I used the following code:

egen combo = group(sic year)
summarize combo
forvalues k = 1/`r(max)' {
forvalues j= `=scalar(e)’/`=scalar(f)’ {
if combo[`j'] == `k' {
capture noisily reg y x1 x2 x3 if combo == `k' & obs != `j', nocons
capture noisily predict uhat_2, resid
capture noisily replace uhat_2=. if e(N) < 10
capture noisily replace uhat= uhat_2 if combo == `k' & obs == `j'
capture noisily drop uhat_2
}
display `k', `j'
}
}

invalid syntax
r(198);

Then I also tried the following code, I got from the forum:

egen combo = group (sic year )
su combo, meanonly
gen fitted1 =.
forval g = 1/`r(max)` {
regress y x1 x2 x3 if combo ==`g'
predict work
replace fitted1 = work if combo == `g'
drop work
}
invalid syntax
r(198);
gen residual = y - fitted1

In both cases I got a syntax error. Can you please tell me where I am making a mistake? Can you also clarify whether the two pieces of code do the same thing? I want the same output that Ally was looking for. Thanks.
Comment
Ali Ahmed

Join Date: Mar 2015

Posts: 26
#23

10 Apr 2015, 19:40

Now I am able to solve it. Thanks a lot!
Comment
Mariska van Elderen

Join Date: Apr 2015

Posts: 1
#24

19 Apr 2015, 11:42

Hi all,

I'm having a problem and I hope you can help me. I am researching the effect of audit tenure and auditor industry specialization on audit quality. I have 1058 observations on 114 Dutch firms. I need to calculate discretionary accruals and I am using Stata to do this. I have tried this code:
gen sic2= substr(sic,1,2)
destring sic2, replace
egen combo= group(sic2 FYEAR)
levelsof combo, local(a)
gen uhat=.
xtset gvkey FYEAR
gen obs= [_n]
summ obs
scalar e= r(min)
scalar f= r(max)
gen TA= (NI-CFO)/lagAT
gen x1= 1/lagAT
gen x2= (dREV-dREC)/lagAT
gen x3= PPE/lagAT

foreach i in `a’ {
foreach x in `b’ {
forvalues j= `=scalar(e)’/`=scalar(f)’ {
capture noisily reg TA x1 x2 x3 if sic2==`i’ & FYEAR==`x’ & obs != `j’, nocons
capture noisily predict uhat_2, resid
capture noisily replace uhat_2=. if e(N) < 10
capture noisily replace uhat= uhat_2 if sic2==`i' & FYEAR==`x' & obs== `j'
capture noisily drop uhat_2
di `i'
di `x'
di `j'
}
}
}

And then I got the error: invalid syntax r(198).

Can you please help me? I really don't know what I am doing wrong.
Thank you in advance.
Comment
Claire Cui

Join Date: May 2015

Posts: 9
#25

10 Sep 2015, 01:01

Hi all,
Do you think the following code would run a little faster, since it runs fewer regressions? Let me know if you find any errors in the code.
gen da=.
gen ta= (ib-oancf)/at_lag1
gen x1= 1/at_lag1
gen x2= (sales-sales_lag1-rect+rect_lag1)/at_lag1
gen x3= ppegt/at_lag1
egen id=group(sic2digit fyear )
bys id: egen count=count(fyear)
gen less20=1 if count<20
replace less20=0 if less20==.
forvalues i=1/1699 {
capture noisily{
reg ta x1 x2 x3 if id==`i' & less20==0
predict p if id==`i' & less20==0
replace da=ta-p if id==`i' & less20==0
drop p
}
}
Comment
Christian Mueller

Join Date: May 2016

Posts: 27
#26

15 Jun 2016, 09:28

Hello,
I have a question. I ran the following code in Stata to estimate the discretionary accruals:
egen combo = group(sic_2 year) summarize combo forvalues k = 1/`r(max)' { forvalues j= `=scalar(e)’/`=scalar(f)’ { if combo[`j'] == `k' { capture noisily reg ta x1 x2 x3 if combo == `k' & obs != `j', nocons capture noisily predict uhat_2, resid capture noisily replace uhat_2=. if e(N) < 10 capture noisily replace uhat= uhat_2 if combo == `k' & obs == `j' capture noisily drop uhat_2 } display `k', `j' } } But I always get the following message:

no observations
last estimates not found
variable uhat_2 not found
uhat_2 not found
variable uhat_2 not found

Can anyone help me?

Thanks!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#27

15 Jun 2016, 09:37

Your code appears as an unreadable jumble. Please read FAQ #12 to get a better understanding of how to post information in the most usable form. Then try again.
Comment
Christian Mueller

Join Date: May 2016

Posts: 27
#28

15 Jun 2016, 09:38

For better readability:
egen combo = group(sic_2 year)
summarize combo
forvalues k = 1/`r(max)' {
    forvalues j = `=scalar(e)'/`=scalar(f)' {
        if combo[`j'] == `k' {
            capture noisily reg ta x1 x2 x3 if combo == `k' & obs != `j', nocons
            capture noisily predict uhat_2, resid
            capture noisily replace uhat_2 = . if e(N) < 10
            capture noisily replace uhat = uhat_2 if combo == `k' & obs == `j'
            capture noisily drop uhat_2
        }
        display `k', `j'
    }
}

Stata error message:

no observations
last estimates not found
variable uhat_2 not found
uhat_2 not found
variable uhat_2 not found

Thank you for your help!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#29

15 Jun 2016, 09:49

That's much better. A code block would have been better still. Next time.

The first error message is coming from your -capture noisily reg...- command. It means that there is some combination of `k' and `j' for which there are no usable observations with combo == `k' and obs != `j'. Remember that an observation is only usable if it has no missing values for any of the variables in the regression model. The first step is probably to find out which values of `k' and `j' are causing this. You can do that by putting -display `k', `j'- before your regression command. Then you can see what's going on with -list ta x1 x2 x3 if combo == that_value_of_`k' & obs != that_value_of_`j'-. The output will either be empty because there are no such observations, or you will be able to see that each such observation has a missing value for at least one of the variables. Then you will have to figure out whether this represents an error in your data to fix, or is an expected situation, in which case you can just ignore it.

The other error messages are all cascading from the same event. Because the -reg- command failed, there are no estimates to use in the -predict- command. Because the -predict- command failed, there is no variable uhat_2 to do anything with. But these other error messages will all go away when you fix the first problem.
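
As a sketch of that inspection step: suppose the -display- output shows that the failure happens at k = 17 and j = 250. Those two numbers are made up purely for illustration; substitute whatever values your own run reports.

```stata
* Hypothetical values: 17 and 250 stand in for the k and j that -display- reported.
* How many complete cases are left for the regression once observation 250 is excluded?
count if combo == 17 & obs != 250 & !missing(ta, x1, x2, x3)

* Look at the candidate observations to see which variables are missing.
list ta x1 x2 x3 if combo == 17 & obs != 250
```

A count of zero here is exactly the situation in which -reg- reports "no observations".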
Comment
Mahmud Hossain

Join Date: Sep 2018

Posts: 1
#30

02 Sep 2018, 10:08

Hi all:
I'm new to Stata. I was trying to estimate discretionary accruals using code posted by other users, and I got an error message.
The full message is posted below!
Any help would be simply great!!
Regards,

Mahmud

gen sic_2= substr(sic,1,2)

. destring sic_2, replace
sic_2: all characters numeric; replaced as byte

.
. egen combo= group(sic_2 fyear)
(575 missing values generated)

gen uhat=.
(254,697 missing values generated)

.
end of do-file

. do "C:\Users\mhossain\AppData\Local\Temp\STD784_00000 0.tmp"

. xtset gvkey fyear
panel variable: gvkey (unbalanced)
time variable: fyear, 1995 to 2018, but with gaps
delta: 1 unit

.
. gen obs= [_n]

. summ obs

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         obs |    254,697      127349    73524.84          1     254697

. scalar e= r(min)

. scalar f= r(max)

.
. gen ta= (ib-oancf)/L.at
(81,677 missing values generated)

. gen x1= 1/L.at
(65,104 missing values generated)

. gen x2= (d.revt – d.rect)/L.at
d: operator invalid
r(198);

end of do-file

r(198);
Comment
