standardized difference (SD) Burgstahler & Dichev

Vaggelis Ktenas

Join Date: Nov 2018

Posts: 16
#1

22 Jan 2019, 06:47

Dear all,

I want to compute the standardized difference (SD) statistic from histograms of panel data in order to find which companies are just below the thresholds. My problem is that the SD statistic proposed by Burgstahler & Dichev assumes that the components of its numerator are independent. I have panel data and I don't want to analyze it year by year (I want to pool at least a 2-year period), so the independence assumption no longer holds: the data in each bin of the histogram depend on the adjacent bins. Is it correct to drop companies whose difference in earnings between the 2 years is less than or equal to the bin width? My second problem is that the optimal bin width (h) depends on the total sample size (N) and the standard deviation (s), so if I drop some observations, h changes too.

Thank you in advance
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

23 Jan 2019, 10:38

You didn't get a quick answer. You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. In your case, remember that we are mostly not from your area. Referring to Burgstahler & Dichev without a full citation means even if people wanted to read a paper to help you, they couldn't be sure they had the right paper. How histograms fit with computing something is not clear either - histograms are for display not computation usually. It is better to explain your problem fully rather than assume we'll read a paper to help you.

You seem to have a mix of issues. You start by talking about two independent components but then don't tell us what they are. Then you move to panel data issues. How you handle panel data issues will depend heavily on the substance of your problem (which you know but we don't). I suggest you start over and try to post your problem in a way that we might be able to help you without reading a paper.
Vaggelis Ktenas

Join Date: Nov 2018

Posts: 16
#3

23 Jan 2019, 12:38

Thank you for your answer, Mr. Bromiley.

I am using panel data (13 years and about 20,000 companies). For a single year, for example 2003, I use the following Stata command:

Code:

histogram totalassets if year==2003, width(300000) frequency color(gs11) lwidth(thick) addlabels

I have to find companies that manage to avoid being audited (earnings management) using the histogram of total assets: an accumulation of companies just below a specific asset threshold shows up as a discontinuity in the graph. I used the standardized difference, which is the fraction

$$SD_i = \frac{n_i - \tfrac{1}{2}\left(n_{i-1} + n_{i+1}\right)}{\sqrt{\tfrac{3}{2}\, n_i}}$$

where $n_i$ is the frequency in bin $i$ and $n_{i-1}$ and $n_{i+1}$ are the frequencies in the two adjacent bins.
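For reference, the $\tfrac{3}{2} n_i$ under the square root can be recovered as follows. This is a sketch under my own assumptions, not a derivation taken from the paper: the bin counts are independent, each has roughly Poisson variance, $\operatorname{Var}(n_j) \approx n_j$, and the neighbouring bins are about the same size as bin $i$.

$$
\begin{aligned}
\operatorname{Var}\!\left(n_i - \tfrac{1}{2}(n_{i-1}+n_{i+1})\right)
&= \operatorname{Var}(n_i) + \tfrac{1}{4}\operatorname{Var}(n_{i-1}) + \tfrac{1}{4}\operatorname{Var}(n_{i+1}) \\
&\approx n_i + \tfrac{1}{4}\left(n_{i-1}+n_{i+1}\right) \\
&\approx n_i + \tfrac{1}{2}\, n_i = \tfrac{3}{2}\, n_i
\end{aligned}
$$

The first line is where independence is used; under these assumptions $\sqrt{\tfrac{3}{2} n_i}$ is the approximate standard deviation of the numerator.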

Code:

twoway__histogram_gen booksales if year==2003, width(300000) frequency gen(h x)
* numerators: bin count minus the mean of the bins 1, 2 and 3 positions away
gen numer1 = h - 0.5*(h[_n-1] + h[_n+1])
gen numer2 = h - 0.5*(h[_n-2] + h[_n+2])
gen numer3 = h - 0.5*(h[_n-3] + h[_n+3])
* denominator: square root of (3/2)*n_i
gen denom = sqrt((3/2)*h)
gen SD1 = numer1/denom
gen SD2 = numer2/denom
gen SD3 = numer3/denom

One basic assumption is that the components ($n_{i-1}$, $n_{i+1}$ and $n_i$) are independent. If I use the statistic for a single year, the assumption holds, but if I want to use it for two years, the same company is probably in more than one bin of the histogram, so the counts are no longer independent.
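Without the independence assumption, the variance of the numerator picks up covariance terms, and these are the terms that pooling several years of the same companies can make non-zero. This follows from the general rule for the variance of a linear combination; it is a sketch, not a formula from Burgstahler & Dichev:

$$
\operatorname{Var}\!\left(n_i - \tfrac{1}{2}(n_{i-1}+n_{i+1})\right)
= \operatorname{Var}(n_i) + \tfrac{1}{4}\operatorname{Var}(n_{i-1}) + \tfrac{1}{4}\operatorname{Var}(n_{i+1})
- \operatorname{Cov}(n_i, n_{i-1}) - \operatorname{Cov}(n_i, n_{i+1}) + \tfrac{1}{2}\operatorname{Cov}(n_{i-1}, n_{i+1})
$$

With independent bins the three covariances vanish and only the variance terms remain.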

So I dropped companies whose change in total assets (D.totalassets) is less than or equal to the histogram bin width:

Code:

xtset id year
gen dtotalassets = D.totalassets
histogram totalassets if inrange(year, 2003, 2004) & dtotalassets <= 354000, width(354000) frequency color(gs11) lwidth(thick) addlabels

But the bin width depends on the number of observations (N) and the standard deviation (s), $h = 1.06\, s\, N^{-1/5}$, so it takes a different value after I drop observations, and I guess this approach is therefore wrong. What should I do to obtain independent components?
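Under the rule of thumb quoted above (the normal-reference rule usually attributed to Silverman), the effect of dropping observations on the width can be written as a ratio. Here $N'$ and $s'$ are illustrative labels for the sample size and standard deviation after dropping:

$$\frac{h'}{h} = \frac{s'}{s}\left(\frac{N'}{N}\right)^{-1/5}$$

For example, if dropping observations halved $N$ and left $s$ unchanged, the width would grow by a factor of $2^{1/5} \approx 1.149$, i.e. about 15%. Because of the $-1/5$ exponent, $h$ reacts only weakly to changes in $N$; a change in $s$ moves it proportionally.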

Thank you very much
