  • Statistical Significance using Burgstahler & Dichev (1997)

    Hi
    I want to apply the methodology of Burgstahler & Dichev (1997) to test statistical significance. The description I am working from reads as follows:

    "statistical significance of the hypotheses is tested by using a method similar to that used in Burgstahler and Dichev (1997). Specifically, under the null hypothesis of no deviations from a normal distribution, the cross sectional distribution of earnings changes and earnings levels is relatively smooth. We present the frequency distribution of the variables across intervals of Et or Change in Et and test that the distribution is smooth using standardised differences. This standardised difference is the difference between the actual number of observations and the expected number of observations within an interval divided by the estimated standard deviation of the difference. The expected number of observations in an interval is equal to the average of the two adjacent intervals. This standardised difference will approximate a normal distribution with mean 0 and standard deviation 1. In order to test our hypotheses, we examine the intervals around the zero benchmark and test whether these intervals have a statistically significant positive (actual observations greater than what is expected) or negative (actual observations lower than expected) standardised difference"

    I'm at a loss as to how to apply this technique in Stata. Your help would be highly appreciated. My data are as follows:

    Et

    23
    24
    22
    22
    30
    12
    19
    18
    22
    29
    31
    10
    11
    14
    17
    6
    4
    17
    11
    10

  • #2
    You didn't get a quick answer. You'll increase your chances of a helpful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. If you can give a simpler explanation of what you're trying to do, it might help.

    It seems to me you have to start with a set of intervals. Then you want to count the number of observations in each interval. And then you're testing whether the number in a middle interval is equal to the mean of the numbers in the two adjacent intervals. There are many ways to do this, and it depends a bit on how you want the data to look afterwards. I'm not clear on exactly which subsample the standard deviation is calculated over.

    Suppose you want to set intervals to be 0-10, 11-20, and greater than 20.

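    * var1 stands for your variable; interval is coded 1, 2, 3
    recode var1 (0/10=1) (11/20=2) (21/40=3), generate(interval)
    * mean of var1 within each interval; r(N) after each summarize holds that interval's number of observations
    su var1 if interval==1
    local mean1=r(mean)
    su var1 if interval==2
    local mean2=r(mean)
    su var1 if interval==3
    local mean3=r(mean)

    * middle interval minus the average of its two neighbours
    local diff= `mean2' - (`mean1'+ `mean3')/2
    di "`diff'"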

    I don't know what standard deviation you want, but you can always get it from the summarize command, which returns it in r(sd).
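
    If the standard deviation meant is that of the difference between the actual and expected counts, a minimal sketch follows. It is not a definitive implementation. The variance formula is the one usually attributed to Burgstahler and Dichev (1997), N*p_i*(1-p_i) + (1/4)*N*(p_(i-1)+p_(i+1))*(1-p_(i-1)-p_(i+1)), where p is the share of observations in an interval; treat it as an assumption and check it against the paper. The data are the 20 values from the first post, and var1 is a placeholder name.

    ```stata
    * Sketch only: data from post #1, variance formula is an assumption (see text)
    clear
    set obs 20
    generate var1 = real(word("23 24 22 22 30 12 19 18 22 29 31 10 11 14 17 6 4 17 11 10", _n))
    recode var1 (0/10=1) (11/20=2) (21/40=3), generate(interval)

    * Total number of observations and the count and share in each interval
    quietly count if !missing(interval)
    local N = r(N)
    forvalues i = 1/3 {
        quietly count if interval == `i'
        local n`i' = r(N)
        local p`i' = `n`i'' / `N'
    }

    * Expected count in the middle interval: average of its two neighbours
    local expected = (`n1' + `n3') / 2
    local diff = `n2' - `expected'

    * Assumed variance of the difference between actual and expected counts
    local var = `N'*`p2'*(1 - `p2') + 0.25*`N'*(`p1' + `p3')*(1 - `p1' - `p3')
    local z = `diff' / sqrt(`var')

    display "standardised difference = " %6.3f `z'
    display "two-sided p-value = " %6.4f 2*normal(-abs(`z'))
    ```

    Locals are used rather than scalars because a scalar with a short name can be mistaken for an abbreviated variable name in an expression.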

    If you want to keep the number of observations in each interval and work with it, you can generate a new variable instead of the locals holding the means. This is done more efficiently with egen.
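
    As a rough sketch of the egen route, assuming the same three intervals and the 20 values from the first post (var1, n_obs, expected and diff are placeholder names):

    ```stata
    * Sketch only: data from post #1, variable names are placeholders
    clear
    set obs 20
    generate var1 = real(word("23 24 22 22 30 12 19 18 22 29 31 10 11 14 17 6 4 17 11 10", _n))
    recode var1 (0/10=1) (11/20=2) (21/40=3), generate(interval)

    * Number of observations in each interval, stored on every row
    egen n_obs = count(var1), by(interval)

    * Reduce to one row per interval, then compare each interval with its neighbours
    contract interval n_obs
    sort interval
    generate expected = (n_obs[_n-1] + n_obs[_n+1]) / 2
    generate diff = n_obs - expected
    list interval n_obs expected diff, noobs
    ```

    The first and last intervals have only one neighbour, so expected and diff are missing there. An interval with no observations produces no row at all, so the neighbour comparison is only right when every interval is populated.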

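    Alternatively, if you ran
    * intervals 1 and 3 are pooled into one group; a new variable name is needed if interval already exists
    recode var1 (0/10=1) (11/20=2) (21/40=1), generate(interval2)
    ttest var1, by(interval2)

    I think you would have something like your test.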


    • #3
      Thank you so much Phil. Helped me a lot. I shall post this question with your suggested modifications. Can you tell me when we say interval, is it the width size of a histogram that we get with the width command?
      Thanks again.


      • #4
        Sorry, there was a typo. I meant to say the "hist" command, not "width".
