  • Generating kurtosis as a new variable

    Hello all,

    Thank you very much in advance for your help.

    I am trying to generate kurtosis as a new variable. In my dataset, groups are identified by mov_id, the year by k_round, and the value for each year is k_year_sum.
    You will see a -1 in the year variable (k_round) because some cases (counted in k_year_sum) occurred before the product's release.

    I need to calculate kurtosis for each group because I want to compare the pattern across groups.

    Question 1: I tried two commands (with different intervals) to generate kurtosis as a new variable, and they give different values. Why is that, and which one should I use?

    The commands:
    (a)

    Code:
    rangestat (kurtosis) k_year_sum, interval(k_round -1 8) by(mov_id)
    or (b)

    Code:
    rangestat (kurtosis) k_year_sum, interval(count_year 1 10) by(mov_id)

    Question 2: With both commands, the result differs from row to row, and I don't understand why. I expected a single value for each group (i.e., each mov_id).

    Question 3: I also used summarize with the detail option to check the kurtosis, and that value differs yet again.

    The code I used:

    Code:
    bysort mov_id: sum k_year_sum, detail

    Part of my data looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(mov_id k_round count_year k_year_sum)
    517 -1  1  92
    517  0  2 133
    517  1  3   6
    517  2  4   8
    517  3  5   1
    517  4  6   4
    517  5  7   3
    517  6  8   5
    517  7  9   7
    517  8 10   6
    521 -1  1  69
    521  0  2 150
    521  1  3   6
    521  2  4   1
    521  3  5   4
    521  4  6   2
    521  5  7   6
    521  6  8   8
    521  7  9  13
    521  8 10   3
    729 -1  1  59
    729  0  2 157
    729  1  3   9
    729  2  4   6
    729  3  5  44
    729  4  6   1
    729  5  7   6
    729  6  8  20
    729  7  9   8
    729  8 10   7
    757 -1  1 192
    757  0  2 114
    757  1  3   4
    757  2  4   6
    757  3  5  19
    757  4  6   1
    757  5  7   9
    757  6  8  12
    757  7  9   8
    757  8 10   2
    end

    Thank you very much for helping. Really appreciate it.





  • #2
    In the option of rangestat (from SSC, as you are asked to explain)

    interval(varname #1 #2)

    #1 and #2 are offsets, not literal values. Each is added to the current value of varname, so you are asking for the interval

    [current value of varname + #1, current value of varname + #2]

    which may well differ according to varname.
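
    To see what the offsets do in practice, here is a minimal sketch that counts the observations falling in each window for the first group of the example data. It assumes rangestat is installed from SSC and that, as I read its help file, both offsets are added to the current value of the key variable. The name n_in_window is hypothetical, introduced only for illustration.

```stata
* Assumes rangestat is installed: ssc install rangestat
clear
input float(mov_id k_round k_year_sum)
517 -1  92
517  0 133
517  1   6
517  2   8
517  3   1
517  4   4
517  5   3
517  6   5
517  7   7
517  8   6
end

* Count the observations whose k_round lies in [k_round - 1, k_round + 8]
* n_in_window is a hypothetical name, not from the original post
rangestat (count) n_in_window = k_year_sum, interval(k_round -1 8) by(mov_id)

list k_round k_year_sum n_in_window, noobs
```

    The window moves with each observation, so each row's statistic is computed from a different subset of the group. The row with k_round 8, for example, only sees the rows with k_round 7 and 8. That is why the kurtosis differs from row to row, and why changing the key variable or the offsets changes the results.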

    I think you need just

    Code:
    rangestat (kurtosis) k_year_sum, interval(mov_id 0 0)
    which is rangestat's slightly quirky way to implement identifier-wise calculations.
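
    A minimal sketch of how this could be checked against summarize, using the first two groups of the example data. It assumes rangestat is installed from SSC; kurt is a hypothetical name for the new variable. I would expect the two sets of numbers to agree, since both use every observation in the group, but treat that as something to verify rather than a guarantee.

```stata
* Assumes rangestat is installed: ssc install rangestat
clear
input float(mov_id k_round k_year_sum)
517 -1  92
517  0 133
517  1   6
517  2   8
517  3   1
517  4   4
517  5   3
517  6   5
517  7   7
517  8   6
521 -1  69
521  0 150
521  1   6
521  2   1
521  3   4
521  4   2
521  5   6
521  6   8
521  7  13
521  8   3
end

* One kurtosis per identifier: the window is "same mov_id as this row"
* kurt is a hypothetical name, not from the original post
rangestat (kurtosis) kurt = k_year_sum, interval(mov_id 0 0)

* The result should be constant within each group
bysort mov_id (kurt): assert kurt[1] == kurt[_N]

* Show the summarize, detail result for each group for comparison
levelsof mov_id, local(ids)
foreach id of local ids {
    quietly summarize k_year_sum if mov_id == `id', detail
    display "mov_id `id': kurtosis from summarize = " r(kurtosis)
}

* Show the rangestat result for each group
tabstat kurt, by(mov_id) statistics(mean)
```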

    See also https://journals.sagepub.com/doi/pdf...867X1001000311 for possibly surprising limits on kurtosis from small samples.

    L-moments have less bizarre sampling behaviour.
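
    On the small-sample limits: a minimal sketch, assuming the moment-based kurtosis reported by summarize, detail and the known upper bound n - 2 + 1/(n - 1) for a sample of size n. The toy variable x is hypothetical: nine zeros and a single one, the most extreme case for n = 10.

```stata
clear
set obs 10

* Nine identical values and one outlier: x is a hypothetical toy variable
generate x = (_n == 10)

* Moment-based kurtosis as reported by summarize, detail
quietly summarize x, detail
scalar k_sample = r(kurtosis)

* Upper bound on sample kurtosis for n = 10
scalar k_max = r(N) - 2 + 1/(r(N) - 1)

display "sample kurtosis = " k_sample
display "upper bound     = " k_max
```

    With only 10 observations per mov_id, no group can show a kurtosis above that bound, however heavy-tailed the underlying process is, which is worth keeping in mind when comparing groups.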



    • #3
      Thank you very much indeed, Nick! The code works perfectly!!
