Calculating Cumulative H-index with Stata?


    Hello statalisters,

    I've been trying to calculate the h-index for a large dataset of scientists. The h-index is defined as the maximum value of h such that the given author/journal has published h papers that have each been cited at least h times. The dataset looks somewhat like this:
    authorid  year  articleid  citation  hindex  c_hindex  t_hindex
    A         1990          1         7       5         5        15
    A         1990          2         5       5         5        15
    A         1990          3        13       5         5        15
    A         1990          4        12       5         5        15
    A         1990          5        17       5         5        15
    A         1991          6        11       4         7        15
    A         1991          7         9       4         7        15
    A         1991          8        19       4         7        15
    A         1991          9        15       4         7        15
    A         1992         10        14       3         9        15
    A         1992         11         4       3         9        15
    A         1992         12         3       3         9        15
    A         1992         13         7       3         9        15
    A         1992         14         5       3         9        15
    A         1992         15         4       3         9        15
    A         1992         16        11       3         9        15
    A         1992         17        17       3         9        15
    A         1993         18        15       4         .        15
    A         1993         19        17       4         .        15
    A         1993         20        18       4         .        15
    A         1993         21        11       4         .        15
    A         1994         22         3       .         .        15
    A         1994         23        15       .         .        15
    A         1994         24        14       .         .        15
    A         1994         25        17       .         .        15
    A         1994         26        13       .         .        15
    A         1994         27        12       .         .        15
    A         1994         28         6       .         .        15
    A         1994         29        15       .         .        15
    A         1994         30         5       .         .        15
    B         1990         31        11       .         .         .
    B         1991         32        11       .         .         .
    B         1991         33         4       .         .         .
    B         1991         34         4       .         .         .
    B         1991         35         3       .         .         .
    B         1992         36         9       .         .         .
    B         1992         37        22       .         .         .
    B         1992         38         2       .         .         .
    B         1992         39         9       .         .         .
    B         1992         40         4       .         .         .
    B         1992         41        37       .         .         .
    B         1992         42         9       .         .         .
    B         1992         43         8       .         .         .
    B         1992         44         3       .         .         .
    B         1993         45        13       .         .         .
    B         1993         46         9       .         .         .
    B         1993         47         7       .         .         .
    B         1993         48         3       .         .         .
    B         1993         49        10       .         .         .
    B         1993         50         9       .         .         .
    B         1994         51         1       .         .         .
    B         1994         52         2       .         .         .
    B         1994         53         6       .         .         .
    B         1994         54         6       .         .         .
    B         1994         55         7       .         .         .
    With a little bit of help from the Stata forum (https://www.stata.com/statalist/arch.../msg00625.html), I could calculate the h-index of each author-year (hindex, column 5) using the following commands:

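    * generate the h-index for each author-year (flow)

    * rank each author-year's papers from most to least cited; unique breaks ties arbitrarily
    bysort authorid year : egen rank = rank(-citation), unique
    * a paper counts toward h only if its citations are at least its rank
    * (tied papers keep their unique ranks; giving them their highest rank would undercount h)
    bysort authorid year : egen hindextemp = max(rank) if citation >= rank
    * copy the result to every paper in the author-year
    bysort authorid year : egen hindex = max(hindextemp)
    drop rank hindextemp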


    What I'm having a hard time with is calculating the cumulative h-index of each author-year (c_hindex, column 6), which counts every article the author has published up to and including that year. For instance, by 1991 author A has 7 articles that have each been cited at least 7 times, so the cumulative h-index for A in 1991 is 7.
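
    To make the target concrete, here is a rough sketch of the logic, not a tested solution: loop over the years and, on each pass, rank all of an author's papers published up to that year. The names c_hindex2, r, and h are placeholders made up for illustration, and the sketch assumes year and citation are numeric as in the example above.

    ```stata
    * Sketch only: c_hindex2, r, and h are placeholder names, not existing variables
    gen c_hindex2 = .
    levelsof year, local(years)
    foreach y of local years {
        * rank all of an author's papers published up to the current year, most cited first
        bysort authorid : egen r = rank(-citation) if year <= `y', unique
        * largest rank whose paper has at least that many citations
        bysort authorid : egen h = max(cond(citation >= r, r, .))
        * store the result on the rows of the current year only
        replace c_hindex2 = h if year == `y'
        drop r h
    }
    * author-years where no paper qualifies get 0 rather than missing
    replace c_hindex2 = 0 if missing(c_hindex2) & !missing(year)
    ```

    For author A in 1991, this logic would rank the 9 papers from 1990 and 1991; the 7th-ranked paper has 9 citations while the 8th has only 7, which gives 7. The loop runs once per distinct year, so there may well be a cleaner or faster way on a large dataset.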

    Could anybody help me with the command for the cumulative h-index?

    Thank you very much in advance!

    Hyeonjin
    Last edited by Hyeonjin Cha; 14 Feb 2020, 18:33.