Hello Statalisters,

I've been trying to calculate the h-index for a large dataset of scientists. The h-index is defined as the largest value of h such that the given author (or journal) has published h papers that have each been cited at least h times. The dataset looks roughly like this, where hindex is the h-index of the papers an author published in that year alone and c_hindex is the cumulative h-index described below:

authorid | year | articleid | citation | hindex | c_hindex | t_hindex
A        | 1990 | 1         | 7        | 5      | 5        | 15
A        | 1990 | 2         | 5        | 5      | 5        | 15
A        | 1990 | 3         | 13       | 5      | 5        | 15
A        | 1990 | 4         | 12       | 5      | 5        | 15
A        | 1990 | 5         | 17       | 5      | 5        | 15
A        | 1991 | 6         | 11       | 4      | 7        | 15
A        | 1991 | 7         | 9        | 4      | 7        | 15
A        | 1991 | 8         | 19       | 4      | 7        | 15
A        | 1991 | 9         | 15       | 4      | 7        | 15
A        | 1992 | 10        | 14       | 3      | 9        | 15
A        | 1992 | 11        | 4        | 3      | 9        | 15
A        | 1992 | 12        | 3        | 3      | 9        | 15
A        | 1992 | 13        | 7        | 3      | 9        | 15
A        | 1992 | 14        | 5        | 3      | 9        | 15
A        | 1992 | 15        | 4        | 3      | 9        | 15
A        | 1992 | 16        | 11       | 3      | 9        | 15
A        | 1992 | 17        | 17       | 3      | 9        | 15
A        | 1993 | 18        | 15       | 4      | 15       |
A        | 1993 | 19        | 17       | 4      | 15       |
A        | 1993 | 20        | 18       | 4      | 15       |
A        | 1993 | 21        | 11       | 4      | 15       |
A        | 1994 | 22        | 3        | 15     |          |
A        | 1994 | 23        | 15       | 15     |          |
A        | 1994 | 24        | 14       | 15     |          |
A        | 1994 | 25        | 17       | 15     |          |
A        | 1994 | 26        | 13       | 15     |          |
A        | 1994 | 27        | 12       | 15     |          |
A        | 1994 | 28        | 6        | 15     |          |
A        | 1994 | 29        | 15       | 15     |          |
A        | 1994 | 30        | 5        | 15     |          |
B        | 1990 | 31        | 11       |        |          |
B        | 1991 | 32        | 11       |        |          |
B        | 1991 | 33        | 4        |        |          |
B        | 1991 | 34        | 4        |        |          |
B        | 1991 | 35        | 3        |        |          |
B        | 1992 | 36        | 9        |        |          |
B        | 1992 | 37        | 22       |        |          |
B        | 1992 | 38        | 2        |        |          |
B        | 1992 | 39        | 9        |        |          |
B        | 1992 | 40        | 4        |        |          |
B        | 1992 | 41        | 37       |        |          |
B        | 1992 | 42        | 9        |        |          |
B        | 1992 | 43        | 8        |        |          |
B        | 1992 | 44        | 3        |        |          |
B        | 1993 | 45        | 13       |        |          |
B        | 1993 | 46        | 9        |        |          |
B        | 1993 | 47        | 7        |        |          |
B        | 1993 | 48        | 3        |        |          |
B        | 1993 | 49        | 10       |        |          |
B        | 1993 | 50        | 9        |        |          |
B        | 1994 | 51        | 1        |        |          |
B        | 1994 | 52        | 2        |        |          |
B        | 1994 | 53        | 6        |        |          |
B        | 1994 | 54        | 6        |        |          |
B        | 1994 | 55        | 7        |        |          |
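The definition is easy to check by hand on small inputs. Below is a minimal sketch in Python. It is used here only because it is easy to read and is not a Stata solution, and the function name `h_index` is hypothetical. Sort an author's citation counts from most to least cited; h is then the largest rank r whose paper has at least r citations.

```python
def h_index(citations):
    """Return the largest h such that h of the papers have >= h citations each."""
    # Sort citation counts from most to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for r, c in enumerate(ranked, start=1):  # r = 1 for the most cited paper
        if c >= r:
            h = r  # the r most cited papers each have at least r citations
        else:
            break  # counts only fall and r only rises, so no larger r works
    return h


# Author A's 1990 articles (articles 1-5 in the table)
print(h_index([7, 5, 13, 12, 17]))  # 5
```

For the 1991 rows it returns 4, which matches the hindex column. For the 1992 rows (articles 10-17) it returns 5, because the fifth most cited of those papers has exactly 5 citations. The table lists 3 for that year.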

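* generate the annual ("flow") h-index: only papers published in that author-year
* rank papers within author-year by citations, 1 = most cited
* (ties are broken arbitrarily, which does not change h)
bysort authorid year : egen rnk = rank(-citation), unique
* h = largest rank r whose paper has at least r citations
by authorid year : egen hindex = max(cond(citation >= rnk, rnk, .))
* author-years in which no paper has been cited get h = 0
replace hindex = 0 if missing(hindex)
drop rnk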

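What I'm having a hard time with is calculating the cumulative h-index for each author-year (c_hindex, column 6), i.e. the h-index computed over all of the author's papers published in that year or earlier. For instance, counting the papers from 1990 and 1991, author A has 7 articles that have each been cited at least 7 times, but not 8 articles cited at least 8 times, so A's cumulative h-index in 1991 is 7.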
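Here is a minimal sketch of the cumulative calculation in Python. It is meant to make the logic explicit and is not Stata code; `h_index` and `cumulative_h_index` are hypothetical names. For each author and each publication year Y, it pools the citation counts of all the author's articles with year <= Y and takes the h-index of that pool. As in the table, it assumes each article has a single citation count rather than a count as of year Y.

```python
from collections import defaultdict


def h_index(citations):
    # With counts sorted from most to least cited, c >= r holds for a prefix
    # of ranks r = 1, 2, ..., so counting the ranks where it holds gives h.
    ranked = sorted(citations, reverse=True)
    return sum(1 for r, c in enumerate(ranked, start=1) if c >= r)


def cumulative_h_index(rows):
    """rows: iterable of (authorid, year, citation), one per article.

    Returns {(authorid, year): h}, where h is computed over every article the
    author published in that year or any earlier year.
    """
    papers = defaultdict(list)  # authorid -> [(year, citation), ...]
    for author, year, citation in rows:
        papers[author].append((year, citation))

    result = {}
    for author, pubs in papers.items():
        for year in sorted({y for y, _ in pubs}):
            # pool of citation counts for articles published up to this year
            pool = [c for y, c in pubs if y <= year]
            result[(author, year)] = h_index(pool)
    return result


# Author A's articles from the table, grouped by publication year
author_a = {
    1990: [7, 5, 13, 12, 17],
    1991: [11, 9, 19, 15],
    1992: [14, 4, 3, 7, 5, 4, 11, 17],
    1993: [15, 17, 18, 11],
    1994: [3, 15, 14, 17, 13, 12, 6, 15, 5],
}
rows = [("A", year, c) for year, cites in author_a.items() for c in cites]
c_h = cumulative_h_index(rows)
print([c_h[("A", year)] for year in sorted(author_a)])  # [5, 7, 9, 11, 13]
```

On author A's rows the first three values (5, 7, 9) match the c_hindex column. For 1993 the definition gives 11 rather than 15. In Stata, one way to mirror this is to loop over the years (for example with `levelsof` and `foreach`) and rerun the ranking each time on only the papers with year <= Y.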

Could anybody help me with the commands to compute the cumulative h-index?

Thank you very much in advance!

Hyeonjin