  • Calculating citation half-life

    Dear all,

    Assume that you have the following problem. You have patent citations for firms in different years, and the question is how to calculate the half-life of citations. That is, how many years are needed until the citations a firm receives in the following years cumulatively reach 50% of its citations in a specific year. I have prepared the following simple example:

    Code:
     input id    year    value    half_value    time_until_half
    1    1990    50    25    3
    1    1991    2    1    1
    1    1992    3    1.5    1
    1    1993    25    12.5    1
    1    1994    15    7.5    3
    1    1995    5    2.5    3
    1    1996    1    0.5    1
    1    1997    1    0.5    1
    1    1998    3    .    .
    2    1990    8    4    2
    2    1991    3    1.5    2
    2    1992    1    0.5    1
    2    1993    2    1    1
    2    1994    12    6    4
    2    1995    1    0.5    1
    2    1996    2    1    1
    2    1997    1    0.5    1
    2    1998    3    .    .
    end
    In this example, we have two firms, "1" and "2". Firm "1" starts with 50 citations in year 1990. Summing the citations from the following years, one finds that 3 years are needed until the half-value (25) is reached. In the same manner we can calculate the half-life for the other years. For example, in year 1991 firm "1" has 2 citations (half-value 1), so it needs just one year to reach that value (in year 1992 it has 3 citations).
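A minimal sketch of this definition, written in Python only as a cross-check of the arithmetic (it is not Stata code and not the method used later in the thread). The `data` layout and the function name `half_life` are illustrative assumptions; the sketch assumes each firm's years are sorted and consecutive, and it counts a tie (cumulative citations exactly equal to the half value) as reaching it.

```python
# Cross-check of the cumulative half-life definition (Python, not Stata).
# Assumption: each firm's rows are sorted by year with no gaps.

# Hypothetical layout: firm id -> list of (year, citations), from the example
data = {
    1: [(1990, 50), (1991, 2), (1992, 3), (1993, 25), (1994, 15),
        (1995, 5), (1996, 1), (1997, 1), (1998, 3)],
    2: [(1990, 8), (1991, 3), (1992, 1), (1993, 2), (1994, 12),
        (1995, 1), (1996, 2), (1997, 1), (1998, 3)],
}


def half_life(rows):
    """Map each year to the number of years until citations from the
    following years add up to at least half of that year's citations
    (None if that never happens within the data)."""
    result = {}
    for i, (year, value) in enumerate(rows):
        half = value / 2
        total = 0
        result[year] = None
        for later_year, later_value in rows[i + 1:]:
            total += later_value
            if total >= half:
                result[year] = later_year - year
                break
    return result


for firm, rows in data.items():
    print(firm, half_life(rows))
```

On the posted data this gives 4 rather than 3 for firm 1 in 1994 (5 + 1 + 1 = 7 < 7.5, so 1998 is needed); every other firm-year matches the time_until_half column, and 1998 has no answer because no later years are observed.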

    How would one calculate half-life in the way described above in Stata?

    Thanks a lot
    Last edited by Pantelis Kazakis; 09 Aug 2019, 11:27.

  • #2
    Well, your worked example does not seem consistent with your description in words. Let's look at id 2, year 1990. The value is 8, so the half value is 4. Only in 1994 does value rise above 4, so the time elapsed is 4 years, but you show 2 as your answer. Similarly, for id 1, year 1995 has a value of 5, so the half value is 2.5. You show the time until half as 1, but in 1996, value is only 1, which is not above 2.5. I find that the value first exceeds 2.5 in 1998 (value = 3), which makes the time 3 years.

    Assuming that these discrepancies are errors in your calculation and that your word description is what you want, you can use the following code:

    Code:
    isid id year, sort
    preserve
    keep id year value
    tempfile copy
    save `copy'
    
    restore
    //  PAIR EACH OBSERVATION WITH ALL OBSERVATIONS OF
    //  SAME FIRM WHERE VALUE IS AT LEAST CURRENT HALF VALUE
    rangejoin value half_value . using `copy', by(id)
    //  KEEP ONLY OBSERVATIONS LATER THAN CURRENT YEAR
    keep if year_U > year
    
    //  KEEP ONLY THE FIRST OF THESE
    by id year (year_U), sort: keep if _n == 1
    //  RESTORE ANY FIRM-YEARS DROPPED ABOVE (THEIR wanted WILL BE MISSING)
    merge m:1 id year using `copy'
    
    //  CALCULATE INTERVAL FROM YEAR TO YEAR_U
    gen wanted = year_U - year
    
    isid id year, sort
    To use this you must have the -rangejoin- command, written by Robert Picard, available from SSC. -rangejoin- in turn requires -rangestat-, written by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
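The reading used in this first answer (the first later year whose own citation count reaches the half value) can be sketched in Python as a hedged cross-check; it is not a replacement for the -rangejoin- code. `first_single_year`, `firm1`, and `firm2` are hypothetical names. Like the Stata code, which matches using values in [half_value, .], the sketch counts a tie as reaching the half value.

```python
# Cross-check of the first reading in #2 (Python, not Stata): for each year,
# the first LATER year whose own citations are at least half of this year's.
# Like the rangejoin call, which matches value in [half_value, .], a tie counts.
# Assumption: rows are sorted by year.

firm1 = [(1990, 50), (1991, 2), (1992, 3), (1993, 25), (1994, 15),
         (1995, 5), (1996, 1), (1997, 1), (1998, 3)]
firm2 = [(1990, 8), (1991, 3), (1992, 1), (1993, 2), (1994, 12),
         (1995, 1), (1996, 2), (1997, 1), (1998, 3)]


def first_single_year(rows):
    result = {}
    for i, (year, value) in enumerate(rows):
        half = value / 2
        # The first qualifying later year, or None when no single later
        # year qualifies (the Stata code leaves wanted missing there)
        result[year] = next(
            (later_year - year
             for later_year, later_value in rows[i + 1:]
             if later_value >= half),
            None,
        )
    return result


print(first_single_year(firm1))
print(first_single_year(firm2))
```

Under this reading, id 2 in 1990 gets 4 and id 1 in 1995 gets 3, as computed above, while 1994 has no answer for either firm because no single later year reaches the half value.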



    • #3
      Dear Clyde,

      Indeed, for id "1" and year 1995 it should be 3. As I said, I would like to know how many years are needed until the half value of a specific year is reached by summing citations in the years that follow. For example, in year 1990 the value is 50 (half value 25). Hence we need citations from 1991, 1992, and 1993 to accumulate at least 25 citations; in fact we have 2 + 3 + 25 = 30 citations.

      Another example: for id = 1 and year 1995, we have 5 citations (half is 2.5). We then need to sum citations from 1996, 1997, and 1998 to reach at least this value (in fact we have 1 + 1 + 3 = 5 citations).

      I hope it is clearer now what I am trying to do.



      • #4
        Well, I still find discrepancies between your words and your example.

        For id = 1, year = 1994, value = 15, so the half-value is 7.5. You show time_until_half = 3, but in 1995, 1996, and 1997 we have value = 5+1+1 = 7 < 7.5. So we need to go on to 1998 to finally get to 7.5 citations, which means time until half should be 4.

        Similarly, with id = 1, year = 1995, value = 5, so the half value is 2.5. You show time_until_half = 1, but in 1996, value = 1 < 2.5. So again, we need to go on, in this case all the way to 1998 (1996 through 1998 gives us 1+1+3 = 5), to reach 2.5.

        Again assuming your calculations are mistaken and my understanding of your words is correct, then the code goes like this:

        Code:
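        //  CUMULATIVE CITATIONS FOR EACH FIRM UP TO AND INCLUDING EACH YEAR
        by id (year), sort: gen running_sum = sum(value)
        
        tempfile copy
        save `copy'
        
        //  CITATIONS FROM year+1 THROUGH year_U EQUAL running_sum_U - running_sum,
        //  SO HALF_VALUE IS REACHED ONCE running_sum_U >= running_sum + half_value
        gen target = running_sum + half_value
        
        //  PAIR EACH FIRM-YEAR WITH YEARS OF THE SAME FIRM WHOSE RUNNING SUM
        //  IS AT LEAST target, THEN KEEP THE EARLIEST LATER ONE
        rangejoin running_sum target . using `copy', by(id)
        keep if year_U > year
        by id year (year_U), sort: keep if _n == 1
        //  RESTORE ANY FIRM-YEARS DROPPED ABOVE (THEIR wanted WILL BE MISSING)
        merge 1:1 id year using `copy'
        
        gen wanted = year_U - year
        
        isid id year, sort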
        (You can drop all the *_U variables as well as running_sum and target at the end, of course, if they are cluttering up your data set.)
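The running-sum idea in this code can also be sketched in Python as a cross-check. `half_life_running_sum` is a hypothetical name, and the binary search with `bisect_left` is a substitution for the -rangejoin- and filtering steps, not what the Stata code does: it relies on citation counts being non-negative, so running sums never decrease.

```python
# Cross-check of the running-sum approach in #4 (Python, not Stata).
# Citations from year+1 through year_U equal running[j] - running[i], so the
# half value is reached once running[j] >= running[i] + value / 2 (= target).
# Substitution: since citation counts are non-negative, running sums never
# decrease, so bisect_left finds the first qualifying later year instead of
# the join-and-filter steps used in Stata.
from bisect import bisect_left
from itertools import accumulate


def half_life_running_sum(years, values):
    running = list(accumulate(values))
    wanted = {}
    for i, year in enumerate(years):
        target = running[i] + values[i] / 2
        # Search only positions after i, like `keep if year_U > year`
        j = bisect_left(running, target, i + 1)
        wanted[year] = years[j] - year if j < len(years) else None
    return wanted


years = list(range(1990, 1999))
print(half_life_running_sum(years, [50, 2, 3, 25, 15, 5, 1, 1, 3]))
print(half_life_running_sum(years, [8, 3, 1, 2, 12, 1, 2, 1, 3]))
```

On the posted data this reproduces the values argued for above, e.g. 4 for id 1 in 1994 and 3 for id 1 in 1995; the last year gets None, matching the missing wanted in Stata.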



        • #5
          You are exactly right, Clyde. You have understood my point. Your code works perfectly.
          Thanks a lot.
