Hello Statalisters,
I've been wrangling with this problem for a while and have not been able to write workable code. I am trying to calculate a Jaccard similarity measure using the data below. What I have is a patent number and the technological class categories to which each patent is assigned (a patent can have multiple classes). What I want to do is measure the similarity between pairs of classes.
A Jaccard similarity is the number of elements shared between two sets divided by the total number of distinct elements in the two sets combined (shared and unshared).
For my purposes, this would be the number of patents that cite both classes divided by the number of patents that cite either class.
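Written as a formula, with A and B standing for the sets of patents assigned to each of two classes (symbols introduced here only for illustration), the definition above is:

$$
J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
$$

The second form follows from inclusion-exclusion and needs only three counts per pair of classes: the patents in each class and the patents in both.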
For example, for classes aa and bb, the Jaccard similarity would be .67, since the numerator is 2 (patents 1 and 3 both cite classes aa and bb) and the denominator is 3 (patents 1, 3 and 4 cite class aa, class bb, or both).
Code:
input patno str3 class
1 aa
1 bb
2 cc
2 dd
3 aa
3 bb
4 aa
4 cc
end
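One possible approach, offered as a minimal sketch rather than a definitive solution, is to self-join the data on patno so that each patent contributes every pair of its classes, then count the pairs. It assumes one row per patent-class combination. The names n_a, n_b, n_both, class_b and jaccard are made up for this illustration.

```stata
* Toy data from the post: one row per patent-class pair
clear
input patno str3 class
1 aa
1 bb
2 cc
2 dd
3 aa
3 bb
4 aa
4 cc
end

* Guard against repeated patent-class rows, which would inflate the counts
duplicates drop patno class, force

* n_a = number of patents assigned to each class
bysort class: gen n_a = _N

* Save a copy, then rename so the self-join yields two class columns
tempfile base
save `base'
rename (class n_a) (class_b n_b)

* Form all class pairs within a patent; keep each unordered pair once
joinby patno using `base'
keep if class < class_b

* Collapse to one row per class pair; n_both = patents citing both classes
contract class class_b n_a n_b, freq(n_both)

* Shared patents divided by patents citing either class
gen jaccard = n_both / (n_a + n_b - n_both)
list, clean noobs
```

On the toy data this should give three pairs: aa-bb at about .67, aa-cc at .25, and cc-dd at .5. Pairs of classes that never appear together on a patent do not show up in the result, since their similarity is 0. Because joinby creates every within-patent pair, memory use grows with the square of the number of classes per patent, which may matter for a large dataset.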
Could anyone suggest code that would allow me to calculate Jaccard similarity measures for a larger dataset?
Thanks in advance,
Ed