Hi all -
I'm working on a project where I need to compare the values of a single field across rows within a group (so something like matchit won't really work, unless I am misunderstanding it). I've searched for solutions here without finding much; I may well be using the wrong search terms, so apologies if this is a repeat. Basically, I have multiple patents in a "family" and I want to see whether the patents in a family have the same inventors.
So a simplified dataset would be:
patno,familyno,inventor
1,1,A
2,1,A
3,1,B
4,2,C
5,2,C
Within each familyno (1, 2, etc.), I want Stata to report or tag every patno whose inventor matches that of another patno with the same familyno. In the data above that would be patno 1, 2, 4, and 5, but not 3.
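To make the target concrete, here is a minimal sketch of the exact-match case only. It assumes one inventor per row, as in the toy data, and identical spelling of names; n_same and match are just names I made up for illustration.

```stata
clear
input patno familyno str1 inventor
1 1 "A"
2 1 "A"
3 1 "B"
4 2 "C"
5 2 "C"
end

* Count the rows that share both family and inventor
* (assumes each patno appears on exactly one row)
bysort familyno inventor: gen n_same = _N

* Tag patents whose inventor also appears on another patent in the family
gen byte match = n_same > 1

sort patno
list, sepby(familyno)
```

On the toy data this should tag patno 1, 2, 4, and 5 and leave patno 3 untagged. It does nothing for the fuzzy or multiple-inventor cases.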
Problem 1: doing the comparison at all
Problem 2: doing a fuzzy comparison
Problem 3: there can be multiple matches (that is, patno 1 and patno 2 could each list both A and B, and I want to see whether both inventors are the same)
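For Problem 3, one idea I have been toying with is to collapse each patent's inventors into a single sorted string and then compare those strings within a family. This is only a sketch under assumptions: the data are in long form (one row per patent-inventor pair), no inventor is listed twice on the same patent, and names are already spelled consistently, so it does not touch Problem 2. The example rows and the names invlist, onepat, n_same, and sameset are made up for illustration.

```stata
clear
input patno familyno str1 inventor
1 1 "A"
1 1 "B"
2 1 "B"
2 1 "A"
3 1 "B"
4 2 "C"
5 2 "C"
end

* Build a sorted, semicolon-separated inventor list for each patent
bysort patno (inventor): gen invlist = inventor if _n == 1
by patno: replace invlist = invlist[_n-1] + ";" + inventor if _n > 1
by patno: replace invlist = invlist[_N]

* Mark one row per patent so that patents, not rows, get counted
egen byte onepat = tag(patno)

* Count the patents in each family that share the same inventor list
bysort familyno invlist: egen n_same = total(onepat)
gen byte sameset = n_same > 1

sort patno inventor
list, sepby(patno)
```

Here patno 1 and 2 both end up with "A;B" and get tagged, while patno 3 (only B) does not, even though B appears on the other two. This tests for identical inventor sets only; partial overlap would need a different comparison.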
FWIW, this is a follow-on to this discussion: https://www.statalist.org/forums/for...ation-probleim
I have done that consolidation, but I don't think the counting methods work as well here, because (a) the names need fuzzy matching and (b) not every family is all-or-none.
Thoughts appreciated.