Dear Statalist, I am wondering whether you can help me do the analysis I am currently running in a loop (see below) in a more efficient (faster) way. Within each group of observations that cite the same patent, I want to generate an indicator that tells me whether any of the firms in observation _n (a patent may be developed by several firms, which are stored in several variables) also appears in any later observation within that group (_n+1, _n+2, …). So, if at least one firm among the firm variables in _n is also present among the firm variables in _n+x, I would assign a 1. For example, suppose I have the small table shown further below, in which rows are observations and columns are firm variables.
The loop then creates several variables that are set to 1 if an observation has a firm in common with another observation. That is, using the example table, for _n and _n+1 it creates a variable that is missing in _n, because there are no common firms between _n and _n+1. Between _n and _n+2, however, it creates another variable that equals 1 in _n, because the same firm (6) is in Firm2[_n] and Firm1[_n+2]. To collect this information in a single variable (one for each distance x), I then use egen with rowtotal() in the second part of the loop.
Note that the command does not compare only firm1[_n] with firm1[_n+x]; it compares firm1[_n] with firm1, firm2, firm3, firm4, … in [_n+x], and then does the same for firm2[_n], and so on.
Since my dataset has more than 7 million observations and around 40 firm variables, and some groups contain around 2,000 observations, this loop takes practically forever. Can you please suggest a more efficient way to find out whether, within a cited-patent group, any citing firm in [_n] is also present in [_n+x]?
Below you can see the loop and a small sample of the data.
Thanks in advance!
     | Firm1 | Firm2 | Firm3 | Firm4 |
_n   | 3     | 6     | 8     | 9     |
_n+1 | 1     | 2     | 4     | 5     |
_n+2 | 6     | 7     | 11    | 12    |
Code:
* number of citing-firm variables
ds citing_firm_id*
local nwords : word count `r(varlist)'
display `nwords'
display wordcount("`r(varlist)'")

* largest within-group position, used as the maximum distance k
sum ccc if cited_firm_id1!=. & citing_firm_id1!=.
local max_k = r(max)

forvalues k = 1/`max_k' {
    * compare every firm variable in _n with every firm variable in _n+k
    forvalues x = 1/`nwords' {
        forvalues y = 1/`nwords' {
            sort cited_appln_id citing_year citing_appln_id
            bys cited_appln_id: gen count4_`x'_`y'_`k' = 1 ///
                if citing_firm_id`x'==citing_firm_id`y'[_n+`k'] ///
                & !missing(citing_firm_id`x',citing_firm_id`y'[_n+`k'])
        }
    }
    * number of matches at distance k (missing if there are none)
    egen self_cit_v_k`k' = rowtotal(count4_*), missing
    drop count4_*
}
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input double(citing_appln_id cited_appln_id) float(citing_firm_id1 citing_firm_id2 citing_firm_id3 citing_firm_id4 citing_firm_id5 citing_firm_id6 citing_firm_id7 citing_firm_id8 citing_firm_id9 citing_firm_id10 cited_firm_id1 citing_year ccc)
273682949  522 2931813       .       . . . . . . . . 2931813 2010  1
315557296  522 1709046 2931813       . . . . . . . . 2931813 2010  2
274384595  522 2931813       .       . . . . . . . . 2931813 2011  3
333687126  522 2931813       .       . . . . . . . . 2931813 2012  4
334378122  522 2931813       .       . . . . . . . . 2931813 2012  5
336160680  522 2931813 4073560       . . . . . . . . 2931813 2013  6
337102271  522 2931813       .       . . . . . . . . 2931813 2013  7
337102281  522 2931813       .       . . . . . . . . 2931813 2013  8
352267359  522 2931813       .       . . . . . . . . 2931813 2013  9
379237343  522 1394838       .       . . . . . . . . 2931813 2013 10
380717448  522  296197       .       . . . . . . . . 2931813 2014 11
381189080  522 2931813       .       . . . . . . . . 2931813 2014 12
412500492  522  908975       .       . . . . . . . . 2931813 2015 13
415658469  522  908975       .       . . . . . . . . 2931813 2015 14
418707239  522 2931813       .       . . . . . . . . 2931813 2015 15
419016602  522 2931813       .       . . . . . . . . 2931813 2015 16
438831616  522  908975       .       . . . . . . . . 2931813 2016 17
439633449  522  908975       .       . . . . . . . . 2931813 2016 18
474527945  522  918273       .       . . . . . . . . 2931813 2018 19
524645129  522 2931813       .       . . . . . . . . 2931813 2020 20
 56840315  540  296791 3091119 3096471 . . . . . . .  296791 2009  1
273300774  540  296791       .       . . . . . . . .  296791 2010  2
   208669 1186  140411       .       . . . . . . . .  140411 2003  1
 15909980 1186  557078       .       . . . . . . . .  140411 2003  2
 15999323 1186  140411  238661       . . . . . . . .  140411 2003  3
 16031254 1186  981997       .       . . . . . . . .  140411 2004  4
336876707 1186  140411  238661       . . . . . . . .  140411 2011  5
end
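One direction that might avoid the nested loops, sketched here only as an untested idea for the full data: reshape the firm variables to long form and, for each firm within a cited-patent group, compare the position of the current observation with the last position at which that firm appears. The sketch below uses the toy table from this post (firms 3 6 8 9, then 1 2 4 5, then 6 7 11 12) rather than the real data. The names pos, slot, last_pos, later and shares_later, and the made-up patent ids and years, are assumptions for illustration. It also assumes that cited_appln_id and citing_appln_id together identify an observation, and it yields one "any later observation" indicator instead of one variable per distance x.

```stata
* Toy data mirroring the example table; ids and years are made up
clear
input double(citing_appln_id cited_appln_id) float(citing_firm_id1 citing_firm_id2 citing_firm_id3 citing_firm_id4 citing_year)
101 1 3 6  8  9 2001
102 1 1 2  4  5 2002
103 1 6 7 11 12 2003
end

* position of each observation within its cited-patent group
sort cited_appln_id citing_year citing_appln_id
by cited_appln_id: gen long pos = _n

preserve
keep cited_appln_id citing_appln_id pos citing_firm_id*

* one row per (observation, firm) pair
reshape long citing_firm_id, i(cited_appln_id citing_appln_id) j(slot)
drop if missing(citing_firm_id)

* last position in the group at which each firm appears
bysort cited_appln_id citing_firm_id (pos): gen long last_pos = pos[_N]

* the firm shows up again later iff its last position is after this one
gen byte later = last_pos > pos

* back to one row per observation: 1 if any of its firms reappears later
collapse (max) shares_later = later, by(cited_appln_id citing_appln_id)
tempfile flags
save `flags'
restore

* observations whose firm variables are all missing end up with shares_later == .
merge 1:1 cited_appln_id citing_appln_id using `flags', nogenerate
sort cited_appln_id citing_year citing_appln_id
list citing_appln_id citing_firm_id* shares_later, noobs
```

On the toy table only the first observation is flagged, because firm 6 appears again in the third one. The work here grows with the number of non-missing firm entries rather than with group size times the square of the number of firm variables, though reshape itself can be slow on millions of rows.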