  • Calculate proximity between two variables

    Hi all,
    I have a dataset with patent, patent class, inventor, and firm. An example follows:
    firm_id  year  inventor   move  patent   class   sourcef  kd
    10036    2001  4144893-1  0     6197294  424093  .
    10036    2001  4144893-1  0     6225448  424130  .
    10036    2001  4212742-2  1     6322804  424422  100390   0.8
    10036    2001  4495402-2  0     6322804  424422
    10036    2001  4861627-1  0     6262034  424468
    10036    2001  4868121-1  0     6322804  424422
    10036    2001  4868121-2  0     6322804  424422
    10036    2001  4877029-2        6322804  424422

    firm_id: the focal (hiring) firm that hires an inventor from the source firm (sourcef_id) in year t.
    inventor_id: the inventor who now works at the focal firm (firm_id).
    move: dummy variable, 1 if the inventor moves to the focal firm in year t, otherwise 0.
    class: classification number indicating the patent class of each patent (patent_id).
    sourcef_id: ID of the source firm, i.e. the inventor's employer before moving to the focal firm. This variable exists only when the inventor moves (move = 1).
    kd: knowledge distance between the hiring firm and the mobile inventor at the time the inventor moves to the hiring firm. This is the desired variable.

    E.g. the third row of the dataset says that focal firm 10036 (firm_id) hires inventor 4212742-2 in 2001 from firm 100390. I would like to know the knowledge distance between inventor 4212742-2 and firm 10036.

    How to calculate it: kd is the uncentered correlation (cosine similarity) of the vectors Ci and Cj:

    $$kd_{ij} = \frac{C_i \cdot C_j}{\sqrt{(C_i \cdot C_i)(C_j \cdot C_j)}}$$
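The formula can be checked with a small Python sketch (the thread's own code is Stata; Python is used here only to illustrate the arithmetic). `knowledge_proximity` is a hypothetical helper name, and it assumes both vectors list the same patent classes in the same order.

```python
import math


def knowledge_proximity(ci, cj):
    """kd_ij = (Ci . Cj) / sqrt((Ci . Ci)(Cj . Cj)) for two share vectors.

    Assumes ci and cj cover the same patent classes in the same order.
    """
    if len(ci) != len(cj):
        raise ValueError("vectors must cover the same patent classes")
    dot = sum(a * b for a, b in zip(ci, cj))      # Ci . Cj
    norm_i = math.sqrt(sum(a * a for a in ci))    # sqrt(Ci . Ci)
    norm_j = math.sqrt(sum(b * b for b in cj))    # sqrt(Cj . Cj)
    if norm_i == 0 or norm_j == 0:
        return float("nan")  # undefined when either side has no patents in the window
    return dot / (norm_i * norm_j)
```

Identical class profiles give 1, profiles with no class in common give 0, and because the shares are never negative the result stays between 0 and 1. The value is also unchanged if a vector is rescaled, so raw class counts and class proportions give the same kd.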

    Ci: proportion of inventor i's patents in each patent class in the 5 years before the move.
    Cj: proportion of firm j's patents in each patent class in the 5 years before it hires the inventor.
    kd ranges from 0 to 1; the larger it is, the smaller the knowledge distance (the closer the hiring firm and the mobile inventor).
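A minimal Python sketch of building Ci (or Cj) from raw patent records. It assumes each patent is available as a hypothetical `(year, patent_class)` pair, that `classes` is a fixed list of all patent classes (so k = len(classes)), and that "5 years before" means patents dated 1 to 5 years before the event year, the same window as `dyear > 0 & dyear <= 5` in the Stata code later in the thread. An inventor or firm with a shorter history simply contributes fewer patents.

```python
from collections import Counter


def class_shares(patents, event_year, classes, window=5):
    """Share of patents in each class among patents dated 1..window years before event_year.

    patents: iterable of (year, patent_class) pairs (hypothetical input format).
    classes: ordered list of all patent classes; fixes the position of each share.
    """
    counts = Counter(
        cls for year, cls in patents
        if 0 < event_year - year <= window  # keep only the pre-event window
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(classes)  # no patents in the window
    return [counts[c] / total for c in classes]  # Counter returns 0 for absent classes
```

For Cj, the same function would be applied to the hiring firm's patents with the hiring year as `event_year`.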

    Take the third row as an example: inventor 4212742-2 moves to firm 10036 in 2001. As I understand it, the calculation could go as follows (there is probably a simpler method):
    First, I calculate the patent class vector Ci for inventor 4212742-2. Suppose that, in the 5 years before the move, the inventor's patent history across patent classes is as follows:
    Ci = [1/16, 3/16, 0, 0, 0, 5/16, 0, 0, ......]: a 1×k vector, where k is the number of patent classes.
    Second, I calculate the patent class vector for the hiring firm:
    Cj = [1/120, 1/120, 0, 0, 0, 4/120, 0, 0, ......]: a 1×k vector, where k is the number of patent classes.
    Third, the knowledge distance above can be calculated for each pair of mobile inventor and hiring firm.

    However, there are thousands of inventor mobility events in different years, and I have trouble translating this logic into code. Any suggestions would be much appreciated!

    Note:
    1. For inventors with less than 5 years of patent history before the move, use the years they have.
    2. For firms with less than 5 years of patent history before the hire, use the years they have.
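One way to handle thousands of mobility events is to loop over the rows with move = 1 and compute kd once per (firm_id, inventor, year) event. The Python sketch below stores each share vector sparsely as a `{class: share}` dict, so classes with no patents never need to be listed, and it assumes hypothetical lookups `inventor_history` and `firm_history` that map each ID to its `(year, class)` patent records. Because only patents dated 1 to 5 years before the move are counted, the shorter histories in notes 1 and 2 are handled automatically.

```python
import math
from collections import Counter


def shares(class_years, event_year, window=5):
    """Sparse share vector {class: share} for patents dated 1..window years before event_year."""
    counts = Counter(c for y, c in class_years if 0 < event_year - y <= window)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def cosine(u, v):
    """Uncentered correlation of two sparse vectors; a class missing from a dict counts as 0."""
    dot = sum(share * v.get(c, 0.0) for c, share in u.items())
    nu = math.sqrt(sum(s * s for s in u.values()))
    nv = math.sqrt(sum(s * s for s in v.values()))
    return dot / (nu * nv) if nu and nv else float("nan")


def kd_for_events(rows, inventor_history, firm_history):
    """rows: dicts with keys firm_id, year, inventor, move (as in the example table).

    inventor_history / firm_history: hypothetical {id: [(year, class), ...]} lookups.
    Returns {(firm_id, inventor, year): kd} for every mobility event.
    """
    results = {}
    for r in rows:
        if r.get("move") != 1:
            continue  # kd is only needed for mobility events
        key = (r["firm_id"], r["inventor"], r["year"])
        if key in results:
            continue  # one value per event, even if the inventor has several patent rows
        ci = shares(inventor_history.get(r["inventor"], []), r["year"])
        cj = shares(firm_history.get(r["firm_id"], []), r["year"])
        results[key] = cosine(ci, cj)
    return results
```

Whether the firm's history should include or exclude the incoming inventor's own earlier patents is a modeling choice this sketch leaves to the caller's `firm_history`.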

  • #2
    The above question has been solved.



    • #3
      For the above issue, I solved it with the following logic:
      First, I obtain the percentage of patents each inventor has in each class.
      Code:
      * assumes inventor_id, group (one mobility event) and dyear (years between patent and move) exist
      keep if dyear > 0 & dyear <= 5                               // patents within the past 5 years
      bysort inventor_id group class: gen npat_class = _N          // number of patents in each class
      bysort inventor_id group: gen npat_total = _N                // total number of patents
      gen per_class = npat_class/npat_total                        // percentage of patents in each class
      keep inventor_id group class per_class
      duplicates drop                                              // one row per inventor-group-class
      reshape wide per_class, i(inventor_id group) j(class) string // one column per class
      reshape long per_class, i(inventor_id group) j(class) string // back to long: every class for every inventor
      replace per_class = 0 if missing(per_class)                  // classes without patents get share 0
      Now I have a table with inventors in the first column, the class category in the second column, and the class percentage in the third column. I then obtain the percentage of patents each firm has in each class with the same logic, and join the firm and inventor datasets by class category.
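The reshape wide/long round trip gives every inventor a row for every class, so the inventor and firm tables line up class by class and a class without patents can be treated as share 0. A Python sketch of the same alignment for one group, with hypothetical names (`join_on_class`, `var1` for the inventor share, `var2` for the firm share); it roughly mirrors the reshape-plus-join step rather than reproducing Stata's behavior exactly.

```python
def join_on_class(inventor_shares, firm_shares):
    """Align two {class: share} tables for one mobility event (group).

    Every class that appears on either side gets a row; a class missing on
    one side gets share 0. Classes absent from both sides would add 0 to every
    sum in the distance formula, so they can be left out.
    Returns sorted (class, var1, var2) rows: var1 = inventor share, var2 = firm share.
    """
    all_classes = sorted(set(inventor_shares) | set(firm_shares))
    return [(c, inventor_shares.get(c, 0.0), firm_shares.get(c, 0.0)) for c in all_classes]
```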

      Finally, I calculate the distance with the following code.
      Code:
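      * var1 = inventor's class share, var2 = hiring firm's class share (one row per group-class)
      sort group
      by group: gen inner_product = sum(var1*var2)          // running sum of Ci*Cj
      by group: replace inner_product = inner_product[_N]   // keep the group total

      forvalues i = 1/2 {
          by group: gen length`i' = sum(var`i'^2)           // running sum of squares
          by group: replace length`i' = sqrt(length`i'[_N]) // vector length
      }

      by group: gen kd = inner_product/(length1*length2)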
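The by-group running sums compute three totals per group: Σ var1·var2, Σ var1², and Σ var2². The same computation as a Python sketch, assuming the rows are already zero-filled `(group, var1, var2)` tuples with one row per group-class pair (`kd_by_group` is a hypothetical name):

```python
import math
from collections import defaultdict


def kd_by_group(rows):
    """rows: iterable of (group, var1, var2), one row per group-class pair.

    Accumulates sum(var1*var2), sum(var1^2) and sum(var2^2) per group,
    then returns {group: kd} with kd = inner product / (length1 * length2).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for group, v1, v2 in rows:
        s = sums[group]
        s[0] += v1 * v2  # inner product
        s[1] += v1 * v1  # squared length of the inventor vector
        s[2] += v2 * v2  # squared length of the firm vector
    return {
        g: (dot / math.sqrt(n1 * n2) if n1 and n2 else float("nan"))
        for g, (dot, n1, n2) in sums.items()
    }
```

Accumulating totals in a single pass avoids the Stata idiom of a running `sum()` followed by copying the last value `[_N]` back to every row of the group.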

