How to change data into vector format, and conduct vector multiplication?

Chengmou Lei

Join Date: May 2023

Posts: 24
#1


22 May 2023, 14:41

Dear all,

It is my first time trying to use Stata to do vector operations.

My setting: I have a data set for target firms that records the patents (patent_id) held by each firm (gvkey). Each patent can belong to one or more WIPO technology classes (wipo_sector_id). I have already calculated the number of patents each firm has in a given technology class k (s_target_sector). Now I want to create a vector ($S_{\text{target}}$) for each firm, in which each entry is the number of patents (s_target_sector: $N_{k1}, \dots, N_{kK}$) in a particular class (wipo_sector_id: k). The length of this vector (K) should equal the total number of technology classes in my data set.

For example, if there are three classes (manufacture, agriculture, semiconductor) but firm gvkey == 6078 only has patents in the first and third classes, I still want the vector to have length 3, with the agriculture entry set to 0. How can I achieve this?

Additionally, I have a similar data set for the acquirer firms, and I would like to build the same vectors there. Once I have the two vectors, I intend to multiply the acquirer vector by the target vector entry by entry and sum the products, i.e. take their inner product. In mathematical terms, I want to calculate:
\[S_{\text{acq}} \cdot S_{\text{target}}' = \left( N_{k1}^{\text{acq}}, N_{k2}^{\text{acq}}, \dots, N_{kK}^{\text{acq}} \right) \cdot \left( N_{k1}^{\text{target}}, N_{k2}^{\text{target}}, \dots, N_{kK}^{\text{target}} \right)' = N_{k1}^{\text{acq}} N_{k1}^{\text{target}} + N_{k2}^{\text{acq}} N_{k2}^{\text{target}} + \dots + N_{kK}^{\text{acq}} N_{kK}^{\text{target}}\]
Here, K is the total number of classes in the data set. It is crucial that the entries are in the same order in the target and acquirer vectors, so that each class is multiplied by its own counterpart. How can I accomplish this?
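As a concrete illustration of this product (a sketch with one assumption: the acquirer vector is hypothetical, made up only to show the arithmetic), take target gvkey 1512 from the sample data below, with K = 5 classes ordered by wipo_sector_id 1 to 5. Its entries are 0.75 in class 2 and 0.25 in class 5, and 0 elsewhere. With a hypothetical acquirer vector (0, 0.5, 0.5, 0, 0), only class 2 is nonzero on both sides:
\[S_{\text{acq}} \cdot S_{\text{target}}' = (0, 0.5, 0.5, 0, 0) \cdot (0, 0.75, 0, 0, 0.25)' = 0.5 \times 0.75 = 0.375\]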

I attempted to use the -putmata- command, but I am unsure how to convert a variable into a vector grouped by firm (i.e., each firm having its own vector), and how to ensure that the vectors have the same length even if some firms do not have patents in every class. Do you know how to do this?

I attach my sample data here:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float gvkey double ma_dealnumber byte wipo_sector_id float s_target_sector
1512  150881043 2       .75
1512  150881043 2       .75
1512  150881043 2       .75
1512  150881043 5       .25
1554 2974207020 1 .15789473
1554 2974207020 2  .2631579
1554 2974207020 2  .2631579
1554 2974207020 3  .7894737
1554 2974207020 3  .7894737
1554 2974207020 3  .7894737
1554 2974207020 4  .4210526
1554 2974207020 4  .4210526
1767   95887020 2        .2
1767   95887020 3        .4
1767   95887020 4        .4
1767   95887020 4        .4
1786  994758020 1        .5
1786  994758020 3      .375
1786  994758020 4       .25
1866 1721677020 2         1
1900  313115020 1        .4
1900  313115020 1        .4
1900  313115020 2        .2
1900  313115020 3        .6
1900  313115020 3        .6
1900  313115020 3        .6
1900  313115020 4        .8
1900  313115020 4        .8
1900  313115020 4        .8
2048   22757020 2        .5
end
label values wipo_sector_id wipo_sector_id
label def wipo_sector_id 1 "Chemistry", modify
label def wipo_sector_id 2 "Electrical engineering", modify
label def wipo_sector_id 3 "Instruments", modify
label def wipo_sector_id 4 "Mechanical engineering", modify
label def wipo_sector_id 5 "Other fields", modify

Note: ma_dealnumber identifies the M&A deal. In the M&A data set one firm can participate in multiple deals, so the same firm and class can appear in more than one record.

Any guidance or suggestions would be greatly appreciated.

Thank you in advance for your assistance.

Last edited by Chengmou Lei; 22 May 2023, 14:43.
Chengmou Lei

Join Date: May 2023

Posts: 24
#2

23 May 2023, 06:32

An update on my current solution is as follows:

To solve the problem, I have used a non-vector approach: I compute the result of the vector multiplication (the sum of products in the formula in my original post) directly. The steps are:

1. I use the -tsfill- command to ensure that each firm has a vector of the same length, K.

2. I replace the missing values of s_target_sector in the filled-in sectors with 0.

3. I sort the sectors so that they are in the same order across firms throughout the data set.

4. I repeat the previous steps for the acquirer side.

5. I merge the target data with the acquirer data, using ma_dealnumber and the sector as keys.

6. I calculate the product of s_target_sector and s_acq_sector.

7. I sum the products across sectors within each pair.
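
The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not necessarily the exact code used: it swaps in -fillin- for -tsfill- to create the missing deal and sector combinations, the target rows are a subset of the sample data, and the acquirer values (s_acq_sector) are hypothetical numbers made up for illustration. Note that -fillin- can only create sectors that appear at least once somewhere in the data.

```stata
* Acquirer side: HYPOTHETICAL shares, for illustration only
clear
input double ma_dealnumber byte wipo_sector_id float s_acq_sector
150881043 2 .5
150881043 3 .5
 95887020 4 1
994758020 1 .2
994758020 5 .8
end
* Create every deal x sector combination; new rows get missing values
fillin ma_dealnumber wipo_sector_id
replace s_acq_sector = 0 if _fillin
drop _fillin
tempfile acq
save `acq'

* Target side: rows for three deals taken from the sample data
clear
input double ma_dealnumber byte wipo_sector_id float s_target_sector
150881043 2 .75
150881043 2 .75
150881043 2 .75
150881043 5 .25
 95887020 2 .2
 95887020 3 .4
 95887020 4 .4
 95887020 4 .4
994758020 1 .5
994758020 3 .375
994758020 4 .25
end
* Keep one row per deal and sector, then fill in absent sectors with 0
duplicates drop
fillin ma_dealnumber wipo_sector_id
replace s_target_sector = 0 if _fillin
drop _fillin

* Match target and acquirer on deal and sector; every row should match
merge 1:1 ma_dealnumber wipo_sector_id using `acq', assert(match) nogenerate

* Entry-wise products, summed within each deal
generate double prod = s_target_sector * s_acq_sector
collapse (sum) dot_product = prod, by(ma_dealnumber)
format ma_dealnumber %12.0f
list
```

Because the merge keys include the sector, the order of the entries never has to be enforced by sorting: each class is matched to its counterpart by value. With these inputs the three deals come out at roughly .4, .375 and .1 (up to float precision).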

However, I would still appreciate it if someone can provide guidance on how to achieve the same results using a vectorized approach.
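
One possible vectorized route, offered as a sketch rather than a confirmed answer: reshape the filled-in data wide so that each deal is one row with K columns per side, copy the two blocks of columns into Mata matrices with -putmata-, and take the row sums of the element-wise product. The merged long data and the acquirer values (s_acq_sector) below are hypothetical, and the Mata names T, A and dot are arbitrary.

```stata
* Merged long data: target rows from the sample, acquirer values HYPOTHETICAL
clear
input double ma_dealnumber byte wipo_sector_id float(s_target_sector s_acq_sector)
150881043 2 .75  .5
150881043 3 0    .5
150881043 5 .25  0
 95887020 2 .2   0
 95887020 3 .4   0
 95887020 4 .4   1
994758020 1 .5   .2
994758020 3 .375 0
994758020 4 .25  0
994758020 5 0    .8
end

* Give every deal all K sectors, with 0 where it has no patents
fillin ma_dealnumber wipo_sector_id
replace s_target_sector = 0 if _fillin
replace s_acq_sector = 0 if _fillin
drop _fillin

* One row per deal; columns s_target_sector1-5 and s_acq_sector1-5
reshape wide s_target_sector s_acq_sector, i(ma_dealnumber) j(wipo_sector_id)

* Copy each block of K columns into a Mata matrix (one row per deal)
putmata T=(s_target_sector*) A=(s_acq_sector*), replace

* Row-wise inner product: element-wise multiply, then sum across columns
mata: dot = rowsum(A :* T)

* Bring the result back as a Stata variable
getmata dot_product=dot
format ma_dealnumber %12.0f
list ma_dealnumber dot_product
```

Both wildcards expand in the order in which -reshape- created the columns, so the k-th column of T and the k-th column of A refer to the same class.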

Thank you!