
  • How to change data into vector format, and conduct vector multiplication?

    Dear all,

    It is my first time trying to use Stata to do vector operations.

    My setting is as follows: I have a data set for target firms, which records the patents (patent_id) held by each firm (gvkey). Each patent can belong to one or more WIPO technology classes (wipo_sector_id). I have already calculated the number of patents each firm has in a specific technology class k (s_target_sector). Now I want to create a vector ($S_{\text{target}}$) in which each entry is the number of patents (s_target_sector: $N_{k1}, \dots, N_{kK}$) in a particular class (wipo_sector_id: k). The length of this vector (K) should equal the total number of possible technology classes in my dataset.

    For example, if there are three classes (manufacture, agriculture, semiconductor), but firm gvkey == 6078 only has patents in the first and third classes, I still want the vector length to be 3, with the agriculture entry set to 0. Do you know how I can achieve this?

    Additionally, I have a similar dataset for the acquirer firm, and I would like to perform the same calculations. Once I have obtained the two vectors, I intend to multiply the acquirer vector by the target vector entry by entry and sum the products, that is, take their inner product. In mathematical terms, I want to calculate:
    \[S_{acq}\cdot {S_{target}}'=\left( N_{k1}^{acq}, N_{k2}^{acq}, \dots, N_{kK}^{acq} \right)\cdot {\left( N_{k1}^{target}, N_{k2}^{target}, \dots, N_{kK}^{target} \right)}' = N_{k1}^{acq} N_{k1}^{target}+N_{k2}^{acq} N_{k2}^{target}+\dots+N_{kK}^{acq} N_{kK}^{target}\]
    Here, K represents the maximum number of classes in the dataset. It is crucial that the order of entries is the same for both the target and acquirer vectors, and the multiplication is performed entry by entry. How can I accomplish this?

    I attempted to use the -putmata- command, but I am unsure how to convert a variable into a vector grouped by firm (i.e., each firm having its own vector), and how to ensure that the vectors have the same length even if some firms do not have patents in every class. Do you know how to do this?

    I attach my sample data here:
    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float gvkey double ma_dealnumber byte wipo_sector_id float s_target_sector
    1512  150881043 2       .75
    1512  150881043 2       .75
    1512  150881043 2       .75
    1512  150881043 5       .25
    1554 2974207020 1 .15789473
    1554 2974207020 2  .2631579
    1554 2974207020 2  .2631579
    1554 2974207020 3  .7894737
    1554 2974207020 3  .7894737
    1554 2974207020 3  .7894737
    1554 2974207020 4  .4210526
    1554 2974207020 4  .4210526
    1767   95887020 2        .2
    1767   95887020 3        .4
    1767   95887020 4        .4
    1767   95887020 4        .4
    1786  994758020 1        .5
    1786  994758020 3      .375
    1786  994758020 4       .25
    1866 1721677020 2         1
    1900  313115020 1        .4
    1900  313115020 1        .4
    1900  313115020 2        .2
    1900  313115020 3        .6
    1900  313115020 3        .6
    1900  313115020 3        .6
    1900  313115020 4        .8
    1900  313115020 4        .8
    1900  313115020 4        .8
    2048   22757020 2        .5
    end
    label values wipo_sector_id wipo_sector_id
    label def wipo_sector_id 1 "Chemistry", modify
    label def wipo_sector_id 2 "Electrical engineering", modify
    label def wipo_sector_id 3 "Instruments", modify
    label def wipo_sector_id 4 "Mechanical engineering", modify
    label def wipo_sector_id 5 "Other fields", modify
    Note: ma_dealnumber is included because, in the M&A data set, one firm can participate in multiple M&A deals. As a result, the record for the same firm and the same class can appear multiple times.

    Any guidance or suggestions would be greatly appreciated.

    Thank you in advance for your assistance.
    Last edited by Chengmou Lei; 22 May 2023, 14:43.

  • #2
    An update on my current solution is as follows:

    To solve the problem, I have used a non-vector approach. Specifically, I compute the result of the vector multiplication (the sum of products in the formula in my original post) directly. Here are the steps involved:
    1. I use the "tsfill" command to ensure that each firm has a vector of the same length, which is equal to K.
    2. I fill in the missing sectors and replace the values of "s_target_sector" with 0.
    3. I sort the sectors to ensure that they are in the same order across firms and the entire dataset.
    4. I repeat the previous steps for the acquirer side as well.
    5. By using the "ma_dealnumber" and "sector" as keys, I merge the target data with the acquirer data.
    6. I calculate the product of "s_target_sector" and "s_acq_sector".
    7. I sum the multiplications across different sectors within each pair.
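    The steps above can be sketched roughly as follows. This is only a minimal sketch on made-up toy data: the deal numbers, the three-sector setup, and all values are hypothetical, and gvkey is left out for brevity. It also swaps in -fillin- for -tsfill-, since -fillin- does not need the data to be -xtset- first; the duplicate rows are dropped before filling.

```stata
* Hypothetical toy target data: two deals, three sectors (values made up)
clear
input double ma_dealnumber byte wipo_sector_id float s_target_sector
1 1 .5
1 1 .5
1 3 .5
2 2 1
end
* Keep one row per deal-sector, then add the missing deal-sector pairs as 0
duplicates drop
fillin ma_dealnumber wipo_sector_id
replace s_target_sector = 0 if _fillin
drop _fillin
tempfile target
save `target'

* Hypothetical toy acquirer data, prepared the same way
clear
input double ma_dealnumber byte wipo_sector_id float s_acq_sector
1 1 .25
1 2 .75
2 2 .5
2 3 .5
end
duplicates drop
fillin ma_dealnumber wipo_sector_id
replace s_acq_sector = 0 if _fillin
drop _fillin

* Merge on deal and sector; a sector absent from one side counts as 0
merge 1:1 ma_dealnumber wipo_sector_id using `target'
replace s_target_sector = 0 if missing(s_target_sector)
replace s_acq_sector = 0 if missing(s_acq_sector)
drop _merge

* Entry-wise products, summed within each deal
generate double prod = s_target_sector * s_acq_sector
bysort ma_dealnumber: egen double dot = total(prod)
list ma_dealnumber wipo_sector_id s_acq_sector s_target_sector dot, sepby(ma_dealnumber)
```

    Because the merge key includes the sector, the entries are matched by class rather than by position, so the order of the rows does not affect the result.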
    However, I would still appreciate it if someone can provide guidance on how to achieve the same results using a vectorized approach.
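    One possible vectorized route, given only as a sketch under assumptions: start from long data that already has one row per deal and sector with zeros filled in, -reshape- it wide so that each deal becomes one row of length K, and take the row-wise inner products in Mata. The toy data and the variable name dot are hypothetical, and the sketch assumes the two stubs expand to the same sector order, which holds when both come from the same -reshape-.

```stata
* Hypothetical zero-filled long data: one row per deal and sector
clear
input double ma_dealnumber byte wipo_sector_id float(s_target_sector s_acq_sector)
1 1 .5 .25
1 2 0  .75
1 3 .5 0
2 1 0  0
2 2 1  .5
2 3 0  .5
end

* One row per deal; columns s_target_sector1..K and s_acq_sector1..K
reshape wide s_target_sector s_acq_sector, i(ma_dealnumber) j(wipo_sector_id)

* Expand the two variable lists (both follow the sector order from reshape)
unab tvars : s_target_sector*
unab avars : s_acq_sector*
generate double dot = .

mata:
// Each row of T and A is one deal's length-K vector
T = st_data(., tokens(st_local("tvars")))
A = st_data(., tokens(st_local("avars")))
// Row-wise inner product: multiply entry by entry, then sum across columns
st_store(., "dot", rowsum(A :* T))
end

list ma_dealnumber dot
```

    Here each row of T and A plays the role of $S_{target}$ and $S_{acq}$ for one deal, so the zero padding is what guarantees that all vectors have length K.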

    Thank you!
