Hi,
I am trying to calculate cosine similarity for a large, unbalanced dataset. For each year, I need to create all pairs of group IDs and, from each group's list of components and their weights, calculate the cosine similarity of each pair.
My data has the structure:
year | group_id | component | weight |
2020 | 23 | a | 0.2 |
2020 | 23 | b | 0.8 |
2020 | 24 | a | 0.3 |
2020 | 24 | b | 0.3 |
2020 | 24 | c | 0.4 |
2019 | 23 | b | 1 |
2019 | 25 | c | 1 |
I checked "How can I create all pairs within groups? | Stata FAQ" (ucla.edu) and "Stata | FAQ: Expanding datasets to all possible pairs",
but they do not seem to work in my case, because they require each group ID to appear only once in a given year (group), whereas here a group ID has one row per component.
I could potentially achieve that by reshaping the data wide so that each component becomes a variable. I tried this, but it did not work because the list of components is very large and changes from year to year.
Any idea how I could go about creating all pairs of group IDs within a year and calculating the cosine similarity of each pair?
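To make the target concrete, here is a minimal sketch of the computation I have in mind, written in Python rather than Stata and using only the sample rows above. The function name `pairwise_cosine` and the tuple layout are illustrative assumptions, not part of my actual data or code. It keeps the data in long format, treats a component missing from a group as weight 0, and assumes each (year, group, component) combination appears at most once.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Sample rows copied from the table above: (year, group_id, component, weight)
rows = [
    (2020, 23, "a", 0.2),
    (2020, 23, "b", 0.8),
    (2020, 24, "a", 0.3),
    (2020, 24, "b", 0.3),
    (2020, 24, "c", 0.4),
    (2019, 23, "b", 1.0),
    (2019, 25, "c", 1.0),
]


def pairwise_cosine(rows):
    """Return {(year, id_i, id_j): cosine similarity} for every pair id_i < id_j in a year."""
    # Build one sparse vector (component -> weight) per (year, group_id).
    # Assumption: a repeated (year, group_id, component) row would overwrite the earlier one.
    vectors = defaultdict(lambda: defaultdict(dict))
    for year, group_id, component, weight in rows:
        vectors[year][group_id][component] = weight

    result = {}
    for year, groups in vectors.items():
        # Euclidean norm of each group's weight vector in this year.
        norms = {g: sqrt(sum(w * w for w in vec.values())) for g, vec in groups.items()}
        # Each unordered pair of groups observed in this year, exactly once.
        for gi, gj in combinations(sorted(groups), 2):
            vi, vj = groups[gi], groups[gj]
            # Only components present in both groups contribute to the dot product.
            dot = sum(w * vj[c] for c, w in vi.items() if c in vj)
            denom = norms[gi] * norms[gj]
            # Assumption: an all-zero vector yields similarity 0 instead of an error.
            result[(year, gi, gj)] = dot / denom if denom else 0.0
    return result


similarities = pairwise_cosine(rows)
for key in sorted(similarities):
    print(key, round(similarities[key], 4))
```

For the sample rows, groups 23 and 24 in 2020 share components a and b, so their similarity is 0.30 / (sqrt(0.68) * sqrt(0.34)), roughly 0.624. Groups 23 and 25 in 2019 share no component, so their similarity is 0.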
Thank you for any ideas on how to deal with this!