  • Cosine similarity for ca. 2000 pairs of documents

    Dear community!

    I am working on my MSc thesis and need your help. I want to calculate the cosine similarity between 2000 pairs of documents. In more detail, I have two separate datasets (a, b) with 2000 observations each; observation 1 in dataset a is paired with observation 1 in dataset b, and so on. Both datasets contain ca. 100 variables, which are word counts of special words that I selected beforehand. I want to determine the cosine similarity of these word counts for each of the 2000 pairs. So I think I have a total of 4000 vectors, each consisting of ca. 100 numbers.
    I don't know how to prepare my dataset (I think I need to merge something before I can start, as a and b are separate files). Also, I read something about the angular command, but I am not sure whether it can help me here and, if so, whether I need it between observations or between variables.

  • #2
    * I think I need a total of 4000 vectors. I do not have any vectors right now.
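
The pairing described in the first post can be sketched as a row-wise calculation: row i of dataset a is compared only with row i of dataset b, so the similarity is taken between observations, not between variables, and the result is one value per pair. For two count vectors x and y the cosine similarity is the dot product of x and y divided by the product of their lengths. The sketch below uses Python with NumPy rather than the angular command mentioned in the post. The function name `paired_cosine_similarity` and the toy arrays are hypothetical, and it assumes both datasets are already sorted in the same order and hold the same word-count variables in the same column order.

```python
import numpy as np


def paired_cosine_similarity(a, b):
    """Cosine similarity between row i of a and row i of b, for every i.

    a, b: array-likes of shape (n_pairs, n_words) holding word counts.
    Returns an array of length n_pairs. A pair in which either row is
    all zeros has no defined angle and is returned as NaN.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")

    # Row-wise dot products and row-wise Euclidean lengths.
    dot = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)

    # Divide only where the denominator is non-zero; leave NaN elsewhere.
    sim = np.full(a.shape[0], np.nan)
    np.divide(dot, norms, out=sim, where=norms > 0)
    return sim


# Toy data (assumption): 3 pairs and 3 word-count variables,
# standing in for the 2000 pairs and ca. 100 variables in the post.
a = np.array([[1, 2, 0],
              [3, 0, 4],
              [0, 0, 0]])
b = np.array([[2, 4, 0],
              [0, 5, 0],
              [1, 1, 1]])

sims = paired_cosine_similarity(a, b)
print(sims)
```

In the toy data the first pair points in the same direction (similarity close to 1), the second pair shares no words (similarity 0), and the third pair contains an all-zero row, so its similarity is reported as missing. With the real data, the two files would first be loaded as two arrays of 2000 rows each; no merge is strictly needed for this approach as long as the row order matches.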
