  • Problem with the similarity score when using matchit

    Dear all, I want to match firms in a patent dataset with firms in a financial dataset. Something does not seem to be working properly with matchit (or, more likely, I am doing something wrong), because, as you can see below, the results are very odd. For instance, some very similar firm names end up with a very low similarity score, while some very different firm names get scores higher than 0.99. This is the command I am using:
    Code:
    matchit han_id pat_cp_name using "`sabi'", idu(n_nif) txtu(sabi_cp_name) weights(simple) threshold(0.5) sim(token) override
    Is there any hint or advice you can give me for solving this issue? Thanks in advance.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long han_id str57 pat_cp_name long n_nif str54 sabi_cp_name float similscore
     157366 "ASOCIACION INDUSTRIAL DE OPTICA COLOR E IMAGEN AIDO 46980" 1606957 "LEDESMA AIDO SL 41927"                                   .9986591
     747036 "DONAIRE GONZALEZ FELICIANA 43205"                          1490409 "ESTETICA SANTA FELICIANA A SL 28010"                      .998652
    1761626 "INT SPORTS ORG SA 28009"                                    226051 "ASESORES ORG SL 07002"                                   .9986122
     517083 "CARINSA CREACIONES AROMATICAS INDUSTRIALES SA 08192"        398044 "CARINSA PROMOTORA Y CONSTRUCTORA DE EDIFICIOS SL 18005"  .9985354
    2629067 "RADIADORES PUMA CHAUSSON SA"                               1471980 "REPRESENTACIONES DE MUEBLES CHAUSSON SL 28979"           .9985145
     295654 "BIOTECH INSTITUTE I MAS DSL 01005"                         1226280 "DISENO GRAFICO EN ENTORNO VISUAL DSL SL 11201"           .9983477
    1378240 "TELEDYNE INNOVACIONES MICROELECTAS SLU 41092"              1599104 "TELEDYNE INNOVACIONES MICROELECTRONICAS SL 41092"       .50000495
    1860433 "MANUFACTURAS Y TRANSFORMADOS AB SLU 08700"                 1016037 "MANUFACTURAS Y TRANSFORMADOS A B SLU 08700"              .5761753
     468751 "CELO KONFORTO KAY KVALITO SL 18330"                         404630 "CELO KONFORTO KAJ KVALITO SL 18320"                     .58131045
     627044 "DACHS ELECTA SA 08340"                                       80560 "DACHS ELECTRONICA SA 08340"                              .5913933
    2981588 "THINK PIPE LINE SLNE 08240"                                 390165 "THINK PIPE LINE SL 17249"                                .6040162
    end
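
    A possible check on the pairs listed above, offered only as a sketch: recompute the score with the two-column syntax of matchit, once with token matching and once with the default bigram matching, both without weights, and compare against the weighted similscore. This assumes the -dataex- example has been loaded into memory and that matchit is installed from SSC; the variable names score_token and score_bigram are made up for illustration, and no particular outcome is implied.

```stata
* Sketch: run after loading the -dataex- example above into memory.
* score_token and score_bigram are hypothetical names chosen for this check.

* Token-based similarity on the two name columns, without weights
matchit pat_cp_name sabi_cp_name, similmethod(token) generate(score_token)

* Bigram-based similarity (matchit's default method), without weights
matchit pat_cp_name sabi_cp_name, similmethod(bigram) generate(score_bigram)

* Compare the original weighted score with the two unweighted scores
list pat_cp_name sabi_cp_name similscore score_token score_bigram, noobs
```

    If the unweighted scores rank the pairs more sensibly, that would point towards the weights(simple) option rather than the data.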

