  • "while" loop not working

    Dear all, I want each firm in the patent data to be matched with a firm in the Amadeus (sabi) data, and I do not want a given sabi firm to be matched to more than one firm in the patent data. Therefore, I first run matchit and, where the same sabi firm appears more than once, drop the observations with the lower score. Then I keep only the best-scoring match for each patent firm. The result is saved in a file (matchit1). The idea is to merge this file with the original patent data so as to discard from the patent file the firms that were matched, and I do the same for sabi. What is left in the patent and sabi data should therefore be the unmatched firms.

    Using a "while" loop, I rerun matchit on the reduced patent and sabi datasets (matchit2), and again I drop duplicated sabi firms (a sabi firm serving as the match for more than one patent firm) and keep only the best match for each patent firm (in terms of score). I then append this new set of matched firms (matchit2) to the previous best matches (matchit1). The idea is to repeat the process until there are no more duplicate firms (a single sabi firm serving as the match for several firms in the patent data). Of course, the number of duplicate firms will depend on the threshold in the matchit command.

    However, I am having huge problems, because I cannot understand why the "while" loop is not executed. If I instead copy/paste the block of code from inside the loop several times right below, it keeps working (since dup is greater than 0). A very short data example and the code are below.

    Any idea will be more than welcome!

    Code:
    clear
    input long han_id str129 pat_cp_name
      890 "3 D MITA SL 28500"                                                     
     4280 "3P BIOPHARMACEUTICALS SL 31110"                                        
     6541 "A INGENIERIA DE AUTOMATISMOS SA 48213"                                 
    11608 "A3 ADVANCED AUTOMOTIVE ANTENNAS 08100"                                 
    14850 "ABB POWER TECH SA 28037"                                               
    15754 "ABENGOA SOLAR NEW TECH SA "                                            
    16081 "AB BIOTICS SA 08193"                                                   
    16926 "ABENGOA SOLAR NEW TECNOLOGIES SA 41014"                                
    17564 "ABN PIPE SYSTEMS SLU 15008"                                            
    18182 "ABUNDANCIA DE SANTIAGO RAMON 35017"                                    
    19340 "ABENGOA HIDROGENO SA 41014"                                            
    19410 "ABAD MUNOZ FERNANDO 14009"                                             
    19821 "ABUNDANCIA NAVARRO CRISTINA 35017"                                     
    19822 "ABUNDANCIA NAVARRO JUAN CARLOS 35017"                                  
    21281 "ACENER INVESTIGACION Y DESARROLLO SL 28030"                            
    23363 "ACCIONA ENERGIA SA 31621"                                              
    23366 "ACCIONA TOWERS SA 28020"                                               
    23451 "ABRAHAM VENEGAS FRANCO 29730"                                          
    23728 "ABERTIS AUTOPISTAS ESPANA SA 08040"                                    
    23910 "ACOSTA APARICIO VICTOR 28006"                                          
    26111 "ACTUALITY SISTEMS SL 03113"                                            
    26529 "ACEITES DEL SUR COOSUR SA 23220"                                       
    26552 "ACERALIA TRANSFORMADOS SA 31014"                                       
    28806 "ACRONIMUS TECH SL "                                                    
    30133 "ACERIA COMPACTA DE BIZKAIA SA 48910"                                   
    30280 "ACCIONA INFRAESTRUCTURAS SA 28108"                                     
    31274 "ACITURRI ENGINEERING SLU 47151"                                        
    31355 "IMMOSOLAR ACTIVE BUILDING TECH SL 07180"                               
    31402 "ACTIVO MARK SL 46002"                                                  
    31940 "ADMINISTRACION GENERAL DE LA COMUNIDAD AUTONOMA DE EUSKADI 01010"      
    32563 "ACONDICIONAMIENTO TARRASENSE 08225"                                    
    33140 "ADELTE AIRPORT TECH SL 08029"                                          
    35500 "ACORDE TECH SA 39002"                                                  
    35722 "ADAICO SL 31006"                                                       
    36403 "ACS SERVICIOS COMUNICACIONES Y ENERGIA SL 28016"                       
    36537 "ADVANCED IN VITRO CELL TECH SL 08028"                                  
    37061 "ADVANCED MEDICAL PROJECTS 28760"                                       
    37779 "ADVANCED SCIENTIFIC TECH EUROPE SA 08700"                              
    37791 "ADVANCED SIMULATION TECH SL 33203"                                     
    40071 "AERNNOVA ENGINEERING SOLUTIONS IBERICA 28050"                          
    40197 "ADT ESPANA SL 50008"                                                   
    41964 "ADN CONTEXT AWARE MOBILE SOLUTIONS SL 33203"                           
    43015 "AEROSPACE CONSULTING CORP SPAIN SL 08034"                              
    43561 "ADVANCED DIGITAL DESIGN SA 50018"                                      
    45346 "ADVANCELL ADVANCED IN VITRO CELL TECH SA 08028"                        
    47630 "AGENCIA PUBLICA EMPRESARIAL SANITARIA HOSPITAL ALTO GUADALQUIVIR 23740"
    47631 "AGENCIA PUBLICA EMPRESARIAL SANITARIA HOSPITAL DE PONIENTE 04700"      
    48483 "AF SISTEMAS SA 28041"                                                  
    48564 "AFINITICA TECH SL 08193"                                               
    48740 "AGQ TECHNOLOGICAL CORPORATE SA 41220"   
    end
    tempfile patent
    save `patent'
    
    clear
    input long n_nif str126 sabi_cp_name
        181 "ACERIA DE ALAVA SA 01470"                                                   
       1221 "AERNNOVA MANUFACTURING ENGINEERING SA 01013"                                
       9480 "LA AUXILIAR TARRASENSE SA 08221"                                            
      10625 "INMOBILIARIA TARRASENSE SA 08221"                                           
      52084 "ACCIONA SOLAR SA 31621"                                                     
      61858 "ABENGOA SA 41014"                                                           
      92568 "ADVANCELL ADVANCED IN VITRO CELL TECHNOLOGIES SA 08006"                     
      93543 "AB BIOTICS SA 08172"                                                        
      94636 "ABERTIS INTERNACIONAL SA 28046"                                             
     101792 "INVERSIONES AYERBE SA 28007"                                                
     102429 "ABB POWER TECHNOLOGY SA 28037"                                              
     106616 "ABB ENERGIA SA 28043"                                                       
     106853 "ACCIONA MANTENIMIENTO DE INFRAESTRUCTURAS SA 28108"                         
     108195 "ACEITES DEL SUR COOSUR SA 23220"                                            
     110005 "AF INMUEBLES SA 28014"                                                      
     111016 "ADVANCED DIGITAL SA 28100"                                                  
     111303 "ABERTIS AUTOPISTAS ESPAÑA SAU 28046"                                       
     114255 "ACS SERVICIOS COMUNICACIONES Y ENERGIA SA 28016"                            
     120532 "AERNNOVA ENGINEERING SOLUTIONS IBERICA SA 28050"                            
     124260 "ABENGOA SOLAR SA 41014"                                                     
     151797 "ACENER SXXI SL 02002"                                                       
     316001 "CONSTRUCCIONES PLAZA SL 14009"                                              
     320670 "DISTRIBUCIONES GOMEZ VILLA SL 14009"                                        
     347630 "ABN PIPE GESTION SL 15008"                                                  
     348503 "ABN PIPE SYSTEMS SL 15008"                                                  
     415960 "ADVANCED MEDICAL PROJECTS SL 28805"                                         
     428956 "ARCELORMITTAL ACERALIA BASQUE HOLDING SL 48910"                             
     454807 "ADELTE AIRPORT TECHNOLOGIES SL 08029"                                       
     542232 "ABRAHAM RUIZ SL 29620"                                                      
     581950 "ADAICO RECAMBIOS SL 31006"                                                  
     582139 "3 P BIOPHARMACEUTICALS SL 31110"                                            
     582896 "ACCIONA ENERGIA SOLAR SL 31621"                                             
     606385 "ADVANCED SIMULATION TECHNOLOGIES SL 33203"                                  
     608869 "ADN CONTEXT AWARE MOBILE SOLUTIONS SL 33211"                                
     715793 "INSTITUTO AWARE SL 46010"                                                   
     720591 "INMOBILIARIA ACOSTA SL 41500"                                               
     833815 "ADT ESPAÑA SL 50008"                                                       
     838620 "AGRO INDUSTRIAL AYERBE SL 50007"                                            
     862439 "ACOSTA APARICIO SL 03540"                                                   
     924840 "ENERGIA AUTONOMA SL 43201"                                                  
     942960 "ACTIVE BUILDING TECHNOLOGIES INTELLIGENT SYSTEMS SL 07009"                  
     992759 "IMMOSOLAR SL 08830"                                                         
    1013603 "ADT TELECOMUNICACIONES SL 08110"                                            
    1059704 "ADVANCED AUTOMOTIVE ANTENNAS SL 08028"                                      
    1064429 "BARCELONA MARK CENTER SL 08029"                                             
    1122507 "AF SOL GRUP SL 08036"                                                       
    1137540 "ACRONIMUS TECHNOLOGY SL 08242"                                              
    1138273 "IN VITRO MEDIA SL 08008"                                                    
    1142883 "ADELTE GROUP SL 08029"                                                      
    1151815 "MITA COSTA SL 08013"                                                        
    1154199 "AFINITICA TECHNOLOGIES SL 08193"                                            
    1177769 "AFINITICA PROCESS TECHNOLOGY SL 08193"                                      
    1182462 "HIDROGENO CONSULTING SL 08006"                                              
    1296991 "COMERCIAL AGAR SL 28027"                                                    
    1319680 "AEROSPACE SL 28007"                                                         
    1334880 "ACITURRI ENGINEERING SL 47151"                                              
    1363136 "MADRID SCIENTIFIC FILMS SL 28691"                                           
    1379457 "ACORDE CONSULTORES SL 28003"                                                
    1380552 "INVERSIONES ACORDE SL 28001"                                                
    1393146 "3 D MITA INGENIERIA SL 28500"                                               
    1400325 "ACS SERVICIOS COMUNICACIONES Y ENERGIA INTERNACIONAL SL 28016"              
    1410031 "TOWERS CONSULTING E INVERSIONES SL 28016"                                   
    1410223 "GRAN ABUNDANCIA SL 28006"                                                   
    1433549 "INNOVA SCIENTIFIC SL 28290"                                                 
    1437034 "ADVANCED MEDICAL SYSTEMS SL 28012"                                          
    1453788 "ACTUALITY EVENTOS Y COMUNICACION SL 28027"                                  
    1454045 "ACENER RENOVA SL 28006"                                                     
    1475987 "MADRID AEROSPACE SERVICES SL 28850"                                         
    1476213 "ACTUALITY SALUD SL 28028"                                                   
    1485810 "TOWERS INVERSIONES Y PROYECTOS SL 28028"                                    
    1500934 "ADVANCED MATERIAL SIMULATION SL 48008"                                      
    1521569 "ABRAHAM PRODUCCIONES SL 28014"                                              
    1536575 "DESARROLLO EMPRESARIAL ADVANCED SL 28014"                                   
    1540823 "CORPORACION ACCIONA INFRAESTRUCTURAS SL 28108"                              
    1543325 "FOOD MARK GROUP SL 28007"                                                   
    1548989 "CERES BIOTICS TECH SL 28830"                                                
    1575537 "ACITURRI GETAFE SL 28906"                                                   
    1582869 "AGAR MANAGEMENT SL 28039"                                                   
    1611422 "AGQ TECHNOLOGICAL CORPORATE SL 41013"                                       
    1616563 "LABS & TECHNOLOGICAL SERVICES AGQ SL 41220"                                 
    1662970 "LA JOYA DE LA ABUNDANCIA SL 29602"                                          
    1686003 "CARPINTERIA SEGUI SL 48213"                                                 
    1688079 "ACERALIA CONSTRUCCIONES SL 48910"                                           
    1695829 "LUMINOSOS VERA SL 48213"                                                    
    1704855 "HIDROGENO DEL NORTE SL 48960"                                               
    1707355 "ADAICO TRUCK & TRAILER SL 48340"                                            
    1824467 "COOP AUTONOMA VALENCIANA DE TRANSPORTES SC V 46920"                         
    1830045 "AGENCIA SANITARIA ALTO GUADALQUIVIR 23740"                                  
    1830156 "AGENCIA PUBLICA EMPRESARIAL SANITARIA DEL BAJO GUADALQUIVIR 41710"          
    1830195 "AGENCIA PUBLICA EMPRESARIAL SANITARIA HOSPITAL DE PONIENTE DE ALMERIA 04700"
    1830205 "AGENCIA PUBLICA EMPRESARIAL SANITARIA COSTA DEL SOL 29600"                  
    1830463 "ABENGOA SA Y ABENGOA SERVICIOS URBANOS UTE 41018"                           
      92568 "ADVANCELL ADVANCED IN VITRO CELL TECHNOLOGIES SA 08006"                     
      92568 "ADVANCELL ADVANCED IN VITRO CELL TECHNOLOGIES SA 08006"                     
     124260 "ABENGOA SOLAR SA 41014"                                                     
    1138273 "IN VITRO MEDIA SL 08008"                                                    
    1410223 "GRAN ABUNDANCIA SL 28006"                                                   
    1410223 "GRAN ABUNDANCIA SL 28006"                                                   
    1662970 "LA JOYA DE LA ABUNDANCIA SL 29602"                                          
    1662970 "LA JOYA DE LA ABUNDANCIA SL 29602"  
    end
    
    tempfile sabi
    save `sabi'
    
    use `patent', clear 
    matchit han_id pat_cp_name using "`sabi'", idu(n_nif) txtu(sabi_cp_name) weights(simple) threshold(0.5) sim(token) override
    gsort han_id -similscore
    
    preserve
    gsort sabi_cp_name -similscore
    duplicates drop sabi_cp_name, force
    duplicates drop pat_cp_name, force
    keep if similscore>0.95
    gen v1 = _n
    sum v1
    tempfile matchit1
    save `matchit1', replace
    restore
    
    use `matchit1', clear
    merge 1:1 han_id pat_cp_name using "`patent'"
    keep if _merge==2
    drop _merge
    tempfile patent
    save `patent', replace
    
    use `matchit1', clear
    merge 1:m n_nif sabi_cp_name using "`sabi'"
    keep if _merge==2
    drop _merge
    tempfile sabi
    save `sabi', replace
    
    use `patent', clear
    matchit han_id pat_cp_name using "`sabi'", idu(n_nif) txtu(sabi_cp_name) weights(simple) threshold(0.5) sim(token) override
    gsort han_id -similscore
    duplicates report sabi_cp_name
    duplicates tag sabi_cp_name, gen(dup)
    tab dup
    Code:
    while dup>0 { 
    
    gsort sabi_cp_name -similscore
    duplicates drop sabi_cp_name, force
    gsort pat_cp_name -similscore
    duplicates drop pat_cp_name, force
    tempfile matchit2
    save `matchit2', replace
    use `matchit1', clear
    duplicates report han_id pat_cp_name, force
    append using `matchit2'
    tempfile matchit1
    save `matchit1', replace
    
    use `matchit1', clear
    merge 1:1 han_id pat_cp_name using "`patent'"
    keep if _merge==2
    drop _merge
    tempfile patent
    save `patent', replace
    
    use `matchit1', clear
    merge 1:m n_nif sabi_cp_name using "`sabi'"
    keep if _merge==2
    drop _merge
    tempfile sabi
    save `sabi', replace
    
    use `patent', clear
    matchit han_id pat_cp_name using "`sabi'", idu(n_nif) txtu(sabi_cp_name) weights(simple) threshold(0.1) sim(token) override
    gsort han_id -similscore
    duplicates report sabi_cp_name
    duplicates tag sabi_cp_name, gen(dup)
    tab dup
    }

  • #2
    If dup is a variable then

    Code:
    while dup > 0 
    is interpreted as the value of the variable in the first observation,

    Code:
    while dup[1] > 0
    which is not what you want. There is no sense whatsoever in which the while machinery scans the entire variable looking for a minimum, a maximum, or any other summary.
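    A minimal sketch of that behaviour, using a made-up three-row dup variable (the values are an assumption for illustration, not the poster's data). The first observation is 0, so the loop body never runs even though larger values exist further down:

```stata
clear
input dup
0
1
3
end

* A bare variable name in a scalar context is read as its first observation.
display dup
display dup[1]

* Other observations need an explicit subscript.
display dup[3]

* The condition below only ever sees dup[1], which is 0 here,
* so the counter n is never incremented.
local n = 0
while dup > 0 {
    local ++n
    continue, break
}
display `n'
```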

    In every programming language I have used, while works with a scalar, not a vector, matrix or array; regardless, Stata works with a scalar. So, why doesn't Stata object if you refer to a variable? My guess is that this is historic.

    I haven't looked inside your code, but the general strategy here should be more like


    Code:
    local go_on = 1 
    
    while `go_on' { 
    
          <calculations> 
    
    
          if we're done change local go_on to 0 
    }
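    One way the skeleton above could be filled in, as a sketch only. The toy data, the variable names name and score, the "drop the lowest score" step standing in for the calculations, and the cap of 20 passes are all assumptions for illustration. The exit rule counts observations that are still flagged as duplicates, so it looks at the whole variable rather than the first observation:

```stata
clear
input str1 name score
"a" 0.9
"a" 0.8
"b" 0.7
"b" 0.6
"b" 0.5
"c" 0.4
end

local go_on = 1
local iter = 0

* The cap on iter is a safety net against an accidental infinite loop.
while `go_on' & (`iter' < 20) {
    local ++iter

    * Stand-in for <calculations>: drop the lowest-scoring row.
    sort score
    drop in 1

    * Rebuild the duplicate flag on the current data.
    capture drop dup
    duplicates tag name, gen(dup)

    * count scans every observation and leaves the total in r(N).
    count if dup > 0
    if r(N) == 0 local go_on = 0
}

display "stopped after `iter' passes, `=_N' rows left"
```

    The point of the design is that the decision to leave the loop is stored in a local macro, which is a single value, and that value is computed from a command that summarizes all observations.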

    • #3
      Dear Nick Cox, thanks for your help; without it I could not have solved this. I have made some changes trying to follow your suggestions. I think it is now working, although when matchit returns no observations it stops and breaks the loop, but that is a separate issue.
      This is what I did, in case you can tell me whether it follows your idea (see the very first line and the last two lines of the loop).
      Do you think the last two lines do what you suggest? The idea is to stop the loop when dup==0.

      Code:
      ****************************** loop1 *****************************
      local a = 0
      local i = 1
      local b = 0
      gen iteration = .
      
      while `i' == 1  { 
      
      gsort sabi_cp_name -similscore
      duplicates drop sabi_cp_name, force
      gsort pat_cp_name -similscore
      duplicates drop pat_cp_name, force
      tempfile matchit2
      save `matchit2', replace
      use `matchit1', clear
      duplicates report han_id pat_cp_name, force
      append using `matchit2'
      local a = `a' + 1
      di "************************** `a' *****************************"
      replace iteration = `a'
      tempfile matchit1
      save `matchit1', replace
      
      use `matchit1', clear
      merge 1:1 han_id pat_cp_name using "`patent'"
      keep if _merge==2
      drop _merge
      tempfile patent
      save `patent', replace
      
      use `matchit1', clear
      merge 1:m n_nif sabi_cp_name using "`sabi'"
      keep if _merge==2
      drop _merge
      tempfile sabi
      save `sabi', replace
      
      use `patent', clear
      matchit han_id pat_cp_name using "`sabi'", idu(n_nif) txtu(sabi_cp_name) weights(simple) threshold(0.1) sim(token) override
      gsort han_id -similscore
      duplicates report sabi_cp_name
      duplicates tag sabi_cp_name, gen(dup)
      tab dup
      
      replace dup = 1 if dup>1 & !missing(dup)
      if (`i' <= `b') continue, break
      }

      • #4
        You set

        Code:
        local i = 1 
        local b = 0
        and never change either, so it seems that you have set up an infinite loop.

        I can't comment on the detail of your code otherwise.

        • #5
          Dear Nick Cox, thanks for your patience and help. Yes, my point is that the loop may stop because (1) there is no dup>0 (which is what I want), or (2) matchit stops because there are no obs (because of the threshold).
          However, even though I see what you mean, if I instead use (local i = 0 and local b = 0), the loop does not continue. As you can see below, the variable dup is greater than 0, so it should continue, but it stops right there.
          I am certainly misunderstanding how this should work, but I do not see why.
          Code:
          Duplicates in terms of sabi_cp_name
          
                  dup |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |         17       62.96       62.96
                    1 |          6       22.22       85.19
                    3 |          4       14.81      100.00
          ------------+-----------------------------------
                Total |         27      100.00
          (4 real changes made)

          • #6
            As already flagged in #2 you need to set up and apply a rule for leaving the loop. It seems that you don't understand my explanation, but I can't think of a way to explain differently.

            • #7
              Dear Nick Cox, I really appreciate your patience and help, and I apologize for not properly understanding your explanation. Trying to find a rule for changing local i to zero when there are no more duplicates (dup equal to zero), as you suggest in #2, I have written the following. I do not want to bother you further; I only put it here in case it helps someone with the same issue (assuming it is correct).
              Code:
              local i = 1
              
              while `i'  { 
              
              /* calculations */
              
              tab dup
              scalar dup1 = cond(dup>0,1,0)
              local `i' = dup1
              di `i'
              di `=dup1'
              
              }

              • #8
                Sorry, but that code can't help you one bit for exactly the reason I first flagged.

                Code:
                scalar dup1 = cond(dup > 0, 1, 0)
                is interpreted as

                Code:
                scalar dup1 = cond(dup[1] > 0, 1, 0)
                It is not the problem but dup[1] > 0 is enough for the same result.

                Either way, IIUC, your code is not general enough for what you want. It is the same misconception. Stata uses the first observation when asked to map a variable to a scalar.

                Believe me, I would give you a code solution if I knew one, but I haven't tried reading and understanding your code as a whole, yet it's evident that you are working on a kind of problem I have never worked on. But your answer is more likely to arise from a summarize of your dup variable.
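                A sketch of what "a summarize of your dup variable" might look like, on a made-up three-row dup variable (the data and the local name go_on are assumptions for illustration). It has not been tried against the matchit code in this thread:

```stata
clear
input dup
0
1
3
end

* summarize looks at every observation; meanonly skips the display
* but still stores r(N), r(min) and r(max).
summarize dup, meanonly

* Guard with r(N): on empty data r(max) is missing, and missing
* counts as larger than any number in Stata, so r(max) > 0 alone
* would be true there.
local go_on = (r(N) > 0) & (r(max) > 0)
display `go_on'
```

                Here the flag depends on the largest value of dup anywhere in the data, not on dup[1].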

                • #9
                  Dear Nick Cox, I understand. In the end, a second-best option is to copy/paste the code from the loop several times right below and try.
                  Anyway, thank you for your time and help!
