  • Identifying matches between two string variables - that may not be in the same row

    Hi Stata Community

    I have a unique problem.

    I have two string variables containing village names: fplmis (village1) and training (village2).
    village1 has 5,500 observations, and
    village2 has only 1,500 observations.
    I want to create a new variable 'match' that equals 1 if the string in village1 matches a string in village2, and 0 otherwise.
    My difficulty is the following:
    The names appear in a different order in village1 and village2, because the two variables have different numbers of observations. For example, a village named xxx may be the 24th observation in village1 but the 5th observation in village2. I want to flag a match whenever a string in village1 matches any observation in village2, not just the one in the same row.
    Below is an example of my data:



    clear
    input str36 fplmis str35 training
    "Aagurli " "Chhijwar"
    "Abhilaheda " "Shivrajpur"
    "Agar " "Bhadawal"
    "Agdal " "Balbhadragarh"
    "Ajapura " "Deora 2"
    "Ajgar " "Anjora"
    "Ali " "Sanai"
    "Ambodiya " "Khatkhari"
    "Amroul " "Kalora"
    "Andawa " "Dharnauda"
    "Ankhwari " "Kherkhedi"
    "Arniyagoad " "Geruari"
    "Arusi " "Kumbhraj rural"
    "Attar " "aron"
    "Babri " "Sigora new"
    "Badagau " " barsat new"
    "badgoan " "Madagan mafi"
    "Badgore " "Tijarpur"
    "Badgyar " "Pachawali"
    "Badjher " "Patulkhi"
    end

  • #2
    It might be easiest to save the second column as a different dataset, and then merge the two.

    Something like this?

    (I've changed your data extract, since it didn't have any matches.)

    Code:
    // CHANGED DATA EXTRACT
    clear
    input str36 fplmis str35 training
    "Aagurli " "Patulkhi"
    "Abhilaheda " "Shivrajpur"
    "Agar " "Bhadawal"
    "Agdal " "Badjher"
    "Ajapura " "Deora 2"
    "Ajgar " "Anjora"
    "Ali " "Sanai"
    "Ambodiya " "Badgore"
    "Amroul " "Kalora"
    "Andawa " "Dharnauda"
    "Ankhwari " "Kherkhedi"
    "Arniyagoad " "Geruari"
    "Arusi " "Attar"
    "Attar " "aron"
    "Babri " "Sigora new"
    "Badagau " " barsat new"
    "badgoan " "Ajapura"
    "Badgore " "Tijarpur"
    "Badgyar "
    "Badjher "
    end
    
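    // SOLUTION BEGINS HERE
    // Normalize both variables: strip surrounding spaces and ignore case
    replace fplmis = lower(trim(fplmis))
    replace training = lower(trim(training))
    
    // Save the non-empty training names as a lookup dataset keyed on fplmis
    preserve    
        keep training
        drop if training == ""
        rename training fplmis
        tempfile vill2
        save `vill2'
    restore
    
    // Keep all master rows (1) and matches (3); discard using-only names (2)
    merge 1:1 fplmis using `vill2', keep(1 3)
    
    // matched = 1 if the fplmis name appears anywhere in training
    gen byte matched = (_merge == 3)
    drop _merge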
    which produces:

    Code:
    . list, noobs sep(0)
    
      +-----------------------------------+
      |     fplmis     training   matched |
      |-----------------------------------|
      |    aagurli     patulkhi         0 |
      | abhilaheda   shivrajpur         0 |
      |       agar     bhadawal         0 |
      |      agdal      badjher         0 |
      |    ajapura      deora 2         1 |
      |      ajgar       anjora         0 |
      |        ali        sanai         0 |
      |   ambodiya      badgore         0 |
      |     amroul       kalora         0 |
      |     andawa    dharnauda         0 |
      |   ankhwari    kherkhedi         0 |
      | arniyagoad      geruari         0 |
      |      arusi        attar         0 |
      |      attar         aron         1 |
      |      babri   sigora new         0 |
      |    badagau   barsat new         0 |
      |    badgoan      ajapura         0 |
      |    badgore     tijarpur         1 |
      |    badgyar                      0 |
      |    badjher                      1 |
      +-----------------------------------+
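
    One caveat: merge 1:1 stops with an error if a name repeats in either variable, which may well happen with 5,500 village names. Below is a minimal sketch of a variant that tolerates repeats. The four-row extract is made up purely for illustration, and I am assuming that repeated names should all be flagged the same way.

    Code:
    // HYPOTHETICAL EXTRACT WITH REPEATED NAMES (illustration only)
    clear
    input str36 fplmis str35 training
    "Agar " "Attar"
    "Attar " "attar "
    "Attar " "Sanai"
    "Babri " ""
    end
    
    replace fplmis = lower(trim(fplmis))
    replace training = lower(trim(training))
    
    preserve
        keep training
        drop if training == ""
        // keep one row per distinct training name
        duplicates drop
        rename training fplmis
        tempfile vill2
        save `vill2'
    restore
    
    // m:1 allows a name to repeat in the master data
    merge m:1 fplmis using `vill2', keep(master match)
    
    gen byte matched = (_merge == 3)
    drop _merge

    Here both "attar" rows should end up with matched == 1, while "agar" and "babri" get 0. If two different villages can share a name, you would need an additional identifier (district or block, say) in the merge key.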
    Last edited by Hemanshu Kumar; 09 Jul 2024, 01:09.
