  • Using Matchit

    Hi,
    I'm trying to merge two datasets on a string variable, but the names don't match exactly. I also need to do this by group. For instance, if this was my first dataset:
    Name          Age   City
    John Smith    45    New York
    Jane Doe      24    New York
    John Smith    30    New Orleans
    And this was my second:
    Name             Sex   City
    Smith, John Q    M     New York
    Jane Doe         F     New York
    Smith, John      M     New Orleans
    How do I merge these two based on the name and grouped by city?
    I tried to append the two datasets and then used matchit, but I don't know how to group by city. I'd be super grateful for any help!!

  • #2
    Welcome to Statalist.

    At the end of this post I've added your sample data presented with dataex for ease of use in Stata, as described in the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Please take a moment to review it, especially sections 9-12 on how best to pose your question. The more you help others understand your problem, the more likely they are to be able to help you solve it.

    At the moment I don't have time to take this further, but perhaps someone else can build on this to advise you.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 name byte age str11 city
    "John Smith" 45 "New York"   
    "Jane Doe"   24 "New York"   
    "John Smith" 30 "New Orleans"
    end
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str13 name str1 sex str11 city
    "Smith, John Q" "M" "New York"   
    "Jane Doe"      "F" "New York"   
    "Smith, John"   "M" "New Orleans"
    end


    • #3
      Here's another command that may be helpful to those who have the time and inclination to look into this problem.

      Code:
      net describe matchit, from(http://fmwww.bc.edu/RePEc/bocode/m)
      --
      Bruce Weaver
      Email: [email protected]
      Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
      Version: Stata/MP 18.0 (Windows)


      • #4
        I found the time I thought I lacked in post #2. Here's sample code that I hope will help you in your use of matchit.
        Code:
        // Set up sample data
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str10 name byte age str11 city
        "John Smith" 45 "New York"  
        "Jane Doe"   24 "New York"  
        "John Smith" 30 "New Orleans"
        end
        generate id1 = _n
        generate text1 = name + " " + city
        tempfile data1
        save `data1'
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str13 name str1 sex str11 city
        "Smith, John Q" "M" "New York"  
        "Jane Doe"      "F" "New York"  
        "Smith, John"   "M" "New Orleans"
        end
        generate id2 = _n
        generate text2 = name + " " + city
        tempfile data2
        save `data2'
        
        // match the two datasets, preparing a crosswalk
        
        use `data1', clear
        matchit id1 text1 using `data2', idusing(id2) txtusing(text2)
        sort id1 similscore
        list, noobs sepby(id1)
        
        // reduce crosswalk to highest score for each observation from the first dataset
        
        by id1 (similscore): keep if _n==_N
        drop text1 text2 similscore
        
        // merge datasets to the crosswalk
        
        merge 1:1 id1 using `data1'
        drop text1 _merge
        rename (name city) (name1 city1)
        merge 1:1 id2 using `data2'
        drop text2 _merge
        rename (name city) (name2 city2)
        list, noobs clean
        Code:
        . list, noobs sepby(id1)
        
          +--------------------------------------------------------------------------+
          | id1                    text1   id2                     text2   similsc~e |
          |--------------------------------------------------------------------------|
          |   1      John Smith New York     3   Smith, John New Orleans   .60302269 |
          |   1      John Smith New York     1    Smith, John Q New York    .8229512 |
          |--------------------------------------------------------------------------|
          |   2        Jane Doe New York     2         Jane Doe New York           1 |
          |--------------------------------------------------------------------------|
          |   3   John Smith New Orleans     1    Smith, John Q New York   .57142857 |
          |   3   John Smith New Orleans     3   Smith, John New Orleans      .88396 |
          +--------------------------------------------------------------------------+
        Code:
        . list, noobs clean
        
            id1   id2        name1   age         city1           name2   sex         city2  
              1     1   John Smith    45      New York   Smith, John Q     M      New York  
              2     2     Jane Doe    24      New York        Jane Doe     F      New York  
              3     3   John Smith    30   New Orleans     Smith, John     M   New Orleans
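
        Note that the code above folds the city into the matched text, so a different city only lowers the similarity score rather than ruling a pair out. If matches must stay strictly within a city, one possible variation is to score the names alone and then discard cross-city pairs. The following is a minimal, self-contained sketch under that assumption, not a tested general solution; the variable names name1, name2, city1 and city2 are illustrative, and it relies on matchit's default settings.

        ```stata
        * Sketch: restrict candidate matches to the same city
        * Requires matchit (ssc install matchit)
        clear
        input str10 name1 byte age str11 city
        "John Smith" 45 "New York"
        "Jane Doe"   24 "New York"
        "John Smith" 30 "New Orleans"
        end
        generate id1 = _n
        tempfile data1
        save `data1'

        clear
        input str13 name2 str1 sex str11 city
        "Smith, John Q" "M" "New York"
        "Jane Doe"      "F" "New York"
        "Smith, John"   "M" "New Orleans"
        end
        generate id2 = _n
        tempfile data2
        save `data2'

        * Score the names only; city is not part of the matched text
        use `data1', clear
        matchit id1 name1 using `data2', idusing(id2) txtusing(name2)

        * Attach the city from each source dataset to every candidate pair
        merge m:1 id1 using `data1', keepusing(city) keep(match) nogenerate
        rename city city1
        merge m:1 id2 using `data2', keepusing(city) keep(match) nogenerate
        rename city city2

        * Keep pairs within the same city, then the highest score per id1
        * (ties in similscore, if any, are broken arbitrarily)
        keep if city1 == city2
        bysort id1 (similscore): keep if _n == _N
        list id1 id2 name1 name2 city1 similscore, noobs sepby(id1)
        ```

        The result is a crosswalk of id1 to id2 that can be merged back to both datasets as in the code above. In either version, the later merge 1:1 on id2 assumes that no two observations from the first dataset end up paired with the same observation from the second; if that can happen in real data, the crosswalk needs to be checked for duplicate id2 values first.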
        Last edited by William Lisowski; 09 Jan 2019, 07:52.
