  • generating a variable when duplicate is present

    Hi forum

    I have a string variable "emails" with approximately 4000 entries, one for each person who received a specific campaign. I have now received a second list of emails, let's call it "compliers": the roughly 1100 people who actually engaged with this campaign. This list is a subset of my string variable "emails". I am trying to generate a variable that flags when an entry in "emails" also appears in "compliers". I tried sorting and then using egen to tag duplicates, but the two variables don't line up row by row after sorting, so I am definitely missing a step.

    Any ideas how to approach this?

    Thanks

    Last edited by Mike Tanner; 07 Mar 2023, 07:32.

  • #2
    There may be more direct ways. What I can think of right now is to merge that subset back onto the 4000 entries, using e-mail as the ID variable.

    Code:
    clear
    input str50 email str50 compiler
    "[email protected]" "[email protected]"
    "[email protected]" "[email protected]"
    "[email protected]" "[email protected]"
    "[email protected]" "[email protected]"
    end
    
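    * flag every row that belongs to the full e-mail list
    gen in_email = !missing(email)
    
    * build a one-column data set of compilers, keyed on email
    preserve
    keep compiler
    rename compiler email
    * blank rows would count as duplicate keys in merge 1:1
    drop if missing(email)
    gen is_compiler = 1
    * check duplicates just in case
    duplicates report email
    tempfile compilers
    save `compilers'
    restore
    
    * attach the flag; rows without a match get is_compiler == .
    merge 1:1 email using `compilers', nogen
    keep if in_email == 1
    
    list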
    Results:

    Code:
         +-----------------------------------------------------+
         |         email        compiler   in_email   is_com~r |
         |-----------------------------------------------------|
      1. |   [email protected]   [email protected]          1          1 |
      2. |   [email protected]     [email protected]          1          . |
      3. | [email protected]     [email protected]          1          . |
      4. |   [email protected]       [email protected]          1          1 |
         +-----------------------------------------------------+
    Chances are you'll have some duplicates within each list. If that prevents the merge, look into help duplicates.
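
    As a rough sketch of that duplicates check (not part of the original reply): the addresses below are made up, and lower-casing and trimming the key first is an assumption about how the real lists might differ, not something reported in the thread.

    ```stata
    * Sketch with made-up addresses; "email" is the key variable from the thread
    clear
    input str50 email
    "ann@example.com"
    " Ann@Example.com "
    "bob@example.com"
    "cat@example.com"
    end

    * normalise so that case and stray blanks do not hide duplicates
    replace email = lower(strtrim(email))

    * inspect, then drop surplus copies (only email is in memory here)
    duplicates report email
    duplicates drop email, force

    * isid exits with an error unless email uniquely identifies the rows
    isid email
    count
    ```

    With other variables in memory, duplicates drop email, force keeps only the first copy of each address and discards whatever differed in the rest, so it is worth looking at duplicates list email before dropping anything.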



    • #3
      Originally posted by Ken Chui View Post
      There may be more direct ways. What I can think now is to merge that subset back to the 4000 entries using e-mail as the ID variable.

      Thank you very much, I guess merging works well. Just something I forgot to add: I do have other variables I want to keep. So if I then want to merge both databases (one that has all the emails and variables, and one with the compliers and their variables), I guess I should use the string variable email as the unique ID and merge the databases?

      Thanks, I first cleared the duplicates and then merged 1:1 using email as the key variable.

      Last edited by Mike Tanner; 07 Mar 2023, 09:56.



      • #4
        Originally posted by Mike Tanner View Post
        Thank you very much, I guess merging works well. Just something I forgot to add: I do have other variables I want to keep. So if I then want to merge both databases (one that has all the emails and variables, and one with the compliers and their variables), I guess I should use the string variable email as the unique ID and merge the databases?
        I believe that'd work. The merge command creates a _merge variable that tells you whether each row was found in both data sets, only in the active one (master), or only in the one you added (using).
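
        A minimal sketch of that two-file merge, assuming email is unique in both files. The variables clicks and age and the addresses are hypothetical stand-ins for the other variables mentioned above.

        ```stata
        * Sketch with hypothetical data: "clicks" and "age" stand in for the other variables
        clear
        input str50 email clicks
        "ann@example.com" 3
        "cat@example.com" 7
        end
        tempfile compliers
        save `compliers'

        clear
        input str50 email age
        "ann@example.com" 34
        "bob@example.com" 51
        "cat@example.com" 28
        end

        * keep every master row; bring in complier variables where email matches
        merge 1:1 email using `compliers', keep(master match)

        * _merge == 3 means the e-mail was found in both data sets
        gen byte complier = (_merge == 3)
        drop _merge

        list
        ```

        Here keep(master match) discards any row that appears only in the compliers file. Since the compliers are said to be a subset of the full list, there should be none, and tabulating _merge before dropping it is a cheap way to confirm that.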

