Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Joinby + taking into account the difference period of 5 years

    Dear all,

    I want to merge 2 datasets based on non-unique identifiers (this is why I use joinby Licensee using ...). Each row represents a unique licensing deal. There are 3 variables in Dataset1, namely Licensor, Licensee and LicensingYear and there are 2 variables in Dataset2, namely Licensee and PublicationYear. I would only like to merge when publicationYear is up to 5 years after the LicensingYear for a given Licensee.

    Dataset1:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1(Licensor Licensee) int LicensingYear
    "A" "G" 1999
    "B" "I" 2002
    end

    Dataset2:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 Licensee int PublicationYear
    "G" 2003
    "G" 1996
    "G" 2000
    "I" 2005
    "I" 1994
    end
    This is the result that I want:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1(Licensor Licensee) int(LicensingYear PublicationYear)
    "A" "G" 1999 2003
    "A" "G" 1999 2000
    "B" "I" 2002 2005
    end
    When I use joinby it does not take the 5-year period into account. And I have tried to combine joinby and rangestat, but that would give me an error.

    Can somebody give me feedback please?

    Thank you in advance.
    Last edited by Glenda C; 27 Apr 2017, 16:57.

  • #2
    Well, you're thinking about it the right way! There is another command, -rangejoin- that will do exactly this. It is authored by Robert Picard and is available from SSC. The syntax would be:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 Licensee int PublicationYear
    "G" 2003
    "G" 1996
    "G" 2000
    "I" 2005
    "I" 1994
    end
    tempfile two
    save `two'
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1(Licensor Licensee) int LicensingYear
    "A" "G" 1999
    "B" "I" 2002
    end
    tempfile one
    save `one'
    
    use `one', clear
    gen licensing_year_plus_5 = LicensingYear + 5
    rangejoin PublicationYear LicensingYear licensing_year_plus_5 using `two', by(Licensee)
    drop licensing_year_plus_5
    list, noobs clean

    Comment


    • #3
      Dear Clyde,

      Thank you very much!

      Comment

      Working...
      X