  • Merging gives identification error, while "isid address" gives no errors

    I am trying to do a 1:1 merge of two datasets on the variable address. There are some duplicates, which I remove with:

    Code:
    sort address
    by address: gen dup = cond(_N==1,0,_n)
    drop if dup != 0
    I then check if address uniquely identifies observations in my dataset:
    Code:
    isid address
    This throws no error. However, when I attempt the merge (merge 1:1 address using dataset2.dta), I receive the error "variable address does not uniquely identify observations in the master data".

    Can you please suggest what might be the problem? Thank you very much.

  • #2
    Jan:
    welcome to this forum.
    You can check whether -duplicates- is helpful in that instance.
    You may have multiple observations for the same -id- if your data follow a panel structure.
    Last edited by sladmin; 04 Feb 2018, 17:27. Reason: update username
    Kind regards,
    Carlo
    (Stata 18.0 SE)
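
    A minimal sketch of that check, assuming the key variable is called address as in the first post. The variable name ndup is hypothetical and chosen only to avoid clashing with the dup variable created earlier. The final describe is an extra step that is not part of -duplicates-; it is included because the storage type of the key turns out to matter later in this thread.

    ```stata
    * Sketch only: "address" is the key from the first post; "ndup" is a hypothetical name.
    duplicates report address                  // how many observations share a key value
    duplicates tag address, generate(ndup)     // ndup = number of other copies of each key
    list address if ndup > 0                   // show any keys that are repeated
    describe address                           // also check the storage type (str# or strL)
    ```

    If the report shows no surplus copies and -merge- still complains about uniqueness, the cause is probably something other than duplicate keys.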



    • #3
      In reply to Carlo Lazzaro (post #2):
      Dear Carlo,

      thank you very much for your response and your welcome!

      I believe I have found the answer. Out of desperation I tried switching the master and using datasets, and received a more informative error: "key variable address is strL in using data. The key variables -- the variables on which observations are matched -- can be str#, but they cannot be strLs." I was then able to solve the issue by using -recast- on address and running merge 1:1 address again.

      What is strange is that the first error message appears to have been incorrect. I had also assumed that it should not matter which dataset is the master.
      Last edited by sladmin; 04 Feb 2018, 17:28. Reason: update username



      • #4
        Jan:
        thanks for providing the way you fixed your problem.
        However, posting your code, along with what Stata returned, would help others benefit from your solution. Thanks.
        Last edited by sladmin; 04 Feb 2018, 17:28. Reason: update username
        Kind regards,
        Carlo
        (Stata 18.0 SE)



        • #5
          I am able to reproduce the problem described above. I agree that the error message seems incorrect when the key in the master dataset is strL. I will refer Stata Technical Services to this topic.
          Code:
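          cls
          clear
          * Dataset 1: key k stored as str8
          set obs 2
          generate str8 k = "key"+string(_n)
          tempfile s4
          save `s4'
          * Dataset 2: same key values, stored as strL
          clear
          set obs 2
          generate strL k = "key"+string(_n)
          tempfile sl
          save `sl'
          * strL key in the using data: informative error
          use `s4', clear
          capture noisily merge 1:1 k using `sl'
          * strL key in the master data: misleading "not unique" error
          use `sl', clear
          capture noisily merge 1:1 k using `s4'
          about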
          Code:
          . use `s4', clear
          
          . capture noisily merge 1:1 k using `sl'
          key variable k is strL in using data.
              The key variables -- the variables on which observations are matched -- can be str#, but
              they cannot be strLs.
          
          . use `sl', clear
          
          . capture noisily merge 1:1 k using `s4'
          variable k does not uniquely identify observations in the master data
          
          . about
          
          Stata/SE 15.1 for Mac (64-bit Intel)
          Revision 11 Jan 2018



          • #6
            Dear all,

            This is indeed an incorrect error message and it will be fixed in a future update.

            If your key variable is a strL, and if its maximum length is 2,045 bytes or less, you could use -recast- to change it to a str# variable before using -merge-.
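
            A minimal sketch of that conversion, assuming the key is called address as in the first post. The helper variable keylen is hypothetical. The sketch measures the longest value in bytes and recasts only when it fits within the 2,045-byte limit mentioned above.

            ```stata
            * Sketch only: "address" is the key from the first post; "keylen" is a hypothetical helper.
            generate long keylen = strlen(address)   // length of each value in bytes
            summarize keylen, meanonly
            local maxlen = max(r(max), 1)            // str0 is not a valid type, so use at least 1
            drop keylen
            if `maxlen' <= 2045 {
                recast str`maxlen' address           // strL becomes str#
            }
            else {
                display as error "address exceeds 2,045 bytes and cannot be stored as str#"
            }
            describe address                         // confirm the new storage type
            ```

            The same steps would need to be run in whichever dataset stores the key as strL (master, using, or both) before calling merge 1:1 address using dataset2.dta.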



            • #7
              In reply to Jan Dudek (post #3):
              I've had the same problem. After finding that the error message I had received ("variable k does not uniquely identify observations in the master data") was incorrect (there were no duplicates), I used "merge m:1" to see which values Stata considered duplicates. As in your case, this produced the correct error message ("key variable address is strL in using data. The key variables -- the variables on which observations are matched -- can be str#, but they cannot be strLs."). I solved the issue with the "compress" command.
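
              A rough sketch of the -compress- route, assuming the key is called address as in the first post. Note that -compress- converts a strL to str# only when doing so saves memory, so the sketch checks the resulting type rather than assuming the conversion happened.

              ```stata
              * Sketch only: "address" is the key from the first post.
              compress address                  // may demote strL to str# if that saves memory
              local keytype : type address      // storage type after compressing
              display "address is now stored as `keytype'"
              assert "`keytype'" != "strL"      // stop here if the key is still a strL
              ```

              If the assert fails, an explicit -recast- to a str# type is the alternative mentioned in #6.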

