Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging problem for two data-sets with a common string" id"

    Hello,
    I want to merge two datasets using a common "id" variable that is in both of them. However, the "id" variable is"string" and is a mixture of "Q" and some numbers such as Q2515789.

    I tried to use this "id" variable with the command
    Code:
    merge 1:1 id
    , but it did not merge them and only put the second database below the first one like what the command
    append
    does.


    Should I convert the "id" first to something else? How?
    Please let me know how the command
    merge
    works this kind of variables.

    Thanks for your help.

    Hossein

  • #2
    Did you use the command correctly? Usually it is no problem if the key variable is string ! I merged today with a string variable and it worked (Stata 15.0)
    What is your error message?
    Did you check if your specification 1:1/1:m/m:1 is the right one? Is the key variable named identically in both the master and using dataset?

    Comment


    • #3
      Since you don't share samples of your data, and merge apparently did run, I'd have to guess that there were no matches between the two datasets. Since you didn't share what merge reported when it ran, I'd have to guess that it reported that none of your observations matched, and you overlooked or misunderstood that..

      Usually when there are no matching observations it's obvious from looking at the two datasets. But one subtle, easy-to-overlook possibility is that in one (or both!) of your datasets, there are extra blank characters at the end of id, which are not there in the other dataset. This can be resolved by
      Code:
      replace id = trim(id)
      in whichever dataset has this problem.

      If none of this helps, or even if it does, please take the time to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

      The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

      Comment


      • #4
        Thanks for your comments. Unfortunately, the data that I am working with is confidential and I cannot share it since I do not have access to it out of the institute.

        I am glad that the merge command accepts string variables as the key variable to merge two datasets.

        Maybe as Rebecca mentioned, I am using a wrong "merge x:x" command.

        As an example, there is a firm with ID= Q2515789 and it has 3 different types of capital assets such as a,b, and c.
        Dataset 1 contains the expected life of each asset and the second one has data for the actual discard date. As you can see below, asset c in the second data set does not appear and asset "a" may be missing.

        What type of merge command do I need to use?

        Code:
        merge 1:1 ID Asset using …
        Or

        Code:
        merge m:m ID Asset using …

        I used the first command above and the value for the variable "_merge==3" was zero.

        Thank you so much for your help.
        Code:
         
        Dataset 1 Dataset 2
        ID Asset expected life ID Asset discard date
        Q2515789 a 10 Q2515789 a .
        Q2515789 b 8 Q2515789 b 9
        Q2515789 c 9

        Comment


        • #5
          Maybe ... I am using a wrong "merge x:x" command.
          Since your
          Code:
          merge 1:1 ID Asset using
          completed without an error message, we know that ID and Asset provide unique identifiers in each dataset - if one of them had duplicated values of ID and Asset, the merge command would have given an error message telling you so, and not proceeded to create a merged dataset. So, no, neither merge 1:m nor merge m:1 nor merge m:m will do what you need. If this isn't clear, do review the documentation on the merge command in the Stata Data Management Reference Manual PDF including in your Stata installation and accessible through Stata's Help menu. In particular the documentation will warn you that merge m:m is never the right thing to do, so I suspect you have yet to review the documentation.

          the value for the variable "_merge==3" was zero
          That confirms my assertion that none of the ID/Asset pairs in your master dataset matched any of the ID/Asset pairs in your using dataset.

          Post #3 gave you guidance on what the problem could be, and a solution. You do not report results of exploring that possibility. Let me show you an approach to test my assertion that one or both of your datasets has blanks in ID or Asset or both. I created a dataset with 100 IDs from Q101 to Q200.
          Code:
           codebook ID
          
          ------------------------------------------------------------
          ID                                               (unlabeled)
          ------------------------------------------------------------
          
                            type:  string (str4)
          
                   unique values:  100                      missing ""
          > :  0/100
          
                        examples:  "Q120"
                                   "Q140"
                                   "Q160"
                                   "Q180"
          Next I modified the data so that one of the values had a trailing space.
          Code:
          . codebook ID
          
          ------------------------------------------------------------
          ID                                               (unlabeled)
          ------------------------------------------------------------
          
                            type:  string (str5)
          
                   unique values:  100                      missing ""
          > :  0/100
          
                        examples:  "Q120 "
                                   "Q140"
                                   "Q160"
                                   "Q180"
          
                         warning:  variable has trailing blanks
          So in both your master dataset and your using dataset, run
          Code:
          codebook ID Asset
          and see what it reports about blanks that you do not expect to have in either variable.

          Comment

          Working...
          X