Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem with merging datasets - Variables do not uniquely identify

    Hi,

    I'm new to the forum, so I'm just learning the ropes around here. English is not my first language, so do overlook the grammatical mistakes.

    I have two datasets that I want to merge, MaternityDB.dta and AllowancesDB.dta. The data sets look like this (only excerpts):
    COUNTRY_ISO CODE YEAR ML_WKS
    AU 1 1960 0
    AU 1 1961 0
    AT 2 1960 0
    COUNTRY_ISO CODE YEAR TFR
    AU 1 1960 3.45
    AU 1 1961 3.55
    AT 2 1960 2.69
    I used this code:
    use MaternityDB.dta, clear
    merge m:1 COUNTRY_ISO CODE YEAR using AllowancesDB.dta


    I think is a bit hard to understand the difference between one-to-one, one-to-many and many-to-one merges, so I tried all of them. However, I got the same red message from Stata: variables COUNTRY_ISO CODE YEAR do not uniquely identify observations in the using data/master data (depending which of the merge alternatives I tried).

    I have tried to understand what the message means, but my searches haven't given results. Do any of you know what I do wrong? I would much appreciate the help, and I hope I have given the information needed.

    Kind regards,
    Frøydis

  • #2
    Please read the FQ. There you'll find advice about sharing data/output/command.

    That said, you may try something as:

    Code:
    joinby CODE using AllowancesDB.dta
    Best regards,

    Marcos

    Comment


    • #3
      Thank you for the advice, Marcos. I will make sure to read the FAQ thoroughly before starting another thread or posting another question.

      Also, thank you for the recommended code, although it did not work.


      Kind regards,

      Frøydis

      Comment


      • #4
        Hi Froeydis,
        this could be because you have duplicates in the variable that you are using to match both databases. For example: COUNTRY_ISO (AU, AU)

        Nerea

        Comment


        • #5
          The data excerpts you show don't have duplicates on the variables you're using for the merge so it's hard to guess what the actual structure of the data is or what you would like the final merge to look like.

          Is the combination of COUNTRY_ISO CODE and YEAR a unique identifier for either dataset? You can check this with the command isid. If not, you need to think through why they are not unique. That is, why do you have multiple observations per year. Are there any additional variables that you could use that do uniquely identify observations?

          You also need to think through exactly what you want Stata to do with the duplicates. If you can describe what you want we can probably help you figure out how to get there. Without that information we'll just be guessing at how your data are linked.

          Comment


          • #6
            If COUNTRY_ISO CODE YEAR should uniquely identify the data (in both datasets), than you will want to use a 1:1 merge. (That's also my guess just looking at your two short samples of data you posted in your original post.

            I would use the duplicates command to determine which observations are causing you problems:

            Code:
            dataex id country_iso code year ml_wks  // ssc install dataex
            clear
            input float id str2 country_iso byte code int year byte ml_wks
            1 "AT" 2 1960 0
            2 "AT" 2 1960 0
            3 "AT" 2 1960 0
            4 "AU" 1 1960 2
            5 "AU" 1 1960 0
            6 "AU" 1 1961 0
            7 "AU" 1 1961 5
            8 "AU" 1 1962 5
            9 "AU" 1 1963 5
            end
            ------------------ copy up to and including the previous line ------------------

            Code:
            isid country_iso code year  // try this in both datasets
            help duplicates
            duplicates tag country_iso code year, gen(dup1)
            
            . list, sepby(country_iso) noobs
            
              +---------------------------------------------+
              | id   countr~o   code   year   ml_wks   dup1 |
              |---------------------------------------------|
              |  1         AT      2   1960        0      2 |
              |  2         AT      2   1960        0      2 |
              |  3         AT      2   1960        0      2 |
              |---------------------------------------------|
              |  4         AU      1   1960        2      1 |
              |  5         AU      1   1960        0      1 |
              |  6         AU      1   1961        0      1 |
              |  7         AU      1   1961        5      1 |
              |  8         AU      1   1962        5      0 |
              |  9         AU      1   1963        5      0 |
              +---------------------------------------------+
            
            . list, sepby(country_iso year) noobs
            
              +---------------------------------------------+
              | id   countr~o   code   year   ml_wks   dup1 |
              |---------------------------------------------|
              |  1         AT      2   1960        0      2 |
              |  2         AT      2   1960        0      2 |
              |  3         AT      2   1960        0      2 |
              |---------------------------------------------|
              |  4         AU      1   1960        2      1 |
              |  5         AU      1   1960        0      1 |
              |---------------------------------------------|
              |  6         AU      1   1961        0      1 |
              |  7         AU      1   1961        5      1 |
              |---------------------------------------------|
              |  8         AU      1   1962        5      0 |
              |---------------------------------------------|
              |  9         AU      1   1963        5      0 |
              +---------------------------------------------+
            
            . tabulate dup
            
                   dup1 |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      0 |          2       22.22       22.22
                      1 |          4       44.44       66.67
                      2 |          3       33.33      100.00
            ------------+-----------------------------------
                  Total |          9      100.00
            
            *** Then you could do a "brow [varlist] if dup1 >=1"
            Last edited by David Benson; 19 Mar 2019, 23:14.

            Comment

            Working...
            X