
  • 1:1 data merge fails - r(459) - despite no duplicates on the merge by variable pairs in either data set

    Hello,
    I'm a Stata novice and am having trouble with the final stage of compiling a data set: merging year*place_code crime data (3700 records) with year*place_code population data (4128 records):
    clear
    use "...placecountyyr_crimclr.dta" /*3700 recs*/
    merge 1:1 year place_code using "...placecountyyr_pop.dta" /*4128 recs*/
    I expected/hoped to get <=3700 year*place crime records matched to the corresponding year*place population records, discarding records from either source that could not be matched.
    But when I run the above code, I get this error message:
    (note: variable county_fips was int, now float to accommodate using data's values)
    (note: variable place_code was long, now double to accommodate using data's values)
    variables year place_code do not uniquely identify observations in the using data
    r(459)
    Yet when I run the following commands:
    use "...placecountyyr_crimclr.dta"
    duplicates list year place_code
    use "...placecountyyr_pop.dta"
    duplicates list year place_code
    ...I get the same message both times:
    Duplicates in terms of year place_code
    (0 observations are duplicates)
    That is, as I expected/designed, there is only 1 record for a given place and year, both in the crime and in the population data sets.
    And I know that there are many visual matches on various year*place_code combinations.

    What am I misunderstanding?
    Are the on-the-fly variable type changes (e.g., "variable county_fips was int, now float to accommodate using data's values") maybe somehow related?

    Thanks for any help!

    - thomas



    Last edited by Thomas Nephew; 16 Oct 2018, 00:32.

  • #2
    OK, I see now that I didn't follow forum protocol (using dataex and so forth).
    Also, I was checking the wrong data set for duplicates; I have found them in the correct data set, and now need to figure out how they happened.
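
    For anyone reading along, here is a minimal sketch of how duplicates on the merge key can be located once the right file is loaded. The toy data and the variable names pop and dup are made up for illustration and are not from my actual files; isid, duplicates report, and duplicates tag are standard Stata commands.

    ```stata
    * Toy data standing in for the population file (all values are made up)
    clear
    input int year long place_code long pop
    2010 1001 500
    2010 1002 750
    2011 1001 510
    2011 1001 510
    end

    * isid exits with an error if year place_code do not uniquely identify
    * observations; capture lets the do-file continue so we can inspect _rc
    capture isid year place_code
    display "isid return code: " _rc

    * Summarize the surplus copies, then tag and list the offending records
    duplicates report year place_code
    duplicates tag year place_code, generate(dup)
    list if dup > 0, sepby(year place_code)
    ```

    Listing the tagged records side by side is usually the quickest way to see whether they are exact copies or differ in some other variable, which hints at how they arose.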



    • #3
      Thomas:
      ...when you have eliminated the impossible, whatever remains, however improbable, must be the truth...
      (AC Doyle. The Sign of the Four, 1890. https://en.wikiquote.org/wiki/Sherlock_Holmes).
      Last edited by Carlo Lazzaro; 16 Oct 2018, 02:04.
      Kind regards,
      Carlo
      (Stata 18.0 SE)



      • #4
        Thomas, I'm glad you discovered the problem and posted an update. Thank you.

        There is a moral to this story for anyone who is reading along. When you want help troubleshooting code here, it is important to show the exact code you ran. Thomas' problem arose because he was running his -duplicates list- on a different dataset than the one he was merging. Notice, though, that you could never figure that out from what he posted, because he elided the pathnames. The parts of the code that he shows look like the same files were being used in -merge- and -duplicates list-. Probably Thomas thought that the path name was long and uninformative, an ignorable detail. But in code, there is no such thing as an unimportant detail.

        The best way to ensure that you are posting the exact code that is causing problems is to copy/paste directly from the Results window or your log file into this Forum's editor. If you re-type, you may introduce changes inadvertently (or be tempted to do so deliberately to save keystrokes).
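
        One way to guard against checking one file and merging another is to store each filename in a local macro once and use that same macro for both the uniqueness check and the -merge-. What follows is only a sketch under assumptions: the toy data, the variable names crimes and pop, and the temporary files are invented so the example runs on its own. In a real do-file the macros would hold the full pathnames.

        ```stata
        * Toy master (crime) and using (population) data; all values are made up
        clear
        input int year long place_code int crimes
        2010 1001 12
        2010 1002 7
        2011 1001 9
        end
        tempfile crime
        save `crime'

        clear
        input int year long place_code long pop
        2010 1001 500
        2010 1002 750
        2011 1001 510
        2011 1003 300
        end
        tempfile pop
        save `pop'

        * Check the key in the very file that -merge- will read, via the same macro
        use `pop', clear
        isid year place_code

        * Check the master the same way, then merge; keep(match) retains only
        * observations found in both files and nogenerate suppresses _merge
        use `crime', clear
        isid year place_code
        merge 1:1 year place_code using `pop', keep(match) nogenerate
        ```

        If either -isid- fails, the do-file stops before the -merge-, so the failure points at the file that is actually about to be used. The keep(match) option corresponds to the stated goal in #1 of discarding records from either source that cannot be matched.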
