  • Merging two datasets

    Hi all,

    I'm new to Stata, so I apologise if this is a silly question. I am trying to combine two datasets that represent two waves of data (time 1 and time 3). I've used the following code:


    *import excel dataset
    import excel "/Users/bridget/Dropbox/PhD/Data/T1.xlsx", sheet("Sheet1") firstrow clear

    *save dataset as dta
    save T1.dta, replace

    *import excel dataset
    import excel "/Users/bridget/Dropbox/PhD/Data/T3.xlsx", sheet("Sheet1") firstrow clear

    *save dataset as dta
    save T3.dta, replace

    *merge datasets: T3 is still in memory, so it is the master and T1 is the using dataset
    merge 1:1 ID using T1.dta, generate(_mergeT1)
    save MERGED.dta, replace


    However, I get the following error message:

    "variable ID does not uniquely identify observations in the master data
    r(459)"


    Any guidance on this would be greatly appreciated!!

  • #2
    When this happens and I know the data should have a unique identifier, I typically browse the observations that share an ID.

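    In your case, run something like this on each dataset before the merge to get an idea of what's going on:

    Code:
    * count the observations sharing each ID, then browse the IDs that occur more than once
    bys ID: gen c = _N
    br if c > 1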

    It's good practice to use isid (e.g. isid ID) to check that a variable uniquely identifies the observations.
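    A minimal sketch of the same check using Stata's built-in duplicates command, on made-up toy data standing in for one wave (the variable score and the flag dup are hypothetical names). With real data you would replace the input block with use T1.dta, clear and then repeat the check for T3.

```stata
* Hypothetical toy data standing in for one wave: ID 2 appears twice
clear
input ID score
1 10
2 12
2 12
3 9
end

* Summarise how many IDs occur once, twice, and so on
duplicates report ID

* Flag the surplus copies (dup = 0 for unique IDs) and list the offending rows
duplicates tag ID, generate(dup)
list ID score if dup > 0

* isid stops with an error when ID is not unique; capture keeps the do-file running
capture isid ID
display "isid return code: " _rc
```

    duplicates tag stores the number of surplus copies, so dup is 1 for both rows of ID 2 and 0 elsewhere; a nonzero return code from isid means the 1:1 merge would fail on this dataset.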



    • #3
      The message means exactly what it says: there are duplicate IDs in the master dataset. If you didn't expect this, you need to do some work to find out how extensive the issue is and what causes it. If you did expect it, use merge m:1 instead (provided ID is unique in the using dataset).
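      If repeated IDs are expected in the master data, a minimal m:1 sketch on made-up toy data might look like this (the tempfile wave1 and the variables age and score are hypothetical). The using dataset must still have one observation per ID; otherwise merge m:1 fails with the same kind of error, this time about the using data.

```stata
* Hypothetical using data: exactly one row per ID
clear
input ID age
1 30
2 41
3 25
end
tempfile wave1
save `wave1'

* Hypothetical master data: ID 2 legitimately appears twice
clear
input ID score
1 10
2 12
2 14
3 9
end

* m:1 allows repeated IDs in the master but requires unique IDs in the using data
merge m:1 ID using `wave1'
list, sepby(ID)
```

      Each master row picks up the single matching row from the using data, so both ID 2 rows receive age 41.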



      • #4
        Are you expecting one observation per ID in both datasets? Or is there one observation per ID for each year? In that case you need to merge on ID and year (or whatever identifies the time period).

        It's hard to help without seeing the data. Use dataex to show some of your data and you'll get better advice.
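        A minimal sketch of merging on ID and year, with made-up toy data (the tempfile outcomes and the variables year, outcome and covariate are hypothetical). The dataex line at the end prints the data as code that can be pasted into a post; it ships with recent versions of Stata, while older versions may need ssc install dataex.

```stata
* Hypothetical using data: one row per ID-year
clear
input ID year outcome
1 2019 5
1 2020 6
2 2019 7
2 2020 8
end
tempfile outcomes
save `outcomes'

* Hypothetical master data with the same ID-year structure
clear
input ID year covariate
1 2019 0
1 2020 1
2 2019 1
2 2020 0
end

* ID alone repeats, but the pair (ID, year) is unique, so it can serve as the merge key
isid ID year
merge 1:1 ID year using `outcomes'

* Print a copy-pasteable sample of the data for a forum post
dataex ID year outcome covariate
```

        Here merge 1:1 ID would fail with the same r(459) error, because each ID appears once per year; adding year to the key makes every observation unique.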



        • #5
          I agree with all the suggestions above. Alternatively, since your raw data come from Excel files, you could also sort the IDs in Excel to spot the problem.



          • #6
            Thanks all. I appreciate the guidance.

            There was an error in the dataset - all sorted now.

