  • Issue in data merging

    I have two datasets with one common variable (applnid). When I try to merge them, Stata returns error r(459).

    The two datasets are file1 (variables: applnid, x, y, z) and file2 (variables: applnid, a, b, c), with applnid as the common variable. I need to take each applnid from file1 and attach the other variables available in file2, matching on applnid.
    I used

    Code:
    merge 1:1 appln_id using file1
    and got the error

    variable appln_id does not uniquely identify observations in the master data
    r(459);
    The applnid values in file1 do not all appear the same number of times in file2.
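    A minimal reproducible example of the same situation, with made-up values (all numbers are hypothetical, not the real data). It should be run as one do-file so that the tempfile still exists when merge is called.

    ```stata
    * Hypothetical file1: one row per appln_id
    clear
    input long appln_id x
    1 10
    2 20
    3 30
    end
    tempfile file1
    save `file1'

    * Hypothetical file2: some appln_id values occur more than once
    clear
    input long appln_id a
    1 100
    1 101
    2 200
    3 300
    3 301
    end

    * file2 is the master here, so the 1:1 merge fails with r(459)
    capture noisily merge 1:1 appln_id using `file1'
    ```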

  • #2
    The identifier must uniquely identify observations in at least one of the datasets. If this holds for exactly one dataset, then either

    Code:
    merge 1:m
    or

    Code:
    merge m:1
    will work, depending on whether the identifier is unique in the master data (merge 1:m) or in the using data (merge m:1). merge 1:1 only works if the identifier uniquely identifies observations in both datasets. If it uniquely identifies observations in neither dataset, then

    1. You have duplicate observations, which is a data error; or
    2. The repetition is intended, in which case see

    Code:
    help joinby
    to see how to combine both datasets to form all pairwise combinations of observations that share the same identifier. In either case, you can run


    Code:
    bysort appln_id: gen tag = _N > 1
    list if tag, sepby(appln_id)
    to see why the identifier appears more than once in a dataset. You may have panel data (repeated observations of the same individuals), in which case the unique identifier is not a single variable but a combination of variables, e.g., id and year. In that case, merge requires you to specify both:

    Code:
    merge m:1 id year using ...
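    As a sketch of the panel case, assume a hypothetical master with several quarters per id-year and a using file with one row per id-year. The variables quarter, sales and price, and all values, are invented for illustration; run it as one do-file.

    ```stata
    * Hypothetical master: several quarters per id-year, so id year repeats
    clear
    input int id int year byte quarter float sales
    1 2000 1 5
    1 2000 2 6
    1 2001 1 7
    2 2000 1 8
    end
    tempfile master
    save `master'

    * Hypothetical using data: exactly one row per id-year
    clear
    input int id int year float price
    1 2000 1.5
    1 2001 1.6
    2 2000 2.5
    end
    isid id year          // stops with r(459) if id year were not unique here
    tempfile annual
    save `annual'

    * Many rows per id-year in the master, one in the using file
    use `master', clear
    merge m:1 id year using `annual'
    tabulate _merge
    ```

    Here id alone repeats in both files, so only the pair id year can serve as the key on the "1" side.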
    Apart from the documentation of merge

    Code:
    help merge
    a useful command here is isid; see

    Code:
    help isid
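    Applied to the situation in #1, a sketch with hypothetical toy data: check with isid in which file appln_id is unique, then choose the merge direction accordingly. As in the command from #1, file2 is the master (the "many" side) and file1 the using file (the "one" side); a tempfile stands in for the real file1, and all values are invented.

    ```stata
    * Hypothetical file1: confirm appln_id is unique before saving
    clear
    input long appln_id x
    1 10
    2 20
    3 30
    end
    isid appln_id                 // no error, so file1 can be the "1" side
    tempfile file1
    save `file1'

    * Hypothetical file2: appln_id repeats
    clear
    input long appln_id a
    1 100
    1 101
    2 200
    3 300
    3 301
    end
    capture isid appln_id         // _rc is 459 because appln_id repeats
    display "appln_id unique in file2? " cond(_rc == 0, "yes", "no")

    * Many rows per appln_id in the master, one in the using file
    merge m:1 appln_id using `file1'
    tabulate _merge
    ```

    If appln_id were not unique in file1 either, merge m:1 would also stop with r(459), and you would be back to the two cases listed above.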
