Dear all,
I am quite new to Stata, and I am trying to merge two datasets, but some observations stay unmatched and are classified as "Duplicated observation".
In my master dataset, I have groups (entryid), and within each group there are several country (countryid)-year (lmyear_n) observations. My using dataset contains information on these country-year observations, and I want to add it to the master. Everything works, apart from the "Duplicated observations" (_merge==1), which are not really duplicates because they are unique within each group of the master. For these observations, Stata does not add the information I want from the using file and leaves only a missing value (".").
The code I use is: merge m:1 lmyear_n countryid using "GCI_Hostcountries_4dim_v9.dta"
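For readers less familiar with `merge`, here is a minimal, self-contained sketch on made-up toy data. The tempfile names `mfile` and `gci` and all values are hypothetical, not taken from the files above. It mimics the setup described: two groups share the same country-year keys, and the using data has no row for one country. In Stata's coding of `_merge`, 1 means the observation was found only in the master, 2 only in the using, and 3 in both.

```stata
* Toy sketch with hypothetical data: entries 1 and 2 share the
* country-year keys, and the using data has no row for country 5
clear
input float(entryid lmyear_n countryid)
1 2017 1
1 2017 5
2 2017 1
2 2017 5
end
tempfile mfile
save "`mfile'"

* Hypothetical using data: one row per country-year (unique on the keys)
clear
input float(lmyear_n countryid) double(gci_est)
2017 1 3.5
end
tempfile gci
save "`gci'"

* Many-to-one merge on the country-year keys
use "`mfile'", clear
merge m:1 lmyear_n countryid using "`gci'"

* Rows for country 5 have no partner in the using data, so they get
* _merge==1 and a missing gci_est; rows for country 1 get _merge==3
tab countryid _merge
```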
I then tried to create a using file with group-country-year observations, but it does not work: I get the same "Duplicated observation" for the same observations, which are not real duplicates because they are unique within groups. Here I used the code: merge m:1 entryid lmyear_n countryid using "GCI_Hostcountries_4dim_v9_expanded.dta"
So my questions are: why do I get "Duplicated observation"? And how can I overcome this? I cannot delete the "Duplicated observations" because they are not duplicated within groups (entryid).
For information: I also tried joinby (but did not manage to make it work) and merge m:m (the same problem arises, and m:m is widely discouraged). I also sorted the master data by group.
Any suggestion is more than welcome! Thank you very much in advance!
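Whether rows are true duplicates on a given set of merge keys can be checked directly with `duplicates report` and `isid`. Below is a hedged sketch on hypothetical toy data laid out like the master: country-year alone repeats across groups, whereas group-country-year identifies every row. The data values are illustrative only.

```stata
* Toy master-like data (hypothetical values): two groups, same countries and year
clear
input float(entryid lmyear_n countryid)
1 2017 1
1 2017 5
2 2017 1
2 2017 5
end

* Country-year repeats across groups: each key combination appears twice
duplicates report lmyear_n countryid

* Group-country-year identifies every row; isid stops with an error otherwise
isid entryid lmyear_n countryid

* isid on country-year alone fails here, so capture the return code
capture isid lmyear_n countryid
display "isid lmyear_n countryid returned _rc = " _rc
```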
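Since `joinby` came up: it pairs each master observation with every using observation that shares the key values and, by default, drops observations without a partner. A minimal sketch on hypothetical toy data follows. The option `unmatched(master)` keeps master rows that have no partner and, when specified, also creates a `_merge` variable recording the outcome.

```stata
* Hypothetical using data: one row per country-year, no row for country 5
clear
input float(lmyear_n countryid) double(gci_est)
2017 1 3.5
end
tempfile gci
save "`gci'"

* Hypothetical master: two groups sharing the same country-year keys
clear
input float(entryid lmyear_n countryid)
1 2017 1
1 2017 5
2 2017 1
2 2017 5
end

* Pair every master row with the using rows that share its keys;
* unmatched(master) keeps master rows that have no partner
joinby lmyear_n countryid using "`gci'", unmatched(master)
sort entryid countryid
list
```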
Ilaria
Below is an example of what I get (in _merge, 1 = "Duplicated observation"):
clear
input float(entryid lmyear_n countryid) double(gci_est gci_pp gci_rpm gci_wda) byte _merge
1 2017 1 . . . . 1
1 2017 2 3.3571429249999998 3.444444418 3.868686914 3.696969748 3
1 2017 3 3.832371473 4.46684742 4.406903267 4.066738605 3
1 2017 4 3.1862185 3.325876951 2.903080225 3.086344481 3
1 2017 5 . . . . 1
1 2017 6 2.64601779 2.345132828 2.333333254 2.739130497 3
1 2017 7 . . . . 1
1 2017 8 . . . . 1
1 2017 9 3.755302429 3.348157644 4.520998001 4.363435745 3
1 2017 10 3.626057863 4.069918633 4.216534615 3.898905277 3
1 2017 11 4.150473595 4.806335449 4.579015732 4.464984417 3
1 2017 12 3.568708181 3.797599792 3.57373786 3.393399 3
1 2017 13 4.937436104 4.531143665 6.079502106 5.690037727 3
1 2017 14 . . . . 1
1 2017 15 4.711019993 4.712504864 4.790915966 4.673634052 3
1 2017 16 3.322090626 3.761115551 3.915687323 3.750954866 3
1 2017 17 4.198631784867267 3.5236213995485888 4.576725159859171 3.845379157936559 3
1 2017 18 5.121985912 4.520885944 5.811362267 5.550701618 3
2 2012 1 . . . . 1
2 2012 2 3.064055829896907 3.3552341221649487 3.2863143329896904 3.984685267525773 3
2 2012 3 4.361998715625 4.360532275 4.1845177875 3.9375299562499997 3
2 2012 4 2.5709801416666664 2.3843297249999997 2.278467695833333 1.8959146333333334 3
2 2012 5 . . . . 1
2 2012 6 3.6628700703125006 3.2716203265625 2.8106725968750004 2.2806640218750003 3
2 2012 7 . . . . 1
2 2012 8 . . . . 1
2 2012 9 3.846512918918919 2.7605287027027026 4.428788135135135 3.5951550270270274 3
2 2012 10 3.6380153644171775 4.394995083435583 3.925456224539877 3.2193912392638033 3
2 2012 11 4.078443947643979 4.60922019895288 3.783767640837696 3.757755537172775 3
2 2012 12 3.057318725 3.6237620999999995 3.5452239125 3.11159975 3
2 2012 13 4.6110981142857135 3.7754580857142854 5.9300087999999995 5.0241026857142845 3
2 2012 14 . . . . 1
2 2012 15 4.640976586206896 4.416021344827586 5.039093303448276 4.046210551724138 3
2 2012 16 3.0197708677419355 3.5288765322580646 3.5142787290322577 2.8206980451612904 3
2 2012 17 4.5774431009174315 3.690304073394495 5.183563392660551 4.18609726146789 3
2 2012 18 4.830055855629139 3.6365744953642385 5.426200631788079 4.958060577483444 3
2 2012 19 3.248048 3.691751 3.568578 3.124173 3
end