reshape long: variable id does not uniquely identify the observations - 2 identifiers

Maurice Nikolaus

Join Date: Mar 2022
Posts: 6


24 Sep 2022, 07:51

Hello everyone,

I am working with the Ease of Doing Business Report by the World Bank. My data is currently in the wide format and resembles this (the values are random; this is just to replicate the structure of the data):

country	indicatorname	YR2004	YR2005	YR2006
Afghanistan	Starting a business	82	83	83
Afghanistan	Enforcing Contracts	16	18	19
Afghanistan	Trading across borders	20	21	19
Afghanistan	Ease of Doing Business	17	8	19
Albania	Starting a business	99	95	91
Albania	Enforcing Contracts	21	23	25
Albania	Trading across borders	27	28	28
Albania	Ease of Doing Business	24	25	24

I want to reshape it to the long format (one row per country and year) so it looks something like this:

country	YR	Starting a business	Enforcing Contracts	Trading across borders	Ease of Doing Business
Afghanistan	YR2004	82	16	20	17
Afghanistan	YR2005	83	18	21	8
Afghanistan	YR2006	83	19	19	19
Albania	YR2004	99	21	27	24
Albania	YR2005	95	23	28	25
Albania	YR2006	91	25	28	24

With

Code:

reshape long YR indicatorname, i(country) j(j)

I receive the error "variable id does not uniquely identify the observations".
To be honest, I have not really grasped the syntax of reshape.
Does anyone have suggestions on how to do this, and maybe some pointers on how reshape is structured?

Best,

Maurice


Clyde Schechter

Join Date: Apr 2014
Posts: 30101

24 Sep 2022, 11:11

You need

Code:

reshape long YR, i(country indicatorname) j(year)
rename YR value // OR SOME OTHER NAME THAT REFLECTS WHAT THOSE NUMBERS REALLY ARE
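
For anyone who wants to try this on the example data from #1, here is a minimal, self-contained sketch. The values are the illustrative ones posted above, and `value` is only a placeholder name. The final `reshape wide` step, with the helper variable `ind` (a name made up here), is an assumption about how to get one column per indicator as in the layout requested in #1; it is optional and not part of the two lines above.

```stata
clear
input str11 country str22 indicatorname YR2004 YR2005 YR2006
"Afghanistan" "Starting a business"     82 83 83
"Afghanistan" "Enforcing Contracts"     16 18 19
"Afghanistan" "Trading across borders"  20 21 19
"Afghanistan" "Ease of Doing Business"  17  8 19
"Albania"     "Starting a business"     99 95 91
"Albania"     "Enforcing Contracts"     21 23 25
"Albania"     "Trading across borders"  27 28 28
"Albania"     "Ease of Doing Business"  24 25 24
end

* i() must uniquely identify the rows of the wide data; isid stops with an error if not
isid country indicatorname

* One row per country, indicator and year; j() collects the numeric suffix of YR*
reshape long YR, i(country indicatorname) j(year)
rename YR value

* Optional (assumption): spread the indicators into columns, one row per country-year
generate ind = strtoname(indicatorname)   // legal variable-name suffixes
drop indicatorname                        // not constant within country-year
reshape wide value, i(country year) j(ind) string
list, sepby(country)
```

In `reshape long`, `i()` names the variables that identify a row before reshaping, and `j()` names the new variable that will hold the suffixes stripped from the `YR*` variable names.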


Maurice Nikolaus

Join Date: Mar 2022
Posts: 6

26 Sep 2022, 14:05

Hello Clyde,

thank you for your time. I tried it out, but I still get the following error (the same as before, essentially):

Code:

. reshape long YR, i(country indicatorname) j(year)
(j = 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020)
variable id does not uniquely identify the observations
    Your data are currently wide. You are performing a reshape long. You specified i(country indicatorname) and j(year). In the current
    wide form, variable country indicatorname should uniquely identify the observations. Remember this picture:

         long                                wide
        +---------------+                   +------------------+
        | i   j   a   b |                   | i   a1 a2  b1 b2 |
        |---------------| <--- reshape ---> |------------------|
        | 1   1   1   2 |                   | 1   1   3   2  4 |
        | 1   2   3   4 |                   | 2   5   7   6  8 |
        | 2   1   5   6 |                   +------------------+
        | 2   2   7   8 |
        +---------------+
    Type reshape error for a list of the problem observations.
r(9);

.


Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#4

26 Sep 2022, 18:20

First, let's note that in your example data, this problem does not arise. The code given in #2 produces the -reshape- you want with the example data. So there is something different about your actual data from the example given.

To troubleshoot this, the first step will be:

Code:

duplicates tag country indicatorname, gen(flag)
browse if flag

This will show you the observations that Stata is having problems with. Study them carefully: these are situations where the same country and indicator appear twice in the data. Now examine them to determine whether these observations agree on all of the variables in the data set, or only on some.

If they agree on all of the variables in the data set, then there is something wrong with the data management that created your data set: you should not have exact duplicate observations. Either some variable that distinguishes these observations has been omitted from the data set (e.g. maybe there should be a variable distinguishing regions within the country), or something went wrong when other data sets were combined to produce this one so that some observations crept in more than once. Either way, you need to go back, fix this problem, and create a corrected dataset.

If these otherwise duplicate observations disagree on some variable, there are two possibilities. The variable(s) on which they disagree may, in fact, be just what you need to come up with unique identification of observations. For example, if one of the variables on which they disagree is a region within country, or if it distinguishes different sources of the responses, then adding that (those) variable(s) to the -i()- option may well solve your problem. If, however, the variables on which they disagree are not of this nature, for example, if they represent conflicting values within the YR variables, then you have a very serious problem: your data are self-contradictory. You then have to go back to the original sources of these data to determine which of the conflicting observations (if any) is correct, and remove the incorrect ones, or determine some other way to reconcile the inconsistencies.
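
The two cases described above can be told apart with a second `duplicates tag` over all variables. The following is only a sketch on invented toy data (one exact duplicate and one conflicting pair, both made up for illustration); `flag` and `exact` are arbitrary variable names.

```stata
clear
input str11 country str22 indicatorname YR2004 YR2005 YR2006
"Afghanistan" "Starting a business" 82 83 83
"Afghanistan" "Enforcing Contracts" 16 18 19
"Albania"     "Starting a business" 99 95 91
"Albania"     "Starting a business" 99 95 91
"Albania"     "Enforcing Contracts" 21 23 25
"Albania"     "Enforcing Contracts" 21 24 25
end

* flag > 0: the same country and indicator appear more than once
duplicates tag country indicatorname, generate(flag)

* exact > 0: the observation has a copy that agrees on every variable
duplicates tag, generate(exact)

* Exact duplicates: a data management problem upstream
list country indicatorname YR* if exact > 0, sepby(country indicatorname)

* Same identifiers, different values elsewhere: inspect what differs
list country indicatorname YR* if flag > 0 & exact == 0, sepby(country indicatorname)
```

Whether the second group points to a missing identifier or to contradictory data still has to be judged by looking at the variables that differ.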
Maurice Nikolaus

Join Date: Mar 2022

Posts: 6
#5

04 Oct 2022, 13:46

Hello Clyde,

thank you very much for the continued support!
I have followed your suggestion and checked the duplicates. It turned out that at the end of the dataset there were some empty observations, which I had failed to notice and remove while preparing the data. With those removed, it worked as you described!
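
For anyone who runs into the same thing, a rough sketch of the fix on toy data. It assumes that "empty" means rows where both identifier variables are blank; the two blank rows below are made up to imitate that.

```stata
clear
input str11 country str22 indicatorname YR2004 YR2005 YR2006
"Afghanistan" "Starting a business" 82 83 83
"Afghanistan" "Enforcing Contracts" 16 18 19
""            ""                     .  .  .
""            ""                     .  .  .
end

* The blank rows share the same (empty) identifiers, so they count as duplicates
drop if missing(country) & missing(indicatorname)

* Confirm that the identifiers are now unique before reshaping
isid country indicatorname
reshape long YR, i(country indicatorname) j(year)
```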