  • reshape long: variable id does not uniquely identify the observations - 2 identifiers

    Hello everyone,

    I am working with the Ease of Doing Business Report by the World Bank. My data is currently in wide format and resembles this (values are random; this is just to replicate the structure of the data):
    country      indicatorname           YR2004  YR2005  YR2006
    Afghanistan  Starting a business         82      83      83
    Afghanistan  Enforcing Contracts         16      18      19
    Afghanistan  Trading across borders      20      21      19
    Afghanistan  Ease of Doing Business      17       8      19
    Albania      Starting a business         99      95      91
    Albania      Enforcing Contracts         21      23      25
    Albania      Trading across borders      27      28      28
    Albania      Ease of Doing Business      24      25      24
    I want to reshape it to the long format so it looks something like this:
    country      YR      Starting a business  Enforcing Contracts  Trading across borders  Ease of Doing Business
    Afghanistan  YR2004                   82                   16                      20                      17
    Afghanistan  YR2005                   83                   18                      21                       8
    Afghanistan  YR2006                   83                   19                      19                      19
    Albania      YR2004                   99                   21                      27                      24
    Albania      YR2005                   95                   23                      28                      25
    Albania      YR2006                   91                   25                      28                      24
    With
    Code:
    reshape long YR indicatorname, i(country) j(j)
    I receive the error that "variable id does not uniquely identify the observations".
    I am not really grasping the concept of the syntax for reshape, to be honest.
    Does anyone have suggestions on how to do this, and maybe some pointers on how reshape is structured?

    Best,

    Maurice

  • #2
    You need
    Code:
    reshape long YR, i(country indicatorname) j(year)
    rename YR value // OR SOME OTHER NAME THAT REFLECTS WHAT THOSE NUMBERS REALLY ARE
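
    This leaves one row per country, indicator, and year. If the layout shown in #1 is the goal (one column per indicator), a second reshape is one way to get there. What follows is a minimal sketch on the example data from #1, not run against the real file. The strtoname() step is an assumption added here because indicator names containing spaces cannot be used as variable-name suffixes, and it presumes that each cleaned name plus the "value" prefix stays within Stata's 32-character limit for variable names.

    ```stata
    * Example data from #1 (values are illustrative)
    clear
    input str11 country str22 indicatorname YR2004 YR2005 YR2006
    "Afghanistan" "Starting a business"    82 83 83
    "Afghanistan" "Enforcing Contracts"    16 18 19
    "Afghanistan" "Trading across borders" 20 21 19
    "Afghanistan" "Ease of Doing Business" 17  8 19
    "Albania"     "Starting a business"    99 95 91
    "Albania"     "Enforcing Contracts"    21 23 25
    "Albania"     "Trading across borders" 27 28 28
    "Albania"     "Ease of Doing Business" 24 25 24
    end

    * Step 1: years go from columns to rows
    reshape long YR, i(country indicatorname) j(year)
    rename YR value

    * Step 2: indicators go from rows to columns.
    * strtoname() turns each name into a legal variable-name suffix.
    replace indicatorname = strtoname(indicatorname)
    reshape wide value, i(country year) j(indicatorname) string

    * Drop the "value" prefix from the new variable names
    rename value* *
    list, sepby(country)
    ```

    With longer indicator names in the real data, encoding indicatorname to a numeric variable and using that in j() may be the safer route.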


    • #3
      Hello Clyde,

      thank you for your time. I tried it out, but I still get the following error (the same as before, essentially):
      Code:
      . reshape long YR, i(country indicatorname) j(year)
      (j = 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020)
      variable id does not uniquely identify the observations
          Your data are currently wide. You are performing a reshape long. You specified i(country indicatorname) and j(year). In the current
          wide form, variable country indicatorname should uniquely identify the observations. Remember this picture:
      
               long                                wide
              +---------------+                   +------------------+
              | i   j   a   b |                   | i   a1 a2  b1 b2 |
              |---------------| <--- reshape ---> |------------------|
              | 1   1   1   2 |                   | 1   1   3   2  4 |
              | 1   2   3   4 |                   | 2   5   7   6  8 |
              | 2   1   5   6 |                   +------------------+
              | 2   2   7   8 |
              +---------------+
          Type reshape error for a list of the problem observations.
      r(9);
      
      .


      • #4
        First, let's note that in your example data, this problem does not arise. The code given in #2 produces the -reshape- you want with the example data. So there is something different about your actual data from the example given.

        To troubleshoot this, the first step will be:
        Code:
        duplicates tag country indicatorname, gen(flag)
        browse if flag
        This will show you the observations that Stata is having problems with. Study them carefully: these are situations where the same country and indicator appear twice in the data. Now examine them to determine whether these observations agree on all of the variables in the data set, or only on some.

        If they agree on all of the variables in the data set, then there is something wrong with the data management that created your data set: you should not have exact duplicate observations. Either some variable that distinguishes these observations has been omitted from the data set (e.g. maybe there should be a variable distinguishing regions within the country), or something went wrong when other data sets were combined to produce this one so that some observations crept in more than once. Either way, you need to go back, fix this problem, and create a corrected dataset.

        If these otherwise duplicate observations disagree on some variable, there are two possibilities. The variable(s) on which they disagree may, in fact, be just what you need to come up with unique identification of observations. For example, if one of the variables on which they disagree is a region within country, or if it distinguishes different sources of the responses, then adding that (those) variable(s) to the -i()- option may well solve your problem. If, however, the variables on which they disagree are not of this nature, for example, if they represent conflicting values within the YR variables, then you have a very serious problem: your data are self-contradictory. You then have to go back to the original sources of these data to determine which of the conflicting observations (if any) is correct, and remove the incorrect ones, or determine some other way to reconcile the inconsistencies.
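
        The comparison described above can be sketched roughly as follows. This is a minimal sketch that assumes the variable names from #1; dup_id is a hypothetical name chosen here so that it does not clash with flag.

        ```stata
        * Surplus observations with respect to the intended identifier
        duplicates report country indicatorname

        * Surplus observations that are identical on every variable
        duplicates report

        * Flag groups sharing an identifier and list them side by side
        duplicates tag country indicatorname, gen(dup_id)
        sort country indicatorname
        list if dup_id > 0, sepby(country indicatorname)

        * Exits with an error while the identifier is not unique
        * (or, without the missok option, contains missing values)
        isid country indicatorname
        ```

        If the two reports show the same number of surplus observations, the duplicates are exact copies. If the second report shows fewer, some of them disagree on at least one variable.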


        • #5
          Hello Clyde,

          thank you very much for the continued support!
          I have followed your suggestion and checked the duplicates. It turned out that at the end of the dataset, there were some empty observations which I failed to notice and remove while preparing the data. It now worked as you described!
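
          For anyone who runs into the same thing, a cleanup along these lines is one possibility. It is only a sketch, and it assumes that the stray rows have an empty country value; the actual empty rows may need a different condition.

          ```stata
          * Assumption: the stray rows at the end have no country value
          count if missing(country)
          drop if missing(country)

          * Confirm the identifier is now unique, then reshape as in #2
          isid country indicatorname
          reshape long YR, i(country indicatorname) j(year)
          rename YR value
          ```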
