Reshape Command showing "variable id does not uniquely identify the observations" error

Arvind Sharma

Join Date: Jan 2016

Posts: 89
#1

Reshape Command showing "variable id does not uniquely identify the observations" error

12 Jan 2016, 19:41

I am using Stata 14.1 on Windows 10, and am trying to convert a data in wide format into long format.

After importing the data from Excel file to Stata, I have successfully converted the data from strings to real numbers so that I can run operations on them. Then, I have been able to confirm that the data is indeed in wide format by using the command "reshape wide" which displays 'already wide'.

However, when I try to use the command "reshape long V, i(DES) j(time)" where V is the stem for years variable (named V8, V9,...V53), DES is variable with approximately 9 categories of income sources, and time is just a name for years (no such variable defined), I get the following error-

(note: j = 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53)
variable id does not uniquely identify the observations
Your data are currently wide. You are performing a reshape long. You specified i(DES) and j(time). In the current wide form, variable DES should uniquely
identify the observations. Remember this picture:

long wide
+---------------+ +------------------+
| i j a b | | i a1 a2 b1 b2 |
|---------------| <--- reshape ---> |------------------|
| 1 1 1 2 | | 1 1 3 2 4 |
| 1 2 3 4 | | 2 5 7 6 8 |
| 2 1 5 6 | +------------------+
| 2 2 7 8 |
+---------------+
Type reshape error for a list of the problem observations.

Typing the "reshape error" command shows the following output-

reshape error
(note: j = 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53)

i (DES) indicates the top-level grouping such as subject id.

The data are currently in the wide form; there should be a single
observation per i.

1674 of 1674 observations have duplicate i values:

+--------------------------------------------------------------------+
| DES |
|--------------------------------------------------------------------|
1. | Employer contributions for employee pension and insurance funds 6/ |
2. | Employer contributions for employee pension and insurance funds 6/ | ....

53. | Employer contributions for employee pension and insurance funds 6/ |
54. | Employer contributions for employee pension and insurance funds 6/ |
55. | Employer contributions for government social insurance |
|--------------------------------------------------------------------|
56. | Employer contributions for government social insurance |
57. | Employer contributions for government social insurance |
58. | Employer contributions for government social insurance | ...

I suspect there is an error because the variable DES should have dummy categories, but I am not sure how to proceed further.

Any comments and suggestions would be welcome. Thank you.

Attached Files

CA30_1969_2014_AK.csv (621.5 KB, 2 views)

Reshape Command on Economic Profile Data.do (3.5 KB, 2 views)

Reshape Error.smcl (205.4 KB, 2 views)
Tags: None
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#2

12 Jan 2016, 20:36

Stata's error message gives you the reason: DES does not uniquely identify observations, and so by itself cannot be used as the id for reshape. Try something like the following.

.ÿversionÿ14.1

.ÿ
.ÿclearÿ*

.ÿsetÿmoreÿoff

.ÿ
.ÿimportÿdelimitedÿCA30_1969_2014_AK.csv
(53ÿvars,ÿ1678ÿobs)

.ÿ
.ÿquietlyÿdropÿinÿ1675/1678

.ÿdropÿgeofipsÿregionÿtableÿlinecodeÿindustryclassification

.ÿ
.ÿforeachÿvarÿofÿvarlistÿv*ÿ{
ÿÿ2.ÿÿÿÿÿÿÿÿÿlocalÿyearÿ:ÿvariableÿlabelÿ`var'
ÿÿ3.ÿÿÿÿÿÿÿÿÿrenameÿ`var'ÿv`year'
ÿÿ4.ÿ}

.ÿ
.ÿisidÿgeonameÿdescription

.ÿencodeÿdescription,ÿgenerateÿ(des)

.ÿencodeÿgeoname,ÿgenerateÿ(geo)

.ÿdropÿdescriptionÿgeoname

.ÿ
.ÿquietlyÿreshapeÿlongÿv,ÿi(geoÿdes)ÿj(year)

.ÿ
.ÿquietlyÿreplaceÿvÿ=ÿ".l"ÿifÿvÿ==ÿ"(L)"

.ÿquietlyÿreplaceÿvÿ=ÿ".n"ÿifÿvÿ==ÿ"(NA)"

.ÿdestringÿv,ÿreplace
vÿhasÿallÿcharactersÿnumeric;ÿreplacedÿasÿlong
(37524ÿmissingÿvaluesÿgenerated)

.ÿ
.ÿlistÿinÿ1/2,ÿnoobs

ÿÿ+---------------------------------------------------------------------------------------------------------+
ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿgeoÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿdesÿÿÿyearÿÿÿÿÿÿÿÿvÿ|
ÿÿ|---------------------------------------------------------------------------------------------------------|
ÿÿ|ÿAlaskaÿstateÿtotalÿÿÿEmployerÿcontributionsÿforÿemployeeÿpensionÿandÿinsuranceÿfundsÿ6/ÿÿÿ1969ÿÿÿ143940ÿ|
ÿÿ|ÿAlaskaÿstateÿtotalÿÿÿEmployerÿcontributionsÿforÿemployeeÿpensionÿandÿinsuranceÿfundsÿ6/ÿÿÿ1970ÿÿÿ168957ÿ|
ÿÿ+---------------------------------------------------------------------------------------------------------+

.ÿ
.ÿexit

endÿofÿdo-file

.
3 likes
Comment
Arvind Sharma

Join Date: Jan 2016

Posts: 89
#3

16 Jan 2016, 14:25

Thank you Joseph, that did solve the problem.
Comment
Bettna Leibetseder

Join Date: Jun 2018

Posts: 4
#4

18 Jun 2018, 02:40

Dear all,
I have a similar problem, as I cannot solve the following merger: (I double checked, but the id is unique).
reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id) j(vignr)
(note: j = 1 2 3 4 5 6 7 8 9)
variable id does not uniquely identify the observations
Your data are currently wide. You are performing a reshape long. You specified i(id) and j(vignr). In the current wide form, variable id should uniquely
identify the observations. Remember this picture:

long wide
+---------------+ +------------------+
| i j a b | | i a1 a2 b1 b2 |
|---------------| <--- reshape ---> |------------------|
| 1 1 1 2 | | 1 1 3 2 4 |
| 1 2 3 4 | | 2 5 7 6 8 |
| 2 1 5 6 | +------------------+
| 2 2 7 8 |
+---------------+
Type reshape error for a list of the problem observations.

My do file is the following:

/Merger Vignettes

// #0
// program setup

clear all
macro drop _all
version 12.1
set more off

// #1
// load dataset with respondent data

cd "XX"
use "spss_probe.dta" , clear

// #2
// sort and save data

sort deck
save myfile,replace

// #4
// load dataset with the vignette setup

use "vigid.dta"
drop id_vig vignette
order id_vignette, last

// #5
// reshape data to wide format and sort data

reshape wide alter - id_vignette, i(deck) j(vignr)
sort deck

// #6
// merge both datasets, generate a numeric id (is good for some analyses) and reshape data to long format

merge deck using myfile
drop _merge
gen id_numeric = _n
reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id) j(vignr)
order id id_numeric, first
I have added the two datasets.

I would appreciate any assistance - thank you!
Bettina
vigid.dta
spss_probe.dta

Attached Files

spss_probe.dta (138.7 KB, 1 view)
Comment

Jorrit Gosens

Join Date: Jan 2015
Posts: 1019

18 Jun 2018, 03:06

There's no duplicates in the spss_probe dataset, but when you merge, you end up with a number of missing values for id. These are also considered duplicates.

Code:

. duplicates report id

Duplicates in terms of id

--------------------------------------
   copies | observations       surplus
----------+---------------------------
        1 |            7             0
       11 |           11            10
--------------------------------------

. order id, first

. list id-alter1

     +--------------------+
     | id   deck   alter1 |
     |--------------------|
  1. |  .      1       20 |
  2. |  .      2       20 |
  3. |  .      3       20 |
  4. |  .      4       20 |
  5. | 10      5       20 |
     |--------------------|
  6. | 24      6       20 |
  7. |  .      7       20 |
  8. |  .      8       20 |
  9. |  .      9       20 |
 10. | 18     10       20 |
     |--------------------|
 11. | 19     11       20 |
 12. |  .     12       20 |
 13. |  .     13       20 |
 14. | 11     14       20 |
 15. |  .     15       20 |
     |--------------------|
 16. |  .     16       20 |
 17. | 29     11       20 |
 18. |  4     11       20 |
     +---------------

You may want to drop all observations with missing id's.
Alternatively, you may want to load the spss_probe.dta into memory, merge with vigid, and drop the observations that have _merge==2, i.e., those that are only available in the using dataset (this would give the result you seem to expect)

Comment

Bettna Leibetseder

Join Date: Jun 2018

Posts: 4
#6

18 Jun 2018, 04:35

Thank you Jorrit for your quick reply.
As I am quite new to stata, I do understand, why I need to drop those obersvations that are not connected to an ID. However, I do not understand how to drop the oberservations with _merge==2.
Could you please specify the command?

Bettina
Comment
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#7

18 Jun 2018, 05:41

As to why you need to drop the observations with ID missing, this is because you have more than one ID missing, and Stata has no way of reshaping long such that the new list of observations is uniquely identified by a combination of earlier ID variable and the list of variables that were previously wide. This is probably not too helpful but difficult to put in words. The error message that Stata gives when a reshape cannot be done is probably the best way to explain it....

As for a fix for your problem, answering this question properly also requires some understanding of the data transformation you are trying to do, and that is not very clear to me just yet. It helps that you attach your data, but still guessing a little bit what you want to end up with.
One possibility, however, is that you wanted/should do is use your varibale id_numeric in your reshape, i.e.:

Code:

reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id_numeric) j(vignr)

As for dropping observations, that would be:

Code:

drop if _merge==?

where ? depends on what dataset is your master and which is your using dataset. Make sure to understand the values of the _merge variable by looking into help merge
1 like
Comment
Bettna Leibetseder

Join Date: Jun 2018

Posts: 4
#8

18 Jun 2018, 08:23

Thank you for patience!
There are 16 possible decks of vignettes with 9 vignettes each in the first file. I only put in some data for the respondents, who each just answer one deck randomely , to check the merging procedure (7 respondents).

I rewrote the do file as follows and it worked:
/ #6
// merge both datasets, generate a numeric id (is good for some analyses) and reshape data to long format

merge deck using myfile
drop if _merge==1
gen id_numeric = _n
reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id) j(vignr)
order id id_numeric, first
Comment
Bettna Leibetseder

Join Date: Jun 2018

Posts: 4
#9

28 Aug 2018, 04:32

Dear Jorrit,
sorry to bother you gain - I now got my final data and the merger does not work. And I simply cannot figure out any difference in the trial and final data sets.
My syntax is as follows:
// #0
// program setup

clear all
macro drop _all
version 12.1
set more off

// #1
// load dataset with respondent data

cd "?"
use "data_final1.dta" , clear

// #2
// sort and save data

sort deck
save myfile,replace

// #4
// load dataset with the vignette setup

use "vigid.dta"
order id_vignette, last
drop id_vig vignette

// #5
// reshape data to wide format and sort data

reshape wide alter - id_vignette, i(deck) j(vignr)
sort deck

// #6
// merge both datasets, generate a numeric id (is good for some analyses) and reshape data to long format
//first trial
merge deck using myfile
drop _merge
gen id_numeric = _n
reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_ , i(id) j(vignr)
order id id_numeric, first
//second trial
merge deck using myfile
drop if _merge==1
gen id_numeric = _n
reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id) j(vignr)
order id id_numeric, first
///third trial
merge deck using myfile
drop if _merge==2
gen id_numeric = _n
reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_, i(id) j(vignr)
order id id_numeric, first
I appreciate your help a lot!

Attached Files

data_final1.dta (20.6 KB, 1 view)

vigid.dta (7.8 KB, 2 views)
Comment
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#10

28 Aug 2018, 05:07

You probably wanted:

Code:

reshape long alter ausbildung interesse familie deutsch lernen selbst taetig versicherung dauer anforderung id_vignette vig_ , i(id_numeric) j(vignr)

The issue being that there are duplicates in terms of your id variable
Comment
Jim Callegari

Join Date: Jan 2021

Posts: 35
#11

10 Jan 2021, 13:58

Dear All,

It looks like I'm having a similar problem as well. I have tried to:

Code:

reshape long a, i(combined) j(prod, string)

But I am met with the error:

Code:

variable id does not uniquely identify the observations Your data are currently wide. You are performing a reshape long. You specified i(combined) and j(prod). In the current wide form, variable combined should uniquely identify the observations. Remember this picture: long wide +---------------+ +------------------+ | i j a b | | i a1 a2 b1 b2 | |---------------| <--- reshape ---> |------------------| | 1 1 1 2 | | 1 1 3 2 4 | | 1 2 3 4 | | 2 5 7 6 8 | | 2 1 5 6 | +------------------+ | 2 2 7 8 | +---------------+ Type reshape error for a list of the problem observations. r(9);

My goal of this code is to create a "prod" variable with all of the variable names (such as "White bread, 1 kg (supermarket)") inputted as individual entries underneath the "prod" variable. The reason I have 2 rows which (one of which has variable names aa, ab, ac, etc. and the other with the actual product names themselves, "White bread, 1 kg (supermarket)," is due to STATA's 32 character limit. It was causing me issues, so I tried to work around it by reshaping with an additional row.

I would also like to make another variable "store_type" in which it either says "supermarket" or "mid-priced store" and no longer having this information captured in the individual data entires themselves. So, to clarify, in the end I would have variables: year, country, city, currency, prod, and store_type. Under "prod" it would display "White bread, 1 kg," for example, and under "store_type" it would show either "supermarket" or "mid-priced store."

I'm sorry if this is a simple question--I'm just becoming more familiar with STATA and appreciate any and all suggestions you all may have.

Many thanks,
Jim
5EIU Data.xlsx
Submit EIU clean.do
EIU submit attempt.dta
Attached Files

5EIU Data.xlsx (73.1 KB, 1 view)

Submit EIU clean.do (279 Bytes, 1 view)

EIU submit attempt.dta (1.61 MB, 1 view)

Last edited by Jim Callegari; 10 Jan 2021, 14:01.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#12

10 Jan 2021, 15:51

Post #11 was subsequently reposted into its own topic and responded to at

https://www.statalist.org/forums/for...e-observations

Last edited by William Lisowski; 10 Jan 2021, 15:56.
Comment
Yevu Yayra

Join Date: Oct 2020

Posts: 3
#13

05 Jun 2021, 22:47

Please I need a help on this. Iam trying to reshape the data

reshape long yr,i(c_id s_id) j(year)
(note: j = 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019)
variable id does not uniquely identify the observations
Your data are currently wide. You are performing a reshape long. You specified i(c_id s_id) and
j(year). In the current wide form, variable c_id s_id should uniquely identify the observations.
Remember this picture:

long wide
+---------------+ +------------------+
| i j a b | | i a1 a2 b1 b2 |
|---------------| <--- reshape ---> |------------------|
| 1 1 1 2 | | 1 1 3 2 4 |
| 1 2 3 4 | | 2 5 7 6 8 |
| 2 1 5 6 | +------------------+
| 2 2 7 8 |
+---------------+
Type reshape error for a list of the problem observations.
r(9);

end of do-file

r(9);
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35696
#14

06 Jun 2021, 16:42

The advice given to type

Code:

reshape error

is good. Equivalently

Code:

duplicates list c_id s_id
Comment
Romildo Andrade

Join Date: Jan 2020

Posts: 3
#15

20 Apr 2023, 07:04

We are trying to generate a map (Espirito Santo- Brazil) with mean anual Tx of incidence of COVID. We did
shp2dta using C:\Users\rlman\Desktop\spmap\ES_Municipios_2022.sh p, database(esmun) coor(es78coor) genid(id) genc(c)
merge m:1 id_mun using esmun.dta
spmap Txmeano using es78coor, id(id_mun) fcolor(Blues)
Variable id_mun specified in option id() does not uniquely identify the observations

We did not get any solution looking about the r459 error.

Could anybody kindly help?

Thank you!
Comment

Announcement

Reshape Command showing "variable id does not uniquely identify the observations" error

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment