Reshape error: how do I select the first instance?

Tommie Thompson

Join Date: Nov 2016
Posts: 54

28 Dec 2017, 13:29

I ran the code

Code:

reshape wide happy, i(hh) j(city)

and received the error

Code:

.  reshape error

i (hh) indicates the top-level grouping such as subject id.
j (city) indicates the subgrouping such as time.
The data are in the long form;  j should be unique within i.

There are multiple observations on the same city within hh.

I want to reshape the data but keep only the first instance of each hh-city combination. For instance, if HH_1 had two rows where city=2, I want to keep only the first row where city=2 and drop the second. Advice?

Here is a -dataex- example. Don't try to infer whether this is an appropriate thing to do from the variable names; I have changed them to preserve data anonymity. I'm not actually dealing with cities, HHs, and happiness.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double(city happy) long hh
76       .  1
 9       2  2
17       .  2
45       .  2
 1      60  3
 2       .  3
 8       .  3
19       .  3
15       .  4
15       .  4
34       .  4
34       .  4
 9       5  5
15      15  5
23       .  5
 9      10  6
15      20  6
19      15  6
 9       7  7
19       7  7
 9       5  8
19       3  8
71      10  8
 1      60  9
 8      20  9
 6       . 10
34       0 10
 1      30 11
 8       . 11
19       5 11
 1      50 12
19       0 12
78     500 12
15      12 13
 4    2500 14
 9      10 14
34      70 14
23       . 15
36      50 15
 7       . 16
15      15 16
19       8 16
67       . 17
79       . 17
 9      12 18
15      48 18
15       . 19
19       9 19
 1      80 20
 8       . 20
13       . 20
19       . 20
23       . 20
 8       . 21
 9       . 21
23       . 21
67       . 22
79       . 22
 1      70 23
 8       . 23
19       8 23
15      22 24
19       . 24
67       . 25
 1      50 26
19       5 26
 1      15 27
 8       . 27
 9      10 27
19       5 27
15       . 28
19       8 28
 3      30 29
 9      40 29
15      30 29
67    2000 29
71     300 29
23       . 30
 7       4 31
 9       4 31
15      30 31
 1      60 32
23       . 32
 8       . 33
 9      20 33
33 2000000 33
78       2 33
23       . 34
 1       . 35
15      30 35
 9       6 36
19       6 36
34     800 36
 9       5 37
19       8 37
34     500 37
36       . 37
 7       2 38
15       . 38
19       6 38
end
label values hh new

So in the above sample, hh==4 has two observations with city==15. I want to drop the second instance.


Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#2

28 Dec 2017, 13:41

Let me assume that, more generally, if there are multiple instances of a value of city within a HH, you want to keep only the first, first being defined in terms of the current sort order of the data. So:

Code:

gen long obs_no = _n // IDENTIFY CURRENT ORDER OF DATA
by hh city (obs_no), sort: keep if _n == 1
drop obs_no

After that you will have only one observation for each hh city pair, and your -reshape- should run with no problem.
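As a rough sketch of how one might confirm that before reshaping, assuming the de-duplication code above has already been run on the example data (variable names hh, city, and happy as in the -dataex- listing):

```stata
* Sketch only: check that hh and city now jointly identify observations.
duplicates report hh city      // should show no surplus observations
isid hh city                   // stops with an error if any hh-city pair repeats

* With one observation per hh-city pair, the original command should go through.
reshape wide happy, i(hh) j(city)
```

Running -duplicates report hh city- before the de-duplication step is also a quick way to see how many hh-city pairs are repeated.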
Tommie Thompson

Join Date: Nov 2016

Posts: 54
#3

28 Dec 2017, 13:50

Yup, that worked. Thanks!

Edit:

Actually, can you explain what -by hh city (obs_no), sort: keep if _n == 1- does?

Last edited by Tommie Thompson; 28 Dec 2017, 13:52.
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#4

28 Dec 2017, 14:56

So, the -sort- option in the -by- prefix causes the data to be sorted by hh and city, and within groups defined by hh and city, sorted by obs_no. So it organizes the data into hh X city clumps, sorted internally in the same order as the original data.

The -by- prefix then tells Stata to execute the command following the colon one hh X city clump at a time.

The keep if _n == 1 command tells Stata to keep only the first observation within the clump.
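To make those three steps concrete, here is a sketch of the same operation with the sort written out as its own command. It is meant only as an illustration of the explanation above and assumes the same variable names (hh, city) as in the example data.

```stata
* Sketch: the one-line -by ..., sort:- form unpacked into separate steps.
gen long obs_no = _n             // record the original order of the data
sort hh city obs_no              // hh X city clumps, original order within each
by hh city: keep if _n == 1      // within each clump, _n restarts at 1
drop obs_no                      // the helper variable is no longer needed
```

Within a -by- group, _n counts observations from the start of that group rather than from the top of the dataset, which is why _n == 1 picks out the first observation of every clump. The same thing can also be written as -bysort hh city (obs_no): keep if _n == 1-.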