Sorting data by transposing rows in columns and deleting duplicates

Jan Filips

Join Date: Apr 2017

Posts: 20
#16

08 May 2017, 11:06

Originally posted by Nick Cox

As already pointed out in #2 you need to drop duplicates first.

Your syntax is the wrong way round: it's the values of the variable values that are being reshaped, and it's fyear and groups that serve as their identifiers. This works.

Code:

duplicates drop
reshape wide values, i(fyear) j(groups) string

Whether this is a good idea has already been questioned.

Thank you, it worked. I have one last question:

After reshaping, I tried to create a new variable that calculated the value for each year but it didn't work out as well as I thought:

Code:

gen groupvalue = ((SL1+SH1)-(BL1+BH1))/2

But it says "SL1 not found", even though there is definitely a column with SL1, as well as SH1, BL1 and BH1.
Nick Cox

Join Date: Mar 2014

Posts: 35730
#17

08 May 2017, 11:06

Look at your variable names to see what you did wrong.
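
A minimal sketch of what this hint points to, assuming the -reshape- from #16 was run unchanged: -reshape wide- names each new variable by joining the stub values to a value of groups, so the columns would be called valuesSL1, valuesSH1 and so on rather than SL1, SH1. The prefixed names below follow from that assumption and should be checked against the output of -describe-.

Code:

```stata
* Assumption: the wide variables carry the stub prefix "values",
* e.g. valuesSL1, because of -reshape wide values, ... j(groups) string-.
describe values*

* Report any of the four variables that is not in the wide data
foreach v in valuesSL1 valuesSH1 valuesBL1 valuesBH1 {
    capture confirm variable `v'
    if _rc {
        display as error "`v' not found"
    }
}

* Runs only if all four variables exist; missing in any one of them
* makes groupvalue missing for that year
gen groupvalue = ((valuesSL1 + valuesSH1) - (valuesBL1 + valuesBH1)) / 2
```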
Clyde Schechter

Join Date: Apr 2014

Posts: 30121
#18

08 May 2017, 11:10

Well, of course there are missing values after you go to wide layout. Some of your groups don't have data in every year, so the corresponding observations in the -reshaped- data will have to be missing. What else could they be?

Here's a table showing which groups have observations in which years:

Code:

. duplicates drop

Duplicates in terms of all variables

(77 observations deleted)

. table groups fyear

----------------------------------
| fyear
groups | 1966 1967 1968 1969
----------+-----------------------
BH1 | 1 1
BL1 | 1 1
BN1 | 1 1
MH1 | 1 1 1 1
ML1 | 1 1 1
MN1 | 1 1 1 1
SH1 | 1 1 1 1
SN1 | 1 1
----------------------------------
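
The same gaps can be counted directly in the wide data. This is only a sketch: it assumes the wide variables are named values plus the group name (for example valuesBH1), and nmiss is a made-up variable name.

Code:

```stata
* Assumption: wide variables are named values<group>, e.g. valuesBH1
* nmiss = number of groups with no value in that year
egen nmiss = rowmiss(values*)
list fyear nmiss
```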

I will also re-emphasize what Nick has said earlier: it is probably a bad idea to -reshape- wide anyway. Nearly all analyses in Stata are easier to do with data in long layout. There are some exceptions, but other than perhaps some kinds of graphs, they are not commonly encountered. So you are probably better off just dropping the duplicate observations and moving on from there without -reshape-ing.
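
As an illustration of staying in long layout, the calculation attempted in #16 could be sketched as follows. It assumes one observation per group and year once duplicates are dropped, and that values is numeric; wt and n_used are hypothetical helper variables, not anything from the original data.

Code:

```stata
duplicates drop

* Weight +1 for SL1 and SH1, -1 for BL1 and BH1, 0 for all other groups
gen byte wt = inlist(groups, "SL1", "SH1") - inlist(groups, "BL1", "BH1")

* How many of the four groups have a nonmissing value in each year
bysort fyear: egen n_used = total(wt != 0 & !missing(values))

* ((SL1 + SH1) - (BL1 + BH1)) / 2, repeated on every observation of the year
bysort fyear: egen groupvalue = total(wt * values / 2)

* egen total() treats missing as zero, so blank out incomplete years
replace groupvalue = . if n_used < 4
```

Years in which any of the four groups is absent end up with groupvalue missing, which matches what the wide-layout formula would return.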

Added: Crossed with Nick's response, which says essentially the same thing more compactly.