Ordering duplicates

William Rossi

Join Date: May 2022
Posts: 24

Ordering duplicates

06 Nov 2023, 08:06

Hi, this should be easy but I cannot find a solution. I have a dataset like in the code example and I would like to reshape it to obtain a dataset that has a single row for each id. Something like the following:

id | sector_1_code | sector_1_firm_1_code | sector_1_firm_2_code | sector_2_code | sector_2_firm_1_code | sector_2_firm_2_code

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(id sector_code firm_code)
 1 17  1
 1 17 11
 1 17 21
 2 34  5
 2 34 15
 2 34 25
 3 24  2
 3 24  3
 3 24 12
 3 24 13
 3 24 22
 3 24 23
 4 29  1
 4 29 11
 4 29 16
 4 29 21
 4 29 26
 4 29 30
 5 42  3
 5 42  5
 5 42 13
 5 42 15
 5 42 23
 6  2  1
 6  1  1
 6  1  2
 6  1 11
 6  1 12
 6  1 21
 6  1 22
 7 26 11
 7 26 12
 7 26 26
 8 11  2
 8 11 12
 8 11 22
 9 27 14
 9 27 25
10 17 25
11 31  6
11 31 16
11 31 26
12 34  1
12 34 11
13  4  4
13  4  5
13  4 14
13  4 15
13  4 24
13  4 25
13  5  3
13  5  4
13  5 14
13  5 15
13  5 23
13  5 24
14 33  2
14 33 12
14 33 22
15 29  8
15 29 18
16 19  2
16 19 15
16 19 25
17 26  3
17 26  4
17 26  5
17 26 13
17 26 14
17 26 15
17 26 23
17 26 24
17 26 25
17 26 27
18 35  1
18 35  2
18 35 11
18 35 12
18 35 21
18 35 22
18 36  1
18 36 11
18 36 21
19 32  2
19 32 12
19 32 17
20  3  2
20  3 21
21 30  2
21 30 16
21 30 26
22 36  1
22 36 11
22 36 12
22 36 21
23 34  2
23 34  5
23 34 12
23 34 15
23 34 22
end

Hope the question is clear enough. I am looking for something like a double reshape but I am really stuck. Any advice is greately appreciated!

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#2

06 Nov 2023, 09:34

The difficulty lies not so much with the double reshaping as with the elaborate and finnicky renaming of variables needed to achieve the desired result:

Code:

sort id sector_code, stable by id sector_code: gen _j = _n rename firm_code firm__code reshape wide firm_@_code, i(id sector_code) j(_j) rename (firm_*_code) sectorfirm_*_code sort id, stable by id: gen _j = _n reshape wide sectorfirm_*_code sector_code, i(id) j(_j) rename (sectorfirm_#_code# ) sector_#[2]_firm_#[1]_code rename (sector_code#) sector_#_code

I don't know what you plan to do with this data once you have re-organized it this way. I'll just caution you that most data management and analysis in Stata is much easier, or even only possible, in the long layout that you are starting with. So unless you specifically know that you will be doing one of the relatively few things in Stata that works better with this reorganization of your data, you will regret having done this. Think about it before you do it. Undoing it will be just as complicated as doing it.
1 like
Comment

Announcement

Ordering duplicates

Comment