Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Ordering duplicates

    Hi, this should be easy but I cannot find a solution. I have a dataset like in the code example and I would like to reshape it to obtain a dataset that has a single row for each id. Something like the following:

    id | sector_1_code | sector_1_firm_1_code | sector_1_firm_2_code | sector_2_code | sector_2_firm_1_code | sector_2_firm_2_code


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id sector_code firm_code)
     1 17  1
     1 17 11
     1 17 21
     2 34  5
     2 34 15
     2 34 25
     3 24  2
     3 24  3
     3 24 12
     3 24 13
     3 24 22
     3 24 23
     4 29  1
     4 29 11
     4 29 16
     4 29 21
     4 29 26
     4 29 30
     5 42  3
     5 42  5
     5 42 13
     5 42 15
     5 42 23
     6  2  1
     6  1  1
     6  1  2
     6  1 11
     6  1 12
     6  1 21
     6  1 22
     7 26 11
     7 26 12
     7 26 26
     8 11  2
     8 11 12
     8 11 22
     9 27 14
     9 27 25
    10 17 25
    11 31  6
    11 31 16
    11 31 26
    12 34  1
    12 34 11
    13  4  4
    13  4  5
    13  4 14
    13  4 15
    13  4 24
    13  4 25
    13  5  3
    13  5  4
    13  5 14
    13  5 15
    13  5 23
    13  5 24
    14 33  2
    14 33 12
    14 33 22
    15 29  8
    15 29 18
    16 19  2
    16 19 15
    16 19 25
    17 26  3
    17 26  4
    17 26  5
    17 26 13
    17 26 14
    17 26 15
    17 26 23
    17 26 24
    17 26 25
    17 26 27
    18 35  1
    18 35  2
    18 35 11
    18 35 12
    18 35 21
    18 35 22
    18 36  1
    18 36 11
    18 36 21
    19 32  2
    19 32 12
    19 32 17
    20  3  2
    20  3 21
    21 30  2
    21 30 16
    21 30 26
    22 36  1
    22 36 11
    22 36 12
    22 36 21
    23 34  2
    23 34  5
    23 34 12
    23 34 15
    23 34 22
    end

    Hope the question is clear enough. I am looking for something like a double reshape but I am really stuck. Any advice is greately appreciated!

  • #2
    The difficulty lies not so much with the double reshaping as with the elaborate and finnicky renaming of variables needed to achieve the desired result:
    Code:
    sort id sector_code, stable
    by id sector_code: gen _j = _n
    rename firm_code firm__code
    reshape wide firm_@_code, i(id sector_code) j(_j)
    rename (firm_*_code) sectorfirm_*_code
    
    sort id, stable
    by id: gen _j = _n
    reshape wide sectorfirm_*_code sector_code, i(id) j(_j)
    rename (sectorfirm_#_code# ) sector_#[2]_firm_#[1]_code
    rename (sector_code#) sector_#_code
    I don't know what you plan to do with this data once you have re-organized it this way. I'll just caution you that most data management and analysis in Stata is much easier, or even only possible, in the long layout that you are starting with. So unless you specifically know that you will be doing one of the relatively few things in Stata that works better with this reorganization of your data, you will regret having done this. Think about it before you do it. Undoing it will be just as complicated as doing it.

    Comment

    Working...
    X