Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape a categorical variable

    Dear Statausers,

    I'm struggling using the command reshape (quite a classic) to convert my categorical variable crime into different variables that express the total number of crimes (of each type) committed per year and in each comuna (=municipality).

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id int code_comuna str20 comuna float year long(part_type crime num)
      1  5602 "Algarrobo" 2017 1  5  98
      2  5602 "Algarrobo" 2017 1 10  10
      3  5602 "Algarrobo" 2017 1 12  10
      4  5602 "Algarrobo" 2017 1 14  31
      5  5602 "Algarrobo" 2017 1 15  98
      6  5602 "Algarrobo" 2017 1 16   9
      7  5602 "Algarrobo" 2017 1 17 387
      8  5602 "Algarrobo" 2017 1 18  65
      9  5602 "Algarrobo" 2017 1 20   9
     10  5602 "Algarrobo" 2017 1 21   1
     11  5602 "Algarrobo" 2018 1  5  73
     12  5602 "Algarrobo" 2018 1 10  10
     13  5602 "Algarrobo" 2018 1 12   2
     14  5602 "Algarrobo" 2018 1 14  28
     15  5602 "Algarrobo" 2018 1 15  50
     16  5602 "Algarrobo" 2018 1 16   7
     17  5602 "Algarrobo" 2018 1 17 231
     18  5602 "Algarrobo" 2018 1 18  73
     19  5602 "Algarrobo" 2018 1 20  12
     20  5602 "Algarrobo" 2019 1  5  63
     21  5602 "Algarrobo" 2019 1 10  13
     22  5602 "Algarrobo" 2019 1 12   2
     23  5602 "Algarrobo" 2019 1 14  35
     24  5602 "Algarrobo" 2019 1 15  86
     25  5602 "Algarrobo" 2019 1 16  14
     26  5602 "Algarrobo" 2019 1 17 283
     27  5602 "Algarrobo" 2019 1 18  43
     28  5602 "Algarrobo" 2019 1 20   6
     29  5602 "Algarrobo" 2019 1 21   1
     30  5602 "Algarrobo" 2020 1  5  32
     31  5602 "Algarrobo" 2020 1 10   9
     32  5602 "Algarrobo" 2020 1 14  18
     33  5602 "Algarrobo" 2020 1 15  53
     34  5602 "Algarrobo" 2020 1 16   2
     35  5602 "Algarrobo" 2020 1 17  79
     36  5602 "Algarrobo" 2020 1 18  17
     37  5602 "Algarrobo" 2020 1 20   6
     38  5602 "Algarrobo" 2020 1 21   1
     39  5602 "Algarrobo" 2017 2  5  69
     40  5602 "Algarrobo" 2017 2 10   1
     41  5602 "Algarrobo" 2017 2 14   4
     42  5602 "Algarrobo" 2017 2 15   2
     43  5602 "Algarrobo" 2017 2 17  31
     44  5602 "Algarrobo" 2017 2 18   6
     45  5602 "Algarrobo" 2018 2  4   1
     46  5602 "Algarrobo" 2018 2  5  75
     47  5602 "Algarrobo" 2018 2 10   1
     48  5602 "Algarrobo" 2018 2 14  10
     49  5602 "Algarrobo" 2018 2 15   1
     50  5602 "Algarrobo" 2018 2 17  25
     51  5602 "Algarrobo" 2018 2 18   5
     52  5602 "Algarrobo" 2018 2 20   1
     53  5602 "Algarrobo" 2018 2 21   1
     54  5602 "Algarrobo" 2019 2  5 101
     55  5602 "Algarrobo" 2019 2 10   3
     56  5602 "Algarrobo" 2019 2 14   8
     57  5602 "Algarrobo" 2019 2 15   2
     58  5602 "Algarrobo" 2019 2 17  15
     59  5602 "Algarrobo" 2019 2 18   9
     60  5602 "Algarrobo" 2020 2  5  51
     61  5602 "Algarrobo" 2020 2 10   2
     62  5602 "Algarrobo" 2020 2 12   1
     63  5602 "Algarrobo" 2020 2 15   2
     64  5602 "Algarrobo" 2020 2 17   3
     65 13502 "Alhué"    2017 1  5  14
     66 13502 "Alhué"    2017 1 10   4
     67 13502 "Alhué"    2017 1 14   4
     68 13502 "Alhué"    2017 1 15   2
     69 13502 "Alhué"    2017 1 16   1
     70 13502 "Alhué"    2017 1 17  14
     71 13502 "Alhué"    2017 1 18  24
     72 13502 "Alhué"    2017 1 20   2
     73 13502 "Alhué"    2018 1  5  15
     74 13502 "Alhué"    2018 1 10   1
     75 13502 "Alhué"    2018 1 14   4
     76 13502 "Alhué"    2018 1 15   1
     77 13502 "Alhué"    2018 1 16   1
     78 13502 "Alhué"    2018 1 17  23
     79 13502 "Alhué"    2018 1 18  19
     80 13502 "Alhué"    2018 1 20   1
     81 13502 "Alhué"    2019 1  5   9
     82 13502 "Alhué"    2019 1 10   2
     83 13502 "Alhué"    2019 1 14  10
     84 13502 "Alhué"    2019 1 15   1
     85 13502 "Alhué"    2019 1 16   3
     86 13502 "Alhué"    2019 1 17  17
     87 13502 "Alhué"    2019 1 18  28
     88 13502 "Alhué"    2020 1  5   2
     89 13502 "Alhué"    2020 1 14   1
     90 13502 "Alhué"    2020 1 15   1
     91 13502 "Alhué"    2020 1 17   6
     92 13502 "Alhué"    2020 1 18   4
     93 13502 "Alhué"    2020 1 20   1
     94 13502 "Alhué"    2017 2  5   2
     95 13502 "Alhué"    2017 2 12   1
     96 13502 "Alhué"    2017 2 17   2
     97 13502 "Alhué"    2018 2  5   7
     98 13502 "Alhué"    2018 2 10   4
     99 13502 "Alhué"    2018 2 14   1
    100 13502 "Alhué"    2018 2 17   2
    end
    label values part_type part_type
    label def part_type 1 "  Denuncia", modify
    label def part_type 2 "  Detención", modify
    label values crime crime_type
    label def crime_type 4 "  Homicidios (homicides)", modify
    label def crime_type 5 "  Hurtos (theft)", modify
    label def crime_type 10 "  Lesiones menos graves", modify
    .
    .
    .
    label def crime_type 21 "  Violaciones", modify

    Summing up, I would like to have something like the following table (using just two kind of crimes, homicides and thefts):
    id code_comuna comuna year homicides thefts
    1 5602 Algarrobo 2017 . 98
    2 5602 Algarrobo 2018 . 73
    3 5602 Algarrobo 2019 . 63
    4 5602 Algarrobo 2020 . 32

    I have resorted to help reshape but I haven't managed to solve this issue.

    Thank you in advance!

    Daniel.

  • #2
    It may be hard to make progress here unless you contract first, although I haven't tried doing it without. This first simplifies and then builds on your data example.

    Code:
    . keep if crime == 4 | crime == 5 
    (85 observations deleted)
    
    . 
    . contract code_comuna comuna year crime [fw=num] 
    
    . 
    . reshape wide _freq, i(comuna year) j(crime)
    (j = 4 5)
    
    Data                               Long   ->   Wide
    -----------------------------------------------------------------------------
    Number of observations                9   ->   8           
    Number of variables                   5   ->   5           
    j variable (2 values)             crime   ->   (dropped)
    xij variables:
                                      _freq   ->   _freq4 _freq5
    -----------------------------------------------------------------------------
    
    . 
    . mvencode _freq*, mv(0)
          _freq4: 7 missing values recoded
    
    . 
    . list 
    
         +-----------------------------------------------+
         |    comuna   year   _freq4   _freq5   code_c~a |
         |-----------------------------------------------|
      1. | Algarrobo   2017        0      167       5602 |
      2. | Algarrobo   2018        1      148       5602 |
      3. | Algarrobo   2019        0      164       5602 |
      4. | Algarrobo   2020        0       83       5602 |
      5. |     Alhué   2017        0       16      13502 |
         |-----------------------------------------------|
      6. |     Alhué   2018        0       22      13502 |
      7. |     Alhué   2019        0        9      13502 |
      8. |     Alhué   2020        0        2      13502 |
         +-----------------------------------------------+
    I have doubtless misunderstood something here because I can't reproduce your wanted numbers, but the principles may help.

    Comment


    • #3
      Somehow my first thought was to use collapse and not contract, so here is some code that is only slightly different from #2:

      Code:
      collapse (sum) num, by(code_comuna comuna crime year)
      keep if inlist(crime,4,5)
      
      reshape wide num, i(code_comuna comuna year) j(crime)
      recode num* (. = 0)
      
      rename num4 homicides
      rename num5 thefts
      which produces:

      Code:
      . list, noobs sepby(code_comuna) ab(20)
        +-----------------------------------------------------+
        | code_comuna      comuna   year   homicides   thefts |
        |-----------------------------------------------------|
        |        5602   Algarrobo   2017           0      167 |
        |        5602   Algarrobo   2018           1      148 |
        |        5602   Algarrobo   2019           0      164 |
        |        5602   Algarrobo   2020           0       83 |
        |-----------------------------------------------------|
        |       13502       Alhué   2017           0       16 |
        |       13502       Alhué   2018           0       22 |
        |       13502       Alhué   2019           0        9 |
        |       13502       Alhué   2020           0        2 |
        +-----------------------------------------------------+

      Comment


      • #4
        Thank you Nick Cox and Hemanshu Kumar , both seem to work.

        PD: My numbers were wrong Nick, yours are the right ones!

        Daniel.

        Comment

        Working...
        X