  • reshape gives unexpected output

    Dear Statalist members, I have a small dataset in wide form that I want to reshape to long form. Here are the data in wide form.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(g0 g1 g2 g3) float id
    4 1 2 1  1
    3 1 2 4  2
    2 1 3 1  3
    4 2 3 2  4
    4 1 3 2  5
    3 2 3 2  6
    4 3 2 3  7
    4 1 2 3  8
    4 1 3 2  9
    4 1 2 2 10
    3 1 2 1 11
    4 2 3 2 12
    3 2 4 1 13
    4 1 4 2 14
    end
    When I try to change the data to a long form
    Code:
    reshape long g, i(id) j(score)
    I get this data
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id byte(score g)
     1 1 4
     1 2 1
     1 3 2
     1 4 1
     2 1 3
     2 2 1
     2 3 2
     2 4 4
     3 1 2
     3 2 1
     3 3 3
     3 4 1
     4 1 4
     4 2 2
     4 3 3
     4 4 2
     5 1 4
     5 2 1
     5 3 3
     5 4 2
     6 1 3
     6 2 2
     6 3 3
     6 4 2
     7 1 4
     7 2 3
     7 3 2
     7 4 3
     8 1 4
     8 2 1
     8 3 2
     8 4 3
     9 1 4
     9 2 1
     9 3 3
     9 4 2
    10 1 4
    10 2 1
    10 3 2
    10 4 2
    11 1 3
    11 2 1
    11 3 2
    11 4 1
    12 1 4
    12 2 2
    12 3 3
    12 4 2
    13 1 3
    13 2 2
    13 3 4
    13 4 1
    14 1 4
    14 2 1
    14 3 4
    14 4 2
    end
    I expect the data to contain 14 observations each from g1 g2 g3 and g4. Instead, this is the result from
    Code:
    tab g
    used on the long data.

              g |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         13       23.21       23.21
              2 |         18       32.14       55.36
              3 |         13       23.21       78.57
              4 |         12       21.43      100.00
    ------------+-----------------------------------
          Total |         56      100.00


    I can see I am doing something wrong. But I can't figure out what.
    Kindly help.

  • #2
    It is your expectation that is incorrect here. You get, as expected, 4*14 = 56 observations, and there are 14 observations for each of the 4 scores. However, the distribution of "g" depends on your data, not on the command, and what you get is what you should get for that data sample. Note, for example, that in your dataex example there are multiple occurrences of various values; e.g., in the last observation there are 2 occurrences of "4".
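
    A minimal sketch of how this could be checked, reusing the data and the -reshape- call from #1 and adding only -tabulate- and -assert-. One caveat: with the variable names as posted (g0 to g3), score should take the values 0 to 3 rather than the 1 to 4 shown in the long listing, so the labels in the output may differ from the listing above.

```stata
* Wide data exactly as posted in #1
clear
input byte(g0 g1 g2 g3) float id
4 1 2 1  1
3 1 2 4  2
2 1 3 1  3
4 2 3 2  4
4 1 3 2  5
3 2 3 2  6
4 3 2 3  7
4 1 2 3  8
4 1 3 2  9
4 1 2 2 10
3 1 2 1 11
4 2 3 2 12
3 2 4 1 13
4 1 4 2 14
end

* Same reshape as in #1: one row per id and per original g# variable
reshape long g, i(id) j(score)

* score identifies the original variable: 14 observations for each value
tabulate score

* g holds the recorded values; its distribution comes from the data
tabulate score g

* Confirm the counts: 56 rows in total, 14 rows within each score
assert _N == 56
bysort score: assert _N == 14
```

    Tabulating score rather than g is what returns 14 observations per group; the cross-tabulation shows how the 56 values of g are spread across the scores.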


    • #3
      Thank you very much, Rich Goldstein.
      I get it now. My expectation really was wrong.
