  • Reshape error: how do I select the first instance?

    I ran the code

    Code:
    reshape wide happy, i(hh) j(city)
    and received the error

    Code:
    .  reshape error
    
    i (hh) indicates the top-level grouping such as subject id.
    j (city) indicates the subgrouping such as time.
    The data are in the long form;  j should be unique within i.
    
    There are multiple observations on the same city within hh.
    I want to reshape the data, but keep only the first instance of each hh-city pair. For instance, if HH_1 had two rows where city=2, then I want to keep only the first row where city=2 and drop the second. Advice?

    Here is a -dataex- example. Don't try to infer whether this is an appropriate thing to do based on the variable names; I have changed them to preserve data anonymity. I'm not actually dealing with cities, HHs, and happiness.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(city happy) long hh
    76       .  1
     9       2  2
    17       .  2
    45       .  2
     1      60  3
     2       .  3
     8       .  3
    19       .  3
    15       .  4
    15       .  4
    34       .  4
    34       .  4
     9       5  5
    15      15  5
    23       .  5
     9      10  6
    15      20  6
    19      15  6
     9       7  7
    19       7  7
     9       5  8
    19       3  8
    71      10  8
     1      60  9
     8      20  9
     6       . 10
    34       0 10
     1      30 11
     8       . 11
    19       5 11
     1      50 12
    19       0 12
    78     500 12
    15      12 13
     4    2500 14
     9      10 14
    34      70 14
    23       . 15
    36      50 15
     7       . 16
    15      15 16
    19       8 16
    67       . 17
    79       . 17
     9      12 18
    15      48 18
    15       . 19
    19       9 19
     1      80 20
     8       . 20
    13       . 20
    19       . 20
    23       . 20
     8       . 21
     9       . 21
    23       . 21
    67       . 22
    79       . 22
     1      70 23
     8       . 23
    19       8 23
    15      22 24
    19       . 24
    67       . 25
     1      50 26
    19       5 26
     1      15 27
     8       . 27
     9      10 27
    19       5 27
    15       . 28
    19       8 28
     3      30 29
     9      40 29
    15      30 29
    67    2000 29
    71     300 29
    23       . 30
     7       4 31
     9       4 31
    15      30 31
     1      60 32
    23       . 32
     8       . 33
     9      20 33
    33 2000000 33
    78       2 33
    23       . 34
     1       . 35
    15      30 35
     9       6 36
    19       6 36
    34     800 36
     9       5 37
    19       8 37
    34     500 37
    36       . 37
     7       2 38
    15       . 38
    19       6 38
    end
    label values hh new

    So in the above sample, hh==4 has two observations with city==15 (and two with city==34). In each case I want to drop the second instance.

  • #2
    Let me assume that, more generally, if there are multiple instances of a value of city within a HH, you want to keep only the first, first being defined in terms of the current sort order of the data. So:

    Code:
    gen long obs_no = _n // IDENTIFY CURRENT ORDER OF DATA
    by hh city (obs_no), sort: keep if _n == 1
    drop obs_no
    After that you will have only one observation for each hh city pair, and your -reshape- should run with no problem.
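
    As a hedged sketch of the full workflow, the block below runs the same steps on a subset of the example data from #1 (hh 4 and 5 only). The -duplicates report- and -isid- checks are additions for illustration, not part of the suggestion above; they simply confirm the problem before the fix and its absence afterwards.

```stata
* Sketch on a subset of the example data from #1 (hh 4 and 5 only)
clear
input double(city happy) long hh
15  .  4
15  .  4
34  .  4
34  .  4
 9  5  5
15 15  5
23  .  5
end

duplicates report hh city              // tabulates surplus copies of hh-city pairs

gen long obs_no = _n                   // identify current order of data
by hh city (obs_no), sort: keep if _n == 1
drop obs_no

isid hh city                           // stops with an error if any hh-city pair is still duplicated
reshape wide happy, i(hh) j(city)
```

    If -isid- passes silently, hh and city jointly identify the observations, which is the condition -reshape wide- needs.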



    • #3
      Yup, that worked. Thanks!

      Edit:

      Actually, can you explain what -by hh city (obs_no), sort: keep if _n == 1- does?
      Last edited by Tommie Thompson; 28 Dec 2017, 13:52.



      • #4
        So, the -sort- option in the -by- prefix causes the data to be sorted by hh and city, and within groups defined by hh and city, sorted by obs_no. So it organizes the data into hh X city clumps, sorted internally in the same order as the original data.

        The -by- prefix then tells Stata to execute the command following the colon one hh X city clump at a time.

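        The -keep if _n == 1- command tells Stata to keep only the first observation within the clump. Under -by-, _n is the observation number within the current clump rather than within the whole data set, so it restarts at 1 for each hh X city pair.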
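
        To see the three steps above in action, here is a small illustrative sketch using only the hh 4 rows from the example in #1. The variable names within_n and group_size are hypothetical, created just for display; the explicit -sort- followed by -by- is meant to mirror what the -by ..., sort:- prefix does in one step.

```stata
* Toy illustration: the four hh 4 rows from the example in #1
clear
input double(city happy) long hh
15 . 4
15 . 4
34 . 4
34 . 4
end

gen long obs_no = _n                   // original order of the data

* Sort into hh X city clumps, keeping the original order inside each clump
sort hh city obs_no

* Under -by-, _n restarts at 1 in each clump and _N is the clump size
by hh city: gen within_n = _n
by hh city: gen group_size = _N

list hh city obs_no within_n group_size, sepby(hh city)
```

        The rows with within_n equal to 1 are the ones that -keep if _n == 1- would retain. The same command can also be written as -bysort hh city (obs_no): keep if _n == 1-.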
