Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape long to wide for several variables

    Dear StataList users,

    I have an issue with a database that I need to reshape from long to wide. The command line
    Code:
    reshape wide v, i(idvar) j(tvar)
    allows more than one idvar, but it looks like I would need more than one tvar, which does not seem possible. Please find below a reproducible example of what I currently have and what I would like to obtain.

    In my example, my id string variable has 2 individuals (A and B). Individual A responded to 3 queries in Round1 (1, 2 and 3) and 3 queries in Round2 (1, 2 and 3) so that each individual appears in a long format of 9 rows. ResRound1 and ResRound2 are binary variables collecting responses (Yes/No) to respectively Round1 and Round2.


    Code:
    input str1 id Round1 Round2 ResRound1 ResRound2
              "A"   1      1        0         0
              "A"   1      2        0         1
              "A"   1      3        0         1
              "A"   2      1        1         0
              "A"   2      2        1         1
              "A"   2      3        1         1
              "A"   3      1        1         0
              "A"   3      2        1         1
              "A"   3      3        1         1
              "B"   1      1        0         1
              "B"   1      2        0         1
              "B"   1      3        0         0
              "B"   2      1        1         1
              "B"   2      2        1         1
              "B"   2      3        1         0
              "B"   3      1        0         1
              "B"   3      2        0         1
              "B"   3      3        0         0
    end


    I am looking to obtain one observation per individual, that is a wide format of only 2 rows, where Round1 and Round2 are dropped and their modality (1, 2 and 3) appear at the end of the variable ResRound1 and ResRound2 name. See below:

    Code:
     
    input str1 id ResRound11 ResRound12 ResRound13 ResRound21 ResRound22 ResRound23
    A          0          1          1          0          1          1            
    B          0          1          0          1          1          0 
    end
    I tried
    Code:
    reshape wide ResRound1 ResRound2, i(id Round1) j(Round2)
    . This try gives correct format and variable name but wrong values. Any help would be appreciated.
    Best, TR

  • #2
    This works for your example. I imagine there is a better way.

    Code:
    clear 
    
    input str1 id Round1 Round2 ResRound1 ResRound2
              "A"   1      1        0         0
              "A"   1      2        0         1
              "A"   1      3        0         1
              "A"   2      1        1         0
              "A"   2      2        1         1
              "A"   2      3        1         1
              "A"   3      1        1         0
              "A"   3      2        1         1
              "A"   3      3        1         1
              "B"   1      1        0         1
              "B"   1      2        0         1
              "B"   1      3        0         0
              "B"   2      1        1         1
              "B"   2      2        1         1
              "B"   2      3        1         0
              "B"   3      1        0         1
              "B"   3      2        0         1
              "B"   3      3        0         0
    end
    
    save safecopy, replace 
    drop Round2 ResRound2 
    duplicates drop 
    reshape wide ResRound1, i(id) j(Round1)
    save part1, replace 
    use safecopy 
    drop Round1 ResRound1 
    duplicates drop 
    reshape wide ResRound2 , i(id) j(Round2)
    merge 1:1 id using part1 
    
    list

    Comment


    • #3
      This I feel better about:


      Code:
      clear 
      
      input str1 id Round1 Round2 ResRound1 ResRound2
                "A"   1      1        0         0
                "A"   1      2        0         1
                "A"   1      3        0         1
                "A"   2      1        1         0
                "A"   2      2        1         1
                "A"   2      3        1         1
                "A"   3      1        1         0
                "A"   3      2        1         1
                "A"   3      3        1         1
                "B"   1      1        0         1
                "B"   1      2        0         1
                "B"   1      3        0         0
                "B"   2      1        1         1
                "B"   2      2        1         1
                "B"   2      3        1         0
                "B"   3      1        0         1
                "B"   3      2        0         1
                "B"   3      3        0         0
      end
      
      stack id Round1 ResRound1 id Round2 ResRound2, into(id Round ResRound) clear 
      duplicates drop
      gen which = 10 * _stack + Round
      drop _stack Round
      reshape wide ResRound, i(id) j(which)
      What's bizarre about the data structure is the degree of duplication, which is I think why reshape alone does not suffice.

      Comment


      • #4
        Thank you very much for the two proposals. Could you please explain to me what the '10' refers to when generating variable which?

        Beyond the above reproducible example, it would be interesting to have a method reproducible for any degree of duplication (so as any surrounding structure of the database, that is the existence of many other columns/variables).

        (R allows for reshaping a database (library reshape2) for any degree of duplication. I was quite surprised not to find much Stata posts on this issue since converting responses to a questionnaire into sort of panel data is not uncommon.)

        Comment


        • #5
          To get e.g. 13 as suffix from 1 and 3 I multiply 1 by 10 and add 3. If I do that by concatenation of characters I then have to destring it because reshape prefers numeric suffixes.

          So, I solved your problem twice but you want a third solution? Insert appropriate wry emoticon according to taste.

          Well. I've used reshape hundreds of times and rewritten an FAQ on it, but I don't recollect seeing exactly this structure before. The point about your question is presumably that it is a perverse structure compared with what you want!

          As said, there is duplication of information whereas reshape in Stata is essentially about rearranging information. Sometimes reshape exposes implied omissions in the data, but it's not about removing duplicates.

          If R can appear to fix this more simply, good for R. The main point is not whether commands or functions in different languages map on to each other one to one, because they often don''t. For example, duplicates is here a complementary command.

          Comment


          • #6
            Here is another way:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str1 id float(Round1 Round2 ResRound1 ResRound2)
            "A" 1 1 0 0
            "A" 1 2 0 1
            "A" 1 3 0 1
            "A" 2 1 1 0
            "A" 2 2 1 1
            "A" 2 3 1 1
            "A" 3 1 1 0
            "A" 3 2 1 1
            "A" 3 3 1 1
            "B" 1 1 0 1
            "B" 1 2 0 1
            "B" 1 3 0 0
            "B" 2 1 1 1
            "B" 2 2 1 1
            "B" 2 3 1 0
            "B" 3 1 0 1
            "B" 3 2 0 1
            "B" 3 3 0 0
            end
            
            gen obsno=_n
            reshape long Round ResRound , i(obsno)
            bys id Round _j: keep if _n==1
            egen which=concat(_j Round)
            keep id ResRound which
            reshape wide ResRound, i(id) j(which) string
            Res.:

            Code:
            . l
            
                 +----------------------------------------------------------------------+
                 | id   ResRo~11   ResRo~12   ResRo~13   ResRo~21   ResRo~22   ResRo~23 |
                 |----------------------------------------------------------------------|
              1. |  A          0          1          1          0          1          1 |
              2. |  B          0          1          0          1          1          0 |
                 +----------------------------------------------------------------------+

            Comment


            • #7
              As a further twist, the extent to which this works, and doesn't work, expresses the underlying problem:


              Code:
              egen which = concat(Round1 Round2) 
              drop Round? 
              reshape wide ResRound1 ResRound2, i(id) j(which) string

              Comment


              • #8
                All proposals work and now I even understand why reshape alone could not do the job. So my problem is solved .

                Comment

                Working...
                X