Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Long Data: Reshape wide with 2 levels

    I googled how to do this but I couldn’t find a solution. Basically, I have data in a long format with different levels. My data represents students with miss a series of misconducts for two periods: 1=before and 2=after some time range from the start date of a class. The missing values for the “when” variable means that that misconduct happened outside the date parameters (neither before nor after). I need to reshape this data to a wide format. See below.

    My data:
    ID type when count
    21647a Arrest . 1
    21647a Bad Behavior . 1
    21647a Force . 2
    28878a Arrest 1 1
    28878a Infraction 1 3
    28878a Bad Behavior 1 1
    28878a Force 1 7
    28878a Force 2 1
    28878a Force . 5

    What I need:
    ID Arrest_Pre Arrest_Post BadBehavior_Pre BadBehavior_Post Force_Pre Force_Post Infraction_Pre Infraction_Post SeriousInjury_Pre SeriousInjury_Post
    21647a 0 0 0 0 0 0 0 0 0 0
    28878a 1 0 1 0 7 1 3 0 0 0
    Data sample:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str6 id str14 type byte(when count)
    "21647a" "Arrest"         . 1
    "21647a" "Bad Behavior"   . 1
    "21647a" "Force"          . 2
    "28878a" "Arrest"         1 1
    "28878a" "Infraction"     1 3
    "28878a" "Bad Behavior"   1 1
    "28878a" "Force"          1 7
    "28878a" "Force"          2 1
    "28878a" "Force"          . 5
    "32260a" "Infraction"     1 1
    "32260a" "Serious Injury" . 1
    "32260a" "Bad Behavior"   1 1
    "32260a" "Force"          1 4
    end
    ------------

    i think this gets even more complicated because not all students experienced all the types. So I guess we need to expand this data.

    Thank you so much for any help,
    Marvin




  • #2
    Thanks for using dataex (SSC).

    This may be close to what you are seeking

    Code:
    clear
    input str6 id str14 type byte(when count)
    "21647a" "Arrest"         . 1
    "21647a" "Bad Behavior"   . 1
    "21647a" "Force"          . 2
    "28878a" "Arrest"         1 1
    "28878a" "Infraction"     1 3
    "28878a" "Bad Behavior"   1 1
    "28878a" "Force"          1 7
    "28878a" "Force"          2 1
    "28878a" "Force"          . 5
    "32260a" "Infraction"     1 1
    "32260a" "Serious Injury" . 1
    "32260a" "Bad Behavior"   1 1
    "32260a" "Force"          1 4
    end
    
    drop if when == .
    gen when2 = cond(when == 1, "Pre", "Post")
    drop when
    rename when2 when
     
    replace type = subinstr(type, " ", "", .)
    
    reshape wide count, i(id type) j(when) string
    rename (count*) (*)
    
    reshape wide Pre Post, i(id) j(type) string
    mvencode P*, mv(0)
    
    list
    
         +--------------------------------------------------------------------------+
      1. |     id | PostAr~t | PreArr~t | PostBa~r | PreBad~r | PostFo~e | PreForce |
         | 28878a |        0 |        1 |        0 |        1 |        1 |        7 |
         |--------------------------------------------------------------------------|
         |              PostIn~n              |              PreInf~n               |
         |                     0              |                     3               |
         +--------------------------------------------------------------------------+
    
         +--------------------------------------------------------------------------+
      2. |     id | PostAr~t | PreArr~t | PostBa~r | PreBad~r | PostFo~e | PreForce |
         | 32260a |        0 |        0 |        0 |        1 |        0 |        4 |
         |--------------------------------------------------------------------------|
         |              PostIn~n              |              PreInf~n               |
         |                     0              |                     1               |
         +--------------------------------------------------------------------------+
    But I would not want to have to work with that structure. What are you going to do with it?

    There's a serious one-line alternative:


    Code:
    clear
    input str6 id str14 type byte(when count)
    "21647a" "Arrest"         . 1
    "21647a" "Bad Behavior"   . 1
    "21647a" "Force"          . 2
    "28878a" "Arrest"         1 1
    "28878a" "Infraction"     1 3
    "28878a" "Bad Behavior"   1 1
    "28878a" "Force"          1 7
    "28878a" "Force"          2 1
    "28878a" "Force"          . 5
    "32260a" "Infraction"     1 1
    "32260a" "Serious Injury" . 1
    "32260a" "Bad Behavior"   1 1
    "32260a" "Force"          1 4
    end
    
    contract id type when if when < . [fw=count] , zero
    
    l, sepby(id)
    
         +--------------------------------------+
         |     id           type   when   _freq |
         |--------------------------------------|
      1. | 28878a         Arrest      1       1 |
      2. | 28878a         Arrest      2       0 |
      3. | 28878a   Bad Behavior      1       1 |
      4. | 28878a   Bad Behavior      2       0 |
      5. | 28878a          Force      1       7 |
      6. | 28878a          Force      2       1 |
      7. | 28878a     Infraction      1       3 |
      8. | 28878a     Infraction      2       0 |
         |--------------------------------------|
      9. | 32260a         Arrest      1       0 |
     10. | 32260a         Arrest      2       0 |
     11. | 32260a   Bad Behavior      1       1 |
     12. | 32260a   Bad Behavior      2       0 |
     13. | 32260a          Force      1       4 |
     14. | 32260a          Force      2       0 |
     15. | 32260a     Infraction      1       1 |
     16. | 32260a     Infraction      2       0 |
         +--------------------------------------+
    WIth that I can imagine many more further analyses. In fact, the data structure you have already is good. Perhaps all you need is what contract does inside itself with fillin.

    Or just to use (e.g.)
    groups (SSC) on the original data:


    Code:
    . groups id type when [fw=count], fillin sepby(id)
    
      +------------------------------------------------+
      |     id           type   when   Freq.   Percent |
      |------------------------------------------------|
      | 28878a         Arrest      1       1      5.26 |
      | 28878a         Arrest      2       0      0.00 |
      | 28878a   Bad Behavior      1       1      5.26 |
      | 28878a   Bad Behavior      2       0      0.00 |
      | 28878a          Force      1       7     36.84 |
      | 28878a          Force      2       1      5.26 |
      | 28878a     Infraction      1       3     15.79 |
      | 28878a     Infraction      2       0      0.00 |
      |------------------------------------------------|
      | 32260a         Arrest      1       0      0.00 |
      | 32260a         Arrest      2       0      0.00 |
      | 32260a   Bad Behavior      1       1      5.26 |
      | 32260a   Bad Behavior      2       0      0.00 |
      | 32260a          Force      1       4     21.05 |
      | 32260a          Force      2       0      0.00 |
      | 32260a     Infraction      1       1      5.26 |
      | 32260a     Infraction      2       0      0.00 |
      +------------------------------------------------+


    But groups by itself just shows you the tabulation or listing. You'd need its saving() option to save the structure or, pretty much equivalently, to use contract for flexibility downstream, e.g. for good graphs.

    Comment


    • #3
      Note also that your example data allows this without any restructuring whatsoever:

      Code:
      . table type when [fw=count], by(id)
      
      -------------------------
                   |    when   
       id and type |    1     2
      -------------+-----------
      28878a       |
            Arrest |    1      
      Bad Behavior |    1      
             Force |    7     1
        Infraction |    3      
      -------------+-----------
      32260a       |
            Arrest |           
      Bad Behavior |    1      
             Force |    4      
        Infraction |    1      
      -------------------------

      Comment


      • #4
        I think this does it:

        Code:
        isid id type when, miss
        replace type = subinstr(type, " ", "", .)
        replace when = 3 if missing(when)
        
        reshape wide count, i(id type) j(when)
        rename count1 Precount
        rename count2 Postcount
        drop count3
        reshape wide *count, i(id) j(type) string
        rename *count* **
        mvencode Pre* Post*, mv(0)
        When you need to reshape double wide, you just have do the reshapes one at a time. The keys are to keep the variable names easy to work with. Here this entailed cleaning out the blanks from variable type so that its values could be legal parts of variable names. Another trick here was handling the observations with missing values of when. Since these observations actually contribute nothing to the final output, count3 can be dropped as soon as it is created. (But you can't just -drop if missing(when)- because then you might lose somebody whose only observation is from outside the time intervals of interest.)

        Added: Crossed with Nick's response. His approach is largely the same, with the two reshapes done in the opposite order, with a corresponding difference in the -rename- commands. That makes no difference. There is one substantial difference. Nick has -drop if missing(when)- in his code. As a consequence, if there is a person whose only observation in the data set has a missing value of when, that person will not appear at all in the results. In my code, that person will appear in the results with all of the Pre* and Post* variables set to zero.

        I should also remark that I agree with Nick's recommendation to not do this in the first place but to keep the data in long layout, perhaps after -contract-ing, if you plan to do further analysis. If, however, you're doing this to create a spreadsheet for people who are accustomed to wide layouts to read, then go ahead.
        Last edited by Clyde Schechter; 07 Dec 2016, 16:27.

        Comment


        • #5
          http://www.stata.com/support/faqs/da...-with-reshape/ notes that two reshapes may be needed in sequence.

          The absence of a reshape wide followed by another I have to see as arising from a deep-seated prejudice that it's not especially desirable.

          Comment


          • #6
            Nick Cox Thank you so much this is extremely helpful and informative.

            But I would not want to have to work with that structure. What are you going to do with it?
            This is merely a reporting project. I supervisor wants to see how this students behave before and after the class. She needs this is a excel format that is why I think we need to reshape the data first and then export it into excel. I agree, depending on the type of analysis, the long structure works better but for the moment I need it in a wide format. I will also try your "table" suggestion and then just copy and paste into excel. I will see how this goes.

            Clyde Schechter Yes Clyde. I dont want to drop missing from the beginning because I will be dropping an student that had not any misbehavior. Regarding multiple reshape, sometimes is hard for me to figure out the direction of the the reshape when are multiple levels but your help clarify it. Yes- I need it in wide to show it to non technical coworkers. The isid command checks for duplicates in panel data right (id and time)?

            Again, thank you both!

            Marvin

            Comment


            • #7
              This code doesn't drop missings on when.

              Code:
              clear
              input str6 id str14 type byte(when count)
              "21647a" "Arrest"         . 1
              "21647a" "Bad Behavior"   . 1
              "21647a" "Force"          . 2
              "28878a" "Arrest"         1 1
              "28878a" "Infraction"     1 3
              "28878a" "Bad Behavior"   1 1
              "28878a" "Force"          1 7
              "28878a" "Force"          2 1
              "28878a" "Force"          . 5
              "32260a" "Infraction"     1 1
              "32260a" "Serious Injury" . 1
              "32260a" "Bad Behavior"   1 1
              "32260a" "Force"          1 4
              end
              
              gen when2 = cond(missing(when), "unk", cond(when == 1, "Pre", "Post")) 
              drop when
              rename when2 when
               
              replace type = subinstr(type, " ", "", .)
              
              reshape wide count, i(id type) j(when) string
              rename (count*) (*)
              
              reshape wide unk Pre Post, i(id) j(type) string
              mvencode P* unk*, mv(0)

              Comment


              • #8
                Thanks Nick. Clyde's code also take care of this. That is not dropping missing on when.

                Thank you!

                Comment

                Working...
                X