Long Data: Reshape wide with 2 levels

Marvin Aliaga

Join Date: Feb 2015
Posts: 255

Long Data: Reshape wide with 2 levels

07 Dec 2016, 15:14

I googled how to do this but I couldn’t find a solution. Basically, I have data in a long format with different levels. My data represents students with miss a series of misconducts for two periods: 1=before and 2=after some time range from the start date of a class. The missing values for the “when” variable means that that misconduct happened outside the date parameters (neither before nor after). I need to reshape this data to a wide format. See below.

My data:

ID	type	when	count
21647a	Arrest	.	1
21647a	Bad Behavior	.	1
21647a	Force	.	2
28878a	Arrest	1	1
28878a	Infraction	1	3
28878a	Bad Behavior	1	1
28878a	Force	1	7
28878a	Force	2	1
28878a	Force	.	5

What I need:

ID	Arrest_Pre	Arrest_Post	BadBehavior_Pre	BadBehavior_Post	Force_Pre	Force_Post	Infraction_Pre	Infraction_Post	SeriousInjury_Pre	SeriousInjury_Post
21647a	0	0	0	0	0	0	0	0	0	0
28878a	1	0	1	0	7	1	3	0	0	0

Data sample:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6 id str14 type byte(when count)
"21647a" "Arrest"         . 1
"21647a" "Bad Behavior"   . 1
"21647a" "Force"          . 2
"28878a" "Arrest"         1 1
"28878a" "Infraction"     1 3
"28878a" "Bad Behavior"   1 1
"28878a" "Force"          1 7
"28878a" "Force"          2 1
"28878a" "Force"          . 5
"32260a" "Infraction"     1 1
"32260a" "Serious Injury" . 1
"32260a" "Bad Behavior"   1 1
"32260a" "Force"          1 4
end

------------

i think this gets even more complicated because not all students experienced all the types. So I guess we need to expand this data.

Thank you so much for any help,
Marvin

Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35724

07 Dec 2016, 16:13

Thanks for using dataex (SSC).

This may be close to what you are seeking

Code:

clear
input str6 id str14 type byte(when count)
"21647a" "Arrest"         . 1
"21647a" "Bad Behavior"   . 1
"21647a" "Force"          . 2
"28878a" "Arrest"         1 1
"28878a" "Infraction"     1 3
"28878a" "Bad Behavior"   1 1
"28878a" "Force"          1 7
"28878a" "Force"          2 1
"28878a" "Force"          . 5
"32260a" "Infraction"     1 1
"32260a" "Serious Injury" . 1
"32260a" "Bad Behavior"   1 1
"32260a" "Force"          1 4
end

drop if when == .
gen when2 = cond(when == 1, "Pre", "Post")
drop when
rename when2 when
 
replace type = subinstr(type, " ", "", .)

reshape wide count, i(id type) j(when) string
rename (count*) (*)

reshape wide Pre Post, i(id) j(type) string
mvencode P*, mv(0)

list

     +--------------------------------------------------------------------------+
  1. |     id | PostAr~t | PreArr~t | PostBa~r | PreBad~r | PostFo~e | PreForce |
     | 28878a |        0 |        1 |        0 |        1 |        1 |        7 |
     |--------------------------------------------------------------------------|
     |              PostIn~n              |              PreInf~n               |
     |                     0              |                     3               |
     +--------------------------------------------------------------------------+

     +--------------------------------------------------------------------------+
  2. |     id | PostAr~t | PreArr~t | PostBa~r | PreBad~r | PostFo~e | PreForce |
     | 32260a |        0 |        0 |        0 |        1 |        0 |        4 |
     |--------------------------------------------------------------------------|
     |              PostIn~n              |              PreInf~n               |
     |                     0              |                     1               |
     +--------------------------------------------------------------------------+

But I would not want to have to work with that structure. What are you going to do with it?

There's a serious one-line alternative:

Code:

clear
input str6 id str14 type byte(when count)
"21647a" "Arrest"         . 1
"21647a" "Bad Behavior"   . 1
"21647a" "Force"          . 2
"28878a" "Arrest"         1 1
"28878a" "Infraction"     1 3
"28878a" "Bad Behavior"   1 1
"28878a" "Force"          1 7
"28878a" "Force"          2 1
"28878a" "Force"          . 5
"32260a" "Infraction"     1 1
"32260a" "Serious Injury" . 1
"32260a" "Bad Behavior"   1 1
"32260a" "Force"          1 4
end

contract id type when if when < . [fw=count] , zero

l, sepby(id)

     +--------------------------------------+
     |     id           type   when   _freq |
     |--------------------------------------|
  1. | 28878a         Arrest      1       1 |
  2. | 28878a         Arrest      2       0 |
  3. | 28878a   Bad Behavior      1       1 |
  4. | 28878a   Bad Behavior      2       0 |
  5. | 28878a          Force      1       7 |
  6. | 28878a          Force      2       1 |
  7. | 28878a     Infraction      1       3 |
  8. | 28878a     Infraction      2       0 |
     |--------------------------------------|
  9. | 32260a         Arrest      1       0 |
 10. | 32260a         Arrest      2       0 |
 11. | 32260a   Bad Behavior      1       1 |
 12. | 32260a   Bad Behavior      2       0 |
 13. | 32260a          Force      1       4 |
 14. | 32260a          Force      2       0 |
 15. | 32260a     Infraction      1       1 |
 16. | 32260a     Infraction      2       0 |
     +--------------------------------------+

WIth that I can imagine many more further analyses. In fact, the data structure you have already is good. Perhaps all you need is what contract does inside itself with fillin.

Or just to use (e.g.) groups (SSC) on the original data:

Code:

. groups id type when [fw=count], fillin sepby(id)

  +------------------------------------------------+
  |     id           type   when   Freq.   Percent |
  |------------------------------------------------|
  | 28878a         Arrest      1       1      5.26 |
  | 28878a         Arrest      2       0      0.00 |
  | 28878a   Bad Behavior      1       1      5.26 |
  | 28878a   Bad Behavior      2       0      0.00 |
  | 28878a          Force      1       7     36.84 |
  | 28878a          Force      2       1      5.26 |
  | 28878a     Infraction      1       3     15.79 |
  | 28878a     Infraction      2       0      0.00 |
  |------------------------------------------------|
  | 32260a         Arrest      1       0      0.00 |
  | 32260a         Arrest      2       0      0.00 |
  | 32260a   Bad Behavior      1       1      5.26 |
  | 32260a   Bad Behavior      2       0      0.00 |
  | 32260a          Force      1       4     21.05 |
  | 32260a          Force      2       0      0.00 |
  | 32260a     Infraction      1       1      5.26 |
  | 32260a     Infraction      2       0      0.00 |
  +------------------------------------------------+

But groups by itself just shows you the tabulation or listing. You'd need its saving() option to save the structure or, pretty much equivalently, to use contract for flexibility downstream, e.g. for good graphs.

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

07 Dec 2016, 16:22

Note also that your example data allows this without any restructuring whatsoever:

Code:

. table type when [fw=count], by(id)

-------------------------
             |    when   
 id and type |    1     2
-------------+-----------
28878a       |
      Arrest |    1      
Bad Behavior |    1      
       Force |    7     1
  Infraction |    3      
-------------+-----------
32260a       |
      Arrest |           
Bad Behavior |    1      
       Force |    4      
  Infraction |    1      
-------------------------

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

07 Dec 2016, 16:22

I think this does it:

Code:

isid id type when, miss replace type = subinstr(type, " ", "", .) replace when = 3 if missing(when) reshape wide count, i(id type) j(when) rename count1 Precount rename count2 Postcount drop count3 reshape wide *count, i(id) j(type) string rename *count* ** mvencode Pre* Post*, mv(0)

When you need to reshape double wide, you just have do the reshapes one at a time. The keys are to keep the variable names easy to work with. Here this entailed cleaning out the blanks from variable type so that its values could be legal parts of variable names. Another trick here was handling the observations with missing values of when. Since these observations actually contribute nothing to the final output, count3 can be dropped as soon as it is created. (But you can't just -drop if missing(when)- because then you might lose somebody whose only observation is from outside the time intervals of interest.)

Added: Crossed with Nick's response. His approach is largely the same, with the two reshapes done in the opposite order, with a corresponding difference in the -rename- commands. That makes no difference. There is one substantial difference. Nick has -drop if missing(when)- in his code. As a consequence, if there is a person whose only observation in the data set has a missing value of when, that person will not appear at all in the results. In my code, that person will appear in the results with all of the Pre* and Post* variables set to zero.

I should also remark that I agree with Nick's recommendation to not do this in the first place but to keep the data in long layout, perhaps after -contract-ing, if you plan to do further analysis. If, however, you're doing this to create a spreadsheet for people who are accustomed to wide layouts to read, then go ahead.

Last edited by Clyde Schechter; 07 Dec 2016, 16:27.
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35724
#5

07 Dec 2016, 16:53

http://www.stata.com/support/faqs/da...-with-reshape/ notes that two reshapes may be needed in sequence.

The absence of a reshape wide followed by another I have to see as arising from a deep-seated prejudice that it's not especially desirable.
Comment
Marvin Aliaga

Join Date: Feb 2015

Posts: 255
#6

08 Dec 2016, 08:27

Nick Cox Thank you so much this is extremely helpful and informative.

But I would not want to have to work with that structure. What are you going to do with it?

This is merely a reporting project. I supervisor wants to see how this students behave before and after the class. She needs this is a excel format that is why I think we need to reshape the data first and then export it into excel. I agree, depending on the type of analysis, the long structure works better but for the moment I need it in a wide format. I will also try your "table" suggestion and then just copy and paste into excel. I will see how this goes.

Clyde Schechter Yes Clyde. I dont want to drop missing from the beginning because I will be dropping an student that had not any misbehavior. Regarding multiple reshape, sometimes is hard for me to figure out the direction of the the reshape when are multiple levels but your help clarify it. Yes- I need it in wide to show it to non technical coworkers. The isid command checks for duplicates in panel data right (id and time)?

Again, thank you both!

Marvin
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

08 Dec 2016, 09:15

This code doesn't drop missings on when.

Code:

clear
input str6 id str14 type byte(when count)
"21647a" "Arrest"         . 1
"21647a" "Bad Behavior"   . 1
"21647a" "Force"          . 2
"28878a" "Arrest"         1 1
"28878a" "Infraction"     1 3
"28878a" "Bad Behavior"   1 1
"28878a" "Force"          1 7
"28878a" "Force"          2 1
"28878a" "Force"          . 5
"32260a" "Infraction"     1 1
"32260a" "Serious Injury" . 1
"32260a" "Bad Behavior"   1 1
"32260a" "Force"          1 4
end

gen when2 = cond(missing(when), "unk", cond(when == 1, "Pre", "Post")) 
drop when
rename when2 when
 
replace type = subinstr(type, " ", "", .)

reshape wide count, i(id type) j(when) string
rename (count*) (*)

reshape wide unk Pre Post, i(id) j(type) string
mvencode P* unk*, mv(0)

Comment

Marvin Aliaga

Join Date: Feb 2015

Posts: 255
#8

08 Dec 2016, 10:05

Thanks Nick. Clyde's code also take care of this. That is not dropping missing on when.

Thank you!
Comment

Announcement

Long Data: Reshape wide with 2 levels

Comment

Comment

Comment

Comment

Comment

Comment

Comment