Splitting observations or creating new observations in STATA

Maureen

Join Date: Jul 2014

Posts: 12
#1

Splitting observations or creating new observations in STATA

24 Jul 2014, 12:02

Hello again!
I am undertaking an analysis of morbidity in infants. The infants were followed monthly for 1 year. Each observation corresponds to one visit. At each visit data on up to two episodes of illness were collected. I need to split these observations so that I have one observation per episode of illness per infant. Can anyone advise how best to do this? This only applies to about 31 observations, so numbers are very small.

Thanks and best wishes,
Maureen.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#2

24 Jul 2014, 12:06

You don't tell us anything about the variables in the data set, but under reasonable assumptions, the -reshape- command will probably solve your problem. Without knowing more about the variables, I can't give you more specifics, but the on-line help file for -reshape- is pretty clear.
Comment
Maureen

Join Date: Jul 2014

Posts: 12
#3

24 Jul 2014, 12:07

Thanks for the pointer. I will take a look at reshape and get back to you if I need more help.
Comment
Maureen

Join Date: Jul 2014

Posts: 12
#4

24 Jul 2014, 12:11

Oh and just to clarify, it is 31 observations out of a total of over 270k, so not all observations need to be reshaped...
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#5

24 Jul 2014, 12:17

Oh and just to clarify, it is 31 observations out of a total of over 270k, so not all observations need to be reshaped...

Well, you might start by splitting your data into two sets: one with just the 31, and the other with the rest (with the second illness variables deleted as they should all have missing values anyway), -reshape-ing just the 31, and then -append-ing the results to the rest. -reshape- can be slow in large data sets, so isolating the small number of observations that need it may save you a noticeable amount of execution time.
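
A minimal sketch of that split, -reshape-, and -append- workflow might look like the following. The variable names are assumptions, since the thread never describes the data set: id (infant), visit (visit number), and illness1/illness2 and days1/days2 as hypothetical per-episode variables. It also assumes that id and visit together uniquely identify an observation.

```stata
* Hypothetical variable names: id, visit, illness1/illness2, days1/days2
* Flag the visits that recorded a second episode
gen byte has2 = !missing(illness2)

* Set the two-episode visits aside and reshape only those
preserve
keep if has2
reshape long illness days, i(id visit) j(episode)
tempfile two_episodes
save `two_episodes'
restore

* Keep the remaining visits, which have at most one episode
drop if has2
drop illness2 days2
rename (illness1 days1) (illness days)
gen byte episode = 1

* Recombine: one observation per episode of illness per visit
append using `two_episodes'
drop has2
sort id visit episode
```

Because -tempfile- relies on local macros, this should be run as a whole from a do-file rather than line by line.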
Comment
Jeph Herrin

Join Date: Apr 2014

Posts: 335
#6

24 Jul 2014, 12:17

It sounds like you might want expand; let's suppose you have two illness variables, illness1 and illness2, and your 31 observations are just those that have both variables not missing:

Code:

gen byte has2=!mi(illness1)&!mi(illness2)
expand 2 if has2, gen(new)
replace illness1=illness2 if new
drop illness2 new

Hope this helps,
Jeph
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#7

24 Jul 2014, 12:20

Jeph's approach would be better than mine if the information about the illnesses consists of only one or a handful of variables, as his code illustrates. His approach gets complicated if there are many variables about each illness, and then the -reshape- approach would be simpler.
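
To illustrate why the -expand- route grows with the number of per-illness variables, here is a rough sketch that assumes three hypothetical variable pairs (illness1/illness2, days1/days2, treated1/treated2), none of which are confirmed by the original post. Each pair needs its own -replace-, and the two variables in a pair are assumed to share a storage type (both numeric or both string).

```stata
* Hypothetical variables: illness1/illness2, days1/days2, treated1/treated2
gen byte has2 = !missing(illness2)

* Duplicate the visits with a second episode; new == 1 marks the copies
expand 2 if has2, gen(new)

* Every per-episode variable needs its own replace and drop
foreach v in illness days treated {
    replace `v'1 = `v'2 if new
    drop `v'2
}
drop has2 new
```

With only a handful of stubs the loop stays manageable; with dozens, listing them all becomes the cumbersome part, which is where -reshape long- is the simpler choice.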
Comment
Maureen

Join Date: Jul 2014

Posts: 12
#8

24 Jul 2014, 12:30

OK, I will give these a go and get back to you if there are any problems.
Thanks!
Comment
Jeph Herrin

Join Date: Apr 2014

Posts: 335
#9

24 Jul 2014, 14:50

Maureen said that data on "up to two episodes" of illness were collected, so I took this literally. However, Clyde is correct that if you have many more than two, my approach gets very cumbersome.
Comment
Maureen

Join Date: Jul 2014

Posts: 12
#10

08 Aug 2014, 02:05

Hi,
Sorry for not responding sooner, moving house!!
Anyway Jeph I just wanted to let you know that the code worked beautifully.

thanks ever so much for your help,
Maureen.
Comment
