Need help re-arranging data

Ushma Agarwal

Join Date: Nov 2020

Posts: 37
#1

21 Nov 2020, 15:44

Hi, I am new to Stata and need help fairly urgently because of a deadline.
I have a list of lab results taken at different time intervals over 5 years, and I want to turn it into 3-monthly follow-up data. Please help me work out how to do this in Stata.

ID   RESULT VALUE   RESULT DATE
1    4              7/14/2010
1    6              6/09/2011
1    6              9/09/2011
1    9              7/04/2012
2    3              3/11/2012
2    2              6/12/2012
2    6              10/12/2013

Thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#2

21 Nov 2020, 15:57

Well, it appears from your example that the actual follow-up occurs at irregular intervals. So if you start dividing time into three month intervals, what do you want to do with intervals that contain more than one result, or none at all? Also, how do you want to define the three-month periods? Do we start at a certain date, maybe January 1, 2010? Or do we start each ID's first period at that ID's first date and move forward three months at a time from there? Or something else?

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
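As a minimal sketch of what that looks like in practice, assuming the variables are named id, resultvalue and date (hypothetical names; substitute your own), either of the following produces output that can be pasted straight into a post:

```stata
* Hypothetical variable names; replace them with the ones in your data set.
* Show an example of the three variables:
dataex id resultvalue date

* Or restrict the example to the observations that illustrate the problem:
dataex id resultvalue date if id == 1
```

Copy everything -dataex- prints to the Results window, including the -clear-, -input- and -end- lines, so that others can recreate the example exactly.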
Ushma Agarwal

Join Date: Nov 2020

Posts: 37
#3

21 Nov 2020, 16:17

Hi, I am sorry, I am really new to Stata. I have already divided the data into time periods by following some examples.
It looks like this:

Patient ID   Result at 3mo   Result at 6mo   Result at 9mo
1            x
1                            y
1                                            z
1

and I was hoping it would look like this:

Patient ID   Result at 3mo   Result at 6mo   Result at 9mo
1            x               y               z

Thanks. I hope this makes sense.
Ushma Agarwal

Join Date: Nov 2020

Posts: 37
#4

21 Nov 2020, 18:32

The start period depends on each individual's first date, moving forward 3 months at a time from there. And also there are multiple entries for each time intervals and I want to keep only observation in each time period so if you can help me with that too it will be great!!

Thanks
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#6

21 Nov 2020, 20:07

"And also there are multiple entries for each time intervals and I want to keep only observation in each time period so if you can help me with that too it will be great!!"

I don't understand this. Do you mean you want to keep only one observation in each time period? If so, which one?
Ushma Agarwal

Join Date: Nov 2020

Posts: 37
#7

21 Nov 2020, 20:12

I really dont have any preference whichever appears first or whichever STATA picks up first. But honestly, can you help me rearrange the data first? If there is code for it, please let me know!!

Thanks Clyde Schechter
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#8

21 Nov 2020, 20:30

It isn't clear to me that you actually have a Stata data set in hand, since your tableaux in #1 and #3 represent illegal Stata variable names. I'm going to assume that you do have one and that it looks like what the following code creates:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id resultvalue) float date
1 4 18457
1 6 18787
1 6 18879
1 9 19178
2 3 19063
2 2 19156
2 6 19643
end
format %td date

If your data set does not actually look like that, then you need to first get it to look like that, i.e. with a real Stata numerical date variable.
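If the date was imported as text, the conversion might be sketched as follows. This assumes a string variable, here hypothetically called result_date, holding month/day/year text as in #1; the variable name and the two-row input block are illustrative only.

```stata
* Illustrative data: result_date is a hypothetical string variable.
clear
input byte(id resultvalue) str10 result_date
1 4 "7/14/2010"
1 6 "6/09/2011"
end

* Convert the month/day/year text to a numeric Stata daily date.
gen date = daily(result_date, "MDY")
format %td date

* Any text that could not be converted shows up as missing.
list id result_date if missing(date)
```

If the real dates are day/month/year instead, the mask would be "DMY" rather than "MDY"; the value 7/14/2010 in #1 suggests month first.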

From that starting point, you can get what you want with:

Code:

by id (date), sort: gen follow_num = 3*floor((date-date[1])/(365/4))
collapse (first) resultvalue, by(id follow_num)
rename resultvalue result__mos
reshape wide result_@_mos, i(id) j(follow_num)

"I really dont have any preference whichever appears first or whichever STATA picks up first."

Seriously? If you don't have any preference about your data, why bother with data analysis at all? I really consider that attitude irresponsible. The code shown above picks the chronologically first.

Finally, I will just add that you are likely to come to regret re-organizing your data in this way. While there are a few things that are best done with the wide layout you have asked for, the vast majority of data management and analysis commands in Stata work better (or only at all) with the data in the long layout you are starting with. Have you thought through where you are going with this?
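For comparison, here is a sketch of how the same one-result-per-period reduction could be done while staying in long layout. It assumes the variables id and date from the example in this post and is only one possible approach.

```stata
* Assign each observation to a three-month period, as in the code above.
by id (date), sort: gen follow_num = 3*floor((date-date[1])/(365/4))

* Within each id and period, keep only the chronologically first observation.
* Unlike -collapse-, this leaves every other variable in the data set intact.
by id follow_num (date), sort: keep if _n == 1
```

The result has one observation per id and period, which is the form that most Stata analysis commands expect.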
Ushma Agarwal

Join Date: Nov 2020

Posts: 37
#9

21 Nov 2020, 20:50

I'm sorry if I sounded irresponsible. What I meant was that, when a 3-month period contains multiple observations, I don't have a preference among them; picking one at random is fine.
And my dates do look like this. In fact, I don't know if it helps, but I also created a new variable, followup_time, which holds the number of days from the start date, because I was hoping to use it to separate the 3-month periods.
Also, if it's not too much to ask, can you explain what this code does exactly, so that I understand it well enough to do it myself next time?

Thank you so much for your help!!
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#10

22 Nov 2020, 09:14

Code:

by id (date), sort: gen follow_num = 3*floor((date-date[1])/(365/4))

creates a variable whose value is an integer multiple of 3, representing the follow-up period. 0 is the period from 0 (first date) to 3 months, 3 is from 3 to 6 months, 6 from 6 to 9 months, etc. It is calculated by first getting the number of days from the observation's date to the same id's first date and then dividing that by the number of days in 3 months (365/4) and then truncating that to an integer.
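To make the arithmetic concrete, here is the calculation for the second observation of id 1 in the example data from #8 (date 18787, first date 18457), worked step by step with -display-:

```stata
* Days from the first date to this observation's date: 330
display 18787 - 18457

* The same span expressed in three-month units of 365/4 = 91.25 days (about 3.6)
display (18787 - 18457)/(365/4)

* Truncate to whole periods and multiply by 3 to express it in months: 9
display 3*floor((18787 - 18457)/(365/4))
```

Applied to the whole example, the same formula gives follow_num values of 0, 9, 12 and 21 for id 1, and 0, 3 and 18 for id 2.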

Code:

collapse (first) resultvalue, by(id follow_num)

aggregates up the data to a single observation in each time period for each id. This one chooses the first it encounters, but because the preceding command sorted the data in chronological order within id, that means the earliest.

Code:

rename resultvalue result__mos

This changes the name of the variable resultvalue to something that the next command can change to the kind of variable names you are looking for.

Code:

reshape wide result_@_mos, i(id) j(follow_num)

This rearranges the data from long to wide layout, creating the new time-period-specific result value variables, numbered appropriately.

I suggest you step back from what you are doing and invest some time in reading the Getting Started [GS] and User's Guide [U] segments of the PDF documentation that comes with your Stata. It will introduce you to the most basic Stata commands that are used in data management and analysis. They are the "bread and butter" commands. You won't remember every detail, but with this exposure under your belt, you will be able to solve most day-to-day data management problems in Stata, perhaps referring to the -help- files or the manual chapters on specific commands to clarify some details.
Ushma Agarwal

Join Date: Nov 2020

Posts: 37
#11

22 Nov 2020, 14:12

Clyde Schechter
Thank you so much for helping out.
When I tried to run the code, it showed me this error:

"Your data are currently long. You are performing a reshape wide. You specified i(mrn07)
and j(follow_num). There are observations within i(mrn07) with the same value of
j(follow_num). In the long data, variables i() and j() together must uniquely identify the
observations."

Can you please help me work out the best way to deal with this error?
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#12

22 Nov 2020, 16:05

I have to say that it is hard for me to imagine how this could have happened. I wonder if you made a mistake when you did the -collapse- command, since it should leave behind a data set with only one observation per combination of your id variable and the follow_num variable. Did you perhaps get that one wrong and use some variable other than mrn07 as your id variable in that command?

If that is not the source of your current difficulty, please use the -dataex- command to provide an example data set that illustrates this problem and I will try to troubleshoot it. (See #2 for information about -dataex-.)
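In the meantime, a quick way to see which observations are causing the complaint might look like this. It is a sketch that assumes mrn07 is the id variable, as in the error message quoted in #11, and that follow_num already exists; dup is a hypothetical name for a new helper variable.

```stata
* Count how many observations share the same id and follow-up period.
duplicates report mrn07 follow_num

* Flag the offending observations and list them, grouped by patient.
duplicates tag mrn07 follow_num, gen(dup)
list mrn07 follow_num if dup > 0, sepby(mrn07)

* After a correct -collapse-, this runs silently; otherwise it stops with an error.
isid mrn07 follow_num
```

Typing -reshape error- immediately after the failed -reshape- also lists the problem observations.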
Ushma Agarwal

Join Date: Nov 2020

Posts: 37
#13

23 Nov 2020, 09:53

I will try that again.
Is it possible that it didn't work because some patients have two different start dates?
If so, how do I get Stata to keep the ones with the latest start dates?
Ushma Agarwal

Join Date: Nov 2020

Posts: 37
#14

23 Nov 2020, 10:05

Clyde Schechter It worked this time, but the result shows only a table of ID and the monthly follow-up results; all other variables, such as age and gender, are no longer in the table.
Why did that happen?
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#15

23 Nov 2020, 10:25

The -collapse- command eliminates all variables that are not mentioned in it. As you said nothing about any other variables in your original question, I did not tailor the code to do anything with them.

For gender, the simplest solution is to modify the -collapse- command. Since sex doesn't change over time you can just pick the first value--it will be the same as all the others:

Code:

collapse (first) gender result_value, by(id_variable follow_num)

Age is not so simple because it is going to change over time, so it does not lend itself nicely to a one-observation-per-id framework. (This may be another reason why you should reconsider doing this transformation in the first place.) So you need to decide whether you want to pick some particular value of age, e.g. the first, or the average, or the oldest, or.... Alternatively, you could create new variables for age at 0 months, age at 3 months, age at 6 months, etc. by including it in the -by()- option of the -collapse- command and then adding it to the -reshape- command alongside result_@_mos. It all depends on how you plan to use the information.
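One way the period-specific age variables might be sketched is shown below. Note the assumptions: it uses the variable names id and resultvalue from #8 together with hypothetical variables gender and age, and it carries age through -collapse- in the variable list (taking the first value in each period) rather than in -by()-, which is a simplification chosen here to guarantee one observation per id and period.

```stata
* Keep the first result, age and gender observed in each id-period.
collapse (first) gender age resultvalue, by(id follow_num)

* Give the time-varying variables names that match the -reshape- stubs.
rename resultvalue result__mos
rename age age__mos

* gender is constant within id, so it stays as a single variable in wide layout.
reshape wide result_@_mos age_@_mos, i(id) j(follow_num)
```

This would produce variables such as result_0_mos and age_0_mos, result_3_mos and age_3_mos, and so on, alongside a single gender variable.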