Handling data from various years and various time intervals for each individual

Stella Jo

Join Date: Oct 2023
Posts: 7

27 Oct 2023, 12:39

Hello, I would like to know the detailed method of handling data from various years and various time intervals for each individual.
I aim to construct a new dataset where each individual is paired with a specific test name with specified visit intervals.

The lab tests were conducted on different dates for each ID, with a substantial amount of data generated for each date.
I intend to label the test dates for each ID as "visit1", "visit2", and so forth.
Additionally, I want to associate a lab testname with each visit and consolidate the results.

For example, I would like to create variables like "visit1_labtestA", "visit1_labtestB", "visit2_labtestA", and "visit2_labtestB" for each ID.
These variables will contain the respective lab values.

My question is whether it's possible to build this new dataset in Stata and analyze the effect of the lab test results (such as A or B) within different time intervals on individual outcomes.

ID	Lab date	Lab test name	Lab value
1	2011-01-05	A	0.5
1	2011-01-05	B	0.7
1	2011-02-05	A	0.8
1	2011-02-05	B	0.3
2	2010-01-05	A	1.2
2	2010-01-05	B	1.4
2	2010-04-05	A	1.6
2	2010-04-05	B	1.8
3	2012-01-05	A	0.6
3	2012-02-05	B	0.4
3	2013-03-05	A	0.5
3	2013-04-05	B	0.3
4	2014-01-05	A	0.2
4	2014-02-05	B	0.1

to a new table

ID	visit1_A	visit1_B	visit2_A	visit2_B
1	0.5	0.7	0.8	0.3
2	1.2	1.4	1.6	1.8
3	0.6	0.4	0.5	0.3
4	0.2	0.1

Clyde Schechter

Join Date: Apr 2014

Posts: 30155
#2

27 Oct 2023, 13:08

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id float lab_date str2 lab_test_name float lab_value
1 18748 "A "  .5
1 18748 "B "  .7
1 18749 "A "  .8
1 18749 "B "  .3
2 18383 "A " 1.2
2 18383 "B " 1.4
2 18386 "A " 1.6
2 18386 "B " 1.8
3 19114 "A "  .6
3 19115 "B "  .4
3 19481 "A "  .5
3 19482 "B "  .3
4 19844 "A "  .2
4 19845 "B "  .1
end
format %td lab_date

by id (lab_date lab_test_name), sort: gen visit_num = sum(lab_date != lab_date[_n-1])
egen suffix = concat(visit_num lab_test_name), punct("_")
drop visit_num lab_test_name lab_date
reshape wide lab_value, i(id) j(suffix) string
rename lab_value* visit*
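One thing worth checking before reshaping: the code above starts a new visit at every distinct date within an id. For ids whose tests A and B fall on different dates (ids 3 and 4 in the example), each date therefore becomes its own visit, which may or may not be what is wanted. The sketch below, using rows for ids 1 and 3 from the example, puts that numbering next to one possible alternative that counts occurrences of each test within id. The names visit_by_date and visit_by_test are hypothetical, and the alternative rule is only an assumption about what might be intended.

```stata
* Sketch only: visit_by_date and visit_by_test are hypothetical names
clear
input byte id float lab_date str2 lab_test_name float lab_value
1 18748 "A" .5
1 18748 "B" .7
1 18749 "A" .8
1 18749 "B" .3
3 19114 "A" .6
3 19115 "B" .4
3 19481 "A" .5
3 19482 "B" .3
end
format %td lab_date

* Rule used in the code above: a new visit starts at each new date within id
by id (lab_date lab_test_name), sort: gen visit_by_date = sum(lab_date != lab_date[_n-1])

* Alternative assumption: the k-th occurrence of a test within id is visit k
by id lab_test_name (lab_date), sort: gen visit_by_test = _n

* Show both numberings side by side
sort id lab_date lab_test_name
list id lab_date lab_test_name visit_by_date visit_by_test, sepby(id) noobs
```

For id 1 the two numberings agree. For id 3 the by-date rule gives visits 1 through 4, while the per-test rule gives 1, 1, 2, 2. Whichever variable matches your intent can be used in place of visit_num when building the suffix.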

Notes:
In order for this (or any reasonable) code to work, your date variable must be a genuine Stata internal format date variable, as in the example data as shown here. If what you have is a string that human eyes recognize as dates, you must convert it. If you are not familiar with how to do that, read -help daily()-.
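A minimal sketch of that conversion, assuming the dates arrive as strings such as "2011-01-05" in a variable here called lab_date_str (a hypothetical name), and assuming the order is year-month-day. If your strings are ordered differently, the mask must be changed to match.

```stata
* Sketch only: lab_date_str is a hypothetical name for a string date variable
clear
input byte id str10 lab_date_str
1 "2011-01-05"
1 "2011-02-05"
2 "2010-04-05"
end

* The "YMD" mask assumes year-month-day order in the string
gen lab_date = daily(lab_date_str, "YMD")
format %td lab_date

* Strings that cannot be converted become missing, so check for that
assert !missing(lab_date)
list, noobs
```

The -assert- line stops with an error if any string failed to convert, which is usually a sign that the mask does not match the data.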

You will probably regret doing this. In Stata, nearly all data management and analysis is easier, and sometimes only possible, with the long data layout you are starting from. Wide layouts are used only in a small number of commands, mostly devoted to visual display, and a few older statistical commands that have equivalents that work with long data anyway. So unless you are sure you really need the wide layout for what you will be doing, you really are best off just leaving well enough alone.
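As an illustration of staying in the long layout, here is a sketch that keeps one row per id, date, and test, and simply adds the visit number and the days elapsed since the first test as extra columns. It uses the rows for ids 1 and 2 from the example; days_since_first is a hypothetical name, and the summaries at the end are only examples of commands that work directly on long data.

```stata
* Sketch only: days_since_first is a hypothetical name
clear
input byte id float lab_date str2 lab_test_name float lab_value
1 18748 "A" .5
1 18748 "B" .7
1 18749 "A" .8
1 18749 "B" .3
2 18383 "A" 1.2
2 18383 "B" 1.4
2 18386 "A" 1.6
2 18386 "B" 1.8
end
format %td lab_date

* Visit number: increases by one at each new date within id
by id (lab_date lab_test_name), sort: gen visit_num = sum(lab_date != lab_date[_n-1])

* Days elapsed since the first test date of the same id
by id (lab_date lab_test_name): gen days_since_first = lab_date - lab_date[1]

* Summaries by test, and by test and visit, need no reshape
tabstat lab_value, by(lab_test_name) statistics(n mean min max)
bysort lab_test_name visit_num: summarize lab_value
```

With the interval stored as a variable, restricting an analysis to a time window becomes an -if- condition on days_since_first rather than a choice among many wide variables.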

This code will fail if the contents of the lab_test_name variable contain characters that are not permissible in Stata variable names. The only permissible characters are digits, letters, and the underscore (_) character. If anything else is there, you will get an error message and execution will halt. To fix that problem, see -help strtoname()- which can clean up the lab_test_name variable.
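A sketch of that cleanup, to be adapted to your data. The test names below are made up purely for illustration, and clean_name is a hypothetical variable name. The second argument of strtoname() is set to 0 so that no leading underscore is added when a name starts with a digit, which is acceptable here only because the cleaned name is used as a suffix after the visit number.

```stata
* Sketch only: the test names here are invented for illustration
clear
input byte id float lab_date str12 lab_test_name float lab_value
1 18748 "HbA1c (%)" .5
1 18748 "LDL-C"     .7
end
format %td lab_date

* Trim blanks, then turn every character not allowed in a name into _
gen clean_name = strtoname(strtrim(lab_test_name), 0)

* Two different raw names can clean to the same string; this assertion
* fails if any cleaned name corresponds to more than one raw name
bysort clean_name (lab_test_name): assert lab_test_name == lab_test_name[1]

list lab_test_name clean_name, noobs
```

If the check passes, clean_name can be used in place of lab_test_name when building the suffix. Keep in mind that the final variable names must still fit within Stata's 32-character limit.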

In the future, when asking for help with code, please use the -dataex- command and show example data, as I have done here. Although sometimes, as here, it is possible to give an answer that has a reasonable probability of being correct, this is usually not the case. Moreover, such answers are necessarily based on experience-based guesses or intuitions about the nature of your data. When those guesses are wrong, both you and the person trying to help you have wasted their time as you end up with useless code. To avoid this, a -dataex- based example provides all of the information needed to develop and test a solution.

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
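For readers who have not used it, a minimal sketch of a -dataex- call. It uses the auto dataset shipped with Stata so that it runs as is; with your own data you would list your own variables instead (for this thread, something like id lab_date lab_test_name lab_value).

```stata
* Only needed on older installations where -help dataex- finds nothing:
* ssc install dataex

* Load a dataset shipped with Stata so the example is runnable
sysuse auto, clear

* Show the first five observations of three variables in a form
* that can be pasted directly into a forum post
dataex make price mpg in 1/5
```

The output is a block of Stata code that recreates the example exactly, including storage types and display formats, so copy it whole, from the opening comment line through -end-.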

When asking for help with code, always show example data. When showing example data, always use -dataex-.
Stella Jo

Join Date: Oct 2023

Posts: 7
#3

27 Oct 2023, 13:43

I'm sorry. After learning how to ask questions, I will ask for help. Thank you very much.