Selection one variable among the multiple variables

Stella Jo

Join Date: Oct 2023

Posts: 7
#1

Selection one variable among the multiple variables

07 Nov 2023, 12:14

Hello, I am beginner of STATA, and look forward to your guidance.

I want to make a new variable: baseline_age of my dataset.

I have to assemble baseline age of the dataset.
However, some of the ID does not have the data of age of visit_number 1.
The individual who has no age data at visit_number1 has the record of age the next year, therefore, I want to make a new variable: baseline age by substration -1 from the next year in this ID.
For example, ID number 1, and 2 have age data in visit_number 1, also has the next year data of age, visit_number 2.
However, ID number 3, and 4 doesn't have age data in visit_number 1, only have the data of age in next year, visit_number 2.

My dataset is like as below:

Code:

ID visit_number age 1 1 48 2 2 49 2 1 25 2 2 26 3 1 . 3 2 29 4 1 . 4 2 35

Therefore, I made a code like as below:

Code:

generate age1=age if visit_number==1 generate age2=age if visit_number==2 generate age2_1=age2-1 generate baseline_age=age1 replace baseline_age=age2_1 if age1 == .

The dataset has been changed like as below: Sorry to not upload as dataex. I don't know why dataex is not working in my STATA.

Code:

ID visit_number age age1 age2 age2_1 baseline_age 1 1 48 48 . . 48 1 2 49 . 49 48 48 2 1 25 25 . . 25 2 2 26 . 26 25 25 3 1 . . . . . 3 2 29 . 29 28 28 4 1 . . . . . 4 2 35 . 35 34 34

I want to make baseline_age variable from ID 1,2 from visit_number 1 row, and ID 3, 4 from visit_number 2.
However, this modified dataset has same structure in ID 1,2, (visit number 2) and ID 3,4, (visit number2).
I want to know the methods to selection of age variable from different row.

Thank you for your understanding.
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30147

07 Nov 2023, 13:09

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id visit_number age)
1 1 48
2 2 49
2 1 25
2 2 26
3 1  .
3 2 29
4 1  .
4 2 35
end

by id (visit_number), sort: replace age = age[2] - 1 if _n == 1 & missing(age[1])

Comment

Stella Jo

Join Date: Oct 2023

Posts: 7
#3

07 Nov 2023, 14:04

Thank you for your replying!

That command makes me the dataset which has ID has 2 rows of age1 and age2.
I want to keep the row of ID which has visit_number 1 and age1, despite they have the visit_number2 and age2, because age1 is their baseline_age. (the row ID #1_age1, ID #2_age1)
Also, I want to keep the only row of ID which has visit_number_2 and age2 who doesn't have visit_number1 and age 1. (the row of ID #3_age2, ID#4_age2)

Code:

id visit_number age 1 1 48 2 2 49 2 1 25 2 2 26 3 1 . 3 2 29 4 1 . 4 2 35

I want to keep baseline_age to unique ID.

If I choose the command drop, the whole row has to be dropped.
I want to know the appropriate command to selection and drop.
Please understand if I wrote something confusing.

Thank you very much!!
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30147

07 Nov 2023, 14:30

OK, I'm not entirely sure I understand what you want, but perhaps it is this. Try it and see if what it produces is what you need.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id visit_number age)
1 1 48
2 2 49
2 1 25
2 2 26
3 1  .
3 2 29
4 1  .
4 2 35
end

by id (visit_number), sort: gen baseline_age = min(age[1], age[2]-1)
by id (visit_number): keep if _n == cond(missing(age[1]), 2, 1)

Comment

Stella Jo

Join Date: Oct 2023

Posts: 7
#5

07 Nov 2023, 15:14

Thank you so much of your guidance.

I did as you recommended, and I can omit the row of ID who has no visit_number1_age1.

Code:

id visit_number age 1 1 48 2 2 49 2 1 25 2 2 26 3 2 29 4 2 35

However, I couldn't the delete the row of visit_number2_age2 of the ID who has both of visit_number_age1 and visit_number_age2.
I want to make a dataset like as below:

Code:

id visit_number age 1 1 48 2 1 25 3 2 29 4 2 35

Thank you for your understanding!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#6

07 Nov 2023, 15:20

Your new input is very different from the original input. Starting from the new input, you can do this:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input byte(id visit_number age) 1 1 48 1 2 49 2 1 25 2 2 26 3 2 29 4 2 35 end by id (visit_number), sort: keep if _n == 1

Note: In your input in #5, you give the second observation as 2 2 49. I believe you meant 1 2 49, and I changed it accordingly in the -dataex- output for this post.
Comment
Stella Jo

Join Date: Oct 2023

Posts: 7
#7

07 Nov 2023, 15:47

Oh, Thank you so much!

I can make a dataset which I wanted!!!
Thank you very much for your guidance.
I will study thoroughly the command you taught me and use them in the future when I organize the dataset.

Code:

generate age1=age if visit_number==1 generate age2=age if visit_number==2 generate age2_1=age2-1 by id (visit_number), sort : gen baseline_age = min (age1, age2_1) by id (visit_number): keep if _n == cond(missing(age1), 2, 1) by id (visit_number), sort : keep if _n==1
Comment

Announcement

Selection one variable among the multiple variables

Comment

Comment

Comment

Comment

Comment

Comment