Creating a new string variable conditioned on multiple other variables

Li Huang

Join Date: Jul 2019
Posts: 9

05 Jul 2019, 14:38

Hello,

I am relatively new to Stata and I am confused about how to approach this problem. I read through the help for gen and egen, and whatever seemed relevant in the help on for-loops. This may be a case of misunderstanding loops or of not searching for the correct Stata terms. Also, please forgive me for not using dataex, as the dataset contains PHI.

I have recreated the example below, substituting names and values that are of equivalent type but are not PHI.

I have a numeric ID variable "ID" in which the same value appears multiple times. It is not the primary identifier in the dataset, which I assume is why its values repeat.
I have a string variable indicating the round of visit, i.e. Round 1, Round 1.2, Round 2 where each round represents some specific activity.
I have a randomization status variable (string) that represents what the random assignment was, i.e. Group 1 or Group 2.
The dataset was given to me as "complete" and so at this point I do not have the option of going back to look at potential flaws from the merges of separate spreadsheet files.

The issue is that the randomization status is only listed for "Round 1" of each ID value. I cannot simply list the values and type them in: there are too many unique ID values, and doing so would be error-prone.

Example (have):

ID	Round (string)	Randomization
1	Round 1	Group 1
2	Round 1	Group 2
3	Round 1	Group 1
1	Round 1.2
2	Round 1.2
3	Round 1.2
1	Round 2
2	Round 2
3	Round 2

I would like to create a variable that holds a completed column of Randomization values, filled in from each ID's Randomization value in Round 1. The number of Rounds differs across ID values: some patients completed the entire study, while others completed only some of the Rounds.

Example (want):

ID	Round (string)	Randomization	Randomization_Complete
1	Round 1	Group 1	Group 1
2	Round 1	Group 2	Group 2
3	Round 1	Group 1	Group 1
1	Round 1.2		Group 2
2	Round 1.2		Group 2
3	Round 1.2		Group 1
1	Round 2		Group 2
2	Round 2		Group 2
3	Round 2		Group 1

Thank you in advance,

LH

Last edited by Li Huang; 05 Jul 2019, 14:48.

Tags: data, loop, string, syntax

Jean-Claude Arbaut

Join Date: Jul 2017

Posts: 209
#2

05 Jul 2019, 15:19

In your example, Round 1 matches IDs 1, 2, 3 with Group 1, Group 2 and Group 1, respectively. But the association is not the same in the other two rounds (Group 2, 2, 1 instead of 1, 2, 1).
Is this correct?

I ask because 1/ if IDs identify people and people are in a randomization group, I would expect them to stay in the same group all along, and 2/ if the IDs do not allow us to find the group, how are we supposed to find it?

If on the other hand you meant repeating Group 1 2 1 (that is, there is a mapping ID->Group), then here is a solution:

Code:

preserve
keep if Round=="Round 1" /* or maybe 'keep if !mi(Randomization)' */
keep ID Randomization
rename Randomization Randomization_Complete
save temp
restore
merge m:1 ID using temp

That is, extract only the Group data you know, and merge with the original file.
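For comparison, here is a minimal sketch of an alternative that avoids the temporary file. It is not the method above but a swapped-in idiom: within each ID, sort on Randomization so that empty strings come first, then copy the last value to every row. It assumes each ID has at most one distinct non-empty Randomization value, and the assert stops the run if that fails. The input block only recreates the hypothetical example from post #1.

```stata
* Hypothetical recreation of the example data from post #1
clear
input ID str9 Round str7 Randomization
1 "Round 1"   "Group 1"
2 "Round 1"   "Group 2"
3 "Round 1"   "Group 1"
1 "Round 1.2" ""
2 "Round 1.2" ""
3 "Round 1.2" ""
1 "Round 2"   ""
2 "Round 2"   ""
3 "Round 2"   ""
end

* Assumption check: within an ID, every non-empty value must agree
* with the last one after sorting (empty strings sort first)
bysort ID (Randomization): assert Randomization == Randomization[_N] | Randomization == ""

* Copy the last (non-empty, if any) value to all rows of the ID
bysort ID (Randomization): gen Randomization_Complete = Randomization[_N]

* Restore a readable order and inspect the result
sort ID Round
list, sepby(ID)
```

An ID with no Round 1 row would keep an empty Randomization_Complete, which corresponds to the unmatched observations in the merge approach.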
Li Huang

Join Date: Jul 2019

Posts: 9
#3

05 Jul 2019, 15:38

Yes, my mistake. The "Group #" should stay the same. Your solution is correct and simple. Thank you very much.