Hi everyone,
I'm working with a WGS dataset in long format, with multiple observations per participant. Ultimately I need to reshape it to wide format so that I can combine it with the participants' clinical data. Before that, I want to modify and restructure the data.
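For the later reshape-and-merge step, here is a sketch under several assumptions. The genes have already been concatenated into one variable (called `genes` here, a hypothetical name), and the data have been reduced to one observation per generic_id and drug (for example with `drop gene` followed by `duplicates drop`). The clinical data hold one observation per generic_id, and the drug values are valid as variable-name suffixes, because `reshape wide ..., string` appends them to the stub. Attaching clinical variables by participant is a `merge` on generic_id; `append` would instead stack the two datasets. All values below are made up.

```stata
* Hypothetical clinical data: one observation per participant
clear
input str8 generic_id age
"P01" 34
"P02" 51
end
tempfile clinical
save `clinical'

* Hypothetical WGS data after concatenation: one observation per participant-drug
clear
input str8 generic_id str8 drug str20 genes
"P01" "drugX" "geneA geneB"
"P01" "drugY" "geneA"
"P02" "drugX" "geneC"
"P02" "drugY" "geneB geneD geneE"
end

* Wide layout: one observation per participant, one genes<drug> variable per drug
reshape wide genes, i(generic_id) j(drug) string

* Attach the clinical variables by participant identifier
merge 1:1 generic_id using `clinical'
list
```

The wide variables are named from the stub plus each drug value (here `genesdrugX` and `genesdrugY`), so drug names containing spaces or hyphens would need recoding before the reshape.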
For example, I want to work with 3 string variables: generic_id (participant identifier), gene (the gene identified in each observation of the long data), and drug (the antibiotic to which the gene mutation confers resistance).
Within each combination of generic_id and drug, I would like to concatenate the genes (multiple genes can confer resistance to the same drug; likewise, one gene can confer resistance to multiple drugs).
This way I hope to get a variable that takes the same value on every observation for a given drug within a participant, listing all genes responsible for resistance to that drug.
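A small made-up example of this long layout may make the problem easier to reproduce. The IDs, gene names, and drug names are placeholders, not real data. It includes one gene (geneA, geneB) linked to two drugs, one drug linked to several genes, and participants with different numbers of observations.

```stata
* Hypothetical long-format WGS data: one observation per gene-drug finding
clear
input str8 generic_id str8 gene str8 drug
"P01" "geneA" "drugX"
"P01" "geneB" "drugX"
"P01" "geneA" "drugY"
"P02" "geneC" "drugX"
"P02" "geneB" "drugY"
"P02" "geneD" "drugY"
"P02" "geneE" "drugY"
end

* Number of observations per participant (not constant across participants)
bysort generic_id: generate n_obs = _N
list, sepby(generic_id)
```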
I have read this article: https://journals.sagepub.com/doi/pdf...36867X20909698
and tried the following:
```stata
bysort generic_id (drug): generate gene2 = gene[1]
by generic_id: replace gene2 = gene2[_n-1] + gene if _n > 1
by generic_id: replace gene2 = gene[_N]
```

But this does not work in my case. The number of observations per generic_id is not constant in my dataset. I would appreciate any nudge or ideas on how to do this. Thanks!
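A minimal sketch of one likely fix, run on a small made-up dataset (IDs, genes, and drugs are placeholders). Two changes relative to the attempt above: the groups are defined by generic_id and drug rather than generic_id alone, so the running concatenation restarts for each drug, and the last step copies the concatenated `gene2[_N]` rather than the single `gene[_N]`. The space used as a separator is an assumption; any delimiter works. Because `replace` processes observations in order, `gene2[_n-1]` already holds the running list when row `_n` is reached, so a varying number of observations per participant is not a problem.

```stata
* Hypothetical long-format data
clear
input str8 generic_id str8 gene str8 drug
"P01" "geneA" "drugX"
"P01" "geneB" "drugX"
"P01" "geneA" "drugY"
"P02" "geneC" "drugX"
"P02" "geneB" "drugY"
"P02" "geneD" "drugY"
"P02" "geneE" "drugY"
end

* Group by participant AND drug; sort genes within each group for a stable order
bysort generic_id drug (gene): generate gene2 = gene[1]

* Append each subsequent gene to the running list from the previous row
* (replace promotes the string storage type automatically as the list grows)
by generic_id drug: replace gene2 = gene2[_n-1] + " " + gene if _n > 1

* The last row of each group holds the full list; copy it to every row
by generic_id drug: replace gene2 = gene2[_N]

list generic_id drug gene gene2, sepby(generic_id drug)
```

If the same gene can appear more than once for a participant-drug pair (for example, several mutations in one gene), running `duplicates drop generic_id drug gene, force` first would keep each gene only once in the list.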