Hello,
(First-time poster to the forum. I have reviewed the guidelines in "Advice on Posting" https://www.statalist.org/forums/help#stata, but my sincere apologies if I have not gotten something right!)
I am a student working with an NHANES-derived dataset on prescription drug use. Its drug-level data need to be converted to person-level data (i.e., one record per person) before I can merge it with the NHANES demographic data files on the unique identifier for each individual (the variable seqn).
"Analysts should convert a drug level data to a personal level data, that is, a record for each person, before merging it with NHANES demographic and other data files by using SEQN.”
Source: https://wwwn.cdc.gov/nchs/nhanes/2009-2010/RXQ_RX_F.htm
This is because an individual may take more than one medication, in which case each medication is listed on its own row with that person's seqn. In my dataset, the number of medications per individual ranges from 0 to 20.
Here is an example of how the dataset is currently organized, using medications for two individuals (the real dataset has 20k):
seqn rxduse rxddrgid rxdcount
73557 1 d00262 2
73557 1 d04113 2
73558 1 d00262 4
73558 1 d04538 4
73558 1 d00746 4
73558 1 d03182 4
rxduse indicates whether the person uses a prescription medication, rxddrgid is a string variable identifying the specific medication (each value corresponds to a medication in a codebook), and rxdcount is the total number of medications the person takes.
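Here is a minimal sketch that rebuilds the example in Stata with `input`, in the spirit of `dataex`. It assumes the third row belongs to seqn 73558 rather than 73557, because the desired layout lists d00262 under 73558 and that row carries rxdcount 4. The last line is an optional check, assuming rxdcount should equal the number of drug rows for each person who uses medications (people with rxduse != 1 are skipped, since their rxdcount may be missing).

```stata
* Example data (third row entered as 73558, an assumption based on the desired output)
clear
input long seqn byte rxduse str6 rxddrgid byte rxdcount
73557 1 "d00262" 2
73557 1 "d04113" 2
73558 1 "d00262" 4
73558 1 "d04538" 4
73558 1 "d00746" 4
73558 1 "d03182" 4
end

* Optional check: for users, rxdcount should match the rows per person
bysort seqn: assert rxdcount == _N if rxduse == 1
```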
I would like the data to look like this:
seqn rxduse rxdcount rxddrgid_1 rxddrgid_2 rxddrgid_3 rxddrgid_4
73557 1 2 d00262 d04113 . .
73558 1 4 d00262 d04538 d00746 d03182
I have reviewed posts in the forum, looked at the help system, and visited various sites, such as:
https://stats.idre.ucla.edu/stata/se...ta-management/
https://cph.osu.edu/sites/default/fi...DataMartin.pdf
Based on my reading, I thought this would work:
gen num =_n, over (seqn)
reshape wide rxddrgid, i(seqn) j(num)
but the first line returned this error:
options not allowed
r(101);
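As far as I can tell from the help files, `generate` has no `over()` option, which is what r(101) is complaining about; numbering rows within each person is normally done with the `by` prefix instead. Below is a minimal sketch of the full restructuring on the example data (third row again entered as 73558). It assumes rxduse and rxdcount are constant within each seqn, because `reshape wide` stops with an error if any variable outside the stub list varies within `i()`. The helper variables `order` and `num` and the intermediate name `rxddrgid_` are my own choices, not names from the NHANES files.

```stata
clear
input long seqn byte rxduse str6 rxddrgid byte rxdcount
73557 1 "d00262" 2
73557 1 "d04113" 2
73558 1 "d00262" 4
73558 1 "d04538" 4
73558 1 "d00746" 4
73558 1 "d03182" 4
end

* Remember the original row order so each person's drugs keep their listed sequence
generate long order = _n

* Number drugs 1, 2, ... within each person (by-prefix, not an over() option)
bysort seqn (order): generate num = _n
drop order    // must go: reshape wide needs non-stub variables constant within seqn

* Rename so the wide variables come out as rxddrgid_1, rxddrgid_2, ...
rename rxddrgid rxddrgid_
reshape wide rxddrgid_, i(seqn) j(num)

order seqn rxduse rxdcount rxddrgid_*
list, noobs
```

With up to 20 drugs per person, the real data would likely produce rxddrgid_1 through rxddrgid_20. Empty slots in a string variable show as blank ("") rather than ".".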
Thank you very much in advance; any advice would be greatly appreciated! I also recognize there are likely several ways to achieve this restructuring, so please let me know if I should take a different approach entirely.
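One different approach, in case the individual drug codes are not needed as separate columns: build person-level indicator variables for the drugs of interest, then keep one row per seqn. This is a minimal sketch on the same example data (third row as 73558); the indicator name `takes_d00262` is hypothetical and d00262 is used only as an illustration.

```stata
clear
input long seqn byte rxduse str6 rxddrgid byte rxdcount
73557 1 "d00262" 2
73557 1 "d04113" 2
73558 1 "d00262" 4
73558 1 "d04538" 4
73558 1 "d00746" 4
73558 1 "d03182" 4
end

* Hypothetical indicator: 1 if any of the person's rows has drug code d00262, else 0
bysort seqn: egen byte takes_d00262 = max(rxddrgid == "d00262")

* Keep one record per person; rxduse and rxdcount are constant within seqn
by seqn: keep if _n == 1
drop rxddrgid
list, noobs
```

The result is already one record per seqn, so it could then be merged 1:1 with the demographic file.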
Sincerely,
Kate L. Taylor