Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Modifying a dataset to facilitate merging

    Hello,

    I am working with NHANES data and have been merging datasets together using a 1:1 merge with a unique identifier "seqn." Each survey respondent is assigned a different sequence number (seqn) which makes it easy to merge the different datasets. This was all going well, but one of the datasets I want to merge (on prescription drug data) with the others contains multiple entries for each "seqn" so that it is not a unique identifier. In other words, if a person were taking multiple prescription drugs there would be multiple data entries that correspond to the same sequence number. I am trying to figure out a way to reorganize the data so that there is a single sequence number that corresponds to each drug a person is taking. Please let me know how best I can do this. I am new to stata.

  • #2
    What is the year and what is the name of the data component? Is it this one? https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/RXQ_RX_J.htm

    Ultimately, what is the objective of this analysis? Aka, what are you trying to do with the information? In 2017-18, there are 666 unique drug codes, which can add 666 variables to your current data. Is that what you want? Try not to focus on "packing everything into the data set," but "how would I get the information needed for the analysis?" In many cases, it may actually be better to process in this sub-component first, aggregate down to individual level, then merge.
    Last edited by Ken Chui; 16 Jun 2021, 15:15.

    Comment


    • #3
      Hi,
      I am using the 2003-2004 version of that data: https://wwwn.cdc.gov/Nchs/Nhanes/200...ng_and_Editing. The objective of the analysis is to compare different groups of people who are and are not taking specific types of drugs to treat diabetes. To do this, I thought I would need to merge this dataset with demographic data and other data sets containing relevant bio-measurements.

      Comment


      • #4
        This sounds to me like a 1:m merge, where the "1" is the file of individuals, and the "m" denotes the file with the several drug observations. Merging these would produce a file with multiple entries per individual (the so-called "long" format or layout). The long layout is generally the most useful in Stata, although your situation might be an exception. In any event, one can easily start with the long layout and change it to the "wide" layout. First, though, you need to do the merge. See -help merge- if you haven't already.

        Comment


        • #5
          Originally posted by Harry Sultan View Post
          Hi,
          I am using the 2003-2004 version of that data: https://wwwn.cdc.gov/Nchs/Nhanes/200...ng_and_Editing. The objective of the analysis is to compare different groups of people who are and are not taking specific types of drugs to treat diabetes. To do this, I thought I would need to merge this dataset with demographic data and other data sets containing relevant bio-measurements.
          In that case, it may be easier to identify what drugs are "diabetes" related, flag them, and then collapse the data into individual level, then merge back to the demographic file. Here is an example:

          Code:
          * Import data
          import sasxport5 "https://wwwn.cdc.gov/Nchs/Nhanes/2003-2004/RXQ_RX_C.XPT", clear
          
          * Find out the codes of diabetes drugs (e.g. https://wwwn.cdc.gov/nchs/nhanes/1999-2000/RXQ_DRUG.htm)
          * For instance Tolbutamide is d00394; Glimepiride is d03864, etc
          
          * Build an indicator:
          gen dmdrug = 0
          replace dmdrug = 1 if rxddrgid == "d00394"
          replace dmdrug = 1 if rxddrgid == "d03864"
          * So on so forth
          
          * Collapse into unique seqn level data:
          collapse (max) dmdrug, by(seqn)


          Comment

          Working...
          X