Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging linked data

    I am hoping to get some advice on linking data between a spreadsheet that contains baseline (cross-sectional) parameters (master dataset) and a spreadsheet that contains longitudinal (outcome) data (using dataset), related to hospital encounters with admission date and ICD codes. Both data sheets have a column with a unique participant identifier.

    The using dataset contains many rows with the same participant, given they had multiple hospital encounters.

    As such, when I try to undertake ' merge m:1 varlist [id] using filename ' , it returns an error stating "variable varlist does not uniquely identify observations in the using data".

    I have tried to troubleshoot this by performing ' reshape wide varlist [ICD code], i(varlist [id]) j(varlist [admission date]) ' but again return an error "variable varlist [admission date] takes on too many values.

    Is there another way I can work this through such that I can merge my two data sheets so all hospital encounters with ICD codes and dates are included in the master dataset?

  • #2
    What is the precise merge command you are using? Is it truly

    Code:
    merge m:1 varlist [id] using filename
    or did you only include id as the matching variables, like so?

    Code:
    merge m:1 id using filename
    if you typed something like the first command, then it’s not legal syntax. If your command was instead like the second example, then you probably don’t have unique patient ids. Investigate this by typing the following, replacing id with the name of your id variable.

    Code:
    isid id

    Comment


    • #3
      Hi Leonardo,

      The 2nd code as you have written.

      Is there an alternative way to solve the issue I have raised?

      Comment


      • #4
        It's difficult to say without a minimal working example. Your hospital encounter data with one observation per patient and encounter would be your master dataset. This will necessarily have multiple observations for patients that had multiple encounters. The using dataset is your baseline demographic and characteristics dataset, with one observation per patient. Both must be linked by the same patient id (and in Stata, that means they have to be the same datatype, too). Then you can perform a -merge m:1 id using ...- or -frlink m:1-. The manual shows examples for each command. If that doesn't work, then post back with some example data using -dataex-. It doesn't have to be your actual data since there may be privacy or legal issues in sharing it, but you should then provide similar fake data which we can import to Stata and work with.

        Comment


        • #5
          If I am understanding #1 correctly, the master data set is the one with baseline data (one observation per patient) and the using data set is the one with longitudinal data (encounters). If that is correct, then the merge should go like this:
          Code:
          use baseline_data, clear
          merge 1:m id using longitudinal_data // N.B. not m:1
          If that does not resolve the problem, I think that rather than everybody flying blind here, it would make sense for O.P. to post back with example data from both data sets (using the -dataex- command, of course) and the exact code that was used to attempt the -merge-. If the data set presents confidentiality issues, I recommend changing the id variable to a false id based on consecutive numbers starting at 1, making sure to use the same false id for observations that have the same true id. Then show only the id variable and, in the case of the longitudinal data set, the date variable (which can be falsified in whatever way you like). It is the data structure, not the actual data values, that matter for present purposes.

          If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.


          Comment

          Working...
          X