Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • study2: merging data

    Dear all,

    I am trying to merge 3 data with different numbers of records for every ID (all data are from one study). Each participants in three data has different number of visits. For example: id 1 has visit 0, 1, 4, 6 and 8 in Data No1, then same person has visits 0, 1, 2, 3, 4, 5, 6 in Data No2. When I merge them it creates a lot of missing variables. I would like to match 2 additional data (Data No2 and Data No3) to main one (Data No1) by id and visit and drop extra visits.

    Is it possible to do this with STATA?

    Thank you in advance.
    Oyun





    Last edited by Buyadaa Oyunchimeg; 08 Aug 2018, 22:15.

  • #2
    I can assure you it is possible to do this in Stata. I can also assure you that without seeing good, representative examples of the data in each of these three data sets, it is unlikely that any code that somebody guesses at will actually work.

    So please post back with example data from all three data sets. Use -dataex-, of course. And make sure the examples you show contain some observations that you actually want to be linked together by the joining process.

    Also, do not assume that it will be obvious to others which observations should be linked together and which not. Provide a careful, detailed explanation of that, drawing on the examples you post.

    If you do that, I'm confident that you will get a helpful response in short order.

    Comment


    • #3
      Thank you so much prof.Schechter. As suggested I've tried to create an example data.
      I would like to combine Data 2 and Data 3 to Data 1 according to ID and visit of Data 1 and drop unmatched visits from Data 2 and 3. Each ID has different numbers of visit in all three data. For example: id 1 has visit 1, 4, 8, 12 , 24 and 36 in Data No1, then same person has visit 1, 2, 3, 4, 8, 12, 16, 20, 24... 36 in Data No2 and 1, 4, 8, 12, 16, 20, 24, 28, 32 and 36 in Data 3. I would like to create data where ID1 has visit 1, 4, 8, 12 , 24 and 36 and related variables to these visits (creatinine, gfr, sbp, dbp and glucose) only.



      Thank you in advance.
      Oyun


      Data looks as below:

      PHP Code:

      Data No1 
      (main):

      [
      CODE]
      Example generated by -dataex-. To installssc install dataex
      clear
      input long id byte visit double
      (screat gfr)
      1  1 1.7 53.7
      1  4 1.7 43.8
      1  8 1.5 50.5
      1 12 1.5 50.4
      1 24 1.6 46.7
      1 36 1.9 38.1
      2  1  .8   95
      2  4 1.4 49.7
      2  8   
      .    .
      2 12 1.2 59.3
      2 20 1.2 59.1
      2 24  .9 82.3
      end
      [/CODE
      PHP Code:

      Data No2
      :

      [
      CODE]
      Example generated by -dataex-. To installssc install dataex
      clear
      input long id byte visit int
      (sbp dbp)
      1  1 127 77
      1  1 140 83
      1  2 133 81
      1  3 126 74
      1  4 117 69
      1  6 130 75
      1  8 121 69
      1 12 125 72
      1 16 123 67
      1 20 124 69
      1 24 118 69
      1 28 117 59
      1 30 125 66
      1 32 121 65
      1 34 111 65
      1 36 115 60
      2  1 143 78
      2  4 146 93
      2 12 125 72
      2 16 152 85
      2 24 107 61
      end
      [/CODE
      PHP Code:

      Data No3
      :

      [
      CODE]
      Example generated by -dataex-. To installssc install dataex
      clear
      input long id byte visit double glucose_level
      1  1 8.7
      1  4 7.3
      1  8 7.9
      1 12 7.7
      1 20 8.5
      1 24   7
      1 28 7.4
      1 32 7.8
      1 36 7.3
      2  1 7.9
      2  4 6.5
      2  8 6.7
      2 12 6.7
      2 16 6.4
      2 20 6.7
      2 24 6.8
      end
      [/CODE
      Last edited by Buyadaa Oyunchimeg; 09 Aug 2018, 00:19.

      Comment


      • #4
        There's usually a choice between merge or joinby with these types of assignments.
        With merge, either the master or using data needs to have unique (ie., non-repeated) id variables, here (id visit)

        With merge, you can get there, but you have to pay attention to which of the datasets has unique combinations of (id visit). Preventing all the missing observations means dropping those observations that came out of the using dataset, like so:


        Code:
        use data1, clear
        merge 1:m id visit using data2
        drop if _merge==2
        drop _merge
        merge m:1 id visit using data3
        drop if _merge==2
        drop _merge
        But I would advise using joinby, which creates pairwise combinations for all combinations. With joinby, you can also specify which observations to keep, those from teh master, from the using, or only those that appear in both:
        Code:
        use data1, clear
        joinby id visit using data2, unmatched(master)
        joinby id visit using data3, unmatched(master)

        Comment


        • #5
          Thank you so much Mr.Gosens. As suggested I've tried joinby in example data and it works.

          Comment

          Working...
          X