Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reordering data from line/test to line/date by date of attendance and patient ID (then converting from long to wide format) CODING HELP

    I have a data set I need to merge with another by patient id & date of attendance. Currently I have the data in long format by ‘test’, this means there could be the same patient recording multiple tests on the same day, and as it does not count as a duplicate observation (I cannot remove it), it means I can no longer uniquely identify for merging based on ID and attendance date.

    Current variables: pat_id; date_of_attendance; test

    Example:
    Pat_id Date_of_attendance test
    1 04/05/2022 1
    1 04/05/2022 2
    2 04/05/2022 5
    3 06/05/2022 1
    3 06/05/2022 3
    3 09/05/2022 7
    (dates are in 'DMY' format)


    What I want to do:

    Sort the data by patient ID & date of attendance, where there are multiple observations on the same day by the same patient create a new variable such as ‘Test_number: 0, 1, 2, 3’. This would then reset if the day changes and/or the patient ID changes.

    i.e.
    Pat_id Date_of_attendance test Test_no
    1 04/05/2022 1 0
    1 04/05/2022 2 1
    2 04/05/2022 5 0
    3 06/05/2022 1 0
    3 06/05/2022 3 1
    3 06/05/2022 6 2
    3 09/05/2022 7 0
    I would then use the reshape command by ‘test_no’ to change from long format to wide format so I get data recorded per date instead of per test.

    The wide format would then consist of: Pat_ID; date_of_attendance; test1; test2; test3; test4; etc…

    i.e.
    Pat_id Date_of_attendance Test0 Test1 Test2
    1 04/05/2022 1 2 .
    2 04/05/2022 5 . .
    3 06/05/2022 1 3 6
    3 09/05/2022 7 . .
    Last edited by Jack Brown; 25 May 2022, 12:07.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte pat_id float date_of_attendance byte test
    1 22769 1
    1 22769 2
    2 22769 5
    3 22771 1
    3 22771 3
    3 22771 6
    3 22774 7
    end
    format %td date_of_attendance
    
    by pat_id date_of_attendance, sort: gen _j = _n-1
    reshape wide test, i(pat_id date_of_attendance) j(_j)
    will do what you ask.

    That said, it isn't clear that you need to do this. If pat_id and date_of_attendance uniquely identify observations in the data set that you want to -merge- this with, then it isn't necessary for them to do so in this data set. You can just -merge m:1- and everything will be fine.

    If pat_id and date_of_attendance do not uniquely identify observations in the data set you want to -merge- this one with, then the end result of the -merge- process may prove unwieldy or unusable. That's because you will now have multiple observations for each pat_id date combination, the multiplicity being due to differences on some other variable(s) not described here, and with each of those observations containing the data on all of the tests. Depending on what you want to analyze, that may be a problem. For most kinds of analysis, you will want separate observations for each test, i.e. long layout. But then the question arises, which value of the variable(s) not described here should be associated with which test(s)? You will need to figure that out and then find a way to implement that.

    By the way, if the answer is that every value of the variable(s) not described here should properly be associated with every test, and you also need the tests in long layout, then the solution is not along the lines of #1, but to abandon -merge- altogether and combine the datasets using -joinby-.

    So before you lunge ahead with this, do think ahead about how you will manage this situation. You may have no problem at all following your original plan. Or you may need to completely rethink your approach. Or there may be a simple solution, which may or may not require you to do what you asked in #1.

    Added: In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Thank you so much for this reply! Worked how I wanted it to go, but I have taken on board all your subsequent comments! Much appreciated.

      Comment

      Working...
      X