Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to map data better

    Hey all,

    I'm looking for some advice re mapping my data set better.

    I have a cohort of patients with a number of health service visits, each of these visits come with diagnostic codes and the date at which they occurred linked to a unique patient identifier.

    I would like to limit my dataset to only one set of values per patient so I can run my analysis more easily. I have around 20 variables total.

    Any help would be much appreciated.

    Cheers,
    A

    dataex patient_id hlth_dx_icd9x_code_1 Healthservice_enddate

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long patient_id str6 hlth_dx_icd9x_code_1 str9 Healthservice_enddate
    1 "780"   "13OCT2015"
    1 "783.0" "12APR2013"
    1 "V65.1" "17JUN2014"
    1 "825"   "30MAR2015"
    1 "V65.1" "24JUN2018"
    1 ""      "19AUG2016"
    1 "915"   "05DEC2017"
    1 "842"   "07NOV2016"
    1 "780"   "23OCT2015"
    1 "910"   "18SEP2012"
    1 "V70"   "23OCT2017"
    1 "V65.1" "17MAR2015"
    1 "780"   "29AUG2013"
    1 "691"   "09APR2012"
    1 "V65.1" "18NOV2017"
    1 "009.0" "27JUN2012"
    1 "780"   "27DEC2017"
    1 "845"   "09MAR2015"
    1 "462"   "30MAY2016"
    1 "595"   "08FEB2013"
    1 "465"   "05JAN2016"
    1 "V68"   "01DEC2017"
    1 "780"   "15FEB2016"
    1 "915"   "03DEC2017"
    1 "558"   "12JAN2014"
    1 "372.0" "20JUL2013"
    2 "780"   "31DEC2012"
    2 "078.1" "13MAR2016"
    2 "078.1" "18JUN2015"
    2 "780"   "30JUL2015"
    2 "780"   "11APR2013"
    2 "V20.2" "12OCT2017"
    2 "259.1" "31JUL2014"
    2 "780"   "27NOV2014"
    2 ""      "08DEC2016"
    2 "780"   "04JUN2015"
    2 "V20"   "16AUG2016"
    2 "078.1" "09JUN2015"
    2 "259.1" "03APR2014"
    2 "078.1" "18JUN2015"
    2 "259.1" "26JUN2014"
    2 "780"   "24DEC2012"
    2 "078.1" "26JUN2015"
    2 "780"   "15MAY2014"
    2 "259.1" "31JUL2014"
    2 "078.1" "18FEB2016"
    2 "780"   "30JUL2015"
    2 "V70"   "09JUN2015"
    2 "259.1" "17JUL2014"
    2 ""      "24DEC2015"
    2 ""      "31DEC2014"
    2 "259.1" "26JUN2014"
    2 ""      "19MAY2016"
    2 "780"   "29JUN2015"
    2 "259.1" "17JUL2014"
    2 "078.1" "26JUN2015"
    2 ""      "05APR2014"
    2 "078.1" "16AUG2016"
    2 "078.1" "28FEB2016"
    2 "616.1" "15MAY2014"
    2 "780"   "29JUN2015"
    2 "259.1" "14OCT2014"
    2 "078.1" "20AUG2015"
    2 "V20"   "26AUG2016"
    2 "078.1" "20AUG2015"
    2 "379.4" "19MAY2016"
    2 "780"   "29JUN2015"
    2 "078.1" "13MAR2016"
    2 "078.1" "28FEB2016"
    2 "780"   "05JUN2017"
    2 "V20.2" "12OCT2017"
    2 "780"   "15MAY2013"
    2 "078.1" "18FEB2016"
    2 "780"   "27NOV2014"
    2 "616.1" "15MAY2014"
    3 "848"   "14NOV2017"
    3 "845"   "14NOV2017"
    3 "780"   "23MAR2018"
    3 "786"   "03FEB2015"
    3 ""      "14NOV2017"
    3 ""      "03FEB2015"
    3 ""      "14NOV2017"
    3 "V70"   "21JAN2013"
    3 "789.0" "20NOV2014"
    3 "460"   "19JAN2015"
    3 "V70"   "05JAN2016"
    3 "787"   "06APR2018"
    3 "034.0" "03FEB2014"
    3 ""      "15FEB2018"
    3 "782"   "21APR2016"
    4 "780.6" "11FEB2013"
    4 "372"   "21APR2015"
    4 "078.1" "10DEC2015"
    4 "216.7" "19JUN2017"
    4 "V70"   "29FEB2016"
    4 "382"   "05NOV2013"
    4 "078.1" "05JUN2017"
    4 "V70"   "28MAY2012"
    4 "487"   "21FEB2013"
    4 "078.1" "05AUG2015"
    end

  • #2
    I am unsure what you are after.
    If, for example, you had a list of codes from a single visit (and therefore single date), you could reshape this list, and make dummy variables for all possible codes.
    However, because each code comes with a specific date, you would still end up with the same number of observations/lines as you currently have.

    So please explain a bit more on what you would want the final result to look like, preferably with a short data example of the final result.

    Comment


    • #3
      Hey Jorrit,

      Thank you for the reply.

      Since I have patients with multiple visits, my data is set up like the above, where each new visit is a new line, contacting all the variables for each patient from many to one.

      ie. I would like to reshape the list so that there is only one id per patient and each visit would be in line with that ID.

      Example:
      Patient ID: 1 Visit 1 Code Visit 1 Date Visit 2 Code Visit 2 Date Visit 3 Code Visit 3 Date
      Patient ID: 2 Visit 1 Code Visit 1 Date Visit 2 Code Visit 2 Date Visit 3 Code Visit 3 Date

      Right now my data set is as follows:
      Patient ID: 1 Visit 1 Code Visit 1 Date
      Patient ID: 1 Visit 2 Code Visit 2 Date
      Patient ID: 1 Visit 3 Code Visit 3 Date
      Patient ID: 2 Visit 1 Code Visit 1 Date
      Patient ID: 2 Visit 2 Code Visit 2 Date
      Patient ID: 2 Visit 3 Code Visit 3 Date

      Comment


      • #4
        Okay. SO the code for that would be:
        Code:
        gen date = daily(Healthservice_enddate, "DMY")
        format date %td
        drop Healthservice_enddate
        ren hlth_dx_icd9x_code_1 code
        bys patient_id (date): gen n=_n
        reshape wide code date, i(patient_id) j(n)
        In your opening post you state you want this because your analyses will be made more easily.
        I doubt that that will be the case with this data format, at least not with most analyses you would want to do in Stata.
        So you could also post a follow up question, on how to get some results or data manipulation youre after once you have this shape of data. Likely the answer will be you'd have to reshape back to long.

        Comment

        Working...
        X