Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • help with parsing out a string variable and corresponding date variable across multiple different variables

    I have tried to search for an answer to this but I fear I am not searching for the right thing because I cannot find the answer.

    I have a long list (300,000+ entries) of medications administered (med_name) and the date they were given (med_date). The list is long and I've converted to wide. Some individuals have 5 meds but others have >1000.

    I need to collapse these data into a usable format, such that I am able to pull out individual meds as a dichotomous variable. For example, I'd like to create 'bloodpressure_med' as a variable, and have it coded 1/0 based on if a list of ~10 blood pressure meds is found anywhere in the list of med1, med2, med3, med4... list. I have tried the strpos(med*,"blood pressure")>0 but I receive an error that med* is not a valid variable name. It seems extremely inefficient to perform separate commands across the list of 1000+ medications to parse out individual meds for each person. I also need the med_date for that variable. so ideally I'd convert my current dataset of:
    id med1 med1_date med2 med2_date
    1 bloodpressure 2/1/2018 steroid 3/1/2018
    2 steroid 4/5/2017 insulin 4/5/2017
    3 insulin 1/5/2016
    4 insulin 3/15/2017 bloodpressure 3/1/2016
    ... and so forth

    into this:
    id bloodpressure bloodpressure_date insulin insulin_date
    1 1 2/1/2018 0
    2 0 1 4/5/2017
    3 0 1 1/5/2016
    4 1 3/1/2016 1 3/15/2017
    ... and so forth



    how can I do this efficiently across 1000's of variables ?

    thank you in advance - I am learning so appreciate the help and patience with what may be a simple question!!

  • #2
    Welcome to the Stata Forum / Statalist.

    Please read the FAQ, particularly the topics about sharing data/command/output. In short, you may use - dataex - or the code delimiters for that matter.

    That being said, and assuming med1 and med2 are strings, you may start by: gen bloodpressure = med1 == "bloodpressure" | med2 == "bloodpressure". You can do the same with "insulin". After that, you may type: gen bloodpressure_date = med1_date if bloodpressure == 1. then: replace bloodpressure_date = med2_date if med2 == "bloodpressure". You can do the same with insulin_date.

    Hopefully that helps.
    Best regards,

    Marcos

    Comment


    • #3
      Thank you! I am wondering though how I will get around the fact that some subjects have 1000 meds - anyway to get around typing 1000 meds? e.g., it will take a long time to get to med1000

      Comment


      • #4
        Much depends on what you want to do with this, but I wouldn't advise the layout you are seeking. Your data example (as Marcos Almeida suggests, it is better to use dataex) is such that your date variable looks like a string, but what is below will work fine if it is a Stata daily date variable.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte id str13 med1 str9 med1_date str13 med2 str8 med2_date
        1 "bloodpressure" "2/1/2018"  "steroid"       "3/1/2018"
        2 "steroid"       "4/5/2017"  "insulin"       "4/5/2017"
        3 "insulin"       "1/5/2016"  ""              ""        
        4 "insulin"       "3/15/2017" "bloodpressure" "3/1/2016"
        end
        
        rename (med*_date) (date*)
        
        reshape long med date , i(id) j(seq) 
        drop if missing(med)
        list, sepby(id)
        
             +--------------------------------------+
             | id   seq             med        date |
             |--------------------------------------|
          1. |  1     1   bloodpressure    2/1/2018 |
          2. |  1     2         steroid    3/1/2018 |
             |--------------------------------------|
          3. |  2     1         steroid    4/5/2017 |
          4. |  2     2         insulin    4/5/2017 |
             |--------------------------------------|
          5. |  3     1         insulin    1/5/2016 |
             |--------------------------------------|
          6. |  4     1         insulin   3/15/2017 |
          7. |  4     2   bloodpressure    3/1/2016 |
             +--------------------------------------+

        Comment


        • #5
          Do note that Nick started by returning your dataset to a long layout, likely similar to what you started with.

          The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. If there is some compelling reason you need to go back to wide layout, you can do that when the time comes. But try to keep you data in a long layout until going wide becomes a necessity.

          Comment

          Working...
          X