Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Fusing multiple rows into a single column

    Hello, I am a beginner using Stata 15. I have a data in the long format which I will like to convert to wide format. I have a column with unique IDs and for each ID the date is the same but the drug options are multiple with up to 26 different dugs possible (but each patient ID has a maximum of seven drugs).
    Present format
    id drug date
    1001 drug A 31 May 2019
    1001 drug B 31 May 2019
    1001 drug C 31 May 2019
    1002 drug A 20 May 2019
    1002 drug B,D 20 May 2019
    1003 drug E 10 May 2019
    1003 drug F 10 May 2019
    1004 drug A, B, C 30 May 2019
    1004 drug A 30 May 2019
    1004 drug C 30 May 2019
    1004 drug B 30 May 2019
    Desired format
    id date drug1 drug2 drug3 drug4
    1001
    1002
    1003
    *where Drug1 represents the drug occurring in the first row for a given ID, Drug 2 for the drug on the second row for same ID … (order is not important)

    I have attempted
    reshape wide drug1 drug2 drug3 drug4 drug5 drug6 drug7, i(id) j(drug)
    with this error message
    variable drug is string; specify string option

    I am unsure of how next to proceed, Please, I need assistance.
    Kind regards.

  • #2
    well, Stata is telling you to use the "string" option because the variable "drug" is a string variable; did you try that? see
    Code:
    help reshape
    and see what it says under options about the string option

    Comment


    • #3
      Welcome to Statalist.

      Rich is correct that Stata told you exactly what your immediate problem was. But in fact you have several problems, most of which involve not understanding how the reshape command works.

      The example below shows an approach to what you describe, but I doubt that it is precisely what you want.
      Code:
      cls
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int id str12 drugs float date
      1001 "A"       21700
      1001 "B"       21700
      1001 "C"       21700
      1002 "A"       21689
      1002 "B,D"     21689
      1003 "E"       21679
      1003 "F"       21679
      1004 "A, B, C" 21699
      1004 "A"       21699
      1004 "C"       21699
      1004 "B"       21699
      end
      format %td date
      
      list, sepby(id) noobs
      sort id date, stable
      by id date: generate row = _n
      rename drugs drug
      reshape wide drug, i(id) j(row)
      order id date
      list, noobs
      Code:
      . list, sepby(id) noobs
      
        +----------------------------+
        |   id     drugs        date |
        |----------------------------|
        | 1001         A   31may2019 |
        | 1001         B   31may2019 |
        | 1001         C   31may2019 |
        |----------------------------|
        | 1002         A   20may2019 |
        | 1002       B,D   20may2019 |
        |----------------------------|
        | 1003         E   10may2019 |
        | 1003         F   10may2019 |
        |----------------------------|
        | 1004   A, B, C   30may2019 |
        | 1004         A   30may2019 |
        | 1004         C   30may2019 |
        | 1004         B   30may2019 |
        +----------------------------+
      
      . sort id date, stable
      
      . by id date: generate row = _n
      
      . rename drugs drug
      
      . reshape wide drug, i(id) j(row)
      (note: j = 1 2 3 4)
      
      Data                               long   ->   wide
      -----------------------------------------------------------------------------
      Number of obs.                       11   ->       4
      Number of variables                   4   ->       6
      j variable (4 values)               row   ->   (dropped)
      xij variables:
                                         drug   ->   drug1 drug2 ... drug4
      -----------------------------------------------------------------------------
      
      . order id date
      
      . list, noobs
      
        +----------------------------------------------------+
        |   id        date     drug1   drug2   drug3   drug4 |
        |----------------------------------------------------|
        | 1001   31may2019         A       B       C         |
        | 1002   20may2019         A     B,D                 |
        | 1003   10may2019         E       F                 |
        | 1004   30may2019   A, B, C       A       C       B |
        +----------------------------------------------------+
      Now some advice, should you need help getting from the above example to what you eventually need.

      Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question.

      I have a feeling the data you showed was created in Excel and copied and pasted into Stata. You should instead go through the task of importing your actual data — not made up things like "drug B,D" —into Stata and using the dataex command to prepare a useful listing that can be copied from the Stata Results window and pasted into your Statalist post. And if indeed your data is in Excel, use the "Import > Excel Spreadsheet" item on Stata's File menu to get it into Stata - don't just copy-and-paste it into the Data Editor window.

      You should select a few observations that do a good job of representing your data. If you indeed have observations that include several drug names within one observation of "drugs" how are they separated?

      You should show us what you want the results to be like, using a few of the observations you provide as examples.

      Finally, you should explain how you hope to use this data. It seems likely to me that having "penicillin" be drug1 on one observation and drug5 on another is going to cause problems. But then, perhaps I've completely misunderstood what you want.

      Comment


      • #4
        Dear William,
        Thank you, and I will use your advice in my subsequent posts (which I suspect wouldn't be long from now).

        Comment

        Working...
        X