Fusing multiple rows into a single column

Vitalis Feteh

Join Date: Jun 2019
Posts: 6

Fusing multiple rows into a single column

03 Jun 2019, 03:57

Hello, I am a beginner using Stata 15. I have a data in the long format which I will like to convert to wide format. I have a column with unique IDs and for each ID the date is the same but the drug options are multiple with up to 26 different dugs possible (but each patient ID has a maximum of seven drugs).
Present format

id	drug	date
1001	drug A	31 May 2019
1001	drug B	31 May 2019
1001	drug C	31 May 2019
1002	drug A	20 May 2019
1002	drug B,D	20 May 2019
1003	drug E	10 May 2019
1003	drug F	10 May 2019
1004	drug A, B, C	30 May 2019
1004	drug A	30 May 2019
1004	drug C	30 May 2019
1004	drug B	30 May 2019

Desired format

id	date	drug1	drug2	drug3	drug4
1001
1002
1003

*where Drug1 represents the drug occurring in the first row for a given ID, Drug 2 for the drug on the second row for same ID … (order is not important)

I have attempted
reshape wide drug1 drug2 drug3 drug4 drug5 drug6 drug7, i(id) j(drug)
with this error message
variable drug is string; specify string option

I am unsure of how next to proceed, Please, I need assistance.
Kind regards.

Tags: None

Rich Goldstein

Join Date: Mar 2014

Posts: 4457
#2

03 Jun 2019, 05:26

well, Stata is telling you to use the "string" option because the variable "drug" is a string variable; did you try that? see

Code:

help reshape

and see what it says under options about the string option
Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

03 Jun 2019, 13:31

Welcome to Statalist.

Rich is correct that Stata told you exactly what your immediate problem was. But in fact you have several problems, most of which involve not understanding how the reshape command works.

The example below shows an approach to what you describe, but I doubt that it is precisely what you want.

Code:

cls
* Example generated by -dataex-. To install: ssc install dataex
clear
input int id str12 drugs float date
1001 "A"       21700
1001 "B"       21700
1001 "C"       21700
1002 "A"       21689
1002 "B,D"     21689
1003 "E"       21679
1003 "F"       21679
1004 "A, B, C" 21699
1004 "A"       21699
1004 "C"       21699
1004 "B"       21699
end
format %td date

list, sepby(id) noobs
sort id date, stable
by id date: generate row = _n
rename drugs drug
reshape wide drug, i(id) j(row)
order id date
list, noobs

Code:

. list, sepby(id) noobs

  +----------------------------+
  |   id     drugs        date |
  |----------------------------|
  | 1001         A   31may2019 |
  | 1001         B   31may2019 |
  | 1001         C   31may2019 |
  |----------------------------|
  | 1002         A   20may2019 |
  | 1002       B,D   20may2019 |
  |----------------------------|
  | 1003         E   10may2019 |
  | 1003         F   10may2019 |
  |----------------------------|
  | 1004   A, B, C   30may2019 |
  | 1004         A   30may2019 |
  | 1004         C   30may2019 |
  | 1004         B   30may2019 |
  +----------------------------+

. sort id date, stable

. by id date: generate row = _n

. rename drugs drug

. reshape wide drug, i(id) j(row)
(note: j = 1 2 3 4)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       11   ->       4
Number of variables                   4   ->       6
j variable (4 values)               row   ->   (dropped)
xij variables:
                                   drug   ->   drug1 drug2 ... drug4
-----------------------------------------------------------------------------

. order id date

. list, noobs

  +----------------------------------------------------+
  |   id        date     drug1   drug2   drug3   drug4 |
  |----------------------------------------------------|
  | 1001   31may2019         A       B       C         |
  | 1002   20may2019         A     B,D                 |
  | 1003   10may2019         E       F                 |
  | 1004   30may2019   A, B, C       A       C       B |
  +----------------------------------------------------+

Now some advice, should you need help getting from the above example to what you eventually need.

Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question.

I have a feeling the data you showed was created in Excel and copied and pasted into Stata. You should instead go through the task of importing your actual data — not made up things like "drug B,D" —into Stata and using the dataex command to prepare a useful listing that can be copied from the Stata Results window and pasted into your Statalist post. And if indeed your data is in Excel, use the "Import > Excel Spreadsheet" item on Stata's File menu to get it into Stata - don't just copy-and-paste it into the Data Editor window.

You should select a few observations that do a good job of representing your data. If you indeed have observations that include several drug names within one observation of "drugs" how are they separated?

You should show us what you want the results to be like, using a few of the observations you provide as examples.

Finally, you should explain how you hope to use this data. It seems likely to me that having "penicillin" be drug1 on one observation and drug5 on another is going to cause problems. But then, perhaps I've completely misunderstood what you want.

Comment

Vitalis Feteh

Join Date: Jun 2019

Posts: 6
#4

04 Jun 2019, 03:19

Dear William,
Thank you, and I will use your advice in my subsequent posts (which I suspect wouldn't be long from now).
Comment

Announcement

Fusing multiple rows into a single column

Comment

Comment

Comment