Hi,
I have a dataset where a number of items are included in one variable (string). I need to reshape this so that each item is recorded on an individual row. I provide a dummy dataset below to explain my query.
Currently, I use split to separate the items over individual variables followed by reshape long & drop any missing.
The number of items differs & so when I split, many of the new rows have blank observations. My actual data has thousands of observations, so the reshape is intensive & generates many unnecessary/blank observations.
This method works, but I'm wondering if there's a way to make this more efficient. Do you have any suggestions?
Thank you!
Bryony
I have a dataset where a number of items are included in one variable (string). I need to reshape this so that each item is recorded on an individual row. I provide a dummy dataset below to explain my query.
Currently, I use split to separate the items over individual variables followed by reshape long & drop any missing.
Code:
split product, p("+") rename product prodlist reshape long product, i(id) j(n) drop if mi(product) replace product=trim(product)
This method works, but I'm wondering if there's a way to make this more efficient. Do you have any suggestions?
Thank you!
Bryony
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float id str34 product 1 "oranges" 2 "oranges" 3 "apples" 4 "oranges" 5 "apples + pears" 6 "bananas" 7 "oranges" 8 "oranges" 9 "apples + bananas" 10 "apples" 11 "apples" 12 "apples + bananas" 13 "apples" 14 "bananas" 15 "oranges + apples + pears + bananas" 16 "pears" 17 "pears" 18 "pears" 19 "pears" 20 "apples + pears + bananas" end
Comment