Reshape with blank observations

Bryony Simmons

Join Date: Jan 2018
Posts: 37

Reshape with blank observations

13 Feb 2020, 09:38

Hi,

I have a dataset where a number of items are included in one variable (string). I need to reshape this so that each item is recorded on an individual row. I provide a dummy dataset below to explain my query.

Currently, I use split to separate the items over individual variables followed by reshape long & drop any missing.

Code:

split product, p("+")
rename product prodlist
reshape long product, i(id) j(n)
drop if mi(product)
replace product=trim(product)

The number of items differs & so when I split, many of the new rows have blank observations. My actual data has thousands of observations, so the reshape is intensive & generates many unnecessary/blank observations.

This method works, but I'm wondering if there's a way to make this more efficient. Do you have any suggestions?

Thank you!
Bryony

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str34 product
 1 "oranges"                           
 2 "oranges"                           
 3 "apples"                            
 4 "oranges"                           
 5 "apples + pears"                    
 6 "bananas"                           
 7 "oranges"                           
 8 "oranges"                           
 9 "apples + bananas"                  
10 "apples"                            
11 "apples"                            
12 "apples + bananas"                  
13 "apples"                            
14 "bananas"                           
15 "oranges + apples + pears + bananas"
16 "pears"                             
17 "pears"                             
18 "pears"                             
19 "pears"                             
20 "apples + pears + bananas"          
end

Tags: reshape, reshape long, split

Carole J. Wilson

Join Date: Jan 2015

Posts: 932
#2

13 Feb 2020, 10:21

You can try Kenneth Simons' sreshape command (available from Stata Journal (16-3)) with the missing(drop) option.

Code:

net describe dm0090, from(http://www.stata-journal.com/software/sj16-3) sreshape long product, i(id) j(n) missing(drop) replace product=trim(product)

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1
1 like
Comment
Bryony Simmons

Join Date: Jan 2018

Posts: 37
#3

13 Feb 2020, 10:30

Thank you, Carole!
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10219

13 Feb 2020, 10:33

You can also use expand + a loop, but I don't know if it's more efficient than reshaping. Loops are known to be notoriously slow!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str34 product
 1 "oranges"                          
 2 "oranges"                          
 3 "apples"                            
 4 "oranges"                          
 5 "apples + pears"                    
 6 "bananas"                          
 7 "oranges"                          
 8 "oranges"                          
 9 "apples + bananas"                  
10 "apples"                            
11 "apples"                            
12 "apples + bananas"                  
13 "apples"                            
14 "bananas"                          
15 "oranges + apples + pears + bananas"
16 "pears"                            
17 "pears"                            
18 "pears"                            
19 "pears"                            
20 "apples + pears + bananas"          
end

split product, p("+") g(q)
egen expand= rownonmiss(q*), strok
expand expand
bys id: gen order=_n
qui sum expand
forval i= 1/`r(max)'{
bys id: replace q1=q`i' if order==`i'
}
gen wanted= trim(q1)
keep id product wanted

Res.:

Code:

. l, clean

       id                              product    wanted  
  1.    1                              oranges   oranges  
  2.    2                              oranges   oranges  
  3.    3                               apples    apples  
  4.    4                              oranges   oranges  
  5.    5                       apples + pears    apples  
  6.    5                       apples + pears     pears  
  7.    6                              bananas   bananas  
  8.    7                              oranges   oranges  
  9.    8                              oranges   oranges  
 10.    9                     apples + bananas    apples  
 11.    9                     apples + bananas   bananas  
 12.   10                               apples    apples  
 13.   11                               apples    apples  
 14.   12                     apples + bananas    apples  
 15.   12                     apples + bananas   bananas  
 16.   13                               apples    apples  
 17.   14                              bananas   bananas  
 18.   15   oranges + apples + pears + bananas   oranges  
 19.   15   oranges + apples + pears + bananas    apples  
 20.   15   oranges + apples + pears + bananas     pears  
 21.   15   oranges + apples + pears + bananas   bananas  
 22.   16                                pears     pears  
 23.   17                                pears     pears  
 24.   18                                pears     pears  
 25.   19                                pears     pears  
 26.   20             apples + pears + bananas    apples  
 27.   20             apples + pears + bananas     pears  
 28.   20             apples + pears + bananas   bananas

Announcement

Reshape with blank observations

Comment

Comment

Comment