Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape with blank observations

    Hi,

    I have a dataset where a number of items are included in one variable (string). I need to reshape this so that each item is recorded on an individual row. I provide a dummy dataset below to explain my query.

    Currently, I use split to separate the items over individual variables followed by reshape long & drop any missing.

    Code:
    split product, p("+")
    rename product prodlist
    reshape long product, i(id) j(n)
    drop if mi(product)
    replace product=trim(product)
    The number of items differs & so when I split, many of the new rows have blank observations. My actual data has thousands of observations, so the reshape is intensive & generates many unnecessary/blank observations.

    This method works, but I'm wondering if there's a way to make this more efficient. Do you have any suggestions?

    Thank you!
    Bryony


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str34 product
     1 "oranges"                           
     2 "oranges"                           
     3 "apples"                            
     4 "oranges"                           
     5 "apples + pears"                    
     6 "bananas"                           
     7 "oranges"                           
     8 "oranges"                           
     9 "apples + bananas"                  
    10 "apples"                            
    11 "apples"                            
    12 "apples + bananas"                  
    13 "apples"                            
    14 "bananas"                           
    15 "oranges + apples + pears + bananas"
    16 "pears"                             
    17 "pears"                             
    18 "pears"                             
    19 "pears"                             
    20 "apples + pears + bananas"          
    end

  • #2
    You can try Kenneth Simons' sreshape command (available from Stata Journal (16-3)) with the missing(drop) option.

    Code:
    net describe dm0090, from(http://www.stata-journal.com/software/sj16-3)
    sreshape long product, i(id) j(n) missing(drop)
    replace product=trim(product)
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1

    Comment


    • #3
      Thank you, Carole!

      Comment


      • #4
        You can also use expand + a loop, but I don't know if it's more efficient than reshaping. Loops are known to be notoriously slow!

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float id str34 product
         1 "oranges"                          
         2 "oranges"                          
         3 "apples"                            
         4 "oranges"                          
         5 "apples + pears"                    
         6 "bananas"                          
         7 "oranges"                          
         8 "oranges"                          
         9 "apples + bananas"                  
        10 "apples"                            
        11 "apples"                            
        12 "apples + bananas"                  
        13 "apples"                            
        14 "bananas"                          
        15 "oranges + apples + pears + bananas"
        16 "pears"                            
        17 "pears"                            
        18 "pears"                            
        19 "pears"                            
        20 "apples + pears + bananas"          
        end
        
        split product, p("+") g(q)
        egen expand= rownonmiss(q*), strok
        expand expand
        bys id: gen order=_n
        qui sum expand
        forval i= 1/`r(max)'{
        bys id: replace q1=q`i' if order==`i'
        }
        gen wanted= trim(q1)
        keep id product wanted
        Res.:

        Code:
        . l, clean
        
               id                              product    wanted  
          1.    1                              oranges   oranges  
          2.    2                              oranges   oranges  
          3.    3                               apples    apples  
          4.    4                              oranges   oranges  
          5.    5                       apples + pears    apples  
          6.    5                       apples + pears     pears  
          7.    6                              bananas   bananas  
          8.    7                              oranges   oranges  
          9.    8                              oranges   oranges  
         10.    9                     apples + bananas    apples  
         11.    9                     apples + bananas   bananas  
         12.   10                               apples    apples  
         13.   11                               apples    apples  
         14.   12                     apples + bananas    apples  
         15.   12                     apples + bananas   bananas  
         16.   13                               apples    apples  
         17.   14                              bananas   bananas  
         18.   15   oranges + apples + pears + bananas   oranges  
         19.   15   oranges + apples + pears + bananas    apples  
         20.   15   oranges + apples + pears + bananas     pears  
         21.   15   oranges + apples + pears + bananas   bananas  
         22.   16                                pears     pears  
         23.   17                                pears     pears  
         24.   18                                pears     pears  
         25.   19                                pears     pears  
         26.   20             apples + pears + bananas    apples  
         27.   20             apples + pears + bananas     pears  
         28.   20             apples + pears + bananas   bananas

        Comment

        Working...
        X