Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • split + reshape

    Dear All, I have this dataset,
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str12 Symbol str93 ViolationYear
    "000002" "2015;2016"                    
    "000002" "2016"                         
    "000004" "2008;2009"                    
    "000004" "2007;2008;2009;2010"
    "000004" "2010;2012;N/A"           
    "000004" "2014;2016"               
    "000004" "2014;2016"          
    end
    and wish to have the following results:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str12 Symbol  ViolationYear
    "000002" 2015                    
    "000002" 2016                         
    "000004" 2007 
    "000004" 2008 
    "000004" 2009 
    "000004" 2010 
    "000004" 2012 
    "000004" 2014 
    "000004" 2016     
    end
    Any suggestions? Thanks.
    Ho-Chuan (River) Huang
    Stata 17.0, MP(4)

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 Symbol str93 ViolationYear
    "000002" "2015;2016"          
    "000002" "2016"               
    "000004" "2008;2009"          
    "000004" "2007;2008;2009;2010"
    "000004" "2010;2012;N/A"      
    "000004" "2014;2016"          
    "000004" "2014;2016"          
    end
    
    gen toexpand= length(ViolationYear)- length(subinstr(ViolationYear, ";", "",.)) +1
    expand toexpand
    bys Symbol ViolationYear: gen year= real(word(subinstr(ViolationYear, ";", " ",.), _n))
    contract Symbol year, nomiss
    Res.:

    Code:
    . l, sepby(Symbol)
    
         +-----------------------+
         | Symbol   year   _freq |
         |-----------------------|
      1. | 000002   2015       1 |
      2. | 000002   2016       2 |
         |-----------------------|
      3. | 000004   2007       1 |
      4. | 000004   2008       2 |
      5. | 000004   2009       2 |
      6. | 000004   2010       2 |
      7. | 000004   2012       1 |
      8. | 000004   2014       1 |
      9. | 000004   2016       1 |
         +-----------------------+

    Comment


    • #3
      Many thanks, Andrew. This is great.
      Ho-Chuan (River) Huang
      Stata 17.0, MP(4)

      Comment


      • #4
        Another approach starts with

        Code:
        gen long newid = _n
        split ViolationYear, parse(;)
        drop ViolationYear 
        reshape long ViolationYear, i(newid) j(which)
        and ends with clean-up.

        Comment


        • #5
          Dear Nick, Thanks. This is helpful, too.
          Ho-Chuan (River) Huang
          Stata 17.0, MP(4)

          Comment

          Working...
          X