split + reshape

River Huang

Join Date: Mar 2016
Posts: 1908

26 Mar 2023, 17:22

Dear All, I have this dataset,

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str12 Symbol str93 ViolationYear
"000002" "2015;2016"                    
"000002" "2016"                         
"000004" "2008;2009"                    
"000004" "2007;2008;2009;2010"
"000004" "2010;2012;N/A"           
"000004" "2014;2016"               
"000004" "2014;2016"          
end

and wish to have the following results:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str12 Symbol  ViolationYear
"000002" 2015                    
"000002" 2016                         
"000004" 2007 
"000004" 2008 
"000004" 2009 
"000004" 2010 
"000004" 2012 
"000004" 2014 
"000004" 2016     
end

Any suggestions? Thanks.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10138

26 Mar 2023, 17:44

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 Symbol str93 ViolationYear
"000002" "2015;2016"          
"000002" "2016"               
"000004" "2008;2009"          
"000004" "2007;2008;2009;2010"
"000004" "2010;2012;N/A"      
"000004" "2014;2016"          
"000004" "2014;2016"          
end

gen toexpand= length(ViolationYear)- length(subinstr(ViolationYear, ";", "",.)) +1
expand toexpand
bys Symbol ViolationYear: gen year= real(word(subinstr(ViolationYear, ";", " ",.), _n))
contract Symbol year, nomiss

Res.:

Code:

. l, sepby(Symbol)

     +-----------------------+
     | Symbol   year   _freq |
     |-----------------------|
  1. | 000002   2015       1 |
  2. | 000002   2016       2 |
     |-----------------------|
  3. | 000004   2007       1 |
  4. | 000004   2008       2 |
  5. | 000004   2009       2 |
  6. | 000004   2010       2 |
  7. | 000004   2012       1 |
  8. | 000004   2014       1 |
  9. | 000004   2016       1 |
     +-----------------------+

Comment

River Huang

Join Date: Mar 2016

Posts: 1908
#3

26 Mar 2023, 21:17

Many thanks, Andrew. This is great.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35496
#4

27 Mar 2023, 00:45

Another approach starts with

Code:

gen long newid = _n split ViolationYear, parse(;) drop ViolationYear reshape long ViolationYear, i(newid) j(which)

and ends with clean-up.
1 like
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#5

28 Mar 2023, 18:58

Dear Nick, Thanks. This is helpful, too.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment

Announcement

split + reshape

Comment

Comment

Comment

Comment