How to split a string variable and add it to separate rows in STATA

Ilkin Huseynov

Join Date: Dec 2015

Posts: 6
#1

How to split a string variable and add it to separate rows in STATA

15 Aug 2016, 16:29

I want to add multiple rows by deriving them from a string column in Stata. To be more precise I have a dataset like below:
year countryname intensitylevel 1990 India, Pakistan 1 1991 India, Pakistan 1 1992 India, Pakistan 1 1996 India, Pakistan 1 So, I want to split the countryname variable for each countries separately and at the end I want to have a dataset like below:
year countryname intensitylevel 1990 India 1 1990 Pakistan 1 1991 India 1 1991 Pakistan 1
so on.... Any help would be much appreciated. Thanks.
Tags: data, panel, string
Shruthi Venkatesh

Join Date: Aug 2016

Posts: 17
#2

15 Aug 2016, 16:43

You can use the split command. Documentation here: http://www.stata.com/manuals13/dsplit.pdf

It's not clear to me what exactly each string in your file looks like. I'm guessing each string looks like: "1990 India, Pakistan 1". When you split each string, how do you determine which country name (India or Pakistan) to keep?
1 like
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

15 Aug 2016, 18:55

Also posted on Stack Overflow at http://stackoverflow.com/questions/3...-rows-in-stata. The Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, tells you to announce cross-posting so time isn't wasted answering a question that has already been answered in another location. The Statalist FAQ will also explain how to post your sample data in a way that is readable and usable by others who want to demonstrate their answers to your question.
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35724

15 Aug 2016, 19:02

Code:

clear
input year  str42 countryname intensitylevel
1990    "India, Pakistan"    1
1991    "India, Pakistan"    1
1992    "India, Pakistan"    1
1996    "India, Pakistan"    1
end
rename countryname country
split country, p(,)
drop country
reshape long country, i(country? year)
drop country?
sort year country
list, sepby(year)

     +----------------------------------+
     | year   _j   intens~l     country |
     |----------------------------------|
  1. | 1990    2          1    Pakistan |
  2. | 1990    1          1       India |
     |----------------------------------|
  3. | 1991    2          1    Pakistan |
  4. | 1991    1          1       India |
     |----------------------------------|
  5. | 1992    2          1    Pakistan |
  6. | 1992    1          1       India |
     |----------------------------------|
  7. | 1996    2          1    Pakistan |
  8. | 1996    1          1       India |
     +----------------------------------+

Code:

clear 
input year  str42 countryname intensitylevel
1990    "India, Pakistan"    1
1991    "India, Pakistan"    1
1992    "India, Pakistan"    1
1996    "India, Pakistan"    1
end 
expand 2 
local cpos strpos(countryn, ",") 
gen country1 = substr(countryn, 1, `cpos' - 1) 
gen country2 = substr(countryn, `cpos' + 1, .) 
drop countryn 
sort year country?  
list, sepby(year)

Last edited by Nick Cox; 15 Aug 2016, 19:21.

Comment

Ilkin Huseynov

Join Date: Dec 2015

Posts: 6
#5

16 Aug 2016, 04:56

Dear Nick,
Thank you very much for your help. It worked well for my purpose!
Comment
Ruben Perez

Join Date: Apr 2019

Posts: 18
#6

27 Apr 2019, 11:28

Hi Nick,

How could I do exactly the same but taking into account that I have between 1 to 600 values within a variable?, i.e. as if in the problem Ilkin showed he had 1, 2, ..., or 600 of countries per each row

I have tried split, but it only splits up to 146 new variables

KR
Comment
Awais Ahmed

Join Date: Oct 2020

Posts: 8
#7

11 Mar 2021, 07:13

Dear Nick Cox and Clyde Schechter.

I have one question. I have following variables

description time real gdp Province/region
Chain-type quantity indexes for real GD 2019q1 4583 A
Chain-type quantity indexes for real GDP 2019q2 4444 B
Chain-type quantity indexes for real GDP 2019q3 2233 C
Chain-type quantity indexes for real GDP 2019q4 2222 D
Chain-type quantity indexes for real GDP 2020q1 4444 F
Chain-type quantity indexes for real GDP 2020q2 4433 M
Chain-type quantity indexes for real GDP 2019q3 3322 Q

I wanted to convert description's observation in new variables so that I get seven variables (total 7 seven if we count bases on quarter) by province or region. Appreciate if you could advice or offer any help. Thank you

Ahmed!

Last edited by Awais Ahmed; 11 Mar 2021, 07:21.
Comment

Announcement

How to split a string variable and add it to separate rows in STATA

Comment

Comment

Comment

Comment

Comment

Comment