Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to split a string variable and add it to separate rows in STATA

    I want to add multiple rows by deriving them from a string column in Stata. To be more precise I have a dataset like below:
    year countryname intensitylevel 1990 India, Pakistan 1 1991 India, Pakistan 1 1992 India, Pakistan 1 1996 India, Pakistan 1 So, I want to split the countryname variable for each countries separately and at the end I want to have a dataset like below:
    year countryname intensitylevel 1990 India 1 1990 Pakistan 1 1991 India 1 1991 Pakistan 1
    so on.... Any help would be much appreciated. Thanks.

  • #2
    You can use the split command. Documentation here: http://www.stata.com/manuals13/dsplit.pdf

    It's not clear to me what exactly each string in your file looks like. I'm guessing each string looks like: "1990 India, Pakistan 1". When you split each string, how do you determine which country name (India or Pakistan) to keep?

    Comment


    • #3
      Also posted on Stack Overflow at http://stackoverflow.com/questions/3...-rows-in-stata. The Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, tells you to announce cross-posting so time isn't wasted answering a question that has already been answered in another location. The Statalist FAQ will also explain how to post your sample data in a way that is readable and usable by others who want to demonstrate their answers to your question.

      Comment


      • #4
        Code:
        clear
        input year  str42 countryname intensitylevel
        1990    "India, Pakistan"    1
        1991    "India, Pakistan"    1
        1992    "India, Pakistan"    1
        1996    "India, Pakistan"    1
        end
        rename countryname country
        split country, p(,)
        drop country
        reshape long country, i(country? year)
        drop country?
        sort year country
        list, sepby(year)
        
             +----------------------------------+
             | year   _j   intens~l     country |
             |----------------------------------|
          1. | 1990    2          1    Pakistan |
          2. | 1990    1          1       India |
             |----------------------------------|
          3. | 1991    2          1    Pakistan |
          4. | 1991    1          1       India |
             |----------------------------------|
          5. | 1992    2          1    Pakistan |
          6. | 1992    1          1       India |
             |----------------------------------|
          7. | 1996    2          1    Pakistan |
          8. | 1996    1          1       India |
             +----------------------------------+
        or

        Code:
        clear 
        input year  str42 countryname intensitylevel
        1990    "India, Pakistan"    1
        1991    "India, Pakistan"    1
        1992    "India, Pakistan"    1
        1996    "India, Pakistan"    1
        end 
        expand 2 
        local cpos strpos(countryn, ",") 
        gen country1 = substr(countryn, 1, `cpos' - 1) 
        gen country2 = substr(countryn, `cpos' + 1, .) 
        drop countryn 
        sort year country?  
        list, sepby(year)
        Last edited by Nick Cox; 15 Aug 2016, 19:21.

        Comment


        • #5
          Dear Nick,
          Thank you very much for your help. It worked well for my purpose!

          Comment


          • #6
            Hi Nick,

            How could I do exactly the same but taking into account that I have between 1 to 600 values within a variable?, i.e. as if in the problem Ilkin showed he had 1, 2, ..., or 600 of countries per each row

            I have tried split, but it only splits up to 146 new variables

            KR

            Comment


            • #7
              Dear Nick Cox and Clyde Schechter.

              I have one question. I have following variables

              description time real gdp Province/region
              C
              hain-type quantity indexes for real GD 2019q1 4583 A
              Chain-type quantity indexes for real GDP ​​​​​​​2019q2 4444 B
              Chain-type quantity indexes for real GDP ​​​​​​​2019q3 2233 C
              Chain-type quantity indexes for real GDP ​​​​​​​ 2019q4 2222 D
              Chain-type quantity indexes for real GDP ​​​​​​​2020q1 4444 F
              Chain-type quantity indexes for real GDP ​​​​​​​2020q2 4433 M
              Chain-type quantity indexes for real GDP ​​​​​​​2019q3 3322 Q

              I wanted to convert description's observation in new variables so that I get seven variables (total 7 seven if we count bases on quarter) by province or region. Appreciate if you could advice or offer any help. Thank you

              Ahmed!
              Last edited by Awais Ahmed; 11 Mar 2021, 07:21.

              Comment

              Working...
              X