Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • reshape data?

    Dear All, I find this question here (https://bbs.pinggu.org/thread-6924503-1-1.html). The data is
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id int year str41 TypeID
    4 2010 "P2503,P2515,P2599"      
    4 2011 "P2503 P2504 P2505"      
    5 2008 "P2503,P2504,P2505,P2599"
    5 2009 "P2504,P2515,P2599"      
    end
    and wish to reshape as
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id year P2503 P2504 P2505 P2515 P2599)
    4 2010 1 . . 1 1
    4 2011 1 1 1 . .
    5 2008 1 1 1 . 1
    5 2009 . 1 . 1 1
    end
    Any suggestions? It changes P* to be variable names, and for each id year, gives the value 1 if P* are included in TypeID.
    Ho-Chuan (River) Huang
    Stata 15.1, MP(4)

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id int year str41 TypeID
    4 2010 "P2503,P2515,P2599"      
    4 2011 "P2503 P2504 P2505"      
    5 2008 "P2503,P2504,P2505,P2599"
    5 2009 "P2504,P2515,P2599"      
    end
    
    local vars P2503 P2504 P2505 P2515 P2599
    
    foreach v of local vars{
    gen `v'=.
    replace `v'=1 if strpos(TypeID, "`v'")
    }

    Comment


    • #3
      The solution of Jorrit Gosens depends on writing down the variables in advance. That works for a toy example, but is less practical for a bigger one.

      Code:
      clear
      input long id int year str41 TypeID
      4 2010 "P2503,P2515,P2599"      
      4 2011 "P2503 P2504 P2505"      
      5 2008 "P2503,P2504,P2505,P2599"
      5 2009 "P2504,P2515,P2599"      
      end
      
      replace TypeID = subinstr(TypeID, ",", " ", .) 
      gen N = wordcount(TypeID) 
      expand N 
      bysort id year : gen which = word(TypeID, _n) 
      
      gen z = 1 
      reshape wide z , i(id year) j(which) string 
      drop N TypeID 
      rename (z*) (*) 
      list 
      
           +---------------------------------------------------+
           | id   year   P2503   P2504   P2505   P2515   P2599 |
           |---------------------------------------------------|
        1. |  4   2010       1       .       .       1       1 |
        2. |  4   2011       1       1       1       .       . |
        3. |  5   2008       1       1       1       .       1 |
        4. |  5   2009       .       1       .       1       1 |
           +---------------------------------------------------+

      Comment


      • #4
        Dear @Jorrit Gosens and @Nick Cox, Thank you both for the helpful suggestions.
        Ho-Chuan (River) Huang
        Stata 15.1, MP(4)

        Comment


        • #5
          Dear Nick, Can you have a look at this example?
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input long id int year str41 TypeID
          4 2010 "P2503,P2515,P2599"      
          4 2011 "P2503,P2515,P2599"      
          4 2012 "P2503,P2515,P2599"      
          4 2013 "P2503,P2515,P2599"      
          4 2008 "P2503,P2599"            
          4 2009 "P2503,P2599"            
          4 2010 "P2503,P2599"            
          5 2007 "P2503,P2504,P2505,P2599"
          5 2008 "P2503,P2504,P2505,P2599"
          5 2009 "P2503,P2504,P2505,P2599"
          end
          Something wrong?
          Ho-Chuan (River) Huang
          Stata 15.1, MP(4)

          Comment


          • #6
            You have duplicated observations for id 4 year 2010. You tell us what would be right. Here I remove duplicates and take mentions to be correct otherwise.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input long id int year str41 TypeID
            4 2010 "P2503,P2515,P2599"      
            4 2011 "P2503,P2515,P2599"      
            4 2012 "P2503,P2515,P2599"      
            4 2013 "P2503,P2515,P2599"      
            4 2008 "P2503,P2599"            
            4 2009 "P2503,P2599"            
            4 2010 "P2503,P2599"            
            5 2007 "P2503,P2504,P2505,P2599"
            5 2008 "P2503,P2504,P2505,P2599"
            5 2009 "P2503,P2504,P2505,P2599"
            end
            
            replace TypeID = subinstr(TypeID, ",", " ", .) 
            bysort id year : replace TypeID = TypeID[_n-1] + " " + TypeID if _n > 1 
            bysort id year : replace TypeID = TypeID[_N] if _N > 1 
            bysort id year : keep if _n == _N 
            gen N = wordcount(TypeID) 
            expand N 
            bysort id year : gen which = word(TypeID, _n) 
            drop N TypeID 
            duplicates drop id year which , force 
            gen z = 1 
            reshape wide z , i(id year) j(which) string 
            rename (z*) (*) 
            list , sepby(id) 
            
            
                 +---------------------------------------------------+
                 | id   year   P2503   P2504   P2505   P2515   P2599 |
                 |---------------------------------------------------|
              1. |  4   2008       1       .       .       .       1 |
              2. |  4   2009       1       .       .       .       1 |
              3. |  4   2010       1       .       .       1       1 |
              4. |  4   2011       1       .       .       1       1 |
              5. |  4   2012       1       .       .       1       1 |
              6. |  4   2013       1       .       .       1       1 |
                 |---------------------------------------------------|
              7. |  5   2007       1       1       1       .       1 |
              8. |  5   2008       1       1       1       .       1 |
              9. |  5   2009       1       1       1       .       1 |
                 +---------------------------------------------------+
            
            .

            Comment


            • #7
              Dear Nick, Thanks for the reply. I found the problem is due to duplicates as well. I am asking the one who raised this question to see what happened?
              Ho-Chuan (River) Huang
              Stata 15.1, MP(4)

              Comment


              • #8
                The following approach postpones the decision on what to do about duplicates. It also replaces missing values in the results with zeroes, which may be preferable depending on what is to be done with the results.
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input long id int year str41 TypeID
                4 2010 "P2503,P2515,P2599"      
                4 2011 "P2503,P2515,P2599"      
                4 2012 "P2503,P2515,P2599"      
                4 2013 "P2503,P2515,P2599"      
                4 2008 "P2503,P2599"            
                4 2009 "P2503,P2599"            
                4 2010 "P2503,P2599"            
                5 2007 "P2503,P2504,P2505,P2599"
                5 2008 "P2503,P2504,P2505,P2599"
                5 2009 "P2503,P2504,P2505,P2599"
                end
                
                split TypeID, parse(,)
                drop TypeID
                gen seq = _n
                reshape long TypeID, i(seq) j(num)
                drop if TypeID==""
                list if seq==1, clean noobs
                
                drop num
                generate one = 1
                reshape wide one, i(seq id year) j(TypeID) string
                rename (one*) (*)
                list if seq==1, clean noobs
                
                collapse (sum) P*, by(id year)
                list, sepby(id) noobs
                Code:
                . list, sepby(id) noobs
                
                  +---------------------------------------------------+
                  | id   year   P2503   P2504   P2505   P2515   P2599 |
                  |---------------------------------------------------|
                  |  4   2008       1       0       0       0       1 |
                  |  4   2009       1       0       0       0       1 |
                  |  4   2010       2       0       0       1       2 |
                  |  4   2011       1       0       0       1       1 |
                  |  4   2012       1       0       0       1       1 |
                  |  4   2013       1       0       0       1       1 |
                  |---------------------------------------------------|
                  |  5   2007       1       1       1       0       1 |
                  |  5   2008       1       1       1       0       1 |
                  |  5   2009       1       1       1       0       1 |
                  +---------------------------------------------------+
                Last edited by William Lisowski; 14 Feb 2019, 08:25.

                Comment


                • #9
                  Dear William, Thank you for this additional suggestion.
                  Ho-Chuan (River) Huang
                  Stata 15.1, MP(4)

                  Comment

                  Working...
                  X