Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Elementary Question on string/concatenate

    Hi Stata Community,

    I would like to destring multiple answers in my dataset. The survey allowed respondents to choose more than one answer and I would like to separate each one of the selected answers into the parent category. So for example, referring to the table below, if someone chose computer-related, Architecture and Engineering then I would like to take this response add 1 to computer-related, 1 to Architcture and 1 to Engineering. Double counting is ok.

    I have tried to find some solutions here but I am stumped - https://www.stata.com/support/faqs/d...ponses/#concat

    Thank you in advance for all your help!

    Occupation Freq. Percent
    Missing 8 0.79
    Administrative Specializations 76 7.55
    Administrative Specializations,Educat.. 1 0.1
    Architecture, Engineering and Surveying 109 10.82
    Architecture, Engineering and Surveyi.. 2 0.2
    Architecture, Engineering and Surveyi.. 2 0.2
    Architecture, Engineering and Surveyi.. 3 0.3
    Architecture, Engineering and Surveyi.. 1 0.1
    Computer-related 473 46.97
    Computer-related,Administrative Speci.. 8 0.79
    Computer-related,Architecture, Engine.. 15 1.49
    Computer-related,Architecture, Engine.. 2 0.2
    Computer-related,Architecture, Engine.. 1 0.1
    Computer-related,Architecture, Engine.. 2 0.2
    Computer-related,Education 10 0.99
    Computer-related,Education,Health 1 0.1
    Computer-related,Health 3 0.3
    Computer-related,Others 4 0.4
    Education 119 11.82
    Education,Health 7 0.7
    Health 133 13.21
    Others 27 2.68
    Total 1,007 100

  • #2
    Code:
    // prepare the example data
    clear
    input str56 fields long freq
    "Missing"     8     
    "Administrative Specializations"     76     
    "Administrative Specializations,Education"     1     
    "Architecture, Engineering and Surveying"     109     
    "Architecture, Engineering and Surveying"     2     
    "Architecture, Engineering and Surveying"     2     
    "Architecture, Engineering and Surveying"     3     
    "Architecture, Engineering and Surveying"     1     
    "Computer-related"     473     46.97
    "Computer-related,Administrative Specializations"     8     
    "Computer-related,Architecture, Engineering and Surveying"     15
    "Computer-related,Architecture, Engineering and Surveying"     2     
    "Computer-related,Architecture, Engineering and Surveying"     1     
    "Computer-related,Architecture, Engineering and Surveying"     2     
    "Computer-related,Education"     10     
    "Computer-related,Education,Health"     1     
    "Computer-related,Health"     3     
    "Computer-related,Others"     4     
    "Education"     119     
    "Education,Health"     7     
    "Health"     133     
    "Others"     27     
    end
    expand freq
    drop freq
    
    // split fields into separate variables based on order
    split fields, gen(field) parse(,)
    
    local nvars = r(nvars) // number of new variables generated
    
    // collect list of all fields
    forvalues i = 1/`nvars'{
        levelsof field`i'
        local levs = `"`levs' `r(levels)'"'
        local totest "`totest', field`i'"
    }
    local levs : list clean levs
    local levs : list uniq levs
    
    // create the desired indicator (dummy) variables
    foreach field of local levs {
        local name : subinstr local field " " "_", all
        local name : subinstr local name "-" "_", all
        gen byte `name' = inlist(`"`field'"' `totest')
    }
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      Welcome to Statalist.

      It's not clear to me exactly what your ultimate objective is. My approach is maybe more direct than Maarten's.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str56 fields long freq
      "Missing"                                                    8
      "Administrative Specializations"                            76
      "Administrative Specializations,Education"                   1
      "Architecture, Engineering and Surveying"                  109
      "Architecture, Engineering and Surveying"                    2
      "Architecture, Engineering and Surveying"                    2
      "Architecture, Engineering and Surveying"                    3
      "Architecture, Engineering and Surveying"                    1
      "Computer-related"                                         473
      "Computer-related,Administrative Specializations"            8
      "Computer-related,Architecture, Engineering and Surveying"  15
      "Computer-related,Architecture, Engineering and Surveying"   2
      "Computer-related,Architecture, Engineering and Surveying"   1
      "Computer-related,Architecture, Engineering and Surveying"   2
      "Computer-related,Education"                                10
      "Computer-related,Education,Health"                          1
      "Computer-related,Health"                                    3
      "Computer-related,Others"                                    4
      "Education"                                                119
      "Education,Health"                                           7
      "Health"                                                   133
      "Others"                                                    27
      end
      
      split fields, gen(field) parse(,)
      drop fields
      generate id = _n
      
      list if inrange(id,15,18), clean noobs
      
      reshape long field, i(id) j(num)
      drop if missing(field)
      drop num
      
      list if inrange(id,15,18), clean noobs
      
      collapse (sum) freq, by(field)
      list, clean noobs
      Code:
      . split fields, gen(field) parse(,)
      variables created as string: 
      field1  field2  field3
      
      . drop fields
      
      . generate id = _n
      
      . 
      . list if inrange(id,15,18), clean noobs
      
          freq             field1      field2   field3   id  
            10   Computer-related   Education            15  
             1   Computer-related   Education   Health   16  
             3   Computer-related      Health            17  
             4   Computer-related      Others            18  
      
      . 
      . reshape long field, i(id) j(num)
      (note: j = 1 2 3)
      
      Data                               wide   ->   long
      -----------------------------------------------------------------------------
      Number of obs.                       22   ->      66
      Number of variables                   5   ->       4
      j variable (3 values)                     ->   num
      xij variables:
                         field1 field2 field3   ->   field
      -----------------------------------------------------------------------------
      
      . drop if missing(field)
      (23 observations deleted)
      
      . drop num
      
      . 
      . list if inrange(id,15,18), clean noobs
      
          id   freq              field  
          15     10   Computer-related  
          15     10          Education  
          16      1   Computer-related  
          16      1          Education  
          16      1             Health  
          17      3   Computer-related  
          17      3             Health  
          18      4   Computer-related  
          18      4             Others  
      
      . 
      . collapse (sum) freq, by(field)
      
      . list, clean noobs
      
                                   field   freq  
               Engineering and Surveying    137  
          Administrative Specializations     85  
                            Architecture    137  
                        Computer-related    519  
                               Education    138  
                                  Health    144  
                                 Missing      8  
                                  Others     31

      Comment


      • #4
        Hi William,

        WOW, this looks awesome!! Thank you for your response and my apologies for not being clear enough, I am a newbie here so I appreciate your patience.

        Below is a snip of the survey question and respondents can choose more than one answer. The individual categories I wish to split these up are:

        1. Computer-related
        2. Architecture, Engineering and Surveying
        3. Administrative Specializations
        4. Education
        5. Health
        6. Others
        Click image for larger version

Name:	Snip20190214_1.png
Views:	1
Size:	18.8 KB
ID:	1483741



        In your code, Architecture is split from Engineering and Surveying - is it possible to put them in one category like I mention above.

        Thank you!

        Comment


        • #5
          Starting from the same data, the following code does what you want. Two extra lines were added, highlighted in red.
          Code:
          replace fields = subinstr(fields, "Architecture, ", "Architecture ", .)
          
          split fields, gen(field) parse(,)
          drop fields
          generate id = _n
          
          list if inrange(id,15,18), clean noobs
          
          reshape long field, i(id) j(num)
          drop if missing(field)
          replace field = subinstr(field, "Architecture ", "Architecture, ", .)
          drop num
          
          list if inrange(id,15,18), clean noobs
          
          collapse (sum) freq, by(field)
          list, clean noobs
          Code:
          . list, clean noobs
          
                                                field   freq  
                       Administrative Specializations     85  
              Architecture, Engineering and Surveying    137  
                                     Computer-related    519  
                                            Education    138  
                                               Health    144  
                                              Missing      8  
                                               Others     31

          Comment


          • #6
            Thank you so much for all your help William!!

            Comment

            Working...
            X