  • Split destring and crosstabs

    Hello Statalist, this is my first post and so I will try to follow the rules as best I can.

    I have presented data below. The "Question1" variable was from a question where respondents could select multiple responses, hence the commas. I found code from a previous post here that adequately splits those commas to produce a nice table:

    Code:
    split Question1, p(,) destring
    tabm Question1?, transpose
    But then I have the 'agencytype' variable. Essentially, what I am hoping to do is to "group" or categorize the different agency types into three groups: 1-6 would be AgencyA, 7-12 would be AgencyB, and 13-17 would be AgencyC. And then I would produce tabs of Question1 by each group.

    Right now, the best way I can think of doing that is:

    Code:
    tabm Question1? if agencytype ==1 | agencytype ==2 | agencytype ==3 | agencytype ==4 | agencytype ==5 | agencytype ==6, transpose
    And that would give me the output for Question1 by AgencyA. I would then repeat for AgencyB and AgencyC.

    But, I feel like there must be some more efficient way of doing this, or something I am not thinking of.

    My data is below. Thank you in advance for the help.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte agencytype str13 Question1
     3 "4"          
     3 "4"          
    2 "1,2,3"      
     7 "4"          
     1 "1,2,3,4"    
     5 "2"          
     3 "4"          
    11 "1,2,3"      
     4 "4"          
     1 "1,2,3,4"    
    10 "1,2,3,4,5"  
     1 "3,4,5,6"    
     7 "2"           
     5 "2"          
     3 "5"          
     5 "3,4"        
     5 "3"          
     1 "1"          
     1 "2,3,4"  
     2 "4"          
    12 "2,3,4,5,6"  
     5 "4"          
     9 "4"          
     5 "3"          
     3 "4"          
     1 "4,5,6"      
     5 "3,4"        
     1 "1,2,3,4,5,6"
     3 "4"          
     3 "3,4"         
    17 "1,2,3,4,5,6"
     5 "3"              
     2 "2"          
     1 "1,2,3,4,5,6"
     5 "3"          
     9 "1,2,3,4,5,6"
     3 "3"          
    17 "1,2,3"             
     5 "3"          
     2 "3"          
     3 "4"          
    11 "1,2,3"      
     4 "4"          
     1 "1,2,3,4"            
     1 "1"          
     1 "2,3,4"  
     2 "4"          
    12 "2,3,4,5,6"  
     5 "4"          
     9 "4"               
    11 "1,2,3,4,5,6"
     3 "4"          
     4 "4"          
     4 "3"          
     9 "2"          
     9 "7"          
     1 "1"          
    17 "4"          
     2 "4"          
     4 "4,5"        
     5 "3"          
     2 "1,2,3"      
     2 "4"          
     8 "1,2,3"      
    17 "1,2,3"      
     5 "3"          
     3 "1,2,3,4,6"  
    end

  • #2
    Hi Dakota,

    One idea:

    Code:
    split Question1, p(,) destring
    
    gen agencyA = (agencytype <= 6)
    gen agencyB = (agencytype <= 12) & !agencyA
    gen agencyC = (agencytype <= 17) & !agencyA & !agencyB  
    
    foreach v in A B C {
        tabm Question1? if agency`v', transpose
    }
    Last edited by Julian Duggan; 20 Aug 2019, 11:15. Reason: formatting stuck all my code on one line, oddly
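
    A possible variation on the same idea, offered only as a sketch: build one labelled grouping variable with recode and loop over its levels. The names agencygrp, groups and lbl are illustrative and do not come from the thread; tabm still has to be installed from tab_chi (SSC), and the variables Question11, Question12, ... are assumed to exist already from the split command above.

    ```stata
    * Sketch only: agencygrp, groups and lbl are made-up names.
    * Assumes -split Question1, p(,) destring- has already been run.
    recode agencytype (1/6 = 1 "AgencyA") (7/12 = 2 "AgencyB") ///
        (13/17 = 3 "AgencyC"), generate(agencygrp)

    * Loop over the groups that actually occur in the data
    levelsof agencygrp, local(groups)
    foreach g of local groups {
        * Look up the value label so each table is identified
        local lbl : label (agencygrp) `g'
        display as text _newline "Group: `lbl'"
        tabm Question1? if agencygrp == `g', transpose
    }
    ```

    A single grouping variable can also be reused later with by: or as a column variable in tabulate, which three separate indicators cannot do directly.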



    • #3
      Thanks for the data example. There aren't any rules here other than not spamming and not being rude, and you are not at risk of either. But there are reasonable requests, including explaining community-contributed commands you refer to. Here tabm is from tab_chi (SSC) and I have no interest in disparaging it, but I am allowed to say that it's a dead end for what you want to do. Here is some technique:

      Code:
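      * split the comma-separated answers into numeric Question11, Question12, ...
      split Question1, parse(,) destring
      * drop the original string so that the name Question1 can be reused by reshape
      drop Question1
      gen id = _n
      * one row per respondent and answer slot; empty slots are then dropped
      reshape long Question1, i(id)
      drop if missing(Question1)
      gen agency = cond(agencytype <= 6, "A", cond(agencytype <= 12, "B", "C"))
      * check the grouping, then cross-tabulate answers by agency group
      tab agencytype agency
      tab Question1 agency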
      and some results:


      Code:
      . tab agencytype agency 
      
                 |              agency
      agencytype |         A          B          C |     Total
      -----------+---------------------------------+----------
               1 |        40          0          0 |        40 
               2 |        12          0          0 |        12 
               3 |        16          0          0 |        16 
               4 |         6          0          0 |         6 
               5 |        15          0          0 |        15 
               7 |         0          2          0 |         2 
               8 |         0          3          0 |         3 
               9 |         0         10          0 |        10 
              10 |         0          5          0 |         5 
              11 |         0          6          0 |         6 
              12 |         0         10          0 |        10 
              17 |         0          0         13 |        13 
      -----------+---------------------------------+----------
           Total |        89         36         13 |       138 
      
      . tab Question1 agency 
      
                 |              agency
       Question1 |         A          B          C |     Total
      -----------+---------------------------------+----------
               1 |        11          4          3 |        18 
               2 |        13          8          3 |        24 
               3 |        24          6          3 |        33 
               4 |        30          8          2 |        40 
               5 |         6          5          1 |        12 
               6 |         5          4          1 |        10 
               7 |         0          1          0 |         1 
      -----------+---------------------------------+----------
           Total |        89         36         13 |       138
      and we are counting answers, not people (or respondents).
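
      If a count of respondents per group is wanted instead, one option (a sketch, assuming the reshape code above has been run so that id and agency exist; onerow is a made-up name) is to tag a single row per respondent and tabulate only the tagged rows:

      ```stata
      * Sketch only: onerow is an illustrative name.
      * Assumes the long layout created above, with id and agency present.
      egen byte onerow = tag(id)    // 1 on exactly one row per respondent, 0 elsewhere
      tab agency if onerow          // counts respondents, not answers
      ```

      Respondents who gave no answer at all would already have been removed by the drop if missing(Question1) step, so they would not appear in this count.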



      • #4
        Thank you both. Nick Cox, I will try this out; it seems like it may be a very efficient way to do what I hope. I'm going to try to learn a bit about a few of the commands you provided (I always mess up -reshape- and am not too familiar with split, parse). Thank you again for the responses.



        • #5
          Articles about very rich people often have them saying things like "the first million (billion, trillion) was the hardest". The first hundred reshapes are the hardest. (Well, let's say ten.)
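
          For practice, a toy run of the split and reshape steps on three invented rows may help; the data values and the name answer_no are made up for illustration, and the commands mirror those used earlier in the thread.

          ```stata
          * Toy data: three invented respondents
          clear
          input byte agencytype str13 Question1
          3 "4"
          2 "1,2,3"
          7 "2,5"
          end

          * Wide step: one numeric variable per answer slot
          split Question1, parse(,) destring
          list

          * Long step: free the stub name, then one row per respondent and slot
          drop Question1
          gen id = _n
          reshape long Question1, i(id) j(answer_no)
          drop if missing(Question1)    // discard unused answer slots
          list, sepby(id)
          ```

          Listing the data before and after reshape shows how each wide row turns into several long rows that share the same id and agencytype.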



          • #6
            Hopefully it doesn't take me 100!! Thanks again, Nick Cox
