Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create a variable that summarizes the number of distinct combinations of string variables

    Hi all,

    Apologies in advance if my problem is unclear. I'm a former Stata user who has since been using Excel and previously SAS so I'm trying to remember what I used to know in Stata. I'm trying to count the number of times a distinct combination appears for two string variables. What I would like to see is something like this where the total number of times a country appears across the combination of country and area:


    Country Number of Actions
    Afghanistan 2
    Albania 1
    Algeria 2
    Angola 2
    Armenia 3
    Azerbaijan 1
    Bangladesh 2
    Belize 2
    Benin 5
    Bhutan 4
    Bolivia 6
    Bosnia and Herzegovina 1

    This is what my data looks like:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str47 country str19 area
    "Afghanistan" "Social Programs"    
    "Afghanistan" "Social Programs"    
    "Albania"     "Enforcement"        
    "Algeria"     "Enforcement"        
    "Algeria"     "Legal Framework"    
    "Angola"      "Enforcement"        
    "Angola"      "Social Programs"    
    "Armenia"     "Coordination"      
    "Armenia"     "Social Programs"    
    "Armenia"     "Social Programs"    
    "Azerbaijan"  "Social Programs"    
    "Bangladesh"  "Enforcement"        
    "Bangladesh"  "Legal Framework"    
    "Belize"      "Enforcement"        
    "Belize"      "Legal Framework"    
    "Benin"       "Enforcement"        
    "Benin"       "Enforcement"        
    "Benin"       "Government Policies"
    "Benin"       "Social Programs"    
    "Benin"       "Social Programs"    
    end
    I've tried egen distinct2 = rowvals(country area) after installing egenmore using ssc install egenmore, but I keep getting:

    unknown egen function rowvals()
    r(133);

    Any help would be appreciated and please let me know if I can clarify!

    Thanks!
    Last edited by Claudia Guidi; 29 Apr 2021, 17:43.

  • #2
    Hello and welcome (back).

    Just want to make sure I understand, because the expected table does not agree with your text description. Do you want unique combination of BOTH country and area? You said so in the text, but in the table it was different. Like for Benin, it has 5 repetitions as a country, but if considering area as well it's only 3.

    Anyhow, there are probably more elegant ways, but this should get the results: if only country:
    Code:
    gen total = 1
    collapse (sum) total, by(country)
    If both country & area:
    Code:
    gen total = 1
    collapse (sum) total, by(country area)
    From your code, it also looks like you may want to retain the current number of rows and not collapsing. If that's the case then the code would be different. Either way, some clarification will be helpful.
    Last edited by Ken Chui; 29 Apr 2021, 18:07.

    Comment


    • #3
      Some confusion here.

      There is,no function rowvals() in egenmore (SSC). There are functions rownvals() and rowsvals() but neither is what you want. Each looks across two or more variables but you want to compare values in different observations on one variable.

      This may help, but I am not especially clear what you want. The total number of actions is not the same as the number of distinct actions unless each action occurs just once. See also
      https://www.stata-journal.com/articl...article=dm0042

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str47 country str19 area
      "Afghanistan" "Social Programs"    
      "Afghanistan" "Social Programs"    
      "Albania"     "Enforcement"        
      "Algeria"     "Enforcement"        
      "Algeria"     "Legal Framework"    
      "Angola"      "Enforcement"        
      "Angola"      "Social Programs"    
      "Armenia"     "Coordination"      
      "Armenia"     "Social Programs"    
      "Armenia"     "Social Programs"    
      "Azerbaijan"  "Social Programs"    
      "Bangladesh"  "Enforcement"        
      "Bangladesh"  "Legal Framework"    
      "Belize"      "Enforcement"        
      "Belize"      "Legal Framework"    
      "Benin"       "Enforcement"        
      "Benin"       "Enforcement"        
      "Benin"       "Government Policies"
      "Benin"       "Social Programs"    
      "Benin"       "Social Programs"    
      end
      
      bysort country : gen count = _N 
      
      egen tag = tag(country area)
      egen distinct = total(tag), by(country)
      
      list, sepby(country)
      
      
           +------------------------------------------------------------+
           |     country                  area   count   tag   distinct |
           |------------------------------------------------------------|
        1. | Afghanistan       Social Programs       2     1          1 |
        2. | Afghanistan       Social Programs       2     0          1 |
           |------------------------------------------------------------|
        3. |     Albania           Enforcement       1     1          1 |
           |------------------------------------------------------------|
        4. |     Algeria           Enforcement       2     1          2 |
        5. |     Algeria       Legal Framework       2     1          2 |
           |------------------------------------------------------------|
        6. |      Angola           Enforcement       2     1          2 |
        7. |      Angola       Social Programs       2     1          2 |
           |------------------------------------------------------------|
        8. |     Armenia          Coordination       3     1          2 |
        9. |     Armenia       Social Programs       3     1          2 |
       10. |     Armenia       Social Programs       3     0          2 |
           |------------------------------------------------------------|
       11. |  Azerbaijan       Social Programs       1     1          1 |
           |------------------------------------------------------------|
       12. |  Bangladesh           Enforcement       2     1          2 |
       13. |  Bangladesh       Legal Framework       2     1          2 |
           |------------------------------------------------------------|
       14. |      Belize           Enforcement       2     1          2 |
       15. |      Belize       Legal Framework       2     1          2 |
           |------------------------------------------------------------|
       16. |       Benin           Enforcement       5     1          3 |
       17. |       Benin           Enforcement       5     0          3 |
       18. |       Benin   Government Policies       5     1          3 |
       19. |       Benin       Social Programs       5     1          3 |
       20. |       Benin       Social Programs       5     0          3 |
           +------------------------------------------------------------+

      Comment


      • #4
        Thank you both for the helpful replies! Both solutions worked very well!

        Comment

        Working...
        X