  • Generate a new variable by counting character ( "/" )

    Code:
    . list country
    
         +------------------------------+
         |                      country |
         |------------------------------|
      1. |                           UK |
      2. |                       France |
      3. |     France / Singapore / UAE |
      4. |                  Switzerland |
      5. |                   Spain / US |
         |------------------------------|
      6. |                        Italy |
      7. |                  Switzerland |
      8. |                       France |
      9. |                  Netherlands |
     10. |                           UK |
         |------------------------------|
     11. |  FR / GB / DE / ES / IT / PL |
         +------------------------------+
    I would like to generate a new variable, say, number_of_countries, by counting the number of "/" in country. That is, if there is no "/" in country, number_of_countries will be 1; if there is one "/" in country, number_of_countries will be 2, and so on.

    For example, for observations 1 and 2 number_of_countries will be 1, and for observation 3 number_of_countries will be 3.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str28 country
    "UK"                         
    "France"                     
    "France / Singapore / UAE"   
    "Switzerland"                
    "Spain / US"                 
    "Italy"                      
    "Switzerland"                
    "France"                     
    "Netherlands"                
    "UK"                         
    "FR / GB / DE / ES / IT / PL"
    end

  • #2
    Code:
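    * number of "/" = length of country minus its length with every "/" removed; add 1 to get the number of countries
    gen int n_countries = length(country) - length(subinstr(country, "/", "", .)) + 1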
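    For anyone reading along, here is a minimal sketch of why the one-liner in #2 works, with the expression split into its parts. The intermediate variable names (len_all, len_noslash, n_slashes) are hypothetical and only there for illustration, and the three rows are taken from the example data in #1. Run it from a do-file, since the // comments are not allowed interactively.

    ```stata
    * Sketch only: the one-liner from #2 broken into steps
    clear
    input str28 country
    "UK"
    "France / Singapore / UAE"
    "FR / GB / DE / ES / IT / PL"
    end

    gen int len_all     = length(country)                        // full length of the string
    gen int len_noslash = length(subinstr(country, "/", "", .))  // length after removing every "/"
    gen int n_slashes   = len_all - len_noslash                  // each "/" removed shortens the string by 1
    gen int n_countries = n_slashes + 1                          // k separators delimit k + 1 names

    * check against the counts asked for in #1
    assert n_countries == 1 in 1
    assert n_countries == 3 in 2
    assert n_countries == 6 in 3
    ```

    One caveat, as far as I can tell: an empty country would also get n_countries = 1, so missing strings may need separate handling.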



    • #3
      Awesome! The code works perfectly. Thank you Clyde Schechter.



      • #4
        This may be taking a step backwards (since Clyde's code gives the desired output in 1 line), but I thought I would mention one other way to do it. egenmore (ssc install egenmore) provides the egen function noccur() with the option string(), which counts the number of occurrences in each value of whatever is given in string().

        Code:
        ssc install egenmore  // in case not already installed
        egen n_countries = noccur(country), string(/)  // string(/) is the same as string("/"). And string(IT /) is the same as string("IT /")
        replace n_countries = n_countries + 1
        
        . list, noobs abbrev(12)
        
          +-------------------------------------------+
          |                     country   n_countries |
          |-------------------------------------------|
          |                          UK             1 |
          |                      France             1 |
          |    France / Singapore / UAE             3 |
          |                 Switzerland             1 |
          |                  Spain / US             2 |
          |-------------------------------------------|
          |                       Italy             1 |
          |                 Switzerland             1 |
          |                      France             1 |
          |                 Netherlands             1 |
          |                          UK             1 |
          |-------------------------------------------|
          | FR / GB / DE / ES / IT / PL             6 |
          +-------------------------------------------+
        This could be reduced to 1 line, except that the following throw errors:
        Code:
        . egen n_countries = noccur(country) + 1, string(/)
        varlist not allowed
        r(101);
        
        . egen n_countries = 1 + noccur(country), string(/)
        unknown egen function 1()
        r(133);



        • #5
          David Benson (#4): Many people would like to do that in egen, but the bare or basic syntax forbids it:

          egen newvar = fcn(arguments)
          In citing that, I have omitted possibilities that don't change the point. The right-hand side can only be a call to an egen function, and no more, so an expression such as noccur(country) + 1 is not allowed there.

          There is more on Clyde's strategy in https://www.stata-journal.com/sjpdf....iclenum=dm0056
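
          As a hedged sketch of how the same length-difference strategy might extend to a separator longer than one character: divide the difference in lengths by the length of the separator. This assumes that names are always separated by exactly " / " (space, slash, space), which holds in the example data in #1 but is an assumption about the real data. The variable name n_countries2 is hypothetical.

          ```stata
          * Sketch only: count occurrences of the 3-character separator " / "
          clear
          input str28 country
          "UK"
          "Spain / US"
          "France / Singapore / UAE"
          end

          * removing each " / " shortens the string by length(" / ") = 3 characters,
          * so the difference in lengths divided by 3 is the number of separators
          gen int n_countries2 = (length(country) - length(subinstr(country, " / ", "", .))) / length(" / ") + 1

          assert n_countries2 == 1 in 1
          assert n_countries2 == 2 in 2
          assert n_countries2 == 3 in 3
          ```

          With a single-character separator the divisor is 1, and this reduces to the code in #2.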

