  • Identify Specific Letter/Numeric from a String Variable

    Dear all,
    I have a list of country codes and names (countrycode and countryname), as follows:
    countrycode countryname
    004 Afghanistan
    008 Albania
    012 Algeria
    108 Burundi
    191 Croatia
    212 Dominica
    233 Estonia
    300 Greece
    360 Indonesia
    388 Jamaica
    918 European Union
    927 Euro Area
    WLD World
    LMC Lower Middle Income
    I would like to add another variable, regionalcat, that identifies whether an observation is a country (0) or a country group (1).
    In this data, country groups are the observations whose countrycode begins with the digit 9 or with a letter.

    I tried the following code:

    Code:
    gen regionalcat = .
    gen regionalcat1 = substr(countrycode,1,1)
    egen regionalcat2 = sieve(regionalcat1), keep(a)
    capture assert regionalcat2 == regionalcat1
    // if there's no alphabet
    if _rc {
        // identify if the first character is "9"
        egen regionalcat9 = sieve(regionalcat1), omit(9)
        capture assert regionalcat9 == regionalcat1
        if _rc {
            // if the first character is 9
            replace regionalcat = 1
        }
        else {
            // if the first character is numeric 012345678
            replace regionalcat = 0
        }
    }
    // if there's alphabet
    else {
        replace regionalcat = 1
    }
    However, this code returns the value 0 for all observations in regionalcat. Can anybody help me? Thank you.

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str3 countrycode str19 countryname
    "004" "Afghanistan"        
    "008" "Albania"            
    "012" "Algeria"            
    "108" "Burundi"            
    "191" "Croatia"            
    "212" "Dominica"           
    "233" "Estonia"            
    "300" "Greece"             
    "360" "Indonesia"          
    "388" "Jamaica"            
    "918" "European Union"     
    "927" "Euro Area"          
    "WLD" "World"              
    "LMC" "Lower Middle Income"
    end
    
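    * Start by flagging every observation as a country group (1).
    gen regionalcat = 1
    * real() returns missing for a letter, and missing is larger than any number in Stata,
    * so only codes whose first character is a digit 0-8 are reset to country (0).
    replace regionalcat = 0 if real(substr(countrycode,1,1))<9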
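
    A likely reason the code in #1 did not behave as intended: if _rc { } is Stata's programming if command, which is evaluated once for the whole dataset, and capture assert checks whether a condition holds in every observation at once. Whichever branch runs therefore applies its replace to all rows. The observation-level version is the if qualifier, as in replace ... if .... As an alternative sketch, assuming group codes always start with 9 or a letter, a regular expression states the rule directly. The variable name regionalcat_re and the four-row sample are made up for illustration.

    ```stata
    * Small illustrative sample (a subset of the codes in the question)
    clear
    input str3 countrycode
    "004"
    "388"
    "918"
    "WLD"
    end

    * Approach from the code above: default to 1, reset to 0 for digits 0-8
    gen regionalcat = 1
    replace regionalcat = 0 if real(substr(countrycode,1,1)) < 9

    * Alternative: 1 if the code starts with 9 or a letter, 0 otherwise
    gen byte regionalcat_re = regexm(countrycode, "^[9A-Za-z]")

    * Both approaches should agree on this sample
    assert regionalcat_re == regionalcat
    list
    ```

    One difference to keep in mind: for an empty countrycode, real() yields missing, so the first approach leaves the value at 1, while regexm() returns 0.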
