
  • Command subinstr doesn't work to remove space in number string

    Hi Statalist, I have this data set:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str18 country int year str11(var72 var73 var74) str8 var75
    "Australia"          2000 "69 582"     "23 793"     "9 060"      "4 193"  
    "Australia"          2005 "72 234"     "22 273"     "9 286"      "4 716"  
    "Australia"          2010 "76 580"     "22 717"     "10 253"     "5 215"  
    "Australia"          2016 "81 255"     "22 822"     "10 516"     "5 089"  
    "Brazil"             2000 "153 353"    "56 154"     "20 682"     "7 189"  
    "Brazil"             2005 "171 861"    "67 314"     "21 816"     "7 153"  
    "Brazil"             2010 "210 943"    "79 826"     "23 649"     "9 326"  
    "Brazil"             2016 "224 269"    "78 352"     "24 835"     "11 429" 
    "Chile"              2000 "20 382"     "6 644"      "4 863"      "531"     
    "Chile"              2005 "21 844"     "7 281"      "4 907"      "712"     
    "Chile"              2010 "23 860"     "8 710"      "5 324"      "805"     
    "Chile"              2016 "26 491"     "10 613"     "4 381"      "1 169"  
    "China"              2000 "781 194"    "302 215"    "277 727"    "12 485" 
    "China"              2005 "1 226 643" "619 988"    "285 726"    "24 805" 
    "China"              2010 "1 626 860" "906 616"    "280 368"    "44 064" 
    "China"              2016 "1 969 366" "993 954"    "324 602"    "72 404" 
    "France"             2000 "162 163"    "33 015"     "40 472"     "11 068" 
    "France"             2005 "167 609"    "33 414"     "42 871"     "11 907" 
    "France"             2010 "159 993"    "28 913"     "44 213"     "13 888" 
    "France"             2016 "152 163"    "26 814"     "39 834"     "13 706" 
    "Germany"            2000 "231 404"    "51 337"     "65 119"     "11 221" 
    "Germany"            2005 "230 686"    "54 197"     "63 397"     "12 150" 
    "Germany"            2010 "228 902"    "55 301"     "62 384"     "12 184" 
    "Germany"            2016 "223 929"    "55 561"     "55 959"     "10 972" 
    "Indonesia"          2000 "120 234"    "30 129"     "52 764"     "2 472"  
    "Indonesia"          2005 "132 721"    "35 460"     "55 205"     "3 401"  
    "Indonesia"          2010 "142 891"    "39 909"     "54 261"     "4 968"  
    "Indonesia"          2016 "164 728"    "38 830"     "63 161"     "7 819"  
    "WORLD"              2000 "7 035 492" "1 867 537" "1 808 058" "306 609"
    "WORLD"              2005 "7 978 082" "2 229 265" "1 901 915" "362 956"
    "WORLD"              2010 "8 832 148" "2 620 644" "1 998 463" "427 551"
    "WORLD"              2016 "9 555 323" "2 752 604" "2 061 152" "488 440"
    end
    I tried to remove the spaces in the number strings so that I can convert them, using -subinstr- and -strip-, but neither works:
    Code:
    foreach var of varlist var72-var75 {
        replace `var'=subinstr(`var'," ","",.)
        }
    and
    Code:
    foreach var of varlist var72-var75 {
        strip `var',of(" ") g(`var'_1)
        }
    Could someone explain what is wrong and how I can remove those spaces?

    Thanks a lot!

  • #2
    I cannot replicate this problem:

    Code:
    . foreach var of varlist var72-var75 {
      2.     gen _`var'=subinstr(`var'," ","",.)
      3.     }
    
    . 
    . l in 1
    
         +----------------------------------------------------------------------------------------+
         |   country   year    var72    var73   var74   var75   _var72   _var73   _var74   _var75 |
         |----------------------------------------------------------------------------------------|
      1. | Australia   2000   69 582   23 793   9 060   4 193    69582    23793     9060     4193 |
         +----------------------------------------------------------------------------------------+
    
    . l in -1
    
         +-----------------------------------------------------------------------------------------------------+
         | country   year       var72       var73       var74     var75    _var72    _var73    _var74   _var75 |
         |-----------------------------------------------------------------------------------------------------|
     32. |   WORLD   2016   9 555 323   2 752 604   2 061 152   488 440   9555323   2752604   2061152   488440 |
         +-----------------------------------------------------------------------------------------------------+
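Scott's result is what you would expect if the separators in the -dataex- output are ordinary spaces. Here is a minimal sketch of how to check what the separator in the original data really is. It assumes Stata 14 or later for the Unicode functions ustrtohex(), ustrpos(), usubinstr() and uchar(). The two-row dataset is hypothetical and only mimics the problem: one value uses an ordinary space, the other a non-breaking space (U+00A0).

```stata
* Hypothetical toy data: row 1 has an ordinary space,
* row 2 has a non-breaking space (U+00A0) in its place
clear
input str20 var72
"69 582"
"69 582"
end
replace var72 = usubinstr(var72, " ", uchar(160), 1) in 2

* Show every character as an escaped hex code: an ordinary space
* should appear as \u0020, a non-breaking space as \u00a0
display ustrtohex(var72[1])
display ustrtohex(var72[2])

* Count observations that contain a non-breaking space
count if ustrpos(var72, uchar(160)) > 0
```

On real data, swap in var73-var75 as well. A non-zero count means a plain `subinstr(..., " ", "", .)` will leave those characters in place.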



    • #3
      See https://www.statalist.org/forums/for...-a-space/page2, where the solution turned out to be removing uchar(160), the non-breaking space.

      Note that subinstr() is a function, not a command. strip is community-contributed from SSC.
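Following the pointer to uchar(160), here is a minimal sketch of the fix. It assumes Stata 14 or later and that the only separators are ordinary spaces and non-breaking spaces. It strips both with the functions usubinstr() and subinstr(), then converts the cleaned strings with -destring-. The toy data are hypothetical. On the real data the loop would run over var72-var75. If some other invisible character remains, -destring- should report that the variable contains nonnumeric characters and leave it as a string.

```stata
* Hypothetical toy data: in row 1 the thousands separators are
* non-breaking spaces (U+00A0), in row 2 they are ordinary spaces
clear
input str11(var72 var73)
"69 582" "23 793"
"9 555 323" "2 752 604"
end
replace var72 = usubinstr(var72, " ", uchar(160), .) in 1
replace var73 = usubinstr(var73, " ", uchar(160), .) in 1

* Remove non-breaking spaces first, then ordinary spaces
foreach var of varlist var72-var73 {
    replace `var' = usubinstr(`var', uchar(160), "", .)
    replace `var' = subinstr(`var', " ", "", .)
}

* Convert the cleaned strings to numeric variables
destring var72-var73, replace
```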



      • #4
        Originally posted by Scott Merryman
        I cannot replicate this problem: [...]
        As Nick Cox suggested, my data contain a Unicode character. I think that when I used -dataex- to extract the example, that character did not carry over, so the problem disappeared in the posted data and subinstr() works as usual there.
        Thanks, Nick, for your help (and corrections).
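A more defensive alternative keeps only the digits, so it catches any invisible separator, whatever its code point. Here is a minimal sketch, assuming Stata 14 or later for ustrregexra(). The toy data and the new variable name var72_num are hypothetical. The caveat is that the pattern also drops minus signs and decimal points, so it only suits non-negative whole numbers like the counts in this thread.

```stata
* Hypothetical toy data: the first value uses a non-breaking space
clear
input str11 var72
"69 582"
"1 226 643"
end
replace var72 = usubinstr(var72, " ", uchar(160), .) in 1

* Delete every character that is not an ASCII digit 0-9, then convert.
* Caution: minus signs and decimal points are deleted too.
generate long var72_num = real(ustrregexra(var72, "[^0-9]", ""))
```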
