Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Removing consecutive spaces in string variable

    Dear all,

    I have been struggling with removing spaces from a string variable called booth_id. Here is how it looks:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str62 booth_id
    "98-4     "
    "98-5     "
    "98-6     "
    "98-7     "
    "98-8     "
    "98-9"     
    "98-10 "   
    "98-11 "   
    "98-12 "   
    end
    I have tried removing spaces, tabs in all the possible ways I could think of but nothing works and 0 real changes are made.
    Code:
    replace booth_id=itrim(booth_id)
    replace booth_id=ltrim(booth_id)
    replace booth_id= subinstr(booth_id, char(9), "",.) 
    replace booth_id= subinstr(booth_id, " ", "",.)
    When I try to destring after splitting booth_id by "-", it says non-numeric characters however when I do charlist for the second part of the string (following "-"), it returns the following:
    0123456789

    I don't understand how to remove these spaces from booth_id if they are indeed spaces. I would be grateful if somebody could help me with this.

  • #2
    Well, the first two won't do what you want because -itrim()- is designed to remove spaces between words: but you don't have any of those, you just have them after. Similarly, -ltrim()- won't do it because that is designed to remove leading spaces, and you only have trailing spaces. -subinstr(booth_id, char(9), "", .)- won't do it because char(9) is not the space character. The space character is char(32). But all of the following work with your example data:

    Code:
    replace booth_id=rtrim(booth_id)
    replace booth_id = trim(booth_id)
    replace booth_id= subinstr(booth_id, char(32), "",.) 
    replace booth_id= subinstr(booth_id, " ", "",.)
    You say you tried the last of these yourself and it didn't replace any characters, but I can't replicate that behavior on my setup. For me, it does exactly what you want.

    And thank you for using -dataex-.

    Comment


    • #3
      Note that itrim() reduces multiple internal spaces to single spaces:


      Code:
      . di  itrim("This   is         a   test")
      This is a test
      So that's removing (all but one of) spaces between each pair of words.

      Comment


      • #4
        Thank you Clyde and Nick for your very valuable comments. Yes, the part of the dataset that I produced above, the code works. However there are some observations in this variable for which this doesn't work completely. Here are those:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str62 booth_id
        "98-114             "
        "98-115             "
        "98-116             "
        "98-117             "
        "98-118             "
        "98-119             "
        "98-120             "
        "98-121             "
        "98-122             "
        "98-123             "
        "98-124             "
        "98-125             "
        "98-126             "
        "98-127             "
        "98-128             "
        "98-129             "
        "98-130             "
        "98-131             "
        "98-132             "
        end

        Comment


        • #5
          You need to post the exact commands you ran and show the exact results Stata gave you. When I run each of those four commands on this data in my setup, each one of them removes all of the blanks from all 19 observations:
          Code:
          . * Example generated by -dataex-. To install: ssc install dataex
          . clear
          
          . input str62 booth_id
          
                                                                     booth_id
            1. "98-114             "
            2. "98-115             "
            3. "98-116             "
            4. "98-117             "
            5. "98-118             "
            6. "98-119             "
            7. "98-120             "
            8. "98-121             "
            9. "98-122             "
           10. "98-123             "
           11. "98-124             "
           12. "98-125             "
           13. "98-126             "
           14. "98-127             "
           15. "98-128             "
           16. "98-129             "
           17. "98-130             "
           18. "98-131             "
           19. "98-132             "
           20. end
          
          . 
          . preserve
          
          . replace booth_id=rtrim(booth_id)
          (19 real changes made)
          
          . restore, preserve
          
          . replace booth_id = trim(booth_id)
          (19 real changes made)
          
          . restore, preserve
          
          . replace booth_id= subinstr(booth_id, char(32), "",.) 
          (19 real changes made)
          
          . restore
          
          . replace booth_id= subinstr(booth_id, " ", "",.)
          (19 real changes made)
          
          .
          So you are doing something that is somehow different, and without seeing exactly what, there is no way to advise you.

          Comment


          • #6
            Sure, this is what I did:
            Code:
            . tab booth_id
            
                                           booth_id |      Freq.     Percent        Cum.
            ----------------------------------------+-----------------------------------
                                98-113              |          1        9.09        9.09
                                98-114              |          1        9.09       18.18
                                98-115              |          1        9.09       27.27
                                98-116              |          1        9.09       36.36
                                98-117              |          1        9.09       45.45
                                98-118              |          1        9.09       54.55
                                98-119              |          1        9.09       63.64
                                98-120              |          1        9.09       72.73
                                98-121              |          1        9.09       81.82
                                98-122              |          1        9.09       90.91
                                98-123              |          1        9.09      100.00
            ----------------------------------------+-----------------------------------
                                              Total |         11      100.00
            
            . 
            . replace booth_id=rtrim(booth_id)
            (0 real changes made)
            
            . replace booth_id = trim(booth_id)
            (0 real changes made)
            
            . replace booth_id= subinstr(booth_id, char(32), "",.) 
            (11 real changes made)
            
            . replace booth_id= subinstr(booth_id, " ", "",.)
            (0 real changes made)
            
            . tab booth_id
            
                                           booth_id |      Freq.     Percent        Cum.
            ----------------------------------------+-----------------------------------
                                 98-113             |          1        9.09        9.09
                                 98-114             |          1        9.09       18.18
                                 98-115             |          1        9.09       27.27
                                 98-116             |          1        9.09       36.36
                                 98-117             |          1        9.09       45.45
                                 98-118             |          1        9.09       54.55
                                 98-119             |          1        9.09       63.64
                                 98-120             |          1        9.09       72.73
                                 98-121             |          1        9.09       81.82
                                 98-122             |          1        9.09       90.91
                                 98-123             |          1        9.09      100.00
            ----------------------------------------+-----------------------------------
                                              Total |         11      100.00
            . charlist booth_id
            -0123456789
            But I see when I copy-paste the -dateex- input, the above code works perfectly. Could it because I imported this data from excel and maybe data is not read properly? Thank you.

            Comment


            • #7
              The help for charlist (SSC, as you are asked to explain)discusses what to do with unprintable and/or invisible codes, such as char(160).
              Last edited by Nick Cox; 20 Sep 2016, 03:01.

              Comment


              • #8
                Thank you so much for your help, Nick and Clyde. Apparently these were not spaces but some invisible characters. In the end, I solved this problem by opening the data in another editor and removing them.

                Comment


                • #9
                  Thanks for reporting and good that you solved your problem. You could have removed them in Stata too: the help for charlist gives examples.

                  Comment

                  Working...
                  X