  • Bizarre Problem: Unread trailing blanks

    Hi All,

    I'm using panel data pulled from a website and a curious anomaly has stalled my progress.

    My ID variable occasionally appears to leave a trailing blank space. For example, in one time period the ID will be listed as "texas", but then it will appear to be listed as "texas ".

    I say "appears" because Stata does not seem to recognize this blank space as a blank space:
    Simple methods of removing blank spaces have not worked.
    Searching for blank spaces finds none.
    Stata DOES recognize that a character is there (a 'length' function yields a value of 6 instead of 5 in the "texas " example).



    Has anyone ever had a similar problem? What can be done?


  • #2
    You don't say what "simple methods" you have tried, but here are a couple. Note first that if you want to know what the character is, you can use the -hexdump- command to find out (it reads files, so run it on the raw data file).
    Code:
    replace id=trim(id) // in case it really is a blank
    replace id=substr(id,1,5) if substr(id,1,5)=="texas" // if not a blank and you don't care what it is



    • #3
      You can also use the -charlist- command (by Nick Cox, available from SSC) to get a listing of all of the characters in the variable. Most likely what you have there is some kind of "non-printing" character. After running -charlist-, run -return list- to see a list of the ASCII codes. Then look the code up in an ASCII table to see what character it is. You should then be able to remove it using -subinstr()-.
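
      The steps above might look something like the sketch below. It assumes the variable is named id (as in #2) and, purely for illustration, that the stray character turns out to have code 160; substitute whatever code -return list- actually reports.

      ```stata
      * Install charlist from SSC if it is not already available
      ssc install charlist

      * List every character found in id, then inspect the returned codes
      charlist id
      return list

      * ASSUMPTION: 160 is a placeholder; use the code reported above
      replace id = subinstr(id, char(160), "", .)

      * Remove any ordinary leading or trailing blanks that remain
      replace id = trim(id)
      ```

      Passing . as the last argument of subinstr() replaces every occurrence, not just the first.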



      • #4
        "Simple methods of removing blank spaces have not worked."
        You didn't say which simple methods you tried without success.

        But I believe you can use the ltrim() and rtrim() functions to - generate - a "clean" ID variable.

        I've faced such a problem several times, and these functions worked perfectly.
        Best regards,

        Marcos



        • #5
          Now that I have my computer back, the example below shows how simple it is to get rid of the blanks in string variables using the rtrim() function, since the blank spaces are on the right in your case:

          Code:
          . set obs 8
          number of observations (_N) was 0, now 8
          
          . gen yvar = rnormal()
          
          . gen id = _n
          
          . gen str20 locality = "Texas  " if id <= 4
          (4 missing values generated)
          
          . replace locality ="California" if id > 4
          (4 real changes made)

          . * the code above was just to create a toy example
          
          . gen locality_2 = rtrim(locality)
          
          . list, sep(4)
          
               +------------------------------------------+
               |      yvar   id     locality   locality_2 |
               |------------------------------------------|
            1. | -.2555817    1      Texas          Texas |
            2. | -.2768313    2      Texas          Texas |
            3. | -.6544462    3      Texas          Texas |
            4. | -1.336185    4      Texas          Texas |
               |------------------------------------------|
            5. | -.0546583    5   California   California |
            6. |  .1403725    6   California   California |
            7. | -.9984499    7   California   California |
            8. | -1.291706    8   California   California |
               +------------------------------------------+
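
          The difference between the two columns is easy to miss in the -list- output, so one way to confirm that rtrim() changed anything is to compare string lengths. This is a minimal sketch continuing the toy example; len_before and len_after are hypothetical names introduced here.

          ```stata
          * Length of each string before and after trimming
          gen len_before = length(locality)
          gen len_after  = length(locality_2)

          * Rows where trimming removed something have len_after < len_before
          list locality len_before len_after, sep(4)
          ```

          If the two lengths are identical for a value that still looks padded, the extra character is probably not an ordinary blank, and rtrim() will not remove it.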
          Hope that helps.
          Best regards,

          Marcos



          • #6
            Thank you all. The "simple methods" I alluded to were indeed the trim() and substr() functions. I went ahead and fixed my problem manually for this project before most of the replies came in; fortunately, it is not a big data set. I'll bookmark this in case I run into the problem again, which I'm sure I will. Thanks for the replies.

            Adam
