Bizarre Problem: Unread trailing blanks

Adam Harper

Join Date: Jun 2014

Posts: 26
#1

15 Mar 2017, 13:24

Hi All,

I'm using panel data pulled from a website and a curious anomaly has stalled my progress.

My ID variable occasionally appears to leave a trailing blank space. For example, in one time period the ID will be listed as "texas", but then it will appear to be listed as "texas ".

I say "appears", because Stata does not seem to recognize this blank space as a blank space.
Simple methods of removing blank spaces have not worked.
Searching for blank spaces turns up nothing.
It DOES recognize the character, though: the length() function yields a value of 6 instead of 5 in the "texas " example.

Has anyone ever had a similar problem? What can be done?
Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#2

15 Mar 2017, 13:47

You don't say what "simple methods" you have tried, but here are a couple. Note first that if you want to know what the character is, you can use the -hexdump- command to find out.

Code:

replace id=trim(id) // in case it really is a blank
replace id=substr(id,1,5) if substr(id,1,5)=="texas" // if not a blank and you don't care what it is
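
For the -hexdump- route, keep in mind that -hexdump- reads a file rather than a variable, so one possible approach is to write the ID variable to a scratch file first. This is only a sketch: the file name id_check.txt is a made-up placeholder, and id is assumed to be the string variable from the question.

```stata
* Sketch only: "id_check.txt" is a hypothetical scratch file name
* Write just the id variable to a plain text file, without a header row
export delimited id using "id_check.txt", novarnames replace

* Summarize which kinds of bytes the file contains and tabulate them;
* anything other than letters and line ends is a candidate culprit
hexdump "id_check.txt", analyze tabulate
```

Whatever byte shows up besides the letters and the line terminators is the one to remove.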
Clyde Schechter

Join Date: Apr 2014

Posts: 30090
#3

15 Mar 2017, 14:44

You can also use the -charlist- command (by Nick Cox, available from SSC) to get a listing of all of the characters in the variable. Most likely what you have there is some kind of "non-printing" character. After running -charlist-, run -return list- to see a list of the ASCII codes. Then look the code up in an ASCII table to see what character it is. You should then be able to remove it using -subinstr()-.
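
A minimal sketch of that workflow on toy data follows. The trailing tab, char(9), is purely an assumption standing in for whatever character is really in the data; the actual code has to be read off the -charlist- results.

```stata
* Toy data: the second id carries a trailing tab, char(9). The tab is an
* assumption for illustration, not necessarily the character in your data.
clear
input str10 id
"texas"
"texas"
end
replace id = id + char(9) in 2

display length(id[2])          // the extra character is counted
replace id = trim(id)          // trim() strips only ordinary blanks

* ssc install charlist         // one-time install from SSC, if needed
charlist id
return list                    // r(ascii) lists the character codes found

* Remove the offending code once identified (9 in this toy case)
replace id = subinstr(id, char(9), "", .)
assert length(id) == 5
```

In Stata 14 or later, a character outside the plain ASCII range is stored as more than one byte, so uchar() rather than char() would be the function to build it.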
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

15 Mar 2017, 15:19

"Simple methods of removing blank spaces have not worked."

You didn't say which simple methods you tried to no avail.

But I strongly believe you can use the ltrim() and rtrim() functions to -generate- a "clean" ID variable.

I've faced such a problem several times, and these functions worked perfectly.

Best regards,

Marcos

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#5

16 Mar 2017, 13:24

Now that I have my computer back, the example below shows how simple it is to get rid of the blanks in string variables using the rtrim() function, since the blank spaces are to the right in your case:

Code:

. set obs 8
number of observations (_N) was 0, now 8

. gen yvar = rnormal()

. gen id = _n

. gen str20 locality = "Texas  " if id <= 4
(4 missing values generated)

. replace locality ="California" if id > 4
(4 real changes made)

. * the code above was just to create a toy example

. gen locality_2 = rtrim(locality)

. list, sep(4)

     +------------------------------------------+
     |      yvar   id     locality   locality_2 |
     |------------------------------------------|
  1. | -.2555817    1      Texas          Texas |
  2. | -.2768313    2      Texas          Texas |
  3. | -.6544462    3      Texas          Texas |
  4. | -1.336185    4      Texas          Texas |
     |------------------------------------------|
  5. | -.0546583    5   California   California |
  6. |  .1403725    6   California   California |
  7. | -.9984499    7   California   California |
  8. | -1.291706    8   California   California |
     +------------------------------------------+

Hope that helps.

Best regards,

Marcos


Adam Harper

Join Date: Jun 2014

Posts: 26
#6

27 Mar 2017, 10:34

Thank you all. The "simple methods" I alluded to were indeed the trim() and substr() functions. I actually just went ahead and fixed my problem manually for this project before most of the replies were given; it's not a big data set, fortunately. I'll bookmark this in case I run into the problem again, which I'm sure I will. Thanks for the replies.

Adam