
  • Need introductory help with filefilter, ASCII characters, and binary variables

    Hello Statalist. First, let me apologize for the poor formatting of this post. I am working from a remote area with very limited access to good wifi, so the text editor toolbar will not load. Let me also apologize because I believe this is a very elementary coding question, but after many hours of googling, reading the Stata PDF documentation, and trial and error, I have been unable to find a solution.

    I have been doing data cleaning and analysis work for a health NGO for the past year. Up to this point, the binary variables in the unclean files they send me have been in string format ("true" or "false" or ""), so converting them to numeric format has only required the generate and replace commands. In the most recent file they sent me, the binary variables are in ASCII format ("\0" or "SOH", appearing as " " or ""). I want to convert these variables to numeric format.

    It quickly became obvious that the replace command would not work for the "SOH" observations. After looking at other posts, it seems I will need to use filefilter, but it is not obvious to me exactly how the command should be typed (I had never heard of ASCII until yesterday, so both the Stata documentation and other forum posts are a bit over my head). After looking through other Statalist posts, I took a guess and tried filefilter "file1.dta" "file2.dta", from(\0) to ( ). I also tried filefilter "file1.dta" "file2.dta", from(BS0) to ( ). Neither worked.

    Is anyone willing to give me a lesson on converting binary variables from ASCII format to basic numeric format? Once again, apologies for the poor formatting of this post. Also, I am working in Stata 14.

  • #2
    Welcome to Statalist.

    You certainly do not want to be applying the filefilter command to Stata datasets (.dta files). Perhaps it can be applied successfully to the files the NGO provided to you. But that depends very much on what format the file is in. I am assuming the NGO is not giving you a Stata dataset. If they are, stop reading at this point and post a reply telling us that, because if so I have very much misunderstood your problem.

    Also, I am assuming that you have checked with the NGO that they intended to make the change they did, and not that someone new chose the wrong way of outputting the uncleaned data from whatever program they use to prepare it, and could easily rerun their program to correct the output they gave you.

    Suppose you have a file named ngodata.xyz that contains the data you describe. Those values are not in "ASCII format"; rather, they are the computer's binary representation of the numbers 0 and 1 (as if they were byte variables in Stata), where the convention is that 0 is false and 1 is true. When the numbers 0 and 1 are interpreted as characters (which they are not meant to be), they correspond to the "ASCII control characters" named NUL and SOH, which are not normally used in data meant to be read as text, for example in Wordpad.
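
    Before changing anything, it may help to confirm which byte values the file actually contains. The following is a minimal sketch using Stata's hexdump command; the file name ngodata.xyz is the same hypothetical name used above, so substitute the real one.
    Code:
    * Summarize the file's contents, including counts of control characters
    * such as NUL (byte value 0).
    hexdump "ngodata.xyz", analyze
    * Tabulate how often each byte value occurs in the file.
    hexdump "ngodata.xyz", tabulate
    If the counts for byte values 0 and 1 are roughly what you would expect from the number of records and binary variables, that supports the interpretation above. If they are far larger, those bytes probably occur elsewhere in the file as well.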

    With that said, perhaps
    Code:
    filefilter "ngodata.xyz" "temp.xyz", from(\0d) to ("false") replace
    filefilter "temp.xyz" "newdata.xyz", from(\1d) to ("truex") replace
    will accomplish something useful for you. Please note that I have deliberately written "truex" so that it has the same number of characters as "false". I am concerned that this ngodata.xyz file is in a "fixed format", where each data item starts in a given column, so the single characters NUL and SOH should both be replaced by 5-character strings. That way the rest of each record still lines up correctly, just shifted over by 4 extra characters for each replacement.
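
    One way to check what the two commands above actually did is to look at the results filefilter leaves behind. This is only a sketch under the same assumptions (hypothetical file names, and NUL and SOH bytes that occur only in the binary variables); filefilter stores the number of matches it found in r(occurrences).
    Code:
    * Replace NUL (decimal 0) and report how many were found.
    filefilter "ngodata.xyz" "temp.xyz", from(\0d) to("false") replace
    display "NUL bytes replaced: " r(occurrences)
    * Replace SOH (decimal 1) and report how many were found.
    filefilter "temp.xyz" "newdata.xyz", from(\1d) to("truex") replace
    display "SOH bytes replaced: " r(occurrences)
    If either count is zero, or much larger than the number of binary values you expect, the file is probably not laid out the way I have assumed, and the filtered copy should not be trusted.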

    Even this is fraught with problems, and perhaps you would be better off using the infix command to read these data. That approach assumes that a given data item always appears in the same position in each input record. The character you need to read would be described to infix as a byte appearing in exactly one column.

    For more information on using infix you can read the full documentation in the Stata Data-Management Reference Manual PDF included with your Stata installation and available through Stata's Help menu.
