
  • Infile skipping beginning quotation mark when reading fixed width text?

    Hello StataList,

    I'm new to Stata, and I'm trying to understand what happens when I read quoted strings from fixed-width text. After reading the data, infile appears to drop the opening quote but keep the closing quote. Does Stata treat the opening quote like a blank space? Is there a way to retain it?

    I created the following fixed-width data file (C:\test.dat):
    123"temp" 456
    456"test" 789
    789a b c 123


    I created a dictionary file (C:\test.dct) to specify the columns and informats for the variables:
    dictionary using "C:\test.dat" {
    _column(1) int TEST_ID %3f
    _column(4) str8 STRING_VAR %8s
    _column(12) int V3 %3f
    }

    Then I read the data with the following infile command:
    infile using "C:\test.dct"

    When I browse the data, I see the closing quote in the first two rows, but not the opening quote:
    TEST_ID STRING_VAR V3
    123 temp" 456
    456 test" 789
    789 a b c 123

    The documentation for infile (fixed format) states:
    The logic is a bit more complicated. For instance, when skipping forward to find the data, infile might encounter a quote. If so, it then collects the characters for the data by skipping forward until it finds the matching quote. If you specified a %infmt, then infile skips the skipping-forward step and simply collects the specified number of characters. If you specified a %S infmt, then infile does not skip leading or trailing blanks. Nevertheless, the general logic is (optionally) skip, collect, and reset.


    I thought the %8s informat would tell infile not to skip the opening quote, but that does not seem to be the case. I'm using Stata/MP 17.0. Any insight would be appreciated.

    Thank you
    Terence Lew

  • #2
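    This appears to be a bug: the leading quote is stripped off when it is the first character of the string variable. You should send a report to Technical Services so they can check it out: https://www.stata.com/support/tech-support/contact/. A workaround, assuming the last character of the preceding variable is not a space, is to start the string variable one column earlier so that it begins with that character, and then delete that character once you have imported the data. In the dictionary below, STRING_VAR starts at column 3 (the last digit of TESTID), so the quote is no longer its first character.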

    Code:
    dictionary {
    _column(1) int TESTID %3f
    _column(3) str7 STRING_VAR %7s
    _column(10) int V3 %4f
    }
    123"temp" 456
    456"test" 789
    789a b c 123
    Code:
    infile using "C:\test.dct"
    * drop the borrowed first character (the last digit of TESTID)
    replace STRING_VAR = substr(STRING_VAR, 2, .)
    Result:

    Code:
    . replace STRING_VAR = substr(STRING_VAR, 2, .)
    (3 real changes made)
    
    . l
    
         +-------------------------+
         | TESTID   STRING~R    V3 |
         |-------------------------|
      1. |    123     "temp"   456 |
      2. |    456     "test"   789 |
      3. |    789      a b c   123 |
         +-------------------------+
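For anyone who wants to try the workaround end to end, here is a self-contained sketch that writes the dictionary above (with the data inline) to a file, reads it, applies the substr() fix, and checks that both quotes survive in the first two rows. The file name quote_demo.dct is hypothetical, the sketch assumes write access to the current working directory, and the values checked by the assert lines are taken from the output posted above rather than from an independent test.

```stata
* Sketch: reproduce the workaround from a self-contained do-file.
* Assumption: quote_demo.dct is a hypothetical file name in the current directory.
clear
tempname fh
file open `fh' using "quote_demo.dct", write text replace
file write `fh' "dictionary {" _n
file write `fh' "_column(1) int TESTID %3f" _n
* start STRING_VAR one column early so the quote is not its first character
file write `fh' "_column(3) str7 STRING_VAR %7s" _n
file write `fh' "_column(10) int V3 %4f" _n
file write `fh' "}" _n
* compound double quotes let the data lines contain literal double quotes
file write `fh' `"123"temp" 456"' _n
file write `fh' `"456"test" 789"' _n
file write `fh' "789a b c 123" _n
file close `fh'

infile using "quote_demo.dct"
* drop the borrowed first character (the last digit of TESTID)
replace STRING_VAR = substr(STRING_VAR, 2, .)

* the opening and closing quotes should both be present in rows 1 and 2
assert substr(STRING_VAR, 1, 1) == char(34) in 1/2
assert substr(STRING_VAR, -1, 1) == char(34) in 1/2
list
erase "quote_demo.dct"
```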



    • #3
      This is very helpful - thanks so much, Andrew!
