  • insheet options

    Dear All,

    I am using data (example file) recorded by software whose convention is to enclose in double quotes any content that itself contains double quotes, and to double those embedded double quotes.

    I find that the data can be imported correctly with the import delimited command, but not with the insheet command.

    Code:
    local fn "http://www.radyakin.org/statalist/2022/pdsample.txt"
    import delimited "`fn'", varnames(1) clear
    list
    
    insheet using "`fn'", tab case names clear
    list
    1. I'd like to keep my code compatible with earlier versions of Stata, which predate the import delimited command, but I cannot find any insheet option that affects how quotes in the input file are treated. I'd like to avoid parsing individual lines and quotes myself, of course, although that always remains possible as a last resort.

    2. Is there any documentation on the specifics of insheet's behavior in such cases, that is, what exactly to expect and what not to expect from it (presence of quotes, unmatched quotes, special characters, and other corner cases)?

    Thank you, Sergiy Radyakin


    Code:
    . local fn "http://www.radyakin.org/statalist/2022/pdsample.txt"
    
    . import delimited "`fn'", varnames(1) clear
    (encoding automatically selected: ISO-8859-1)
    (1 var, 1 obs)
    
    . list
    
         +----------------------------------------------------------------+
         |                                                        comment |
         |----------------------------------------------------------------|
      1. | school_categories||"some comment (from the user), goes here"|| |
         +----------------------------------------------------------------+
    
    . 
    . insheet using "`fn'", tab case names clear
    (1 var, 1 obs)
    
    . list
    
         +---------+
         | comment |
         |---------|
      1. |      || |
         +---------+

  • #2
    Dear Sergiy,

    Thank you for your question. There may not be a perfect answer to it. I believe the problem arises when the insheet command tries to interpret the various sets of double quotes in your .txt file. I found the following code that worked; however, it would require you to change or omit entirely the double quotes enclosing "some comment (from the user), goes here".

    Code:
    clear
    input str100 comment
    
    "school_categories||'some comment (from the user), goes here'||"
    
    end
    
    outsheet using "test_data.txt", replace
    
    insheet using "test_data.txt", tab case names clear
    list
    This produces the following output. Also note that in your case, "test_data.txt" would be replaced by your URL.

    Code:
    . clear
    
    . input str100 comment
    
                                                                                
    >                       comment
      1. 
    . "school_categories||'some comment (from the user), goes here'||"
      2. 
    . end
    
    . 
    . outsheet using "test_data.txt", replace
    
    . 
    . insheet using "test_data.txt", tab case names clear
    (1 var, 1 obs)
    
    . list
    
         +----------------------------------------------------------------+
         |                                                        comment |
         |----------------------------------------------------------------|
      1. | school_categories||'some comment (from the user), goes here'|| |
         +----------------------------------------------------------------+
    
    .
    There may be some way to include the double quotes as you intended, but I am not aware of it. Maybe another user could help with that.

    I hope I was of some help to you.

    Best,
    Salvatore



    • #3
      Quote from #2: "I found the following code that worked, however, it would require you to change or omit entirely the double quotes"

      Hello Salvatore,

      Thank you very much for your time and effort. Unfortunately, the system from which the data originate produces them in this very format, and the files are rather large to clean up. Since import delimited imports them correctly (it seems), I will go through my code and replace insheet with import delimited. I hope there are no other subtle differences between the two commands in such corner cases, so that I don't introduce more bugs than I fix.
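
      One way such a replacement could stay runnable on older versions is sketched below. It assumes that import delimited is available from Stata 13 onward and branches on c(stata_version). The delimiters("\t") and case(preserve) options are an assumption about the closest match to insheet's tab and case options and have not been checked against the full data. On older versions the insheet branch would still misread the quoted fields, so this only keeps the code running there; it does not fix the import.

      ```stata
      * Sketch: choose the import command by the running Stata version.
      * Assumption: import delimited is available from Stata 13 onward.
      local fn "http://www.radyakin.org/statalist/2022/pdsample.txt"

      if c(stata_version) >= 13 {
          * Read the quoted field correctly in this thread's example
          import delimited "`fn'", delimiters("\t") varnames(1) case(preserve) clear
      }
      else {
          * Older Stata: insheet still misreads fields with doubled quotes
          insheet using "`fn'", tab case names clear
      }
      list
      ```

      Without case(preserve), import delimited lowercases variable names by default, whereas insheet with the case option keeps them as they appear in the file, which is one difference to watch for when swapping the commands.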

      Thank you, Sergiy
