Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Doubt when exporting string with quotes to .txt

    Hey all,

    Please consider the following scenario:

    Code:
    clear
    set obs 2
    gen id_genbank = ">Seq" + " -j " + `"""' + "bla" + `"""' in 1
    replace id_genbank = ">Seq" + " -j " + "bla" in 2
    export delimited using test.txt, delimiter(tab) replace
    
    *to display where the file was saved
    pwd
    Can someone explain why the observation 1 gets exported with four extra quotes in the .txt file, as
    Code:
    ">Seq -j ""bla"""
    despite being shown in Stata browse as
    Code:
    >Seq -j "bla"
    , given that I'm not specifying the option "quote" on the command export? I fail to see why does the observation 2 gets exported exactly the way it is shown on Stata browse, while observation 1 receives extra quotes.

    In the end, I would like to export the observations in a manner such that they look like
    Code:
    >Seq -j "bla"
    when shown in the txt file. Any ideas on achieving that?

    Cheers

  • #2
    Stata's -export delimited- quotes string more or less like Excel: without the 'quote' option, strings which contain quotes or commas are double-quoted. With the 'quote' option, all strings are quoted (which may be helpful for strings which contain numeric data).

    I didn't find an option to circumvent this, nor a package on SSC (but I didn't search much).

    Here seems to be a good occasion to test the new Python integration in Stata 16:

    Code:
    python
    import csv
    
    with open("test.csv", "wt", newline="") as f:
        writer = csv.writer(f, quotechar="", quoting=csv.QUOTE_NONE)
        writer.writerows([["id_genbank"], [">Seq -j \"bla\""], [">Seq -j bla"]])
    end
    Here is the documentation for Python's csv module: https://docs.python.org/3/library/csv.html

    One question though: since most programs, including Stata, Excel, R, SAS, and Python can deal with these quotes, is it really necessary to remove them?
    Last edited by Jean-Claude Arbaut; 18 Jul 2019, 13:33.

    Comment


    • #3
      Hey Jean-Claude, thank you for your answer!

      I'm working with Stata 15, Python integration is not available for me. I'm getting data exported from Stata to be uploaded to a website that has very specific formatting requirements and very little wiggle room around those requirements (it requires a variation of .txt files, with a pair of quotes the -j argument (-j "bla"). I got stuck on why Stata display the data in one way in the browse window but when I export it to txt, it comes up differently, but I suppose it's how Stata tries to be consistent with txt files containing explicit string (shown by quotes) or separators (shown by commas). In any case, I corrected the issue on my dataset using find/replace in a text editor, just wanted to understand the reasoning behind Stata behavior.

      Thank you again!

      Comment


      • #4
        We can do it the old way:

        In Stata:
        Code:
        save test
        In Python:
        Code:
        import pandas as pd
        import csv
        df = pd.read_stata("test.dta")
        df.to_csv("test.csv", quoting=csv.QUOTE_NONE, index=False)
        That is, save as .dta and convert from Python. You need first to install Python and the pandas module.
        You may also have a look at the documentation for read_stata and to_csv.

        It should also be possible to write a Mata program to export as csv, but it's not entirely trivial to write a generic program.
        Last edited by Jean-Claude Arbaut; 19 Jul 2019, 10:22.

        Comment


        • #5
          Code:
          . list
          
               +---------------+
               |    id_genbank |
               |---------------|
            1. | >Seq -j "bla" |
            2. |   >Seq -j bla |
               +---------------+
          Code:
          tempvar id_genbank fwrc
          local EOL = cond( c(os)=="Windows", char(13) + char(10) , char(10) )
          gen `id_genbank' = id_genbank + "`EOL'"
          gen `fwrc' = filewrite("test.txt",`id_genbank',2) /* 2 = append */
          Code:
          . type test.txt
          >Seq -j "bla"
          >Seq -j bla

          Comment


          • #6
            Thank you for your help and time Jean-Claude and Bjarte!

            Comment

            Working...
            X