  • String coloring in data viewer

    Hi all,
    I'm running Stata 13.1. For a project, I'm importing some text files with one very long string field (the abstract for a grant proposal). I import the files using -import delimited file.csv, delimiter(comma) bindquote(strict) varnames(1) clear-. When I go to check out the imported file in the data viewer, I notice that some of the long strings are colored red, as normal, but some are colored a light gray. At first I thought it meant there was some truncation, but that doesn't seem to be the case. Is this something I should be worried about?
    The kind of files I'm working with can be found here http://exporter.nih.gov/CRISP_Catalo...?sid=0&index=1 under "Abstracts". If, for example, you download the FY 1983 csv file and import it as I have, you should be able to replicate this. For example, I see application_id 4203764 and 4203843 as the only two gray abstracts in the first 50 lines, the rest are all red. I haven't been able to find any mention of this in the archives or by Googling.

    Any insight would be very appreciated. Many thanks!
    Conor

  • #2
    I have a partial answer for you. Anything with a length>2045 is gray. Try:

    Code:
    * length of each abstract
    gen mylen = length(abstract_text)
    * sort so the longest abstracts come first
    gsort -mylen
    and you'll see. As far as I can tell, the only other place the 2045 cut-off matters is when displaying the strings with the "list" command (see "help strL", which will guide you to Chapter 12.4.12, "How to see the full contents of strLs or str#s").

    Interesting.
    Last edited by ben earnhart; 25 Jan 2015, 15:37.


    • #3
      Interesting indeed, thank you very much Ben!


      • #4
        Hi Conor,

        Did you ever find a solution to this problem? I'm having some difficulty working with the light gray shaded strings.


        • #5
          ben earnhart touched on the correct explanation. The Data Editor is not capable of editing strings longer than 2045 bytes or binary strL's, so these cells are displayed in gray and the status bar indicates that they are "not editable". However, the data can still be changed using the replace command in Stata.
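
          As a minimal sketch of that last point, assuming the long string variable is named abstract_text as in #2: the first command counts the cells the Data Editor will show in gray, and the second changes them from the command line. The trim() call is only a placeholder edit for illustration.

          ```stata
          * Count the values too long for the Data Editor to edit
          count if length(abstract_text) > 2045

          * Change those values with -replace- instead of the Data Editor
          * (trim() is just an illustrative edit; substitute your own expression)
          replace abstract_text = trim(abstract_text) if length(abstract_text) > 2045
          ```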


          • #6
            Thanks James. Perhaps I'm not understanding, but my issue is that whenever a value exceeds 2045 bytes and I run txttool with the gen(cleaned) option, I get the error below for any strL over 2045 bytes. Any thoughts on why this might be occurring?


            . txttool case_info, gen(cleaned)
            st_addvar(): 3300 argument out of range
            mm_txttool(): - function returned error
            <istmt>: - function returned error


            • #7
              txttool is a user-written command from SSC/SJ, as you are asked to explain (FAQ Advice #12).

              I think you may need to

              * ask the authors for a view

              * copy the source code and write your own version with debug lines to see what is happening.
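
              A minimal sketch of that second suggestion, using only built-in Stata commands. The file name mytxttool.ado is hypothetical, and case_info is the variable from #6. The traceback shows st_addvar() failing inside the Mata function mm_txttool(), so if that function is defined in the ado-file, debug lines would go just before that call.

              ```stata
              * Check the storage type and how many values exceed 2045 bytes
              describe case_info
              count if length(case_info) > 2045

              * Locate the installed ado-file; -findfile- stores its path in r(fn)
              findfile txttool.ado

              * Copy it into the current directory under a hypothetical new name,
              * then rename the programs inside it and add debug lines
              copy "`r(fn)'" mytxttool.ado
              ```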
