  • Saving metadata in gph files (and possibly generated pdfs)?

    Dear Statalist,

    Is it possible to include simple strings as metadata in .gph files? For example, it would be helpful for documentation to be able to save the name of the do-file that generated the gph, the dataset used, etc. Of course one could put this into a graph note, but this would add unnecessary clutter.

    Related / followup: what about adding metadata to pdfs created from gph files by graph export? I think this could be done with an external program like pdftk, but it would be convenient to do it in Stata (especially given the incompatibility of batch mode with external programs when using Windows).

    - BL

  • #2
    Embedding comments into graphs: yes, possible.
    Embedding comments into PDFs: I don't know, but likely impossible, since Stata lacks PDF manipulation features (which would be very desirable indeed).
    Best, Sergiy Radyakin


    • #3
      I understand that graph save by default saves some metadata, which can subsequently be retrieved with graph describe. What I'd like to know is whether I can add (and, of course, retrieve) metadata of my own choosing.
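
      For reference, a minimal sketch of retrieving that default metadata. It assumes a live-format graph saved as foo.gph from the auto data (the file name is only an illustration); the r() names are the stored results documented for graph describe.

      ```stata
      sysuse auto, clear
      scatter mpg weight
      graph save foo.gph, replace

      * graph describe reads the stored preamble and leaves it in r()
      graph describe foo.gph
      return list

      * individual items, e.g. the creating command and the dataset path
      display `"`r(command)'"'
      display `"`r(dtafile)'"'
      ```

      This only covers what graph save records on its own; it does not add user-chosen strings.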


      • #4
        A graph file is (by default) just a text file with some information as a preamble. See below for an example.

        But there seem to be some limitations here. In the example, it all looks simple, but it is easy to find graphs created by programs in which the last command that creates the graph makes use of temporary variable names, which are not reproducible. And I don't know how to add information arbitrarily that doesn't show up on the graph.

        I've got to say that, over the several years that I've needed this, I've been happy with the thought that a .do file that reads in the data and issues the graph commands is all the documentation I need. I can add comments to that exactly as I need them and I depend on nothing but any text editor. But I may be missing your point. (I am certainly not addressing your question about .pdf files.)

        Code:
         
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . scatter mpg weight
        
        . graph save foo.gph
        (file foo.gph saved)
        
        . type foo.gph
        StataFileTM:00001:01000:LiveGPH:                       :
        00003:00003:
        *! classname: twowaygraph_g
        *! family: twoway
        *! command: twoway scatter mpg weight
        *! command_date: 28 Apr 2014
        *! command_time: 19:03:42
        *! datafile: C:\Program Files (x86)\Stata13\ado\base/a/auto.dta
        *! datafile_date: 13 Apr 2013 17:45
        *! scheme: s1color
        *! naturallywhite: 1
        *! xsize: 5.5
        *! ysize: 4
        *! end
        <BeginItem> serset K36b7e78 
        <BeginSerset>
        <BeginSeries>
        .name = `"mpg"'
        .label = `"Mileage (mpg)"'
        .format = `"%8.0g"'
        .type.set numeric
        .min =  12
        .max =  41
        .median = (.)
        .pct25 = (.)
        .pct75 = (.)
        .categories = (.)
        <EndSeries>
        <BeginSeries>
        .name = `"weight"'
        .label = `"Weight (lbs.)"'
        .format = `"%8.0gc"'
        .type.set numeric
        .min =  1760
        .max =  4840
        .median = (.)
        .pct25 = (.)
        .pct75 = (.)
        .categories = (.)
        <EndSeries>
        .weight_id = (.)
        <BeginSersetData>
        [binary serset data omitted: the remainder of the file contains non-printable bytes]
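
        The readable part of the file can also be listed without typing the whole file. The following is a rough sketch only, assuming foo.gph is a live-format graph in the working directory and that the preamble ends with the "*! end" line as in the log above; the file format is not something this sketch relies on being documented.

        ```stata
        * Sketch: list the "*!" preamble lines of a live-format .gph file.
        tempname fh
        file open `fh' using "foo.gph", read text
        file read `fh' line
        while r(eof) == 0 {
            * macval() stops Stata from re-expanding macro characters in the line
            if substr(`"`macval(line)'"', 1, 2) == "*!" {
                display as text `"`macval(line)'"'
            }
            * stop at the end of the preamble, before the serset data
            if substr(`"`macval(line)'"', 1, 6) == "*! end" {
                continue, break
            }
            file read `fh' line
        }
        file close `fh'
        ```

        The file is opened read-only, so nothing in the graph file is modified.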


        • #5
          Originally posted by Bert Lloyd:
          Is it possible to include simple strings as metadata in .gph files? For example, it would be helpful for documentation to be able to save the name of the do-file that generated the gph, the dataset used, etc. Of course one could put this into a graph note, but this would add unnecessary clutter.
          Bert can do what he requested in Stata 10.0 or later by installing the new Stata command gph_cmnt with the command
          Code:
          net from http://www.radyakin.org/stata/gph_cmnt/beta/
          and clicking on the corresponding link. The help file for it is self-explanatory, but ask me if you have questions. If no problems are found, I will forward it to Kit for SSC distribution.

          The demo is here:
          Code:
          do http://www.radyakin.org/stata/gph_cmnt/beta/gph_cmnt_demo.do
          Best, Sergiy Radyakin


          • #6
            Thanks Sergiy, that is nice. It would be helpful if the comment could easily be extracted, e.g. if gph_cmnt_di example.gph returned
            r(gph_cmnt) : "Stata graph comment by Sergiy Radyakin." But it's great as is.

            Nick, to your comment: your method works well in the dofile -> gph direction. Occasionally, though, it is useful to go in the other direction, e.g. when I have a .gph file but am unsure of its origin. This applies more frequently to pdfs, though, for which pdftk seems necessary.


            • #7
              Bert, I guess that's a slightly different thing: gph_cmnt was all about auto-display of the text. Storing is not really a problem, and can of course be implemented quite easily, given time. But I still don't quite understand what kind of problem exactly you are trying to solve. If you are creating a bunch of graphs, why not create an index of them and a readme.txt, and pack everything in a zip file? The descriptions would be available to everybody and would not require any additional tools to be extracted, etc. There might be other suggestions as well, depending on details.
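
              One way the index-plus-readme idea might look in Stata is sketched below. This is an illustration under assumptions, not a tested tool: it assumes live-format .gph files in the current directory with no spaces in their names, and the output names readme.txt and graphs.zip are made up for the example.

              ```stata
              * Sketch: write readme.txt listing each .gph file with the
              * command, dataset, and date recorded by graph save.
              tempname fh
              file open `fh' using "readme.txt", write text replace
              local graphs : dir "." files "*.gph"
              foreach g of local graphs {
                  quietly graph describe `g'
                  * copy the stored results before issuing further commands
                  local cmd  `"`r(command)'"'
                  local dta  `"`r(dtafile)'"'
                  local when `"`r(command_date)' `r(command_time)'"'
                  file write `fh' "`g'" _n
                  file write `fh' _tab "command: " `"`cmd'"' _n
                  file write `fh' _tab "data:    " `"`dta'"' _n
                  file write `fh' _tab "created: " `"`when'"' _n
              }
              file close `fh'

              * pack the graphs and the index together
              zipfile *.gph readme.txt, saving(graphs.zip, replace)
              ```

              As noted earlier in the thread, the recorded command may contain temporary variable names when the graph was produced by a program, so the index is only as informative as the preamble itself.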

              If you are really into automation of reports, have a look at the ADePT program: http://www.worldbank.org/adept

              If anyone else is interested, let me know, because there must be more than one user to justify a command, imho. And this does not help with the PDF files that you seem to care more about. For PDFs have a look at VeryPDF.

              Nick writes that a saved gph file is a text file. The format of that type of file is not known to me, is most likely not documented by StataCorp, and differs between Stata versions. It appears binary to me, not text (since it contains non-printable characters), which means editing it in something like notepad.exe is likely to render it unusable in the future. Reverse-engineering this file type would take a lot of time.

              Best, Sergiy Radyakin


              • #8
                I don't mind at all whether the graph file format is documented and I wasn't advocating editing it. My point was merely that the preamble includes some readable information, and I immediately stated some limitations for that.

                For now the only way to get documentation in the form you want is to write a .do file in the way you want.

                Naturally I appreciate that the reverse problem of working from a graph to what created it does arise, for me too, but hitherto I have blamed myself for any lack of record keeping.
