  • describe a variable in a data file; obtain its type

    Hello,
    I haven't checked in at Stata Forum in a while, so I don't know if this has been discussed lately.

    I want to describe a variable in a file, and obtain its type. I can get this information easily from the in-memory dataset, but not from a file. If I describe the variable in the file...
    des somevar using somefile
    I get to see the type, but I don't know how to capture that information -- if it is possible.

    Of course, I can -use- the file, and capture the desired information, but I'd like to know if I can avoid that route.
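
    For reference, the -use- route can be kept fairly light by reading only one variable and one observation. This is just a sketch, meant to be run as one block in a do-file. It assumes somefile has at least one observation (otherwise -in 1- fails), and the local names vartype and varfmt are made up for illustration.

    Code:
    * keep whatever is currently in memory safe
    preserve
    * read only somevar, and only its first observation
    use somevar in 1 using somefile, clear
    * extended macro functions pick up the storage type and display format
    local vartype : type somevar
    local varfmt  : format somevar
    restore
    display "`vartype' `varfmt'"

    The locals survive -restore- because it only puts back the data in memory.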

    I use Stata 13.
    Thanks
    --David

  • #2
    David,

    How about:

    Code:
    desc somevar using somefile, replace
    This doesn't store anything in r(), which is probably what you wanted, but it creates a summary data set without having to open up "somefile".

    Regards,
    Joe

    • #3
      The replace option is not available when describing data in a file, only for data in memory. You might run describe on the file, then read the log file and return the information. A draft program:

      Code:
      prog define getformat , rclass
      syntax namelist using
      
      tempname tmplog
      capt log close `"`tmplog'"'
      qui log using `"`tmplog'"', text replace
      local tmplog = r(filename)
      describe `namelist' `using'
      qui log close
      
      tempname fh
      file open `fh' using `"`tmplog'"' , read text
      file read `fh' line
      
      while r(eof)==0 {
      
          if  strpos(`"`macval(line)'"', "`namelist'" ) > 0 {
          
              tokenize `"`macval(line)'"'
              
              return local displayformat = "`3'"  
              return local storagetype = "`2'"
              return local varname = "`1'"
          }
          
           file read `fh' line
      }
      
      file close `fh'
      capture erase `"`tmplog'"'
      end

      Running it on some data:
      Code:
      getformat varname using filename

      you might get:
      Code:
      . return list
      
      macros:
                  r(varname) : "died"
              r(storagetype) : "byte"
            r(displayformat) : "%8.0g"

      • #4
        David,

        Sorry for the bad advice; Bjarte is correct. For some reason the help file for describe does not actually specify that the replace option is not compatible with using.

        Regards,
        Joe

        • #5
          Thanks very much to Joe and Bjarte.

          Note that in my Stata, the help for -describe- does indicate that the replace option does not apply to using.

          I had thought of something like the getformat program that Bjarte provided, but I didn't go as far as to try writing it. Thank you for providing that bit of code. I had not been aware that you can name logs and have multiple logs open concurrently; that solves one potential problem (if a log was already open). (If I had read the help for log, I would have seen the naming and multiple log features. But I had not looked at that help in sufficient detail lately. I learned about logs a long time ago; maybe (??) there was not the capability for multiple logs then.)
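
          As a small illustration of the named-log feature, here is a sketch that opens a second, named log next to whatever log may already be open, and closes only that one. The macro names mylog and mylogfile are invented for this example, and somefile stands for any dataset on disk.

          Code:
          * a unique log name and a unique temporary file name
          tempname mylog
          tempfile mylogfile
          * open a text log under that name; an already-open log is left alone
          log using `"`mylogfile'"', text replace name(`mylog')
          describe using somefile
          * close only the named log
          log close `mylog'
          * list whichever logs remain open
          log query _all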

          I still see one possible problem with that program. If a variable name is longer than a certain length, the attributes are written on the following line, rather than the same line.
          Also, this seems to accept multiple names, but I believe it will work only when given a single name.

          Both these matters can be fairly easily rectified.
          Thanks again for your help.
          --David

          • #6
            Hello again,
            I want to clarify what was actually my own confusion.

            In Bjarte's code, there was only one use of a named log:

            Code:
            capt log close `"`tmplog'"'
            But that is actually erroneous, though it passes due to the -capt-. It also confuses logname with the filename of the log.
            The other log commands refer to the unnamed log, though it would be best to use a named log.
            I will start with your code and make it more robust. If I get it to work, I'll report back.
            Thanks for the framework with which to start.
            --David

            • #7
              Hello again,
              If anyone is interested, here is my revision to Bjarte's program:


              /*
              getformat.ado; 2016aug4 from Bjarte Aagnes, via Stata Forum.
              Modified / fixed various issues, by David Kantor.

              For now, let it accept one name at a time.
              If we were to allow multiple names, then we would (1) need to cycle through the
              names in namelist, and (2) incorporate the varnames in the returned locals
              storagetype_thisvar
              storagetype_thatvar
              Or maybe that would create too-long names. You might, instead, use numbers:
              storagetype1
              storagetype2
              (And you could "find" the one you want by cycling through the varname1, varname2,
              etc.)

              Note that simply changing the -syntax- to have namelist instead of name will allow
              multiple names, but will fail to find any of the variables if you specify more than
              one.

              This will fail in the unlikely event that a log with the same name as
              `log1' is open. This might happen if you issued something like...
              log using mylogname, name(__000000)

              This does not accept wildcards in the name; that's a feature of -syntax-.
              This does not accept abbreviated names; that's a feature of -describe-.
              */

              prog define getformat , rclass
              version 13
              syntax name using

              tempname log1
              tempfile tmplog

              capture log query `log1'
              if ~_rc {
                  disp as err "log name `log1' in use"
                  exit 604
              }
              if _rc~=111 {
                  disp as err "unexpected error " _rc " in log query"
                  exit _rc
              }

              capture log using `"`tmplog'"', text replace name(`log1')
              if _rc {
                  disp as err "error " _rc " in opening tmplog"
                  exit _rc
              }
              /* --some of that testing may be redundant. */

              local logfilename = r(filename)
              capture noisily describe `namelist' `using'
              qui log close `log1'

              if ~_rc {
                  tempname fh
                  file open `fh' using `"`tmplog'"' , read text
                  file read `fh' line

                  while r(eof)==0 {

                      if strpos(`"`macval(line)'"', "`namelist'" ) > 0 {
                          tokenize `"`macval(line)'"'
                          return local varname = "`1'"
                          if "`2'" == "" {
                              /* look in next line */
                              file read `fh' line
                              if r(eof) {
                                  disp as err "failed to read continuation line"
                              }
                              else {
                                  tokenize `"`macval(line)'"'
                                  return local displayformat = "`2'"
                                  return local storagetype = "`1'"
                              }
                          }
                          else {
                              return local displayformat = "`3'"
                              return local storagetype = "`2'"
                          }
                      }

                      file read `fh' line
                  }

                  file close `fh'
                  *--not needed: capture erase `"`tmplog'"'
              }
              end
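
              The numbered-locals idea described in the header comment could be sketched as a thin wrapper around getformat. This is untested and only illustrative: the program name getformats and the returned scalar r(k) are made up here, and it relies on getformat returning r(varname), r(storagetype) and r(displayformat) as above.

              Code:
              prog define getformats , rclass
              version 13
              syntax namelist using

              local i 0
              foreach v of local namelist {
                  * one call per name, since getformat accepts a single name
                  getformat `v' `using'
                  local ++i
                  return local varname`i' "`r(varname)'"
                  return local storagetype`i' "`r(storagetype)'"
                  return local displayformat`i' "`r(displayformat)'"
              }
              * number of names processed
              return scalar k = `i'
              end

              After something like -getformats thisvar thatvar using filename-, you would look for r(storagetype1), r(storagetype2), and so on, matching them up through r(varname1), r(varname2). Each call runs -describe- again, so the output is displayed once per name.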
