
  • Describing a whole set of .dta files in a folder with "describe" and collecting the displayed output in one dataset.

    Dear Statalisters
    My goal is to describe every Stata dataset contained in a specific folder and to collect all the displayed output in one Stata dataset. My strategy was to use "describe" with the "replace" option and gather the information obtained from this command in a dataset. There is one drawback with this approach: you have to read each dataset into RAM, which takes time if the datasets are large. Of course, you can add an in 1/1 qualifier to read just the first observation of each dataset (the number of observations can still be recovered with describe using). This greatly reduces the running time, but it still takes time if the datasets are large.

    On the other hand, the "describe" command with "using" avoids reading the dataset and is very fast, since the execution time does not depend on the size of the dataset. But it only allows a limited amount of information to be retrieved into local macros from the output displayed by "describe". What would have been nice is the possibility to combine using and replace in a single describe call. Unfortunately, describe does not allow it. The reason for this is a mystery to me. Any idea how I could retrieve ALL the info of the datasets given by describe without reading the dataset? Any suggestions are welcome :-).
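    To illustrate what can be collected without loading any data, here is a minimal sketch that relies only on the stored results of describe using with the varlist option (r(N) and r(varlist)). The folder, the output variable names, and the handle name are assumptions made for this example. Storage types, formats, and labels are not available this way, which is exactly the limitation described above.

```stata
* Sketch: one row per variable per file, using only -describe using-
local folder "`c(pwd)'"                       // assumption: folder to scan
local files : dir "`folder'" files "*.dta"

tempname ph
tempfile results
postfile `ph' str244 file long position str32 name double N using `results'

foreach f of local files {
    * -describe using- reads the file header only, not the observations
    quietly describe using "`folder'/`f'", varlist
    local N  = r(N)
    local vl `r(varlist)'
    local pos = 0
    foreach v of local vl {
        local ++pos
        post `ph' ("`f'") (`pos') ("`v'") (`N')
    }
}
postclose `ph'

use `results', clear
sort file position
```

    The resulting dataset has the same file, position, and name layout as the output of the program below, but without the type, format, and label columns.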

    I have not been able to find a package that does just that or previous posts related to this topic.

    Christophe


    I am using Stata 13.1 on Windows.

    I give here the code I have written so far.
    Code:
    cap prog drop describeFolder
    prog describeFolder
    
            syntax [anything(name=pathname)] [using/] 
            
            if ("`pathname'"=="") local pathname `c(pwd)'
            
            di "{text}Path: `pathname'"
            local filenames : dir "`pathname'" files "*.dta"        
            
            if ("`: word 1 of `filenames''" != "") {
                    local i = 1 
                    foreach x in `filenames' {
                            if ("`using'"!="") {
                                    tempfile temp`i'
                                    clear
                                    di "{text}Describing: {res}`x'"
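                                    // Load only the first observation to limit memory use
                                    cap use "`pathname'/`x'" in 1/1 , clear
                                    if (_rc == 198 ) {
                                            // in 1/1 is out of range when the dataset has no observations
                                            di "{text}The dataset {res}`pathname'/`x' {text}is empty" 
                                            use "`pathname'/`x'" , clear
                                    }
                                    describe , replace
                                    // describe using reads only the file header; keep r(N) in a
                                    // local before other commands can overwrite r()
                                    qui describe using "`pathname'/`x'"
                                    local N = r(N)
                                    gen path = "`pathname'"
                                    gen file = "`x'"
                                    gen long N = `N'
                                    qui save `temp`i++'' , replace
                            }
                            else describe using "`pathname'/`x'"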
                    }
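                    // Append and save only when an output dataset was requested;
                    // the last described file is still in memory, so append the others
                    if ("`using'"!="") {
                            local k = `i'-2
                            forval i = 1/`k' {
                                    append using `temp`i''
                            }
                            sort path file position
                            save "`using'" , replace
                            tabu file
                    }
            }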
    
            
    
    end

  • #2
    Christophe,

    There is no particular technical reason why describe cannot be called with both using and replace, but for whatever reason StataCorp has chosen not to allow this (or perhaps never considered the idea). However, it is a fairly simple matter to modify their code to do so. Make your own copies of describe.ado and mk_describe.ado (found in the ado folder of your Stata installation) and modify mk_describe.ado as follows:

    Code:
    program mydesc_mk, rclass
        version 11
        syntax [varlist] [using], [CLEAR REPLACE]
    
        if ("`using'"!="") {
            use `using', clear
        }
    
       ...
    Then modify describe.ado to call your personal version of mk_describe.ado.

    Of course, you could tighten this up a bit by checking to see if there is a data set in memory and requiring a clear if so.
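    A minimal sketch of such a check, written as a standalone program so it can be tried on its own. The program name mydesc_guard is hypothetical, and the test used here (any variables or observations in memory) is one possible reading of "a data set in memory". It exits with Stata's standard error 4 ("no; data in memory would be lost") unless clear is specified.

```stata
capture program drop mydesc_guard
program mydesc_guard
    version 11
    syntax [using/] [, CLEAR]

    if ("`using'" != "") {
        * Refuse to overwrite data in memory unless -clear- was given
        if ((c(k) > 0 | c(N) > 0) & "`clear'" == "") {
            error 4
        }
        use "`using'", clear
    }
end
```

    The same guard could be placed at the top of the modified program before the use line shown above.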

    Regards,
    Joe



    • #3
      Joe
      Thanks for your suggestion. Unfortunately, it does not avoid reading the dataset. describe with using does not load the dataset. I guess it reads only the information describing the dataset and not the data, since its execution time is not affected by the size of the dataset.
      Best regards
      Christophe



      • #4
        Christophe,

        Based on how the dialog behaves, this is intentional, not a bug. I don't think there is a convincing reason for such a limitation, besides not having to branch describe_mk.ado for every file format. The documentation is correct and shows that the replace option is only available for data in memory.

        Perhaps you can read the Stata file yourself, directly following the dta specification, to retrieve the desired information about the file. This is what des10 is doing. On the other hand, reading the full file can produce much more valuable information about the data, so a combination of use-describe-summarize-tabulate can give a more informative report.

        Best, Sergiy
        Last edited by Sergiy Radyakin; 21 Apr 2015, 07:22.



        • #5
          Christophe,

          Sorry, I neglected to consider that part of the question. Unfortunately, the _describe command, which is called by describe using, is built-in and is not amenable to modification. Accordingly, I concur with Sergiy's assessment, that the only other solution is to read the .dta header yourself to extract the variable names.

          Regards,
          Joe
