Dear Statalisters
My goal is to describe every Stata dataset contained in a specific folder and to collect all the displayed output in one Stata dataset. My strategy is to use "describe" with the "replace" option and gather the information obtained from this command in a single dataset. There is one drawback with this approach: each dataset has to be read into RAM, which takes time if the datasets are large. Of course, you can add an in 1/1 qualifier to read only the first observation of each dataset (the number of observations can still be recovered with describe using). This greatly reduces the running time, but it still takes time if the datasets are large.
On the other hand, "describe" with "using" avoids reading the dataset and is very fast, since the execution time does not depend on the size of the dataset. But it only lets you retrieve a limited amount of the information displayed by "describe" into local macros. What would have been nice is the possibility of combining using and replace, as in describe using filename, replace. Unfortunately, describe does not allow it. The reason for this is a mystery to me. Any idea how I could retrieve ALL the information that describe gives about a dataset without reading the dataset? Any suggestions are welcome :-).
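To illustrate what I mean by "limited", here is a minimal sketch of what describe using leaves behind. The auto dataset and the tempfile named demo are only there to make the example self-contained; they are not part of my actual folder.

```stata
* Build a small file on disk so the example can run on its own
sysuse auto, clear
tempfile demo
quietly save "`demo'"
clear

* Describe the file without loading it; the varlist option also stores r(varlist)
quietly describe using "`demo'", varlist

* Dataset-level results are available ...
display "observations: " r(N)
display "variables:    " r(k)
display "names:        `r(varlist)'"

* ... but storage types, formats, value labels and variable labels are
* only displayed, not returned per variable
```

As far as I can tell, these returned results cover the dataset as a whole and the variable names, but not the per-variable details that describe, replace puts into a dataset.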
I have not been able to find a package that does just that or previous posts related to this topic.
Christophe
I am using Stata 13.1 on Windows.
Here is the code I have written so far.
Code:
cap prog drop describeFolder
prog describeFolder
    syntax [anything(name=pathname)] [using/]
    // default to the current working directory
    if ("`pathname'"=="") local pathname `c(pwd)'
    di "{text}Path: `pathname'"
    local filenames : dir "`pathname'" files "*.dta"
    if ("`: word 1 of `filenames''" != "") {
        local i = 1
        foreach x in `filenames' {
            if ("`using'"!="") {
                tempfile temp`i'
                clear
                di "{text}Describing: {res}`x'"
                // read only the first observation to limit loading time
                // (forward slashes avoid a backslash directly before a macro)
                cap use "`pathname'/`x'" in 1/1 , clear
                if (_rc == 198 ) {
                    di "{text}The {res}dataset `pathname'/`x' {text}is empty"
                    use "`pathname'/`x'" , clear
                }
                describe , replace
                // r(N) comes from describe using, not from the 1-observation read
                qui describe using "`pathname'/`x'"
                gen path = "`pathname'"
                gen file = "`x'"
                gen N = r(N)
                qui save `temp`i++'' , replace
            }
            else describe using "`pathname'/`x'"
        }
        // combine only when an output file was requested
        if ("`using'"!="") {
            // the last description is still in memory, so append the earlier ones
            local k = `i'-2
            forval i = 1/`k' {
                append using `temp`i''
            }
            sort path file position
            save "`using'" , replace
            tabu file
        }
    }
end