wordcb creates a Microsoft Word format codebook of the dataset in memory. Stata 15.1 is required.
My research group and I work with many datasets from a multitude of sources, and we regularly need to document certain aspects of that data. We got tired of how time consuming that was, so I wrote this. The command is useful for data documentation and archival, or for initial data exploration.
By default, the output Microsoft Word file includes data file metadata and, for each variable specified, provides variable information (label, value label, type, notes, etc.) and five random examples of values. Users can control how many values are shown, and can optionally request a frequency distribution sorted ascending by value or descending by frequency (similar to the sort option of tabulate oneway).
The number of values shown cannot be set separately for each variable within a single call; instead, users should invoke the command multiple times with the nodta option, which suppresses file metadata, and the append option.
There is another limit: Stata 15's putdocx command, on which wordcb relies, can run out of memory when either a large number of variables (i.e., hundreds) or a large number of values is specified.
I was all set to present this at the Stata Conference, but an existential threat to my employer changed my ability to travel to Chicago.
Thanks as ever to Kit Baum for getting this up to SSC so quickly!