
  • wordcb, a new command for creating codebooks in Microsoft Word format, is available on SSC

    wordcb creates a Microsoft Word format codebook of the dataset in memory. Stata 15.1 is required.

    My research group and I work with a lot of different datasets from a multitude of sources; we need to document certain aspects of the data we have regularly. We got tired of that being time consuming, so I wrote this. The command is useful for data documentation and archival, or for initial data exploration.

    By default, the output Microsoft Word file includes data file metadata and, for each variable specified, provides variable information (label, value label, type, notes, etc.) and five random examples of values. Users can control how many values are shown, and can optionally request a frequency distribution sorted ascending by value or descending by frequency (similar to the sort option of tabulate oneway).

    The number of values shown cannot be specified for each variable; instead users should invoke the command multiple times with the nodta option, which suppresses file metadata, and the append option.
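
    A minimal sketch of that multi-call pattern, assuming the count is set with values(#) (the option named later in this thread) and using auto.dta with a made-up output name, auto_cb. Check help wordcb for the exact syntax before relying on it:

    ```stata
    * Illustration only: auto_cb is a hypothetical output file name.
    sysuse auto, clear

    * First call: file metadata plus three example values per variable.
    wordcb make foreign using auto_cb, values(3)

    * Second call: append more variables with ten values each;
    * nodta suppresses the file metadata so it is not repeated.
    wordcb price mpg using auto_cb, values(10) nodta append
    ```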

    There is another limit... Stata 15's putdocx command, on which this relies, can run out of memory when either a large number of variables (i.e., hundreds) or a large number of values are specified.

    I was all set to present this at the Stata Conference, but an existential threat to my employer changed my ability to travel to Chicago.

    Thanks as ever to Kit Baum for getting this up to SSC so quickly!

  • #2
    Dear Troy, will you make your presentation handout available?
    http://publicationslist.org/eric.melse



    • #3
      ericmelse Attached!

      I'll also tell you a not-so-secret secret: The two big motivators behind this command were:
      1. I hate doing menial copy/paste/formatting crap myself; and
      2. It was time consuming for our staff to produce consistently-formatted codebooks.
      The second motivator is understandable... when you've got 200 variables in a dataset, it is legit hard to produce identical tables in Word if you're doing it by hand! But I could see minor inconsistencies in their manually-created output and it drove me nuts.

      So we started work on this about a year ago, building up little pieces of it here and there. A team member would say, "hey it would be nice if [INSERT FEATURE REQUEST]" and then I'd go about coding it in between other tasks. We'd go back and forth on implementation details, like how to specify options, until we found something that worked for our group.

      At a certain point, we realized that what started out as a few lines of code that I really wrote just for us -- the original command name was ajicbook, since we're the Alaska Justice Information Center -- could be extended into a slightly more generalizable solution that others might find useful. So it was submitted to the Stata Conference, where I'd intended to give a talk about how relatively easy it was for us to automate away what was otherwise a very tedious and error-prone process using Stata 15's suite of commands for writing to MS Word.

      Another not-so-secret secret here is that I have rudimentary programming skills -- and I was still able to write a Stata ado file that will save my workgroup dozens of hours on just about every project we touch. It was absolutely worth the trouble to write, and I hope others find it useful.

      Finally, if anyone has enhancement requests or bug reports, please get in touch. I'll fix what I can when I can.


      • #4
        An update to wordcb is now available on SSC.

        Fixed:
        Now requires Stata 15.1 born 06jun2018. StataCorp added an option to putdocx to control table border widths, which I use to format the tables. In versions of Stata 15.1 prior to 06jun2018, that option did not exist, and wordcb would give a "too many arguments" error as a result.

        New:
        Option values(0) will omit the frequency distribution and other information containing values while still showing metadata about the variable (type, label, notes, etc.). This was added at the request of one of our team members, who wanted a way to include things such as unique identifiers in the output without showing any values.
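
        A short sketch of that use, assuming auto.dta and treating make as a stand-in for a unique identifier; auto_ids is a made-up output name:

        ```stata
        * Illustration only: auto_ids is a hypothetical output file name.
        sysuse auto, clear

        * Document the variable's metadata without listing any of its values.
        wordcb make using auto_ids, values(0)
        ```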



        • #5
          troy Payne

          Great!

          Progress:
          0% 20% 40% 60% 80% 100%
          ................................................

          Microsoft Word file lian_auto.docx written.

          Is it possible to add the absolute path of the Word file, e.g. "Microsoft Word file C:\Stata15\lian_auto.docx written"?

          Then users could locate the file more easily, without using the cd command to find the current working directory.



          • #6
            Is it possible to add the absolute path of the word file, e.g. Microsoft Word file C:\Stata15\lian_auto.docx written ?
            Probably possible, but unless you've got a suggested implementation, I'm very unlikely to do it.

            The using bit of Stata's default syntax parsing (in the syntax command) allows users to specify either a relative path, an absolute path, or no path (in which case the current directory is used). So when you type
            Code:
            somecommand using filename
            Stata assumes ./ if no path is specified, but the user could just as well have typed
            Code:
            somecommand using ../filename
            to do somecommand on a file that's up one level in the directory hierarchy (as just one example).

            One could parse the `using' macro returned by the syntax command for likely directory delimiters, but it would take a fair bit of thought to ensure that worked properly across Windows, Unix, and macOS.

            If you want the present working directory displayed, a safe workaround for this could be to edit my wordcb ado file to add the command pwd at the end. NB: you wouldn't want to use cd, since it will change the directory to the user's home directory in macOS and Unix; cd without arguments only displays the current directory in Windows. Stata does it that way to be consistent with each OS's command-line behavior.
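
            For reference, a small sketch of the two ways to show the working directory without changing it. Neither line is from wordcb.ado; both are plain Stata:

            ```stata
            * pwd prints the current working directory on every platform.
            pwd

            * The same path is available as c(pwd), e.g. for use in a display.
            display as text "Current directory: " as result "`c(pwd)'"
            ```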

            A less safe workaround that does exactly what you ask would be to edit line 338 in wordcb.ado to read:
            Code:
            di _newline(2) as text "Microsoft Word file " as result "`c(pwd)'\""`using'" as text " written."
            Why is that less safe? If the user specifies a different path when they invoke wordcb, e.g.,
            Code:
            wordcb using "C:\foo\bar"
            and the current directory is not C:\foo, then the notice about where the file was written will be wrong.



            • #7
              Thanks for the explanation.



              • #8
                Hi everyone,
                I am trying to use wordcb to prepare a codebook for various variables in my dataset. However, instead of giving me the table, it only gives a summary (see the attached picture) for whatever variables I use. I have tried the command with one variable only, five variables, and with all the variables (see generic syntax below). All of them give similar results--no table, just the summaries.

                Generic syntaxes I have used:
                1. wordcb var1 using abc
                2. wordcb var1 var2 var3 var4 var5 using efg
                3. wordcb using hij

                To demonstrate the problem, I have used the auto.dta:

                sysuse auto, clear
                wordcb using auto

                Result is attached.

                I need to get a table like the one explained in the description of wordcb. Can anyone help?

                Warmest,
                Yaqoob

                [Attached image: Capture.PNG]



                • #9
                  Have you tried asdoc?
                  Code:
                   sysuse auto
                  
                   asdoc des, position type isnumeric format vallab replace
                  Last edited by Attaullah Shah; 24 Oct 2020, 05:02.
                  Regards
                  --------------------------------------------------
                  Attaullah Shah, PhD.
                  Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
                  FinTechProfessor.com
                  https://asdocx.com
                  Check out my asdoc program, which sends outputs to MS Word.
                  For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.



                  • #10
                    My apologies, people. It appears there was some trouble with my MS Word. Got it sorted. So the command works perfectly fine.



                    • #11
                      Attaullah Shah: Your suggestion is nice. Could you possibly make an addition to this command so that we can add the actual value labels and their frequencies and percentages? That would be really helpful, as a lot of PIs ask for a codebook of the entire dataset in Word format. wordcb does this, but there is a separate table for each variable. A command that lets one put all the variables in one table, and gives an option as to which characteristics/statistics to put in the columns, would help a lot (something like the attached sample).

                      [Attached image: sample.PNG]



                      • #12
                        Attaullah Shah's excellent asdoc is a command that sends Stata output to Word/RTF. It's very impressive! It's also dependent on the output from a Stata command. In the example he posted, he's rerouting the output from describe to Word. I'm sure he can provide more details, but my understanding is that you'd need to find a command that outputs what you want to use asdoc in the way you desire.

                        I couldn't find anything that made exactly what I wanted, which is why I wrote wordcb.
