Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Developing user-written package for inputing and outputing to excel

    First off I want to thank stata list. I don't post here often, but I use posts from the stata list all the time. At my job we use XLS and CSV files a lot. There are many things we do that take several steps to do. Often times these are tedious. I am looking into developing a set of ado files that would automate these task for our office as well as helping outside stata users. When researching these methods I have found multiple people on stata list, commenting on the stata blog, and other websites wishing some of these functions would exist. For all I know some of these things already exist I just couldn't find them in my search. I'd love some input and guidance as this is my first time doing something like this.

    These are the commands that are on my to do list. Some I already have working and just need to be refined.
    • tabstat2xl, just like tab2xl except instead of exporting a tabulate to excel it exports a tabstat.
    • collapsetable2xl, often we will preserve a data set, collapse it it and export this as a table to excel before restoring it. This is especially useful if you want to have a table with more collumns than tabstat will allow.
    • ss, this would be like fs or ds. you could get a list of sheets in an xls or limit your search using wildcards
    • xlsmerge/csvmerge this would let you merge in an excel file or csv without loading it, saving it, opening the first dataset and merging in the temporary dataset
    One of the things I'm trying to take my time with and really think out is making sure the syntax is similar to commands that already excist to make it easy for users.

    Any input is appreciated. I am also interested in good resources to make help files.
    Owner of StataTutor.com

  • #2
    Some of this functionality (especially the last bullet) can be found in Daniel Klein's -xls2dta- package available on SSC.

    Comment


    • #3
      tabstat2xl sounds like sumup and collapsetbl2xl sounds like xcollapse (both from SSC). -tabout- has a lot of the features mentioned in these bullets (particularly the beta for tabout2 from Ian Watson's website). These stop short from what you're asking about in the sense that they dont natively export the results/tables to xlsx into excel worksheets and with excel formatting that you can get from commands like -putexcel-. In my workflows I have a wrapper program that uses -tabout- or -sumup- to produce a file, reimport it into Stata, do some post-processing of the output, and then export using export excel or putexcel to add formatting or divide it into worksheets. I'm guessing that some of the packages you're describing could follow a similar workflow and benefit from existing packages.
      Last edited by eric_a_booth; 21 Jul 2017, 16:03.
      Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX

      Comment


      • #4
        Thanks guys. This is useful. There is a lot of functionality these don't do that I want, like returning a scalar of how many rows were printed to excel so you can increment the starting row in excel. This would allow a user to more easily have multiple tables on one sheet which is essential. I've been hand coding it for each sheet which is tedious. I think I should base my collapsetable2xl on xcollapse because it's so much better than collapse. The most useful thing to me is to see how these ADOs handle syntax.

        Eric, Yes this is the kind of work flow I am talking about., This isn't so much a new command as it is a "wrapper"

        When I am done with my commands how do I publish them? Do you have any tips on writing help files? Is there an editor or do people hand code?
        Owner of StataTutor.com

        Comment


        • #5
          Originally posted by Mark Davis View Post
          When I am done with my commands how do I publish them?
          The easiest way is to sent them to Kit Baum for upload on the SSC. It is simple because you only need the ado and help file(s) and Kit invests his time and does all the rest (putting together the pkg, description, uploading etc.). Though it seems outdated, here is how you do this.

          Code:
          Do you have any tips on writing help files? Is there an editor or do people hand code?[/QUOTE]
          There are user-written program for creating help files in an automated fashion,l but I did not find any of them very useful. I would start out with a copy of an existing file and tweak this.

          Best
          Daniel

          Comment


          • #6
            Thanks Daniel
            Owner of StataTutor.com

            Comment


            • #7
              Originally posted by Mark Davis View Post
              Thanks guys. This is useful. There is a lot of functionality these don't do that I want, like returning a scalar of how many rows were printed to excel so you can increment the starting row in excel. This would allow a user to more easily have multiple tables on one sheet which is essential. I've been hand coding it for each sheet which is tedious. I think I should base my collapsetable2xl on xcollapse because it's so much better than collapse. The most useful thing to me is to see how these ADOs handle syntax.

              Eric, Yes this is the kind of work flow I am talking about., This isn't so much a new command as it is a "wrapper"

              When I am done with my commands how do I publish them? Do you have any tips on writing help files? Is there an editor or do people hand code?

              +1 to Daniel Klein's advice. In addition, creating wrapper programs is something I do frequently but I don't put them on SSC (or not yet). The reasons for this include:

              1) if I'm using the original author's work and simply adding some additional features (like capturing more return results like the scalars you describe), I'd rather contact the author and ask them to include the new features. I've done this on many occasions, often sending them the code I wrote to add the functionality in case they want to use it in some way. I don't want to run afoul of using someone else's core code (even if I were to cite/reference them properly, it just feels wrong).

              2) if the original author isn't available or interested in the work, my next decision point focuses whether the package I'm creating the wrapper for is currently being maintained/updated. The reason for this is that if I create, for example, a wrapper for -tabout- that does some file post processing to make the table panels append in wide format instead of long format or add in some of the -putexcel- type formatting (both of which I'm currently doing in my own workflows) then my code would need to always be toggled to how -tabout- (or the forthcoming -tabout2-) changes when it is officially updated (continuing the example: the -tabout2- functionality of removing columns/panels will likely completely change whether/how my wide format wrapper code functions). Also, I wouldn't want any wrapper functions to stifle, interfere, or overlap with any functionality the author adds or plans to add to the original package that I'm benefiting from.
              Last edited by eric_a_booth; 24 Jul 2017, 11:16.
              Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX

              Comment


              • #8
                Originally posted by eric_a_booth View Post
                2)The reason for this is that if I create, for example, a wrapper for -tabout- [...] then my code would need to always be toggled to how -tabout- [...] changes when it is officially updated
                Unfortunately, these worries might be justified. Once I completely changed the way one of my early commands worked and I have changed syntax, returned results and more pretty casually for quite some time. Nowadays I design programs much more carefully and sometimes put quite some effort into making sure that old code would not be broken by any updates. I strongly believe that the (justified) lack of confidence in the stability of user-written programs, especially in the area of data management, is one of the most important reasons that make user-written additions much less valuable and beneficial than they could be.

                Originally posted by eric_a_booth View Post
                [...] Also, I wouldn't want any wrapper functions to stifle, interfere, or overlap with any functionality the author adds or plans to add to the original package that I'm benefiting from.
                I would not worry about that too much, especially if the original author is not responsive. I do agree, however, that tiny details, such as adding returned results etc. should first be approached by suggesting the change to the original author and is probably not worth publishing a new command for that sole purpose. If one decides otherwise, indeed credit should be given to the original author.

                Best
                Daniel
                Last edited by daniel klein; 24 Jul 2017, 12:27.

                Comment


                • #9
                  Thanks for bring these issues up. They didn't occur to me. I was planning on using exclusively default commands. After reading everyone's suggestions I am considering using xcollapse. If I do I can ask for permission before making it available to others. If not I can just use the regular collapse. If I use someone else's command am I better off calling their command or using the code from their ado files so if they change the command my command won't break?
                  Owner of StataTutor.com

                  Comment


                  • #10
                    Originally posted by Mark Davis View Post
                    If I use someone else's command am I better off calling their command or using the code from their ado files so if they change the command my command won't break?
                    This is exactly what I was referring to. If you call another author's command, you do not need to ask permission at all. The author will probably be happy to hear about it, but there is nothing wrong with this. However, if the other program changes or is withdrawn completely, then your command breaks down.

                    On the other hand, if you just copy the complete code then, indeed no changes outside your command can break it. However, if functionality is added to the other program, or bugs are fixed, your command cannot profit from this either. Also, I would get explicit, written permission of the original author to copy their hole work for only adding a few bells and whistles. Borrowing code can lead to nasty discussions as we have seen on the list before. This seems not to be necessarily true for programs by StataCorp. They do not seem to be worried too much about users borrowing their code.

                    In general, I guess I would try to go with the first approach and deal with changes in the other program when they come up.

                    In you specific case, on the pro-side is that Roger Newson keeps older versions of his program available. On the con-side is that these versions are available from his website, only, meaning that you will have to think about how to tell users when and where they are to get xcollapse from.

                    The last point above is yet another topic that could be discussed in length. You might be tempted to just include the installation process of xcollapse in your code. Do not do this. Do never install third party programs from within your code without the user's explicit request. You can, however, provide an option for updating (or downgrading) required third party software in your code.

                    Best
                    Daniel

                    Comment

                    Working...
                    X