
  • How is project from SSC different from Stata's built-in project?

    I would like comments from Robert Picard (author of project) and others who have used project from SSC on how it relates to Stata's built-in project. Is it useful only in those versions of Stata that do not have the built-in project?
    Regards
    --------------------------------------------------
    Attaullah Shah, PhD.
    Associate Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
    www.FinTechProfessor.com
    If you use MS Word, do check my asdoc program that easily sends Stata output to MS Word

  • #2
    There are some misunderstandings here - probably partly due to problems of terminology.

    Official Stata has two types of commands (not projects): built-in commands and ado-files. This is the case for all versions of Stata that I know of. The details of built-in commands are hidden from us users; commands defined by ado-files are legible (although some of us don't understand everything we see).

    Try these commands:
    Code:
    which summarize
    which logistic
    Here you see that summarize is a built-in command; logistic is in the ado-file logistic.ado.

    SSC is a library of unofficial commands, generated by users; most of these commands are in the form of ado-files.

    Perhaps you could rephrase your question.

    Svend



    • #3
      I think that the OP is referring to the Stata "project manager"; see "help project manager". I do not know the answer to the original question, however.



      • #4
        I think that Attaullah wants to know the difference between project, a user-written command of mine available on SSC, and Stata's Project Manager (see help Project Manager).

        I can't say much about Stata's Project Manager because I have never used it. As I understand it, it's a user-interface tool that lets users organize files using Stata. It creates a window per project and users collect, group, and move around files within the project's window. Clicking on files will trigger the relevant action, depending on the type of file. I think of Stata's Project Manager as a collection of file aliases.

        project is a program I wrote to manage my workflow in Stata. It can be installed from SSC using

        Code:
        ssc install project
        I usually work on large projects that typically extend over several months, followed by long periods of inactivity while waiting for referee reports, additional revisions and so on. It's quite common that several years go by from the laying down of the first command in the first do-file to the final replication run. The evolution of a research project is not linear and the Stata code that supports it is bound to change in structure as the project evolves. Some good insight and solutions can be found in Scott Long's "The Workflow of Data Analysis Using Stata".

        With project, I propose a slightly different approach that I think guards better against people's general lack of organization skills. From years of experience observing how (some) academics work, I identify 4 types:
        1. the interactives - those that do not use do-files
        2. the single filers - those that put the whole project in a single do-file
        3. the many filers - those that split the work in many do-files
        4. the organized - those who split the work in a master do-file that calls nested do-files.
        Anyone who presents results generated in Stata should be prepared to show their work, that is provide data and code that replicates the results. Even with a bunch of log files, type 1 users do not meet that minimal standard.

        Type 2 users have the right idea in that all it takes is to run the do-file and show that it replicates the results. The problem with this workflow is that as the project grows, it becomes cumbersome and time-consuming to run the whole thing any time a new addition or change is made. So the typical type 2 user will start commenting out large swaths of code after they have created datasets that are used later in the do-file. Eventually, a "final_data.dta" is created and 95% of the do-file is commented out. The results are those generated from "final_data.dta". Given the non-linear evolution of a research project, the commented code contains several forks and it's not clear which one was used to generate "final_data.dta". When you start unrolling the comments and reconstruct the path from original data to final data, you often find that you cannot get there. I've spent many months of my life "fixing" the work of type 2 users.

        Type 3 users are better organized than type 2 users but are vulnerable to the same problem. They rarely look back and focus on "final_data.dta".

        Type 4 users are those who are best positioned to present code that replicates results because the master do-file can run everything. In real life, however, they are vulnerable to the same problems as types 2 and 3 since they will also comment out code/do-files that are done. If they are very good at organization and have a good memory, they will never forget or touch what is done. That's not typically how research evolves and very often a small change is made upstream and the user just forgets or does not notice which files downstream are affected. Type 4 users still end up with a "final_data.dta" and they may not take the time to re-run everything if a small change is made upstream.

        Because I was spending a lot of time fixing other people's work so that their results could be replicated, I wrote project. This extends the type 4 user model by using dependency tracking to correctly determine when a do-file should run. No more commenting of anything. If the inputs of a do-file have not changed and the code in the do-file is the same and the products of the do-file are still there and have not changed, then the do-file does not need to be run again.
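
        A minimal sketch of how a nested do-file might declare its inputs and products so that they can be tracked. The file names, the variable income, and the do-file name are hypothetical, and the directive names original() and creates() are assumptions based on the project help file; check "help project" for the exact syntax before relying on them.

        ```stata
        * clean.do: a hypothetical nested do-file (all names are made up)

        * Assumed directive: declare an input file that is not created by the project
        project, original("raw_survey.dta")

        use "raw_survey.dta", clear

        * Illustrative cleaning step on a hypothetical variable
        drop if missing(income)

        save "clean_survey.dta", replace

        * Assumed directive: declare the file this do-file produces
        project, creates("clean_survey.dta")
        ```

        With declarations like these, a later build would only need to rerun clean.do if the do-file itself, "raw_survey.dta", or "clean_survey.dta" has changed.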

        With project, you organize your Stata code using a master do-file which then calls nested do-files. Because you are using project to run the master do-file and all nested do-files, project knows the name and location of all do-files, log files, input and output files in the project. Unlike Stata's Project Manager, the database of files maintained by project includes all files that are logically associated with the project and is continually updated as your code evolves. You can at any time list project files in various ways, including a very useful concordance table that lists all do-files associated with each project file.
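
        As an illustration, a master do-file might look like the sketch below. The do-file names are hypothetical; the project, do() call follows the form used in the demo project's own do-files.

        ```stata
        * master.do: a hypothetical master do-file
        * Each nested do-file is run through project rather than with a plain -do-

        project, do("clean.do")
        project, do("analysis.do")
        ```

        Since the name of a project is the name of its master do-file minus the extension, this project would be called master, and it would be built with "project master, build" once it has been set up with "project, setup".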

        The most important feature of project is the replicate task. This moves all files created by the project to a replicate directory and then completely re-runs the project. The newly created files are then compared with the previous copy to check for differences. All files created by the project are compared, including datasets, log files, text files, etc.

        project is the only tool currently available for Stata that can check that all results are replicated.



        • #5
          Thanks, Robert, for the detailed reply; it was helpful. I was trying to learn about project and wondered whether the time invested in learning it would be incrementally beneficial.
          Regards
          --------------------------------------------------
          Attaullah Shah, PhD.



          • #6
            Picard! Can you make a video example of how project works that illustrates its core functionality? Right now, the documentation is detailed, but it is difficult for me to grasp the whole functionality the way you intended. And if a video is not possible, can you give us a detailed example that uses dummy data and walks us step by step through the basic functions of project?
            Last edited by Attaullah Shah; 12 Jul 2015, 11:01.
            Regards
            --------------------------------------------------
            Attaullah Shah, PhD.



            • #7
              I like writing code, I'm not an actor. I don't try to monetize my programs so I'm not going to hire an actor either. So a video is just not going to happen.

              I understand that it's not easy to grasp project just by reading the help file. That's why there's a demonstration project that you can get once project is installed by typing in the command window

              Code:
              net get project
              The above will download two versions of a demo project to the current directory. So unzip "project_examples_v12.zip" and locate the "examples_v12" directory, that's where you'll find all the files for the demo project.

              From Stata, type in the command window

              Code:
              project, setup
              The above will launch a file open dialog. Use the Browse... button on the right side of the dialog to navigate to the "examples_v12" directory and select the file called "ex12.do" at the base of the "examples_v12" directory and click the Open button. This brings you back to the dialog which will now show the full path to "ex12.do". You can choose to have plain text log files by selecting the checkbox at the bottom left, otherwise all log files will be in SMCL format. Click OK to dismiss the dialog. This just set up the master do-file for the project. The name of the project is the name of the master do-file (minus the extension).

              The demo project is a simulation of what StataCorp could use to prepare and test the various examples presented in the help files. Only a small subset of commands from the Data Management and Base reference manuals are covered.

              To build the project the first time, type in Stata's command window

              Code:
              project ex12, build
              If you are running Stata 14, you get an error when "icd9.do" is run (it's in the "data-management" directory) because, even though each do-file is run under version control, the input dataset has changed on Stata's web site. This would be a clear indication that the "icd9.do" examples need to be reviewed. I guess it's time that I update the demo project. Anyway, a quick fix is to comment out the call like this

              Code:
              *    project, do("icd9.do")
              in the "d_examples.do" do-file that calls the problematic "icd9.do". After that, build the project again using

              Code:
              project ex12, build
              and the project will finish without errors. If you repeat the above command, you'll get something like

              Code:
              Build start: 12 Jul 2015, 14:01:21
              ==============================================================================================
              project ex12 > Skipping /ex12.do; no change in the 171 files linked to it.
              ==============================================================================================
              Build successfully completed: 12 Jul 2015, 14:01:21
              because nothing has changed.

              After that, just play with the various options. For example, you can get a concordance table of all files in the project and which do-files use them using

              Code:
              project ex12, list(concordance)
              If you want to check if the project replicates, do

              Code:
              project ex12, replicate
              It usually takes a second replication run to have all log files replicate exactly.

              From there, just edit any file, and build the project again. Just the do-files that are affected will run. Look through all the do-files to note how dependencies are handled. All of this should get you well on the way to figuring out how to create your own projects.



              • #8
                Thanks again for the detailed reply. Yes, we all have problems with acting and social interactions, which reminds me of the Big Bang Theory TV series. The first command returns the following error.
                Code:
                 net get project
                file http://fmwww.bc.edu/repec/bocode/s/project.pkg not found
                could not load project.pkg from http://fmwww.bc.edu/repec/bocode/s/
                r(601);
                Regards
                --------------------------------------------------
                Attaullah Shah, PhD.



                • #9
                  This is one of the quirks of installing packages from SSC. You must get the ancillary files just after installing the package or the net directory may be reset to something else. Either reinstall project and get the files using

                  Code:
                  ssc install project
                  net get project
                  or specify the correct directory using

                  Code:
                  net get project, from(http://fmwww.bc.edu/repec/bocode/p)




                  • #10
                    The two commands together worked.
                    Regards
                    --------------------------------------------------
                    Attaullah Shah, PhD.

