
  • Using macros to comment out blocks of code

    Hi all,

    I'd like to be able to create a local or global "switch" that I can use to comment out large blocks of code in long dofiles. I understand that I can simply put in /* and */ manually. But often I have several sections of a dofile that I think of as parts (e.g. variable recoding, first set of regressions, second set of regressions, table generation, etc.).

    I would love to be able to have a list of locals at the top of the dofile that lets me toggle any section on or off without having to look for all of the "/*"'s in the file. For shorter bits of code, I often create a global as:

    global save_graphs_off "*"

    which I can put in front of any set of lines I don't want to run. For example, if I don't want to re-save all of the figures in a dofile, I leave the asterisk in the global:

    $save_graphs_off graph save "graph1.gph", replace
    $save_graphs_off graph save "graph2.gph", replace
    $save_graphs_off graph save "graph3.gph", replace
    ...

    And to re-save all the graphs, I just remove the asterisk. But it's too much trouble to do this for every line of a BLOCK of code. And it's also time consuming to keep putting in and taking out /* ... */ when I want to comment out a bunch of noncontiguous sections throughout the dofile. For example, if I could get this to work as I'd like, it would look something like the following:

    * BEGIN DOFILE
    #delimit ;
    use "data.dta" ;

    * NOW I WANT TO WRITE CODE THAT WOULD ALLOW ME TO COMMENT OUT ANY/EACH BLOCK OF COMMANDS ;

    * TO COMMENT OUT ANY BLOCK, PUT "/*" IN LOCAL ;
    * TO RUN ANY BLOCK, PUT " " IN LOCAL ;
    local begin_comment_blk1 " " ;
    local end_comment_blk1 " " ;

    local begin_comment_blk2 "/*" ;
    local end_comment_blk2 "*/" ;

    local begin_comment_blk3 " " ;
    local end_comment_blk3 " " ;


    * 1ST BLOCK OF COMMANDS;
    `begin_comment_blk1'
    reg y x;
    sum x;
    ... (many lines of code here)
    sum y;
    `end_comment_blk1'


    * 2ND BLOCK OF COMMANDS;
    `begin_comment_blk2'
    reg y2 x2;
    sum x2;
    ... (many lines of code here)
    sum y2;
    `end_comment_blk2'


    * 3RD BLOCK OF COMMANDS ;
    `begin_comment_blk3'
    reg y3 x3;
    sum x3;
    ... (many lines of code here)
    sum y3;
    `end_comment_blk3'

    * END PROGRAM ;

    The problem is that this doesn't work. I'm wondering if anyone has suggestions.

    Thanks,
    - Dan

  • #2
    I often do the following:

    Code:
    * these locals are near the top of my do file
    loc prepdata 0
    loc table1 0
    loc table2 0
    loc figure1 0
    
    if `prepdata' {
        * code that does data management
    }
    if `table1' {
        * code that creates table 1
    }
    * etc...
    Then switch the 0's to 1's for whichever part of the code I'd like to run.

    HTH.

    • #3
      Dan,

      This might not be quite as aesthetically pleasing, but it should perform the desired function:

      Code:
      use "data.dta"
      
      * TO COMMENT OUT ANY BLOCK, PUT 0 IN LOCAL
      * TO RUN ANY BLOCK, PUT 1 IN LOCAL
      local do_blk1 0
      local do_blk2 1
      local do_blk3 0
      
      * 1ST BLOCK OF COMMANDS
      if `do_blk1' {
          reg y x
          sum x
          sum y
      }

      * 2ND BLOCK OF COMMANDS
      if `do_blk2' {
          reg y2 x2
          sum x2
          sum y2
      }

      * 3RD BLOCK OF COMMANDS
      if `do_blk3' {
          reg y3 x3
          sum x3
          sum y3
      }
      
      * END PROGRAM
      Regards,
      Joe

      • #4
        I would often store those different blocks in different .do files, and have one master .do file that does each sub do-file. Then the commenting is usually simple enough.
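
        A minimal sketch of that setup (the file names here are hypothetical): a master do-file runs each sub do-file, and a section is skipped by commenting out its do line:

        Code:
        * master.do
        do prep_data          // recode and clean variables
        * do regressions1     // commented out: skipped on this run
        do regressions2
        do make_tables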
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        • #5
          I would discourage a workflow that relies on a single do-file that increases in length and complexity as a project moves forward and where dynamic commenting is used to select blocks of code to execute at each run. If such blocks of code are really independent, then you should split them into separate do-files and use a master do-file to control their execution. The master do-file could look something like:

          Code:
          do prepdata
          do table1
          do table2
          do figure1
          Each do-file can also include nested do-files. So for example, prepdata.do may contain

          Code:
          do input_rawdata
          do clean_data
          do descriptive_stats
          You can control the execution of each do-file by simply commenting out the do statement that executes it. This is something that I also discourage, as the results you get when table1.do is executed depend on what has happened upstream. So if you changed something in clean_data.do, you have to remember to uncomment every do-file that could be impacted by the change. As projects become complicated and as time goes by, this can easily lead to errors. One safeguard is to remove all the comments, completely re-run all the code, and check that the results do not change. Again, a project can generate lots of results and figures, and errors (i.e. differences) may still slip through.

          My solution to this Stata workflow problem was to write project (available from SSC), a program that allows a user to specify and track each do-file's dependencies. For example, if final_data.do creates the final dataset used for all the tables, then you would include, after the dataset is saved, a dependency directive that tells project that final_data.do creates the dataset, e.g.

          Code:
          save "final_data.dta", replace
          project, creates("final_data.dta")
          Then if table1.do uses this dataset to generate a table, you would add a directive to that effect, e.g.

          Code:
          project, uses("final_data.dta")
          use "final_data.dta"
          
          * code for the regressions follows...
          When a project is built (essentially this means running the master do-file), project will run any do-file that has changed (i.e. the code has changed) or that uses inputs that have changed. If a do-file has not changed and its inputs have not changed, then the do-file is skipped. This completely removes the need to manually comment blocks of code to avoid having to re-run code that has been finalized.
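
          For example (assuming the master do-file is named master.do), a build can be launched from within Stata with:

          Code:
          project master, build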

          I have spent a lot of time as a Stata code "fixer" for people who discover, after their paper is accepted and they work on cleaning up their Stata code, that they can't replicate exactly the results presented in the paper. Many times, their code generates slightly different results at each run. Sometimes they are "single filers" (like the OP) and the version that created the final results is peppered with blocks of commented-out code. Others just have a bunch of do-files, each executed on an ad hoc basis. The common thread is that they all used some form of selective execution of code to generate a final set of tables.

          With project, you can, with a single-line command, create an archive for the journal of all the data and code needed to replicate your results. You can run replication builds that check for changes in every single thing (datasets, log files, etc.) created by the code. There is of course an upfront cost in getting familiar with project, but I think the investment is well worth it.

          • #6
            All great suggestions. In general, I agree with using sub-dofiles and use this in practice. The toggle I was looking for is just meant as a stand-in to use on the fly. As it turns out, I can think of other applications for the simple solutions provided above as well. Many thanks to all.
