  • Programming with syntax: adding an [if] restriction which refers to as-yet undefined variables

    I would like to write a program which loads a dataset and then only keeps variables specified in an if condition, before proceeding with further analysis in the program.

    The issue is that it seems [if] in syntax checks whether the variables exist at the time the program is called. Therefore I cannot run the program unless the data happens to already be open. Minimal working example:

    Code:
    * define the program
    cap program drop example
    program define example
        syntax [if]
        sysuse auto.dta, clear
        keep `if'
        // more analysis...
    end
    
    * call the program
    example if foreign == 1
    This will only work if auto.dta happens to already be open when I call the program. Any clever quick work-arounds which allow me to keep the nice structure of a standard "if" expression in my program call?

  • #2
    There is not usually a good reason to wire a specific dataset into a program -- that's what usually calls for a do-file. But assuming there is a good reason for your design, a reference to a variable that doesn't yet exist in the dataset would have to be read in some other way, perhaps as part of

    Code:
    syntax anything 
    or as a string option that you process within the program.



    • #3
      Nothing to add to Nick's comment about (not) hard-wiring a dataset into a program.

      Nick's first suggested solution would need to be tweaked a little bit to read

      Code:
      syntax anything(everything)
      to make sure that any if qualifiers are treated as part of anything.


      I would like to point out that

      Originally posted by Matthew Lala
      [...] only keep[ing] variables specified in an if condition
      if taken literally, is not trivial because it requires manually parsing an expression to find (possibly abbreviated) variable names.



      • #4
        Your example may be a proxy for something more elaborate, but the more I think about this, the more the design looks flawed anyway. By all means write a program to do some analysis, but set up a dataset first.



        • #5
          The code in post #1 could be written
          Code:
          * define the program
          cap program drop example
          program define example
              syntax [if]
              preserve
              keep `if'
              // more analysis...
          end
          
          sysuse auto, clear
          * call the program twice
          example if foreign == 1
          example if foreign == 0
          so that the program doesn't have to worry about having altered the dataset in memory. As we are told by the output of help preserve
          preserve and restore deal with the programming problem where the user's data must be changed to achieve the desired result but, when the program concludes, the programmer wishes to undo the damage done to the data.



          • #6
            Apply the [if] qualifier in the use command itself?

            Code:
            * define the program
            cap program drop example
            program define example
                syntax [, keep(string)]
                // compound quotes so a condition containing "..." still displays
                di `"`keep'"'
                preserve
                use "S:\STATA\auto" `keep', clear
                summ price
                restore
            end
            
            * call the program
            example , keep(if foreign==1)
            example



            • #7
              Thanks everyone for your helpful comments!

              Nick Cox, daniel klein, William Lisowski: I'm sorry, I should have explained my use-case better (I just wasn't sure how much detail to go into). In trying to create a VERY minimal working example, I abstracted from a key purpose of my program, which I think is unrelated to my technical question but is certainly related to whether the question makes sense and to other workaround solutions (such as William's). I forgot that on Statalist the best answer is often "you're taking the wrong approach", so I didn't think to justify my approach.

              In my actual use-case, the dataset is not hard-coded into the program. A program option is specified to choose which dataset to load. I have many datasets which have a common structure but with a few different variables, and the program then uses the dataset name to load a particular dataset and then to know which specific variables to transform and graph after collapsing at a level which is also specified in the program options.

              I could append all of these datasets (but not merge) and work from that rather than call specific datasets in the program, but
              1) I think in 6 months time, as I forget details of what I've done, I'm more likely to make mistakes with the appended dataset as it is less natural to me, and
              2) I would like to avoid loading or preserving/restoring the large appended dataset as this would entail non-trivial running time

              George Ford: (a slightly amended version of) your solution works perfectly for me, thank you! Specifying a "keep" option gives me a very intuitive way of doing this, so I'm happy with that replacing the [if] syntax.

              I don't think I *need* it anymore, but if you have time Nick and daniel I'm still interested in the
              Code:
              syntax anything(everything)
              approach you mentioned, which I'm not sure I understand. Is this substantially the same as George's approach (specifying the "if" condition in a string, then applying it later by calling the local into an expression), or is there something else here I'm missing?

              I'm sorry I can't share a data example -- I am working with proprietary data.



              • #8
                Thanks for the context in #7. The questions here are as much about style as about syntax. If your problem were my problem, I am still not clear that I would write a program at all. It sounds like a case for a do-file with arguments to me, or perhaps a series of do-files.
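
                As a rough sketch of what a do-file with arguments might look like here (the file name analysis.do, the argument names dataset and condition, and the summarize step are all assumptions made for illustration, not anything from this thread):

                ```stata
                * analysis.do (hypothetical file name)
                * Hypothetical call, with auto.dta in the working directory:
                *     do analysis auto "if foreign == 1"

                * Name the two positional arguments passed to the do-file.
                args dataset condition

                * Apply the condition while reading the file, using the documented
                * -use [if] using filename- form.
                use `condition' using `"`dataset'"', clear

                * Placeholder for the real analysis.
                summarize price
                ```

                Because the condition is only evaluated while the file is being read, its variables do not need to be in memory when the do-file is called.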

                A rule of thumb for a program (meaning, a program defining a command) is that you should feel some obligation to write a help file, and that's natural because a command has a specific goal that can be explained and a specific syntax that makes sense when it is explained. If you're twisting away from standard syntax because you need to do something slightly non-standard -- well, that's clearly allowed. Specifically, you can't use an if qualifier here in a standard way because of your set-up, so in effect you're obliged to supply the same information in some other way if you're using syntax to define the syntax. Your call on what to do, as always.

                Much depends on who else may be using this program, and what they know already and what they need to know.





                • #9
                  Code:
                  program define myprog1
                      syntax [anything(everything)]
                      display `"`anything'"'
                  end
                  
                  program define myprog2
                      syntax [anything] [if]
                      display `"`anything'"'
                      display `"`if'"'
                  end
                  
                  clear
                  gen foo = 1
                  
                  myprog1 x y z if foo==1
                  myprog2 x y z if foo==1
                  
                  myprog1 x y z if bar==1
                  myprog2 x y z if bar==1
                  Note that myprog1 will accept both variable "foo" (which does exist) and variable "bar" (which does not); whereas myprog2 will fail with variable "bar". See also help syntax.

                  Using the anything(everything) approach, you get the entire text "x y z if foo==1" stored in the local macro `anything', so you will need to parse this yourself before applying the if qualifier to your yet-to-be-loaded dataset.
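
                  One possible way to do that parsing, sketched under the assumption that the caller passes nothing except an optional if qualifier, is to load the data first and then hand the stored text back to syntax. The program name example2 and the hard-wired sysuse auto are placeholders for illustration only:

                  ```stata
                  capture program drop example2
                  program define example2
                      // Accept whatever was typed, including an if qualifier that
                      // refers to variables not yet in memory.
                      syntax [anything(everything)]

                      // Placeholder for loading the real dataset.
                      sysuse auto, clear

                      // Re-parse the stored text now that the variables exist:
                      // -syntax- reads the local macro 0.
                      local 0 `"`anything'"'
                      syntax [if]

                      // -keep- needs a qualifier or varlist, so skip it when none was given.
                      if `"`if'"' != "" {
                          keep `if'
                      }
                      // more analysis...
                  end

                  * call the program with no data in memory
                  clear
                  example2 if foreign == 1
                  ```

                  The second syntax call performs the usual checks, so a misspelled variable name is still caught, only after the data have been loaded rather than before.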

                  Best wishes,
                  David.
                  Last edited by David Fisher; 15 Oct 2021, 04:39.



                  • #10
                    Nick, thanks very much for that reflection on the appropriate cases for programs versus other approaches. In my case, I am the only person who will use the program, and I am mainly worried about future-me understanding it. It is a very simple program (in the context of the do-file where I define it) except for this syntactical issue. Having used it many times today already, I am happy with the approach I have taken, which allows me to call the program to produce graphs with different options very quickly, and interactively as well as in a do-file. But I understand the general principle you are referring to and I appreciate the reflection.

                    David, thanks very much. I understand that approach now. I haven't used the
                    Code:
                    anything(everything)
                    specification before, though I have seen it in others' code.



                    • #11
                      If you're picking among files, you could do this:
                      Code:
                      * define the program
                      cap program drop example
                      program define example
                          syntax [, data(string) keep(string)]
                          // compound quotes cope with embedded "..." and with paths containing spaces
                          di `"`keep'"'
                          di `"`data'"'
                          preserve
                          use `"`data'"' `keep', clear
                          summ price
                          restore
                      end
                      
                      * call the program
                      example , data(S:\STATA\auto) keep(if foreign==1)
                      If all the data is in the master directory, you could just list the filename without the directory info.

                      Code:
                      cd S:\STATA\
                      cap program drop example
                      program define example
                          syntax [, data(string) keep(string)]
                          // compound quotes cope with embedded "..." and with filenames containing spaces
                          di `"`keep'"'
                          di `"`data'"'
                          preserve
                          use `"`data'"' `keep', clear
                          summ price
                          restore
                      end
                      
                      * call the program
                      example , data(auto) keep(if foreign==1)



                      • #12
                        George Ford, thank you! Yes, very similar to what I ultimately implemented.
