  • Question about the command --tuples--

    Dear all,
    I get an error when I run the command --tuples-- (SSC).

    sysuse auto,clear
    global m mpg rep78 weight length turn trunk foreign
    tuples $m ,conditionals(1&2) max(5) asis
    dis `ntuples'

    Traceback (most recent call last):
    File "/Applications/Stata 17/ado/plus/py/st_tuples_py.py", line 33, in <module>
    exec("tuplelist = list(filter(lambda tupl: " + py_condits + ", tuplelist))")
    File "<string>", line 1
    tuplelist = list(filter(lambda tupl: ("mpg" in tupl and "rep"foreign" in tupl8" in tupl), tuplelist))
    ^
    SyntaxError: invalid syntax
    failed to execute the specified Python script file
    r(7103);

    end of do-file

    r(7103);


    Any help will be appreciated.

    Best
    Raymond
    Best regards.

    Raymond Zhang
    Stata 17.0,MP

  • #2
    @daniel klein Dear Daniel, can you help me solve this problem?
    Best regards.

    Raymond Zhang
    Stata 17.0,MP


    • #3
      I think something is wrong with the variable "rep78": I cannot put this variable into the conditionals() option. If I modify the code as below, so that rep78 is no longer one of the first two variables, it runs well.

      Code:
      . sysuse auto,clear
      (1978 automobile data)
      
      . global m  "mpg weight rep78  length turn trunk foreign" 
      
      . tuples $m ,conditionals(1&2) max(5) asis
      
      . dis `ntuples'
      26
      Best regards.

      Raymond Zhang
      Stata 17.0,MP


      • #4
        I cannot replicate the problem (with Stata 16.1).

        Do you have the latest version of tuples?

        Code:
        . which tuples
        c:\ado\plus\t\tuples.ado
        *! 4.0.3 Joseph N. Luchman, daniel klein, & NJC 16 May 2021
        Also, do you have a recent Python version?

        Code:
        . python query
        ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
            Python Settings
               (output omitted)
        
            Python system information
              initialized          yes
              version              3.9.0
              architecture         64-bit
              (output omitted)


        • #5
          Dear @daniel klein, can you send me the ado-file and help file of the newest version of the tuples command? I cannot install it from SSC.

          Many thanks in advance.

          Best
          Raymond
          Best regards.

          Raymond Zhang
          Stata 17.0,MP


          • #6
            I have sent you an email. Thanks.
            Best regards.

            Raymond Zhang
            Stata 17.0,MP


            • #7
              Raymond has privately asked a question regarding the speed of tuples with "long" lists. This is both a generic and quite common question, which is why I have decided to take the discussion public.

              Right now, tuples runs reasonably fast with lists up to about 18 elements. Those 18 elements amount to 262,143 tuples. Usually, we want to work with those tuples. We might want to run regression analyses. Assuming that running a regression takes 1 second, processing the 262,143 tuples would take about 3 days. The thing with combinatorics is that they tend to produce huge numbers pretty quickly. Increasing the elements in the list by just 2 will increase the time to process the resulting tuples to 12 days. With lists of 25 elements, we are looking at 1 year.
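
The arithmetic behind these figures is easy to check. The sketch below assumes, as the post does, one second per tuple; a list of n elements has 2^n - 1 non-empty tuples. The function names `n_tuples` and `days_to_process` are illustrative only and are not part of tuples.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def n_tuples(n_items):
    """Number of non-empty subsets (tuples) of a list with n_items elements."""
    return 2 ** n_items - 1


def days_to_process(n_items, seconds_per_tuple=1.0):
    """Days needed to process every tuple at the assumed cost per tuple."""
    return n_tuples(n_items) * seconds_per_tuple / SECONDS_PER_DAY


for n in (18, 20, 22, 25):
    print(f"{n:>2} items: {n_tuples(n):>12,} tuples, "
          f"about {days_to_process(n):,.1f} days at 1 second each")
```

Each additional element doubles the count (plus one), so the processing time doubles as well.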

              The general point is that when we think tuples is too slow, we might want to take a step back and rethink our approach.


              • #8
                Hi All,

                Have been able to get a similar error when actually using the `rep78` variable in `conditionals()`:

                Code:
                . global m  "mpg weight rep78 length turn trunk foreign"
                
                . tuples $m ,conditionals(1&2 | 3) max(5) asis display
                Traceback (most recent call last):
                  File "c:\ado\personal/st_tuples_py.py", line 33, in <module>
                    exec("tuplelist = list(filter(lambda tupl: " + py_condits + ", tuplelist))")
                  File "<string>", line 1
                    tuplelist = list(filter(lambda tupl: ("mpg" in tupl and "weight" in tupl) and ( or ) and ("rep"foreign" in tupl8" in tupl), tuplelist))
                                                                                                    ^
                SyntaxError: invalid syntax
                failed to execute the specified Python script file
                r(7103);
                daniel klein, you may have already figured out the source of the error, but it looks to me like it is the way the Python code substitutes the number 7 as a conditional into `rep78`. It is plugging `"foreign" in tupl` (which is what 7 stands for in the Python conditionals code) into the variable name `rep78`.
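
The failure described here can be reproduced in a few lines of Python. This is only a sketch under an assumption: that the conditional string is built by replacing each position number, one after another, with `"name" in tupl`. The helpers `naive_condition`, `safe_condition`, and `select` are hypothetical and are not taken from st_tuples_py.py; they handle only `&` and `|`. With the list from #1, position 2 inserts `rep78`, and the later replacement for position 7 then lands inside that name. Substituting all numbers in a single regular-expression pass avoids rescanning inserted names.

```python
import itertools
import re

# Variable list from post #1; the conditional used there is "1&2".
NAMES = ["mpg", "rep78", "weight", "length", "turn", "trunk", "foreign"]


def naive_condition(cond, names):
    """Assumed failure mode: replace position numbers one after another."""
    expr = cond.replace("&", " and ").replace("|", " or ")
    for pos, name in enumerate(names, start=1):
        # Text inserted by earlier replacements is scanned again here, so the
        # "7" in "rep78" is later overwritten by the test for position 7.
        expr = expr.replace(str(pos), f'"{name}" in tupl')
    return f"({expr})"


def safe_condition(cond, names):
    """Replace every number in one pass; inserted names are never rescanned."""
    def test_for(match):
        return f'"{names[int(match.group()) - 1]}" in tupl'

    expr = re.sub(r"\d+", test_for, cond)
    return "(" + expr.replace("&", " and ").replace("|", " or ") + ")"


def select(cond, names, max_size=None):
    """List all tuples of names (up to max_size) that satisfy the conditional."""
    keep = eval("lambda tupl: " + safe_condition(cond, names))
    top = len(names) if max_size is None else max_size
    return [t for k in range(1, top + 1)
            for t in itertools.combinations(names, k) if keep(t)]


print(naive_condition("1&2", NAMES))  # the malformed expression from the traceback
print(safe_condition("1&2", NAMES))   # ("mpg" in tupl and "rep78" in tupl)
print(len(select("1&2", NAMES)), len(select("1&2", NAMES, max_size=5)))  # 32 26
```

The two counts agree with figures reported elsewhere in the thread: 32 tuples without a size limit and 26 with max(5).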

                Not something we've tested for to this point. As a workaround, Raymond Zhang, use `nopython` as an option (see below). That will work until we get a chance to patch the Python version.

                Code:
                . tuples $m ,conditionals(1&2 | 3) max(5) asis display nopython
                tuple1: mpg weight rep78
                tuple2: mpg weight rep78 foreign
                tuple3: mpg weight rep78 trunk
                tuple4: mpg weight rep78 turn
                tuple5: mpg weight rep78 length
                tuple6: mpg weight rep78 trunk foreign
                tuple7: mpg weight rep78 turn foreign
                tuple8: mpg weight rep78 turn trunk
                tuple9: mpg weight rep78 length foreign
                tuple10: mpg weight rep78 length trunk
                tuple11: mpg weight rep78 length turn
                - joe

                Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
                ----
                Research Fellow
                Fors Marsh

                ----
                Version 18.0 MP


                • #9
                  @Joseph Luchman Thank you for your reply. I have updated the command to the newest version. Now it runs well with rep78.
                  Code:
                  . global m  "mpg rep78 weight length turn trunk foreign"
                  
                  . tuples $m,cond(1&2)
                  
                  . dis `ntuples'
                  32
                  daniel klein has updated the command to solve this problem.

                  best
                  Raymond
                  Best regards.

                  Raymond Zhang
                  Stata 17.0,MP


                  • #10
                    EDIT: crossed with #9. The below is a direct answer to #8


                    Again, I cannot reproduce the problem. I get another one, however:

                    Code:
                    . tuples $m ,conditionals(1&2 | 3) max(5) asis display
                    Traceback (most recent call last):
                      File "c:\ado\plus/py/st_tuples_py.py", line 48, in <module>
                        check_conditionals(sys.argv[3], tuple_args.__len__())
                      File "c:\ado\plus/py/st_tuples_py.py", line 17, in check_conditionals
                        if not conditional_statement_OK(cs, max_val) == True:
                      File "c:\ado\plus/py/st_tuples_py.py", line 24, in conditional_statement_OK
                        max_cs_vals = max(cs_vals)
                    ValueError: max() arg is an empty sequence
                    failed to execute the specified Python script file
                    r(7103);
                    The problem that Raymond and Joe show indeed indicates a problem with substituting numeric conditionals in Python. However, this problem should already be fixed in the latest update from SSC. My guess is that Joe uses an outdated version of the Python code (Joe: we should really indicate a version number in the st_tuples_py.py file; I will see whether I can polish the code over the weekend).


                    As for the problem that I am getting, this probably has something to do with including spaces around the "or" (|) operator. Strictly speaking, this is not allowed according to the documentation:

                    Spaces are used to separate conditional statements with conditionals(). A single statement must, then, contain no spaces.
                    There should arguably be a more informative error message; but given that this is illegal syntax, I am not sure whether a "nicer" error message justifies the necessary checks that we would need to implement.
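
A check of this kind can be fairly cheap. The sketch below assumes, as the `( or )` fragment in #8 and the `max()` error in this post suggest, that the option is split on spaces into separate statements. `check_statements` is a hypothetical helper, not a function in st_tuples_py.py.

```python
import re


def check_statements(conditionals, n_items):
    """Split on spaces and require each statement to name valid positions."""
    statements = conditionals.split()
    for cs in statements:
        positions = [int(p) for p in re.findall(r"\d+", cs)]
        if not positions:
            # For example, a lone "|" left over from writing "1&2 | 3".
            raise ValueError(
                f"conditionals(): statement '{cs}' contains no position number; "
                "a single statement must not contain spaces"
            )
        if min(positions) < 1 or max(positions) > n_items:
            raise ValueError(
                f"conditionals(): statement '{cs}' refers to a position "
                f"outside 1..{n_items}"
            )
    return statements


print(check_statements("1&2|3", 7))  # ['1&2|3']
try:
    check_statements("1&2 | 3", 7)
except ValueError as err:
    print(err)
```

With the spaced form, the lone `|` is reported as a statement without a position number instead of surfacing as an empty-sequence error from `max()`.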
                    Last edited by daniel klein; 10 Jun 2021, 08:08.


                    • #11
                      @daniel klein Yes, I want to use tuples to select control variables in regression analysis. We often have many control variables and need to select some of them, for example choosing controls from 30+ variables so that the variable we are interested in is significantly positive. In fact, I have tried 22 elements: it runs in about 70 seconds, and there are 4194303 tuples. If I call it from Mata, it takes 71.39 seconds. I don't know why Mata is slower than Stata.
                      Code:
                      . timer on 1
                      
                      . global m  x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22
                      
                      . tuples $m
                      
                      . dis `ntuples'
                      4194303
                      
                      . timer off 1
                      
                      . timer list
                         1:     70.05 /        1 =      70.0540
                      Code:
                      . timer on 2
                      
                      . mata
                      ------------------------------------------------- mata (type end to exit) ------
                      : stata("tuples $m")
                      : end
                      --------------------------------------------------------------------------------
                      
                      . timer off 2
                      
                      . timer list
                         1:     70.05 /        1 =      70.0540
                         2:     71.39 /        1 =      71.3940


                      Thank you very much for your kind help.

                      Best
                      Raymond
                      Best regards.

                      Raymond Zhang
                      Stata 17.0,MP


                      • #12
                        When you type

                        Code:
                        tuples $m
                        Stata substitutes $m and executes

                        Code:
                        tuples x1 x2 ... x22
                        Done.

                        When you type

                        Code:
                        mata : stata("tuples $m")
                        Stata substitutes $m. Stata then sees

                        Code:
                        mata :
                        and calls Mata. Mata starts, sees

                        Code:
                        stata("tuples x1 x2 ... x22")
                        and calls Stata (this is what the function stata() does), passing the argument "tuples x1 x2 ... x22". Stata now executes

                        Code:
                        tuples x1 x2 ... x22
                        and returns control to Mata. Mata returns control to Stata. Done. The second approach involves these extra hand-offs, which is why it takes about one second longer than calling tuples directly.

                        In general, calling a Stata command from Mata is always slower than calling a Stata command from Stata. Mata does not make Stata programs faster. Some Mata programs are faster than Stata programs.


                        Concerning the more general idea of running about 4 million regression models to find significant predictors: you do not want to do that!


                        • #13
                          daniel klein, you're absolutely right, I have an old version in my /ado/personal/ directory.

                          Code:
                          . which tuples
                          c:\ado\personal\tuples.ado
                          *! 4.0.1 Joseph N. Luchman, daniel klein, & NJC 16 May 2020
                          My apologies for adding to the confusion.

                          Agreed re: this being a solved issue in v4.0.3 (when the conditionals statement is submitted correctly, that is!).

                          - joe
                          Joseph Nicholas Luchman, Ph.D., PStat® (American Statistical Association)
                          ----
                          Research Fellow
                          Fors Marsh

                          ----
                          Version 18.0 MP
