
  • Embedding Python Code

    From what I can tell, Python code cannot be run inside a loop (forvalues, foreach, ...). Instead, I tried to embed the code within a program definition. Following the most basic example from the help file, I still cannot get the code to run. The Python code runs fine when it is not embedded in a program (for a single iteration of many), but inside the program it can't seem to find the calcsum definition. Has anyone else had this problem, or can anyone see the issue?

    Code:
    cap program drop varsum
    program define varsum
        version 16.0
        syntax varname [if] [in]
        marksample touse
        di "`varlist'"
        ta `touse', m
        python: calcsum("`varlist'", "`touse'")
        display as txt " sum of varlist: " as res r(sum)
    end
    
    version 16.0
    python:
    from sfi import Data, Scalar
    def calcsum(varname, touse):
        x = Data.get(varname, None, touse)
        Scalar.setValue("r(sum)", sum(x))
    end
    
    
    sysuse auto, clear
    varsum price
    Error:
    Code:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'calcsum' is not defined
    Thanks!

  • #2
    Eric

    Save the following code in an ado-file named varsum.ado in your current directory.

    Code:
    program define varsum
        version 16.0
        syntax varname [if] [in]
        marksample touse
        di "`varlist'"
        ta `touse', m
        python: calcsum("`varlist'", "`touse'")
        display as txt " sum of varlist: " as res r(sum)
    end
    
    version 16.0
    python:
    from sfi import Data, Scalar
    def calcsum(varname, touse):
        x = Data.get(varname, None, touse)
        Scalar.setValue("r(sum)", sum(x))
    end
    Then type

    Code:
    clear all
    sysuse auto, clear
    varsum price



    • #3
      Thanks Zhao Xu (StataCorp)

      So Python cannot be embedded in the same way in a do-file; it has to be an ado-file?



      • #4
        Eric
        You can embed Python code in a do-file too. In your case, you need to define the calcsum function first.

        Code:
        python:
        from sfi import Data, Scalar
        def calcsum(varname):
            x = Data.get(varname, None)
            Scalar.setValue("r(sum)", sum(x))
        end
        
        sysuse auto, clear
        python: calcsum("price")
        display as txt " sum of varlist: " as res r(sum)



        • #5
          I had to wonder: how do you manage to isolate function definitions from within Ado files in Python?

          Found it:

          Code:
          python
          globals()
          __stata_varsum_ado__["calcsum"]
          end

          Where do global variables go? Let's see.

          Code:
          prog setpyglobal
              args name val
              python: test("`name'",`val')
          end
          
          python
          def test(name, val):
              globals()[name] = val
              print(globals())
          end
          After calling this command a few times, it's apparent that globals are kept between calls, and they end up in __stata_setpyglobal_ado__ outside of the Ado program. Rather interesting, as a command can update its inner Python state.
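The persistence described above can be mimicked in plain Python, outside Stata. This is only a sketch, assuming that the Python code of an ado program runs against one dictionary that is reused on every call; the names `ado_namespace` and `run_in_ado` are hypothetical and are not part of Stata or the sfi module.

```python
# Hypothetical model: one persistent dict stands in for the namespace
# that Stata keeps for an ado program (an assumption, not Stata's code).
ado_namespace = {}

def run_in_ado(source):
    """Execute Python source with ado_namespace as its globals."""
    exec(source, ado_namespace)

# Counterpart of the python block in the ado-file: define test() once.
run_in_ado(
    "def test(name, val):\n"
    "    globals()[name] = val\n"
)

# Counterpart of calling the Stata command twice.
run_in_ado("test('a', 10)")
run_in_ado("test('b', 20)")

# Both values survive between calls because the same dict is reused.
print(ado_namespace["a"], ado_namespace["b"])
```

Because `globals()` inside `test` returns the dictionary the function was defined in, every call writes into the same place, which matches what is seen in __stata_setpyglobal_ado__.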

          It's not entirely clear how Stata is doing this, but I'd say the Ado-visible namespace is simply a module, and the __stata_setpyglobal_ado__ variable is a reference to this module's namespace. For instance, if you call __stata_setpyglobal_ado__["test"] from the command prompt, the global variable it modifies will appear as an entry of the __stata_setpyglobal_ado__ dictionary.
          As the Stata programmer's manual explains (p. 434), it is still possible to access the main namespace from the ado program with 'import __main__'. The converse does not seem possible, however.

          So, if I understand correctly, when there is a call to 'python' inside a Stata program, it will only see globals that lie in the specific Python namespace associated with the program. If the Python function is defined outside of an Ado file, it ends up in the __main__ module (the one accessed when typing 'python' from Stata prompt), and the Stata program can't see it unless you tell it to import __main__. Hence the error you got.
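The lookup failure, and the 'import __main__' way around it, can be reproduced in plain Python. The sketch below is a model under stated assumptions, not Stata's implementation: the module `fake_main` stands in for __main__, the dictionary `ado_ns` stands in for the namespace of one ado program, and both names are hypothetical.

```python
import sys
import types

# Hypothetical stand-ins: "fake_main" plays the role of Stata's __main__
# module, and ado_ns plays the role of one ado program's own namespace.
fake_main = types.ModuleType("fake_main")
sys.modules["fake_main"] = fake_main
ado_ns = {}

# A function defined "at the prompt" lands in the main module only.
exec("def calcsum(values):\n    return sum(values)\n", fake_main.__dict__)

# Calling it from the ado namespace fails, as in the original post.
try:
    exec("result = calcsum([1, 2, 3])", ado_ns)
    error = None
except NameError as exc:
    error = str(exc)
print(error)

# Importing the main module from the ado side makes the function reachable.
exec("import fake_main\nresult = fake_main.calcsum([1, 2, 3])", ado_ns)
print(ado_ns["result"])
```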

          I don't know if Stata has an undocumented way to make the Ado module visible to other commands, but there is a trick. If two commands import the same module, they share its contents. And it's possible to change the contents of this module, either by functions it provides, or directly with setattr/getattr functions. This module may even be the __main__ module.

          For instance, create two ado-files, update1 and update2, containing the same program (only the name differs). This program prints the current value of a variable and then sets a new value for that variable in the __main__ module.
          Code:
          prog update1
              args name val
              python: update("`name'", `val')
          end
          
          python:
          import __main__
          def set(mod, name, val):
              setattr(mod, name, val)
              
          def get(mod, name):
              return getattr(mod, name)
          
          def update(name, val):
              if hasattr(__main__, name):
                  print(get(__main__, name))
              set(__main__, name, val)
          end
          Then define update2 likewise, and run:
          Code:
          update1 a 10
          update2 a 20
          python: a
          This makes it possible to store variables that are kept between calls and shared between commands. It may be useful for commands that manipulate matplotlib plots, pandas dataframes, or other complex data structures, for instance.
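The same sharing mechanism can be tried outside Stata. In this sketch the module `shared` is a hypothetical stand-in for __main__, and `make_update` is an illustrative helper that builds one `update` function per "command"; neither name comes from Stata.

```python
import types

# Hypothetical stand-in for the module both commands import
# (in the ado-files above this role is played by __main__).
shared = types.ModuleType("shared_state")

def make_update(module):
    """Build an update() similar to the one in the ado-files above."""
    def update(name, val):
        previous = getattr(module, name, None)  # None if not set yet
        setattr(module, name, val)
        return previous
    return update

# Each "command" has its own function object but the same target module.
update1 = make_update(shared)
update2 = make_update(shared)

first = update1("a", 10)   # nothing stored yet
second = update2("a", 20)  # sees the value written by update1
print(first, second, shared.a)
```

Unlike the ado version, this `update` returns the previous value instead of printing it, which makes the hand-over from one command to the other easier to check.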
          Last edited by Jean-Claude Arbaut; 19 Jul 2019, 17:43.



          • #6
            Originally posted by Jean-Claude Arbaut
            Thanks! This is super helpful. I was wondering about exactly this and thought to post a topic, but this post answers most of my questions about Python namespace management in Stata.



            • #7
              Jean-Claude Arbaut You are quite right. Interactive mode and do-files use __main__, while each ado-file has its own namespace. I have a short discussion of passing Python objects between ado-files in my Chicago talk https://huapeng01016.github.io/chicago19/#/section and we will document this in a future update.



              • #8
                Addendum:

                I wrote: "It's still possible to access the main namespace from the Ado program, with 'import __main__'. The converse does not seem possible however."
                That is actually wrong: the converse is possible, indirectly, via the __main__.__stata_<name>_ado__ dictionary (which can be both read and written), where <name> is the name of the ado program. And since the __main__ module is visible from other ado programs, so is the dictionary.

                To make this more like using a module, let's pretend the target ado program is named 'dummy', then from the other one, just do:

                Code:
                from __main__ import __stata_dummy_ado__ as dummy
                It's still not exactly like a module: a variable 'a' is accessed by dummy["a"] instead of dummy.a, but it's close enough.
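If attribute access is preferred, a thin wrapper can provide it in plain Python. This is a sketch only: `AttrView` is a hypothetical helper, and `dummy_dict` stands in for the __stata_dummy_ado__ dictionary, which exists only inside Stata.

```python
class AttrView:
    """Attribute-style access to an existing dict, without copying it."""

    def __init__(self, namespace):
        # Bypass our own __setattr__ so the dict is stored on the instance.
        object.__setattr__(self, "_namespace", namespace)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self._namespace[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self._namespace[name] = value

# Hypothetical stand-in for the __stata_dummy_ado__ dictionary.
dummy_dict = {"a": 1}
dummy = AttrView(dummy_dict)

dummy.a = 5          # writes through to the dict
dummy_dict["b"] = 7  # later changes to the dict are visible too
print(dummy.a, dummy.b, dummy_dict["a"])
```

The wrapper keeps a reference to the dictionary rather than a copy, so changes made on either side stay in sync.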



                • #9
                  Originally posted by Hua Peng (StataCorp)
                  Thank you very much for posting this response. It was very helpful in solving a problem with my code.



                  • #10
                    Hi there, I have another problem with embedding Python code in a Stata program. For instance, this simple code asks the user for their age and then reports the age they will be next year, using Python for the addition. However, when I try to embed it in a program, it gives an "unexpected indent" error. I have tried changing the indentation of the Python code, but that doesn't work. Can anyone give me a helping hand?
                    Examples:
                    The code that works:
                    display "How old are you" _request(yourage)
                    display "You are $yourage years old"
                    python:
                    age_1 = "$yourage"
                    age = int(age_1)
                    print("Next year your age will be ",age+1)
                    end

                    The code that doesn't:
                    program age_test1
                    display "How old are you" _request(yourage)
                    display "You are $yourage years old"
                    *Once the data are collected, we call Python to do the required calculations
                    *with age_1=$yourage we pass a variable defined in Stata into Python
                    python:
                    age_1 = "$yourage"
                    age = int(age_1)
                    print("Your age next year will be",age+1)
                    end
                    end
