
  • Problem with calling arcpy.GeocodeAddresses_geocoding from Stata 16

    I have been trying to write a command that geocodes address data with local resources (i.e., Streetmap Premium address locators) via arcpy integration with Stata. However, the arcpy geocode command throws an error stating that the output already exists. The trouble is that 1) the program deletes the file geodatabase where the output should be stored prior to this step, and 2) you can verify that, immediately prior to the geocoding step, the .gdb contains no such feature class. This may be an arcpy/ArcGIS issue, but I have gotten the geocoding command to run at the Python command prompt (and even in Stata under certain circumstances); it is this particular implementation that is giving me trouble.

    The program assumes there is a dataset in memory in Stata that contains columns named "address", "city", "state", and "zip". The only argument to this minimal example is "using", which gives the path where you want the geodatabase (.gdb) with the geocoded output stored. I have also tried manually deleting the output feature class immediately before the geocode command is run. Any ideas would be greatly appreciated!

    Mike LeGower

    Code:
    program define minimal_example, rclass
        version 16.1
        syntax using/
        
        /* These are set by the syntax command in the real program, hard-coded here for demonstration */
        local locator_path "<path to ArcGIS locators>"
        local locator "<name of locator>"
        local address "address"
        local city "city"
        local state "state"
        local zip "zip"
        
        /* Simple transformations of above */
        local full_locator "`locator_path'/`locator'"
        /* This is an argument to the arcpy geocoding function */
        local cmd "'Street or Intersection' `address' VISIBLE NONE;'City or Placename' `city' VISIBLE NONE;'State' `state' VISIBLE NONE; 'ZIP Code' `zip' VISIBLE NONE"
        
        python: import arcpy
        python: arcpy.ResetEnvironments()
        python: arcpy.env.overwriteOutput = True
        
        mata {
            pathsplit(st_local("using"), db_path="", db_name="")
            st_local("db_path", db_path)
            st_local("db_name", db_name)
        }
        
        capture confirm new file "`using'"
        if _rc != 0 {
            display as text "Deleting existing .gdb..."
            python: import shutil
            python: shutil.rmtree("`using'")
        }
        pause Just deleted `using'
        
        tempfile original input_table output_table
        local input_table: subinstr local input_table "\" "/", all
        local output_table: subinstr local output_table "\" "/", all
        local input_table: subinstr local input_table ".tmp" ".csv", all
        local output_table: subinstr local output_table ".tmp" ".csv", all
        
        qui save `original', replace
        
        keep `address' `city' `state' `zip'
        duplicates drop `address' `city' `state' `zip', force
        export delimited `address' `city' `state' `zip' using `input_table', quote replace
        local mrg `address' `city' `state' `zip'
        /* These are extraneous variables produced by the geocode */
        local drp "objectid match_type side-langcode distance displayx-arc_zip"
        
        display as text "Exporting data to arcpy..."
        python: arcpy.CreateFileGDB_management("`db_path'", "`db_name'")
        python: arcpy.env.workspace = "`using'"
        python: arcpy.CopyRows_management("`input_table'", "geocode_input")
        display as text "Geocoding addresses..."
        /* Here you can check that the .gdb just created does not contain "geocode_output"
        python: import arcpy
        python: arcpy.ListFeatureClasses()
        should return an empty list */
        pause About to geocode
        python: arcpy.GeocodeAddresses_geocoding("geocode_input", "`full_locator'", "`cmd'", "geocode_result")
        /* ERROR HERE: "geocode_output" already exists */
        python: arcpy.CopyRows_management("`geocode_result'", "`output_table'")
        display as text "Importing to Stata..."
        import delimited using `output_table', clear stringcols(_all)
        drop `drp'
        /* arcpy can add suffixes to the original columns; rename back here for merge */
        cap rename `address' `address'
        cap rename `city' `city'
        cap rename `state' `state'
        cap rename `zip' `zip'
        
        merge 1:m `mrg' using `original', nogen
        qui destring x y, replace force
        qui replace x = . if status != "M"
        qui replace y = . if status != "M"
        
    end
    
    /* Here we need to make sure the right version of python is initialized */
    version 16.1
    qui python query
    if r(initialized)==0 {
        qui python set exec "<path to 64bit ArcGIS python install>"
    }
    else if r(initialized) == 1 & r(execpath) != "<path to 64bit ArcGIS python install>" {
        display as err "python initialized with wrong version; please restart Stata"
        exit 7101
    }

  • #2
    Hi Mike LeGower,

    Can you send your program and data files to [email protected] so that we can take a close look?

    • #3
      Hi, Mike, I asked my colleague Zhao to take a look. He's the main developer of Stata/Python integration.

      • #4
        Originally posted by Zhao Xu (StataCorp)
        Hi Mike LeGower,

        Can you send your program and data files to [email protected] so that we can take a close look?
        Yes, although I cannot send you the Esri locators, as they are proprietary. I believe you will need a Streetmap Premium license to fully run the code.

        • #5
          We will contact Esri to see if we can get a trial license. If not, we will let you know and may need you to run some tests and send us the output.

          • #6
            Originally posted by Hua Peng (StataCorp)
            We will contact Esri to see if we can get a trial license. If not, we will let you know and may need you to run some tests and send us the output.
            Sure. For the record, I'm using the "classic" locators from Streetmap Premium 2019.

            • #7
              Can you try the following to see whether it helps your case?

              1. change

              Code:
              python: arcpy.CopyRows_management("`geocode_result'", "`output_table'")
              to

              Code:
              python: arcpy.CopyRows_management("geocode_result", "`output_table'")
              It seems that the macro quotes `' are not needed here, since geocode_result is a literal table name rather than a local macro.

              2. After the first time you run minimal_example, in Stata's Command Window,
              type

              Code:
              python describe, all
              There should be a dictionary named __stata_minimal_example_ado__ listed
              there. Before you run minimal_example again, type

              Code:
              python: del __stata_minimal_example_ado__
              in Command Window first.

              You can also add the following line at the top of minimal_example.ado

              Code:
              capture python: del __stata_minimal_example_ado__
              so that you do not need to explicitly call it.

              • #8
                Originally posted by Zhao Xu (StataCorp)
                Can you try the following to see whether it helps your case?

                1. change

                Code:
                python: arcpy.CopyRows_management("`geocode_result'", "`output_table'")
                to

                Code:
                python: arcpy.CopyRows_management("geocode_result", "`output_table'")
                It seems that the macro quotes `' are not needed here, since geocode_result is a literal table name rather than a local macro.
                This is a good catch that results from me having a tempname in the other version of the code that never got changed to the hardcoded name in this version. However, the error occurs in the line before that, so it never gets to that point.

                Originally posted by Zhao Xu (StataCorp)
                Can you try the following to see whether it helps your case?
                2. After the first time you run minimal_example, in Stata's Command Window,
                type

                Code:
                python describe, all
                There should be a dictionary named __stata_minimal_example_ado__ listed
                there. Before you run minimal_example again, type

                Code:
                python: del __stata_minimal_example_ado__
                in Command Window first.

                You can also add the following line at the top of minimal_example.ado

                Code:
                capture python: del __stata_minimal_example_ado__
                so that you do not need to explicitly call it.
                I will keep this in mind, but I seem to have fixed it some other way. In order to send the code to you, I had to manually transcribe it from a secure server without internet access. In that process, I cleaned up a few things, which seems to have resolved the problem. But I couldn't tell you what I cleaned up in order to make it work! I will test it a bit more thoroughly to make sure I actually resolved the problem. If I encounter it again, I will send you the code and data.

                • #9
                  Glad to hear that you solved the problem.

                  • #10
                    Okay, I may not have solved the problem after all. And I encountered another strange problem as well. Email incoming...

                    • #11
                      One problem appears to be that, when Stata throws a python exception, it doesn't release all the resources it was using. For example, if the Stata program is operating on a Geodatabase via arcpy and there is a python exception that causes the program to fail, the geodatabase is then considered locked until Stata is closed. If you attempt to re-run the program and overwrite the geodatabase, python says that the file is in use by another process. If you then close Stata and try to delete again, it deletes without error. I don't know if this is a python issue or a Stata issue, but closing Stata releases the lock.
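
One way to reduce the chance of a lingering lock is to make sure cleanup runs even when the geocoding step raises. The sketch below is only an illustration of that pattern: `run_with_cleanup` and `geocode_step` are hypothetical names, and the cleanup you would actually pass in (for example, deleting cursor objects or other references that hold the geodatabase open) is an assumption that has not been tested against the lock described in this thread.

```python
def run_with_cleanup(work, cleanup):
    """Run work() and always call cleanup(), even if work() raises."""
    try:
        return work()
    finally:
        # Runs on success and on failure, before the exception propagates.
        cleanup()


def geocode_step():
    # Hypothetical stand-in for the arcpy calls that may fail.
    raise RuntimeError("simulated arcpy failure")


released = []
try:
    run_with_cleanup(geocode_step, lambda: released.append("workspace"))
except RuntimeError as err:
    print("geocode failed:", err)
print("cleanup ran for:", released)
```

The exception still reaches the caller (and so Stata still sees the failure), but the cleanup callable has already run by then.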

                      • #12
                        We received your email and are looking into this case. We will let you know once we get something.

                        • #13
                          After some digging I have solved one problem and narrowed down the other. The first problem (CopyRows being unable to write to the target table) had to do with schema.ini files generated by arcpy. Every time arcpy creates a table, it creates a schema with the data types allowed. If you try to write data to a table and it doesn't match the schema, it will throw an exception. So when you try to write to a file that was the target of arcpy previously (because the tempfile names are recycled from the geocode step) there already exists a schema for that file that doesn't match the new data. Solution: erase all of the crud that arcpy sticks in c(tmpdir) after every run of each program.
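
A minimal sketch of that cleanup in Python, assuming the leftover files sit directly in the directory Stata reports as c(tmpdir). The function name `remove_arcpy_debris` is hypothetical, and only `schema.ini` comes from the description above; the `*.csv.xml` pattern is a guess at other sidecar files and should be adjusted to whatever actually appears in your temp directory.

```python
import glob
import os


def remove_arcpy_debris(tmpdir, patterns=("schema.ini", "*.csv.xml")):
    """Delete files in tmpdir matching any pattern; return the removed names."""
    removed = []
    for pattern in patterns:
        # glob.escape protects against special characters in the directory name.
        for path in glob.glob(os.path.join(glob.escape(tmpdir), pattern)):
            if os.path.isfile(path):
                os.remove(path)
                removed.append(os.path.basename(path))
    return sorted(removed)
```

From an ado-file this could be called at the start and end of each run, passing the value of c(tmpdir) as `tmpdir`, so that a stale schema never outlives the tempfile it described.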

                          The second problem (geocode step fails after you run the census link program) seems to be caused by a failure to read in the actual address data from the exported .csv. After the .csv is exported to a tempfile, I attempt to read it into arcpy using CopyRows. But the second time around, it reads in a table of null values without throwing an error. When it then attempts to geocode those null values, it throws an exception. If you export the same data to a different file and re-try, it works fine. I think that this also has to do with name collisions between the tempfiles from the first tool and the second tool, but I can't determine what is causing the problem specifically.
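
Since exporting to a different file avoids the failure, one option is to stop reusing Stata's recycled tempfile names for the export and generate a fresh name on every run. This is a sketch under that assumption; `fresh_csv_path` and the `geocode_input_` prefix are hypothetical, and it does not explain why the collision causes null rows.

```python
import os
import tempfile
import uuid


def fresh_csv_path(directory=None, prefix="geocode_input_"):
    """Return a .csv path with a random component, so names never repeat."""
    directory = directory or tempfile.gettempdir()
    return os.path.join(directory, prefix + uuid.uuid4().hex + ".csv")


first = fresh_csv_path()
second = fresh_csv_path()
print(first != second)
```

Inside Stata the resulting path would still need to be handed back to the ado-file (for instance via `sfi.Macro.setLocal`) so that `export delimited` writes to it.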

                          • #14
                            Are you able to determine whether the csv file is empty before CopyRows? You might use

                            Code:
                            checksum file
                            to see the size of the file.
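
The same check can also be done from the Python side just before CopyRows. The sketch below is an assumption about what is worth looking at, not part of the original program: `summarize_csv` is a hypothetical helper that reports the file size, the number of data rows, and how many of those rows contain at least one non-blank value, assuming the first line is a header.

```python
import csv
import os


def summarize_csv(path):
    """Return (size_in_bytes, data_rows, rows_with_any_value) for a CSV with a header."""
    size = os.path.getsize(path)
    data_rows = 0
    non_empty = 0
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header line, if there is one
        for row in reader:
            data_rows += 1
            if any(cell.strip() for cell in row):
                non_empty += 1
    return size, data_rows, non_empty
```

If the last two numbers agree and are greater than zero, the export itself contains data and the problem lies downstream of the .csv.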

                            • #15
                              Are you able to determine whether the csv file is empty before CopyRows?
                              I was able to look at both the contents of the .csv and the contents of the .gdb table. The former had address data, as intended. The latter had the same dimensions, but all the data was "None".

                              In any case, I managed to fix both problems; to my knowledge both scripts work as intended now, although I will have someone else test it to make sure.

                              I still don't know the root cause of the second problem, but it turns out that if instead of using the arcpy.CopyRows function, you manually create a new table, new fields, and insert the rows from the csv file in a loop, it works. I have absolutely no idea why this works where their own CopyRows facility does not. The only two guesses I have are
                              1. arcpy uses Cursor objects to position a virtual cursor in their datasets and insert data at the cursor's position. Perhaps the cursor never gets reset between runs of this tool if the file name provided is the same?
                              2. if CopyRows is implemented with csv.reader, perhaps the same reader is invoked the second time around somehow and, since the iterator would be exhausted at that point from the first run, it generates null values?
                              It's a mystery to me, but it seems to be an Esri problem, not a StataCorp problem.
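
For anyone who hits the same thing, the workaround described above (create the table, add the fields, insert the rows in a loop) might look roughly like the sketch below. This is not the code actually used in the thread: `load_csv_into_table` is a hypothetical name, every column is assumed to be stored as text, Python 3 is assumed, and the arcpy module is passed in as an argument so the function can be defined without arcpy installed.

```python
import csv


def load_csv_into_table(arcpy, workspace, table_name, csv_path, text_length=255):
    """Create table_name in workspace and insert the CSV rows one by one.

    Replaces a single CopyRows call. Returns the number of rows inserted.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        fields = next(reader, None)  # first line is assumed to hold column names
        if not fields:
            raise ValueError("no header row found in " + csv_path)

        arcpy.CreateTable_management(workspace, table_name)
        table = workspace + "/" + table_name
        for name in fields:
            # Assumption: all columns (address, city, state, zip) are text.
            arcpy.AddField_management(table, name, "TEXT", field_length=text_length)

        inserted = 0
        # The with-block closes the cursor when the loop finishes or fails.
        with arcpy.da.InsertCursor(table, fields) as cursor:
            for row in reader:
                cursor.insertRow(row)
                inserted += 1
    return inserted
```

With arcpy available, the call would be along the lines of `load_csv_into_table(arcpy, gdb_path, "geocode_input", csv_path)`, where `gdb_path` and `csv_path` are placeholders for the .gdb and the exported file.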
