  • Accessing Census API through Stata

    Hi all,

    I'm having issues accessing a (U.S.) Census Bureau API through Stata. My hunch is that it may be due to the bureau switching to an HTTPS-only system for API requests last year (for more info on this see here).

    I've tried changing the URL I'm calling to http, but the Census server automatically redirects it to https, at which point I get a failure on the Stata side. Here is some code that shows the different errors:

    Code:
    cap prog drop apicheck
    program apicheck
        tempfile webfile
        copy "http://api.census.gov/data/2010/sf1?key=your_key_here&get=P0050001,P0050002&for=zip+code+tabulation+area:*&in=state:02" "webfile.txt"
        import delimited "webfile.txt", clear
    end
    apicheck
    To replicate this on your machine you'll need to get a key from the Census Bureau and put it in the above code where it says your_key_here. I can vouch that doing so is painless and only requires an organization name and email: https://api.census.gov/data/key_signup.html

    This is the error I get when I run the above program:

    Code:
    file http://api.census.gov/data/2010/sf1?key=224506d438e3392533d0309d44a6f45eda2d19d0&get=P0050003,P0050004,P00500
    > 05,P0050006&for=zip+code+tabulation+area:*&in=state:02 not found
    server says file temporarily redirected to https://api.census.gov/data/2010/sf1?key=224506d438e3392533d0309d44a6f4
    > 5eda2d19d0&get=p0050003,p0050004,p0050005,p0050006&for=zip+code+tabulation+area:*&in=state:02
    r(601);
    Note the above is http. If I change it to https (so the API call reads copy "https://api.census.gov...") Stata returns this error:

    Code:
    Received fatal alert: handshake_failure
    r(5100);
    I notice that William Lisowski has come across the same error previously (and related to the http / https issue), and flagged it in comment #4 of this Statalist post. I also found somebody who seems to be experiencing almost exactly the same issue accessing Census Bureau data through Stata on the American Community Survey Data Users Group site here.

    Does anyone have any further information about this? Are other user-written programs that use Census Bureau APIs now failing?

    I am running Stata 14.2 on Mac OS X El Capitan and wonder whether this is a version issue and whether Stata Corp shipped a fix with 15.x.

  • #2
    Works fine on my system.
    That's on Stata 15.1 on Win10
    I used the key from your second code block, btw, so that's not causing the trouble.

    Edit: also works if I include a line 'version 14.2' prior to this code, although I'm not 100% sure that would always recreate the issue.

    Code:
    . cap prog drop apicheck
    
    . program apicheck
      1.     tempfile webfile
      2.     copy "https://api.census.gov/data/2010/sf1?key=YOUR_KEY&get=P0050001,P0050002&for=zip+code+tabulation+area:*&in=state:02" "webfile.txt"
      3.     import delimited "webfile.txt", clear
      4. end
    
    . apicheck
    (5 vars, 238 obs)
    
    .
    end of do-file
    
    . edit
    
    .
    Last edited by Jorrit Gosens; 03 Oct 2018, 00:58.



    • #3
      Thanks Jorrit. That's useful to know. Could you share a snippet of the output so I can see what it's pulling?

      I tried it on a colleague's machine also running Stata 14.2 but on Windows 10 this morning and I got the same errors.

      I'm more convinced that it's a Stata version thing, but it'd be useful to know whether others running 14.2 directly (and not through version control) experience the same problem.



      • #4
        I too am now running Stata 15.1, but on macOS 10.13. I also was successful with the same experiment as Jorrit Gosens.

        I attempted to recreate the problem that I documented in the topic linked to in post #1, but I no longer experience the problem on my friend's site. As I noted in that topic, at that time the Java 8 implementation of SSL in the Java Runtime Environment (JRE) only supported the AES128 cipher suite. I've confirmed that the site shows no evidence of having changed its configuration to support AES128, so it does not appear that configuration changes on the site were the explanation for my current success.

        My previous attempt was in June 2017. From help whatsnew14 we see that Stata's Java Runtime Environment had been updated to Version 8 Update 121 in the 7 March 2017 update to Stata 14.2. So that is likely the version I was running when I wrote.

        From help whatsnew (run on Stata 15.1) we see that Stata's Java Runtime Environment was updated to Version 8 update 162 in the 18 April 2018 update to Stata 15.1.

        Some searching led me to https://bugs.openjdk.java.net/browse/JDK-8170157 which suggests that changes to the JDK in Version 8 update 162 enabled AES256 cipher suites.

        So in summary it appears that fully updated versions 15.1 and later will be able to access the Census API. Stata's version control does not affect the JRE version used - there's only a single JRE installed - so this problem cannot be duplicated under version control.

        You might contact Stata Technical Services and see if they can advise you on updating the JRE in your Stata 14.2 installation.
        Last edited by William Lisowski; 03 Oct 2018, 07:47.



        • #5
          Could you share a snippet of the output so I can see what it's pulling?
          As you requested. It seems that you will need to experiment with the import delimited options, and perhaps still have some cleanup to do afterward.
          Code:
          . describe
          
          Contains data
            obs:           238                          
           vars:             5                          
           size:         5,236                          
          ------------------------------------------------------------------------------------------------
                        storage   display    value
          variable name   type    format     label      variable label
          ------------------------------------------------------------------------------------------------
          p0050001        str8    %9s                   [["P0050001"
          p0050002        long    %12.0g                P0050002
          state           byte    %8.0g                
          zipcodetabula~a str8    %9s                   zip code tabulation area"]
          v5              byte    %8.0g                
          ------------------------------------------------------------------------------------------------
          Sorted by:
          
          
          . list in 1/5, clean
          
                 p0050001   p0050002   state   zipcod~a   v5  
            1.   ["17603"      16225       2    99501"]    .  
            2.   ["24168"      22620       2    99502"]    .  
            3.   ["14563"      12952       2    99503"]    .  
            4.   ["40914"      37772       2    99504"]    .  
            5.    ["6174"       5324       2    99505"]    .
          Let me also add that your use of tempfile is incorrect. The copy command
          Code:
          copy "http://api.census.gov/..." "webfile.txt"
          should be
          Code:
          copy "http://api.census.gov/..." "`webfile'"
          and similarly on the import delimited command. As it stands, you are ignoring the tempfile and always copying to the same file in your current working directory, so - unless you add the replace option - the copy command will fail the second time it is run.
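          A minimal sketch of the program from #1 with the tempfile used as intended, assuming the https call succeeds on your installation. The tempfile command defines a local macro named webfile that holds a temporary filename, so it is referenced as `webfile' with no file extension. YOUR_KEY is a placeholder for your own Census API key.

          ```stata
          capture program drop apicheck
          program apicheck
              // tempfile stores a temporary filename in the local macro webfile
              tempfile webfile
              // YOUR_KEY is a placeholder: substitute your own Census API key
              copy "https://api.census.gov/data/2010/sf1?key=YOUR_KEY&get=P0050001,P0050002&for=zip+code+tabulation+area:*&in=state:02" "`webfile'"
              // read the downloaded file back from the same temporary location
              import delimited "`webfile'", clear
          end
          apicheck
          ```

          Because the temporary file is new on every run and is erased when the program ends, the copy command does not need the replace option.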



          • #6
            I'm getting the same as William, unsurprisingly, but wanted to add that your textfile looks like:
            Code:
            [["P0050001","P0050002","state","zip code tabulation area"],
            ["17603","16225","02","99501"],
            ["24168","22620","02","99502"],
            ["14563","12952","02","99503"],
            ...............,
            ["2338","2301","02","99929"]]
            So you will have some more steps to do before getting it read neatly into your dataset.
            But that is relevant only once you manage to work around the handshake error.
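            One possible cleanup after import delimited, sketched under the assumption that the data in memory look like the describe output in #5: stray brackets and double quotes in the string variables, plus an all-missing v5. The variable names are taken from that output; adjust them if your import differs.

            ```stata
            * Assumes the file was already read with: import delimited "webfile.txt", clear
            capture drop v5                  // all-missing column in the listing in #5
            foreach v of varlist _all {
                capture confirm string variable `v'
                if _rc == 0 {
                    // strip the JSON brackets and leftover double quotes
                    replace `v' = subinstr(`v', "[", "", .)
                    replace `v' = subinstr(`v', "]", "", .)
                    replace `v' = subinstr(`v', char(34), "", .)   // char(34) is a double quote
                }
            }
            // the first count column was read as a string because of the leading bracket
            destring p0050001, replace
            ```

            The ZIP code tabulation area is deliberately left as a string here, since converting it to a number would drop leading zeros for other states.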



            • #7
              Thanks William and Jorrit! Both really helpful replies.

              William, thanks for flagging my (mis)use of tempfiles -- I'll be sure to address things like that once I've got the API call working.

              Your details about the JRE are incredibly useful. I've emailed Stata Corp technical support to make an enquiry about updating mine without upgrading to 15.

              Strangely (given you both managed to run the code and retrieve data using version 15), a friend who's based at a university and has 15 access was unable to make the code work; he received the same errors as I did. William, can you confirm you ran exactly the same code as I shared in #1 (apart from the key change that is)?

              Chris



              • #8
                Yes, I can confirm that the code I ran was your CODE block from post #1, with http replaced by https and with the two corrections to the handling of the tempfile.

                Your friend should run the about command, as shown below, and confirm that the results he receives indicate that he is running Stata 15.1 with a revision date of 18 April 2018 or later.
                Code:
                . about
                
                Stata/SE 15.1 for Mac (64-bit Intel)
                Revision 07 Aug 2018



                • #9
                  Ok, thanks a lot William. I'll check with my friend about that. I heard back from my enquiry with Stata Corp and it appears there is a solution to this (without upgrading to 15.x). This is the response I got from them:

                  The updated version of Java 8 runtime environment does indeed include the functionality for https. This is included in Stata 15, but not in 14. The easiest thing you can do is download the latest version of Java 8 from the following site (if you don't already have it installed on your system).

                  https://java.com/en/download/manual.jsp

                  You can use the Stata -set java_vmpath- command to point to the "jvm.dll" in your system Java installation. The default setting typically looks like:

                  Code:
                  set java_vmpath "C:\Program Files (x86)\Stata15\utilities\java\windows-x64\jre1.8.0_121\bin\server\jvm.dll"
                  The above is obviously for Windows, but the same syntax is used for Mac as long as you know the path, etc. Once the setting is changed, you may need to close and reopen Stata 14- then run your commands again and let me know if you continue to have issues with the -copy- command.
                  I've updated Java (following the link suggested by Stata Corp), but unfortunately I could not find an analogous path on my Mac running OS X El Capitan. I did find a java executable file, so I tried pointing to it using set java_vmpath, but I don't think I should have done this, as I now get a different error (see below).

                  Code:
                  set java_vmpath "Applications/Stata/utilities/java/macosx-x64/jre1.8.0_121.jre/Contents/Home/bin/java"
                  
                  unable to create Java virtual machine
                  r(5003);
                  I've gone back to Stata Corp and will provide an update to this forum once I have good instructions for Mac users. The above should work for Windows.


                  • #10
                    The Java Runtime Environment in Stata 14 is a bit out of date, which means the certificates needed for some websites are not available. You will need a more modern JRE than 1.8.0_121.

                    You can download a more up-to-date Java 8 runtime for Mac from:

                    https://github.com/AdoptOpenJDK/open...172-b11.tar.gz

                    Extract the file to a location of your choosing.

                    In Stata:

                    Code:
                    set java_vmpath "/<fullpath_to_extracted_dir>/jdk8u172-b11/jre/lib/jli/libjli.dylib"
                    If you need to set it back to the default:

                    Code:
                    set java_vmpath



                    • #11
                      Hi James,

                      Thanks for your comment and suggestion. I took the approach of taking the entire folder that downloads from the GitHub link you shared and popping that in the folder Stata installed for java versions. For others who may come across this post, this is the path:

                      Code:
                      /Applications/Stata/utilities/java/macosx-x64
                      And it now contains three folders:

                      Code:
                      jdk8u172-b11
                      jre1.8.0_31
                      jre1.8.0_121.jre
                      I set the java_vmpath to the .dylib file you indicate and this has resolved the issue. I can now access the Census API.

                      Many thanks all!



                      • #12
                        For some reason I couldn't get this to work. I have Stata 15.1 but also get the handshake error; in Stata/utilities/java/... I seem to have the old 1.8.0_121 JRE. When I download the files from the GitHub link and set java_vmpath, I get error 5002 (failure to dynamically load Java runtime library).

                        A workaround I managed to get working is to use curl instead:

                        Code:
                        !curl "https://api.census.gov/data/1990/sf1?get=P0010001&for=place:*&in=state:12&key=<your key>" -o "<path>/Test1.txt"
                        This downloads population figures for all places in state 12 and writes them to <path>/Test1.txt.

                        You can then use the filefilter trick from this thread to get the file into Stata:

                        Code:
                                tempfile b c
                                filefilter "<path>/Test1.txt" `b', replace from(",\n") to("\n")
                                filefilter `b' `c', replace from("[") to("")
                                filefilter `c' "<path>/Test1.txt", replace from("]") to("")
                                import delimited "<path>/Test1.txt", clear
