Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Accessing the Census API via. Stata

    Dear all,

    I am currently attempting to access data from the U.S. Census Bureau's API using Stata. According to the Census Bureau's documentation, the API outputs data in a "nonstandard version of JSON that is streamlined". Raw output from the API looks like this:

    [["STNAME","POP","DATE","state"],
    ["Alabama","4849377","7","01"],
    ["Alaska","736732","7","02"],
    ["Arizona","6731484","7","04"],
    ["Arkansas","2966369","7","05"],
    ["California","38802500","7","06"],

    I am currently trying to import this data using insheetjson, but it appears that insheetjson is having difficulty parsing the output above. Any help on this issue would be greatly appreciated.

    Thanks,
    Zhao
    Last edited by Zhaochen He; 11 May 2017, 14:17. Reason: json, census api, insheetjson

  • #2
    A JSON file is a plain old text file. I'm a big fan of the sport of parsing such files using standard Stata string functions. I assume that you have managed to input the JSON file using something like
    Code:
    clear
    infix str line 1-244 using "census.txt"
    compress
    list
    and that what you show is representative of the content. In the example below, I use moss (from SSC) to extract the content because it can scan for multiple matches of the same search pattern. The pattern I use contains double quotes so it needs to be enclosed in compound double quotes like this: `"pattern_with_double_quotes"' (see help quotes). The match(`""([^"]+)""') pattern is a bit messy but it breaks down to
    • start from a double quote character
    • match one or more characters that are not a double quote: ([^"]+)
    • until another double quote is found
    The way moss reapplies the search pattern to find multiple matches, the comma between the items will also match. You can get rid of those extra variables by manually dropping them.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str35 line
    `"[["STNAME","POP","DATE","state"],"'  
    `"["Alabama","4849377","7","01"],"'    
    `"["Alaska","736732","7","02"],"'      
    `"["Arizona","6731484","7","04"],"'    
    `"["Arkansas","2966369","7","05"],"'   
    `"["California","38802500","7","06"],"'
    end
    
    moss line, match(`""([^"]+)""') regex
    drop _count* _pos*
    
    list, noobs compres sep(0)
    and the results
    Code:
    . list, noobs compres sep(0)
    
      +-----------------------------------------------------------------------------------------------------+
      |                                line      _match1   _ma~2    _match3   _ma~4   _ma~5   _ma~6   _ma~7 |
      |-----------------------------------------------------------------------------------------------------|
      |   [["STNAME","POP","DATE","state"],       STNAME       ,        POP       ,    DATE       ,   state |
      |     ["Alabama","4849377","7","01"],      Alabama       ,    4849377       ,       7       ,      01 |
      |       ["Alaska","736732","7","02"],       Alaska       ,     736732       ,       7       ,      02 |
      |     ["Arizona","6731484","7","04"],      Arizona       ,    6731484       ,       7       ,      04 |
      |    ["Arkansas","2966369","7","05"],     Arkansas       ,    2966369       ,       7       ,      05 |
      | ["California","38802500","7","06"],   California       ,   38802500       ,       7       ,      06 |
      +-----------------------------------------------------------------------------------------------------+
    
    .

    Comment


    • #3
      I just went through the same issue and I found this solution based on previous comments/posts in this thread and this presentation by William Matsuoka

      Code:
      tempfile f0 f1 f2 f3
      copy output.json `f0'
      hexdump `f0', analyze
      filefilter `f0' `f1', replace from(",\n") to("\n") 
      filefilter `f1' `f2', replace from("[") to("") 
      filefilter `f2' `f3', replace from("]") to("") 
      import delimited `f3', clear

      Comment

      Working...
      X