  • JSONIO now available on SSC

    Thanks once again to Kit Baum for posting a new program to the SSC. At the moment, the program implements only the output side of I/O operations, and I thought it would be helpful to others who need or want to create JSON objects from their datasets. The program uses the Jackson JSON APIs and requires JRE 1.8 or higher (Stata must also be pointed at a JRE of version 1.8 or higher in order to run it). If the JVM is already running, the program is fairly fast, but be aware that it may take a few moments if it is the activity that requires Stata to initialize the JVM. Additional information and the source can be found at https://github.com/wbuchanan/StataJSON.


    The examples below are from the README file of the program's GitHub repository:

    Code:
    // Load example dataset
    sysuse auto, clear
    
    // Serialize a single object and print it to the console
    jsonio, what(record) obid(74)
    {
    "mpg" : 17.0,
    "price" : 11995.0,
    "headroom" : 2.5,
    "rep78" : 5.0,
    "length" : 193.0,
    "weight" : 3170.0,
    "displacement" : 163.0,
    "turn" : 37.0,
    "trunk" : 14.0,
    "make" : "Volvo 260",
    "gear_ratio" : 2.9800000190734863,
    "foreign" : 1.0
    }

    Code:
    // Serialize all records satisfying a given condition
    jsonio if rep78 == 1, what(data)

    {
    "fileName" : "Stata Data Set",
    "data" : [ {
    "mpg" : 24.0,
    "price" : 4195.0,
    "headroom" : 2.0,
    "rep78" : 1.0,
    "length" : 180.0,
    "weight" : 2730.0,
    "displacement" : 151.0,
    "turn" : 40.0,
    "trunk" : 10.0,
    "make" : "Olds Starfire",
    "gear_ratio" : 2.7300000190734863,
    "foreign" : 0.0
    }, {
    "mpg" : 18.0,
    "price" : 4934.0,
    "headroom" : 1.5,
    "rep78" : 1.0,
    "length" : 198.0,
    "weight" : 3470.0,
    "displacement" : 231.0,
    "turn" : 42.0,
    "trunk" : 7.0,
    "make" : "Pont. Firebird",
    "gear_ratio" : 3.0799999237060547,
    "foreign" : 0.0
    } ]
    }

    Code:
    // Serialize all data and metadata for records 71-74
    jsonio in 71/74, w(all)
    {
    "Value Labels" : [ {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : "Domestic",
    "1" : "Foreign"
    } ],
    "Starting Observation Number" : 71,
    "Data Set" : [ {
    "mpg" : 41.0,
    "price" : 5397.0,
    "headroom" : 3.0,
    "rep78" : 5.0,
    "length" : 155.0,
    "weight" : 2040.0,
    "displacement" : 90.0,
    "turn" : 35.0,
    "trunk" : 15.0,
    "make" : "VW Diesel",
    "gear_ratio" : 3.7799999713897705,
    "foreign" : 1.0
    }, {
    "mpg" : 25.0,
    "price" : 4697.0,
    "headroom" : 3.0,
    "rep78" : 4.0,
    "length" : 155.0,
    "weight" : 1930.0,
    "displacement" : 89.0,
    "turn" : 35.0,
    "trunk" : 15.0,
    "make" : "VW Rabbit",
    "gear_ratio" : 3.7799999713897705,
    "foreign" : 1.0
    }, {
    "mpg" : 25.0,
    "price" : 6850.0,
    "headroom" : 2.0,
    "rep78" : 4.0,
    "length" : 156.0,
    "weight" : 1990.0,
    "displacement" : 97.0,
    "turn" : 36.0,
    "trunk" : 16.0,
    "make" : "VW Scirocco",
    "gear_ratio" : 3.7799999713897705,
    "foreign" : 1.0
    }, {
    "mpg" : 17.0,
    "price" : 11995.0,
    "headroom" : 2.5,
    "rep78" : 5.0,
    "length" : 193.0,
    "weight" : 3170.0,
    "displacement" : 163.0,
    "turn" : 37.0,
    "trunk" : 14.0,
    "make" : "Volvo 260",
    "gear_ratio" : 2.9800000190734863,
    "foreign" : 1.0
    } ],
    "Variable Names" : [ "make", "price", "mpg", "rep78", "headroom", "trunk", "weight", "length", "turn", "displacement", "gear_ratio", "foreign" ],
    "Is Variable String" : [ true, false, false, false, false, false, false, false, false, false, false, false ],
    "Number of Observations" : 4,
    "Ending Observation Number" : 74,
    "Value Label Names" : [ "", "", "", "", "", "", "", "", "", "", "", "origin" ],
    "Variable Labels" : [ "Make and Model", "Price", "Mileage (mpg)", "Repair Record 1978", "Headroom (in.)", "Trunk space (cu. ft.)", "Weight (lbs.)", "Length (in.)", "Turn Circle (ft.) ", "Displacement (cu. in.)", "Gear Ratio", "Car type" ]
    }

    Code:
    // Serialize all data and metadata
    jsonio, w(all)
    {
    "Value Labels" : [ {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : ""
    }, {
    "0" : "Domestic",
    "1" : "Foreign"
    } ],
    "Starting Observation Number" : 1,
    "Data Set" : [ {
    "mpg" : 22.0,
    "price" : 4099.0,
    "headroom" : 2.5,
    "rep78" : 3.0,
    "length" : 186.0,
    "weight" : 2930.0,
    "displacement" : 121.0,
    "turn" : 40.0,
    "trunk" : 11.0,
    "make" : "AMC Concord",
    "gear_ratio" : 3.5799999237060547,
    "foreign" : 0.0
    }, {...}, {
    "mpg" : 17.0,
    "price" : 11995.0,
    "headroom" : 2.5,
    "rep78" : 5.0,
    "length" : 193.0,
    "weight" : 3170.0,
    "displacement" : 163.0,
    "turn" : 37.0,
    "trunk" : 14.0,
    "make" : "Volvo 260",
    "gear_ratio" : 2.9800000190734863,
    "foreign" : 1.0
    } ],
    "Variable Names" : [ "make", "price", "mpg", "rep78", "headroom", "trunk", "weight", "length", "turn", "displacement", "gear_ratio", "foreign" ],
    "Is Variable String" : [ true, false, false, false, false, false, false, false, false, false, false, false ],
    "Number of Observations" : 74,
    "Ending Observation Number" : 74,
    "Value Label Names" : [ "", "", "", "", "", "", "", "", "", "", "", "origin" ],
    "Variable Labels" : [ "Make and Model", "Price", "Mileage (mpg)", "Repair Record 1978", "Headroom (in.)", "Trunk space (cu. ft.)", "Weight (lbs.)", "Length (in.)", "Turn Circle (ft.) ", "Displacement (cu. in.)", "Gear Ratio", "Car type" ]
    }

    Code:
    // Serialize subset of variables when records satisfy a condition
    jsonio mpg weight price make if rep78 == 1, w(data)
    {
    "fileName" : "Stata Data Set",
    "data" : [ {
    "mpg" : 24.0,
    "price" : 4195.0,
    "weight" : 2730.0,
    "make" : "Olds Starfire"
    }, {
    "mpg" : 18.0,
    "price" : 4934.0,
    "weight" : 3470.0,
    "make" : "Pont. Firebird"
    } ]
    }

    Code:
    // Create output JSON file
    jsonio mpg foreign weight price make if rep78 == 1, w(all) filenm(test.json)
    
    // Create JSON object of variable names
    jsonio, metaprint(varnames)
    [ "make", "price", "mpg", "rep78", "headroom", "trunk", "weight", "length", "turn", "displacement", "gear_ratio", "foreign" ]

    Code:
    // Create JSON object of variable labels
    jsonio, metaprint(varlabels)
    [ "Make and Model", "Price", "Mileage (mpg)", "Repair Record 1978", "Headroom (in.)", "Trunk space (cu. ft.)", "Weight (lbs.)", "Length (in.)", "Turn Circle (ft.) ", "Displacement (cu. in.)", "Gear Ratio", "Car type" ]
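For anyone consuming this output outside Stata, the following is a minimal sketch in Python, not something the package provides. It assumes the w(all) layout shown above, with the "Variable Names", "Variable Labels", and "Is Variable String" keys (key names may differ in later versions of the program). The excerpt is trimmed to the first three variables; a real script would read the file written with filenm() rather than an inline string.

```python
import json

# Excerpt of the w(all) output shown above, trimmed to the first three variables.
excerpt = """
{
  "Variable Names" : [ "make", "price", "mpg" ],
  "Is Variable String" : [ true, false, false ],
  "Variable Labels" : [ "Make and Model", "Price", "Mileage (mpg)" ]
}
"""

meta = json.loads(excerpt)

# Map each variable name to its label.
labels = dict(zip(meta["Variable Names"], meta["Variable Labels"]))

# Collect the names of the variables flagged as strings.
string_vars = [
    name
    for name, is_string in zip(meta["Variable Names"], meta["Is Variable String"])
    if is_string
]

print(labels["mpg"])
print(string_vars)
```

Because the metadata arrays are parallel (same order as "Variable Names"), zipping them is enough to rebuild a per-variable lookup.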


  • #2
    Billy,

    This looks like a useful tool; kudos!

    You say:
    "At the moment, the program only implements the output side of I/O operations"
    For the input side (converting JSON to Stata), would insheetjson (available from SSC) be sufficient, or are you expecting to add other features? insheetjson does not require a JRE, so that might be preferable.

    Regards,
    Joe



    • #3
      I'm planning on building out the input side over time, but figured it would be possible for some folks to use insheetjson for the time being. There is an extension to the Jackson API for using JSON schema files to define the data structure, and it also has some functions for parsing XML data using schema as well. One case, in particular, that I can think of where this would help to avoid a lot of heartache is with a legislative data API that I wrote an R package for. The JSON and even the XML payload is structured in such a way that defining some form of mapping of the key value pairs could be extremely useful and save tons of time processing the data on the backend. I also use tools like elasticsearch relatively frequently, so having something already written in the same language as the elasticsearch API would be fairly helpful for those types of tasks. More importantly, thanks for the kind words. There are still a few things that I think could improve it (e.g., implement a streaming version of the Java methods to reduce memory overhead and speed up processing).



      • #4
        Joe Canner, I've just finished updating a few things, and I think the changes make the JSON itself a bit easier to work with (e.g., individual objects are clearly and consistently named, some redundant data were removed, and I've tried to add objects that make the metadata easier to retrieve). I just put together a quick project page for it, which shows an updated version of the README file:

        https://wbuchanan.github.io/StataJSON/about/



        • #5
          I've recently updated jsonio, and there is now an additional program, ggeocode, that people can use for geocoding with the Google Geocoding API. This program lets users return different parts of the payload (e.g., bounding boxes, viewport, etc.) and gives them more flexibility in specifying the addresses to geocode (e.g., you can pass a single concatenated address string or a varlist of the component parts). The program then handles the API call, parses the payload, and pushes the values back into the dataset in memory.



          • #6
            Can I use ggeocode to geocode addresses in Stata without writing my own code? Is there another way to do so?



            • #7
              Mark Merante I'm not quite sure I completely understand your question. The link above points to a page with some examples of how the program can be used:

              Code:
              // Creates a small data set of addresses that you can use for geocoding
              input int housenum str13 street str11 city str2 state str5 zip
              4287 "46th Ave N" "Robbinsdale" "MN" "55422"
              6675 "Old Canton Rd" "Ridgeland" "MS" "39157"
              12313 "33rd Ave NE" "Seattle" "WA" "98125"
              310 "Cahir St" "Providence" "RI" "02903"
              22 "Oaklawn Ave" "Cranston" "RI" "02920"
              61 "Pine St" "Attleboro" "MA" "02703"
              10 "Larkspur Rd" "Warwick" "RI" "02886"
              91 "Fallon Ave" "Providence" "RI" "02908"
              195 "Arlington Ave" "Providence" "RI" "02906"
              end
              
              // Geocode the addresses
              ggeocode housenum street city state zip

              The first block just sets up some data to geocode; the geocoding itself is all contained in the last line. It is subject to any API limits and licensing requirements imposed by Google, but it provides an alternative working implementation of Google's geocoding API service (other users have had difficulty using existing programs with the same API).
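
For readers curious about what a geocoding request involves, here is a rough Python sketch of combining the component parts into one address string and building a request URL for Google's Geocoding API. It is not how ggeocode is implemented; the helper names `build_address` and `build_request_url`, the address format, and the `YOUR_API_KEY` placeholder are all assumptions made for illustration. No request is actually sent.

```python
from urllib.parse import urlencode

# JSON endpoint of the Google Geocoding API.
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def build_address(housenum, street, city, state, zip_code):
    """Concatenate the component parts into a single address string."""
    return f"{housenum} {street}, {city}, {state} {zip_code}"


def build_request_url(address, api_key):
    """Return the request URL with the address and key percent-encoded."""
    query = urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_ENDPOINT}?{query}"


# One row from the example dataset above; the key is a placeholder.
address = build_address(310, "Cahir St", "Providence", "RI", "02903")
url = build_request_url(address, "YOUR_API_KEY")

print(address)
print(url)
```

Keeping the ZIP code as a string preserves its leading zero, which is why the example dataset stores zip as str5.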



              • #8
                I've not yet pushed an update to the SSC archives, but I have developed some better methods for parsing JSON data that can be viewed in the project repository. The current options allow users to query elements from the data and to return any or all of the JSON in one of two formats. The key-value format creates two variables named key and value, inserting the lineage/name of each node in key and its value in value; it also handles casting the elements if they are all of the same type. The row-value format reads each node into its own variable and labels each variable with the lineage/name of the node; in this case all elements are cast appropriately.
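
To make the two layouts concrete, here is a minimal Python sketch; it is not the package's Java code. The `flatten` helper and the `/`-separated lineage format are assumptions made for illustration, and the input is the what(data) output from the first post in this thread.

```python
import json


def flatten(node, lineage=""):
    """Yield (lineage, value) pairs for every leaf node under `node`."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from flatten(child, f"{lineage}/{name}")
    elif isinstance(node, list):
        # Array elements are numbered from 1 in the lineage (an assumption).
        for position, child in enumerate(node, start=1):
            yield from flatten(child, f"{lineage}/{position}")
    else:
        yield (lineage or "/", node)


payload = json.loads("""
{
  "fileName" : "Stata Data Set",
  "data" : [
    { "mpg" : 24.0, "price" : 4195.0, "weight" : 2730.0, "make" : "Olds Starfire" },
    { "mpg" : 18.0, "price" : 4934.0, "weight" : 3470.0, "make" : "Pont. Firebird" }
  ]
}
""")

# Key-value layout: one (key, value) pair per node, i.e. two columns.
key_value = list(flatten(payload))

# Row-value layout: a single row with one column per node.
row_value = dict(key_value)

for key, value in key_value:
    print(key, value)
```

The key-value layout mixes strings and numbers in one column, which is why casting is only possible there when every element shares a type; the row-value layout can cast each column on its own.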



                • #9
                  Hi, I am trying to use ggeocode, and it says the command is unrecognized. I installed jsonio before using the ggeocode command. What am I doing wrong? Thank you very much!



                  • #10
                    Sriparna Ghosh,

                    The ggeocode option is only available from the development version of the package hosted on GitHub. I recently fixed a minor bug with the package that caused errors in the return macro when serializing the data in memory to a JSON object as well. However, you should be able to accomplish the same thing using the row value method to import the data.
