Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing JSON data using insheetjson

    Dear all,
    I'm trying to import data through an API that outputs JSON. My problem is the format of the file:

    {"error":[],"result":{"NameOfDatasource":[["v1[1]","v2[1]","v3[1]"],["v1[2]","v2[2]","v3[2]"],"v1[...]","v2[n]","v3[n]","v1[n]","v2[n]","v3[n]"]] ,"DataIDontNeed":""}}

    My challenge is to import all observations of v1-v3 into Stata. Using insheetjson, I can select table "result" and column "NameOfDatasource", but I can't find a way to get to the vars directly. I can specify:
    Code:
    insheetjson v1 v2 v3 using "URL", table("result") columns("NameOfDatasource:`obs':3")
    but this only gives me one observation at a time. As a workaround, I now just save the file to disk, then circle through each obs with a second insheetjson call in my do file to write the data to my Stata file line by line, but this is very slow and I am sure there is an easier way - I just couldn't find it. The aoa option also doesn't seem to work here, although I may be mis-specifying..
    Many thanks in advance for any hints!

  • #2
    Edmund Schuster
    Have you tried using jsonio?

    Comment


    • #3
      Originally posted by wbuchanan View Post
      Edmund Schuster
      Have you tried using jsonio?
      Yes - thanks - I have. Unfortunately, I ran into pretty much the same problem.

      Comment


      • #4
        1. If the JSON is generated by some API and going back to the source is still an option, you can usually broaden your options by looking at what other responses it may offer, typically XML.

        2. See if pre-parsing the file could be an option. E.g. in the example you posted you probably need just the content between each [[ and ]] pairs. So probably can filter the text to retain just that.

        best . Sergiy.

        Comment


        • #5
          Thanks for your reply!
          Originally posted by Sergiy Radyakin View Post
          1. If the JSON is generated by some API and going back to the source is still an option, you can usually broaden your options by looking at what other responses it may offer, typically XML.
          Unfortunately, it only outputs in this format.

          Originally posted by Sergiy Radyakin View Post
          2. See if pre-parsing the file could be an option. E.g. in the example you posted you probably need just the content between each [[ and ]] pairs. So probably can filter the text to retain just that.
          best . Sergiy.
          I am not sure how to solve this by parsing. My problem is the hierarchical structure of the data, which is why I use a loop to circle through each observation. I can't find a way around this being structured as `Table':`Observation_number':`variable_index' rather than `Table':`variable_index':`Observation_number'. I know there must be an easy way to do this since there are various free online JSON converters out there that do this without a problem. Wouldn't mind of course if it were just a table, but this is market data I download from an exchange server, which limits me to 900 observations per request (I need O(millions) of obs)..

          Comment


          • #6
            Edmund Schuster
            It might be useful if you could provide a better example of the data you are working with. The example you provided:

            Code:
            {"error":[],"result":{"NameOfDatasource":[["v1[1]","v2[1]","v3[1]"],["v1[2]","v2[2]","v3[2]"],"v1[...]","v2[n]","v3[n]","v1[n]","v2[n]","v3[n]"]] ,"DataIDontNeed":""}}
            isn't necessarily consistent with the structure that you described. If you can write a regular expression to identify the parts of the hierarchy that you are interested in then jsonio would allow you to just parse those components. If you use the key-value mode of importing the data you could then split the key variable based on the values of "/" and would retrieve the full hierarchy which could then be used to reshape the data however would fit your needs.

            Comment


            • #7
              Hi All,

              I have a data from a JSON file on country area codes and for their subdivisions for three countries (attached data in txt. format).

              I would like to import the data using "insheetjson" command, using the line of codes below.

              However, stata is only reporting one empty raw for each variable.

              I would appreciate any advice on how to use "insheetjson" command to correctly import all data in the JSON file.

              Thank you in advance.

              Code:
              insheetjson using "3166.txt", showresponse
              g str10 AF=""
              g str10 AL=""
              g str10 DZ=""
              insheetjson AF AL DZ using "3166.txt",  col("AF"  "AL"  "DZ" ) flatten replace
              3166.txt


              Comment

              Working...
              X