
  • #16
    Mohamad Soltani
    Can you provide a small portion of the data directly in a post instead of a link to a large compressed file?

    That said, after looking at the documentation link you provided, I saw "Format is one-review-per-line in json. See examples below for further help reading the data." The problem is likely that this is not proper JSON: line delimiters have no special meaning in the JSON specification. This format is called JSON Lines (NDJSON), which is likely the reason you are having trouble parsing the file.
    Last edited by wbuchanan; 27 Feb 2023, 13:32.
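
    The difference is easy to demonstrate in Python with the standard library alone; the two-record payload below is invented for illustration, not taken from the actual review data:

```python
import json

# Two records, one JSON object per line (JSON Lines / NDJSON)
payload = '{"id": 1, "stars": 5}\n{"id": 2, "stars": 3}'

# json.loads() expects a single JSON value, so the file as a whole fails to parse
try:
    json.loads(payload)
    parsed_whole = True
except json.JSONDecodeError:
    parsed_whole = False

# Parsing line by line works, because each line is valid JSON on its own
records = [json.loads(line) for line in payload.splitlines()]
```

    A JSON-spec parser like the one behind jsonio will hit the "Extra data" failure on the first branch, while the line-by-line loop succeeds.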



    • #17
      Hi everyone,
      I see that this is a recurring issue, and the possible fix is just a simple trick, not fancy coding.
      I recently discovered that the Central Bank of Peru publishes a lot of free daily data on items such as exchange rates, international reserves, and commodity prices.
      The key here was to use Python (or rather Stata's python command) to import the data into Stata, and then I just cleaned it in Stata.
      Please see the example below for reference and intuition. It took me a while to figure out, but the trick turned out to be simple: instead of killing a fly with a tank, work around it with a newspaper, haha.
      You don't need to create an account or anything like that. Just access the API and there you go.
      Code:
      *BANCO CENTRAL DE RESERVA DEL PERU
      *https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01288PM/json
      
      *EXAMPLE: Inflation
      
      clear all
      set more off
      
      *We use Python to call the API from BCRP
      python:
      
      
      import requests
      import json
      import pandas as pd
      
      # Construct the URL for the API call
      #inflation example
      API     = 'https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PN01288PM/json'
      #exchange rate example
      #API ='https://estadisticas.bcrp.gob.pe/estadisticas/series/api/PD04645PD/json/2016-1-1/2023-12-31'
      URL     = API
      
      # Submit the API data request
      data = requests.get(URL).json()
      
      # Extract the 'results' part of the JSON data
      bcrp = data.get('periods', [])
      
      # Convert the JSON data to a pandas data frame
      bcrp_df  = pd.read_json(json.dumps(bcrp))
      
      bcrp_df['name'] = bcrp_df['name'].astype('str')
      bcrp_df['values'] = bcrp_df['values'].astype('str')
      
      #Renaming variables
      bcrp_df=bcrp_df.rename(columns={'name':'date', 'values':'ipc'}, inplace=False)
      
      # new data frame with split value columns
      new = bcrp_df["date"].str.split(".", n = 1, expand = True)
       
      # month column from the split
      bcrp_df["month"]= new[0]
       
      # year column from the split
      bcrp_df["year"]= new[1]
      
      
      # Drop the original date column now that month and year are split out
      bcrp_df.drop(columns =["date"], inplace = True)
      
      # Note: the list brackets left in 'ipc' are stripped later in Stata with subinstr()
      
      print(bcrp_df.head())
      
      #Saving a dataframe as a Stata file
      bcrp_df.to_stata('bcrp.dta')
      
      
      end
      
      use bcrp.dta, clear
      drop index
      
      replace ipc = subinstr(ipc,"'","",.)
      replace ipc = subinstr(ipc,"[","",.)
      replace ipc = subinstr(ipc,"]","",.)
      
      
      *Cleaning up months
      replace month="1" if month=="Ene"
      replace month="2" if month=="Feb"
      replace month="3" if month=="Mar"
      replace month="4" if month=="Abr"
      replace month="5" if month=="May"
      replace month="6" if month=="Jun"
      replace month="7" if month=="Jul"
      replace month="8" if month=="Ago"
      replace month="9" if month=="Sep"
      replace month="10" if month=="Oct"
      replace month="11" if month=="Nov"
      replace month="12" if month=="Dic"
      
      destring ipc, replace
      destring month, replace
      destring year, replace
      
      *Creating the date variable
      gen date=ym(year,month)
      format date %tm
      
      
      *Inflation
      **Year on Year
      gen inf_yoy=100*(ipc[_n]-ipc[_n-12])/ipc[_n-12]
       
       **Month to Month
      gen inf_mom=100*(ipc[_n]-ipc[_n-1])/ipc[_n-1]
      
      *Cumulative inflation
      bysort year (month): gen inf_peru=sum(inf_mom)
      
      line inf_peru date
      
      saveold ipc_peru, replace
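
      As an aside, the to_stata/subinstr round trip above can likely be shortened by unpacking the 'periods' list directly in pandas. The sketch below runs on a hand-made payload with the same shape the code above assumes for the BCRP response ('name' like 'Ene.2003', 'values' as a one-element list of strings); it is not live API output:

```python
import pandas as pd

# Mock of the BCRP JSON shape assumed above
data = {"periods": [
    {"name": "Ene.2003", "values": ["79.59"]},
    {"name": "Feb.2003", "values": ["79.98"]},
]}

df = pd.DataFrame(data["periods"])
df["ipc"] = df["values"].str[0].astype(float)   # unpack the one-element list: no bracket cleanup needed
df[["month", "year"]] = df["name"].str.split(".", n=1, expand=True)
df["year"] = df["year"].astype(int)
df = df.drop(columns=["name", "values"])
```

      With this shape, the resulting .dta already has numeric ipc and year, so only the Spanish month abbreviations still need recoding in Stata.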



      • #18
        Jorge L. Guzman while your example is definitely useful, it wouldn't solve the issue that others raised. The link you provided returns a proper JSON object (i.e., it conforms to the JSON specification). The examples that others referenced when having difficulty do not conform to the JSON standard. There is a jsonlines library in Python that could be used in a similar manner to your example, but jsonio uses a JSON serializing/deserializing library that follows the JSON spec, so it fails when the payload does not conform to it.



        • #19
          Hello all,

          After reviewing the discussions here, I wrote a do file to import data into Stata. However, I get the "invalid expression" error.


          Below is my code. What is wrong with it?

          Thanks!

          Code:
          clear
          gen str8 case_number=""
          gen str6 case_status=""
          gen str12 manner_of_death=""
          gen str2 age=""
          gen str20 race=""
          gen str6 gender=""
          gen str27 death_date=""
          gen str15 death_city=""
          gen str10 death_zip=""
          gen str18 resident_city=""
          gen str5 resident_zip=""
          gen str40 incident_location=""
          gen str15 incident_city=""
          gen str5 incident_zip=""
          gen strL cause_of_death=""
          gen strL other_significant=""
          gen str12 latitude=""
          gen str12 longitude=""
          insheetjson case_number case_status manner_of_death age race gender death_date death_city death_zip resident_city resident_zip incident_location incident_city incident_zip cause_of_death other_significant latitude longitude using data_from_socrata2.json, table(results) col("case_number","case_status", "manner_of_death", "age", "race", "gender", "death_date", "death_city", "death_zip", "resident_city", "resident_zip", "incident_location", "incident_city", "incident_zip", "cause_of_death", "other_significant", "latitude", "longitude")




          • #20
            Originally posted by Artur Deguzman View Post
            [quoted post #19 in full]

            I discovered that I should not have any commas between the variable names. After removing them, I get the following error:

            Unable to find data. Bad result selector 'results'?

            Looking at the "Bad result..." portion, it comes from the table(results) part of the command.

            The help file describes tableselector() as:

            tableselector(string) specifies the selector to use in finding the table buried in the json object tree

            It seems the "results" string is not correct. What should it be? Is it something I find in the JSON file?
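
            One way to answer that is to inspect the top level of the JSON before choosing a selector. Many Socrata endpoints return a bare array of objects rather than an object wrapped in a "results" key, and in that case there is no table named "results" to select. A quick Python check (the sample payload here is invented for illustration):

```python
import json

# A Socrata-style response: a top-level ARRAY of records,
# not an object like {"results": [...]}
sample = '[{"case_number": "18-00004"}, {"case_number": "18-00005"}]'

parsed = json.loads(sample)
is_array = isinstance(parsed, list)          # True: nothing for table(results) to find
has_results_key = isinstance(parsed, dict) and "results" in parsed
```

            If the top level is an array, the table selector likely needs to be empty or omitted rather than set to "results".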
            Last edited by Artur Deguzman; 30 Nov 2023, 12:25.



            • #21
              Hi Artur, it is actually very hard to troubleshoot JSON parsing without "seeing" the file, as JSON files are very flexible by nature.

              If you are not able to share it here, I recommend using jsonio (from SSC).

              Code:
              . jsonio kv, file("exemple.json") nourl elem("")

              It will parse the WHOLE json file, allowing you to use Stata's built-in commands to filter/transform it.




              • #22
                Originally posted by Luis Pecht View Post
                [quoted post #21 in full]

                Hello Luis,

                The data is publicly available at:

                https://data.sccgov.org/resource/s3fb-yrjp.json

                Thanks!

                Artur



                • #23
                  In the -insheetjson- command, it looks like "other_significant" should be "other_significant_condition".



                  • #24
                    Artur,

                    I made some changes here (originally marked in bold) and was able to import most of the JSON file; however, insheetjson keeps failing on the last 2 columns, despite their being ordinary variables just like the others.

                    Code:
                    clear
                    gen str8 case_number=""
                    gen str6 case_status=""
                    gen str12 manner_of_death=""
                    gen str2 age=""
                    gen str20 race=""
                    gen str6 gender=""
                    gen str27 death_date=""
                    gen str15 death_city=""
                    gen str10 death_zip=""
                    gen str18 resident_city=""
                    gen str5 resident_zip=""
                    gen str40 incident_location=""
                    gen str15 incident_city=""
                    gen str5 incident_zip=""
                    gen str256 cause_of_death=""
                    gen str256 other_significant_condition=""
                    gen str12 latitude=""
                    gen str12 longitude=""
                    
                    . foreach v of varlist _all {
                      2. di "importing variable `v'"
                      3. insheetjson `v' using C:\Users\Downloads\s3fb-yrjp.json, col( `v' )
                      4. }
                    importing variable case_number
                     1000 observations updated/written.
                    importing variable case_status
                     1000 observations updated/written.
                    importing variable manner_of_death
                     1000 observations updated/written.
                    importing variable age
                     1000 observations updated/written.
                    importing variable race
                     1000 observations updated/written.
                    importing variable gender
                     1000 observations updated/written.
                    importing variable death_date
                     1000 observations updated/written.
                    importing variable death_city
                     1000 observations updated/written.
                    importing variable death_zip
                     1000 observations updated/written.
                    importing variable resident_city
                     1000 observations updated/written.
                    importing variable resident_zip
                     1000 observations updated/written.
                    importing variable incident_location
                     1000 observations updated/written.
                    importing variable incident_city
                     1000 observations updated/written.
                    importing variable incident_zip
                     1000 observations updated/written.
                    importing variable cause_of_death
                     1000 observations updated/written.
                    importing variable other_significant_condition
                     1000 observations updated/written.
                    importing variable latitude
                     Invalid column name/selector 'latitude'. (Possible name candidates are: "case_number" "case_status" "manner_of_death" "age" "race" "gender" "death_date" "death_city" "death_zip" "resident_city" "resident_zip" "
                    > incident_location" "incident_city" "incident_zip" "cause_of_death" "other_significant_condition" )
                     Empty result returned; Nothing to do.
                    importing variable longitude
                     Invalid column name/selector 'longitude'. (Possible name candidates are: "case_number" "case_status" "manner_of_death" "age" "race" "gender" "death_date" "death_city" "death_zip" "resident_city" "resident_zip"
                    > "incident_location" "incident_city" "incident_zip" "cause_of_death" "other_significant_condition" )
                     Empty result returned; Nothing to do.
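
                    A plausible explanation for the latitude/longitude failures (an assumption on my part, not verified against insheetjson's source) is that the candidate column names are taken from records in which those keys are absent: in Socrata exports, a field with no value is simply omitted from that record's object. The check below, on a fabricated three-record sample with the thread's field names, shows how to find fields that are missing from some records:

```python
import json

sample = json.loads("""[
  {"case_number": "18-00001", "latitude": "+37.35"},
  {"case_number": "18-00002"},
  {"case_number": "18-00003", "latitude": "+37.36"}
]""")

keys_everywhere = set.intersection(*(set(r) for r in sample))   # keys present in every record
keys_anywhere = set.union(*(set(r) for r in sample))            # keys present in at least one
missing_somewhere = keys_anywhere - keys_everywhere             # fields absent from some records
```

                    If latitude and longitude land in missing_somewhere for the real file, that would explain why a key-detection pass over incomplete records never offers them as candidates.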
                    I was able to import it completely using the libjson Mata class,

                    Code:
                                                                                                           1                                                                                  2
                         +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
                       1 |                                                                                     var1                                                                                  18-00004  |
                       2 |                                                                                     var2                                                                                    Closed  |
                       3 |                                                                                     var3                                                                                   Natural  |
                       4 |                                                                                     var4                                                                                        79  |
                       5 |                                                                                     var5                                                                      OtherPacificIslander  |
                       6 |                                                                                     var6                                                                                      Male  |
                       7 |                                                                                     var7                                                               2018-01-01 07:00:00.0000000  |
                       8 |                                                                                     var8                                                                               Santa Clara  |
                       9 |                                                                                     var9                                                                                     95050  |
                      10 |                                                                                    var10                                                                                 Sunnyvale  |
                      11 |                                                                                    var11                                                                                     94089  |
                      12 |                                                                                    var12                                                                                       N/A  |
                      13 |                                                                                    var13                                                                                       N/A  |
                      14 |                                                                                    var14                                                                                       N/A  |
                      15 |                                                                                    var15   probable complaications of MI due to ASCVD due to Kidney failure (not related to trauma  |
                      16 |                                                                                    var16                                                                                       N/A  |
                      17 |                                                                                    var17                                                                               +37.3511428  |
                      18 |                                                                                    var18                                                                              -121.9523083  |
                         +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
                    but looping with jsonio (from SSC) takes about 1 minute and avoids the need to move to Mata.


                    Code:
                    . clear
                    . local j=1
                    . while `j'>0 {
                      2. jsonio rv, file("c:\Users\Downloads\s3fb-yrjp.json") nourl element("id_`j'/") ob(`j') stub(var)
                      3. if var1[`j']=="" exit
                      4. local ++j
                      5. }
                    
                    . desc
                    
                    Contains data
                     Observations:         1,001                  
                        Variables:            18                  
                    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                    Variable      Storage   Display    Value
                        name         type    format    label      Variable label
                    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                    var1            str17   %17s                  /id_1000/case_number
                    var2            str17   %17s                  /id_1000/case_status
                    var3            str21   %21s                  /id_1000/manner_of_death
                    var4            str9    %9s                   /id_1000/age
                    var5            str20   %20s                  /id_1000/race
                    var6            str12   %12s                  /id_1000/gender
                    var7            str27   %27s                  /id_1000/death_date
                    var8            str16   %16s                  /id_1000/death_city
                    var9            str15   %15s                  /id_1000/death_zip
                    var10           str19   %19s                  /id_1000/resident_city
                    var11           str18   %18s                  /id_1000/resident_zip
                    var12           str40   %40s                  /id_1000/incident_location
                    var13           str19   %19s                  /id_1000/incident_city
                    var14           str18   %18s                  /id_1000/incident_zip
                    var15           str170  %170s                 /id_1000/cause_of_death
                    var16           str134  %134s                 /id_1000/other_significant_condition
                    var17           str14   %14s                  /id_1000/latitude
                    var18           str15   %15s                  /id_1000/longitude
                    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                    Sorted by:
                         Note: Dataset has changed since last saved.
                    LP
                    Last edited by Luis Pecht; 03 Dec 2023, 04:22.
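
                    Since the payload is a flat array, another option (assuming Stata 16+ with Python support, as in post #17) is to load it with pandas and hand the result to Stata; pandas fills omitted fields with NaN automatically, which sidesteps the missing-key problem. The two records below are a stand-in for the real file:

```python
import pandas as pd

# Stand-in for json.load() on s3fb-yrjp.json: a flat array of records
records = [
    {"case_number": "18-00004", "case_status": "Closed", "latitude": "+37.3511428"},
    {"case_number": "18-00005", "case_status": "Open"},   # latitude omitted, as in real exports
]

df = pd.DataFrame(records)   # missing fields become NaN automatically
df = df.fillna("")           # Stata-friendly: blank strings instead of NaN
# df.to_stata("s3fb_yrjp.dta", version=118)   # then -use- it in Stata
```

                    The to_stata() call is commented out here, but with it the whole import is a few lines and needs no per-variable loop.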



                    • #25
                      Originally posted by Luis Pecht View Post
                      [quoted post #24 in full]
                      Luis,

                      Thank you!

                      I will look into it at work tomorrow.

                      Regards,

                      Artur

