Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • open data from web url

    Dear Statalist users,

    I am trying to open the following link but I can't manage to open it correctly. It seems there's a problem with the delimiters. I appreciate your help in opening the dataset.
    Code:
     import delimited "https://ghoapi.azureedge.net/api/AIR_16", delimiter("</td>") clear 
    Thank you,
    Ana

  • #2
    I have several observations here:

    1) -help import delimited- indicates that the delimiter should be a single character, not a string like "</td>").

    2) I wonder about the file name specified in the URL. The lack of a extension, such as CSV, or DAT, or TXT as would be typical for a delimited text file made me wonder if that was the file you wanted to import.

    3) Against my better judgment, I copied "https://ghoapi.azureedge.net/api/AIR_16" to a local file, and inspected it and a) it does not appear at all to be a text file, since I don't find any end of line markers in it in anywhere. It's just one long string of text; and b) The supposed delimiter does not appear anywhere in that file.

    I'd suggest that you get some more information about the structure of that file, which does not appear to be anything you could process with -import delimited-. If you can get that information about the structure, someone here might be able to help you figure out how to read it. I wonder if it might be something like a JSON file, about which I have no particular knowledge, but other people here do.

    Comment


    • #3
      This is (as Mike Lacy says) what we'd call a JSON file. Very much indeed a Python problem. I sort of give some basic syntax here on how to do this.

      Comment


      • #4
        This isn't just any JavaScript Object Notion (JSON) object: it's a list of dictionaries, which is a common way to represent tabular data. JSON objects can represent many possible data structures (that's the big advantage), and not every JSON file is easy to convert into rows and columns of a table. Fortunately, this one is easy to convert. After downloading the file with a browser:

        Code:
        import json
        import csv
        
        with open("AIR_16.json", 'r') as json_file:
            json_object = json.load(json_file)
        list_of_dict = json_object["value"]
        with open("AIR_16.csv", 'w') as csv_file:
            csv_writer = csv.DictWriter(csv_file, fieldnames=list(list_of_dict[0].keys()))
            csv_writer.writeheader()
            csv_writer.writerows(list_of_dict)
        The above is a converter in python. Jared's code is more sophisticated and thoughtful than my code here, but I think the above is a bit simpler, more general, and perhaps a bit easier to work with. I'm sure this is possible in Stata, but I don't happen to know how to do it off the top of my head. Again, this won't work for any given JSON object, but it will work for a list of dictionaries. Note that we have to extract "value" from the high-level dictionary to get the data (some metadata is also stored at this level). I've also attached the converted data here as a .csv in case you aren't comfortable working with python.

        As an aside, JSON is great! I use to love JSON as a way to serialize objects (many of which aren't easily or efficiently representable as a table) back when I was doing a lot of software engineering/object oriented programming. JSON comes out of JavaScript object literals, which are a cool programing-language feature unique to JavaScript. Personally, I much prefer JSON over .xml for object serialization, although I do recognize that .xml is, perhaps, theoretically a bit more general. JSON objects are a bit more readable than xml, and I like that they tend to correspond 1 to 1 to objects I'm used to working with, especially dictionaries and lists.
        Attached Files
        Last edited by Daniel Schaefer; 02 Oct 2023, 12:55.

        Comment


        • #5
          I forgot to mention the biggest advantage of JSON over xml: There are almost always very easy ways to serialize and desterilize objects. I'm pretty lazy all things considered, so I don't want to have to write my own deserializer if I don't have too.

          At least, that's why JSON is easy in languages that encourage the programmer to define arbitrary objects and data structures. In languages like Stata where you spend most of your time manipulating a single object (i.e., the global data frame that stores all of your variables), JSON is maybe less useful.

          Comment


          • #6
            The contents could be extracted with Stata. The complication is that the colon is used both as a delimiter and a separator within the time variables, so one has to fix this manually.

            Code:
            import delimited "https://ghoapi.azureedge.net/api/AIR_16", delimiter("{}") varnames(nonames) stringcols(_all) clear
            foreach var of varlist *{
                if !ustrregexm(`var', "^Id"){
                    drop `var'
                }
            }
            rename (*) v#, addnumber(1)
            gen n=1
            reshape long v, i(n) j(seq)
            drop n seq
            replace v= ustrregexra(v, `"\""', " ")
            split v, p(: ,)
            replace v44= v44+":"+v45+":"+v46+":"+v47
            replace v51= v51+":"+v52+":"+v53+":"+v54
            replace v56= v56+":"+v57+":"+v58+":"+v59
            drop v v45 v46 v47 v52 v53 v54 v57 v58 v59
            rename v# v#, renumber
            
            local counter 1
            forval i=1(2)`c(k)'{
                local ++counter
                local j = `counter'-1
                cap destring v`counter', replace
                rename v`counter' `=strtoname("`=trim(itrim(v`j'[1]))'")'
                drop v`j'
                local++counter
            }
            Res.:

            Code:
            . list Id-Dim1 in 1/30, sep(0)
            
                 +----------------------------------------------------------------------------------------------------------------+
                 |       Id   Indica~e   Spatial~e   Spatia~m   TimeD~pe   Parent~e            ParentLocation   Dim1Type     Dim1 |
                 |----------------------------------------------------------------------------------------------------------------|
              1. | 29526467    AIR_16     COUNTRY        AFG       YEAR        EMR     Eastern Mediterranean        SEX     BTSX  |
              2. | 29526468    AIR_16     COUNTRY        AFG       YEAR        EMR     Eastern Mediterranean        SEX     FMLE  |
              3. | 29526469    AIR_16     COUNTRY        AFG       YEAR        EMR     Eastern Mediterranean        SEX      MLE  |
              4. | 29526470    AIR_16     COUNTRY        AGO       YEAR        AFR                    Africa        SEX     BTSX  |
              5. | 29526471    AIR_16     COUNTRY        AGO       YEAR        AFR                    Africa        SEX     FMLE  |
              6. | 29526472    AIR_16     COUNTRY        AGO       YEAR        AFR                    Africa        SEX      MLE  |
              7. | 29526473    AIR_16     COUNTRY        ALB       YEAR        EUR                    Europe        SEX     BTSX  |
              8. | 29526474    AIR_16     COUNTRY        ALB       YEAR        EUR                    Europe        SEX     FMLE  |
              9. | 29526475    AIR_16     COUNTRY        ALB       YEAR        EUR                    Europe        SEX      MLE  |
             10. | 29526476    AIR_16     COUNTRY        ARE       YEAR        EMR     Eastern Mediterranean        SEX     BTSX  |
             11. | 29526477    AIR_16     COUNTRY        ARE       YEAR        EMR     Eastern Mediterranean        SEX     FMLE  |
             12. | 29526478    AIR_16     COUNTRY        ARE       YEAR        EMR     Eastern Mediterranean        SEX      MLE  |
             13. | 29526479    AIR_16     COUNTRY        ARG       YEAR        AMR                  Americas        SEX     BTSX  |
             14. | 29526480    AIR_16     COUNTRY        ARG       YEAR        AMR                  Americas        SEX     FMLE  |
             15. | 29526481    AIR_16     COUNTRY        ARG       YEAR        AMR                  Americas        SEX      MLE  |
             16. | 29526482    AIR_16     COUNTRY        ARM       YEAR        EUR                    Europe        SEX     BTSX  |
             17. | 29526483    AIR_16     COUNTRY        ARM       YEAR        EUR                    Europe        SEX     FMLE  |
             18. | 29526484    AIR_16     COUNTRY        ARM       YEAR        EUR                    Europe        SEX      MLE  |
             19. | 29526485    AIR_16     COUNTRY        ATG       YEAR        AMR                  Americas        SEX     BTSX  |
             20. | 29526486    AIR_16     COUNTRY        ATG       YEAR        AMR                  Americas        SEX     FMLE  |
             21. | 29526487    AIR_16     COUNTRY        ATG       YEAR        AMR                  Americas        SEX      MLE  |
             22. | 29526488    AIR_16     COUNTRY        AUS       YEAR        WPR           Western Pacific        SEX     BTSX  |
             23. | 29526489    AIR_16     COUNTRY        AUS       YEAR        WPR           Western Pacific        SEX     FMLE  |
             24. | 29526490    AIR_16     COUNTRY        AUS       YEAR        WPR           Western Pacific        SEX      MLE  |
             25. | 29526491    AIR_16     COUNTRY        AUT       YEAR        EUR                    Europe        SEX     BTSX  |
             26. | 29526492    AIR_16     COUNTRY        AUT       YEAR        EUR                    Europe        SEX     FMLE  |
             27. | 29526493    AIR_16     COUNTRY        AUT       YEAR        EUR                    Europe        SEX      MLE  |
             28. | 29526494    AIR_16     COUNTRY        AZE       YEAR        EUR                    Europe        SEX     BTSX  |
             29. | 29526495    AIR_16     COUNTRY        AZE       YEAR        EUR                    Europe        SEX     FMLE  |
             30. | 29526496    AIR_16     COUNTRY        AZE       YEAR        EUR                    Europe        SEX      MLE  |
                 +----------------------------------------------------------------------------------------------------------------+

            Comment


            • #7
              Here is another pure Stata approach to parsing the JSON file:

              Code:
              // load the file from the filesystem.
              clear
              set obs 1
              file open jsonfile using "H:\Users\Daniel Schaefer\AIR_16.json", r
              file read jsonfile content
              file close jsonfile
              
              // extract data from metadata.
              gen strL content = `"`content'"'
              gen strL data = regexs(1) if regexm(content,"\[\{(.+)\}\]")
              split data, parse("},{") gen(row)
              drop data content
              gen i = _n
              reshape long row, i(i) j(j)
              drop i j
              
              // extract variable names.
              gen strL variable_names = ustrregexra(row,":(.+?)(,|$)"," ")
              replace variable_names = subinstr(variable_names, char(34), "", .)
              local colnum = wordcount(variable_names)
              assert variable_names == variable_names[1]
              
              // extract data from each row and place into named variables.
              forv varnum = 1/`colnum'{
                  local varname = word(variable_names, `varnum')
                  local regex = char(34) + "`varname'" + char(34) + ":(.+?)(,|$)"
                  gen `varname' = ustrregexs(1) if ustrregexm(row, `"`regex'"')
                  replace `varname' = subinstr(`varname', char(34), "", .)
                  replace `varname' = "" if `varname' == "null"
                  cap destring `varname', replace
              }
              drop row
              There are a few differences between my approach and the one in #6. Perhaps most significantly, I extract variable names and the corresponding data with regular expressions rather than by splitting using the colon character. Note as well that I don't have Java installed so I just download the file and load it from the filesystem directly. I'm not sure this is any better than #6, I just wanted to give it a try myself.

              Edit: forgot to post the results.

              Code:
              . list Id-Dim1 in 1/30, sep(0)
              
                   +-----------------------------------------------------------------------------------------------------------+
                   |       Id   Indica~e   Spatia~e   Spatia~m   TimeD~pe   Parent~e          ParentLocation   Dim1Type   Dim1 |
                   |-----------------------------------------------------------------------------------------------------------|
                1. | 29526467     AIR_16    COUNTRY        AFG       YEAR        EMR   Eastern Mediterranean        SEX   BTSX |
                2. | 29526468     AIR_16    COUNTRY        AFG       YEAR        EMR   Eastern Mediterranean        SEX   FMLE |
                3. | 29526469     AIR_16    COUNTRY        AFG       YEAR        EMR   Eastern Mediterranean        SEX    MLE |
                4. | 29526470     AIR_16    COUNTRY        AGO       YEAR        AFR                  Africa        SEX   BTSX |
                5. | 29526471     AIR_16    COUNTRY        AGO       YEAR        AFR                  Africa        SEX   FMLE |
                6. | 29526472     AIR_16    COUNTRY        AGO       YEAR        AFR                  Africa        SEX    MLE |
                7. | 29526473     AIR_16    COUNTRY        ALB       YEAR        EUR                  Europe        SEX   BTSX |
                8. | 29526474     AIR_16    COUNTRY        ALB       YEAR        EUR                  Europe        SEX   FMLE |
                9. | 29526475     AIR_16    COUNTRY        ALB       YEAR        EUR                  Europe        SEX    MLE |
               10. | 29526476     AIR_16    COUNTRY        ARE       YEAR        EMR   Eastern Mediterranean        SEX   BTSX |
               11. | 29526477     AIR_16    COUNTRY        ARE       YEAR        EMR   Eastern Mediterranean        SEX   FMLE |
               12. | 29526478     AIR_16    COUNTRY        ARE       YEAR        EMR   Eastern Mediterranean        SEX    MLE |
               13. | 29526479     AIR_16    COUNTRY        ARG       YEAR        AMR                Americas        SEX   BTSX |
               14. | 29526480     AIR_16    COUNTRY        ARG       YEAR        AMR                Americas        SEX   FMLE |
               15. | 29526481     AIR_16    COUNTRY        ARG       YEAR        AMR                Americas        SEX    MLE |
               16. | 29526482     AIR_16    COUNTRY        ARM       YEAR        EUR                  Europe        SEX   BTSX |
               17. | 29526483     AIR_16    COUNTRY        ARM       YEAR        EUR                  Europe        SEX   FMLE |
               18. | 29526484     AIR_16    COUNTRY        ARM       YEAR        EUR                  Europe        SEX    MLE |
               19. | 29526485     AIR_16    COUNTRY        ATG       YEAR        AMR                Americas        SEX   BTSX |
               20. | 29526486     AIR_16    COUNTRY        ATG       YEAR        AMR                Americas        SEX   FMLE |
               21. | 29526487     AIR_16    COUNTRY        ATG       YEAR        AMR                Americas        SEX    MLE |
               22. | 29526488     AIR_16    COUNTRY        AUS       YEAR        WPR         Western Pacific        SEX   BTSX |
               23. | 29526489     AIR_16    COUNTRY        AUS       YEAR        WPR         Western Pacific        SEX   FMLE |
               24. | 29526490     AIR_16    COUNTRY        AUS       YEAR        WPR         Western Pacific        SEX    MLE |
               25. | 29526491     AIR_16    COUNTRY        AUT       YEAR        EUR                  Europe        SEX   BTSX |
               26. | 29526492     AIR_16    COUNTRY        AUT       YEAR        EUR                  Europe        SEX   FMLE |
               27. | 29526493     AIR_16    COUNTRY        AUT       YEAR        EUR                  Europe        SEX    MLE |
               28. | 29526494     AIR_16    COUNTRY        AZE       YEAR        EUR                  Europe        SEX   BTSX |
               29. | 29526495     AIR_16    COUNTRY        AZE       YEAR        EUR                  Europe        SEX   FMLE |
               30. | 29526496     AIR_16    COUNTRY        AZE       YEAR        EUR                  Europe        SEX    MLE |
                   +-----------------------------------------------------------------------------------------------------------+
              Last edited by Daniel Schaefer; 03 Oct 2023, 16:07.

              Comment

              Working...
              X