Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Python Dictionary (I think) to dataframe

    Hey everyone. To anyone who may use Python... A coworker and I wish to obtain some trade data from U.N. Comtrade. Here is a small example, using requests.
    Code:
    cls
    clear *
    python:
    import requests
    url='https://comtrade.un.org/api/get?max=100000&type=C&freq=M&px=S2&ps=201001&r=all&p=156&rg=2&cc=9101'
    un_data=requests.get(url)
    print(un_data.content)
    end
    This produces a.... thing. It looks like a dictionary. It looks like a dictionary, of some kind. How might I convert this to a dataframe so I can put it into Stata and work with it? All i need is for "yr", "period", "periodDesc":"January 2010", and so on, to have their own column, respectively. Note, I will cross post this on Stack and link it to it. The final thing should look like this
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(yr period) str24 rtTitle long TradeValue
    2017 201706 "Canada"                     147650
    2017 201703 "China, Hong Kong SAR"      3749878
    2017 201710 "Canada"                      57796
    2017 201708 "China, Hong Kong SAR"      2637734
    2017 201707 "Canada"                     250106
    2017 201712 "China, Hong Kong SAR"      2283290
    2017 201712 "Belgium"                     83828
    2017 201704 "Rep. of Korea"               78321
    2017 201707 "Belgium"                    105535
    2017 201710 "ASEAN"                      255697
    2017 201711 "ASEAN"                      809781
    2017 201709 "Canada"                     237649
    2017 201712 "Canada"                      19262
    2017 201706 "ASEAN"                      351560
    2017 201701 "Poland"                        186
    2017 201705 "China, Hong Kong SAR"      2585817
    2017 201711 "Belgium"                     54809
    2017 201711 "South Africa"                    0
    2017 201706 "Japan"                       23631
    2017 201708 "EU"                        3992007
    2017 201706 "United Kingdom"            1769918
    2017 201708 "United Kingdom"            2754432
    2017 201709 "United Kingdom"            2757139
    2017 201705 "Japan"                       31784
    2017 201702 "EU"                        4471396
    2017 201701 "Denmark"                     62459
    2017 201702 "Japan"                       26842
    2017 201705 "United Kingdom"            2497820
    2017 201706 "EU"                        3280962
    2017 201708 "Jordan"                       1352
    2017 201704 "Jordan"                       1127
    2017 201709 "EU"                        4515770
    2017 201703 "United Kingdom"             550694
    2017 201706 "Denmark"                       157
    2017 201705 "Switzerland"              24861018
    2017 201705 "Spain"                        3910
    2017 201702 "Spain"                        7367
    2017 201705 "Portugal"                   554711
    2017 201707 "Spain"                        2256
    2017 201705 "Indonesia"                    4546
    2017 201709 "Portugal"                   251010
    2017 201711 "Spain"                       61724
    2017 201702 "Portugal"                   345734
    2017 201709 "Switzerland"              18864650
    2017 201704 "Switzerland"              29645183
    2017 201707 "Portugal"                   519235
    2017 201708 "Spain"                       55665
    2017 201704 "Portugal"                   214740
    2017 201710 "Switzerland"              25679499
    2017 201712 "Switzerland"              39337860
    2017 201708 "Portugal"                    13542
    2017 201708 "Malaysia"                     3501
    2017 201710 "United States of America"   295324
    2017 201703 "United States of America"    95075
    2017 201711 "France"                     272189
    2017 201703 "Malaysia"                     6931
    2017 201705 "United States of America"   123130
    2017 201706 "Malta"                        8422
    2017 201710 "United Arab Emirates"        15488
    2017 201710 "Thailand"                    62043
    2017 201702 "France"                     136805
    2017 201701 "United Arab Emirates"         1715
    2017 201709 "France"                     368213
    2017 201707 "United States of America"   156303
    2017 201705 "France"                     329651
    2017 201704 "United States of America"    48700
    2017 201701 "Thailand"                        7
    2017 201707 "Thailand"                    10563
    2017 201710 "United Kingdom"            1535513
    2017 201711 "United Kingdom"            1736628
    2017 201703 "Japan"                       23308
    2017 201704 "South Africa"                30253
    2017 201704 "Japan"                        3089
    2017 201702 "Denmark"                      4755
    2017 201702 "Norway"                       6483
    2017 201710 "Japan"                       10429
    2017 201707 "Denmark"                    319572
    2017 201712 "EU"                        4755897
    2017 201711 "Honduras"                     9113
    2017 201707 "Japan"                       16469
    2017 201706 "South Africa"                  102
    2017 201704 "United Kingdom"             837483
    2017 201709 "Japan"                       19201
    2017 201701 "EU"                        2397025
    2017 201708 "Denmark"                     41270
    2017 201711 "EU"                        5367992
    2017 201701 "United Kingdom"            1426777
    2017 201707 "United Kingdom"            3199253
    2017 201703 "EU"                        3258888
    2017 201710 "EU"                        3521723
    2017 201704 "EU"                        2048226
    2017 201705 "Denmark"                    201524
    2017 201712 "United Kingdom"            2474348
    2017 201709 "Norway"                       2554
    2017 201707 "EU"                        5736215
    2017 201702 "United Kingdom"            2473939
    2017 201705 "EU"                        4782490
    2017 201707 "Italy"                      543315
    2017 201710 "Croatia"                     21260
    2017 201705 "Singapore"                  260451
    end

  • #2
    Hi Jared,

    You should start by "pretty printing" the json.

    Code:
    import requests
    import json
    import csv
    
    url='https://comtrade.un.org/api/get?max=100000&type=C&freq=M&px=S2&ps=201001&r=all&p=156&rg=2&cc=9101'
    un_data=requests.get(url)
    json_data = json.loads(un_data.content)
    print(json.dumps(json_data, indent=4))
    Data that is neatly organized into a single table (as Stata tends to expect) is actually fairly rare within the wider world of computer programming. You might start by noticing that a dictionary element can contain any object. Focus on the "dataset" key in the top-level dictionary. I believe everything else is metadata. The value associated with the "dataset" key is a list of dictionaries. Each dictionary in the list shares the same set of keys. This makes me think that at this level, keys are column names, values are observed values, and items in the list correspond to rows in the csv. It turns out that you can transform this data structure into a .csv with the csv python package.

    Code:
    to_csv = json_data["dataset"]
    print(len(to_csv)) # this gives you the expected number of rows.
    with open('result.csv', 'w') as f:
        dict_writer = csv.DictWriter(f, to_csv[0].keys())
        dict_writer.writeheader()
        dict_writer.writerows(to_csv)
    don't forget to "import csv."

    Edit: I bet you can convert this to a Stata dataframe "in memory" if you want to, but I would say it's better to write a .csv to the disk anyway. Why rely on an API over a network to access your data if you can just have what you need on the disk?
    Last edited by Daniel Schaefer; 05 Jan 2023, 22:26.

    Comment


    • #3
      Complex object literals like this are actually one of the best things about JavaScript. JavaScript isn't an object oriented language itself, but the JavaScript object literal system and particularly object literal serialization (JSON) is probably the best interchange system out there for 99% of problems in object oriented languages. I used to send C# objects over the network all the time with JSON. JSON is undoubtably one of the best parts of JavaScript.

      Edit: or better, https://www.youtube.com/watch?v=hQVTIJBZook&t=3139s
      Last edited by Daniel Schaefer; 05 Jan 2023, 22:42.

      Comment


      • #4
        Daniel Schaefer , I swear you're a lifesaver! My only other question is... how do I quiet the output the interpreter gives? So, I'm okay with the exception handling printing. But, I don't want the 297 to repeat as often as it does. In Stata, we'd simply do qui ..... but there's no way (that I'm aware of) to do this in Python, and this page didn't exactly have a straightforward answer. My loops looks as follows
        Code:
        python:
        import requests
        import json
        import csv
        import time
        
        base = 'https://comtrade.un.org/api/get?max=100000&type=C&freq=M&px=S2&ps='
        
        
        for i in range(2010,2017):
            for x in range(1,13):
                time.sleep(2)
                if x < 10:
                    x = str(x).zfill(2)
                lst = [base,str(i),str(x),'&r=all&p=156&rg=2&cc=9101']
                api = ''.join(lst)
                un_data=requests.get(api)
                json_data = json.loads(un_data.content)                
                to_csv = json_data["dataset"]
                try:
                    with open('LuxWatch' +str(i) + str(x) + '.csv', 'w') as f:
                        dict_writer = csv.DictWriter(f, to_csv[0].keys())
                        dict_writer.writeheader()
                        dict_writer.writerows(to_csv)
                except:
                    print("No data, moving on...")
                    pass
        end
        The issue appears to be from dict_writer.writeheader(), which I want Python to perform, just without any output/number being shown.

        Comment


        • #5
          Hi Jared,

          Glad I could help.

          But, I don't want the 297 to repeat as often as it does.
          This is an interesting question. So basically, you want to suppress certain output in python. Okay. First, I think it might be helpful to understand that there is an important difference between the program that you use to write, edit and run code (e.g. Stata) and the programming language (e.g. ado). Python has a variety of programs that you can use to write code and print to the console. In practice, you often use the code editing software to print certain kinds of output, and to not print other kinds of output, and you can go into the settings to change the way the software behaves. So, when I try to reproduce this in PyCharm, nothing is actually printed to the console, and I think this is actually the usual behavior in most python editing software.

          I typically use Stata on a server provided by my institution, and it looks like our python distribution is broken in such a way that I can't use requests. So I'm afraid I can't reproduce your code there either. I may, however, be able to explain what is happening in that stack overflow post. Sorry if this post is a little longwinded, but I think all of the concepts here are worth touching on.

          Python doesn't have commands in the same way that Stata has commands - or at least, python commands are relatively rare. Python has functions and methods. All functions take some kind of input (including the empty set) between parentheses and return something (even if that something is the empty set). Methods are just functions that are attached to objects, and objects are really prepackaged data and some methods. The details here aren't really important. For our purposes, methods are just functions in an object oriented programming context. Technically, "pure functions" only take inputs and return outputs. Functions and methods can also have what are called "side effects" where they change the state of something not within their own scope. This would be like writing data to the console or to the file system. It is a "side effect" because the function isn't just returning a value, it is also mutating something outside of itself. Generally, it is actually good practice to avoid side effects (although it is much more common to have side effects in object oriented programming).

          Whereas python is more or less a hybrid functional and object oriented language (this isn't quite right, but the details aren't important), ado is much more of an imperative language. Stata has commands as a first order way of doing things and a global dataset that exists in a global environment. The entire point of many Stata commands is to preform some operation on the global dataset, usually to mutate global data or to derive some useful statistics, then print those statistics to the console in a convenient and readable way. Python wasn't designed as a statistical language, and so it doesn't assume anything in particular about what you might want to do with the return value of a function. It just passes the value up to the next level in the scope and lets you do what you like with it, including printing to the console. This point is important: the author of dict_writer.writeheader() didn't write the function in a way that prints output - the function doesn't call print, and it would be poorly written if it did. The user should define the output of a program in Python. So, as far as python is concerned, there isn't anything to do quietly. Stata on the other hand takes the return value of a function in the global environment and prints it to the console. This makes sense given how Stata is designed to work, but it is still (perhaps) annoying.

          As a quick aside, R would really like to be a purely functional language, rather than an imperative language like Stata or a hybrid language like Python. These programming language concepts are really really useful if you want to start to program in a number of different languages, since one of easiest pitfalls when learning a new language is wishing it were more like something you are already familiar with. Programming language concepts help you to understand why a particular language is structured the way that it is.

          So back to the problem at hand. Stata appears to assume that if a python return value is passed into the global environment, and you don't do anything in particular with it, it should be printed to the console. Notice in the following code the return value of the example function is printed to the console.

          Code:
          python:
          def example_function(input):
              return input + 1
          
          example_function(1)
          end
          Now, instead, let's pass the return value into a variable called return_value. Notice as well that in Stata the word "variable" typically means a column (or vector) in your dataset. In python it can be any defined primitive or object.

          Code:
          python:
          def example_function(input):
              return input + 1
          
          return_value = example_function(1)
          end
          The return value is not printed to the console! This is exactly what the stack overflow post is saying: dict_writer.writeheader() is a method. It is a function that is attached to your dict_writer object. It doesn't require you to pass anything into it because the object it is attached to already stores the headers, so you pass in the empty set (). You passed the headers into the object when you created the object on the previous line. But it still works like a function. It still takes an input (in this case, the empty set) and it still returns an output. It also has a side effect, in that it writes data to a file on your file system. So in order to suppress the output in Stata, you need to store the return value in a variable. It doesn't matter if you use the variable, you just have to put the return value somewhere besides the global environment.

          Again, I can't reproduce your code - at least not without trying to hack the python installation on the server I use for Stata, but I believe something like this will work:

          Code:
          python:
          import requests
          import json
          import csv
          import time
          
          base = 'https://comtrade.un.org/api/get?max=100000&type=C&freq=M&px=S2&ps='
          
          for i in range(2010,2017):
              for x in range(1,13):
                  time.sleep(2)
                  if x < 10:
                      x = str(x).zfill(2)
                  lst = [base,str(i),str(x),'&r=all&p=156&rg=2&cc=9101']
                  api = ''.join(lst)
                  un_data=requests.get(api)
                  json_data = json.loads(un_data.content)              
                 to_csv = json_data["dataset"]
                  try:
                      with open('LuxWatch' +str(i) + str(x) + '.csv', 'w') as f:
                          dict_writer = csv.DictWriter(f, to_csv[0].keys())
                          bytes = dict_writer.writeheader()
                          dict_writer.writerows(to_csv)
                  except:
                      print("No data, moving on...")
                      pass
          end
          A couple of notes, unrelated to your question: I'm curious why you use time.sleep() to suspend execution. I expect this is to deal with some kind of multithreading issue where you want to wait for a thread to finish executing before moving on (if so, there are probably better ways to do that) but I'm not sure which of these operations is asynchronous...

          I'm not sure I would use the try and except blocks to handle a response with no data. This implementation is probably fine, but I would prefer to explicitly test to see if I have data in my Json object with an if statement, and only try to write data if data is present. The problem with the try and except blocks is that you may run into some other exception that is not caused by the problem you assume you have. In that case you probably want to stop execution and see the error, but here you will just quietly pass the error by.
          Last edited by Daniel Schaefer; 06 Jan 2023, 12:01.

          Comment

          Working...
          X