Python 3 JSON/Dictionary help, not sure how to parse values?

I am attempting to write some JSON output into a CSV file, but first I am trying to understand how the data is structured. I am working from a sample script which connects to an API and pulls down data based on a specified query.
The JSON is returned from the server with this query:
response = api_client.get_search_results(search_id, 'application/json')
body = response.read().decode('utf-8')
body_json = json.loads(body)
If I perform
print(body_json.keys())
I get the following output:
dict_keys(['events'])
So from this, is it right to assume that the entries I am really interested in are another dictionary inside the events dictionary?
If so, how can I 'access' them?
Sample JSON data the search query returns to the variable above:
{
    "events": [
        {
            "source_ip": "10.143.223.172",
            "dest_ip": "104.20.251.41",
            "domain": "www.theregister.co.uk",
            "Domain Path": "NULL",
            "Domain Query": "NULL",
            "Http Method": "GET",
            "Protocol": "HTTP",
            "Category": "NULL",
            "FullURL": "http://www.theregister.co.uk"
        },
        {
            "source_ip": "10.143.223.172",
            "dest_ip": "104.20.251.41",
            "domain": "www.theregister.co.uk",
            "Domain Path": "/2017/05/25/windows_is_now_built_on_git/",
            "Domain Query": "NULL",
            "Http Method": "GET",
            "Protocol": "HTTP",
            "Category": "NULL",
            "FullURL": "http://www.theregister.co.uk/2017/05/25/windows_is_now_built_on_git/"
        }
    ]
}
Any help would be greatly appreciated.

json_data.keys() only returns the top-level keys of the parsed JSON dictionary.
Here is the code:
for key in json_data.keys():
    for i in range(len(json_data[key])):
        key2 = json_data[key][i].keys()
        for k in key2:
            print(k + ":" + json_data[key][i][k])
Output:
Http Method:GET
Category:NULL
domain:www.theregister.co.uk
Protocol:HTTP
Domain Query:NULL
Domain Path:NULL
source_ip:10.143.223.172
FullURL:http://www.theregister.co.uk
dest_ip:104.20.251.41
Http Method:GET
Category:NULL
domain:www.theregister.co.uk
Protocol:HTTP
Domain Query:NULL
Domain Path:/2017/05/25/windows_is_now_built_on_git/
source_ip:10.143.223.172
FullURL:http://www.theregister.co.uk/2017/05/25/windows_is_now_built_on_git/
dest_ip:104.20.251.41

To answer your question: yes. Your body_json is a dictionary with a key of "events", which contains a list of dictionaries.
The best way to 'access' them would be to iterate over them.
A very rudimentary example:
for i in body_json['events']:
    print(i)
Of course, during the iteration you could access the specific data that you need by replacing print(i) with print(i['FullURL']), saving values to variables, and so on.
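And since your original goal was a CSV file, here is a minimal sketch of that step using the standard csv module, assuming every dict in body_json['events'] has the same keys and writing to a hypothetical events.csv:
import csv

events = body_json['events']
# DictWriter maps each event dict onto one CSV row; the field names are
# taken from the first event, assuming all events share the same keys.
with open('events.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=list(events[0].keys()))
    writer.writeheader()
    writer.writerows(events)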
It's important to note that whenever you're working with APIs that return a JSON response, you're simply working with dictionaries and Python data structures.
Best of luck.

Related

Search for specific information within a json file

So I am trying to locate and acquire data from an API. I am fine with actually getting the data, which is in JSON format, from the API into my Python program; however, I am having trouble searching through the JSON for the specific data I want.
Here is a basic idea of what the json file from the api looks like:
{
    "data": {
        "inventory": {
            "object_name": {
                "amount": 8,
            },
(Obviously the braces close; I just didn't copy them all.)
I am trying to locate the amount within the json file of a specific object.
So far, here is the code I have; however, I have run into the error json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
I have researched the error, and it usually appears to be caused by a faulty JSON file. However, since I pulled the JSON straight from an API, a broken file is not the cause; it must be an issue with some of the converting to strings I have done in my code.
data = requests.get('[api]',
                    headers={
                        "[api key name]": "[api key]"
                    })
dataJson = data.json()
dataStr = str(dataJson)
amt = json.loads(dataStr)['data'].get('inventory').get('object_name').get('amount')
As stated previously, the main issue I have is actually collecting the data I need from the json endpoint, everything is fine with getting the data into the python script.
data.json() already returns a Python dict, so there is no need for json.loads(dataStr). (In fact, str(dataJson) produces single-quoted Python repr text rather than valid JSON, which is exactly what raises that JSONDecodeError.) Just use:
data = requests.get('[api]',
                    headers={
                        "[api key name]": "[api key]"
                    })
dataJson = data.json()
amt = dataJson['data'].get('inventory').get('object_name').get('amount')
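If any of the intermediate keys might be missing, a slightly more defensive sketch chains .get() with empty-dict defaults, so a missing level yields None instead of an AttributeError part-way down the chain:
amt = dataJson.get('data', {}).get('inventory', {}).get('object_name', {}).get('amount')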

Extract fields from a JSON in Python (Django)

Hello, I am new to the Python world and still learning. I am currently developing a web app in Django and using Ajax to send requests. In views.py I receive a JSON payload, from which I have not been able to extract the attributes individually to send to a SQL query. I have tried every way I can think of; I appreciate any help in advance.
def profesionales(request):
    body_unicode = request.body.decode('utf-8')
    received_json = json.loads(body_unicode)
    data = JsonResponse(received_json, safe=False)
    return data
This returns the following to me:
{opcion: 2, fecha_ini: "2021-02-01", fecha_fin: "2021-02-08", profesional: "168", sede: "Modulo 7", grafico: "2"}
This is the answer I get, and I need to extract the value of each key into a variable.
You can treat this as a dict:
for key in received_json:
    print(key, received_json[key])
    # do your stuff here
but if it's always an object with the same keys (fixed keys), you can access them directly:
key_data = received_json['opcion']  # or any other fixed key name
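For instance, a small sketch that pulls each value of the payload shown above into its own variable (assuming those keys are always present):
opcion = received_json['opcion']
fecha_ini = received_json['fecha_ini']
fecha_fin = received_json['fecha_fin']
profesional = received_json['profesional']
sede = received_json['sede']
grafico = received_json['grafico']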

How to speed up returning a 20MB Json file from a Python-Flask application?

I am trying to call an API which in turn triggers a stored procedure in our SQL Server database. This is how I coded it:
class Api_Name(Resource):
    def __init__(self):
        pass

    @classmethod
    def get(self):
        try:
            engine = database_engine
            connection = engine.connect()
            sql = "DECLARE @return_value int; EXEC @return_value = [dbname].[dbo].[proc_name]"
            return call_proc(sql, connection)
        except Exception as e:
            return {'message': 'Proc execution failed with error => {error}'.format(error=e)}, 400
call_proc is the method where I return the JSON from the database:
def call_proc(sql: str, connection):
    try:
        json_data = []
        rv = connection.execute(sql)
        for result in rv:
            json_data.append(dict(zip(result.keys(), result)))
        return Response(json.dumps(json_data), status=200)
    except Exception as e:
        return {'message': '{error}'.format(error=e)}, 400
    finally:
        connection.close()
The problem with the output is the way the JSON is returned, and its size.
At first the API used to take 1 minute 30 seconds, when the return statement was like this:
case1: return Response(json.dumps(json_data), status=200, mimetype='application/json')
After looking online, I found that the above statement was trying to prettify the JSON. So I removed mimetype from the response and made it:
case2: return Response(json.dumps(json_data), status=200)
The API now runs for 30 seconds; the JSON output is not aligned properly, but it's still JSON.
I see that the output size of the JSON returned from the API is close to 20 MB. I observed this in the Postman response:
Status: 200 OK Time: 29s Size: 19MB
The difference in JSON output:
case1:
[
    {
        "col1": "val1",
        "col2": "val2"
    },
    {
        "col1": "val1",
        "col2": "val2"
    }
]
case2:
[{"col1":"val1","col2":"val2"},{"col1":"val1","col2":"val2"}]
Is the output from the two aforementioned cases actually different? If so, how can I fix the problem?
If there is no difference, is there any way to speed this up and reduce the run time further, for example by compressing the JSON which I am returning?
You can use gzip compression to shrink your plain text from megabytes down to kilobytes, or use the flask-compress library for that.
I'd also suggest using ujson to make the dumps() call faster.
import gzip
from flask import make_response
import ujson as json

@app.route('/data.json')
def compress():
    compression_level = 5  # out of 9 max
    data = [
        {"col1": "val1", "col2": "val2"},
        {"col1": "val1", "col2": "val2"}
    ]
    content = gzip.compress(json.dumps(data).encode('utf8'), compression_level)
    response = make_response(content)
    response.headers['Content-Length'] = len(content)
    response.headers['Content-Encoding'] = 'gzip'
    return response
Documentation:
https://docs.python.org/3/library/gzip.html
https://github.com/colour-science/flask-compress
https://pypi.org/project/ujson/
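For reference, the flask-compress setup is essentially a one-liner; a minimal sketch, assuming a standard Flask app object:
from flask import Flask
from flask_compress import Compress

app = Flask(__name__)
Compress(app)  # gzip-compresses eligible responses automatically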
First of all, profile: if 90% of the time is being spent transferring across the network, then optimising processing speed is less useful than optimising transfer speed (for example, by compressing the response as wowkin recommended, though the web server may be configured to do this automatically if you are using one).
Assuming that constructing the JSON is slow, if you control the database code you could use its JSON capabilities to serialise the data, and avoid doing it at the Python layer. For example,
SELECT col1, col2
FROM tbl
WHERE col3 > 42
FOR JSON AUTO
would give you
[
    {
        "col1": "foo",
        "col2": 1
    },
    {
        "col1": "bar",
        "col2": 2
    },
    ...
]
Nested structures can be created too, as described in the docs.
If the requester only needs the data, return it as a download using Flask's send_file feature and avoid the cost of constructing an HTML response:
from io import BytesIO
from flask import send_file

def call_proc(sql: str, connection):
    try:
        rv = connection.execute(sql)
        json_data = rv.fetchone()[0]
        # BytesIO expects encoded data; if you can get the server to encode
        # the data instead, it may be faster.
        encoded_json = json_data.encode('utf-8')
        buf = BytesIO(encoded_json)
        return send_file(buf, mimetype='application/json', as_attachment=True, conditional=True)
    except Exception as e:
        return {'message': '{error}'.format(error=e)}, 400
    finally:
        connection.close()
You need to implement pagination on your API. 19 MB is absurdly large and will lead to some very annoyed users.
gzip and cleverness with the JSON responses will sadly not be enough; you'll need to put in a bit more legwork.
Luckily, there are many pagination questions and answers, and Flask's modular approach means someone has probably written a module that's applicable to your problem. I'd start off by re-implementing the method with an ORM. I hear that SQLAlchemy is quite good.
To answer your questions:
1 - Both JSON outputs are semantically identical. You can use http://www.jsondiff.com to compare two JSON documents.
2 - I would recommend making chunks of your data and sending them across the network. This might help:
https://masnun.com/2016/09/18/python-using-the-requests-module-to-download-large-files-efficiently.html
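The gist of that link is to stream the body in chunks on the client side instead of loading all 20 MB at once; a minimal sketch, with a placeholder URL:
import requests

# Placeholder URL; stream=True defers the body download until iteration.
with requests.get('http://example.com/data.json', stream=True) as r:
    with open('data.json', 'wb') as f:
        for chunk in r.iter_content(chunk_size=8192):
            f.write(chunk)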
TL;DR: try restructuring your JSON payload (i.e. change the schema).
I see that you are constructing the JSON response in one of your APIs. Currently, your JSON payload looks something like:
[
    {
        "col0": "val00",
        "col1": "val01"
    },
    {
        "col0": "val10",
        "col1": "val11"
    },
    ...
]
I suggest you restructure it in such a way that each (first level) key in your JSON represents the entire column. So, for the above case, it will become something like:
{
    "col0": ["val00", "val10", "val20", ...],
    "col1": ["val01", "val11", "val21", ...]
}
Here are the results from some offline tests I performed.
Experiment variables:
NUMBER_OF_COLUMNS = 10
NUMBER_OF_ROWS = 100000
LENGTH_OF_STR_DATA = 5
#!/usr/bin/env python3

import json

NUMBER_OF_COLUMNS = 10
NUMBER_OF_ROWS = 100000
LENGTH_OF_STR_DATA = 5

def get_column_name(id_):
    return 'col%d' % id_

def random_data():
    import string
    import random
    return ''.join(random.choices(string.ascii_letters, k=LENGTH_OF_STR_DATA))

def get_row():
    return {
        get_column_name(i): random_data()
        for i in range(NUMBER_OF_COLUMNS)
    }

# data1 has the same schema as your JSON
data1 = [
    get_row() for _ in range(NUMBER_OF_ROWS)
]
with open("/var/tmp/1.json", "w") as f:
    json.dump(data1, f)

def get_column():
    return [random_data() for _ in range(NUMBER_OF_ROWS)]

# data2 has the new proposed schema, to help you reduce the size
data2 = {
    get_column_name(i): get_column()
    for i in range(NUMBER_OF_COLUMNS)
}
with open("/var/tmp/2.json", "w") as f:
    json.dump(data2, f)
Comparing sizes of the two JSONs:
$ du -h /var/tmp/1.json
17M
$ du -h /var/tmp/2.json
8.6M
In this case, it almost got reduced by half.
I would suggest you do the following:
First and foremost, profile your code to see the real culprit. If it is really the payload size, proceed further.
Try changing your JSON's schema (as suggested above).
Compress your payload before sending, either from your Flask WSGI app layer or at your web server level, if you are running your Flask app behind a production-grade web server like Apache or Nginx.
For large data that you can't paginate, using something like ndjson (or any type of delimited record format) can really reduce the server resources needed, since you avoid holding the whole JSON object in memory. You would need access to the response stream to write each object/line to the response, though.
The response
[
    {
        "col1": "val1",
        "col2": "val2"
    },
    {
        "col1": "val1",
        "col2": "val2"
    }
]
Would end up looking like
{"col1":"val1","col2":"val2"}
{"col1":"val1","col2":"val2"}
This also has advantages on the client, since you can parse and process each line on its own.
If you aren't dealing with nested data structures, responding with a CSV is going to be even smaller.
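On the Flask side, a generator-based response is one way to produce that line-delimited output without holding everything in memory; a rough sketch, where rows stands in for your database results:
import json
from flask import Response

def ndjson_response(rows):
    def generate():
        # One complete JSON document per line; Flask streams each
        # yielded chunk to the client as it is produced.
        for row in rows:
            yield json.dumps(row) + '\n'
    return Response(generate(), mimetype='application/x-ndjson')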
I want to note that there is a standard way to write a sequence of separate records in JSON, and it's described in RFC 7464. For each record:
Write the record separator byte (0x1E).
Write the JSON record, which is a regular JSON document that can also contain inner line breaks, in UTF-8.
Write the line feed byte (0x0A).
(Note that the JSON text sequence format, as it's called, uses a more liberal syntax for parsing text sequences of this kind; see the RFC for details.)
In your example, the JSON text sequence would look as follows, where \x1E and \x0A are the record separator and line feed bytes, respectively:
\x1E{"col1":"val1","col2":"val2"}\x0A\x1E{"col1":"val1","col2":"val2"}\x0A
Since the JSON text sequence format allows inner line breaks, you can write each JSON record as you naturally would, as in the following example:
\x1E{
"col1":"val1",
"col2":"val2"}
\x0A\x1E{
"col1":"val1",
"col2":"val2"
}\x0A
Notice that the media type for JSON text sequences is not application/json, but application/json-seq; see the RFC.
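A minimal sketch of a writer for this format, following the RFC's framing rules:
import json

RS = b'\x1e'  # record separator byte
LF = b'\x0a'  # line feed byte

def write_json_seq(records, fp):
    # fp must be opened in binary mode; each record becomes RS + JSON + LF.
    for record in records:
        fp.write(RS + json.dumps(record).encode('utf-8') + LF)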

JSON Parse an element inside an element in Python

I have a JSON text grabbed from an API of a website:
{"result":"true","product":{"made":{"Taiwan":"Taipei","HongKong":"KongStore","Area":"Asia"}}}
I want to capture "Taiwan" and "Taipei" but I always fail.
Here is my code:
import json
import urllib2

weather = urllib2.urlopen('url')
wjson = weather.read()
wjdata = json.loads(wjson)
print wjdata['product']['made'][0]['Taiwan']
I always get the following error:
KeyError: 0
What's the correct way to parse that JSON?
You are indexing into an array where there is none.
The JSON is the following:
{
    "result": "true",
    "product": {
        "made": {
            "Taiwan": "Taipei",
            "HongKong": "KongStore",
            "Area": "Asia"
        }
    }
}
And the above contains no arrays.
You are assuming the JSON structure to be something like this:
{
    "result": "true",
    "product": {
        "made": [
            {"Taiwan": "Taipei"},
            {"HongKong": "KongStore"},
            {"Area": "Asia"}
        ]
    }
}
From a brief look at the doc pages for the json package, I found its conversion table for json.loads.
It tells us that a JSON object translates to a dict, and a dict has a method called keys, which returns a list of the keys.
I suggest you try something like this:
# ... omitted code
objectKeys = wjdata['product']['made'].keys()
# You should now have a list of the keys stored in objectKeys.
for key in objectKeys:
    print key
    if key == 'Taiwan':
        print 'Eureka'
I haven't tested the above code, but I think you get the gist here :)
Simply wjdata['product']['made']['Taiwan'] works.
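And if you need the key and the value together ("Taiwan" and "Taipei"), iterating over the dict's items gives you both; a small sketch:
for country, city in wjdata['product']['made'].items():
    if country == 'Taiwan':
        print(country + ': ' + city)  # prints Taiwan: Taipei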

Generating json in python for app engine

I am somewhat new to Python and I am wondering what the best way is to generate JSON in a loop. I could just mash a bunch of strings together in the loop, but I'm sure there is a better way. Here are some more specifics: I am using App Engine in Python to create a service that returns JSON as a response.
So as an example, let's say someone requests a list of user records from the service. After the service queries for the records, it needs to return JSON for each record it found. Maybe something like this:
{records:
    {record: { name: bob, email: blah@blah.com, age: 25 } },
    {record: { name: steve, email: blah@blahblah.com, age: 30 } },
    {record: { name: jimmy, email: blah@b.com, age: 31 } },
}
Excuse my poorly formatted json. Thanks for your help.
Creating your own JSON is silly. Use json or simplejson for this instead.
>>> json.dumps(dict(foo=42))
'{"foo": 42}'
My question is how do I add to the dictionary dynamically? So for each record in my list of records, add a record to the dictionary.
You may be looking to create a list of dictionaries.
records = []
record1 = {"name": "Bob", "email": "bob@email.com"}
records.append(record1)
record2 = {"name": "Bob2", "email": "bob2@email.com"}
records.append(record2)
Then in App Engine, use the code above to export records as JSON.
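For example, a sketch of that export step with the records list from above:
import json

print(json.dumps({"records": records}))
# e.g. {"records": [{"name": "Bob", "email": "bob@email.com"}, ...]}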
A few steps here.
First, import simplejson:
from django.utils import simplejson
Then create a function that will return JSON with the appropriate Content-Type header:
def write_json(self, data):
    self.response.headers['Content-Type'] = 'application/json'
    self.response.out.write(simplejson.dumps(data))
Then, from within your post or get handler, create a Python dictionary with the desired data and pass it into the function you created:
ret = {"records": {
    "record": {"name": "bob", ...}
    ...
}}
write_json(self, ret)
