Search for specific information within a json file

Search for specific information within a json file - python

So I am trying to locate and acquire data from an api, I am fine with actually getting the data which is in json format from the api into my python program, however I am having troubles searching through the json for the specific data I want.
Here is a basic idea of what the json file from the api looks like:
{
"data": {
"inventory": {
"object_name": {
"amount": 8,
},
(Obviously the } close, I just didn't copy them)
I am trying to locate the amount within the json file of a specific object.
So far, here is the code I have, however, I have run into the error json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
I have researched the error and it appears to be caused usually by a faulty json file, however as I have imported the json file from an api, it having issues is not the case and must be an issue with some of the converting to strings and such I have done in my code.
data = requests.get('[api]',
headers={
"[api key name]" : "[api key]"
})
dataJson = data.json()
dataStr = str(dataJson)
amt = json.loads(dataStr)['data'].get('inventory').get('object_name').get('amount')
As stated previously, the main issue I have is actually collecting the data I need from the json endpoint, everything is fine with getting the data into the python script.

dataJson = data.json() is already python dict no need to json.loads(dataStr) just use
data = requests.get('[api]',
headers={
"[api key name]" : "[api key]"
})
dataJson = data.json()
amt = dataStr['data'].get('inventory').get('object_name').get('amount')

Related

Trying to Parse a JSON using Python but having issues

I usually use Powershell and have parsed JSONs from HTTP requests, successfully, before. I am now using Python and using the 'Requests' library. I have successfully got the JSON from the API. Here is the format it came through in (I removed some information and other fields).:
{'content': [
{
'ContactCompany': Star,
'ContactEmail': test#company.star,
'ContactPhoneNumber': 123-456-7894,
'assignedGroup': 'TR_Hospital',
'assignedGroupId': 'SGP000000132297',
'serviceClass': None, 'serviceReconId': None
}
]
}
I'm having trouble getting the values inside of the 'content.' With my Powershell experience in the past, I've tried:
tickets_json = requests.get(request_url, headers=api_header).json()
Tickets_Info = tickets_json.content
for tickets in tickets_info:
tickets.assignedGroup
How do I parse the JSON to get the information inside of 'Content' in Python?

tickets_json = requests.get(request_url, headers=api_header).json()
tickets_info = tickets_json['content']
for tickets in tickets_info:
print(tickets['assignedGroup'])

How to speed up returning a 20MB Json file from a Python-Flask application?

I am trying to call an API which in turn triggers a store procedure from our sqlserver database. This is how I coded it.
class Api_Name(Resource):
def __init__(self):
pass
#classmethod
def get(self):
try:
engine = database_engine
connection = engine.connect()
sql = "DECLARE #return_value int EXEC #return_value = [dbname].[dbo].[proc_name])
return call_proc(sql, apiname, starttime, connection)
except Exception as e:
return {'message': 'Proc execution failed with error => {error}'.format(error=e)}, 400
pass
call_proc is the method where I return the JSON from database.
def call_proc(sql: str, connection):
try:
json_data = []
rv = connection.execute(sql)
for result in rv:
json_data.append(dict(zip(result.keys(), result)))
return Response(json.dumps(json_data), status=200)
except Exception as e:
return {'message': '{error}'.format(error=e)}, 400
finally:
connection.close()
The problem with the output is the way JSON is returned and the size of it.
At first the API used to take 1minute 30seconds: when the return statement was like this:
case1: return Response(json.dumps(json_data), status=200, mimetype='application/json')
After looking online, I found that the above statement is trying to prettify JSON. So I removed mimetype from the response & made it as
case2: return Response(json.dumps(json_data), status=200)
The API runs for 30seconds, although the JSON output is not aligned properly but its still JSON.
I see the output size of the JSON returned from the API is close 20MB. I observed this on postman response:
Status: 200 OK Time: 29s Size: 19MB
The difference in Json output:
case1:
[ {
"col1":"val1",
"col2":"val2"
},
{
"col1":"val1",
"col2":"val2"
}
]
case2:
[{"col1":"val1","col2":"val2"},{"col1":"val1","col2":"val2"}]
Will the difference in output from the two aforementioned cases are different ? If so, how can I fix the problem ?
If there is no difference, is there any way I speed up this further and reduce the run time further more, like compressing the JSON which I am returning ?

You can use gzip compression to make your plain text weight from Megabytes to even Kilobytes. Or even use flask-compress library for that.
Also I'd suggest to use ujson to make dump() call faster.
import gzip
from flask import make_response
import ujson as json
#app.route('/data.json')
def compress():
compression_level = 5 # of 9 max
data = [
{"col1": "val1", "col2": "val2"},
{"col1": "val1", "col2": "val2"}
]
content = gzip.compress(json.dumps(data).encode('utf8'), compression_level)
response = make_response(content)
response.headers['Content-length'] = len(content)
response.headers['Content-Encoding'] = 'gzip'
return response
Documentation:
https://docs.python.org/3/library/gzip.html
https://github.com/colour-science/flask-compress
https://pypi.org/project/ujson/

First of all, profile: if 90% the time is being spent transferring across the network then optimising processing speed is less useful than optimising transfer speed (for example, by compressing the response as wowkin recommended (though the web server may be configured to do this automatically, if you are using one)
Assuming that constructing the JSON is slow, if you control the database code you could use its JSON capabilities to serialise the data, and avoid doing it at the Python layer. For example,
SELECT col1, col2
FROM tbl
WHERE col3 > 42
FOR JSON AUTO
would give you
[
{
"col1": "foo",
"col2": 1
},
{
"col1": "bar",
"col2": 2
},
...
]
Nested structures can be created too, described in the docs.
If the requester only needs the data, return it as a download using flask's send_file feature and avoid the cost of constructing an HTML response:
from io import BytesIO
from flask import send_file
def call_proc(sql: str, connection):
try:
rv = connection.execute(sql)
json_data = rv.fetchone()[0]
# BytesIO expects encoded data; if you can get the server to encode
# the data instead it may be faster.
encoded_json = json_data.encode('utf-8')
buf = BytesIO(encoded_json)
return send_file(buf, mimetype='application/json', as_attachment=True, conditional=True)
except Exception as e:
return {'message': '{error}'.format(error=e)}, 400
finally:
connection.close()

You need to implement pagination on your API. 19MB is absurdly large and will lead to some very annoyed users.
gzip and clevererness with the JSON responses will sadly not be enough, you'll need to put in a bit more legwork.
Luckily, there's many pagination questions and answers, and Flasks modular approach to things will mean that someone probably wrote up a module that's applicable to your problem. I'd start off by re-implementing the method with an ORM. I heard that sqlalchemy is quite good.

To answer your question:
1 - Both JSON are semantically identical.
You can make use of http://www.jsondiff.com to compare two JSON.
2 - I would recommend you to make chunks of your data and send it across network.
This might help:
https://masnun.com/2016/09/18/python-using-the-requests-module-to-download-large-files-efficiently.html

TL;DR; Try restructuring your JSON payload (i.e. change schema)
I see that you are constructing the JSON response in one of your APIs. Currently, your JSON payload looks something like:
[
{
"col0": "val00",
"col1": "val01"
},
{
"col0": "val10",
"col1": "val11"
}
...
]
I suggest you restructure it in such a way that each (first level) key in your JSON represents the entire column. So, for the above case, it will become something like:
{
"col0": ["val00", "val10", "val20", ...],
"col1": ["val01", "val11", "val21", ...]
}
Here are the results from some offline test I performed.
Experiment variables:
NUMBER_OF_COLUMNS = 10
NUMBER_OF_ROWS = 100000
LENGTH_OF_STR_DATA = 5
#!/usr/bin/env python3
import json
NUMBER_OF_COLUMNS = 10
NUMBER_OF_ROWS = 100000
LENGTH_OF_STR_DATA = 5
def get_column_name(id_):
return 'col%d' % id_
def random_data():
import string
import random
return ''.join(random.choices(string.ascii_letters, k=LENGTH_OF_STR_DATA))
def get_row():
return {
get_column_name(i): random_data()
for i in range(NUMBER_OF_COLUMNS)
}
# data1 has same schema as your JSON
data1 = [
get_row() for _ in range(NUMBER_OF_ROWS)
]
with open("/var/tmp/1.json", "w") as f:
json.dump(data1, f)
def get_column():
return [random_data() for _ in range(NUMBER_OF_ROWS)]
# data2 has the new proposed schema, to help you reduce the size
data2 = {
get_column_name(i): get_column()
for i in range(NUMBER_OF_COLUMNS)
}
with open("/var/tmp/2.json", "w") as f:
json.dump(data2, f)
Comparing sizes of the two JSONs:
$ du -h /var/tmp/1.json
17M
$ du -h /var/tmp/2.json
8.6M
In this case, it almost got reduced by half.
I would suggest you do the following:
First and foremost, profile your code to see the real culprit. If it is really the payload size, proceed further.
Try to change your JSON's schema (as suggested above)
Compress your payload before sending (either from your Flask WSGI app layer or your webserver level - if you are running your Flask app behind some production grade webserver like Apache or Nginx)

For large data that you can't paginate using something like ndjson (or any type of delimited record format) can really reduce the server resources needed since you'd be preventing holding the JSON object in memory. You would need to get access to the response stream to write each object/line to the response though.
The response
[ {
"col1":"val1",
"col2":"val2"
},
{
"col1":"val1",
"col2":"val2"
}
]
Would end up looking like
{"col1":"val1","col2":"val2"}
{"col1":"val1","col2":"val2"}
This also has advantages on the client since you can parse and process each line on it's own as well.
If you aren't dealing with nested data structures responding with a CSV is going to be even smaller.

I want to note that there is a standard way to write a sequence of separate records in JSON, and it's described in RFC 7464. For each record:
Write the record separator byte (0x1E).
Write the JSON record, which is a regular JSON document that can also contain inner line breaks, in UTF-8.
Write the line feed byte (0x0A).
(Note that the JSON text sequence format, as it's called, uses a more liberal syntax for parsing text sequences of this kind; see the RFC for details.)
In your example, the JSON text sequence would look as follows, where \x1E and \x0A are the record separator and line feed bytes, respectively:
\x1E{"col1":"val1","col2":"val2"}\x0A\x1E{"col1":"val1","col2":"val2"}\x0A
Since the JSON text sequence format allows inner line breaks, you can write each JSON record as you naturally would, as in the following example:
\x1E{
"col1":"val1",
"col2":"val2"}
\x0A\x1E{
"col1":"val1",
"col2":"val2"
}\x0A
Notice that the media type for JSON text sequences is not application/json, but application/json-seq; see the RFC.

JSON Parse an element inside an element in Python

I have a JSON text grabbed from an API of a website:
{"result":"true","product":{"made":{"Taiwan":"Taipei","HongKong":"KongStore","Area":"Asia"}}}
I want to capture "Taiwan" and "Taipei" but always fail.
Here is my code:
import json
weather = urllib2.urlopen('url')
wjson = weather.read()
wjdata = json.loads(wjson)
print wjdata['product']['made'][0]['Taiwan']
I always get the following error:
Keyword 0 error
Whats the correct way to parse that json?

You are indexing an array where there are none.
The JSON is the following:
{
"result":"true",
"product": {
"made": {
"Taiwan":"Taipei",
"HongKong":"KongStore",
"Area":"Asia"
}
}
}
And the above contains no arrays.
You are assuming the JSON structure to be something like this:
{
"result":"true",
"product": {
"made": [
{"Taiwan":"Taipei"},
{"HongKong":"KongStore"},
{"Area":"Asia"}
]
}
}
From a brief look at the doc pages for the json package, I found this conversion table: Conversion table using json.loads
It tells us that a JSON object translates to a dict. And a dict has a method called keys, which returns a list of the keys.
I suggest you try something like this:
#... omitted code
objectKeys = wjdata['product']['made'].keys()
# You should now have a list of the keys stored in objectKeys.
for key in objectKeys:
print key
if key == 'Taiwan':
print 'Eureka'
I haven't tested the above code, but I think you get the gist here :)

wjdata['product']['made']['Taiwan'] works

Python 3 JSON/Dictionary help , not sure how to parse values?

I am attempting to write out some JSON output into a csv file but first i am trying to understand how the data is structured. I am working from a sample script which connects to an API and pulls down data based a query specified.
The json is returned from the server with this query:
response = api_client.get_search_results(search_id, 'application/json')
body = response.read().decode('utf-8')
body_json = json.loads(body)
If i perform a
print(body_json.keys())
i get the following output:
dict_keys(['events'])
So from this is it right to assume that the entries i am really interested in are another dictionary inside the events dictionary?
If so how can i 'access' them?
Sample JSON data the search query returns to the variable above
{
"events":[
{
"source_ip":"10.143.223.172",
"dest_ip":"104.20.251.41",
"domain":"www.theregister.co.uk",
"Domain Path":"NULL",
"Domain Query":"NULL",
"Http Method":"GET",
"Protocol":"HTTP",
"Category":"NULL",
"FullURL":"http://www.theregister.co.uk"
},
{
"source_ip":"10.143.223.172",
"dest_ip":"104.20.251.41",
"domain":"www.theregister.co.uk",
"Domain Path":"/2017/05/25/windows_is_now_built_on_git/",
"Domain Query":"NULL",
"Http Method":"GET",
"Protocol":"HTTP",
"Category":"NULL",
"FullURL":"http://www.theregister.co.uk/2017/05/25/windows_is_now_built_on_git/"
},
]
}
Any help would be greatly appreciated.

Json.keys() only returns the keys associated with json.
Here is the code:
for key in json_data.keys():
for i in range(len(json_data[key])):
key2 = json_data[key][i].keys()
for k in key2:
print k + ":" + json_data[key][i][k]
Output:
Http Method:GET
Category:NULL
domain:www.theregister.co.uk
Protocol:HTTP
Domain Query:NULL
Domain Path:NULL
source_ip:10.143.223.172
FullURL:http://www.theregister.co.uk
dest_ip:104.20.251.41
Http Method:GET
Category:NULL
domain:www.theregister.co.uk
Protocol:HTTP
Domain Query:NULL
Domain Path:/2017/05/25/windows_is_now_built_on_git/
source_ip:10.143.223.172
FullURL:http://www.theregister.co.uk/2017/05/25/windows_is_now_built_on_git/
dest_ip:104.20.251.41

To answer your question: yes. Your body_json has returned a dictionary with a key of "events" which contains a list of dictionaries.
The best way to 'access' them would be to iterate over them.
A very rudimentary example:
for i in body_json['events']:
print(i)
Of course, during the iteration you could access the specific data that you needed by replacing print(i) with print(i['FullURL'])and saving it to a variable and so on.
It's important to note that whenever you're working with API's that return a JSON response, you're simply working with dictionaries and Python data structures.
Best of luck.

Parse requests.post object

How can i parse the entire output takes from a request.post object and extract only the "id" content, considering this piece of code?
import json
import requests
API = 'https://www.googleapis.com/urlshortener/v1/url'
elem = json.dumps({'longUrl':'http://www.longurl..'})
output = requests.post(API,elem, headers = {'content-type':'application/json'})
adding output.text it gives me this:
{
"kind": "urlshortener#url",
"id": "http://goo.gl/..",
"longUrl": "http://www.longurl.."
}
now I just need to extract the link in the id field, i also tried to put the content in a file and parse it as strings with file.read() but seems not work. Any ideas?

Load it into dictionary using json module:
data = json.loads(output.text)
print data['id'] # prints http://goo.gl/O5MIi

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Search for specific information within a json file - python

dataJson = data.json() is already python dict no need to json.loads(dataStr) just use data = requests.get('[api]', headers={ "[api key name]" : "[api key]" }) dataJson = data.json() amt = dataStr['data'].get('inventory').get('object_name').get('amount')

Related

Trying to Parse a JSON using Python but having issues

How to speed up returning a 20MB Json file from a Python-Flask application?

JSON Parse an element inside an element in Python

Python 3 JSON/Dictionary help , not sure how to parse values?

Parse requests.post object

Categories

Resources