Python Check if Key/Value exists in JSON output - python

I have a JSON output and I want to create an IF statement so that if it contains the value I am looking for, then do something, ELSE do something else.
JSON Blob 1
[
{
"domain":"www.slatergordon.co.uk",
"displayed_link":"https://www.slatergordon.co.uk/",
"description":"We Work With Thousands Of People Across The UK In All Areas Of Personal Legal Services. Regardless Of How You Have Been Injured Through Negligence We're Here To Help You. Personal Injury Experts.",
"position":1,
"block_position":"top",
"title":"Car Claims Solicitors - No Win No Fee Solicitors - SlaterGordon.co.uk",
"link":"https://www.slatergordon.co.uk/personal-injury-claim/road-traffic-accidents-solicitors/",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABABGgJwdg&sig=AOD64_3u1ct0jmXAnvemxFHh_tfK5UK8Xg&q&adurl"
},
{
"is_phone_ad":true,
"phone_number":"0333 358 0496",
"domain":"www.accident-claimsline.co.uk",
"displayed_link":"http://www.accident-claimsline.co.uk/",
"description":"Car Insurance Claims Advice - Car Accident Claims Helpline",
"sitelinks":[
{
"title":"Replacement Vehicle Hire",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABALGgJwdg&ae=2&sig=AOD64_20YjAoyMY_c6XVTnBU1vQAD2tDTA&q=&ved=2ahUKEwjvlM-djLDwAhVmJzQIHSZHDLEQvrcBegQIBRAM&adurl="
},
{
"title":"Request a Call Back",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABAOGgJwdg&ae=2&sig=AOD64_36-Pd831AXrPbh1yvUyTbhXH2irg&q=&ved=2ahUKEwjvlM-djLDwAhVmJzQIHSZHDLEQvrcBegQIBRAN&adurl="
}
],
"position":6,
"block_position":"bottom",
"title":"Car Insurance Claims Advice - Car Accident Claims Helpline",
"link":"http://www.accident-claimsline.co.uk/",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABAGGgJwdg&ae=2&sig=AOD64_09pMtWxFo9s8c1dL16NJo5ThOlrg&q&adurl"
}
]
JSON Blob 2
[
{
"domain":"www.slatergordon.co.uk",
"displayed_link":"https://www.slatergordon.co.uk/",
"description":"We Work With Thousands Of People Across The UK In All Areas Of Personal Legal Services. Regardless Of How You Have Been Injured Through Negligence We're Here To Help You. Personal Injury Experts.",
"position":1,
"block_position":"top",
"title":"Car Claims Solicitors - No Win No Fee Solicitors - SlaterGordon.co.uk",
"link":"https://www.slatergordon.co.uk/personal-injury-claim/road-traffic-accidents-solicitors/",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABABGgJwdg&sig=AOD64_3u1ct0jmXAnvemxFHh_tfK5UK8Xg&q&adurl"
},
{
"is_phone_ad":true,
"phone_number":"0333 358 0496",
"domain":"www.accident-claimsline.co.uk",
"displayed_link":"http://www.accident-claimsline.co.uk/",
"description":"Car Insurance Claims Advice - Car Accident Claims Helpline",
"sitelinks":[
{
"title":"Replacement Vehicle Hire",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABALGgJwdg&ae=2&sig=AOD64_20YjAoyMY_c6XVTnBU1vQAD2tDTA&q=&ved=2ahUKEwjvlM-djLDwAhVmJzQIHSZHDLEQvrcBegQIBRAM&adurl="
},
{
"title":"Request a Call Back",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABAOGgJwdg&ae=2&sig=AOD64_36-Pd831AXrPbh1yvUyTbhXH2irg&q=&ved=2ahUKEwjvlM-djLDwAhVmJzQIHSZHDLEQvrcBegQIBRAN&adurl="
}
],
"position":6,
"block_position":"top",
"title":"Car Insurance Claims Advice - Car Accident Claims Helpline",
"link":"http://www.accident-claimsline.co.uk/",
"tracking_link":"https://www.google.co.uk/aclk?sa=l&ai=DChcSEwj8-NSdjLDwAhXBEH0KHRYwA1MYABAGGgJwdg&ae=2&sig=AOD64_09pMtWxFo9s8c1dL16NJo5ThOlrg&q&adurl"
}
]
Desired Output
if "block_position":"bottom" in JSONBlob:
    do something
else:
    do something else
but I can't seem to get it to trigger. I want it to search through the entire output: if it contains that key/value pair, do something, and if it doesn't, do something else.
Blob 1 would go down the IF path
Blob 2 would go down the else path

The main problem here is that the JSON output is a list (array) containing two objects. Since the block_position key can appear in any of the inner objects, you could do something like this:
if any(obj.get('block_position') == 'bottom' for obj in JSONBlob):
    print('I do something')
else:
    print('I do something else')
EDIT 1: OK, I think I got your point. You only need to do something for each object with block_position set to bottom. Then the following should do it:
for obj in JSONBlob:
    if obj.get('block_position') == 'bottom':
        print('I do something with the object')
    else:
        print('I do something else with the object')
EDIT 2: As discussed, if you only want to do something with the objects that have block_position set to bottom, you can drop the else clause as follows:
for obj in JSONBlob:
    if obj.get('block_position') == 'bottom':
        print('I do something with the object')
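For completeness, here is a minimal end-to-end sketch of the any() approach, assuming the JSON arrives as a string and is parsed with json.loads first. The helper name has_pair and the shortened blobs are made up for illustration; only the block_position values mirror the question.

```python
import json


def has_pair(items, key, value):
    """Return True if any dict in the list maps key to value."""
    return any(isinstance(obj, dict) and obj.get(key) == value for obj in items)


# Shortened stand-ins for the two blobs in the question.
blob_1 = json.loads('[{"position": 1, "block_position": "top"},'
                    ' {"position": 6, "block_position": "bottom"}]')
blob_2 = json.loads('[{"position": 1, "block_position": "top"},'
                    ' {"position": 6, "block_position": "top"}]')

for name, blob in (("Blob 1", blob_1), ("Blob 2", blob_2)):
    if has_pair(blob, "block_position", "bottom"):
        print(name, "-> IF path")
    else:
        print(name, "-> ELSE path")
```

Note that the check runs on the parsed list, not on the raw JSON text, which is why the original string-style test never triggered.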

You can use the JMESPath library. It's a query language for JSON.
A basic JMESPath expression for your case would be [?block_position=='bottom'] (the value must be quoted, otherwise it is read as a field name). This filters the list down to the matching objects.
I tried it online here with the data you provided.
If you are looking for a more deeply nested node, you only have to alter the expression to target that node.
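If adding a dependency is not an option, a nested search can also be sketched in plain Python. This is only an illustration of the idea, not JMESPath itself: contains_pair is a hypothetical helper that walks dicts and lists recursively, and the sample data is a trimmed version of the blobs above.

```python
def contains_pair(node, key, value):
    """Return True if key == value occurs anywhere inside nested dicts/lists."""
    if isinstance(node, dict):
        if key in node and node[key] == value:
            return True
        return any(contains_pair(child, key, value) for child in node.values())
    if isinstance(node, list):
        return any(contains_pair(child, key, value) for child in node)
    return False


# Trimmed sample: one object with a nested "sitelinks" list.
data = [
    {
        "block_position": "top",
        "sitelinks": [
            {"title": "Replacement Vehicle Hire"},
            {"title": "Request a Call Back"},
        ],
    },
]

print(contains_pair(data, "title", "Request a Call Back"))  # True
print(contains_pair(data, "block_position", "bottom"))      # False
```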

Regex for specific Pattern

I have this text:
What can cause skidding on bends?
All of the following can:
SA Faulty shock-absorbers
SA Insufficient or uneven tyre pressure
[| Load is too small
What can cause a dangerous situation?
SA Brakes which engage heavily on one side
SA Too much steering-wheel play
[| Disturbed reception of traffic information on the radio
It starts raining. Why must you immediately increase the safe distance?
What is correct
[| Because the brakes react more quickly
SA Because a greasy film may form which increases the braking distance
SA Because a second greasy film may form which increases the braking distance
What is the text about?
Above are multiple choice questions with multiple options.
The question stem almost always ends with '?', but sometimes there is additional text before the options start.
Every option starts with either 'SA' or '[|'. Options that start with 'SA' are correct, and options that start with '[|' or '[]' are wrong.
What I want to Do
I want to split the questions and all of their options and save them into a Python dictionary/list, ideally as key/value pairs:
{'ques': 'blalal','opt1':'this is option one', 'option2': 'this is option two'} and so on
What I have tried
rx = r'.*\?$\s*\w*(?:SA|\[\|)'
this is Reg101 link
Assuming you have three options at all times:
import re
# string holds the question text shown above
p = r'(?m)^(?P<ques>\w[^?]*\?)[\s\S]*?^(?P<opt1>(?:SA|\[(?:\||\s])).*)\s+^(?P<opt2>(?:SA|\[(?:\||\s])).*)\s+^(?P<opt3>(?:SA|\[(?:\||\s])).*)'
dt = [x.groupdict() for x in re.finditer(p, string)]
See regex proof and Python proof.
Results:
[{'ques': 'What can cause skidding on bends?', 'opt1': 'SA Faulty shock-absorbers', 'opt2': 'SA Insufficient or uneven tyre pressure', 'opt3': '[| Load is too small'}, {'ques': 'What can cause a dangerous situation?', 'opt1': 'SA Brakes which engage heavily on one side', 'opt2': 'SA Too much steering-wheel play', 'opt3': '[| Disturbed reception of traffic information on the radio'}, {'ques': 'It starts raining. Why must you immediately increase the safe distance?', 'opt1': '[| Because the brakes react more quickly', 'opt2': 'SA Because a greasy film may form which increases the braking distance', 'opt3': 'SA Because a second greasy film may form which increases the braking distance'}]
This is one of the cases that I would recommend not using regex since it can get very complex very fast. My solution would be the following parser:
def parse(fname = "/tmp/data.txt"):
    questions = []
    with open(fname) as f:
        for line in f:
            lstrip = line.strip()
            # Skip empty lines
            if not lstrip:
                continue
            # Check for Questions
            is_option = (
                lstrip.startswith("[]")
                or lstrip.startswith("[|")
                or lstrip.startswith("SA")
            )
            if not is_option:
                # Here we know that this line is not empty and is not
                # an option... We have two options:
                # 1. This is continuation of the last question
                # 2. This is a new question
                if not questions or questions[-1]["options"]:
                    # Last questions has options, this is a new question!
                    questions.append({
                        "ques": [lstrip],
                        "options": []
                    })
                else:
                    # We are still parsing the questions part. Add a new line
                    questions[-1]["ques"].append(lstrip)
                # We are done with the question part, move on
                continue
            # We are only here if we are parsing options!
            is_correct = lstrip.startswith("SA")
            # We _must_ have at least one question
            assert questions
            # Add the option
            questions[-1]["options"].append({
                "option": lstrip,
                "correct": is_correct,
                "number": len(questions[-1]["options"]) + 1,
            })
    # End of with
    return questions
An example usage of the above and its output:
# main
data = parse()
# json just for pretty printing
import json
print(json.dumps(data, indent=4))
---
$ python3 ~/tmp/so.py
[
{
"ques": [
"What can cause skidding on bends?",
"All of the following can:"
],
"options": [
{
"option": "SA Faulty shock-absorbers",
"correct": true,
"number": 1
},
{
"option": "SA Insufficient or uneven tyre pressure",
"correct": true,
"number": 2
},
{
"option": "[| Load is too small",
"correct": false,
"number": 3
}
]
},
{
"ques": [
"What can cause a dangerous situation?"
],
"options": [
{
"option": "SA Brakes which engage heavily on one side",
"correct": true,
"number": 1
},
{
"option": "SA Too much steering-wheel play",
"correct": true,
"number": 2
},
{
"option": "[| Disturbed reception of traffic information on the radio",
"correct": false,
"number": 3
}
]
},
{
"ques": [
"It starts raining. Why must you immediately increase the safe distance?",
"What is correct"
],
"options": [
{
"option": "[| Because the brakes react more quickly",
"correct": false,
"number": 1
},
{
"option": "SA Because a greasy film may form which increases the braking distance",
"correct": true,
"number": 2
},
{
"option": "SA Because a second greasy film may form which increases the braking distance",
"correct": true,
"number": 3
}
]
}
]
There are a few advantages to using a custom parser instead of regex:
A lot more readable (think about what you would like to read when you come back to this project in 6 months :) )
More control on which lines you keep or how you trim them
Easier to deal with bad input data (debug them using logging)
That said, data are rarely perfect, and in most cases a few workarounds might be required to get the desired output. For example, in your original data, "All of the following can:" does not look like an option, since it does not start with any of the option prefixes. However, it does not look like part of the question to me either! You will have to deal with such cases based on your dataset (and doing so in regex will be a lot harder). In this particular case you can:
Only consider part of the question anything that ends with ? (problematic in 3rd question)
Treat lines starting with "None" or "All" as options
etc
The exact solution depends on your data quality/cases but the code above should be easy to adjust in most cases
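As a rough sketch of how the second workaround might be wired in, here is a string-based variant of the parser with a pluggable list of extra option prefixes. The names parse_text, OPTION_PREFIXES and extra_prefixes are made up for this example, and marking unlabelled options with correct=None is an assumption, not something the original data defines.

```python
OPTION_PREFIXES = ("SA", "[|", "[]")


def parse_text(text, extra_prefixes=()):
    """Parse questions from a string; extra_prefixes are also treated as options."""
    prefixes = OPTION_PREFIXES + tuple(extra_prefixes)
    questions = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(prefixes):
            if not questions:
                raise ValueError("option found before any question: " + line)
            # Assumption: lines without a known marker get correct=None.
            if line.startswith("SA"):
                correct = True
            elif line.startswith(("[|", "[]")):
                correct = False
            else:
                correct = None
            options = questions[-1]["options"]
            options.append({"option": line, "correct": correct,
                            "number": len(options) + 1})
        elif not questions or questions[-1]["options"]:
            # The previous question already has options: start a new one.
            questions.append({"ques": [line], "options": []})
        else:
            # Still inside the question stem.
            questions[-1]["ques"].append(line)
    return questions


text = """What can cause skidding on bends?
All of the following can:
SA Faulty shock-absorbers
SA Insufficient or uneven tyre pressure
[| Load is too small
"""

default = parse_text(text)
relaxed = parse_text(text, extra_prefixes=("All", "None"))
print(len(default[0]["ques"]), len(default[0]["options"]))  # 2 3
print(len(relaxed[0]["ques"]), len(relaxed[0]["options"]))  # 1 4
```

Whether the relaxed behaviour is an improvement depends on the dataset: a question stem that itself starts with "All" or "None" would be misread as an option.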

Get n number of documents from a collection using MongoDB/MongoEngine

Hi everyone I have a document inside a collection like this. (Ignore the absurdity of the question).
[
{
"tag": "english",
"difficulty": "hard",
"question": "What are alphabets",
"option_1": "98 billion light years",
"option_2": "23.3 trillion light years",
"option_3": "6 minutes",
"option_4": "It is still unknown",
"correct_answer": "option_1",
"id": "5f80befbaaf3c9ce2f4e2fb9"
}
]
There are multiple documents such as this one (10000).
I'm trying to write a Python GET function using flask-restful to get n documents from this collection.
Currently, I'm confused about how to write the MongoEngine query.
This is what I do to get a single document based on its id.
def get(self,id):
    questions = Question.objects.get(id=id).to_json()
    return Response(questions,
                    mimetype="application/json",
                    status = 200)
For n documents, I'm unable to figure out what to write inside.
def get_n_questions(self,n):
    body = request.get_json(force =True)
    questions = ???
    return Response(questions,
                    mimetype="application/json",
                    status = 200)
You can use the limit(n) method on a queryset. This will let you retrieve the first n documents from the collection.
In your case that would mean:
questions = Question.objects().limit(n).to_json()
You may also be interested in the skip(n) method, which allows you to do pagination (similar to limit/offset in MySQL, for instance).
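To illustrate how skip and limit combine for pagination, here is a small sketch. page_window is a hypothetical helper, and a plain list stands in for the collection so the snippet runs without a database; the MongoEngine call is shown only as a comment.

```python
def page_window(page, per_page):
    """Translate a 1-based page number into (skip, limit) values."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    return (page - 1) * per_page, per_page


# Stand-in for the collection: 25 fake documents.
documents = [{"id": i} for i in range(1, 26)]

skip, limit = page_window(page=3, per_page=10)

# With MongoEngine the equivalent query would be:
#   Question.objects.skip(skip).limit(limit).to_json()
page_items = documents[skip:skip + limit]
print([d["id"] for d in page_items])  # [21, 22, 23, 24, 25]
```

The last page is simply shorter than per_page, which matches how limit behaves when fewer documents remain.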

Python Flask - Get values from an API and print it to HTML

I asked a question about this earlier today, but a few hours ago I realized that there is a new API for what I am trying to make. Now the problem is that I need to get every product name, sell price and buy price, and a few more things, into my website. This is how far I have gotten:
import requests
from flask import Flask, render_template
full_list = list()
app = Flask(__name__)
f = requests.get(
    "https://api.hypixel.net/skyblock/bazaar?key=[key is supposed to be secret]").json()
for x in product:
    buyPrice = f["products"][x]["buy_summary"][0]["pricePerUnit"]
@app.route('/')
def price():
    return render_template("index.html", product=product, buyprice=buyPrice)
if __name__ == "__main__":
    app.run(debug=True)
The product API looks a bit like this, I can't post it all because it's very big:
{
"products": {
"product_id": "BROWN_MUSHROOM",
"sell_summary": [
"amount": 3865,
"pricerPerUnit": 14.8,
"orders": 2
],
"buy_summary": [
"amount": 704,
"pricerPerUnit": 15.8,
"orders": 1
],
"quick_status": {
"productId": "BROWN_MUSHROOM",
"sellPrice": 14.527416975007378,
"sellVolume": 915286,
"sellMovingWeek": 23745501,
"sellOrders": 40,
"buyPrice": 15.898423574723452,
"buyVolume": 673646,
"buyMovingWeek": 8011243,
"buyOrders": 54
}
}
Now what I want is "product_id", which could either be grabbed from the beginning or from the "quick_status", I also want pricePerUnit, Amount and Orders from buy/sell_summary.
How do I do this? I have tried to store all values in a separate array named "price" and I used "price.append(buyPrice)" to add, but it only added one product price, I would like to have every product price.
It should end up being something like:
PRODUCT_ID
BUY PRICE: XXX
SELL PRICE: XXX
BUY ORDERS: X WITH AMOUNT OF X
SELL ORDERS: X WITH AMOUNT OF X
BUY VOLUME: XXX
SELL VOLUME: XXX
Of course I don't need the code for everything, just need a little help with how I extract these values from the API, and get it into my HTML code.
Currently my HTML looks like this:
{% for item in product %}
<h1>{{ item }}</h1>
{% endfor %}
I am new to flask and this is my first project :)
It looks like this information should be displayed in a table. You could also use the JavaScript library DataTables to quickly add things like pagination and sorting to the table.
I have answered two questions you may wish to read, the first on how to process data like this and another on keeping hard-coded table headers out of the template.
I came up with the repo search-generic-tables (linked in the first of those answers) which implements this functionality. It is also compatible with your data, with some minimal processing on your API's JSON response.
For your data it looks like everything you want to display is in the quick_status object for each product.
So considering you have f: the JSON, converted to a dictionary thanks to requests, you could do something like this:
original_items = [] # Create an empty list
for _, data in f['products'].items():
    original_items.append(data['quick_status'])
original_items is now a list, where each item is the quick_status JSON object as a Python dictionary:
>>> print(original_items[0]) # To obtain the first dictionary:
{'buyMovingWeek': 8018735,
'buyOrders': 70,
'buyPrice': 15.848714783373357,
'buyVolume': 624894,
'productId': 'BROWN_MUSHROOM',
'sellMovingWeek': 23716981,
'sellOrders': 22,
'sellPrice': 12.7,
'sellVolume': 396395}
Of course, a quicker way to write that code is with a list comprehension, which is well documented and worth reading up on:
original_items = [data['quick_status'] for _, data in f['products'].items()]
This can now be used in the linked code to render the table on the frontend.
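If you also want values from buy_summary and sell_summary, a sketch along these lines might help. The payload below is a made-up single-product sample shaped after the question (products keyed by product id, as the question's own loop implies), and build_rows and first_order are hypothetical helper names.

```python
# Assumed sample shaped like the response described in the question;
# the real API returns many more products and fields.
payload = {
    "products": {
        "BROWN_MUSHROOM": {
            "product_id": "BROWN_MUSHROOM",
            "sell_summary": [{"amount": 3865, "pricePerUnit": 14.8, "orders": 2}],
            "buy_summary": [{"amount": 704, "pricePerUnit": 15.8, "orders": 1}],
            "quick_status": {
                "productId": "BROWN_MUSHROOM",
                "sellPrice": 14.527416975007378,
                "sellVolume": 915286,
                "buyPrice": 15.898423574723452,
                "buyVolume": 673646,
            },
        },
    }
}


def first_order(orders):
    """Return the first order in a summary list, or None if it is empty."""
    return orders[0] if orders else None


def build_rows(payload):
    """Flatten every product into one dict that a template can loop over."""
    rows = []
    for product_id, data in payload["products"].items():
        status = data["quick_status"]
        rows.append({
            "product_id": product_id,
            "buy_price": status["buyPrice"],
            "sell_price": status["sellPrice"],
            "buy_volume": status["buyVolume"],
            "sell_volume": status["sellVolume"],
            "top_buy_order": first_order(data.get("buy_summary", [])),
            "top_sell_order": first_order(data.get("sell_summary", [])),
        })
    return rows


rows = build_rows(payload)
print(rows[0]["product_id"], rows[0]["top_buy_order"])
```

The resulting list could then be handed to the template in one go, for example render_template("index.html", rows=rows), and looped over with {% for row in rows %}.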

Returning the minimum value in a JSON array

Good evening folks! I have been wracking my brain on this one for a good few hours now and could do with a little bit of a pointer in the right direction. I'm playing around with some API calls and trying to make a little project for myself.
The JSON data is stored in arrays, so to get the information I want (it is from a Transport API) I have been doing the following:
x = apirequest
x = x.json
for i in range(0,4):
    print(x['routes'][i]['duration'])
    print(x['routes'][i]['departure_time'])
    print(x['routes'][i]['arrival_time'])
This will return the following
06:58:00
23:39
06:37
05:08:00
05:14
10:22
03:41:00
05:30
09:11
03:47:00
06:24
10:11
What I am trying to do is return only the shortest journeys. I could do it if it was a single-layer JSON string, but I am not too familiar with multi-level arrays. I can't return ['duration'] without using ['routes'] and a route index (in this case 0 through 3 or 4).
I can use an if statement to iterate through them easily enough, but there must be a way to accomplish it directly through the JSON that I am missing. I also thought about adding the results to a separate array and then filtering that, but there are a few other fields I want to grab from the data when I've cracked this part.
What I am finding as I learn is that I tend to do things a long winded way, often finding out my 10-15 line solutions on codewars are actually aimed at being done in 2-3 lines.
Example JSON data
{
"request_time": "2018-05-29T19:03:04+01:00",
"source": "Traveline southeast journey planning API",
"acknowledgements": "Traveline southeast",
"routes": [{
"duration": "06:58:00",
"route_parts": [{
"mode": "foot",
"from_point_name": "Corunna Court, Wrexham",
"to_point_name": "Wrexham General Rail Station",
"destination": "",
"line_name": "",
"duration": "00:36:00",
"departure_time": "23:39",
"arrival_time": "00:15"
}]
}]
}
Hope you can help steer me in the right direction!
Here's one solution using datetime.timedelta. Data from @fferri.
from datetime import timedelta
x = {'routes': [{'duration':'06:58:00','departure_time':'23:39','arrival_time':'06:37'},
                {'duration':'05:08:00','departure_time':'05:14','arrival_time':'10:22'},
                {'duration':'03:41:00','departure_time':'05:30','arrival_time':'09:11'},
                {'duration':'03:47:00','departure_time':'06:24','arrival_time':'10:11'}]}
def minimum_time(k):
    h, m, s = map(int, x['routes'][k]['duration'].split(':'))
    return timedelta(hours=h, minutes=m, seconds=s)
res = min(range(4), key=minimum_time) # 2
You can then access the appropriate sub-dictionary via x['routes'][res].
Using min() with a key argument to indicate which field should be used for finding the minimum value:
x={'routes':[
    {'duration':'06:58:00','departure_time':'23:39','arrival_time':'06:37'},
    {'duration':'05:08:00','departure_time':'05:14','arrival_time':'10:22'},
    {'duration':'03:41:00','departure_time':'05:30','arrival_time':'09:11'},
    {'duration':'03:47:00','departure_time':'06:24','arrival_time':'10:11'}
]}
best=min(x['routes'], key=lambda d: d['duration'])
# best={'duration': '03:41:00', 'departure_time': '05:30', 'arrival_time': '09:11'}
The min(iterable, key=...) function is what you are looking for:
x = { 'routes': [ {'dur':3, 'depart':1, 'arrive':4},
{'dur':2, 'depart':2, 'arrive':4}]}
min(x['routes'], key=lambda item: item['dur'])
Returns:
{'dur': 2, 'depart': 2, 'arrive': 4}
First, the fact that x is initialized from JSON isn't particularly relevant. It's a dict, and that's all that is important.
To answer your question, you just need the key argument to min:
shortest = min(x['routes'], key=lambda d: d['duration'])
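One caveat worth sketching: comparing the duration strings directly only works while they are zero-padded HH:MM:SS. A variant that parses them first avoids relying on that, and returns the whole route dict so the other fields stay available. parse_duration is a hypothetical helper name; the data is the same sample used in the answers above.

```python
from datetime import timedelta


def parse_duration(text):
    """Convert an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = map(int, text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


x = {"routes": [
    {"duration": "06:58:00", "departure_time": "23:39", "arrival_time": "06:37"},
    {"duration": "05:08:00", "departure_time": "05:14", "arrival_time": "10:22"},
    {"duration": "03:41:00", "departure_time": "05:30", "arrival_time": "09:11"},
    {"duration": "03:47:00", "departure_time": "06:24", "arrival_time": "10:11"},
]}

shortest = min(x["routes"], key=lambda route: parse_duration(route["duration"]))
print(shortest["duration"], shortest["departure_time"], shortest["arrival_time"])
# 03:41:00 05:30 09:11

# Plain string comparison misorders durations that are not zero-padded:
print(min(["9:00:00", "10:00:00"]))                      # 10:00:00
print(min(["9:00:00", "10:00:00"], key=parse_duration))  # 9:00:00
```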

Json file to dictionary

I am using the yelp dataset and I want to parse the review json file to a dictionary. I tried loading it on a pandas DataFrame and then creating the dictionary, but because the file is too big it is time consuming. I want to keep only the user_id and stars values. A line of the json file looks like this:
{
"votes": {
"funny": 0, "useful": 2, "cool": 1},
"user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
"review_id": "15SdjuK7DmYqUAj6rjGowg", "stars": 5, "date": "2007-05-17",
"text": (
"dr. goldberg offers everything i look for in a general practitioner. "
"he's nice and easy to talk to without being patronizing; he's always on "
"time in seeing his patients; he's affiliated with a top-notch hospital (nyu) "
"which my parents have explained to me is very important in case something "
"happens and you need surgery; and you can get referrals to see specialists "
"without having to see him first. really, what more do you need? i'm "
"sitting here trying to think of any complaints i have about him, but i'm "
"really drawing a blank."
),
"type": "review", "business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
}
How can I iterate over every 'field' (for lack of a better word)? So far I can only iterate over each line.
EDIT
As requested, the pandas code:
Reading the JSON
with open('yelp_academic_dataset_review.json') as f:
    df = pd.DataFrame(json.loads(line) for line in f)
Creating the dictionary
dict = {}
for i, row in df.iterrows():
    business_id = row['business_id']
    user_id = row['user_id']
    rating = row['stars']
    key = (business_id, user_id)
    dict[key] = rating
You don't need to read this into a DataFrame. json.load() returns a dictionary. For example:
sample.json
{
"votes": {
"funny": 0,
"useful": 2,
"cool": 1
},
"user_id": "Xqd0DzHaiyRqVH3WRG7hzg",
"review_id": "15SdjuK7DmYqUAj6rjGowg",
"stars": 5,
"date": "2007-05-17",
"text": "dr. goldberg offers everything i look for in a general practitioner. he's nice and easy to talk to without being patronizing; he's always on time in seeing his patients; he's affiliated with a top-notch hospital (nyu) which my parents have explained to me is very important in case something happens and you need surgery; and you can get referrals to see specialists without having to see him first. really, what more do you need? i'm sitting here trying to think of any complaints i have about him, but i'm really drawing a blank.",
"type": "review",
"business_id": "vcNAWiLM4dR7D2nwwJ7nCA"
}
read_json.py
import json
with open('sample.json', 'r') as fh:
    result_dict = json.load(fh)
print(result_dict['user_id'])
print(result_dict['stars'])
output
Xqd0DzHaiyRqVH3WRG7hzg
5
With that output you can easily create a DataFrame.
There are several good discussions about parsing json as a stream on SO, but the gist is it's not possible natively, although some tools seem to attempt it.
In the interest of keeping your code simple and with minimal dependencies, you might see if reading the JSON directly into a dictionary is a sufficient improvement.
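Since the question's own code reads the file with json.loads(line) for each line, the file appears to hold one JSON object per line. Under that assumption, the dictionary can be built in a single pass without pandas. The helper name load_ratings and the two sample lines are made up for this sketch; an in-memory buffer stands in for the real file.

```python
import io
import json

# Two sample lines standing in for the real file (assumed one object per line).
sample = io.StringIO(
    '{"user_id": "u1", "business_id": "b1", "stars": 5, "text": "..."}\n'
    '{"user_id": "u2", "business_id": "b1", "stars": 3, "text": "..."}\n'
)


def load_ratings(lines):
    """Map (business_id, user_id) to stars, reading one JSON object per line."""
    ratings = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        review = json.loads(line)
        ratings[(review["business_id"], review["user_id"])] = review["stars"]
    return ratings


ratings = load_ratings(sample)
print(ratings)  # {('b1', 'u1'): 5, ('b1', 'u2'): 3}
```

With the real data, the same function could be fed the open file object, for example inside with open('yelp_academic_dataset_review.json') as f, so only one line is held in memory at a time.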
