How to parse specific data from JSON request

How to parse specific data from JSON request - python

I'm getting into coding, and I'm wondering how I'd go about retrieving the data for "tag_id": 4 specifically.
I know how to get the data for status, but how would I get specific data when there are multiple entries?
r = requests.get('url.com', headers = user_agent).json()
event = (r['status'])
print(event)
//////////////////
{
"status": "SUCCESS",
"status_message": "blah blah blah",
"pri_tag": [
{
"tag_id": 1,
"name": "Tag1"
},
{
"tag_id": 2,
"name": "Tag2"
},
{
"tag_id": 3,
"name": "Tag3"
},
{
"tag_id": 4,
"name": "Tag4"
}
]
}

The for loop answer is sufficient, but this is a good chance to learn how to use list comprehensions, which are ubiquitous and "pythonic":
desired_tag_name = [tag["name"] for tag in r["pri_tag"] if tag["tag_id"] == 4]
Note that this reads from r, the parsed response, because event in the question only holds the status string. The result is a list of every matching name, here ["Tag4"].
List comprehensions are advantageous for readability (I know it may not seem so the first time you look at one) and because they are usually somewhat faster than building the same list with a for loop and append.
There is a bounty of documentation and blog posts out there to understand the syntax better, and I don't prefer any particular one over another.

I think you're looking for something like:
tags = r["pri_tag"]  # r is the parsed response; event only holds the status string
for tag in tags:
    if tag['tag_id'] == 4:
        print(tag['name'])
Output:
Tag4
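If only the first match is needed, a generator expression with next() avoids building a list and lets you choose a default for the "not found" case. This is a minimal sketch: the payload is copied from the question, and find_tag_name is a hypothetical helper name, not part of requests.

```python
# Sample payload copied from the question; in real code this would be
# the result of requests.get(...).json().
response = {
    "status": "SUCCESS",
    "status_message": "blah blah blah",
    "pri_tag": [
        {"tag_id": 1, "name": "Tag1"},
        {"tag_id": 2, "name": "Tag2"},
        {"tag_id": 3, "name": "Tag3"},
        {"tag_id": 4, "name": "Tag4"},
    ],
}


def find_tag_name(payload, tag_id):
    """Return the name of the first tag with the given id, or None."""
    return next(
        (tag["name"] for tag in payload.get("pri_tag", [])
         if tag.get("tag_id") == tag_id),
        None,  # default when no tag matches
    )


print(find_tag_name(response, 4))   # Tag4
print(find_tag_name(response, 99))  # None
```

Using .get() here means a missing "pri_tag" key or a tag without "tag_id" yields None instead of raising KeyError, which may or may not be what you want for your API.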

Related

How to create a tree using BFS in python?

So I have a flattened tree in JSON like this, as array of objects:
[{
aid: "id3",
data: ["id1", "id2"]
},
{
aid: "id1",
data: ["id3", "id2"]
},
{
aid: "id2",
nested_data: {aid: "id4", atype: "nested", data: ["id1", "id3"]},
data: []
}]
I want to gather that tree and resolve ids into data with recursion loops into something like this (say we start from "id3"):
{
"aid":"id3",
"payload":"1",
"data":[
{
"id1":{
"aid":"id1",
"data":[
{
"id3":null
},
{
"id2":null
}
]
}
},
{
"id2":{
"aid":"id2",
"nested_data":{
"aid":"id4",
"atype":"nested",
"data":[
{
"id1":null
},
{
"id3":null
}
]
},
"data":[
]
}
}
]
}
So that we would get a breadth-first search that resolves each id into "value": "object with that field" on the first visit and into "value": null on later visits.
How to do such a thing in python 3?

Apart from the syntax problems in your structure (identifiers must be within quotes, etc.), the code below will provide you with the requested answer.
But you should think carefully about what you are doing, and take the following into account:
Using the relations expressed in the flat structure that you provide leads to endless recursion, since you have items that include other items that in turn include the first ones (like id3 including id1, which in turn includes id3). So you have to define a stop criterion, or be sure that this does not occur in your flat structure. The code below uses a depth limit (lvl > 1) as its stop criterion.
Your initial flat structure would be better as a dictionary keyed by id, instead of a list of {id, data} pairs. That is why the first thing the code below does is transform it.
Your final, desired structure contains a lot of redundant information. Consider simplifying it.
Finally, you mentioned nothing about the "nested_data" nodes and how they should be treated. I simply assumed that, where they exist, their ids need to be expanded as well.
Please try to provide a bit of context in your questions and some real data examples (I believe the data provided is not real data, hence the inconsistencies and redundancies), and try it yourself and show your efforts; that's the only way to learn.
from pprint import pprint

def reformat_flat_info(flat):
    reformatted = {}
    for o in flat:
        key = o["aid"]
        del o["aid"]
        reformatted[key] = o
    return reformatted

def expand_data(aid, flat, lvl=0):
    obj = flat[aid]
    if obj is None: return {aid: obj}
    obj.update({"aid": aid})
    if lvl > 1:
        return {aid: None}
    for nid,id in enumerate(obj["data"]):
        obj["data"][nid] = expand_data(id, flat, lvl=lvl+1)
    if "nested_data" in obj:
        for nid,id in enumerate(obj["nested_data"]["data"]):
            obj["nested_data"]["data"][nid] = expand_data(id, flat, lvl=lvl+1)
    return {aid: obj}

# Provide the flat information structure
flat_info = [
    {
        "aid": "id3",
        "data": ["id1", "id2"]
    }, {
        "aid": "id1",
        "data": ["id3", "id2"]
    }, {
        "aid": "id2",
        "nested_data": {"aid": "id4", "atype": "nested", "data": ["id1", "id3"]},
        "data": []
    }
]
pprint(flat_info)
print('-'*80)
# Reformat the flat information structure
new_flat_info = reformat_flat_info(flat=flat_info)
pprint(new_flat_info)
print('-'*80)
# Generate the result
starting_id = "id3"
result = expand_data(aid=starting_id, flat=new_flat_info)
pprint(result)
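The answer above stops the recursion with a depth limit rather than an actual breadth-first traversal. For comparison, here is a minimal sketch of a breadth-first walk that uses a queue and a visited set as the stop criterion. It only returns the visiting order and does not build the nested output; bfs_order is a hypothetical helper name, and "nested_data" is deliberately ignored here since the question does not say how it should be treated.

```python
from collections import deque


def bfs_order(flat, start):
    """Return the ids reachable from start in breadth-first order, each once."""
    # Index the flat list by id so children can be looked up directly.
    children = {node["aid"]: node.get("data", []) for node in flat}
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        aid = queue.popleft()
        order.append(aid)
        for child in children.get(aid, []):
            if child not in seen:  # the visited set breaks the id3 <-> id1 cycle
                seen.add(child)
                queue.append(child)
    return order


flat_info = [
    {"aid": "id3", "data": ["id1", "id2"]},
    {"aid": "id1", "data": ["id3", "id2"]},
    {"aid": "id2", "data": []},
]
print(bfs_order(flat_info, "id3"))  # ['id3', 'id1', 'id2']
```

The same visited set could be used while building the nested structure: expand an id the first time it is dequeued and emit null for it on any later encounter.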

Mongodb how to update many and set profile specific to id

I have a list of ids called batch, and I want to update all of them to set a field called fetched to true.
Original Test Collection
[{
"user_id": 1,
},
{
"user_id": 2,
}
]
batch variable
[1, 2]
UpdateMany:
mongodb["test"].update_many({"user_id": {"$in": batch}}, {"$set": {"fetched": True}})
I can do that using the above statement.
I also have another variable called user_profiles, which is a list of JSON objects. I now also want to set a field called profile to the entry in user_profiles whose id matches the user_id I am updating.
user_profiles
[{
"id": 1,
"name": "john"
},
{
"id": 2,
"name": "jane"
}
]
Expected Final Result
[{
"user_id": 1,
"fetched": true,
"profile": {
"id": 1,
"name": "john"
}
},
{
"user_id": 2,
"fetched": true,
"profile": {
"id": 2,
"name": "jane"
}
}
]
I have millions of these documents, so I am trying to keep performance in mind.

You'll want to use db.collection.bulkWrite; see the updateOne example in the docs.
If you've got millions, you'll want to batch the bulkWrites into smaller chunks that work with your database server's capabilities.
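In PyMongo, the equivalent of bulkWrite is Collection.bulk_write with UpdateOne operations. The following is a rough sketch of the chunked approach, not a tuned implementation: build_updates, chunked, and apply_updates are hypothetical helper names, and the chunk size of 1000 is an arbitrary assumption to adjust for your server.

```python
def build_updates(batch, user_profiles):
    """Pair each user_id with the fields to $set (hypothetical helper)."""
    profiles_by_id = {profile["id"]: profile for profile in user_profiles}
    updates = []
    for user_id in batch:
        fields = {"fetched": True}
        if user_id in profiles_by_id:  # ids without a profile only get fetched
            fields["profile"] = profiles_by_id[user_id]
        updates.append(({"user_id": user_id}, {"$set": fields}))
    return updates


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def apply_updates(collection, updates, chunk_size=1000):
    """Send the updates to MongoDB in chunks (chunk_size is an assumption)."""
    from pymongo import UpdateOne  # imported here so the helpers above run without pymongo
    for chunk in chunked(updates, chunk_size):
        operations = [UpdateOne(filter_doc, update_doc) for filter_doc, update_doc in chunk]
        collection.bulk_write(operations, ordered=False)


batch = [1, 2]
user_profiles = [{"id": 1, "name": "john"}, {"id": 2, "name": "jane"}]
updates = build_updates(batch, user_profiles)
print(updates[0])
# apply_updates(mongodb["test"], updates)  # requires a live MongoDB connection
```

Passing ordered=False lets the server continue past an individual failed update instead of stopping the whole chunk, which is usually what you want for independent per-document updates. An index on user_id matters far more for throughput than the exact chunk size.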
Edit:
@Kay I just re-read the second part of your question, which I didn't address earlier. You may want to try the $out stage of the aggregation pipeline. Be careful, though: it overwrites the existing collection, so if you don't project all fields you could lose data. It is definitely worth using a temporary collection for testing first.
Finally, you could also just create a view based on the aggregation query (with $lookup) if you don't absolutely need that data physically stored in the same collection.

Accessing nested objects with python

I have a response that I receive from foursquare in the form of json. I have tried to access the certain parts of the object but have had no success. How would I access say the address of the object? Here is my code that I have tried.
url = 'https://api.foursquare.com/v2/venues/explore'
params = dict(client_id=foursquare_client_id,
client_secret=foursquare_client_secret,
v='20170801', ll=''+lat+','+long+'',
query=mealType, limit=100)
resp = requests.get(url=url, params=params)
data = json.loads(resp.text)
msg = '{} {}'.format("Restaurant Address: ",
data['response']['groups'][0]['items'][0]['venue']['location']['address'])
print(msg)
Here is an example of json response:
"items": [
{
"reasons": {
"count": 0,
"items": [
{
"summary": "This spot is popular",
"type": "general",
"reasonName": "globalInteractionReason"
}
]
},
"venue": {
"id": "412d2800f964a520df0c1fe3",
"name": "Central Park",
"contact": {
"phone": "2123106600",
"formattedPhone": "(212) 310-6600",
"twitter": "centralparknyc",
"instagram": "centralparknyc",
"facebook": "37965424481",
"facebookUsername": "centralparknyc",
"facebookName": "Central Park"
},
"location": {
"address": "59th St to 110th St",
"crossStreet": "5th Ave to Central Park West",
"lat": 40.78408342593807,
"lng": -73.96485328674316,
"labeledLatLngs": [
{
"label": "display",
"lat": 40.78408342593807,
"lng": -73.96485328674316
}
],
the full response can be found here

Like so
addrs=data['items'][2]['location']['address']

Your code (at least as far as loading and accessing the object) looks correct to me. I loaded the json from a file (since I don't have your foursquare id) and it worked fine. You are correctly using object/dictionary keys and array positions to navigate to what you want. However, you misspelled "address" in the line where you drill down to the data. Adding the missing 'a' made it work. I'm also correcting the typo in the URL you posted.
I answered this assuming that the example JSON you linked to is what is stored in data. If that isn't the case, a relatively easy way to see exactly what python has stored in data is to import pprint and use it like so: pprint.pprint(data).
You could also start an interactive python shell by running the program with the -i switch and examine the variable yourself.

data["items"][2]["location"]["address"]
This will access the address for you.

You can reach any level of nesting by using an integer index for an array and a string key for a dict.
In your case items is an array:
# items[int index]
items[0]
Now items[0] is a dictionary, so we access it by string keys:
items[0]['venue']['location']
That is again an object, so we use a string key once more:
items[0]['venue']['location']['address']
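When a response may be missing a level (for example a venue without an address), chained indexing raises KeyError or IndexError. A small helper that walks the path step by step is one way to handle that. This is a sketch only: dig is a hypothetical helper name, and the data below is a trimmed copy of the sample response, assumed to sit under response/groups as in the question's own code.

```python
def dig(obj, *path, default=None):
    """Follow a mix of dict keys and list indexes; return default if any step fails."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


# Trimmed copy of the sample response from the question.
data = {
    "response": {
        "groups": [
            {
                "items": [
                    {
                        "venue": {
                            "name": "Central Park",
                            "location": {"address": "59th St to 110th St"},
                        }
                    }
                ]
            }
        ]
    }
}

path = ("response", "groups", 0, "items", 0, "venue", "location", "address")
print(dig(data, *path))                                  # 59th St to 110th St
print(dig(data, "response", "groups", 5, default="n/a"))  # n/a
```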

Parsing child nodes from JSON file with Python

I'm trying to parse specific child nodes from a JSON file using Python.
I know similar questions have been asked and answered before, but I simply haven't been able to translate those solutions to my own problem (disclaimer: I'm not a developer).
This is the beginning of my JSON file (each new "entry" starts at "_index"):
{
"took": 83,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"failed": 0
},
"hits": {
"total": 713628,
"max_score": 1.3753585,
"hits": [{
"_index": "offentliggoerelser-prod-20161006",
"_type": "offentliggoerelse",
"_id": "urn:ofk:oid:5135592",
"_score": 1.3753585,
"_source": {
"cvrNummer": 89986915,
"indlaesningsId": "AUzWhUXw3pscZq1LGK_z",
"sidstOpdateret": "2015-04-20T10:53:09.154Z",
"omgoerelse": false,
"regNummer": null,
"offentliggoerelsestype": "regnskab",
"regnskab": {
"regnskabsperiode": {
"startDato": "2014-01-01",
"slutDato": "2014-12-31"
}
},
"indlaesningsTidspunkt": "2015-04-20T11:10:53.529Z",
"sagsNummer": "X15-AA-66-TA",
"dokumenter": [{
"dokumentUrl": "http://regnskaber.virk.dk/51968998/ZG9rdW1lbnRsYWdlcjovLzAzLzdlL2I5L2U2LzlkLzIxN2EtNDA1OC04Yjg0LTAwZGJlNzUwMjU3Yw.pdf",
"dokumentMimeType": "application/pdf",
"dokumentType": "AARSRAPPORT"
}, {
"dokumentUrl": "http://regnskaber.virk.dk/51968998/ZG9rdW1lbnRsYWdlcjovLzAzLzk0LzNlL2RjL2Q4L2I1NjUtNGJjZC05NzJmLTYyMmE4ZTczYWVhNg.xhtml",
"dokumentMimeType": "application/xhtml+xml",
"dokumentType": "AARSRAPPORT"
}, {
"dokumentUrl": "http://regnskaber.virk.dk/51968998/ZG9rdW1lbnRsYWdlcjovLzAzLzc5LzM3LzUwLzMxL2NjZWQtNDdiNi1hY2E1LTgxY2EyYjRmOGYzMw.xml",
"dokumentMimeType": "application/xml",
"dokumentType": "AARSRAPPORT"
}],
"offentliggoerelsesTidspunkt": "2015-04-20T10:53:09.075Z"
}
},
More specifically, I'm trying to extract all "dokumentUrl" where "dokumentMimeType" is equal to "application/xhtml+xml".
When I use something simple like this:
import json
from pprint import pprint
with open('output.json') as data_file:
    data = json.load(data_file)
pprint(data['hits']['hits'][0]['_source']['dokumenter'][1]['dokumentUrl'])
I get the first URL that matches my criteria. But how do I create a list of all URLs (all 713.628 of them) from the file with the criteria mentioned above and export it to a CSV file?
I should probably mention that my end goal is to create a program that can loop scrape my list of URLs (I'll save that for another post!).

Hopefully I am understanding this right, and @roganjosh has a similar idea. You can loop through the specific parts which contain lists of useful things. So, we can do something like:
myURL = []
hits = data['hits']['hits']
for hit in hits:
    # Making the assumption here that you want all of the URLs associated with a given document
    document = hit['_source']['dokumenter']
    for url in document:
        if url['dokumentMimeType'] == "application/xhtml+xml":
            myURL.append(url['dokumentUrl'])
Again, I am hoping that I understand your JSON schema enough that this does what you want it to. At least it should get you close.
Also just saw another part of your question regarding CSV outputting.
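For the CSV part, the standard csv module is enough. The sketch below collects the matching URLs and writes them one per row. The sample data and its example.com URLs are made up for illustration, xhtml_urls and write_urls are hypothetical helper names, and an in-memory buffer stands in for a real file.

```python
import csv
import io


def xhtml_urls(data, mime_type="application/xhtml+xml"):
    """Collect dokumentUrl for every document with the given MIME type."""
    return [
        doc["dokumentUrl"]
        for hit in data["hits"]["hits"]
        for doc in hit["_source"]["dokumenter"]
        if doc["dokumentMimeType"] == mime_type
    ]


def write_urls(urls, out):
    """Write a header row and one URL per row to an open text file object."""
    writer = csv.writer(out)
    writer.writerow(["dokumentUrl"])
    writer.writerows([url] for url in urls)


# Made-up sample with the same shape as the JSON in the question.
data = {
    "hits": {
        "hits": [
            {"_source": {"dokumenter": [
                {"dokumentUrl": "http://example.com/a.pdf",
                 "dokumentMimeType": "application/pdf"},
                {"dokumentUrl": "http://example.com/a.xhtml",
                 "dokumentMimeType": "application/xhtml+xml"},
            ]}}
        ]
    }
}

buffer = io.StringIO()
write_urls(xhtml_urls(data), buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target with open('urls.csv', 'w', newline='') and pass that file object to write_urls; the newline='' argument is what the csv documentation recommends to avoid blank lines on Windows.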

Formatting JSON output

I have a JSON file with key value pair data. My JSON file looks like this.
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally",
"helpfullness": "3.3",
"url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=111119",
"reviews": [
{
"attendance": "N/A",
"class": "CHEM 1A",
"textbook_use": "It's a must have",
"review_text": "Tests were incredibly difficult (averages in the 40s) and lectures were essentially useless. I attended both lectures every day and still was unable to grasp most concepts on the midterms. Scope out a good GSI to get help and ride the curve."
},
{
"attendance": "N/A",
"class": "CHEMISTRY1A",
"textbook_use": "Essential to passing",
"review_text": "Saykally really isn't as bad as everyone made him out to be. If you go to his lectures he spends about half the time blowing things up, but if you actually read the texts before his lectures and pay attention to what he's writing/saying, you'd do okay. He posts practice tests that were representative of actual tests and curves the class nicely!"
}]
},
{
"first_name": "Laura",
"last_name": "Stoker",
"helpfullness": "4.1",
"url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=536606",
"reviews": [
{
"attendance": "N/A",
"class": "PS3",
"textbook_use": "You need it sometimes",
"review_text": "Stoker is by far the best professor. If you put in the effort, take good notes, and ask questions, you will be fine in the class. As far as her lecture, she does go a bit fast, but her lecture is in the form of an outline. As long as you take good notes, you will have everything you need for exams. She is funny and super nice if you speak with her"
},
{
"attendance": "Mandatory",
"class": "164A",
"textbook_use": "Barely cracked it open",
"review_text": "AMAZING professor. She has a good way of keeping lectures interesting. Yes, she can be a little everywhere and really quick with her lecture, but the GSI's are useful to make sure you understand the material. Oh, and did I mention she's hilarious!"
}]
}]
}
So I'm trying to do multiple things.
I'm trying to find the most mentioned ['class'] key under reviews, then get the class name and the number of times it was mentioned.
Then I'd like to write the output in the format below. The professors array should just hold the professors for that class; for instance, for CHEM 1A and CHEMISTRY1A it's Richard Saykally.
{
courses:[
{
"course_name" : # class name
"course_mentioned_times" : # The amount of times the class was mentioned
professors:[ #The professor array should have professor that teaches this class which is in my shown json file
{
'first_name' : 'professor name'
'last_name' : 'professor last name'
}
}
So I'd like to sort my JSON output from the most mentioned class to the least. So far all I've been able to figure out is:
if __name__ == "__main__":
    open_json = open('result.json')
    load_as_json = json.load(open_json)['professors']
    outer_arr = []
    outer_dict = {}
    for items in load_as_json:
        output_dictionary = {}
        all_classes = items['reviews']
        for classes in all_classes:
            arr_info = []
            output_dictionary['class'] = classes['class']
            output_dictionary['first_name'] = items['first_name']
            output_dictionary['last_name'] = items['last_name']
            #output_dictionary['department'] = items['department']
            output_dictionary['reviews'] = classes['review_text']
            with open('output_info.json','wb') as outfile:
                json.dump(output_dictionary,outfile,indent=4)

I think this program does what you want:
import json

with open('result.json') as open_json:
    load_as_json = json.load(open_json)

courses = {}
for professor in load_as_json['professors']:
    for review in professor['reviews']:
        course = courses.setdefault(review['class'], {})
        course.setdefault('course_name', review['class'])
        course.setdefault('course_mentioned_times', 0)
        course['course_mentioned_times'] += 1
        course.setdefault('professors', [])
        prof_name = {
            'first_name': professor['first_name'],
            'last_name': professor['last_name'],
        }
        if prof_name not in course['professors']:
            course['professors'].append(prof_name)

courses = {
    'courses': sorted(courses.values(),
                      key=lambda x: x['course_mentioned_times'],
                      reverse=True)
}
with open('output_info.json', 'w') as outfile:
    json.dump(courses, outfile, indent=4)
Result, using the example input in the question:
{
"courses": [
{
"professors": [
{
"first_name": "Laura",
"last_name": "Stoker"
}
],
"course_name": "PS3",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Laura",
"last_name": "Stoker"
}
],
"course_name": "164A",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally"
}
],
"course_name": "CHEM 1A",
"course_mentioned_times": 1
},
{
"professors": [
{
"first_name": "Richard",
"last_name": "Saykally"
}
],
"course_name": "CHEMISTRY1A",
"course_mentioned_times": 1
}
]
}
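If only the mention counts are needed, collections.Counter does the tallying and the max-to-min ordering in one step. This is a minimal sketch on made-up, trimmed data (one class is repeated on purpose so the ordering is visible); it does not build the professors array from the answer above.

```python
from collections import Counter

# Made-up, trimmed data with the same shape as the "professors" array.
professors = [
    {"first_name": "Richard", "last_name": "Saykally",
     "reviews": [{"class": "CHEM 1A"}, {"class": "CHEM 1A"}]},
    {"first_name": "Laura", "last_name": "Stoker",
     "reviews": [{"class": "PS3"}]},
]

# Count how often each class name appears across all reviews.
mentions = Counter(
    review["class"]
    for professor in professors
    for review in professor["reviews"]
)

# most_common() returns (class, count) pairs sorted from most to least mentioned.
top_course, top_count = mentions.most_common(1)[0]
print(top_course, top_count)  # CHEM 1A 2
```

Note that the counts are per exact string, so "CHEM 1A" and "CHEMISTRY1A" in the real data would be tallied as two different courses unless the names are normalized first.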
