I have a JSON file that has movie data in it. I want to create a dictionary that has the movie title as the key and a count of how many actors are in that movie as the value. An example from the JSON file is below:
{
"title": "Marie Antoinette",
"year": "2006",
"genre": "Drama",
"summary": "Based on Antonia Fraser's book about the ill-fated Archduchess of Austria and later Queen of France, 'Marie Antoinette' tells the story of the most misunderstood and abused woman in history, from her birth in Imperial Austria to her later life in France.",
"country": "USA",
"director": {
"last_name": "Coppola",
"first_name": "Sofia",
"birth_date": "1971"
},
"actors": [
{
"first_name": "Kirsten",
"last_name": "Dunst",
"birth_date": "1982",
"role": "Marie Antoinette"
},
{
"first_name": "Jason",
"last_name": "Schwartzman",
"birth_date": "1980",
"role": "Louis XVI"
}
]
}
I have the following but it's counting all of the actors from all of the movies instead of each movie and the number of actors per movie. I'm not sure how to do this correctly as I'm newer to Python so help would be great.
import json
def actor_count(json_data):
    with open("movies_db.json", 'r') as file:
        data = json.load(file)
        for t in data:
            title = [t['title'] for t in data]
        for element in data:
            for actor in element['actors']:
                rolee = [actor['role'] for movie in data for actor in movie['actors']]
                len_role = [len(role)]
        newD = dict(zip(title, len_role))
        print(newD)
json_data = open('movies_db.json')
actor_count(json_data)
You show json that only contains a dictionary, yet you seem to process it as if it were a list of dictionaries with the structure you have shown. Pending clarification, I am answering here as if the latter is true -- you have a list of dictionaries, since you would be asking a different question about a different error if this was not the case.
In your function, each element of data is a dictionary that contains the information for a single movie. To get a dict correlating the title to the count of actors in this movie, you just need to access the "title" key and the length of the "actors" key for each element.
def actor_count(json_data):
    movie_actors = {}
    for movie in json_data:
        title = movie["title"]
        num_actors = len(movie["actors"])
        movie_actors[title] = num_actors
    return movie_actors
Alternatively, use a dictionary comprehension to build this dictionary:
def actor_count(json_data):
    movie_actors = {movie["title"]: len(movie["actors"]) for movie in json_data}
    return movie_actors
Now, load your json file once, and pass the parsed data when you call actor_count. This will return a dictionary mapping each movie title to the number of actors.
with open("movies_db.json", 'r') as file:
    data = json.load(file)
actor_count(data)
Note that loading the json file again in the function is unnecessary, since you already did it before calling the function, and are passing the parsed object to the function.
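To see the whole flow in one place, here is a minimal, self-contained sketch. It assumes the file holds a list of movie objects; the inline sample string stands in for movies_db.json, and the second movie in it is made up for illustration.

```python
import json

# Hypothetical stand-in for the contents of movies_db.json:
# a JSON list of movie objects (the second movie is invented for the demo).
sample = """
[
  {"title": "Marie Antoinette",
   "actors": [{"last_name": "Dunst"}, {"last_name": "Schwartzman"}]},
  {"title": "Example Movie",
   "actors": [{"last_name": "Doe"}]}
]
"""

def actor_count(movies):
    # Map each title to the length of that movie's own "actors" list.
    return {movie["title"]: len(movie["actors"]) for movie in movies}

counts = actor_count(json.loads(sample))
print(counts)
```

With a real file you would replace json.loads(sample) with json.load(file) inside a with open(...) block, exactly as shown above.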
If you want to keep your current logic of using list comprehensions, and then zipping the resultant lists to create a dict, that is also possible although slightly less efficient. There are significant changes you will need to make:
def actor_count(json_data):
    title = [t['title'] for t in json_data]
    n_actors = [len(t['actors']) for t in json_data]
    newD = dict(zip(title, n_actors))
    return newD
As before, no need to read the file again in the function
You're already looping over all elements in json_data as part of the list comprehension, so no need for another loop outside this.
You can get the number of actors simply by len(t['actors'])
You seem to have misconceptions about how list comprehensions and loops work. A list comprehension is a self-contained loop that builds a list. If you have a list comprehension, there's usually no need to surround it by the same for ... in ... statement that already exists in the comprehension.
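A small sketch of that point, using made-up sample data: the explicit loop and the comprehension build the same list, and wrapping the comprehension in another loop just rebuilds that list once per element.

```python
movies = [
    {"title": "A", "actors": ["x", "y"]},  # made-up sample data
    {"title": "B", "actors": ["z"]},
]

# Explicit loop that builds a list.
titles_loop = []
for movie in movies:
    titles_loop.append(movie["title"])

# Equivalent list comprehension: the loop lives inside the brackets.
titles_comp = [movie["title"] for movie in movies]

# Wrapping the comprehension in another identical loop only rebuilds
# the same list once per movie; the result does not change.
for movie in movies:
    titles_wrapped = [m["title"] for m in movies]

print(titles_loop == titles_comp == titles_wrapped)
```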
def actor_count(json_data):
    newD = dict()
    with open("movies_db.json", 'r') as file:
        data = json.load(file)
    # Assumes the file holds a single movie dict, as in the sample shown
    for t in data:
        if t == 'title':
            title_ = data[t]
            newD[title_] = 0
        if t == 'actors':
            newD[title_] = len(data[t])
    print(newD)
Output:
{'Marie Antoinette': 2}
Related
I am learning Python, and I have two JSON files with different data structures.
I start by importing both of the JSON files. I want to choose a course from the courses dict, and then add it to a specific education in the educations dict.
What I want to do is choose a key from the first dict via user input and then, within a while loop, choose keys from the second dict to add to the dict chosen from the first.
I am able to add the dict from the second file to the first one as a sub-dict as I want to, but with the update method it overwrites all previous values.
I used the dict.update() method expecting it not to overwrite previous values. I then want to write the updated dict back to the first JSON file.
My code works partially: I am able to add a course to an education, but it overwrites all the courses I previously added to that education.
This is the content of the first json file:
{
"itsak22": {
"edcuationId": "itsak22",
"edcuation_name": "cybersecurityspecialist"
},
"feu22": {
"edcuationId": "feu22",
"edcuation_name": "frontendutvecklare"
}
}
This is the content of the second json file:
{
"sql": {
"courseId": "itsql",
"course_name": "sql",
"credits": 35
},
"python": {
"courseId": "itpyt",
"course_name": "python",
"credits": 30
},
"agile": {
"courseId": "itagl",
"course_name": "agile",
"credits": 20
}
}
And this is my python code:
import json
# Load the first JSON file of dictionaries
with open('edcuations1.json') as f:
    first_dicts = json.load(f)
# Load the second JSON file of dictionaries
with open('courses1.json') as f:
    second_dicts = json.load(f)
# Print the keys from both the first and second JSON files
print("All educations:", first_dicts.keys())
print("All courses:", second_dicts.keys())
# Ask for input on which dictionary to add to which
first_key = input("Which education would you like to choose to add courses to? (Enter 'q' to quit): ")
while True:
    second_key = input("Which course would you like to add to education? (Enter 'q' to quit)")
    if second_key == 'q':
        break
    # Create a sub-dictionary named "courses" in the specific dictionary of the first file
    if "courses" not in first_dicts[first_key]:
        first_dicts[first_key]["courses"] = {}
    first_dicts[first_key]["courses"].update(second_dicts[second_key])
    #first_dicts = {**first_dicts, **second_dicts}
    #first_dicts.update({'courses': second_dicts})
# Update the first JSON file with the new dictionaries
with open('edcuations1.json', 'w') as f:
    json.dump(first_dicts, f, indent=4)
Here is my approach:
import json
# Load the first JSON file of dictionaries
with open("educations1.json") as f:
    educations = json.load(f)
# Load the second JSON file of dictionaries
with open("courses1.json") as f:
    courses = json.load(f)
# Print the keys from both the first and second JSON files
print("All educations:", educations.keys())
print("All courses:", courses.keys())
# Ask for input on which dictionary to add to which
education_key = input(
    "Which education would you like to choose to add courses to? (Enter 'q' to quit): "
)
education_courses = educations[education_key].setdefault("courses", {})
while True:
    course_key = input(
        "Which course would you like to add to education? (Enter 'q' to quit): "
    )
    if course_key == "q":
        break
    education_courses[course_key] = courses[course_key]
# Update the first JSON file with the new dictionaries
with open("educations1.json", "w") as stream:
    json.dump(educations, stream, indent=4)
A few notes
I fixed the typos: edcuations1.json -> educations1.json
Instead of generic names such as first_dicts, first_key, ... I use more descriptive names
How it works
The heart of my solution is on this line:
education_courses = educations[education_key].setdefault("courses", {})
Which is the equivalent of:
if "courses" not in educations[education_key]:
    educations[education_key]["courses"] = {}
education_courses = educations[education_key]["courses"]
The setdefault method assigns a value (an empty dictionary in this case) to the dictionary if the key ("courses" in this case) is absent, and in either case returns the value stored under that key.
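A minimal sketch of both behaviours, using a cut-down education record (the key spelling is copied from the question's file; the trimmed course dicts are for illustration only). It also shows why the original update() call lost data: each course dict has the same keys, so merging them into one dict replaces the previous values.

```python
education = {"edcuationId": "itsak22"}

# First call: "courses" is absent, so {} is stored and returned.
courses = education.setdefault("courses", {})
courses["sql"] = {"courseId": "itsql", "credits": 35}

# Second call: "courses" exists, so the stored dict is returned untouched.
same_courses = education.setdefault("courses", {})
same_courses["python"] = {"courseId": "itpyt", "credits": 30}

print(sorted(education["courses"]))

# What update() did in the question: both course dicts share the keys
# "courseId" and "credits", so the second call overwrites the first.
flat = {}
flat.update({"courseId": "itsql", "credits": 35})
flat.update({"courseId": "itpyt", "credits": 30})
print(flat)
```

Storing each course under its own key (here the course name) is what keeps earlier additions intact.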
I'm not entirely sure what your desired result should look like, but I think courses should be a list rather than a dictionary.
Then you can do
if "courses" not in first_dicts[first_key]:
    first_dicts[first_key]["courses"] = []
first_dicts[first_key]["courses"].append(second_dicts[second_key])
And your result looks like this if you add all courses to itsak22:
{
"itsak22": {
"edcuationId": "itsak22",
"edcuation_name": "cybersecurityspecialist",
"courses": [
{
"courseId": "itsql",
"course_name": "sql",
"credits": 35
},
{
"courseId": "itpyt",
"course_name": "python",
"credits": 30
},
{
"courseId": "itagl",
"course_name": "agile",
"credits": 20
}
]
},
"feu22": {
"edcuationId": "feu22",
"edcuation_name": "frontendutvecklare"
}
}
I'm trying to count the employees' titles.
I've tried a lot, but I don't think I've applied it correctly to this scenario.
employees = [
{
"email": "jonathan2532.calderon#gmail.com",
"employee_id": 101,
"firstname": "Jonathan",
"lastname": "Calderon",
"title": "Mr",
"work_phone": "(02) 3691 5845"
}]
EDIT:
from collections import Counter
class Employee:
    def __init__(self, title,):
        self.title = title
title_count = Counter()
for employee in [Employee("title") for data in employees]:
    title_count[employee.title,] += 1
print(title_count)
Counter({('title',): 4})
I can't seem to get the specific names there.
In your example, for title in employees actually yields a dict object in every iteration since employees is a list of dict objects. While the Counter accepts a dict mapping as input, it isn't quite what you're looking for. The cnt['title'] simply increases the count by 1 for each iteration, effectively counting the number of dict objects in the employees list.
To count by titles, you have to extract the title from each dict object in your list first.
>>> from collections import Counter
>>> titles = [e['title'] for e in employees]
>>> Counter(titles)
Counter({'Mr': 2, 'Mrs': 1, 'Ms': 1})
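As a variation on the same idea, here is a hedged sketch with made-up records. It feeds Counter a generator expression instead of a list, and uses dict.get with a placeholder ("unknown" is an arbitrary choice) so that a record without a "title" key does not raise KeyError.

```python
from collections import Counter

# Made-up records; the last one has no "title" key.
employees = [
    {"employee_id": 101, "title": "Mr"},
    {"employee_id": 102, "title": "Mrs"},
    {"employee_id": 103, "title": "Mr"},
    {"employee_id": 104},
]

# Counter accepts any iterable, so a generator expression avoids the
# intermediate list; .get() supplies a placeholder for missing titles.
title_count = Counter(e.get("title", "unknown") for e in employees)

print(title_count.most_common())
```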
A few things here, welcome to stack overflow. Please read how to ask a good question. Next, python is trying to help you out with the error it is giving you.
Try copying and pasting a portion of the error into google. Then, visit the docs on the data type you are trying to use. I think your question has been edited, but yeah––it will still help.
Finally, we need to see a minimal, complete, and verifiable example. So, code, we need to see what kind of code you're attempting to solve your problem with.
It helps to think about the structure of your data:
from collections import Counter
class Employee:
    def __init__(self, title, employee_id):
        # all other fields omitted
        self.title = title
        self.employee_id = employee_id
Here is some minimal data for your problem (arguably you could use a little less).
employees = [
{
"title": "Mr",
"employee_id": 1
},
{
"title": "Mr",
"employee_id": 2
},
{
"title": "Mrs",
"employee_id": 3
},
{
"title": "Ms",
"employee_id": 4
}
]
Define other necessary data structures.
title_count = Counter()
# Just to demo results.
for employee in [Employee(**data) for data in employees]:
    print(f"title: {employee.title} id: {employee.employee_id}")
I'll leave the **data notation up to google. But now you have some well-structured data and can process it accordingly.
# Now we have some Employee objects with named fields that are
# easier to work with.
for employee in [Employee(**data) for data in employees]:
    title_count[employee.title] += 1
print(title_count) # Counter({'Mr': 2, 'Mrs': 1, 'Ms': 1})
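For readers unfamiliar with the **data notation used in this answer, a short sketch: ** unpacks a dict into keyword arguments, so the keys must match the parameter names of __init__. The email value below is a made-up placeholder.

```python
class Employee:
    def __init__(self, title, employee_id):
        self.title = title
        self.employee_id = employee_id

data = {"title": "Mr", "employee_id": 1}

# These two calls are equivalent: ** passes each key as a keyword argument.
a = Employee(**data)
b = Employee(title="Mr", employee_id=1)

# A dict with a key that __init__ does not accept raises TypeError, so the
# full records (email, firstname, ...) would need either matching
# parameters or filtering down to the accepted keys first.
full = {"title": "Mr", "employee_id": 1, "email": "x@example.com"}
try:
    Employee(**full)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)
```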
I call an api via python and return this code as response:
{
"cast": [
{
"character": "Power",
"name": "George"
},
{
"character": "Max",
"job": "Sound",
"name": "Jash"
},
{
"character": "Miranda North",
"job": "Writer",
"name": "Rebecca"
}
]
}
I am trying to get the value Rebecca because I need to get the Writer.
So I wrote:
for person in cast:  # cast is the variable that holds the list above, NOT the enclosing dict
    if person["job"] == "Writer":
        writer = person["name"]
but it gives me:
KeyError at search/15
u'job'
How can I get the value?
FULL CODE:
writer = ""
for person in api['cast']:
    if person.get('job') == 'Writer':
        writer = person.get('name')
return render(request, 'home.html', {
    'writer': writer
})
home.html:
<p>{{writer}}</p>
That's because not all elements in the list have the job key.
Change to:
for person in cast:  # cast holds the list shown above
    if person.get('job') == 'Writer':
        writer = person.get('name')
One liner to find one writer.
writer = next((person for person in api['cast'] if person.get('job') == 'Writer'), None)
One liner to find all writers.
writers = [person for person in api['cast'] if person.get('job') == 'Writer']
Syntax for dictionary get() method:
dict.get(key, default=None)
Parameters
key: This is the Key to be searched in the dictionary.
default: This is the Value to be returned in case key does not exist.
You can specify a default value for get in case the key doesn't exist; if you omit it, get returns None, which also compares unequal to 'Writer'.
>>> for person in api['cast']:
...     if person.get('job', '') == 'Writer':
...         writer = person.get('name')
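A self-contained sketch of how get behaves on the cast list from the question. The "n/a" default is an arbitrary placeholder, and the next(...) form is one possible way to pull out just the name rather than the whole dict.

```python
cast = [
    {"character": "Power", "name": "George"},
    {"character": "Max", "job": "Sound", "name": "Jash"},
    {"character": "Miranda North", "job": "Writer", "name": "Rebecca"},
]

first = cast[0]                  # this entry has no "job" key
print(first.get("job"))          # None: the implicit default
print(first.get("job", "n/a"))   # n/a: an explicit default

# next() with a default of None avoids StopIteration when nobody matches.
writer = next((p["name"] for p in cast if p.get("job") == "Writer"), None)
director = next((p["name"] for p in cast if p.get("job") == "Director"), None)
print(writer, director)
```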
person.get(u"job") == "Writer"
for person in cast["cast"]:
    # cast is the variable that holds the whole response shown above
    if person.get("job") == "Writer":
        writer = person["name"]
Try this: cast["cast"] is the value of the key "cast", which is a list of dicts, and the loop visits each dict as person. Using get avoids the KeyError for entries that have no "job" key.
I've got a json file that I've pulled from a web service and am trying to parse it. I see that this question has been asked a whole bunch, and I've read whatever I could find, but the json data in each example appears to be very simplistic in nature. Likewise, the json example data in the python docs is very simple and does not reflect what I'm trying to work with. Here is what the json looks like:
{"RecordResponse": {
"Id": blah
"Status": {
"state": "complete",
"datetime": "2016-01-01 01:00"
},
"Results": {
"resultNumber": "500",
"Summary": [
{
"Type": "blah",
"Size": "10000000000",
"OtherStuff": {
"valueOne": "first",
"valueTwo": "second"
},
"fieldIWant": "value i want is here"
The code block in question is:
jsonFile = r'C:\Temp\results.json'
with open(jsonFile, 'w') as dataFile:
json_obj = json.load(dataFile)
for i in json_obj["Summary"]:
print(i["fieldIWant"])
Not only am I not getting into the field I want, but I'm also getting a key error on trying to suss out "Summary".
I don't know how the indices work within the array; once I even get into the "Summary" field, do I have to issue an index manually to return the value from the field I need?
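The example you posted is not valid JSON (no commas after some object fields, and the value of "Id" is unquoted), so it's hard to dig in much. If it's straight from the web service, something's messed up. If you did fix it with proper commas, note that the "Summary" key is within the "Results" object, which in turn is within "RecordResponse", so you'd need to change your loop as follows. Also open the file for reading ('r'); opening it with 'w' truncates the file, and json.load cannot read from it.
with open(jsonFile, 'r') as dataFile:
    json_obj = json.load(dataFile)
for i in json_obj["RecordResponse"]["Results"]["Summary"]:
    print(i["fieldIWant"])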
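To answer the question about indices: "Summary" is a list, so iterating over it visits each element in turn and no manual index is needed. Below is a sketch on a repaired, shortened version of the sample. The quoted "Id" value and the closing brackets are assumptions added only so the text parses; the real response may differ.

```python
import json

# Repaired, shortened sample (an assumption, not the real response).
text = """
{"RecordResponse": {
    "Id": "blah",
    "Results": {
        "resultNumber": "500",
        "Summary": [
            {"Type": "blah", "fieldIWant": "value i want is here"}
        ]
    }
}}
"""
json_obj = json.loads(text)

# Walk down one key at a time, then iterate over the list.
summary = json_obj["RecordResponse"]["Results"]["Summary"]
values = [item["fieldIWant"] for item in summary]
print(values)

# An explicit index also works if you only want one element.
print(summary[0]["fieldIWant"])
```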
If you don't know the structure at all, you could look through the resulting object recursively:
def findfieldsiwant(obj, keyname="Summary", fieldname="fieldIWant"):
    try:
        for key, val in obj.items():
            if key == keyname:
                return [d[fieldname] for d in val]
            else:
                # pass the names along so non-default arguments survive the recursion
                sub = findfieldsiwant(val, keyname, fieldname)
                if sub:
                    return sub
    except AttributeError:  # obj is not a dict
        pass
    # keyname not found
    return None
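The function above only descends through dicts. If the field may also sit inside lists at other levels, a variant along the following lines could collect every occurrence. This is a sketch, not part of the original answer; find_values and the sample doc are hypothetical names and data.

```python
def find_values(obj, fieldname):
    """Collect every value stored under fieldname, at any depth."""
    found = []
    if isinstance(obj, dict):
        for key, val in obj.items():
            if key == fieldname:
                found.append(val)
            else:
                found.extend(find_values(val, fieldname))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_values(item, fieldname))
    # Strings, numbers, and None fall through and contribute nothing.
    return found

doc = {"RecordResponse": {"Results": {"Summary": [
    {"Type": "a", "fieldIWant": "first"},
    {"Type": "b", "OtherStuff": {"fieldIWant": "nested"}},
]}}}
print(find_values(doc, "fieldIWant"))
```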
Apologies if the answer to this is obvious - I'm very new to django/python & haven't been able to find a solution in my searching so far.
I have a straightforward queryset, eg
members = LibraryMembers.objects.all()
with this I can do:-
for m in members:
member_books = LibraryBorrows.objects.filter(member_id=m[u'id'])
What I really want though is to be able to serialize the results into json, so it looks something like this:-
{
"members":
[
{
"id" : "1",
"name" : "Joe Bloggs"
"books":
[
{
"name" : "Five Go Exploring",
"author" : "Enid Blyton",
},
{
"name" : "Princess of Mars",
"author" : "Edgar Rice Burroughs",
},
]
}
]
}
To my mind, the obvious thing to try was:-
for m in members:
m[u'books'] = LibraryBorrows.objects.filter(member_id=m[u'id'])
However I'm getting TypeError: 'LibraryBorrows' object does not support item assignment
Is there any way to achieve what I'm after?
Model instances are indeed not dicts. If you want dicts instead of model instances, then QuerySet.values() is your friend: you get a list-like queryset of dicts with only the required fields, and you avoid the overhead of retrieving unneeded fields from the database and building full-blown model instances.
>>> members = LibraryMembers.objects.values("id", "name")
>>> print members
[{"id" : 1, "name" : "Joe Bloggs"},]
Then your code would look like:
members = list(LibraryMembers.objects.values("id", "name"))
for m in members:
    # list() evaluates the queryset so the result can be serialized to JSON
    m["books"] = list(LibraryBorrows.objects.filter(
        member_id=m['id']
    ).values("name", "author"))
Now you still have to issue one additional db query for each parent row, which may not be that efficient, depending on the number of LibraryMembers rows. If you have hundreds or more, a better approach would be to query on LibraryBorrows instead, including the related fields from LibraryMembers, then regroup the rows based on the member id, i.e.:
from itertools import groupby
def filter_row(row):
    for name in ("librarymember__id", "librarymember__name"):
        del row[name]
    return row
members = []
# The librarymember__* lookups assume the foreign key on the borrow model
# is named librarymember; adjust them to your actual field name.
rows = LibraryBorrows.objects.values(
    'name', 'author', 'librarymember__id', 'librarymember__name'
).order_by('librarymember__id')
for key, group in groupby(rows, lambda r: r['librarymember__id']):
    group = list(group)
    member = {
        'id': group[0]['librarymember__id'],
        'name': group[0]['librarymember__name'],
        'books': [filter_row(row) for row in group]
    }
    members.append(member)
NB: this can be seen as premature optimization (and would be if you only have a couple of members in your db), but trading hundreds or more queries for one single query and a bit of postprocessing usually makes a real difference for "real life" datasets.
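The regrouping step can be tried without a database. The sketch below runs it on plain dicts shaped like the rows that values() would return; the member__* key names and the third book are made up for illustration.

```python
import json
from itertools import groupby
from operator import itemgetter

# Made-up rows; the member__* key names are assumptions for this sketch.
rows = [
    {"name": "Five Go Exploring", "author": "Enid Blyton",
     "member__id": 1, "member__name": "Joe Bloggs"},
    {"name": "Example Book", "author": "Jane Doe",
     "member__id": 2, "member__name": "Jane Doe"},
    {"name": "Princess of Mars", "author": "Edgar Rice Burroughs",
     "member__id": 1, "member__name": "Joe Bloggs"},
]

# groupby only merges adjacent rows, so sort by the grouping key first
# (the database query would do this with order_by).
rows.sort(key=itemgetter("member__id"))

members = []
for member_id, group in groupby(rows, key=itemgetter("member__id")):
    group = list(group)
    members.append({
        "id": member_id,
        "name": group[0]["member__name"],
        "books": [{"name": r["name"], "author": r["author"]} for r in group],
    })

print(json.dumps({"members": members}, indent=2))
```

Building new book dicts instead of deleting keys in place leaves the input rows unmodified, which makes the step easier to re-run while experimenting.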
Well m is a LibraryMember object so you won't be able to treat it as a dictionary. As a side note: Most people don't name the models in plural form since they are just a class modeling an object, not a collection of objects.
One possible solution is to make a list of dictionaries with the values that you need from both objects, something like this in a one-liner:
o = [ { "id": m.id, "name": m.name, "books": [{"name": b.name, "author": b.author} for b in m.libraryborrows_set.all()] } for m in LibraryMembers.objects.all()]
Note that you can use the related manager to get the books for a given member. For better clarity:
o = []
for m in LibraryMembers.objects.all():
    member_books = [{"name": b.name, "author": b.author} for b in m.libraryborrows_set.all()]
    o.append( { "id": m.id, "name": m.name, "books": member_books } )
EDIT:
To serialize all the fields:
members = []
for member in LibraryMembers.objects.all():
    member_details = {}
    # Note: get_all_field_names() was removed in Django 1.10
    for field in member._meta.get_all_field_names():
        member_details[field] = getattr(member, field)
    books = []
    for book in member.libraryborrows_set.all():
        book_details = {}
        for field in book._meta.get_all_field_names():
            book_details[field] = getattr(book, field)
        books.append(book_details)
    member_details['books'] = books
    members.append(member_details)
I also found DjangoFullSerializers which I hadn't heard about until today:
http://code.google.com/p/wadofstuff/wiki/DjangoFullSerializers