How can I split my JSON so that each movie is sent separately to the finalScore function, and then append each result to my list at the end?
Sorry that the explanation and code are so long; I wasn't sure how to describe my problem without showing what I have done so far.
This is my current code:
import datetime, json

def jsonData(data):
    return json.loads(data)

def findContentInformation(content):
    contentData = convert_content_data(content)
    for info in contentData:
        contentBaseScore = info['popularityScore']
        contentBrand = info['brand']
        contentType = info['contentType']
        contentName = info['title']
        contentInformation = [contentBaseScore, contentBrand, contentType, contentName]
    return contentInformation

# Calculate the overall rating for the film
def getRating(content_data, userBrandRate, userTypeRate):
    contentScore = {}
    #RATING
    rating = 0
    # Collecting information from the content to be tested
    contentInfo = findContentInformation(content_data) # The content being tested
    popularityScore += contentInfo[0] #Find base score and add this to popScore
    #getBrandRating = str((userBrandPreference[contentInfo[1]])) # Get brand preference
    # Check if brand preference rating is a valid rating type
    if brandRating in Ratings:
        popularityScore += Ratings[brandRating] # Get the ratings score & update popScore
    else:
        print("Unrecognized rating value found in this search")
    user_content_type_preference = convert_type_preferences(content_type_preferences)
    typeRating = getTypeRating(user_content_type_preference, contentInfo) # Get the type rating
    # Check if type rating is a valid rating
    if typeRating in Ratings:
        popularityScore += Ratings[typeRating] # Update the popScore based on the rating score
    else:
        print("Unrecognized rating value found in this search")
    contentScore[contentInfo[3]] = popularityScore
    popularityScore = 0
    return contentScore

result = getRating(content_data)
My output with only one movie (not sure how to use all the movies in the JSON)
JSON string:
content_data = """[{ "title": "Spider-Man", "brand": "Marvel",
"Rating": 98, "contentIs": "movie" }]"""
Output:
[{'Spider-Man': 128}]
To me it feels like you're making things unnecessarily complex. For example, you have three functions (convert_content_data, convert_preferences and convert_type_preferences) that all do the same thing: they each take one JSON-encoded string and parse it. Merging those three functions into one would still leave one function too many, because a lone json.loads call is not a good candidate for an entirely separate function in the first place.
You do quite a bit of conversion also - from a JSON encoded string to a dictionary. You do that multiple times. Why not convert all your JSON once at the start of the program? Doing so will let you work with dictionaries for the rest of your tasks. Once you have a list of dictionaries, you can think of each dictionary as being one "movie-object", since that is what each dictionary represents. The brand- and content type JSON strings can also be converted once at the start of the program (instead of multiple times throughout the entire program).
EDIT - I've updated my example code.
First, I think you should put your movie data in a separate JSON file, so that you're not polluting your source code with a huge string literal. Let's name it movies.json:
[
{
"title": "Spider-Man",
"brand": "Marvel",
"availability": ["CA","FR","US"],
"availableDate": "2019-11-12T05:00:00.000Z",
"isKidsContent": true,
"popularityScore": 98,
"contentType": "movie"
},
{
"title": "Float",
"brand": "Pixar",
"availability": ["US"],
"availableDate": "2019-11-12T05:00:00.000Z",
"isKidsContent": true,
"popularityScore": 87,
"contentType": "short"
},
{
"title": "Avatar",
"brand": "21st Century Fox",
"availability": ["US","CA","FR","ES","DE"],
"availableDate": "2019-11-12T05:00:00.000Z",
"isKidsContent": false,
"popularityScore": 99,
"contentType": "movie"
},
{
"title": "Chapter 1: The Mandalorian",
"brand": "Star Wars", "availability": ["US","CA"],
"availableDate": "2019-11-02T23:00:00.000Z",
"isKidsContent": false,
"popularityScore": 92,
"contentType": "series"
},
{
"title": "Marvel Studios Avengers: Endgame",
"brand": "Marvel",
"availability": ["CA","FR","ES","DE","US"],
"availableDate": "2019-11-11T23:00:00.000Z",
"isKidsContent": false,
"popularityScore": 87,
"contentType": "movie"
},
{
"title": "Disney Mickey Mouse Clubhouse: Mickey Goes Fishing",
"brand": "Disney",
"availability": ["US"],
"availableDate": "2019-09-11T22:00:00.000Z",
"isKidsContent": true,
"popularityScore": 75,
"contentType": "series"
},
{
"title": "Disney High School Musical: The Musical: The Series: Act Two",
"brand": "Disney",
"availability": ["US","FR","ES"],
"availableDate": "2020-01-10T08:00:00.000Z",
"isKidsContent": false,
"popularityScore": 97,
"contentType": "series"
}
]
Then, I would also create a JSON file for all of your users. This is where you would store the user preferences. Let's name it users.json:
[
{
"name": "Bob",
"preferences": {
"brand": {
"Star Wars": "love",
"Disney": "like",
"Marvel": "dislike",
"Pixar": "dislike"
},
"contentType": {
"movie": "like",
"series": "like",
"short": "dislike"
}
}
},
{
"name": "Joe",
"preferences": {
"brand": {
"Star Wars": "dislike",
"Disney": "dislike",
"Marvel": "dislike",
"Pixar": "dislike"
},
"contentType": {
"movie": "like",
"series": "like",
"short": "dislike"
}
}
}
]
This users.json file has two users named Bob and Joe, with different preferences.
Then, the code:
def evaluate_score(user, movie):
    """
    Evaluates and returns the score a user would assign to
    a given movie based on the user's brand- and content-type preferences.
    """
    ratings = {
        "dislike": -20,
        "indifferent": 0,
        "like": 10,
        "adore": 30,
        "love": 50
    }
    brand_score = ratings.get(user["preferences"]["brand"].get(movie["brand"])) or 0
    content_type_score = ratings.get(user["preferences"]["contentType"].get(movie["contentType"])) or 0
    return movie["popularityScore"] + brand_score + content_type_score

def get_all_scores(user, movies):
    for movie in movies:
        yield {
            "title": movie["title"],
            "score": evaluate_score(user, movie)
        }

def main():
    import json
    from operator import itemgetter
    with open("movies.json", "r") as file:
        movies = json.load(file)
    with open("users.json", "r") as file:
        users = json.load(file)
    for user in users:
        print(user["name"].center(16, "-"))
        for movie in sorted(get_all_scores(user, movies), key=itemgetter("score"), reverse=True):
            print("{}: {}".format(movie["title"], movie["score"]))
        print()
    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
------Bob-------
Chapter 1: The Mandalorian: 152
Disney High School Musical: The Musical: The Series: Act Two: 117
Avatar: 109
Disney Mickey Mouse Clubhouse: Mickey Goes Fishing: 95
Spider-Man: 88
Marvel Studios Avengers: Endgame: 77
Float: 47
------Joe-------
Avatar: 109
Spider-Man: 88
Disney High School Musical: The Musical: The Series: Act Two: 87
Chapter 1: The Mandalorian: 82
Marvel Studios Avengers: Endgame: 77
Disney Mickey Mouse Clubhouse: Mickey Goes Fishing: 65
Float: 47
>>>
We've got two functions and one generator:
evaluate_score (which I called get_movie_score before) takes a user dictionary and a movie dictionary, and returns the score (an integer) which this user would assign to the given movie, based on that user's preferences.
get_all_scores is a generator that takes a user and a list of movie dictionaries. It gets the scores for all movies, according to that user, and yields dictionaries, where each dictionary contains two key-value pairs: the movie title, and the final score assigned by that user. This generator will be useful later in the main function, when we want to print the final result in descending order.
main is the main entry-point of the entire script. It first opens and parses our two JSON files, and then, for every user, prints a sorted summary (in descending order based on score) of that user's scores for all movies.
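The "or 0" fallback in evaluate_score is easy to miss, so here is a minimal sketch of what it does for a brand the user has no opinion on. The ratings table is copied from the code above; the score_for helper and the shortened preferences dictionary are hypothetical names used only for illustration.

```python
# Ratings table copied from evaluate_score above.
RATINGS = {"dislike": -20, "indifferent": 0, "like": 10, "adore": 30, "love": 50}

def score_for(preferences, key):
    """Hypothetical helper: map a preference label to its score, or 0 if unknown."""
    label = preferences.get(key)       # None when the user has no entry for key
    return RATINGS.get(label) or 0     # RATINGS.get(None) is None, so fall back to 0

# Shortened version of Bob's brand preferences (illustration only).
bob_brands = {"Star Wars": "love", "Marvel": "dislike"}

known = score_for(bob_brands, "Star Wars")            # brand rated "love"
unknown = score_for(bob_brands, "21st Century Fox")   # brand with no entry
print(known, unknown)
```

This is why Avatar, whose brand appears in neither user's preferences, ends up at 99 + 0 + 10 = 109 for both Bob and Joe.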
Ahoy,
I have a document that looks like this:
{"_id": "123abc456def",
"name": "John Smith",
"address": [
{"street": "First St.", "date": "yesterday", "last_updated": "two days ago"}
],
"age": 123}
When I try to add another street document using $push, it errors out with:
pymongo.errors.WriteError: The field 'address' must be an array but is of type object in document {_id: ObjectId('6049e88657e43d8801197c72')}
Code I'm using:
mydb3 = myclient["catalogue"]
mycolALL = mydb3["locations"]
query = {"charID": 0}
newvalue = {"$push": {"address": {"street": "test123", "date": "test123", "last_updated": "now123"}}}
mycolALL.update_one(query, newvalue)
Not making an address book or anything, just edited it so it makes a bit more sense to anyone without context.
My desired output would be that the document would look like this:
{"_id": "123abc456def",
"name": "John Smith",
"address": [
{"street": "First St.", "date": "yesterday", "last_updated": "two days ago"},
{"street": "test123", "date": "test123", "last_updated": "now123"}
],
"age": 123}
Normally I can google my way to an answer that makes the coin drop and JACKPOT! but this time I'm outta luck.
$set = it just changes the existing document, effectively replacing it. Which is not what I want.
$addToSet = for arrays only, error message: "pymongo.errors.WriteError: Cannot apply $addToSet to non-array field. Field named 'address' has non-array type object"
Anyone that can help?
Just a guess, but are you sure you're looking at the right data / database?
Based on the data you posted, your update_one() won't update that record because it doesn't match your filter {"charID": 0}.
I am trying to filter data out of an API JSON response with Python, and I get weird results. I would be glad if somebody could guide me on how to deal with the situation.
The main idea is to remove irrelevant data from the JSON and keep only the records associated with particular people, whose names I hold in a list.
Here is a snippet of the JSON file:
{
"result": [
{
"number": "Number1",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "John Doe",
"link": "https://some_link.com"
}
},
{
"number": "Number2",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-10 11:07:13",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Tyrell Greenley",
"link": "https://some_link.com"
}
},
{
"number": "Number3",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-20 10:23:35",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Delmar Vachon",
"link": "https://some_link.com"
}
},
{
"number": "Number4",
"short_description": "Some Description",
"assignment_group": {
"display_value": "Some value",
"link": "https://some_link.com"
},
"incident_state": "Closed",
"sys_created_on": "2020-03-30 11:51:24",
"priority": "4 - Low",
"assigned_to": {
"display_value": "Samual Isham",
"link": "https://some_link.com"
}
}
]
}
Here is the Python code:
import json

users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
# Load JSON file
with open('extract.json', 'r') as input_file:
    input_data = json.load(input_file)
# Create a function to clear the data
def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in data:
        print(elem['assigned_to']['display_value'] not in users)
        if elem['assigned_to']['display_value'] not in users:
            print('Removing {} from JSON as not present in list of names.'.format(elem['assigned_to']['display_value']))
            data.remove(elem)
        else:
            print('Keeping the record for {} in JSON.'.format(elem['assigned_to']['display_value']))
    return data
cd = clear_data(input_data['result'], users_test)
And here is the output, which seems to iterate through only 2 of the items in the file:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
Process finished with exit code 0
The problem seems to be related to the .remove() method; however, I can't find another suitable way to delete the items that I do not need.
Here is the output of the iteration without applying the remove() method:
True
Removing John Doe from JSON as not present in list of names.
True
Removing Tyrell Greenley from JSON as not present in list of names.
True
Removing Delmar Vachon from JSON as not present in list of names.
False
Keeping the record for Samual Isham in JSON.
Process finished with exit code 0
Note: I have left the check for the name visible on purpose.
I would appreciate any ideas to sort out the situation.
If you don't need to log info about the people you are removing, you could simply try:
filtered = [i for i in data['result'] if i['assigned_to']['display_value'] in users_test]
users_test = ['Ahmad Wickert', 'Dick Weston', 'Gerardo Salido', 'Rosendo Dewey', 'Samual Isham']
solution = []
for user in users_test:
    print(user)
    for value in data['result']:
        if user == value['assigned_to']['display_value']:
            solution.append(value)
print(solution)
For more efficient code, as asked by @NomadMonad:
solution = list(filter(lambda x: x['assigned_to']['display_value'] in users_test, data['result']))
You are modifying a list while at the same time iterating through it. Check out this blog post which describes this behavior.
A safer way to do this is to make a copy of your list to iterate over, and to delete from your original list:
import copy

def clear_data(data, users):
    """Filter out the data and leave only records for the names in the users_test list"""
    for elem in copy.deepcopy(data): # deepcopy handles nested dicts
        # Still call data.remove() in here, on the original list
        if elem['assigned_to']['display_value'] not in users:
            data.remove(elem)
    return data
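To see why the original loop only visited two of the four records, here is a small self-contained sketch that uses plain strings instead of the incident dictionaries (the shortened names are made up for illustration). Removing an item shifts the remaining items one position to the left, so the loop skips whichever item slid into the freed slot; iterating over a copy avoids that.

```python
names = ["John", "Tyrell", "Delmar", "Samual"]
keep = {"Samual"}

# Buggy: remove from the same list that is being iterated over.
buggy = list(names)
visited_buggy = []
for name in buggy:
    visited_buggy.append(name)
    if name not in keep:
        buggy.remove(name)

# Safer: iterate over a shallow copy, remove from the original.
fixed = list(names)
visited_fixed = []
for name in list(fixed):
    visited_fixed.append(name)
    if name not in keep:
        fixed.remove(name)

print(visited_buggy, buggy)
print(visited_fixed, fixed)
```

The buggy loop visits only the first and third names, which matches the output in the question. A shallow copy (list(data) or data[:]) is enough for this purpose; a deep copy is only needed if the loop also mutates the nested dictionaries.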
I collected public course data from Udemy and put it all in a json file. Each course has an identifier number under which all the data is stored. I can perfectly list out any details I want, except for these identifier numbers.
How can I list out these numbers themselves? Thanks.
{
"153318":
{
"lectures data": "31 lectures, 5 hours video",
"instructor work": "Academy Of Technical Courses, Grow Your Skills Today",
"title": "Oracle Applications R12 Order Management and Pricing",
"promotional price": "$19",
"price": "$20",
"link": "https://www.udemy.com/oracle-applications-r12-order-management-and-pricing/",
"instructor": "Parallel Branch Inc"
},
"616990":
{
"lectures data": "24 lectures, 1.5 hours video",
"instructor work": "Learning Sans Location",
"title": "Cloud Computing Development Essentials",
"promotional price": "$19",
"price": "$20",
"link": "https://www.udemy.com/cloud-computing-development-essentials/",
"instructor": "Destin Learning"
}
}
You want the keys of that dictionary.
import json
with open('course.json') as json_file:
    course=json.load(json_file)
print course.keys()
giving:
[u'616990', u'153318']
Parse the json into a python dict, then loop over the keys:
parsed = json.loads(input)
for key in parsed.keys():
    print(key)
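The first snippet above uses the Python 2 print statement. A minimal Python 3 sketch of the same idea follows, using a shortened inline string in place of the course file (the truncated records are an assumption for illustration). In Python 3, keys() returns a view rather than a list, so it is wrapped in list() or sorted() when an actual list is wanted.

```python
import json

# Shortened stand-in for the course file; only the titles are kept.
raw = '''
{
  "153318": {"title": "Oracle Applications R12 Order Management and Pricing"},
  "616990": {"title": "Cloud Computing Development Essentials"}
}
'''

courses = json.loads(raw)

# Iterating a dict yields its keys, so sorted() returns the identifiers.
course_ids = sorted(courses)
print(course_ids)

# Keys and values together, if the identifier is needed next to the details.
for course_id, details in courses.items():
    print(course_id, details["title"])
```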
I'm trying to search a data file, for example Yelp.json. It has businesses in it in LA, Boston, DC.
I wrote this:
# Python 2
# read json
with open('updated_data.json') as facts_data:
    data = json.load(facts_data)
# return every unique locality along with how often it occurs
locality = []
unique_locality = []
# Load items into lists
for item in data:
    locality.append(data["payload"]["locality"])
    if data["payload"]["locality"] not in unique_locality:
        print unique_locality.append(data["payload"]["locality"])
# Loops over unique_locality and count from locality
print "Unique Locality Count:", unique_locality, locality.count(data["payload"]["locality"])
But I get an answer of "Portsmouth 1", which means it is not returning all the cities and might not even be returning all the counts. My goal for this section is to search that JSON file and have it say "DC: 10 businesses, LA: 20 businesses, Boston: 2 businesses." Each payload is a grouping of info about a single business, and "locality" is just the city. So I want it to find how many unique cities there are and then how many businesses are in each city. For example, one payload could be a Starbucks in LA, another a Starbucks in DC, and another a Chipotle in LA.
Example of JSON file (JSONlite.com says it's valid):
"payload": {
"existence_full": 1,
"geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
"latitude": "56.945972",
"locality": "Stonehaven",
"_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
"address": "The Lodge, Dunottar",
"email": "dunnottarcastle#btconnect.com",
"existence_ml": 0.5694238217658721,
"domain_aggregate": "",
"name": "Dunnottar Castle",
"search_tags": ["Dunnottar Castle Aberdeenshire", "Dunotter Castle"],
"admin_region": "Scotland",
"existence": 1,
"category_labels": [
["Landmarks", "Buildings and Structures"]
],
"post_town": "Stonehaven",
"region": "Kincardineshire",
"review_count": "719",
"geocode_level": "within_50m",
"tel": "01569 762173",
"placerank": 65,
"longitude": "-2.197123",
"placerank_ml": 37.27916073464469,
"fax": "01330 860325",
"category_ids_text_search": "",
"website": "http://www.dunnottarcastle.co.uk",
"status": "1",
"geocode_confidence": "20",
"postcode": "AB39 2TL",
"category_ids": [108],
"country": "gb",
"_geocode_quality": "4",
"uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
},
"payload": {
"existence_full": 1,
"geo_virtual": "[\"56.237480|-5.073578|20|within_50m|4\"]",
"latitude": "56.237480",
"locality": "Inveraray",
"_records_touched": "{\"crawl\":11,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
"address": "Cherry Park",
"email": "enquiries#inveraray-castle.com",
"longitude": "-5.073578",
"domain_aggregate": "",
"name": "Inveraray Castle",
"admin_region": "Scotland",
"search_tags": ["Inveraray Castle Tea Room", "Inverary Castle"],
"existence": 1,
"category_labels": [
["Social", "Food and Dining", "Restaurants"]
],
"region": "Argyll",
"review_count": "532",
"geocode_level": "within_50m",
"tel": "01499 302203",
"placerank": 67,
"post_town": "Inveraray",
"placerank_ml": 41.19978087352266,
"fax": "01499 302421",
"category_ids_text_search": "",
"website": "http://www.inveraray-castle.com",
"status": "1",
"geocode_confidence": "20",
"postcode": "PA32 8XE",
"category_ids": [347],
"country": "gb",
"_geocode_quality": "4",
"existence_ml": 0.7914881102847783,
"uuid": "8278ab80-2cd1-4dbd-9685-0d0036b681eb"
},
If your "json" has a structure something like
{"payload":{ CONTENT_A }, "payload":{ CONTENT_B }, ..., "payload":{ CONTENT_LAST }}
it parses without error, but json.loads keeps only the last value for a repeated key, so the string will be evaluated as
{"payload":{ CONTENT_LAST }}
And that is why you end up with one city and one business count.
You can verify this behaviour on this online json parser http://json.parser.online.fr/ by checking JS eval field.
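The same behaviour can be checked in Python directly. The sketch below uses a made-up two-payload string; it shows that json.loads keeps only the last value for a repeated key, and that the object_pairs_hook parameter of json.loads can be used to collect every payload instead. The collect_payloads function is a hypothetical helper, not part of the json module.

```python
import json

raw = '{"payload": {"locality": "Stonehaven"}, "payload": {"locality": "Inveraray"}}'

# Default behaviour: the later "payload" silently replaces the earlier one.
last_only = json.loads(raw)
print(last_only)

def collect_payloads(pairs):
    """Hypothetical hook: keep every value stored under a repeated "payload" key."""
    keys = [key for key, _ in pairs]
    if keys and all(key == "payload" for key in keys):
        return [value for _, value in pairs]
    return dict(pairs)  # ordinary objects are left as plain dicts

# The hook receives each object's (key, value) pairs before duplicates are dropped.
data = json.loads(raw, object_pairs_hook=collect_payloads)
print(data)
```

Fixing the file itself is still the cleaner option when that is possible; the hook is only a workaround for input you cannot change.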
In this case, one way to preprocess your json string is to get rid of the dummy "payload" key and put the content dictionaries directly in a list. You will have a json string in the following format:
[{CONTENT_A}, {CONTENT_B}, ..., {CONTENT_LAST}]
Assume your json string is now a list of payload dictionaries, and that you have loaded it with data = json.loads(json_str).
As you iterate through the payloads, build a lookup table along the way.
This handles duplicated cities automatically, since businesses in the same city are appended to the same list.
city_business_map = {}
for payload in data:
    city = payload['locality']
    business = payload['name']
    if city not in city_business_map:
        city_business_map[city] = []
    city_business_map[city].append(business)
Then later on, you can easily present the solution by:
for city, business_list in city_business_map.items():
    print city, len(business_list)
If you want to count the unique businesses in each city, initialize each value to a set instead of a list (and use add instead of append).
If this is overkill, instead of initializing to a list or set, just associate a counter with each key.
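The counter variant could look like the sketch below, written for Python 3 with collections.Counter. The data list is a made-up sample in the list-of-payloads format, trimmed to the two fields that matter here.

```python
from collections import Counter

# Made-up sample: one dictionary per business, as in the list-of-payloads format.
data = [
    {"name": "Starbucks", "locality": "LA"},
    {"name": "Starbucks", "locality": "DC"},
    {"name": "Chipotle", "locality": "LA"},
]

# Count how many payloads (businesses) fall in each city.
city_counts = Counter(payload["locality"] for payload in data)

# most_common() lists the cities from the highest count to the lowest.
for city, count in city_counts.most_common():
    print("{}: {} businesses".format(city, count))
```

The number of unique cities is then simply len(city_counts).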