How to select, map and count data from JSON API with Python? - python

I am new to Python and am struggling to find the right method for the following:
I have 2 API responses, one is a list of devices, the other one is a list of organizations.
Each device is linked to an organization with an Organization ID.
organizations = [
{
'name': 'Aperture Science Inc.',
'description': 'Just a corporation!',
'id': 1
},
{
'name': 'Software Development Inc',
'description': "Making the world's next best app!",
'id': 2
}
]
devices = [
{
'id': 1,
'organizationId': 2,
'nodeClass': 'WINDOWS_WORKSTATION',
'displayName': 'DESKTOP_01'
},{
'id': 2,
'organizationId': 2,
'nodeClass': 'WINDOWS_SERVER',
'displayName': 'SERVER_01'
},{
'id': 3,
'organizationId': 1,
'nodeClass': 'WINDOWS_WORSTATION',
'displayName': 'DESKTOP_0123'
}
]
The organizationId of each device matches the id of an organization.
I want to get the number of servers and workstations for each organization, like this:
results = [
{
'Organization Name' : 'Aperture Science Inc.',
'Number of Workstations': 1,
'Number of Servers': 0,
'Total devices': 1
},
{
'Organization Name' : 'Software Development Inc',
'Number of Workstations': 1,
'Number of Servers': 1,
'Total devices': 2
}
]
I started with this:
wks_sum = sum(d.nodeClass == "WINDOWS_WORKSTATION" for d in devices)
print(wks_sum)
but I get this error:
AttributeError: 'dict' object has no attribute 'nodeClass'
At the very end, I convert the results and save them to a CSV file:
import pandas as pd
df = pd.DataFrame(results)
df.to_csv('results.csv', index=False)
I am struggling with counting each device type and with mapping devices to the right organization name, and would really appreciate some help :)
EDIT:
Thanks to @Vincent, I came up with:
import pandas as pd

# Give every organization both lists up front, even one with no devices
for organization in organizations:
    organization.setdefault("workstations", [])
    organization.setdefault("servers", [])

for device in devices:
    for organization in organizations:
        if device["organizationId"] != organization["id"]:
            continue
        if device["nodeClass"] == "WINDOWS_SERVER":
            organization["servers"].append(device["nodeClass"])
        elif device["nodeClass"] == "WINDOWS_WORKSTATION":
            organization["workstations"].append(device["nodeClass"])
        break  # each device belongs to exactly one organization

results = [
    {
        "Organization Name": organization["name"],
        "Number of Workstations": len(organization["workstations"]),
        "Number of Servers": len(organization["servers"]),
        "Total devices": len(organization["workstations"]) + len(organization["servers"]),
    }
    for organization in organizations
]
# print(f"{results = }")
print(results)
# convert and save in a csv file
df = pd.DataFrame(results)
df.to_csv('results.csv', index=False)
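A note on the AttributeError above: JSON objects parsed by Python are plain dicts, so their values are read with subscripts (d["nodeClass"]) rather than attributes (d.nodeClass). The sketch below, assuming the organizations and devices lists from the question, also avoids the nested loop by counting each device once into a per-organization Counter keyed by id. Classifying by the displayName prefix is an assumption borrowed from the answer below: the third device's nodeClass is spelled 'WINDOWS_WORSTATION' in the sample data, so an exact nodeClass comparison counts it as neither type. device_type is a hypothetical helper name.

```python
from collections import Counter

organizations = [
    {"name": "Aperture Science Inc.", "description": "Just a corporation!", "id": 1},
    {"name": "Software Development Inc", "description": "Making the world's next best app!", "id": 2},
]
devices = [
    {"id": 1, "organizationId": 2, "nodeClass": "WINDOWS_WORKSTATION", "displayName": "DESKTOP_01"},
    {"id": 2, "organizationId": 2, "nodeClass": "WINDOWS_SERVER", "displayName": "SERVER_01"},
    {"id": 3, "organizationId": 1, "nodeClass": "WINDOWS_WORSTATION", "displayName": "DESKTOP_0123"},
]

# Subscript access fixes the AttributeError; the misspelled nodeClass is not counted
print(sum(d["nodeClass"] == "WINDOWS_WORKSTATION" for d in devices))  # 1


def device_type(device):
    """Hypothetical classifier: uses the displayName prefix (an assumption)."""
    name = device["displayName"]
    if name.startswith("SERVER_"):
        return "server"
    if name.startswith("DESKTOP_"):
        return "workstation"
    return "other"


# One Counter per organization id, filled in a single pass over the devices.
# Assumes every device's organizationId exists in organizations.
counts = {org["id"]: Counter() for org in organizations}
for device in devices:
    counts[device["organizationId"]][device_type(device)] += 1

results = [
    {
        "Organization Name": org["name"],
        "Number of Workstations": counts[org["id"]]["workstation"],  # missing key -> 0
        "Number of Servers": counts[org["id"]]["server"],
        "Total devices": sum(counts[org["id"]].values()),
    }
    for org in organizations
]
print(results)
```

Because the lookup goes through a dict keyed by organization id, the work grows with the number of devices plus organizations rather than their product.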

This code will achieve your goal:
organizations = [
{
'name': 'Aperture Science Inc.',
'description': 'Just a corporation!',
'id': 1
},
{
'name': 'Software Development Inc',
'description': "Making the world's next best app!",
'id': 2
}
]
devices = [
{
'id': 1,
'organizationId': 2,
'nodeClass': 'WINDOWS_WORKSTATION',
'displayName': 'DESKTOP_01'
},{
'id': 2,
'organizationId': 2,
'nodeClass': 'WINDOWS_SERVER',
'displayName': 'SERVER_01'
},{
'id': 3,
'organizationId': 1,
'nodeClass': 'WINDOWS_WORSTATION',
'displayName': 'DESKTOP_0123'
}
]
# Give every organization both lists up front, even one with no devices
for organization in organizations:
    organization.setdefault("workstations", [])
    organization.setdefault("servers", [])

for device in devices:
    for organization in organizations:
        if device["organizationId"] != organization["id"]:
            continue
        # classify by the displayName prefix
        if device["displayName"].startswith("SERVER_"):
            organization["servers"].append(device["nodeClass"])
        elif device["displayName"].startswith("DESKTOP_"):
            organization["workstations"].append(device["nodeClass"])
        break  # the matching organization was found

results = [
    {
        "Organization Name": organization["name"],
        "Number of Workstations": len(organization["workstations"]),
        "Number of Servers": len(organization["servers"]),
        "Total devices": len(organization["workstations"]) + len(organization["servers"]),
    }
    for organization in organizations
]
print(f"{results = }")
Output:
results = [{'Organization Name': 'Aperture Science Inc.', 'Number of Workstations': 1, 'Number of Servers': 0, 'Total devices': 1}, {'Organization Name': 'Software Development Inc', 'Number of Workstations': 1, 'Number of Servers': 1, 'Total devices': 2}]
You could also do this with a library such as pandas, but I think plain, explicit code like this makes it easier to see what is being done and to modify it if needed.
To deal with a large amount of data, you could load the data into two SQL tables (using sqlite3, for example) and do the joining and counting in SQL.
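The SQL route could look roughly like this minimal sketch, assuming the same organizations and devices lists and Python's built-in sqlite3 module with an in-memory database. The table layouts and the displayName-prefix classification (SQLite's substr) are illustrative choices, not part of the original answer. A LEFT JOIN keeps organizations that have no devices, and COALESCE turns their NULL sums into 0.

```python
import sqlite3

organizations = [
    {"name": "Aperture Science Inc.", "description": "Just a corporation!", "id": 1},
    {"name": "Software Development Inc", "description": "Making the world's next best app!", "id": 2},
]
devices = [
    {"id": 1, "organizationId": 2, "nodeClass": "WINDOWS_WORKSTATION", "displayName": "DESKTOP_01"},
    {"id": 2, "organizationId": 2, "nodeClass": "WINDOWS_SERVER", "displayName": "SERVER_01"},
    {"id": 3, "organizationId": 1, "nodeClass": "WINDOWS_WORSTATION", "displayName": "DESKTOP_0123"},
]

conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT, description TEXT)")
conn.execute(
    "CREATE TABLE devices (id INTEGER PRIMARY KEY, organizationId INTEGER, "
    "nodeClass TEXT, displayName TEXT)"
)
# Named placeholders (:id, :name, ...) take their values from each dict
conn.executemany("INSERT INTO organizations VALUES (:id, :name, :description)", organizations)
conn.executemany(
    "INSERT INTO devices VALUES (:id, :organizationId, :nodeClass, :displayName)", devices
)

query = """
SELECT o.name,
       COALESCE(SUM(substr(d.displayName, 1, 8) = 'DESKTOP_'), 0) AS workstations,
       COALESCE(SUM(substr(d.displayName, 1, 7) = 'SERVER_'), 0) AS servers,
       COUNT(d.id) AS total
FROM organizations AS o
LEFT JOIN devices AS d ON d.organizationId = o.id
GROUP BY o.id
ORDER BY o.id
"""
results = [
    {
        "Organization Name": name,
        "Number of Workstations": workstations,
        "Number of Servers": servers,
        "Total devices": total,
    }
    for name, workstations, servers, total in conn.execute(query)
]
conn.close()
print(results)
```

The resulting list of dicts can be passed to pd.DataFrame(results) exactly as in the question.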

Related

Taking API response and adding it to json, AttributeError: 'dict' object has no attribute 'append'

I am making an API request and want to add the response to a JSON file, and then add the responses of subsequent requests to the same JSON file.
I have separated out the block of code that isn't working, keeping just one API call and the handling of its response. The issue is that I cannot write this information to the JSON file. When I try, I get the error "AttributeError: 'dict' object has no attribute 'append'". I therefore presumed that my result from the API request is a dictionary. I then tried, in about four ways, to convert it into a list to allow the append. None of these methods worked.
import json
import requests
fname = "NewdataTest.json"
request_API = requests.get("https://api.themoviedb.org/3/movie/745?api_key=***")
print(request_API)
# Check Response from API
#print(request_API.json())
newData = (request_API.json())
# function to add to JSON
def write_json(data, fname):
    with open(fname, "w") as f:
        json.dump(data, f, indent = 4)
with open (fname) as json_file:
    data = json.load(json_file)
    temp = data[0]
    #print(newData)
    y = newData
    temp.append(y)
write_json(data)
JSON I am trying to add data to:
[
{
"adult": false,
"backdrop_path": "/e1cC9muSRtAHVtF5GJtKAfATYIT.jpg",
"belongs_to_collection": null,
"budget": 0,
"genres": [
{
"id": 10749,
"name": "Romance"
},
{
"id": 35,
"name": "Comedy"
}
],
"homepage": "",
"id": 1063242,
"imdb_id": "tt24640474",
"original_language": "fr",
"original_title": "Disconnect: The Wedding Planner",
"overview": "After falling victim to a scam, a desperate man races the clock as he attempts to plan a luxurious destination wedding for an important investor.",
"popularity": 34.201,
"poster_path": "/tGmCxGkVMOqig2TrbXAsE9dOVvX.jpg",
"production_companies": [],
"production_countries": [
{
"iso_3166_1": "KE",
"name": "Kenya"
},
{
"iso_3166_1": "NG",
"name": "Nigeria"
}
],
"release_date": "2023-01-13",
"revenue": 0,
"runtime": 107,
"spoken_languages": [
{
"english_name": "English",
"iso_639_1": "en",
"name": "English"
},
{
"english_name": "Afrikaans",
"iso_639_1": "af",
"name": "Afrikaans"
}
],
"status": "Released",
"tagline": "",
"title": "Disconnect: The Wedding Planner",
"video": false,
"vote_average": 5.8,
"vote_count": 3
}
]
Example of print(request_API.json()):
{'adult': False, 'backdrop_path': '/paUKxrbN2ww0JeT2JtvgAuaGlPf.jpg', 'belongs_to_collection': None, 'budget': 40000000, 'genres': [{'id': 9648, 'name': 'Mystery'}, {'id': 53, 'name': 'Thriller'}, {'id': 18, 'name': 'Drama'}], 'homepage': '', 'id': 745, 'imdb_id': 'tt0167404', 'original_language': 'en', 'original_title': 'The Sixth Sense', 'overview': 'Following an unexpected tragedy, a child psychologist named Malcolm Crowe meets an nine year old boy named Cole Sear, who is hiding a dark secret.', 'popularity': 32.495, 'poster_path': '/4AfSDjjCy6T5LA1TMz0Lh2HlpRh.jpg', 'production_companies': [{'id': 158, 'logo_path': '/jSj8E9Q5D0Y59IVfYFeBnfYl1uB.png', 'name': 'Spyglass Entertainment', 'origin_country': 'US'}, {'id': 862, 'logo_path': '/udTjbqPmcTbfrihMuLtLcizDEM1.png', 'name': 'The Kennedy/Marshall Company', 'origin_country': 'US'}, {'id': 915, 'logo_path': '/4neXXpjSJDZPBGBnfWtqysB5htV.png', 'name': 'Hollywood Pictures', 'origin_country': 'US'}, {'id': 17032, 'logo_path': None, 'name': 'Barry Mendel Productions', 'origin_country': 'US'}], 'production_countries': [{'iso_3166_1': 'US', 'name': 'United States of America'}], 'release_date': '1999-08-06', 'revenue': 672806292, 'runtime': 107, 'spoken_languages': [{'english_name': 'Latin', 'iso_639_1': 'la', 'name': 'Latin'}, {'english_name': 'Spanish', 'iso_639_1': 'es', 'name': 'Español'}, {'english_name': 'English', 'iso_639_1': 'en', 'name': 'English'}], 'status': 'Released', 'tagline': 'Not every gift is a blessing.', 'title': 'The Sixth Sense', 'video': False, 'vote_average': 7.94, 'vote_count': 10125}
There are two problems with your code:
Your JSON file contains an array of objects [{...}], so data is a list of dicts and data[0] is a dict. What would you expect someobject.append(someotherobject) to do? You probably want data.append(y).
You define your write_json(data, fname) function to take two parameters, but when you call it as write_json(data) you pass only one.
The second error only shows up after you have fixed the first one: as long as the append was raising an error, execution never reached write_json, so it had no chance to raise an error there.
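With both fixes applied, the load-append-write cycle could look like this minimal sketch. It assumes the file holds a JSON array (as in the question) and starts a new array if the file does not exist yet. append_record is a hypothetical helper name, and small stand-in dicts written to a temporary directory replace real request_API.json() results so the sketch runs without an API key.

```python
import json
import os
import tempfile


def write_json(data, fname):
    with open(fname, "w") as f:
        json.dump(data, f, indent=4)


def append_record(record, fname):
    """Hypothetical helper: append one API response (a dict) to the JSON array in fname."""
    if os.path.exists(fname):
        with open(fname) as json_file:
            data = json.load(json_file)  # a list of dicts: [{...}, ...]
    else:
        data = []  # no file yet: start a new array
    data.append(record)      # append to the list itself, not to data[0] (a dict)
    write_json(data, fname)  # pass both arguments


# Stand-ins for request_API.json() so the sketch runs offline
fname = os.path.join(tempfile.mkdtemp(), "NewdataTest.json")
append_record({"id": 1063242, "title": "Disconnect: The Wedding Planner"}, fname)
append_record({"id": 745, "title": "The Sixth Sense"}, fname)

with open(fname) as f:
    saved = json.load(f)
print([movie["title"] for movie in saved])
```

In the question's script, the call would be append_record(request_API.json(), fname) after each request.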

Mapping JSON key-value pairs from source to destination using Python

Using Python requests, I want to grab a piece of JSON from one source and post it to a destination. The structure of the JSON received and the one required by the destination, however, differ a bit, so my question is: how do I best map the items from the source structure onto the destination structure?
To illustrate, imagine we get a list of all purchases made by John and Mary. And now we want to post the individual items purchased linking these to the individuals who purchased them (NOTE: The actual use case involves thousands of entries so I am looking for an approach that would scale accordingly):
Source JSON:
{
'Total Results': 2,
'Results': [
{
'Name': 'John',
'Age': 25,
'Purchases': [
{
'Fruits': {
'Type': 'Apple',
'Quantity': 3,
'Color': 'Red'}
},
{
'Veggie': {
'Type': 'Salad',
'Quantity': 2,
'Color': 'Green'
}
}
]
},
{
'Name': 'Mary',
'Age': 20,
'Purchases': [
{
'Fruits': {
'Type': 'Orange',
'Quantity': 2,
'Color': 'Orange'
}
}
]
}
]
}
Destination JSON:
[
{
'Purchase': 'Apple',
'Purchased by': 'John',
'Quantity': 3,
'Type': 'Red'
},
{
'Purchase': 'Salad',
'Purchased by': 'John',
'Quantity': 2,
'Type': 'Green'
},
{
'Purchase': 'Orange',
'Purchased by': 'Mary',
'Quantity': 2,
'Type': 'Orange'
}
]
Any help on this would be greatly appreciated! Cheers!
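Just loop through the dict, creating a new dict for each purchase:
from pprint import pprint

# d is the source dict from the question
res = []
for result in d['Results']:
    for purchase in result['Purchases']:
        # each purchase holds a single category ('Fruits', 'Veggie', ...)
        item = list(purchase.values())[0]
        value = {}  # a new dict per purchase; reusing one would overwrite earlier entries
        value['Purchase'] = item['Type']
        value['Purchased by'] = result['Name']
        value['Quantity'] = item['Quantity']
        value['Type'] = item['Color']
        res.append(value)
pprint(res)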
[{'Purchase': 'Apple', 'Purchased by': 'John', 'Quantity': 3, 'Type': 'Red'},
{'Purchase': 'Salad', 'Purchased by': 'John', 'Quantity': 2, 'Type': 'Green'},
{'Purchase': 'Orange', 'Purchased by': 'Mary', 'Quantity': 2, 'Type': 'Orange'}]
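For thousands of entries, the same mapping can be written as a generator so records are produced one at a time and can be sent in batches. This is a minimal sketch assuming each purchase is a dict with one or more category keys whose values carry Type, Quantity and Color; map_purchases and batches are hypothetical names, and the batch size of 2 is only for the example. The actual POST is left out because the question does not describe the destination API; each json.dumps(batch) string is what would be sent as a request body.

```python
import json
from itertools import islice


def map_purchases(source):
    """Yield one destination record per purchased item, lazily."""
    for person in source["Results"]:
        for purchase in person["Purchases"]:
            # The category key ('Fruits', 'Veggie', ...) is ignored
            for item in purchase.values():
                yield {
                    "Purchase": item["Type"],
                    "Purchased by": person["Name"],
                    "Quantity": item["Quantity"],
                    "Type": item["Color"],
                }


def batches(iterable, size):
    """Group an iterable into lists of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch


source = {
    "Total Results": 2,
    "Results": [
        {"Name": "John", "Age": 25, "Purchases": [
            {"Fruits": {"Type": "Apple", "Quantity": 3, "Color": "Red"}},
            {"Veggie": {"Type": "Salad", "Quantity": 2, "Color": "Green"}},
        ]},
        {"Name": "Mary", "Age": 20, "Purchases": [
            {"Fruits": {"Type": "Orange", "Quantity": 2, "Color": "Orange"}},
        ]},
    ],
}

records = list(map_purchases(source))
payloads = [json.dumps(batch) for batch in batches(map_purchases(source), 2)]
print(records)
print(len(payloads))
```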

python - if availability is there, then print it out, else don't print it out

I am playing around with JSON and am stuck on some code where I print out
items['ids']
which gives me a value of:
[
{
'id': '11891',
'availability': 'IsNotThere',
},
{
'id': '11892',
'availability': 'IsThere',
},
{
'id': '11893',
'availability': 'IsThere',
},
{
'id': '11894',
'availability': 'IsNotThere',
},
{
'id': '11895',
'availability': 'IsNotThere',
},
{
'id': '11896',
'availability': 'IsNotThere',
},
{
'id': '11897',
'availability': 'IsNotThere',
},
{
'id': '11898',
'availability': 'IsNotThere',
},
{
'id': '11899',
'availability': 'IsNotThere',
},
{
'id': '11900',
'availability': 'IsNotThere',
}
]
I have been trying to figure out whether this calls for a for-loop, but I didn't get anywhere. How can I print out the id where the availability is "IsThere" and otherwise just skip it?
EDIT:
id_list = [i for i in products['skus'] if i.get("id")]
for i in id_list:
    if i['availability'] == 'IsThere':
        print(i)
If you only want the entries whose availability is 'IsThere', you can filter them with a list comprehension:
id_list = [i for i in items['ids'] if i.get("availability") == 'IsThere']
To print the ids, just loop over the result:
for i in id_list:
    print(i['id'])
If you have a JSON object, you can just loop over it and check the fields using the field name as the index:
for line in items['ids']:
    if line['availability'] == "IsThere":
        # if available -> print the id
        print(line['id'])

Parsing data in a dict

I have a dict from which I am trying to obtain certain data. An example of this dict is as follows:
{
'totalGames': 1,
'dates': [{
'totalGames': 1,
'totalMatches': 0,
'matches': [],
'totalEvents': 0,
'totalItems': 1,
'games': [{
'status': {
'codedGameState': '7',
'abstractGameState': 'Final',
'startTimeTBD': False,
'detailedState': 'Final',
'statusCode': '7',
},
'season': '20172018',
'gameDate': '2018-05-20T19:00:00Z',
'venue': {'link': '/api/v1/venues/null',
'name': 'Bell MTS Place'},
'gameType': 'P',
'teams': {'home': {'leagueRecord': {'wins': 9,
'losses': 8, 'type': 'league'}, 'score': 1,
'team': {'link': '/api/v1/teams/52',
'id': 52, 'name': 'Winnipeg Jets'}},
'away': {'leagueRecord': {'wins': 12,
'losses': 3, 'type': 'league'}, 'score': 2,
'team': {'link': '/api/v1/teams/54',
'id': 54, 'name': 'Vegas Golden Knights'}}},
'content': {'link': '/api/v1/game/2017030325/content'},
'link': '/api/v1/game/2017030325/feed/live',
'gamePk': 2017030325,
}],
'date': '2018-05-20',
'events': [],
}],
'totalMatches': 0,
'copyright': 'NHL and the NHL Shield are registered trademarks of the National Hockey League. NHL and NHL team marks are the property of the NHL and its teams. \xa9 NHL 2018. All Rights Reserved.',
'totalEvents': 0,
'totalItems': 1,
'wait': 10,
}
I am interested in obtaining the score for a certain team if they played that night. For example, if my team of interest is the Vegas Golden Knights, I would like to create a variable that contains their score (2 in this case). I am completely stuck on this, so any help would be greatly appreciated!
This turns into somewhat ugly parsing, but it is easy to do by following the JSON structure; I would recommend flattening the structure for your purposes. That said, if you'd like to find the score of a particular team on a particular date, you could do this:
def find_score_by_team(gamedict, team_of_interest, date_of_interest):
    for date in gamedict['dates']:
        for game in date['games']:
            if game['gameDate'].startswith(date_of_interest):
                for advantage in game['teams']:
                    if game['teams'][advantage]['team']['name'] == team_of_interest:
                        return game['teams'][advantage]['score']
    return -1
Example query:
>>> d = {'totalGames':1,'dates':[{'totalGames':1,'totalMatches':0,'matches':[],'totalEvents':0,'totalItems':1,'games':[{'status':{'codedGameState':'7','abstractGameState':'Final','startTimeTBD':False,'detailedState':'Final','statusCode':'7',},'season':'20172018','gameDate':'2018-05-20T19:00:00Z','venue':{'link':'/api/v1/venues/null','name':'BellMTSPlace'},'gameType':'P','teams':{'home':{'leagueRecord':{'wins':9,'losses':8,'type':'league'},'score':1,'team':{'link':'/api/v1/teams/52','id':52,'name':'WinnipegJets'}},'away':{'leagueRecord':{'wins':12,'losses':3,'type':'league'},'score':2,'team':{'link':'/api/v1/teams/54','id':54,'name':'VegasGoldenKnights'}}},'content':{'link':'/api/v1/game/2017030325/content'},'link':'/api/v1/game/2017030325/feed/live','gamePk':2017030325,}],'date':u'2018-05-20','events':[],}],'totalMatches':0,'copyright':'NHLandtheNHLShieldareregisteredtrademarksoftheNationalHockeyLeague.NHLandNHLteammarksarethepropertyoftheNHLanditsteams.\xa9NHL2018.AllRightsReserved.','totalEvents':0,'totalItems':1,'wait':10,}
>>> find_score_by_team(d, 'VegasGoldenKnights', '2018-05-20')
2
This returns -1 if the team didn't play that night, otherwise it returns the team's score.
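The flattening suggested above could look like this minimal sketch: turn the nested dates/games/teams structure into one flat row per team per game, then index the rows by (date, team) for direct lookups. It reads the per-date 'date' field rather than slicing gameDate, uses an abbreviated copy of the question's dict containing only the keys it needs, and treats a missing 'score' as None (an assumption about unplayed games). flatten_scores is a hypothetical name.

```python
def flatten_scores(gamedict):
    """Return one flat row per team per game."""
    rows = []
    for date in gamedict["dates"]:
        for game in date["games"]:
            for side, entry in game["teams"].items():  # side is 'home' or 'away'
                rows.append({
                    "date": date["date"],
                    "team": entry["team"]["name"],
                    "side": side,
                    "score": entry.get("score"),  # None if no score is present
                })
    return rows


# Abbreviated copy of the question's dict: only the keys the sketch reads
game_data = {
    "dates": [{
        "date": "2018-05-20",
        "games": [{
            "gameDate": "2018-05-20T19:00:00Z",
            "teams": {
                "home": {"score": 1, "team": {"id": 52, "name": "Winnipeg Jets"}},
                "away": {"score": 2, "team": {"id": 54, "name": "Vegas Golden Knights"}},
            },
        }],
    }],
}

# Index by (date, team) so each query is a single dict lookup
scores = {(row["date"], row["team"]): row["score"] for row in flatten_scores(game_data)}
vegas_score = scores.get(("2018-05-20", "Vegas Golden Knights"))
print(vegas_score)
```

With this index, a team that did not play that night simply yields None from scores.get instead of a -1 sentinel.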

Mongo Distinct Query with full row object

First of all, I'm new to Mongo, so I don't know much, and I cannot simply remove the duplicate rows because of some dependencies.
I have following data stored in mongo
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 2, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'},
{'id': 5, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
As you can see, some of the rows are duplicates with different ids.
Until this issue is solved on the input side, which will take a while, I must tackle it on the output side.
I need the data in the following way:
{'id': 1, 'key': 'qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv', 'name': 'some name', 'country': 'US'},
{'id': 3, 'key': 'pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji', 'name': 'some name', 'country': 'IN'},
{'id': 4, 'key': 'pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew', 'name': 'some name', 'country': 'IN'}
My query:
keys = db.collection.distinct('key', {})
all_data = db.collection.find({'key': {'$in': keys}})
As you can see, it takes two queries to get one result set. Please help me combine them into one, as the database is very large.
I could also create a unique index on the key field, but the value is so long (152 characters) that I doubt it would help.
Or would it?
You need to use the aggregation framework for this. There are multiple ways to do it; the solution below uses the $$ROOT variable to get the first document for each group:
db.data.aggregate([{
"$sort": {
"_id": 1
}
}, {
"$group": {
"_id": "$key",
"first": {
"$first": "$$ROOT"
}
}
}, {
"$project": {
"_id": 0,
"id":"$first.id",
"key":"$first.key",
"name":"$first.name",
"country":"$first.country"
}
}])
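To make the pipeline's semantics concrete, here is a plain-Python sketch (no MongoDB connection) of the same idea applied to the sample rows: sort, then keep the first document seen for each key, which is what $sort followed by $group with $first does. It sorts by the 'id' field rather than _id, since the sample documents do not show their _id values; first_per_key is a hypothetical helper. In MongoDB itself, $group does not guarantee the order of its output, so a further $sort stage would be needed if order matters.

```python
rows = [
    {"id": 1, "key": "qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv", "name": "some name", "country": "US"},
    {"id": 2, "key": "qscderftgbvqscderftgbvqscderftgbvqscderftgbvqscderftgbv", "name": "some name", "country": "US"},
    {"id": 3, "key": "pehnvosjijipehnvosjijipehnvosjijipehnvosjijipehnvosjiji", "name": "some name", "country": "IN"},
    {"id": 4, "key": "pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew", "name": "some name", "country": "IN"},
    {"id": 5, "key": "pfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnewpfvvjwovnew", "name": "some name", "country": "IN"},
]


def first_per_key(docs, sort_field="id"):
    """Keep the first document per 'key' after sorting (like $sort + $group/$first)."""
    firsts = {}
    for doc in sorted(docs, key=lambda d: d[sort_field]):
        firsts.setdefault(doc["key"], doc)  # later docs with the same key are ignored
    return list(firsts.values())


deduped = first_per_key(rows)
print([doc["id"] for doc in deduped])  # [1, 3, 4]
```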
