I am trying to use range() to fill in values in a list of dictionaries from a custom range.
I have this code:
import requests
import json
import time
test = []
for x in range(5000,5020):
    page_url = f'https://api.jikan.moe/v4/anime/{x}/full'
    response = requests.get(page_url)
    json_data = json.loads(response.text)
    test.append(json_data)
    time.sleep(1)
anime_data = []
for dic in test:
    anime = {
        'call_id': range(5000,5020),
        'title': dic.get('data',{}).get('title','title not found'),
        'mal_id': dic.get('data',{}).get('mal_id', 'id not found'),
        'url': dic.get('data',{}).get('url', 'url not found')
    }
    anime_data.append(anime)
The goal is to use numbers from 5000 to 5020 in sequence for the 'call_id' key of each dict, so that the output would look like:
[{'call_id': 5000,
  'title': 'title not found',
  'mal_id': 'id not found',
  'url': 'url not found'},
 {'call_id': 5001,
  'title': 'title not found',
  'mal_id': 'id not found',
  'url': 'url not found'},
 {'call_id': 5002,
  'title': 'Bari Bari Densetsu',
  'mal_id': 5002,
  'url': 'https://myanimelist.net/anime/5002/Bari_Bari_Densetsu'}]
The code did not work as intended: instead of a single number, every dictionary ended up with the whole range object as its 'call_id'. How can I get the desired result?
Another approach to the problem: fundamentally, we would like to iterate over two lists in parallel - the raw API responses, and the numbers (from the range) that we want to use in the anime entries. So the naive approach is to use zip, thus:
for call_id, dic in zip(range(5000, 5020), test):
    anime = {
        'call_id': call_id,
        'title': dic.get('data',{}).get('title','title not found'),
        'mal_id': dic.get('data',{}).get('mal_id', 'id not found'),
        'url': dic.get('data',{}).get('url', 'url not found')
    }
    anime_data.append(anime)
However, this overlooks a more specific tool: the built-in enumerate function. We just have to set the start point appropriately; we don't need to worry about how many elements there are, since it will keep incrementing until the input runs out.
That looks like:
for call_id, dic in enumerate(test, 5000):
    anime = {
        'call_id': call_id,
        'title': dic.get('data',{}).get('title','title not found'),
        'mal_id': dic.get('data',{}).get('mal_id', 'id not found'),
        'url': dic.get('data',{}).get('url', 'url not found')
    }
    anime_data.append(anime)
Since there is already a loop that produces all the same 'call_id' values as range(5000,5020) - the loop that makes the API calls in the first place - a simpler approach is to build the final data directly in that first loop, instead of storing the json_data results and processing them in a later loop. That looks like:
anime_data = []
for x in range(5000,5020):
    page_url = f'https://api.jikan.moe/v4/anime/{x}/full'
    response = requests.get(page_url)
    json_data = json.loads(response.text)
    anime = {
        'call_id': x,
        'title': json_data.get('data',{}).get('title','title not found'),
        'mal_id': json_data.get('data',{}).get('mal_id', 'id not found'),
        'url': json_data.get('data',{}).get('url', 'url not found')
    }
    anime_data.append(anime)
    time.sleep(1)
We can organize the logic better by using functions to split up the tasks performed each time through the loop, and by pre-computing the .get('data',{}) result:
def query_api(anime_id):
    page_url = f'https://api.jikan.moe/v4/anime/{anime_id}/full'
    response = requests.get(page_url)
    return json.loads(response.text).get('data',{})

def make_anime_data(anime_id, raw_data):
    return {
        'call_id': anime_id,
        'title': raw_data.get('title','title not found'),
        'mal_id': raw_data.get('mal_id', 'id not found'),
        'url': raw_data.get('url', 'url not found')
    }

anime_data = []
for x in range(5000,5020):
    raw_data = query_api(x)
    anime_data.append(make_anime_data(x, raw_data))
    time.sleep(1)
So, I had this code and it worked perfectly fine:
def upload_to_imgur(url):
    j1 = requests.post(
        "https://api.imgur.com/3/upload.json",
        headers=headers,
        data={
            'key': api_key,
            'image': b64encode(requests.get(url).content),
            'type': 'base64',
            'name': '1.jpg',
            'title': 'Picture no. 1'
        }
    )
    data = json.loads(j1.text)['data']
    return data['link']
Now, I wanted to make all this async, but that doesn't really seem to work. Here's my current code:
async def async_upload_to_imgur(url):
    image = await get_as_base64(url)
    j1 = await aiohttp.ClientSession().post(
        "https://api.imgur.com/3/upload.json",
        headers=headers,
        data={
            'key': api_key,
            'image': image,
            'type': 'base64',
            'name': '1.jpg',
            'title': 'Picture no. 1'
        }
    )
    data = await j1.json()
    return data['link']
The get_as_base64 function is not the issue.
The headers and API key are the same.
This is the output when I print "data" in the second example.
{'data': {'error': {'code': 1001, 'message': 'File was not uploaded correctly', 'type': 'Exception_Logged', 'exception': {}}, 'request': '/3/upload.json', 'method': 'POST'}, 'success': False, 'status': 500}
It has to be get_as_base64, for two reasons:
1. The error says the file was not uploaded correctly, which suggests the image data that was sent is not something the API accepts.
2. I think you need to send the file contents themselves, not a link that merely points to the file.
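For what it's worth, here is a minimal async sketch of that idea, assuming headers and api_key are defined as in the question and that the image bytes should be base64-encoded exactly as in the working synchronous version; the main differences are that the aiohttp session is managed with async with (so it is closed properly) and the image is downloaded and encoded explicitly:

import base64
import aiohttp

async def async_upload_to_imgur(url):
    # Use "async with" so the session and responses are closed properly.
    async with aiohttp.ClientSession() as session:
        # Download the image and base64-encode its raw bytes, mirroring
        # b64encode(requests.get(url).content) from the synchronous version.
        async with session.get(url) as img_resp:
            image_b64 = base64.b64encode(await img_resp.read())
        async with session.post(
            "https://api.imgur.com/3/upload.json",
            headers=headers,     # assumed to be defined elsewhere, as in the question
            data={
                'key': api_key,  # assumed to be defined elsewhere, as in the question
                'image': image_b64,
                'type': 'base64',
                'name': '1.jpg',
                'title': 'Picture no. 1',
            },
        ) as resp:
            data = await resp.json()
    return data['data']['link']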
I am trying to use the PokeAPI to extract all Pokemon names for a personal project, to help build comfort with APIs. I have been having issues with the params specifically. Can someone please provide support or resources to simplify grabbing data from JSON? Here is the code I have written so far, which returns the entire data set.
import json
from unicodedata import name
import requests
from pprint import PrettyPrinter
pp = PrettyPrinter()
url = ("https://pokeapi.co/api/v2/ability/1/")
params = {
    name : "garbodor"
}

def main():
    r = requests.get(url)
    status = r.status_code
    if status != 200:
        quit()
    else:
        get_pokedex(status)

def get_pokedex(x):
    print("status code: ", + x)  # redundant check for status code before the program begins.
    response = requests.get(url, params=params).json()
    pp.pprint(response)

main()
Website link: https://pokeapi.co/docs/v2#pokemon-section - I am working specifically with the Pokemon group.
I have no idea what values you want, but response is a dictionary with lists, and you can use keys and indexes (with for-loops) to select elements from response - i.e. response["names"][0]["name"].
Minimal working example
The name or ID has to be added at the end of the URL.
import requests
import pprint as pp

name_or_id = "stench"  # name
#name_or_id = 1        # id

url = "https://pokeapi.co/api/v2/ability/{}/".format(name_or_id)

response = requests.get(url)

if response.status_code != 200:
    print(response.text)
else:
    data = response.json()
    #pp.pprint(data)

    print('\n--- data.keys() ---\n')
    print(data.keys())

    print('\n--- data["name"] ---\n')
    print(data['name'])

    print('\n--- data["names"] ---\n')
    pp.pprint(data["names"])

    print('\n--- data["names"][0]["name"] ---\n')
    print(data['names'][0]['name'])

    print('\n--- language : name ---\n')
    names = []
    for item in data["names"]:
        print(item['language']['name'], ":", item["name"])
        names.append(item["name"])

    print('\n--- after for-loop ---\n')
    print(names)
Result:
--- data.keys() ---
dict_keys(['effect_changes', 'effect_entries', 'flavor_text_entries', 'generation', 'id', 'is_main_series', 'name', 'names', 'pokemon'])
--- data["name"] ---
stench
--- data["names"] ---
[{'language': {'name': 'ja-Hrkt',
'url': 'https://pokeapi.co/api/v2/language/1/'},
'name': 'あくしゅう'},
{'language': {'name': 'ko', 'url': 'https://pokeapi.co/api/v2/language/3/'},
'name': '악취'},
{'language': {'name': 'zh-Hant',
'url': 'https://pokeapi.co/api/v2/language/4/'},
'name': '惡臭'},
{'language': {'name': 'fr', 'url': 'https://pokeapi.co/api/v2/language/5/'},
'name': 'Puanteur'},
{'language': {'name': 'de', 'url': 'https://pokeapi.co/api/v2/language/6/'},
'name': 'Duftnote'},
{'language': {'name': 'es', 'url': 'https://pokeapi.co/api/v2/language/7/'},
'name': 'Hedor'},
{'language': {'name': 'it', 'url': 'https://pokeapi.co/api/v2/language/8/'},
'name': 'Tanfo'},
{'language': {'name': 'en', 'url': 'https://pokeapi.co/api/v2/language/9/'},
'name': 'Stench'},
{'language': {'name': 'ja', 'url': 'https://pokeapi.co/api/v2/language/11/'},
'name': 'あくしゅう'},
{'language': {'name': 'zh-Hans',
'url': 'https://pokeapi.co/api/v2/language/12/'},
'name': '恶臭'}]
--- data["names"][0]["name"] ---
あくしゅう
--- language : name ---
ja-Hrkt : あくしゅう
ko : 악취
zh-Hant : 惡臭
fr : Puanteur
de : Duftnote
es : Hedor
it : Tanfo
en : Stench
ja : あくしゅう
zh-Hans : 恶臭
--- after for-loop ---
['あくしゅう', '악취', '惡臭', 'Puanteur', 'Duftnote', 'Hedor', 'Tanfo', 'Stench', 'あくしゅう', '恶臭']
EDIT:
Another example with a different URL and with the parameters limit and offset.
I use a for-loop to run with different offsets (0, 100, 200, etc.).
import requests
import pprint as pp

url = "https://pokeapi.co/api/v2/pokemon/"
params = {'limit': 100}

for offset in range(0, 1000, 100):
    params['offset'] = offset  # add new value to dict with `limit`

    response = requests.get(url, params=params)

    if response.status_code != 200:
        print(response.text)
    else:
        data = response.json()
        #pp.pprint(data)
        for item in data['results']:
            print(item['name'])
Result (first 100 items):
bulbasaur
ivysaur
venusaur
charmander
charmeleon
charizard
squirtle
wartortle
blastoise
caterpie
metapod
butterfree
weedle
kakuna
beedrill
pidgey
pidgeotto
pidgeot
rattata
raticate
spearow
fearow
ekans
arbok
pikachu
raichu
sandshrew
sandslash
nidoran-f
nidorina
nidoqueen
nidoran-m
nidorino
nidoking
clefairy
clefable
vulpix
ninetales
jigglypuff
wigglytuff
zubat
golbat
oddish
gloom
vileplume
paras
parasect
venonat
venomoth
diglett
dugtrio
meowth
persian
psyduck
golduck
mankey
primeape
growlithe
arcanine
poliwag
poliwhirl
poliwrath
abra
kadabra
alakazam
machop
machoke
machamp
bellsprout
weepinbell
victreebel
tentacool
tentacruel
geodude
graveler
golem
ponyta
rapidash
slowpoke
slowbro
magnemite
magneton
farfetchd
doduo
dodrio
seel
dewgong
grimer
muk
shellder
cloyster
gastly
haunter
gengar
onix
drowzee
hypno
krabby
kingler
voltorb
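If the goal is really to collect every Pokemon name, one option is to follow the paging links the API itself returns instead of hard-coding the offsets. This is only a sketch, assuming (as the PokeAPI docs describe) that each paginated list response carries 'next' and 'results' keys:

import requests

def get_all_pokemon_names():
    names = []
    url = "https://pokeapi.co/api/v2/pokemon/"
    params = {'limit': 100}
    while url:
        response = requests.get(url, params=params)
        response.raise_for_status()
        data = response.json()
        names.extend(item['name'] for item in data['results'])
        url = data['next']   # None once the last page has been fetched
        params = None        # the 'next' URL already contains limit/offset
    return names

print(len(get_all_pokemon_names()))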
In Python 3 I need to get a JSON response from an API call and parse it so that I get a dictionary that only contains the data I need.
The final dictionary I expect to get is as follows:
{'Severity Rules': ('cc55c459-eb1a-11e8-9db4-0669bdfa776e', ['cc637182-eb1a-11e8-9db4-0669bdfa776e']), 'auto_collector': ('57e9a4ec-21f7-4e0e-88da-f0f1fda4c9d1', ['0ab2470a-451e-11eb-8856-06364196e782'])}
The JSON response returns the following output:
{
    'RuleGroups': [{
        'Id': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e',
        'Name': 'Severity Rules',
        'Order': 1,
        'Enabled': True,
        'Rules': [{
            'Id': 'cc637182-eb1a-11e8-9db4-0669bdfa776e',
            'Name': 'Severity Rule',
            'Description': 'Look for default severity text',
            'Enabled': False,
            'RuleMatchers': None,
            'Rule': '\\b(?P<severity>DEBUG|TRACE|INFO|WARN|ERROR|FATAL|EXCEPTION|[I|i]nfo|[W|w]arn|[E|e]rror|[E|e]xception)\\b',
            'SourceField': 'text',
            'DestinationField': 'text',
            'ReplaceNewVal': '',
            'Type': 'extract',
            'Order': 21520,
            'KeepBlockedLogs': False
        }],
        'Type': 'user'
    }, {
        'Id': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c',
        'Name': 'auto_collector',
        'Order': 4,
        'Enabled': True,
        'Rules': [{
            'Id': '2d6bdc1d-4064-11eb-8856-06364196e782',
            'Name': 'auto_collector',
            'Description': 'DO NOT CHANGE!! Created via API coralogix-blocker tool',
            'Enabled': False,
            'RuleMatchers': None,
            'Rule': 'AUTODISABLED',
            'SourceField': 'subsystemName',
            'DestinationField': 'subsystemName',
            'ReplaceNewVal': '',
            'Type': 'block',
            'Order': 1,
            'KeepBlockedLogs': False
        }],
        'Type': 'user'
    }]
}
I was able to create a dictionary that contains the name and the RuleGroups Id, like this:
response = requests.get(url, headers=headers)
output = response.json()
outputlist = output["RuleGroups"]

groupRuleName = [li['Name'] for li in outputlist]
groupRuleID = [li['Id'] for li in outputlist]

# Create a dictionary of NAME + ID
ruleDic = {}
for key in groupRuleName:
    for value in groupRuleID:
        ruleDic[key] = value
        groupRuleID.remove(value)
        break
Which gave me a simple dictionary:
{'Severity Rules': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e', 'Rewrites': 'ddbaa27e-1747-11e9-9db4-0669bdfa776e', 'Extract': '0cb937b6-2354-d23a-5806-4559b1f1e540', 'auto_collector': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c'}
but when I tried to parse it as nested JSON, things just didn't work.
In the end, I managed to create a function that returns this dictionary.
I do it by breaking the JSON into three lists of the elements I need (Name, Id, and Rules from the first level of nesting), and then creating another list from the nested JSON (everything under Rules), keeping only the values of the keyword "Id".
Finally, I create the dictionary by using zip on the lists and dictionaries created earlier.
def get_filtered_rules() -> List[dict]:
    groupRuleName = [li['Name'] for li in outputlist]
    groupRuleID = [li['Id'] for li in outputlist]
    ruleIDList = [li['Rules'] for li in outputlist]
    ruleIDListClean = []
    ruleClean = []
    for sublist in ruleIDList:
        try:
            lstRule = [item['Rule'] for item in sublist]
            ruleClean.append(lstRule)
            ruleContent = list(zip(groupRuleName, ruleClean))
            ruleContentDictionary = dict(ruleContent)
            lstID = [item['Id'] for item in sublist]
            ruleIDListClean.append(lstID)
            # Create a dictionary of NAME + ID + RuleID
            ruleDic = dict(zip(groupRuleName, zip(groupRuleID, ruleIDListClean)))
        except Exception as e:
            print(e)
    return ruleDic
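For comparison, the same mapping can be built more directly. This is just a sketch, assuming outputlist is the 'RuleGroups' list taken from the response as above; it maps each group's Name to a tuple of its Id and the Ids of its nested Rules:

def get_filtered_rules(outputlist) -> dict:
    # Name -> (group Id, [Ids of the rules nested under that group])
    return {
        group['Name']: (group['Id'], [rule['Id'] for rule in group.get('Rules') or []])
        for group in outputlist
    }

# Usage: rule_dic = get_filtered_rules(output["RuleGroups"])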
The data I am using is the Twitter API's trending topics.
url_0 = 'https://api.twitter.com/1.1/trends/place.json?id=2459115'
res = requests.get(url_0, auth=auth)
print(res, res.status_code, res.headers['content-type'])
print(res.url)
top_trends_twitter = res.json()
data = top_trends_twitter[0]
This is what data looks like:
[{'as_of': '2017-02-13T21:59:32Z',
  'created_at': '2017-02-13T21:53:22Z',
  'locations': [{'name': 'New York', 'woeid': 2459115}],
  'trends': [{'name': 'Victor Cruz',
              'promoted_content': None,
              'query': '%22Victor+Cruz%22',
              'tweet_volume': 45690,
              'url': 'http://twitter.com/search?q=%22Victor+Cruz%22'},
             {'name': '#percussion',
              'promoted_content': None,
              'query': '%23percussion',
              'tweet_volume': None,
              'url': 'http://twitter.com/search?q=%23percussion'}, .....etc
Now, after I connect the server with SQL and create the database and table, an error appears. This is the part that is causing me trouble:
for entry in data:
    trendname = entry['trends']['name']
    url = entry['trends']['url']
    num_tweets = entry['trends']['trend_volume']
    date = entry['as_of']
    print("Inserting trend", trendname, "at", url)
    query_parameters = (trendname, url, num_tweets, date)
    cursor.execute(query_template, query_parameters)

con.commit()
cursor.close()
Then, I get this error:
TypeError Traceback (most recent call last)
<ipython-input-112-da3e17aadce0> in <module>()
29
30 for entry in data:
---> 31 trendname = entry['trends']['name']
32 url = entry['trends']['url']
33 num_tweets = entry['trends']['trend_volume']
TypeError: string indices must be integers
How can I get these strings into a dictionary, so that I can use them in the for entry in data code?
You need entry['trends'][0]['name']. entry['trends'] is a list, and you need an integer index to access items of a list.
Try like so:
data = [{'as_of': '2017-02-13T21:59:32Z',
         'created_at': '2017-02-13T21:53:22Z',
         'locations': [{'name': 'New York', 'woeid': 2459115}],
         'trends': [{'name': 'Victor Cruz',
                     'promoted_content': None,
                     'query': '%22Victor+Cruz%22',
                     'tweet_volume': 45690,
                     'url': 'http://twitter.com/search?q=%22Victor+Cruz%22'},
                    {'name': '#percussion',
                     'promoted_content': None,
                     'query': '%23percussion',
                     'tweet_volume': None,
                     'url': 'http://twitter.com/search?q=%23percussion'}]}]

for entry in data:
    date = entry['as_of']
    for trend in entry['trends']:
        trendname = trend['name']
        url = trend['url']
        num_tweets = trend['tweet_volume']
        print(trendname, url, num_tweets, date)
Output:
Victor Cruz http://twitter.com/search?q=%22Victor+Cruz%22 45690 2017-02-13T21:59:32Z
#percussion http://twitter.com/search?q=%23percussion None 2017-02-13T21:59:32Z
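Applied to the insert loop from the question, the same nested iteration would look roughly like this (a sketch, assuming cursor, con, and query_template are set up as in the original; note that the sample data uses the key 'tweet_volume', not 'trend_volume'):

for entry in data:
    date = entry['as_of']
    for trend in entry['trends']:
        trendname = trend['name']
        url = trend['url']
        num_tweets = trend['tweet_volume']   # key as it appears in the sample data
        print("Inserting trend", trendname, "at", url)
        query_parameters = (trendname, url, num_tweets, date)
        cursor.execute(query_template, query_parameters)

con.commit()
cursor.close()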
I have a list of dicts that stores URLs. It has just two fields, title and url. Example:
[
    {'title': 'Index Page', 'url': 'http://www.example.com/something/index.htm'},
    {'title': 'Other Page', 'url': 'http://www.example.com/something/other.htm'},
    {'title': 'About Page', 'url': 'http://www.example.com/thatthing/about.htm'},
    {'title': 'Detail Page', 'url': 'http://www.example.com/something/thisthing/detail.htm'},
]
However, I'd like to get a tree structure from this list of dicts. I'm looking for something like this:
{ 'www.example.com':
    [
        { 'something':
            [
                { 'thisthing':
                    [
                        { 'title': 'Detail Page', 'url': 'detail.htm'}
                    ]
                },
                [
                    { 'title': 'Index Page', 'url': 'index.htm'},
                    { 'title': 'Other Page', 'url': 'other.htm'}
                ]
            ]
        },
        { 'thatthing':
            [
                { 'title': 'About Page', 'url': 'about.htm'}
            ]
        }
    ]
}
My first attempt at this would be a urlparse soup in a bunch of for loops and I'm confident that there's a better and faster way to do this.
I've seen people on SO work magic with list comprehensions, lambda functions, etc. I'm still in the process of figuring it out.
(For Django developers: I'll be using this in my Django application. I'm storing the URLs in a model called Page which has two fields, name and title.)
Third time is the charm... that is some nice structure you have there :). In your comment you mention that you "haven't been able to think of a better tree format to represent data like this"... this made me again take the liberty to (just slightly) alter the formatting of the output. In order to dynamically add sub-elements, a dictionary has to be created to house them. But for "leaf nodes", this dictionary is never filled in. If desired, these can of course be removed by another loop, but it cannot happen during the iteration because the empty dict has to be present for possible new nodes. The same goes for nodes that do not have a file in them: these will contain an empty list.
ll = [
    {'title': 'Index Page', 'url': 'http://www.example.com/something/index.htm'},
    {'title': 'Other Page', 'url': 'http://www.example.com/something/other.htm'},
    {'title': 'About Page', 'url': 'http://www.example.com/thatthing/about.htm'},
    {'title': 'Detail Page', 'url': 'http://www.example.com/something/thisthing/detail.htm'},
]

# First build a list of all url segments: final item is the title/url dict
paths = []
for item in ll:
    split = item['url'].split('/')
    paths.append(split[2:-1])
    paths[-1].append({'title': item['title'], 'url': split[-1]})

# Loop over these paths, building the format as we go along
root = {}
for path in paths:
    branch = root.setdefault(path[0], [{}, []])
    for step in path[1:-1]:
        branch = branch[0].setdefault(step, [{}, []])
    branch[1].append(path[-1])

# As for the cleanup: because of the alternating lists and
# dicts it is a bit more complex, but the following works:
def walker(coll):
    if isinstance(coll, list):
        for item in coll:
            yield item
    if isinstance(coll, dict):
        for item in coll.values():
            yield item

def deleter(coll):
    for data in walker(coll):
        if data == [] or data == {}:
            coll.remove(data)
        deleter(data)

deleter(root)

import pprint
pprint.pprint(root)
Output:
{'www.example.com':
    [
        {'something':
            [
                {'thisthing':
                    [
                        [
                            {'title': 'Detail Page', 'url': 'detail.htm'}
                        ]
                    ]
                },
                [
                    {'title': 'Index Page', 'url': 'index.htm'},
                    {'title': 'Other Page', 'url': 'other.htm'}
                ]
            ],
         'thatthing':
            [
                [
                    {'title': 'About Page', 'url': 'about.htm'}
                ]
            ]
        },
    ]
}
Here's my solution. It seems to work. A very different approach from Jro's:
import itertools
import pprint

pages = [
    {'title': 'Index Page', 'url': 'http://www.example.com/something/index.htm'},
    {'title': 'Other Page', 'url': 'http://www.example.com/something/other.htm'},
    {'title': 'About Page', 'url': 'http://www.example.com/thatthing/about.htm'},
    {'title': 'dtgtet Page', 'url': 'http://www.example.com/thatthing/'},
    {'title': 'Detail Page', 'url': 'http://www.example.com/something/thisthing/detail.htm'},
    {'title': 'Detail Page', 'url': 'http://www.example.com/something/thisthing/thisthing/detail.htm'},
]

def group_urls(url_set, depth=0):
    """
    Fetches the actions for a particular domain
    """
    url_set = sorted(url_set, key=lambda x: x['url'][depth])
    tree = []

    leaves = filter(lambda x: len(x['url']) - 1 == depth, url_set)
    for cluster, group in itertools.groupby(leaves, lambda x: x['url'][depth]):
        branch = list(group)
        tree.append({cluster: branch})

    twigs = filter(lambda x: len(x['url']) - 1 > depth, url_set)
    for cluster, group in itertools.groupby(twigs, lambda x: x['url'][depth]):
        branch = group_urls(list(group), depth + 1)
        tree.append({cluster: branch})

    return tree

if __name__ == '__main__':
    for page in pages:
        page['url'] = page['url'].strip('http://').split('/')
    pprint.pprint(group_urls(pages))
I can't seem to figure out why I need to sort at the beginning of every recursion. I bet if I worked around that, I could shave off another couple of seconds.
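One idea for avoiding the per-call sort, as a sketch under the assumption that the data is prepared exactly as above: sort the whole list once, lexicographically by the full segment list, before the first call. Every group that group_urls later recurses into already shares its leading segments, so lexicographic order of the full lists keeps each group sorted by the segment the next level groups on, and the sorted() call inside group_urls becomes redundant:

if __name__ == '__main__':
    for page in pages:
        page['url'] = page['url'].strip('http://').split('/')
    # Sort once by the whole segment list; each recursive slice then stays
    # sorted by the segment that group_urls groups on at the next depth.
    pages.sort(key=lambda page: page['url'])
    pprint.pprint(group_urls(pages))

With that change, the url_set = sorted(...) line inside group_urls can simply be dropped.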