How to speed up dictionary search in Python?

Basically I have a function that returns an API response with a huge number of dictionaries, simplified here to their keys. I then have another function, getPlayerData, which sends an API call to the same API to get information about one specific player instead of all of them. The problem is that getPlayerData is fast on its own, but in this scenario it is far too slow to be usable.
Is there a way I can speed this up? getPlayerData is not required; I can just make a plain request instead.
The dictionary search
residents = []
for resident in getListData("resident"):
    if getPlayerData(resident)["town"] == town:
        residents.append(resident)
print(residents)
getPlayerData()
import requests

def getPlayerData(player):
    r = requests.get("http://srv.earthpol.com/api/json/residents.php?name=" + player)
    j = r.json()
    player = player.lower()
    global emptyresult
    emptyresult = False
    if str(j) == "{}":
        emptyresult = True
    else:
        result = {"town": j[player]["town"],
                  "town-rank": j[player]["townRank"],
                  "nation-ranks": j[player]["nationRanks"],
                  "lastOnline": j[player]["lastOnline"],
                  "registered": j[player]["registered"],
                  "town-title": j[player]["title"],
                  "nation-title": j[player]["surname"],
                  "friends": j[player]["friends"],
                  "uuid": j[player]["uuid"],
                  "avatar": "https://crafatar.com/avatars/" + j[player]["uuid"]}
        return result
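One way to avoid the per-player requests, sketched below on the assumption (unverified) that the same residents.php endpoint returns every resident's data when called without a name parameter: fetch everything once and filter locally.
import requests

# Sketch only: assumes residents.php with no ?name= parameter returns a dict
# keyed by player name, each value containing a "town" field (not confirmed).
def getResidentsOfTown(town):
    r = requests.get("http://srv.earthpol.com/api/json/residents.php")
    all_players = r.json()
    return [name for name, data in all_players.items() if data.get("town") == town]

print(getResidentsOfTown("SomeTown"))  # "SomeTown" is a placeholder town name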

Related

Access one item from a dict and store it into a variable

I am trying to get all the "uuid"s from an API, and the issue is that they are stored in a dict (I think). Here is how it looks on the API:
{"guild": {
"_id": "5eba1c5f8ea8c960a61f38ed",
"name": "Creators Club",
"name_lower": "creators club",
"coins": 0,
"coinsEver": 0,
"created": 1589255263630,
"members":
[{ "uuid": "db03ceff87ad4909bababc0e2622aaf8",
"rank": "Guild Master",
"joined": 1589255263630,
"expHistory": {
"2020-06-01": 280,
"2020-05-31": 4701,
"2020-05-30": 0,
"2020-05-29": 518,
"2020-05-28": 1055,
"2020-05-27": 136665,
"2020-05-26": 34806}}]
}
}
Now I am interested in the "uuid" part there, and take note: there are multiple players, anywhere from 1 to 100, and I am going to need every UUID.
This is what I have done in my Python code to get the UUIDs displayed on the website:
try:
    f = requests.get(
        "https://api.hypixel.net/guild?key=[secret]&id=" + guild).json()
    guildName = f["guild"]["name"]
    guildMembers = f["guild"]["members"]
    members = client.getPlayer(uuid=guildMembers)  # this converts UUID to player names
    # I need to store all uuid's in variables and put them at "guildMembers"
And that gives me all the "UUID codes", and I will be using client.getPlayer(uuid=---) to convert a UUID into the player name. I have to loop each "UUID" through that client.getPlayer(uuid=---) call. But first I need to save the UUIDs in variables. I have been doing members.uuid to access the UUID in my HTML file, but I don't know how you do the .uuid part in Python.
If you need anything else, just comment :)
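For the ".uuid part" specifically, a quick aside: the parsed JSON is a plain Python list of dicts, so the template-style member.uuid becomes square-bracket access (guildMembers here is the list from the snippet above):
first_member = guildMembers[0]
print(first_member["uuid"])  # 'db03ceff87ad4909bababc0e2622aaf8'

# and every UUID at once:
uuids = [member["uuid"] for member in guildMembers]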
List comprehension is a powerful concept:
members = [client.getPlayer(member['uuid']) for member in guildMembers]
Edit:
If you want to insert the names back into your data (in guildMembers),
use a dictionary comprehension with {uuid: member_name,} format:
members = {member['uuid']: client.getPlayer(uuid=member['uuid']) for member in guildMembers}
Then you can update guildMembers with your results:
for member in guildMembers:
    member['name'] = members[member['uuid']]
Assuming that guild is the main dictionary in which a key called members exists with a list of "sub dictionaries", you can try
uuid = list()
for x in guild['members']:
    uuid.append(x['uuid'])
uuid now has all the uuids
If I understood the situation right, you just need to loop through all the received UUIDs and get each player's data. Something like this:
f = requests.get("https://api.hypixel.net/guild?key=[secret]&id=" + guild).json()
guildName = f["guild"]["name"]
guildMembers = f["guild"]["members"]

guildMembersData = dict()  # Here we will save each member's data from the getPlayer method

for guildMember in guildMembers:
    uuid = guildMember["uuid"]
    memberData = client.getPlayer(uuid=uuid)
    guildMembersData[uuid] = memberData

print(guildMembersData)  # Here will be the players' data.

While loop to make API calls until a condition is met

I want to make API calls until a condition is met. I figured I might use a while loop.
I have a JSON response from the server that is paginated.
{
  "services": [
    {
      "id": "ABC12",
      "name": "Networks",
      "description": null,
      "status": "active",
      "teams": [
        {
          "id": "XYZ12",
          "type": "team_reference",
          "summary": "Network Systems "
        }
      ],
      "acknowledgement_timeout": null,
      "auto_resolve_timeout": null,
      "alert_grouping": "intelligent",
      "alert_grouping_timeout": null,
      "integrations": [],
      "response_play": null,
      "type": "service",
      "summary": "All Events"
    }
  ],
  "limit": 25,
  "offset": 0,
  "total": null,
  "more": true
}
limit - max I can set is 100.
offset - If specified, shows results from that point.
more - If TRUE, there are more results. If FALSE, that is the end.
for more info on this pagination - https://v2.developer.pagerduty.com/docs/pagination
I need to match the name "Networks" and get its corresponding id "ABC12". The problem is that I have to paginate and make multiple calls to the API.
I have written this so far.
import requests
import json
import urllib3

# Suppress SSL warnings
urllib3.disable_warnings()

# API key
API_KEY = '12345asdfg'

def list_services():
    x = 25
    y = 0
    results = []
    url = f'https://api.pagerduty.com/services/?limit={x}&offset={y}'
    headers = {
        'Accept': 'application/vnd.pagerduty+json;version=2',
        'Authorization': 'Token token={token}'.format(token=API_KEY)
    }
    current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
    results.append(current_page)
    while current_page['more'] == 'True':
        y = y + 1
        current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
        results.append(current_page)
        print(results)  # Does not print anything
    print(results)  # Prints only the first call's results; the while loop doesn't seem to work.

if __name__ == '__main__':
    list_services()
The print(results) outside the while loop prints only the first API call's results; the while loop doesn't seem to run. But the code compiles without any errors.
How do I keep x at 25, make API calls, and append the responses to results until more is false?
OR
How do I make multiple API calls until I find the match, and stop making calls once I have found it?
Or is there a better, cleaner way to do this?
This does not work because you never actually reassign the url variable once y is changed. Also, you are checking against 'True', which is a string, not a boolean value. In addition, the offset should increase by the number of results each time, not just by one. For example, if your first call returns results 1-25 and you then increase y by one, the second call will yield results 2-26. Instead you should increase it by the limit, so that the second call returns results 26-50. Here is how I would do this:
def list_services():
    x = 25
    y = 0
    results = []
    serv_id = None
    flag = False
    url = f'https://api.pagerduty.com/services/?limit={x}&offset={y}'
    headers = {
        'Accept': 'application/vnd.pagerduty+json;version=2',
        'Authorization': 'Token token={token}'.format(token=API_KEY)
    }
    current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
    results.append(current_page)
    for serv_set in current_page['services']:
        if serv_set['name'] == 'Networks':
            serv_id = serv_set['id']
            flag = True
    while current_page['more'] == True and not flag:
        for serv_set in current_page['services']:
            if serv_set['name'] == 'Networks':
                serv_id = serv_set['id']
                flag = True
                break
        y += x
        url = f'https://api.pagerduty.com/services/?limit={x}&offset={y}'
        current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
        results.append(current_page)
        print(results)
    print(results, serv_id)
You could further clean this up to avoid some redundancy but this should work. You should also check the status of the API call to ensure that you have a valid response.
Edit:
I edited in the issue dealing with obtaining the id attribute when the name == 'Networks'. Once again you could reduce the redundancy in this a lot but this will get you on the right track. Now serv_id = the id of the service with the name of Networks. If no match is found at the end of the iterations then serv_id will be None.
while current_page['more'] == 'True':
You are checking for the string 'True' instead of the boolean True that appears in your JSON. This could be why your while loop never executes and you are not seeing your print statement.
Also, for API calls that return more than one page of data, you generally need to specify which page you are getting, which means you need to reinitialize your payload inside your while loop.
For example, if an API has a parameter called "page" that you can pass in, in your while loop you would have to pass in page = 1, page = 2, etc. as a payload.
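Putting both answers together, here is a minimal sketch of the pagination loop (same endpoint and headers as above; raise_for_status() is just one way to check that the call returned a valid response):
import requests

API_KEY = '12345asdfg'  # placeholder key from the question
HEADERS = {
    'Accept': 'application/vnd.pagerduty+json;version=2',
    'Authorization': f'Token token={API_KEY}',
}

def find_service_id(name, limit=25):
    """Page through /services until `name` is found or 'more' is False."""
    offset = 0
    while True:
        url = f'https://api.pagerduty.com/services/?limit={limit}&offset={offset}'
        response = requests.get(url, headers=HEADERS)
        response.raise_for_status()              # fail loudly on a bad response
        page = response.json()
        for service in page.get('services', []):
            if service['name'] == name:
                return service['id']
        if not page.get('more'):                 # boolean, not the string 'True'
            return None
        offset += limit                          # advance by the page size, not by 1

print(find_service_id('Networks'))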

Elasticsearch for python - Calls not blocking correctly

I'm trying to write unittests for my own Elasticsearch client. It uses the client from elasticsearch-py.
Most of my tests are fine, but when running a test on my own search() function (which uses the search() function from Elasticsearch client) I get very random behaviour. This is the way my test is implemented:
def setUp(self) -> None:
    self.es = ESClient(host="localhost")
    self.es_acc = ESClient()
    self.connection_res = (False, {})
    self.t = self.es_acc.get_connection_status(self._callback)
    self.t.join()

    # Create test index and index some documents
    self.es.create_index(self.TEST_INDEX)
    names = ["Gregor", "Alice", "Per Svensson", "Mats Hermelin", "Mamma Mia",
             "Eva Dahlgren", "Per Morberg", "Maja Larsson", "Ola Salo", "Magrecievic Holagrostokovic"]
    self.num_docs = len(names)
    self.payload = []
    random.seed(123)
    for i, name in enumerate(names):
        n = name.split(" ")
        fname = n[0]
        lname = n[1] if len(n) > 1 else n[0]
        self.payload.append({"name": {"first": fname, "last": lname}, "age": random.randint(-100, 100),
                             "timestamp": datetime.utcnow() - timedelta(days=1 * i)})
    self.es.upload(self.TEST_INDEX, self.payload, ids=list(range(len(names))))

def test_search(self):
    # Test getting docs based on ids
    ids = ["1", "4", "9"]
    status, hits = self.es.search(self.TEST_INDEX, ids=ids)  # Breakpoint
    docs = hits["hits"]["hits"]
    self.assertTrue(status, "Status not correct for search!")
    returned_ids = [d["_id"] for d in docs]
    names = [d["_source"]["name"] for d in docs]
    self.assertListEqual(sorted(returned_ids), ids, "Returned ids from search not correct!")
    self.assertListEqual(names, [self.payload[i]["name"] for i in [1, 4, 9]],
                         "Returned source from search not correct!")
In setUp() I'm just uploading a few documents to test on, so there should always be 10 documents to test on. Below is an excerpt from my search() function.
if ids:
    try:
        q = Query().ids(ids).compile_and_get()
        res = self.es.search(index=index, body=q)
        print(res)
        return True, res
    except exceptions.ElasticsearchException as e:
        self._handle_elastic_exceptions("search", e, index=index)
        return False, {}
I've implemented Query myself. Anyway, when I just run the test, I ALMOST always get 0 hits. But if I debug the application with a breakpoint in test_search() on the row where I make the call to search(), and step through, everything works fine. If I put the breakpoint just one line below, I get 0 hits again. What is going on? Why is it not blocking correctly?
It seems like I found my solution!
I did not understand that setUp is called before every test method. That was actually not the problem, however.
The problem is that for some tests, uploading the documents simply took too much time (this was done in setUp), so when the test started the documents did not exist yet. Solution: add sleep(1) to the end of setUp.
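A more deterministic alternative to sleep(1), sketched below on the assumption that the underlying elasticsearch-py client is reachable (the internals of ESClient are not shown): force an index refresh after uploading, since newly indexed documents only become visible to search after a refresh.
from elasticsearch import Elasticsearch

# Host/URL format varies between client versions; adjust as needed.
es = Elasticsearch("http://localhost:9200")

def refresh_index(index_name):
    # Blocks until the refresh completes; any search issued after this call
    # will see every document indexed so far.
    es.indices.refresh(index=index_name)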

Not able to get Country of a Tweet - Twython API

I am using the following code to collect tweets pertaining to a certain topic, but in all the tweets that I have extracted the 'place' attribute is None. Am I doing something wrong? Also, the code is meant to extract existing tweets; I do not need a streaming API solution and am not looking for this streaming approach: https://www.quora.com/How-can-I-get-a-stream-of-tweets-from-a-particular-country-using-Twitter-API
api = Twython(consumer_key, consumer_secret, access_key, access_secret)

tweets = []
MAX_ATTEMPTS = 200
COUNT_OF_TWEETS_TO_BE_FETCHED = 10000
in_max_id = sys.argv[1]
next_max_id = ''

for i in range(0, MAX_ATTEMPTS):
    if(COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets)):
        break  # we got 500 tweets... !!
    #----------------------------------------------------------------#
    # STEP 1: Query Twitter
    # STEP 2: Save the returned tweets
    # STEP 3: Get the next max_id
    #----------------------------------------------------------------#

    # STEP 1: Query Twitter
    if(0 == i):
        # Query twitter for data.
        results = api.search(q="#something", count='100', lang='en', max_id=in_max_id, include_entities='true', geo=True)
    else:
        # After the first call we should have max_id from result of previous call. Pass it in query.
        results = api.search(q="#something", include_entities='true', max_id=next_max_id, lang='en', geo=True)

    # STEP 2: Save the returned tweets
    for result in results['statuses']:
        temp = ""
        tweet_text = result['text']
        temp += tweet_text.encode('utf-8') + " "
        hashtags = result['entities']['hashtags']
        for i in hashtags:
            temp += i['text'].encode('utf-8') + " "
        print result
        #temp += i["place"]["country"] + "\n"
        #output_file.write(temp)

    # STEP 3: Get the next max_id
    try:
        # Parse the data returned to get max_id to be passed in consequent call.
        next_results_url_params = results['search_metadata']['next_results']
        next_max_id = next_results_url_params.split('max_id=')[1].split('&')[0]
    except:
        # No more next pages
        break
The short answer is: no, you are doing nothing wrong. The reason all the place tags are empty is that statistically they are very unlikely to contain data. Only about 1% of all tweets have data in their place tag, because users rarely tweet their location; location is off by default.
Download 100 or more tweets and you probably will find place tag data.
If the place field is a must for every tweet that your app will process, then you can limit your search to a place to make sure every result will definitely have it.
You can do so by setting the geocode (latitude,longitude,radius[km/mi]) parameter, which limits your search to an area.
An example of such a request via Twython is:
geocode = '25.032341,55.385557,100mi'
api.search(q="#something",count='100',lang='en',include_entities='true',geocode=geocode)
Not all tweets have all fields like tweet_text, place, country, language, etc.
So, to avoid a KeyError, use the following approach: modify your code so that when the key you're looking for is not found, a default value is returned.
result.get('place', {}).get('country', {}) if result.get('place') != None else None
Here, the above line means "search for the key country after fetching the key place, if it exists; otherwise return None".
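As a small usage sketch inside the question's loop (field names as in the Twitter API v1.1 tweet objects), this collects the country only when a tweet actually carries place data:
countries = []
for result in results['statuses']:
    place = result.get('place')          # None for most tweets
    if place is not None:
        countries.append(place.get('country'))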
kmario is right. Most tweets don't have this information, but a small percent do. Doing a location search will increase this chance e.g. https://api.twitter.com/1.1/search/tweets.json?q=place%3Acba60fe77bc80469&count=1
"place": {
"id": "cba60fe77bc80469",
"url": "https://api.twitter.com/1.1/geo/id/cba60fe77bc80469.json",
"place_type": "city",
"name": "Tallinn",
"full_name": "Tallinn, Harjumaa",
"country_code": "EE",
"country": "Eesti",
"contained_within": [],
"bounding_box": {
"type": "Polygon",
"coordinates": [
[
[
24.5501404,
59.3518286
],
[
24.9262886,
59.3518286
],
[
24.9262886,
59.4981855
],
[
24.5501404,
59.4981855
]
]
]
},
"attributes": {}
},
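The same place-scoped search from that URL can be issued through Twython; the place ID below is the Tallinn example above, used purely as an illustration of the place: query operator:
results = api.search(q="place:cba60fe77bc80469", count='1')
for status in results['statuses']:
    print(status['place']['country'])   # hits from a place: query carry place data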

Python for looping to while looping

I need some help converting the code below into something a bit more manageable.
I'm pretty sure I need to modify it to include some while statements, but I have been hitting my head against a wall for the last day or so. I think I'm close....
for LevelItemList[1] in LevelUrlList[1]:
    if LevelItemList[1][1] == "Folder":
        printFolderHeader(1, LevelItemList[1][0])
        LevelUrlList[2] = parseHTML(LevelItemList[1][2])
        for LevelItemList[2] in LevelUrlList[2]:
            if LevelItemList[2][1] == "Folder":
                printFolderHeader(2, LevelItemList[2][0])
                LevelUrlList[3] = parseHTML(LevelItemList[2][2])
                for LevelItemList[3] in LevelUrlList[3]:
                    if LevelItemList[3][1] == "Folder":
                        printFolderHeader(3, LevelItemList[3][0])
                        LevelUrlList[4] = parseHTML(LevelItemList[3][2])
                        for LevelItemList[4] in LevelUrlList[4]:
                            if LevelItemList[4][1] == "Folder":
                                printFolderHeader(4, LevelItemList[4][0])
                                LevelUrlList[5] = parseHTML(LevelItemList[4][2])
                                for LevelItemList[5] in LevelUrlList[5]:
                                    if LevelItemList[5][1] == "Folder":
                                        printFolderHeader(5, LevelItemList[5][0])
                                        LevelUrlList[6] = parseHTML(LevelItemList[5][2])
                                        for LevelItemList[6] in LevelUrlList[6]:
                                            printPage(6, LevelItemList[6][0])
                                        printFolderFooter(5, LevelItemList[5][0])
                                    else:
                                        printPage(5, LevelItemList[5][0])
                                printFolderFooter(4, LevelItemList[4][0])
                            else:
                                printPage(4, LevelItemList[4][0])
                        printFolderFooter(3, LevelItemList[3][0])
                    else:
                        printPage(3, LevelItemList[3][0])
                printFolderFooter(2, LevelItemList[2][0])
            else:
                printPage(2, LevelItemList[2][0])
        printFolderFooter(1, LevelItemList[1][0])
    else:
        printPage(1, LevelItemList[1][0])
I don't have the full context of the code, but I think you can reduce it down to something like this:
def printTheList(LevelItemList, index):
    for item in LevelItemList:
        if item[1] == "Folder":
            printFolderHeader(index, item[0])
            printTheList(parseHTML(item[2]), index + 1)  # note the + 1
            printFolderFooter(index, item[0])
        else:
            printPage(index, item[0])

# and the initial call looks like this.
printTheList(LevelUrlList[1], 1)
This code makes the assumption that you don't actually need to assign the values into LevelUrlList and LevelItemList the way you are doing in your code. If you do need that data later, I suggest passing in a different data structure to hold the resulting values.
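A hedged sketch of that suggestion (the names here are illustrative, not from the original code): thread a results container through the recursion instead of assigning into LevelUrlList and LevelItemList.
def collectTheList(level_items, index, collected):
    # `collected` maps each depth to the items seen there, so nothing is lost
    # when the recursion unwinds.
    for item in level_items:
        collected.setdefault(index, []).append(item)
        if item[1] == "Folder":
            printFolderHeader(index, item[0])
            collectTheList(parseHTML(item[2]), index + 1, collected)
            printFolderFooter(index, item[0])
        else:
            printPage(index, item[0])

collected = {}
collectTheList(LevelUrlList[1], 1, collected)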
