While loop to make API calls until a condition is met - python

I want to make API calls until a condition is met. I figured I might use a while loop.
I have a JSON response from the server that is paginated.
{
    "services": [
        {
            "id": "ABC12",
            "name": "Networks",
            "description": null,
            "status": "active",
            "teams": [
                {
                    "id": "XYZ12",
                    "type": "team_reference",
                    "summary": "Network Systems "
                }
            ],
            "acknowledgement_timeout": null,
            "auto_resolve_timeout": null,
            "alert_grouping": "intelligent",
            "alert_grouping_timeout": null,
            "integrations": [],
            "response_play": null,
            "type": "service",
            "summary": "All Events"
        }
    ],
    "limit": 25,
    "offset": 0,
    "total": null,
    "more": true
}
limit - max I can set is 100.
offset - If specified, shows results from that point.
more - If TRUE, there are more results. If FALSE, that is the end.
for more info on this pagination - https://v2.developer.pagerduty.com/docs/pagination
I need to match the name "Networks" and get its corresponding id "ABC12". The problem is, I have to paginate and make multiple calls to the API.
I have written this so far.
import requests
import json
import urllib3

# Suppress SSL warnings
urllib3.disable_warnings()

# API key
API_KEY = '12345asdfg'

def list_services():
    x = 25
    y = 0
    results = []
    url = f'https://api.pagerduty.com/services/?limit={x}&offset={y}'
    headers = {
        'Accept': 'application/vnd.pagerduty+json;version=2',
        'Authorization': 'Token token={token}'.format(token=API_KEY)
    }
    current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
    results.append(current_page)
    while current_page['more'] == 'True':
        y = y + 1
        current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
        results.append(current_page)
        print(results)  # Does not print anything
    print(results)  # Prints only the first call results, the while loop
                    # doesn't seem to work.

if __name__ == '__main__':
    list_services()
The print(results) outside the while loop prints only the first API call's results; the while loop doesn't seem to work. But the code runs without any errors.
How do I keep the limit at 25, make API calls, and append the results to results until more is false?
OR
How do I make multiple API calls until I find the match, and stop making calls once a match is found?
Or is there a better, cleaner way to do this?

This does not work because you never actually reassign the url variable once y is changed. Also, you are checking against 'True', which is a string, not a boolean value. In addition, the offset should increase by the number of results each time, not just by one. For example, if your first call returns results 1-25 and you increase y by one, the second call will yield results 2-26. Instead you should increase it by the limit, so that the second call returns results 26-50. Here is how I would do this:
def list_services():
    x = 25
    y = 0
    results = []
    serv_id = None
    flag = False
    url = f'https://api.pagerduty.com/services/?limit={x}&offset={y}'
    headers = {
        'Accept': 'application/vnd.pagerduty+json;version=2',
        'Authorization': 'Token token={token}'.format(token=API_KEY)
    }
    current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
    results.append(current_page)
    for serv_set in current_page['services']:
        if serv_set['name'] == 'Networks':
            serv_id = serv_set['id']
            flag = True
    while current_page['more'] == True and not flag:
        for serv_set in current_page['services']:
            if serv_set['name'] == 'Networks':
                serv_id = serv_set['id']
                flag = True  # stop paginating once a match is found
                break
        y += x  # advance the offset by the page size, not by one
        url = f'https://api.pagerduty.com/services/?limit={x}&offset={y}'
        current_page = json.loads(requests.get(url, verify=False, headers=headers).content.decode('UTF-8'))
        results.append(current_page)
        print(results)
    print(results, serv_id)
You could further clean this up to avoid some redundancy but this should work. You should also check the status of the API call to ensure that you have a valid response.
Edit:
I edited the answer to handle obtaining the id attribute when name == 'Networks'. Once again, you could reduce the redundancy here a lot, but this will get you on the right track. Now serv_id is the id of the service with the name Networks. If no match is found by the end of the iterations, serv_id will be None.
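For reference, a more compact sketch of the same idea (assuming the same requests import and API_KEY as above) that also checks the HTTP status and stops paginating as soon as the name is found:
def find_service_id(name, limit=25):
    # Minimal sketch: page through /services until `name` is found or 'more' is False.
    headers = {
        'Accept': 'application/vnd.pagerduty+json;version=2',
        'Authorization': 'Token token={token}'.format(token=API_KEY),
    }
    offset = 0
    while True:
        url = f'https://api.pagerduty.com/services/?limit={limit}&offset={offset}'
        response = requests.get(url, verify=False, headers=headers)
        response.raise_for_status()  # check the HTTP status as suggested above
        page = response.json()
        for service in page['services']:
            if service['name'] == name:
                return service['id']
        if not page['more']:
            return None
        offset += limit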

while current_page['more'] == 'True':
You are checking for the string 'True' instead of the boolean True that appears in your JSON response. This is why your while loop never executes and you never see its print statement.
Also, for API calls that return more than one page of data, you generally need to specify which page you are requesting, which means you need to rebuild your request (URL or payload) inside the while loop.
For example, if an API has a parameter called "page" that you can pass in, your while loop would have to pass page = 1, page = 2, and so on, as shown in the sketch below.
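A generic sketch of that pattern (the "page" parameter here is hypothetical, not part of the PagerDuty API, which uses limit/offset as shown above):
# Hypothetical API that paginates with a "page" query parameter.
page = 1
results = []
while True:
    data = requests.get(url, headers=headers, params={'page': page}).json()
    results.append(data)
    if not data.get('more'):
        break
    page += 1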

Related

Reading JSON data in Python using Pagination, max records 100

I am trying to extract data from a REST API using Python and put it into one neat JSON file, and I'm having difficulty. The data is rather lengthy, with a total of nearly 4,000 records, but the max records allowed per request by the API is 100.
I've tried using some other examples to get through the code, and so far this is what I'm using (censoring the API URL and auth key, for the sake of confidentiality):
import requests
import json
from requests.structures import CaseInsensitiveDict

url = "https://api.airtable.com/v0/CENSORED/Vendors?maxRecords=100"

headers = CaseInsensitiveDict()
headers["Authorization"] = "Bearer CENSORED"

resp = requests.get(url, headers=headers)
resp.content.decode("utf-8")

vendors = []
new_results = True
page = 1
while new_results:
    centiblock = requests.get(url + f"&page={page}", headers=headers).json()
    new_results = centiblock.get("results", [])
    vendors.extend(centiblock)
    page += 1

full_directory = json.dumps(vendors, indent=4)
print(full_directory)
For the life of me, I cannot figure out why it isn't working. The output keeps coming out as just:
[
"records"
]
If I play around with the print statement at the end, I can get it to print centiblock (so named for being a block of 100 records at a time) just fine - it gives me 100 records in unformatted text. However, if I try printing vendors at the end, the output is:
['records']
...which leads me to guess that somehow, the vendors array is not getting filled with the data. I suspect that I need to modify the get request where I define new_results, but I'm not sure how.
For reference, this is a censored look at how the json data begins, when I format and print out one centiblock:
{
    "records": [
        {
            "id": "XXX",
            "createdTime": "2018-10-15T19:23:59.000Z",
            "fields": {
                "Vendor Name": "XXX",
                "Main Phone": "XXX",
                "Street": "XXX",
Can anyone see where I'm going wrong?
Thanks in advance!
When you extend vendors with centiblock, you are passing a dict to the extend function. extend expects an Iterable, so that works, but when you iterate over a Python dict you only iterate over its keys; in this case, ['records'].
Note as well that your loop condition becomes False after the first iteration, because centiblock.get("results", []) returns [], since "results" is not a key in the API's output, and [] has a truthiness value of False.
Hence, to correct those errors you need to read the correct field from the API into new_results, and extend vendors with new_results, which is itself an array. Note that on the last iteration new_results will be the empty list, so vendors won't be extended with any null value and will contain exactly what you need.
This should look like:
import requests
import json
from requests.structures import CaseInsensitiveDict

url = "https://api.airtable.com/v0/CENSORED/Vendors?maxRecords=100"

headers = CaseInsensitiveDict()
headers["Authorization"] = "Bearer CENSORED"

resp = requests.get(url, headers=headers)
resp.content.decode("utf-8")

vendors = []
new_results = [True]  # non-empty placeholder so the first len() check passes
page = 1
while len(new_results) > 0:
    centiblock = requests.get(url + f"&page={page}", headers=headers).json()
    new_results = centiblock.get("records", [])
    vendors.extend(new_results)
    page += 1

full_directory = json.dumps(vendors, indent=4)
print(full_directory)
Note that I replaced the while new_results with a while len(new_results)>0 which is equivalent in this case, but more readable, and better practice in general.

How to speed up dictionary search in python?

Basically I have a function that returns an API response with a huge number of dictionaries, simplified to their keys. I then have another function, called getPlayerData, which sends an API call to the same API to get information about a specific player instead of all of them. The problem is that on its own getPlayerData is fast, but in this scenario it is far too slow to be usable.
Is there a way I can speed this up? getPlayerData is not required; I can just make a request too.
The dictionary search
residents = []
for resident in getListData("resident"):
    if getPlayerData(resident)["town"] == town:
        residents.append(resident)
print(residents)
getPlayerData()
def getPlayerData(player):
    r = requests.get("http://srv.earthpol.com/api/json/residents.php?name=" + player)
    j = r.json()
    player = player.lower()
    global emptyresult
    emptyresult = False
    if str(j) == "{}":
        emptyresult = True
    else:
        result = {"town": j[player]["town"],
                  "town-rank": j[player]["townRank"],
                  "nation-ranks": j[player]["nationRanks"],
                  "lastOnline:": j[player]["lastOnline"],
                  "registered": j[player]["registered"],
                  "town-title": j[player]["title"],
                  "nation-title": j[player]["surname"],
                  "friends": j[player]["friends"],
                  "uuid": j[player]["uuid"],
                  "avatar": "https://crafatar.com/avatars/" + j[player]["uuid"]}
        return result
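One common way to cut the wall-clock time of a loop of independent HTTP requests like this is to issue them concurrently. A rough sketch using concurrent.futures (assuming the functions and town variable above; note the global emptyresult flag in getPlayerData is not thread-safe, so this is illustrative only):
from concurrent.futures import ThreadPoolExecutor

residents = getListData("resident")
with ThreadPoolExecutor(max_workers=10) as pool:
    # Fetch every player's data in parallel instead of one request at a time.
    player_data = list(pool.map(getPlayerData, residents))

matches = [resident for resident, data in zip(residents, player_data)
           if data and data["town"] == town]  # data is None when the API returned an empty result
print(matches)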

Clockify API, unexpected data returned?

I'm requesting some time entries for users with the Clockify API. For some reason, I am receiving responses which include entries without an end time. I noticed that the unexpectedly returned entries belong to currently running time entries... However, I did not specify/use the 'in-progress' parameter... What is happening here?
Here is my code:
def fetch_users_time_entries(users):
    API_URL = "https://api.clockify.me/api/v1"
    for user in users:
        url = "{}/workspaces/{}/user/{}/time-entries?hydrated=true&page-size=1000&start=2019-08-05T00:00:01Z".format(API_URL, WORKSPACE_ID, user['clockify_id'])
        time_entries = requests.get(url, headers=HEADER)
        for time_entry in time_entries.json():
Here is a sample of an unexpected "end" value:
{
    'id': 'SECRET',
    'description': '',
    'tags': [
        {
            'id': 'SECRET',
            'name': 'CERTI',
            'workspaceId': 'SECRET'
        }
    ],
    'user': None,
    'billable': True,
    'task': {
        'id': 'SECRET',
        'name': 'Etapa: Execução e Controle',
        'projectId': 'SECRET',
        'assigneeId': '',
        'estimate': 'PT0S',
        'status': 'ACTIVE'
    },
    'project': {
        'id': 'SECRET',
        'name': 'C105',
        'hourlyRate': {
            'amount': 0,
            'currency': 'USD'
        },
        'clientId': 'SECRET',
        'workspaceId': 'SECRET',
        'billable': True,
        'memberships': [
            {
                'userId': 'SECRET',
                'hourlyRate': None,
                'targetId': 'SECRET',
                'membershipType': 'PROJECT',
                'membershipStatus': 'ACTIVE'
            }
        ],
        'color': '#8bc34a',
        'estimate': {
            'estimate': 'PT0S',
            'type': 'AUTO'
        },
        'archived': False,
        'duration': 'PT25H20M12S',
        'clientName': 'NEO',
        'public': True
    },
    'timeInterval': {
        'start': '2019-08-22T18:55:55Z',
        'end': None,
        'duration': None
    },
    'workspaceId': 'SECRET',
    'totalBillable': None,
    'hourlyRate': None,
    'isLocked': False,
    'userId': 'SECRET',
    'projectId': 'SECRET'
}
I was only expecting time entries that were completed. Any suggestions?
UPDATE (10/16/19):
Another follow-up. They just sent me an e-mail saying they fixed the problem: setting the parameter "in-progress" to false will return only completed time entries. #matthew-e-miller it would be nice to add this to the answer. – Lukas Belck
Okay, so I finally had a chance to reproduce the problem and it seems... there is not an end-time filter. They have misleadingly provided a start and an end parameter, but these both filter on start-time.
The in-progress parameter works as described in the doc, but it doesn't help for your application.
Answer:
I think your best bet is to request all the time entries, place them into a dict/list, and then use your Python script to drop the elements whose "end" is None.
import requests
import json

headers = {"content-type": "application/json", "X-Api-Key": "your api key"}
workspaceId = "your workspace id"
userId = "your user id"
params = {'start': '2019-08-28T11:10:32.998Z', 'end': '2019-08-29T02:05:02Z', 'in-progress': 'true'}

# Build the URL with the actual ids
API_URL = f"https://api.clockify.me/api/v1/workspaces/{workspaceId}/user/{userId}/time-entries"
print(API_URL)

result_one = requests.get(API_URL, headers=headers, params=params)
print(result_one)

entries = json.loads(result_one.text)
# Keep only completed entries (those whose timeInterval has an end time).
# Removing items from a list while iterating over it skips elements, so build a new list instead.
completed = [entry for entry in entries if entry.get("timeInterval", {}).get("end") is not None]
print(completed)
Output:
A list containing only the entries whose timeInterval.end is not None.
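Following the update above, if the fix Clockify describes is in place, you could instead pass in-progress=false and let the API drop running entries itself (a sketch reusing API_URL and headers from the snippet above):
# Sketch: ask the API to exclude running entries, per the fix mentioned in the update.
params = {'start': '2019-08-28T11:10:32.998Z', 'end': '2019-08-29T02:05:02Z', 'in-progress': 'false'}
completed_only = requests.get(API_URL, headers=headers, params=params).json()
print(completed_only)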

Printing dictionary from inside a list puts one character on each line

Yes, yet another. I can't figure out what the issue is. I'm trying to iterate over a list that is a subsection of JSON output from an API call.
This is the section of JSON that I'm working with:
[
    {
        "created_at": "2017-02-22 17:20:29 UTC",
        "description": "",
        "id": 1,
        "label": "FOO",
        "name": "FOO",
        "title": "FOO",
        "updated_at": "2018-12-04 16:37:09 UTC"
    }
]
The code that I'm running that retrieves this and displays it:
#!/usr/bin/python

import json
import sys

try:
    import requests
except ImportError:
    print "Please install the python-requests module."
    sys.exit(-1)

SAT_API = 'https://satellite6.example.com/api/v2/'
USERNAME = "admin"
PASSWORD = "password"
SSL_VERIFY = False  # Ignore SSL for now

def get_json(url):
    # Performs a GET using the passed URL location
    r = requests.get(url, auth=(USERNAME, PASSWORD), verify=SSL_VERIFY)
    return r.json()

def get_results(url):
    jsn = get_json(url)
    if jsn.get('error'):
        print "Error: " + jsn['error']['message']
    else:
        if jsn.get('results'):
            return jsn['results']
        elif 'results' not in jsn:
            return jsn
        else:
            print "No results found"
    return None

def display_all_results(url):
    results = get_results(url)
    if results:
        return json.dumps(results, indent=4, sort_keys=True)

def main():
    orgs = display_all_results(KATELLO_API + "organizations/")
    for org in orgs:
        print org

if __name__ == "__main__":
    main()
I appear to be missing a concept because when I print org I get each character per line such as
[
{
"
c
r
e
a
t
e
d
_
a
t
"
It does this through to the final ]
I've also tried to print org['name'] which throws the TypeError: list indices must be integers, not str Python error. This makes me think that org is being seen as a list rather than a dictionary which I thought it would be due to the [{...}] format.
What concept am I missing?
EDIT: An explanation for why I'm not getting this: I'm working with a script in the Red Hat Satellite API Guide which I'm using to base another script on. I'm basically learning as I go.
display_all_results is returning a string because json.dumps(results, indent=4, sort_keys=True) converts the dictionary (which you get from r.json() in the get_json function) to a string.
You then end up iterating over the characters of that string in main, which is why you see one character per line.
Instead, just return results from display_all_results and the code will work as intended:
def display_all_results(url):
    # results is already a dictionary, just return it
    results = get_results(url)
    if results:
        return results
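With that change, main iterates over dictionaries instead of characters, so a key lookup such as org["name"] works (a sketch using the SAT_API constant defined in the question):
def main():
    orgs = display_all_results(SAT_API + "organizations/")
    for org in orgs:
        # each org is now a dict, so key access works
        print org["name"]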
orgs is the result of json.dumps, which produces a string. So instead of this code:
for org in orgs:
    print(org)
replace it with simply:
#for org in orgs:
print(orgs)

Not able to get Country of a Tweet - Twython API

I am using the following code to collect tweets pertaining to a certain topic, but in all the tweets that I have extracted the 'places' attribute is None. Am I doing something wrong? Also, the code is meant to extract existing tweets; I do not need a streaming API solution and am not looking for this streaming-API approach: https://www.quora.com/How-can-I-get-a-stream-of-tweets-from-a-particular-country-using-Twitter-API
api = Twython(consumer_key, consumer_secret, access_key, access_secret)
tweets = []
MAX_ATTEMPTS = 200
COUNT_OF_TWEETS_TO_BE_FETCHED = 10000
in_max_id = sys.argv[1]
next_max_id = ''

for i in range(0, MAX_ATTEMPTS):
    if(COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets)):
        break  # we got 500 tweets... !!

    #----------------------------------------------------------------#
    # STEP 1: Query Twitter
    # STEP 2: Save the returned tweets
    # STEP 3: Get the next max_id
    #----------------------------------------------------------------#

    # STEP 1: Query Twitter
    if(0 == i):
        # Query twitter for data.
        results = api.search(q="#something", count='100', lang='en', max_id=in_max_id, include_entities='true', geo=True)
    else:
        # After the first call we should have max_id from result of previous call. Pass it in query.
        results = api.search(q="#something", include_entities='true', max_id=next_max_id, lang='en', geo=True)

    # STEP 2: Save the returned tweets
    for result in results['statuses']:
        temp = ""
        tweet_text = result['text']
        temp += tweet_text.encode('utf-8') + " "
        hashtags = result['entities']['hashtags']
        for i in hashtags:
            temp += i['text'].encode('utf-8') + " "
        print result
        #temp += i["place"]["country"] + "\n"
        #output_file.write(temp)

    # STEP 3: Get the next max_id
    try:
        # Parse the data returned to get max_id to be passed in consequent call.
        next_results_url_params = results['search_metadata']['next_results']
        next_max_id = next_results_url_params.split('max_id=')[1].split('&')[0]
    except:
        # No more next pages
        break
The short answer is, No, you are doing nothing wrong. The reason why all place tags are empty is because statistically they are very unlikely to contain data. Only about 1% of all tweets have data in their place tag. This is because users rarely tweet their location. Location is off by default.
Download 100 or more tweets and you probably will find place tag data.
If the place field is a MUST for every tweet that your app will process, then you can limit your search to a place to make sure every result will definitely have it.
You can do so by setting the geocode (latitude,longitude,radius[km/mi]) parameter to limit your search to within an area.
An example such request via Twython is:
geocode = '25.032341,55.385557,100mi'
api.search(q="#something",count='100',lang='en',include_entities='true',geocode=geocode)
Not all tweets have all fields like tweet_text, place, country, language etc.
So, to avoid a KeyError, use the following approach: modify your code so that when the key you're looking for is not found, a default value is returned.
result.get('place', {}).get('country', {}) if result.get('place') != None else None
Here, the above line means "look up the key country after fetching the key place if it exists, otherwise return None".
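Applied to the loop from the question, that looks roughly like this (a sketch):
for result in results['statuses']:
    temp = ""
    place = result.get('place')                       # usually None, since most tweets are not geotagged
    country = place.get('country') if place else None
    if country:
        temp += country + "\n"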
kmario is right. Most tweets don't have this information, but a small percent do. Doing a location search will increase this chance e.g. https://api.twitter.com/1.1/search/tweets.json?q=place%3Acba60fe77bc80469&count=1
"place": {
"id": "cba60fe77bc80469",
"url": "https://api.twitter.com/1.1/geo/id/cba60fe77bc80469.json",
"place_type": "city",
"name": "Tallinn",
"full_name": "Tallinn, Harjumaa",
"country_code": "EE",
"country": "Eesti",
"contained_within": [],
"bounding_box": {
"type": "Polygon",
"coordinates": [
[
[
24.5501404,
59.3518286
],
[
24.9262886,
59.3518286
],
[
24.9262886,
59.4981855
],
[
24.5501404,
59.4981855
]
]
]
},
"attributes": {}
},
