I have inherited this chunk of code:
def _api_call(self, url, query=None, raw_query='', top=None, select=None, order_by=None, fetch_all=True,
              full_response=False, timeout=10, retries=3, debugging=False):
    if fetch_all and full_response:
        # these parameters are not compatible
        raise APIQueryException(message='fetch_all and full_response cannot be used together')
    query_string = self._encode_query(query, top, select, order_by)
    if query_string or raw_query:
        query_string = '?' + query_string + raw_query
    # get API signature headers for the new request from the api_auth singleton
    headers = api_auth.generate_headers(path=url, method='GET')
    service_end_point = 'https://%(portal)s.company.com%(url)s%(query_string)s' % {
        'portal': self._portal,
        'url': url,
        'query_string': query_string,
    }
    retries_left = retries + 1 or 1
    stop = False
    kwargs = {'headers': headers, 'timeout': timeout, 'fetch_all': fetch_all}
    accumulated_results = []
    while not stop:
        service_end_point, _tmp_res, stop = self._single_api_call(service_end_point, retries_left, stop, **kwargs)
        accumulated_results.extend(_tmp_res)
    return accumulated_results
def _single_api_call(self, service_end_point, retries_left, stop, debugging=True, **kwargs):
    _res = []
    headers = kwargs.pop('headers')
    timeout = kwargs.pop('timeout')
    fetch_all = kwargs.pop('fetch_all')
    try:
        while True:
            if debugging:
                print('Company Service API:', service_end_point)
            result = requests.get(url=service_end_point, headers=headers, timeout=timeout)
            break
    except RequestException as e:
        if retries_left > 0:
            if debugging:
                print('Company Service API EXCEPTION, retrying:', str(e))
            retries_left -= 1
        else:
            raise
    except requests.Timeout as e:
        raise APITimeoutException(e, message='API request timeout')
    except requests.ConnectionError as e:
        raise APIRequestException(e, message='API request DNS error or connection refused')
    except requests.TooManyRedirects as e:
        raise APIRequestException(e, message='API request had too many redirects')
    except requests.HTTPError as e:
        raise APIRequestException(e, message='API request returned HTTP error status code')
    if result.status_code == 400:
        # Company often reports "Bad Request" if the query
        # parameters are not acceptable
        raise APIQueryException(message='API request failed, Company rejected query terms')
    try:
        parsed_result = json.loads(result.content)
    except ValueError as e:
        # an unknown failure mode
        raise APIRequestException(
            message='API request failed; no JSON returned; server said {}'.format(result.content))
    if 'value' in parsed_result:
        _res.extend(parsed_result['value'])
    else:
        pass
    if '@odata.nextLink' in parsed_result and fetch_all:
        service_end_point = parsed_result['@odata.nextLink']
    else:
        # no more pages
        stop = True
    return service_end_point, _res, stop
This works fine:
>>> call_1 = api_obj._api_call(url, *args, **kwargs)
>>> len(call_1)
3492
However, I'm trying to refactor it to use a generator, and I'm messing something up.
I made the following changes to the while not stop section of the _api_call method:
while not stop:
    try:
        service_end_point, _tmp_res, stop = self._single_api_call(service_end_point, retries_left, stop, **kwargs)
        accumulated_results.extend(_tmp_res)
        if stop:
            raise StopIteration
        else:
            yield _tmp_res
    except StopIteration:
        return accumulated_results
I see that each single call is computed, but the result is:
>>> call_2 = api_obj._api_call(url, *args, **kwargs)
>>> len(call_2)
3
Each of the three items is a list with 1000 items, so I have a total of 3000 items in separate lists, not the 3492 flat items of the original implementation.
How can I change/rewrite this so that I get the same flat result as the original?
You're trying to do it both ways at once, both yielding and returning. That is legal, but it doesn't mean what you probably want it to mean.
Also, you don't need to raise a StopIteration just to handle it and turn it into a return, which the generator protocol is just going to turn back into a StopIteration. Just return, and knock out two of those steps (and two extra chances to get something wrong). Or, in this case, we can just fall off the end of the while not stop: loop, just like the original code, and leave off the return as well, since we then fall off the end of the function.
Meanwhile, your old code was adding each _tmp_res onto the list with extend, not append, which has the effect of "flattening" the list—if _tmp_res is a list of 1000 items, extend adds 1000 items onto the end of the list. But yield _tmp_res will just yield that 1000-item sub-list. You probably want yield from here:
while not stop:
    service_end_point, _tmp_res, stop = self._single_api_call(service_end_point, retries_left, stop, **kwargs)
    yield from _tmp_res
If you don't understand what yield from means, it's roughly equivalent (in this case) to:
for element in _tmp_res:
    yield element
In general, yield from is much more powerful, but we don't need any of that power here. It will still be more efficient (although probably not enough to make a difference), and of course it's shorter and simpler, and it makes more sense once you wrap your head around the idea. But if your code needs to work in Python 2.7, you don't have yield from, so you'll have to use the loop over yield instead.
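One more thing to be aware of with the generator version (a quick usage sketch, reusing the call from your example): the caller now gets a lazy iterator instead of a list, so anything that needs a length or an index has to materialize it first, and iterating it is what actually drives the paging:

call_2 = api_obj._api_call(url, *args, **kwargs)   # a generator object; nothing fetched yet
all_items = list(call_2)                           # consuming it performs the page requests
len(all_items)                                     # now a flat count, matching the original 3492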
This is only my second task (a bug I need to fix) in a Python/Flask/SQLAlchemy/Marshmallow system I need to work on, so please go easy on me :)
In short: I'd like to accept an apparently invalid request.
In detail:
I need to handle a case in which a user sends a request whose JSON, by mistake, includes a duplicate value in a list.
For example:
{
    "ciphers": [
        "TLS_AES_256_GCM_SHA384",
        "AES256-SHA256"
    ],
    "is_default": true,
    "tls_versions": [
        "tls10",
        "tls10",
        "tls11"
    ]
}
What I need to do is eliminate one of the duplicated tls10 values but still treat the request as valid, update the db with the correct, distinct TLS versions, and return the deduplicated JSON in the response body.
Current code segments are as follows:
tls Controller:
...
@client_side_tls_bp.route('/<string:tls_profile_id>', methods=['PUT'])
def update_tls_profile_by_id(tls_profile_id):
    return update_entity_by_id(TlsProfileOperator, entity_name, tls_profile_id)
...
general entity controller:
...
def update_entity_by_id(operator, entity_name, entity_id):
    """flask route for updating a resource"""
    try:
        entity_body = request.get_json()
    except Exception:
        return make_custom_response("Bad Request", HTTPStatus.BAD_REQUEST)
    entity_obj = operator.get(g.tenant, entity_id, g.correlation)
    if not entity_obj:
        response = make_custom_response(http_not_found_message(entity_name, entity_id), HTTPStatus.NOT_FOUND)
    else:
        updated = operator.update(g.tenant, entity_id, entity_body, g.correlation)
        if updated == "accepted":
            response = make_custom_response("Accepted", HTTPStatus.ACCEPTED)
        else:
            response = make_custom_response(updated, HTTPStatus.OK)
    return response
...
tls operator:
...
@staticmethod
def get(tenant, name, correlation_id=None):
    try:
        tls_profile = TlsProfile.get_by_name(tenant, name)
        return schema.dump(tls_profile)
    except NoResultFound:
        return None
    except Exception:
        apm_logger.error(f"Failed to get {name} TLS profile", tenant=tenant,
                         consumer=LogConsumer.customer, correlation=correlation_id)
        raise

@staticmethod
def update(tenant, name, json_data, correlation_id=None):
    schema.load(json_data)
    try:
        dependant_vs_names = VirtualServiceOperator.get_dependant_vs_names_locked_by_client_side_tls(tenant, name)
        # locks virtual services and tls profile table simultaneously
        to_update = TlsProfile.get_by_name(tenant, name)
        to_update.update(json_data, commit=False)
        db.session.flush()  # TODO - need to change when 2 phase commit will be implemented
        snapshots = VirtualServiceOperator.get_snapshots_dict(tenant, dependant_vs_names)
        # update QWE
        # TODO handle QWE update atomically!
        for snapshot in snapshots:
            QWEController.update_abc_services(tenant, correlation_id, snapshot)
        db.session.commit()
        apm_logger.info(f"Update successfully {len(dependant_vs_names)} virtual services", tenant=tenant,
                        correlation=correlation_id)
        return schema.dump(to_update)
    except Exception:
        db.session.rollback()
        apm_logger.error(f"Failed to update {name} TLS profile", tenant=tenant,
                         consumer=LogConsumer.customer, correlation=correlation_id)
        raise
...
and in the api schema class:
...
@validates('_tls_versions')
def validate_client_side_tls_versions(self, value):
    if len(noDuplicatatesList) < 1:
        raise ValidationError("At least a single TLS version must be provided")
    for tls_version in noDuplicatatesList:
        if tls_version not in TlsProfile.allowed_tls_version_values:
            raise ValidationError("Not a valid TLS version")
...
I would have preferred to solve the problem at the schema level, so that the schema simply doesn't accept the duplication.
So, as easy as it is to remove the duplicates from the value parameter, how can I propagate the deduplicated list back so it is used to update the db and build the response?
Thanks.
I didn't test but I think mutating value in the validation function would work.
However, this is not really guaranteed by marshmallow's API.
The proper way to do it would be to add a post_load method to de-duplicate.
@post_load
def deduplicate_tls(self, data, **kwargs):
    if "tls_versions" in data:
        data["tls_versions"] = list(set(data["tls_versions"]))
    return data
This won't maintain the order, so if the order matters, or for issues related to deduplication itself, see https://stackoverflow.com/a/7961390/4653485.
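If order does matter, a small alternative sketch (plain dict.fromkeys, nothing specific to the codebase above) keeps the first occurrence of each version, since dicts preserve insertion order on Python 3.7+:

@post_load
def deduplicate_tls(self, data, **kwargs):
    if "tls_versions" in data:
        # dict.fromkeys drops duplicates while keeping the original order
        data["tls_versions"] = list(dict.fromkeys(data["tls_versions"]))
    return data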
I want to build a complex network of twitter followers.
I'm using the function api.GetFriends:
def get_friend_list_by_user(user, api):
    friends_lists = api.GetFriends(repr(user.id))
    return friends_lists
The problem is that for the same Twitter users it sometimes works and sometimes doesn't.
When I debug it, the code gets stuck at this part of api.py:
if enforce_auth:
    if not self.__auth:
        raise TwitterError("The twitter.Api instance must be authenticated.")

    if url and self.sleep_on_rate_limit:
        limit = self.CheckRateLimit(url)

        if limit.remaining == 0:
            try:
                stime = max(int(limit.reset - time.time()) + 10, 0)
                logger.debug('Rate limited requesting [%s], sleeping for [%s]', url, stime)
                time.sleep(stime)
            except ValueError:
                pass

if not data:
    data = {}
The stime value is 443.
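For what it's worth, that branch only runs when the twitter.Api object was constructed with sleep_on_rate_limit=True, so an stime of 443 means the library is sleeping that many seconds until the rate-limit window resets rather than hanging. A hypothetical construction (placeholder credentials) that produces this behaviour looks like:

import twitter

# with sleep_on_rate_limit=True the library blocks until the window resets
# (here roughly 443 seconds) instead of raising a rate-limit error
api = twitter.Api(consumer_key='...',
                  consumer_secret='...',
                  access_token_key='...',
                  access_token_secret='...',
                  sleep_on_rate_limit=True)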
I am using Python 3 and the YouTube Data API V3 to fetch comments from a YouTube video. This particular video has around 280,000 comments. I am trying to write a while loop that will get as many comments as possible before hitting the quota limit and then breaking if the quota limit is reached.
My loop appears to be successfully requesting next page tokens and appending the requested metadata to my list, but when the quota is reached it doesn't end the loop; instead it registers an error, and none of the correctly fetched comment data gets saved.
Here is my current code:
# Get resources:
def get(resource, **kwargs):
    print(f'Getting {resource} with params {kwargs}')
    kwargs['key'] = API_KEY
    response = requests.get(url=f'{YOUTUBE_BASE_URL}/{resource}',
                            params=remove_empty_kwargs(kwargs))
    print(f'Response: {response.status_code}')
    return response.json()
# Getting ALL comments for a video:
def getComments(video_id):
    comments = []
    res = get('commentThreads', part='id,snippet,replies', maxResults=100, videoId=video_id)
    try:
        nextPageToken = res['nextPageToken']
    except TypeError:
        nextPageToken = None
    while (nextPageToken):
        try:
            res = get('commentThreads', part='id,snippet,replies', maxResults=100, videoId=video_id)
            for i in res['items']:
                comments.append(i)
            nextPageToken = res['nextPageToken']
        except HttpError as error:
            print('An error occurred: %s' % error)
            break
    return comments
test = 'video-id-here'
testComments = getComments(test)
So, this correctly seems to loop through all the comments, but after some time, i.e. after it has looped several hundred times, I get the following error:
Getting commentThreads with params {'part': 'id,snippet,replies', 'maxResults': 100, 'videoId': 'real video ID shows here'}
Response: 403
KeyError Traceback (most recent call last)
<ipython-input-39-6582a0d8f122> in <module>
----> 1 testComments = getComments(test)
<ipython-input-29-68952caa30dd> in getComments(video_id)
12 res = get('commentThreads', part='id,snippet,replies', maxResults=100, videoId=video_id)
13
---> 14 for i in res['items']:
15 comments.append(i)
16
KeyError: 'items'
So, first I get the expected 403 response from the API after some time, which indicates that the quota limit has been reached. Then it throws the error for 'items', but the reason this error is thrown is that no more comment threads were fetched, so there are no more 'items' to append.
My expected result is that the loop will just break when the quota limit is reached and save the comment data it managed to fetch before reaching the quota.
I think this is probably related to my 'try' and 'except' handling, but I can't seem to figure it out.
Thanks!
Ultimately fixed it with this code:
def getComments(video_id):
    comments = []
    res = get('commentThreads', part='id,snippet,replies', maxResults=100, videoId=video_id)
    try:
        nextPageToken = res['nextPageToken']
    except KeyError:
        nextPageToken = None
    except TypeError:
        nextPageToken = None
    while (nextPageToken):
        try:
            res = get('commentThreads', part='id,snippet,replies', maxResults=100, videoId=video_id)
            for i in res['items']:
                comments.append(i)
            nextPageToken = res['nextPageToken']
        except KeyError:
            break
    return comments
Proper exception handling for the KeyError was the ultimate solution, since my get() function returns a parsed JSON object (a dict), not a response object.
You are catching an HttpError but it never happens, because when your limit runs out the API just returns 403.
There is no HttpError to catch and so you try to read a value which isn't there and get a KeyError.
The most robust way is probably to check the status code.
res = get('commentThreads', part='id,snippet,replies', maxResults=100, videoId=video_id)
if res.status_code != 200:
    break
for i in res['items']:
    comments.append(i)
nextPageToken = res['nextPageToken']
The res.status_code check assumes you're using requests and that get() returns the response object itself rather than response.json().
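Since the question's get() returns response.json() rather than the Response itself, one way to adapt this (a sketch reusing the question's own helpers API_KEY, YOUTUBE_BASE_URL and remove_empty_kwargs) is to check the status inside get() and signal failure to the caller:

def get(resource, **kwargs):
    kwargs['key'] = API_KEY
    response = requests.get(url=f'{YOUTUBE_BASE_URL}/{resource}',
                            params=remove_empty_kwargs(kwargs))
    if response.status_code != 200:
        # quota exhausted or some other API error; let the caller stop paging
        return None
    return response.json()

Then the loop body in getComments only needs a None check before touching 'items':

res = get('commentThreads', part='id,snippet,replies', maxResults=100, videoId=video_id)
if res is None:
    break
for i in res['items']:
    comments.append(i)
nextPageToken = res['nextPageToken']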
I'm considering a fan-out proxy in Tornado that queries multiple backend servers, with the possible use-case of not waiting for all responses before returning.
Is there a problem with the remaining futures if you use a WaitIterator but don't continue waiting after receiving a useful response?
Perhaps the results of the other futures will not be cleaned up? Perhaps callbacks could be added to any remaining futures to discard their results?
#!./venv/bin/python
from tornado import gen
from tornado import httpclient
from tornado import ioloop
from tornado import web
import json


class MainHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        r1 = httpclient.HTTPRequest(
            url="http://apihost1.localdomain/api/object/thing",
            connect_timeout=4.0,
            request_timeout=4.0,
        )
        r2 = httpclient.HTTPRequest(
            url="http://apihost2.localdomain/api/object/thing",
            connect_timeout=4.0,
            request_timeout=4.0,
        )
        http = httpclient.AsyncHTTPClient()
        wait = gen.WaitIterator(
            r1=http.fetch(r1),
            r2=http.fetch(r2)
        )
        while not wait.done():
            try:
                reply = yield wait.next()
            except Exception as e:
                print("Error {} from {}".format(e, wait.current_future))
            else:
                print("Result {} received from {} at {}".format(
                    reply, wait.current_future,
                    wait.current_index))
                if reply.code == 200:
                    result = json.loads(reply.body)
                    self.write(json.dumps(dict(result, backend=wait.current_index)))
                    return


def make_app():
    return web.Application([
        (r'/', MainHandler)
    ])


if __name__ == '__main__':
    app = make_app()
    app.listen(8888)
    ioloop.IOLoop.current().start()
So I've checked through the source for WaitIterator.
It tracks the futures by adding a callback to each; when one fires, the iterator either queues the result or (if you've already called next()) fulfils the future it has given to you.
As the future you wait on only gets created by calling .next(), it appears you can exit the while not wait.done() loop without leaving any futures without observers.
Reference counting ought to allow the WaitIterator instance to remain until after all the futures have fired their callbacks and then be reclaimed.
Update 2017/08/02
Having tested further by subclassing WaitIterator with extra logging: yes, the iterator is cleaned up when all the futures return, but if any of those futures returns an exception, it is logged that the exception was never observed:
ERROR:tornado.application:Future exception was never retrieved: HTTPError: HTTP 599: Timeout while connecting
In summary and answering my question: completing the WaitIterator isn't necessary from a clean-up point of view, but it is probably desirable to do so from a logging point of view.
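For reference, the logging subclass used for that test only needs the public WaitIterator API; a minimal sketch (the class name and log messages here are made up) might look like:

import logging

from tornado import gen

log = logging.getLogger(__name__)


class LoggingWaitIterator(gen.WaitIterator):
    """WaitIterator that logs how it is being driven, via the public API only."""

    def done(self):
        finished = super(LoggingWaitIterator, self).done()
        log.debug('WaitIterator.done() -> %s', finished)
        return finished

    def next(self):
        log.debug('WaitIterator.next() called (current_index=%s)', self.current_index)
        return super(LoggingWaitIterator, self).next()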
If you wanted to be sure, passing the wait iterator to a new coroutine that will finish consuming it, and adding an observer to that coroutine's future, may suffice. For example:
@gen.coroutine
def complete_wait_iterator(wait):
    rounds = 0
    while not wait.done():
        rounds += 1
        try:
            reply = yield wait.next()
        except Exception as e:
            print("Not needed Error {} from {}".format(e, wait.current_future))
        else:
            print("Not needed result {} received from {} at {}".format(
                reply, wait.current_future,
                wait.current_index))
    log.info('completer finished after {n} rounds'.format(n=rounds))


class MainHandler(web.RequestHandler):
    @gen.coroutine
    def get(self):
        r1 = httpclient.HTTPRequest(
            url="http://apihost1.localdomain/api/object/thing",
            connect_timeout=4.0,
            request_timeout=4.0,
        )
        r2 = httpclient.HTTPRequest(
            url="http://apihost2.localdomain/api/object/thing",
            connect_timeout=4.0,
            request_timeout=4.0,
        )
        http = httpclient.AsyncHTTPClient()
        wait = gen.WaitIterator(
            r1=http.fetch(r1),
            r2=http.fetch(r2)
        )
        while not wait.done():
            try:
                reply = yield wait.next()
            except Exception as e:
                print("Error {} from {}".format(e, wait.current_future))
            else:
                print("Result {} received from {} at {}".format(
                    reply, wait.current_future,
                    wait.current_index))
                if reply.code == 200:
                    result = json.loads(reply.body)
                    self.write(json.dumps(dict(result, backend=wait.current_index)))
                    consumer = complete_wait_iterator(wait)
                    consumer.add_done_callback(lambda f: f.exception())
                    return
I have an error at raise eb: list index out of range.
I don't understand why this happens when I re-raise inside another try/except.
I'm doing a try/except inside a try/except and both re-raise the error.
Here is my code; the error line is at raise eb:
try:
    print("debut edit")
    print(p)
    modif_box = get_modif_box_profile(p)
    post_box = get_Post_Box(p)
    print("modi_box")
    print(modif_box)
    print("mbu id")
    print(modif_box.id)
    diff = {}
    posts = {}
    new_post = []
    diff["posts"] = posts
    posts["modified_post"] = new_post
    for post in modif_box.edit_post_user.all():
        # print(post.id_mod)
        try:
            messagenew = post_box.post.all().filter(id=post.id_mod)[0]
            # print(post_new)
            print("posts")
            print(post)
            # TODO: refactor this
            if messagenew.id > int(last_id) and messagenew.sender.id != p.id:
                name = get_name_contact(p, messagenew)
                return_post = {}
                return_post["uid"] = messagenew.sender.id
                return_post["pid"] = messagenew.id
                return_post["author"] = name
                return_post["title"] = messagenew.title
                return_post["date"] = unix_time_millis(messagenew.date)
                return_post["smile"] = count_smile(messagenew)
                return_post["comment"] = count_comment(messagenew)
                return_post["data"] = messagenew.data
                return_post["type"] = messagenew.type_post.type_name
                new_post.append(return_post)
            else:
                print("depop edit")
                modif_box.edit_post_user.remove(post)
                modif_box.save()
        except Exception as eb:
            PrintException()
            # raise eb  (if I uncomment this, I get an error in my program)
    print(diff)
    return diff
except Exception as e:
    PrintException()
    raise e
Regards and thanks
If you comment out the raise statement there, it doesn't mean that you don't have an error; it simply means that you handled the exception -- which in your case is, from what I can tell, an IndexError -- by catching it with the except Exception clause and then calling PrintException().
When you raise an exception, what you actually do is this:
The raise statement allows the programmer to force a specified exception to occur.
So, by un-commenting it, you allow the IndexError named eb to re-appear after being caught in the inner try/except block, and it then gets caught by the outer try/except clause, where you re-raise it once more.
Generally, you don't want to catch exceptions in such a generic way because it might hide some unpredicted behaviour of the program that you would like to know about.
Limit the exceptions you catch in the except clause by simply specifying them; in your case, an except clause of the form:
except IndexError as eb:
    PrintException()
would probably suffice.
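To see the mechanics in isolation, here is a minimal, self-contained sketch (unrelated to the Django code above) of an IndexError being caught in an inner block, re-raised, caught again by the outer block, and re-raised once more to the caller:

def demo():
    try:
        items = []
        try:
            value = items[0]      # raises IndexError: list index out of range
        except Exception as eb:
            print("inner handler saw:", eb)
            raise eb              # re-raise so the outer block sees it too
        print("never reached")
    except Exception as e:
        print("outer handler saw:", e)
        raise e                   # re-raising again propagates to the caller


try:
    demo()
except IndexError as caught:
    print("caller saw:", caught)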