I have a list of service URLs and I want to make asynchronous HTTP requests to them using Tornado. Once all the responses have arrived I need to use the collected data for other purposes.
Here is a simplified example of my code:
(...)
self.number = 0
self.counter = 0
self.data = {}
(...)
@tornado.web.asynchronous
def post(self):
    list_url = [url_service1, url_service2]
    self.number = len(list_url)
    http_client = AsyncHTTPClient()
    for service in list_url:
        request = tornado.httpclient.HTTPRequest(
            url=service, method='POST',
            headers={'content-type': 'application/json'},
            body=json.dumps({..params..}))
        http_client.fetch(request, callback=self.handle_response)
    # The for loop is finished. Use self.data, for example, in other functions...
    # but if I print(self.data) here I get an empty dict...
    # do_something(self.data)

def handle_response(self, response):
    if response.error:
        print("Error")
    else:
        self.counter = self.counter + 1
        print("Response {} / {} from {}".format(self.counter, self.number, response.effective_url))
        self.data[response.effective_url] = json_decode(response.body)
        # self.number is 2
        if self.counter == self.number:
            print("Finish response")

def do_something(data):
    # code with data parameter
I hope my problem is well explained
As you know, AsyncHTTPClient is asynchronous, which means the requests run in the background.
So when the for loop finishes, that does not mean all the requests have also finished; they are still running in the background.
That is why self.data is empty: the requests haven't completed yet.
How to fix this
The handle_response callback is called after each request completes. You can call the do_something function from this callback once all the requests have completed, like this:
def handle_response(...):
    ...
    if self.counter == self.number:
        self.do_something(self.data)
        print("Finish response")
Heyo 👋
I have a basic HTTPServer and a pcsc_listener process that waits for a user card scan. Both are run using multiprocessing.
I'm having difficulty passing the scanned card id from pcsc_listener to the HTTPServer as a POST response: it endlessly loops on the first request, won't close, and on the second request it just breaks.
My main function:
# Main routine
def main():
with Manager() as manager:
queue = manager.Queue()
p1 = Process(target=pcsc_listener, args=(queue,)).start()
p2 = Process(target=http_server, args=(queue,)).start()
while True:
msg = queue.get()
if msg:
if msg[0] == "active_card":
print("Card scanned " + msg[1])
Since the card-scanning pcsc_listener simply adds a list to the queue via queue.put([...]), it is safe to assume it works, so I won't add its code here.
My http server:
def http_server(queue):
    def error_handler(request, client_address):
        raise

    class http_handler(BaseHTTPRequestHandler):
        def do_GET(self):
            '''
            We are totally ignoring GET requests, but we are going to leave it open for them.
            Let's just return 501
            '''
            self.error_content_type('application/json')
            self.send_error(501)
            self.end_headers()

        def do_POST(self):
            content_length = int(self.headers['Content-Length'])
            post_data = self.rfile.read(content_length).decode("utf-8") \
                if content_length > 0 else ""
            post_data = json.loads(post_data)
            pass_data = {}
            while True:
                msg = queue.get()
                if msg[0] == "active_card":
                    print(msg[1])
                    break
            self.send_response(200)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(json.dumps(post_data).encode("utf-8"))
            raise(RuntimeWarning(post_data))

        def log_message(self, format, *args):
            return
Why is queue.get() never returning, so the HTTP POST endlessly loops and never sends a response? 🤔
What am I doing wrong? I basically need to share a variable between these processes, so I'm okay with other suggestions.
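In case it helps, here is a minimal, self-contained sketch of sharing a value between two processes with a Manager().dict() (all names here are made up, not from the post above); the writer plays the role of pcsc_listener and the reader plays the role of the HTTP handler, so the two no longer compete for the same queue.get():

import time
from multiprocessing import Process, Manager

def card_writer(shared):
    # Stand-in for pcsc_listener: records the most recently scanned card id.
    n = 0
    while True:
        time.sleep(2)                        # pretend a card was just scanned
        n += 1
        shared["active_card"] = "card-%d" % n

def card_reader(shared):
    # Stand-in for the HTTP handler: reads the latest value instead of
    # consuming items from a queue that another process also reads.
    while True:
        time.sleep(1)
        print("latest card:", shared.get("active_card"))

if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()
        writer = Process(target=card_writer, args=(shared,))
        reader = Process(target=card_reader, args=(shared,))
        writer.start()
        reader.start()
        writer.join()
        reader.join()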
How can I set a maximum number of requests per second (i.e. limit them) on the client side using aiohttp?
Although it's not exactly a limit on the number of requests per second, note that since v2.0, when using a ClientSession, aiohttp automatically limits the number of simultaneous connections to 100.
You can modify the limit by creating your own TCPConnector and passing it into the ClientSession. For instance, to create a client limited to 50 simultaneous requests:
import aiohttp
connector = aiohttp.TCPConnector(limit=50)
client = aiohttp.ClientSession(connector=connector)
In case it's better suited to your use case, there is also a limit_per_host parameter (which is off by default) that you can pass to limit the number of simultaneous connections to the same "endpoint". Per the docs:
limit_per_host (int) – limit for simultaneous connections to the same endpoint. Endpoints are the same if they have an equal (host, port, is_ssl) triple.
Example usage:
import aiohttp
connector = aiohttp.TCPConnector(limit_per_host=50)
client = aiohttp.ClientSession(connector=connector)
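For context, a small usage sketch of the connector-limited session (the URL list is just a placeholder): all requests go through the one session, and the connector caps how many run at the same time.

import asyncio
import aiohttp

async def fetch_all(urls):
    connector = aiohttp.TCPConnector(limit=50)       # at most 50 concurrent connections
    async with aiohttp.ClientSession(connector=connector) as session:
        async def fetch(url):
            async with session.get(url) as resp:
                return await resp.text()
        return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(fetch_all(["https://example.com"] * 200))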
I found one possible solution here: http://compiletoi.net/fast-scraping-in-python-with-asyncio.html
Doing 3 requests at the same time is cool, doing 5000, however, is not so nice. If you try to do too many requests at the same time, connections might start to get closed, or you might even get banned from the website.
To avoid this, you can use a semaphore. It is a synchronization tool that can be used to limit the number of coroutines that do something at some point. We'll just create the semaphore before creating the loop, passing as an argument the number of simultaneous requests we want to allow:
sem = asyncio.Semaphore(5)
Then, we just replace:
page = yield from get(url, compress=True)
by the same thing, but protected by a semaphore:
with (yield from sem):
page = yield from get(url, compress=True)
This will ensure that at most 5 requests can be done at the same time.
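The snippet above uses the older yield from style; on Python 3.5+ the same semaphore pattern is usually written with async/await. A minimal sketch, with a placeholder URL list and a simple get coroutine standing in for the article's version:

import asyncio
import aiohttp

async def get(session, sem, url):
    async with sem:                          # wait until one of the 5 slots is free
        async with session.get(url) as resp:
            return await resp.text()

async def main(urls):
    sem = asyncio.Semaphore(5)               # at most 5 requests in flight at once
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(get(session, sem, u) for u in urls))

pages = asyncio.run(main(["https://example.com"] * 20))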
Here is an example without aiohttp, but you can wrap any async method (or aiohttp.request) with the Limit decorator:
import asyncio
import time

class Limit(object):
    def __init__(self, calls=5, period=1):
        self.calls = calls
        self.period = period
        self.clock = time.monotonic
        self.last_reset = 0
        self.num_calls = 0

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            if self.num_calls >= self.calls:
                await asyncio.sleep(self.__period_remaining())
            period_remaining = self.__period_remaining()
            if period_remaining <= 0:
                self.num_calls = 0
                self.last_reset = self.clock()
            self.num_calls += 1
            return await func(*args, **kwargs)
        return wrapper

    def __period_remaining(self):
        elapsed = self.clock() - self.last_reset
        return self.period - elapsed

@Limit(calls=5, period=2)
async def test_call(x):
    print(x)

async def worker():
    for x in range(100):
        await test_call(x + 1)

asyncio.run(worker())
None of the solutions from the other answers worked for me (I've already tried them) when the API counts its rate limit from the time the previous request finished, so I'm posting a new one that should work:
class Limiter:
    def __init__(self, calls_limit: int = 5, period: int = 1):
        self.calls_limit = calls_limit
        self.period = period
        self.semaphore = asyncio.Semaphore(calls_limit)
        self.requests_finish_time = []

    async def sleep(self):
        if len(self.requests_finish_time) >= self.calls_limit:
            sleep_before = self.requests_finish_time.pop(0)
            if sleep_before >= time.monotonic():
                await asyncio.sleep(sleep_before - time.monotonic())

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            async with self.semaphore:
                await self.sleep()
                res = await func(*args, **kwargs)
                self.requests_finish_time.append(time.monotonic() + self.period)
                return res
        return wrapper
Usage:
@Limiter(calls_limit=5, period=1)
async def api_call(url):
    ...

async def main():
    tasks = [asyncio.create_task(api_call(url)) for url in urls]
    await asyncio.gather(*tasks)

if __name__ == '__main__':
    loop = asyncio.get_event_loop_policy().get_event_loop()
    loop.run_until_complete(main())
I want to convert my current Tornado app from using @web.asynchronous to @gen.coroutine. My asynchronous callback is called when a particular variable change happens on an IOLoop iteration. The current example in the Tornado docs solves an I/O problem, but in my case it is a variable change that I am interested in. I want the coroutine to wake up on the variable change. My app looks like the code shown below.
Note: I can only use Python 2.
# A transaction is a DB change that can happen
# from another process
class Transaction:
    def __init__(self):
        self.status = 'INCOMPLETE'
        self.callback = None

# In this, I am checking the status of the DB
# before responding to the GET request
class MainHandler(web.RequestHandler):
    def initialize(self, app_reference):
        self.app_reference = app_reference

    @web.asynchronous
    def get(self):
        txn = Transaction()
        callback = functools.partial(self.do_something)
        txn.callback = callback
        self.app_reference.monitor_transaction(txn)

    def do_something(self):
        self.write("Finished GET request")
        self.finish()

# MyApp monitors a list of transactions and adds the callback
# 'transaction.callback' when a transaction's status changes to
# the COMPLETE state.
class MyApp(Application):
    def __init__(self, settings):
        self.settings = settings
        self._url_patterns = self._get_url_patterns()
        self.txn_list = []  # list of all transactions being monitored
        Application.__init__(self, self._url_patterns, **self.settings)
        IOLoop.current().add_callback(self.check_status)

    def monitor_transaction(self, txn):
        self.txn_list.append(txn)

    def check_status(self):
        count = 0
        for transaction in self.txn_list:
            transaction.status = is_transaction_complete()
            if transaction.status == 'COMPLETE':
                IOLoop.current().add_callback(transaction.callback)
                self.txn_list.pop(count)
            count += 1
        if len(self.txn_list):
            IOLoop.current().add_callback(self.check_status)

    # adds 'self' to url_patterns
    def _get_url_patterns(self):
        from urls import url_patterns
        modified_url_patterns = []
        for url in url_patterns:
            modified_url_patterns.append(url + ({'app_reference': self},))
        return modified_url_patterns
If I understand correctly, to write this using gen.coroutine, the get should be modified like this:
@gen.coroutine
def get(self):
    txn = Transaction()
    response = yield wake_up_when_transaction_completes()
    # respond to GET here
My issue is that I am not sure how to wake the coroutine only when the status changes, and I cannot use a loop because it would block the Tornado thread. Basically, I want to notify it from the IOLoop iteration:
def check_status():
    for transaction in txn_list:
        if transaction.status == 'COMPLETE':
            NOTIFY_COROUTINE
Sounds like a job for the new tornado.locks! Released last week with Tornado 4.2:
http://tornado.readthedocs.org/en/latest/releases/v4.2.0.html#new-modules-tornado-locks-and-tornado-queues
Use an Event for this:
from tornado import locks, gen

event = locks.Event()

@gen.coroutine
def waiter():
    print("Waiting for event")
    yield event.wait()
    print("Done")

@gen.coroutine
def setter():
    print("About to set the event")
    event.set()
More info on the Event interface:
http://tornado.readthedocs.org/en/latest/locks.html#tornado.locks.Event
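Applied to the question's code, one possible shape (only a sketch, not part of the original answer) is to give each Transaction its own Event, let check_status set it when the status flips to COMPLETE, and have the handler's coroutine yield on it:

from tornado import gen, locks, web

class Transaction(object):
    def __init__(self):
        self.status = 'INCOMPLETE'
        self.event = locks.Event()        # replaces the explicit callback

class MainHandler(web.RequestHandler):
    def initialize(self, app_reference):
        self.app_reference = app_reference

    @gen.coroutine
    def get(self):
        txn = Transaction()
        self.app_reference.monitor_transaction(txn)
        yield txn.event.wait()            # suspends here until the event is set
        self.write("Finished GET request")

# and inside MyApp.check_status, instead of add_callback(transaction.callback):
#     if transaction.status == 'COMPLETE':
#         transaction.event.set()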
From what I understand from the tornado.gen module docs, tornado.gen.Task comprises tornado.gen.Callback and tornado.gen.Wait, with each Callback/Wait pair associated with a unique key ...
@tornado.web.asynchronous
@tornado.gen.engine
def get(self):
    http_client = AsyncHTTPClient()
    http_client.fetch("http://google.com",
                      callback=(yield tornado.gen.Callback("google")))
    http_client.fetch("http://python.org",
                      callback=(yield tornado.gen.Callback("python")))
    http_client.fetch("http://tornadoweb.org",
                      callback=(yield tornado.gen.Callback("tornado")))
    response = yield [tornado.gen.Wait("google"), tornado.gen.Wait("tornado"), tornado.gen.Wait("python")]
    do_something_with_response(response)
    self.render("template.html")
So the above code will get all responses from the different URLs.
Now what I actually need to accomplish is to return the response as soon as any one http_client returns its data. So if 'tornadoweb.org' returns data first, it should do a self.write(response), and a loop in def get() should keep waiting for the other http_clients to complete.
Any ideas on how to write this using the tornado.gen interface?
A very vague (and syntactically incorrect) sketch of what I am trying to do would look like this:
class GenAsyncHandler2(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @tornado.gen.engine
    def get(self):
        http_client = AsyncHTTPClient()
        http_client.fetch("http://google.com",
                          callback=(yield tornado.gen.Callback("google")))
        http_client.fetch("http://python.org",
                          callback=(yield tornado.gen.Callback("python")))
        http_client.fetch("http://tornadoweb.org",
                          callback=(yield tornado.gen.Callback("tornado")))
        while True:
            response = self.get_response()
            if response:
                self.write(response)
                self.flush()
            else:
                break
        self.finish()

    def get_response(self):
        for key in tornado.gen.availableKeys():
            if key.is_ready:
                value = tornado.gen.pop(key)
                return value
        return None
This is a case where you shouldn't use inline callbacks, i.e. gen.
Also, self.render will be called after all the callbacks have finished. If you want to return the response from the server partially, render it partially.
Think of it this way (it's only an idea, with big room for improvement):
response = []

@tornado.web.asynchronous
def get(self):
    self.render('head.html')
    http_client = AsyncHTTPClient()
    http_client.fetch("http://google.com",
                      callback=self.mywrite)
    http_client.fetch("http://python.org",
                      callback=self.mywrite)
    http_client.fetch("http://tornadoweb.org",
                      callback=self.mywrite)
    self.render('footer.html')
    self.finish()

def mywrite(self, result):
    self.render('body_part.html')
    self.response.append(result)
    if len(self.response) == 3:
        do_something_with_response(self.response)
In addition to this, there is actually a method WaitAll which waits for all the results and returns only when all HTTPClients have completed their responses.
I have submitted the diff in my tornado branch (https://github.com/pranjal5215/tornado). I have added a class WaitAny, which is an asynchronous WaitAll and returns a result as soon as one HTTPClient has returned a result.
Diff is at (https://github.com/pranjal5215/tornado/commit/dd6902147ab2c5cbf2b9c7ee9a35b4f89b40790e), (https://github.com/pranjal5215/tornado/wiki/Add-WaitAny-to-make-WaitAll-return-results-incrementally)
Sample usage:
class GenAsyncHandler2(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @tornado.gen.engine
    def get(self):
        http_client = AsyncHTTPClient()
        http_client.fetch("http://google.com",
                          callback=(yield tornado.gen.Callback("google")))
        http_client.fetch("http://python.org",
                          callback=(yield tornado.gen.Callback("python")))
        http_client.fetch("http://tornadoweb.org",
                          callback=(yield tornado.gen.Callback("tornado")))
        keys = set(["google", "tornado", "python"])
        while keys:
            key, response = yield tornado.gen.WaitAny(keys)
            keys.remove(key)
            # do something with response
            self.write(str(key) + " ")
            self.flush()
        self.finish()
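For reference, later Tornado releases (4.1+) ship a built-in gen.WaitIterator that covers the same need without a patched branch: results are yielded in completion order. A minimal sketch along the lines of the handler above (not from the original answer):

from tornado import gen, web
from tornado.httpclient import AsyncHTTPClient

class GenAsyncHandler3(web.RequestHandler):
    @gen.coroutine
    def get(self):
        http_client = AsyncHTTPClient()
        wait_iterator = gen.WaitIterator(
            google=http_client.fetch("http://google.com"),
            python=http_client.fetch("http://python.org"),
            tornado=http_client.fetch("http://tornadoweb.org"),
        )
        while not wait_iterator.done():
            response = yield wait_iterator.next()
            # current_index holds the keyword of the future that just finished
            self.write(str(wait_iterator.current_index) + " ")
            self.flush()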
I have a generator that takes a long time for each iteration to run. Is there a standard way to have it yield a value, then generate the next value while waiting to be called again?
The generator would be called each time a button is pressed in a gui and the user would be expected to consider the result after each button press.
EDIT: a workaround might be:
def initialize():
    res = next(gen)

def btn_callback():
    display(res)
    res = next(gen)
    if not res:
        return
If I wanted to do something like your workaround, I'd write a class like this:
class PrefetchedGenerator(object):
    def __init__(self, generator):
        self._data = generator.next()
        self._generator = generator
        self._ready = True

    def next(self):
        if not self._ready:
            self.prefetch()
        self._ready = False
        return self._data

    def prefetch(self):
        if not self._ready:
            self._data = self._generator.next()
            self._ready = True
It is more complicated than your version, because I made it so that it handles not calling prefetch or calling prefetch too many times. The basic idea is that you call .next() when you want the next item. You call prefetch when you have "time" to kill.
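A short usage sketch, following the Python 2 style of the class above (slow_numbers and the GUI hooks are made up for illustration): call next() from the button handler, and prefetch() whenever the GUI is idle.

import time

def slow_numbers():
    # stand-in for the real long-running generator
    n = 0
    while True:
        time.sleep(2)
        yield n
        n += 1

gen = PrefetchedGenerator(slow_numbers())

def on_button_press():   # wire this to the GUI button
    print(gen.next())    # returns the already-prefetched value

def on_idle():           # call this from the GUI's idle/timer hook
    gen.prefetch()       # do the slow work while the user is reading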
Your other option is a thread:
import threading
import Queue  # Python 2 module; it is called queue in Python 3

class BackgroundGenerator(threading.Thread):
    def __init__(self, generator):
        threading.Thread.__init__(self)
        self.queue = Queue.Queue(1)
        self.generator = generator
        self.daemon = True
        self.start()

    def run(self):
        for item in self.generator:
            self.queue.put(item)
        self.queue.put(None)

    def next(self):
        next_item = self.queue.get()
        if next_item is None:
            raise StopIteration
        return next_item
This will run separately from your main application. Your GUI should remain responsive no matter how long it takes to fetch each iteration.
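Usage might look like this (slow_squares is a made-up stand-in for the real generator): the background thread starts computing as soon as the wrapper is created, so each next() call usually returns a value that is already waiting in the queue.

import time

def slow_squares(count):
    # stand-in for the real long-running generator
    for n in range(count):
        time.sleep(2)
        yield n * n

bg = BackgroundGenerator(slow_squares(5))

def on_button_press():          # wire this to the GUI button
    try:
        print(bg.next())        # blocks only if the next item is not ready yet
    except StopIteration:
        print("no more items")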
No. A generator is not asynchronous. This isn't multiprocessing.
If you want to avoid waiting for the calculation, you should use the multiprocessing package so that an independent process can do your expensive calculation.
You want a separate process which is calculating and enqueueing results.
Your "generator" can then simply dequeue the available results.
You can definitely do this with generators: just write your generator so that each next call alternates between computing the next value and returning it, by putting in multiple yield statements. Here is an example:
import itertools, time

def quick_gen():
    counter = itertools.count().next

    def long_running_func():
        time.sleep(2)
        return counter()

    while True:
        x = long_running_func()
        yield
        yield x
>>> itr = quick_gen()
>>> itr.next() # setup call, takes two seconds
>>> itr.next() # returns immediately
0
>>> itr.next() # setup call, takes two seconds
>>> itr.next() # returns immediately
1
Note that the generator does not automatically do the processing to get the next value; it is up to the caller to call next twice for each value. For your use case you would call next once as setup, and then each time the user clicks the button you would display the next value generated, then call next again for the pre-fetch.
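In GUI terms that might look roughly like this (on_button_press is made up; display comes from the question's workaround): each click returns the value prepared earlier and then immediately does the slow work for the next one.

itr = quick_gen()
itr.next()                  # setup call: does the slow work for the first value

def on_button_press():      # wire this to the GUI button
    value = itr.next()      # returns immediately with the prepared value
    display(value)          # 'display' is the question's own placeholder
    itr.next()              # slow call: prepares the next value right away

Note that the second itr.next() still runs inside the button handler, so the slow work happens there rather than in the background; the threaded answers above avoid that.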
I was after something similar. I wanted yield to quickly return a value (if it could) while a background thread processed the next one.
import Queue
import time
import threading

class MyGen():
    def __init__(self):
        self.queue = Queue.Queue()
        # Put a first element into the queue, and initialize our thread
        self.i = 1
        self.t = threading.Thread(target=self.worker, args=(self.queue, self.i))
        self.t.start()

    def __iter__(self):
        return self

    def worker(self, queue, i):
        time.sleep(1)  # Take a while to process
        queue.put(i**2)

    def __del__(self):
        self.stop()

    def stop(self):
        while True:  # Flush the queue
            try:
                self.queue.get(False)
            except Queue.Empty:
                break
        self.t.join()

    def next(self):
        # Start a thread to compute the next next.
        self.t.join()
        self.i += 1
        self.t = threading.Thread(target=self.worker, args=(self.queue, self.i))
        self.t.start()

        # Now deliver the already-queued element
        while True:
            try:
                print "request at", time.time()
                obj = self.queue.get(False)
                self.queue.task_done()
                return obj
            except Queue.Empty:
                pass
            time.sleep(.001)

if __name__ == '__main__':
    f = MyGen()
    for i in range(5):
        # time.sleep(2)  # Comment out to get items as they are ready
        print "*********"
        print f.next()
        print "returned at", time.time()
The code above gave the following results:
*********
request at 1342462505.96
1
returned at 1342462505.96
*********
request at 1342462506.96
4
returned at 1342462506.96
*********
request at 1342462507.96
9
returned at 1342462507.96
*********
request at 1342462508.96
16
returned at 1342462508.96
*********
request at 1342462509.96
25
returned at 1342462509.96