Python Tornado Async Fetching of URLs

In the following code example I have a function do_async_thing which appears to return a Future, though I'm not sure why:
import tornado.ioloop
import tornado.web
import tornado.httpclient

@tornado.gen.coroutine
def do_async_thing():
    http = tornado.httpclient.AsyncHTTPClient()
    response = yield http.fetch("http://www.google.com/")
    return response.body

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        x = do_async_thing()
        print(x)  # <tornado.concurrent.Future object at 0x10753a6a0>
        self.set_header("Content-Type", "application/json")
        self.write('{"foo":"bar"}')
        self.finish()

if __name__ == "__main__":
    app = tornado.web.Application([
        (r"/foo/?", MainHandler),
    ])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
You'll see that I yield the call to fetch, which should have forced the value to be realised (and subsequently let me access the body field of the response).
What's more interesting is that I can even access the body field on a Future without it erroring (as far as I know, a Future has no such field/property/method).
So does anyone know how I can:
Resolve the Future so I get the actual value
Modify this example so the function do_async_thing makes multiple async url fetches
Now it's worth noting that because I was still getting a Future back, I thought I would try prefixing the call to do_async_thing() with a yield (e.g. x = yield do_async_thing()), but that gave me back the following error:
tornado.gen.BadYieldError: yielded unknown object <generator object get at 0x1023bc308>
I also looked at doing something like this for the second point:
def do_another_async_thing():
    http = tornado.httpclient.AsyncHTTPClient()
    a = http.fetch("http://www.google.com/")
    b = http.fetch("http://www.github.com/")
    return a, b

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        y = do_another_async_thing()
        print(y)
But again this returns:
<tornado.concurrent.Future object at 0x102b966d8>
Whereas I would've expected at least a tuple of Futures? At this point I'm unable to resolve these Futures without getting an error such as:
tornado.gen.BadYieldError: yielded unknown object <generator object get at 0x1091ac360>
Update
Below is an example that works (as per the answer by A. Jesse Jiryu Davis).
But I've also added another example in which a new function do_another_async_thing makes two async HTTP requests (evaluating their values is a little more involved, as you'll see):
def do_another_async_thing():
    http = tornado.httpclient.AsyncHTTPClient()
    a = http.fetch("http://www.google.com/")
    b = http.fetch("http://www.github.com/")
    return a, b

@tornado.gen.coroutine
def do_async_thing():
    http = tornado.httpclient.AsyncHTTPClient()
    response = yield http.fetch("http://www.google.com/")
    return response.body

class MainHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        x = yield do_async_thing()
        print(x)  # displays HTML response

        fa, fb = do_another_async_thing()
        fa = yield fa
        fb = yield fb
        print(fa.body, fb.body)  # displays HTML response for each
It's worth clarifying: you might expect the two yield statements for do_another_async_thing to block twice. Here is a breakdown of what actually happens:
do_another_async_thing immediately returns a tuple of two Futures
we yield the first Future, which suspends the coroutine until its value is realised
the value is realised, so we move on to the next line
we yield again, suspending until the second value is realised
but because both fetches were started at the same time and ran concurrently, the second yield returns practically instantly
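As an aside, Tornado coroutines can also yield a list of Futures and wait for them all at once, which folds the two yields into one. A minimal sketch of that idiom (same URLs as above):
@tornado.gen.coroutine
def do_another_async_thing():
    http = tornado.httpclient.AsyncHTTPClient()
    # Yielding a list of Futures suspends until every one resolves;
    # the fetches still run concurrently.
    a, b = yield [http.fetch("http://www.google.com/"),
                  http.fetch("http://www.github.com/")]
    return a.body, b.body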

Coroutines return futures. To wait for the coroutine to complete, the caller must also be a coroutine, and must yield the future. So:
@gen.coroutine
def get(self):
    x = yield do_async_thing()
For more info see Refactoring Tornado Coroutines.

Related

Persistent List Python Implementation

I have a class that sends logs to an exposed API. Now what I want to happen is to save/persist failed logs in a list so that they can be resent to the server again.
This is what I have so far.
You may notice that I have declared a class variable, and I have read that this is not really advisable.
My concern is, is there a better way to persist lists or queues?
from collections import deque
import time
import threading
import requests
import json

URL_POST = "http://some/url/here"

class LogManager:
    listQueue = deque()

    def post(self, log):
        headers = {"Content-type": "application/json",
                   "Accept": "text/plain", "User-Agent": "Test user agent"}
        resp = requests.post(URL_POST, data=json.dumps(log), headers=headers)
        return resp.status_code

    def send_log(self, log):
        try:
            print("Sending log to backend")
            self.post(log)
        except:  # sending to server fails for some reason
            print("Sending logs to server failed, appending to queue")
            LogManager.listQueue.append(log)

    def resend_log(self, log):
        print("checking if deque has values")
        if LogManager.listQueue:
            logToPost = LogManager.listQueue.popleft()
            try:
                self.post(logToPost)
                print("resending failed logs")
            except:  # for some reason it fails again
                LogManager.listQueue.appendleft(logToPost)
                print("appending log back to deque")

    def run(self, log):
        t1 = threading.Thread(target=self.send_log, args=(log,))
        t2 = threading.Thread(target=self.resend_log, args=(log,))
        t1.start()
        time.sleep(2)
        t2.start()
        t1.join()
        t2.join()

if __name__ == "__main__":
    while True:
        logs = LogManager()
        logs.run({"some log": "test logs"})
Since the only variable LogManager stores is a persistent one, there doesn't seem to be any need to re-instantiate the class each time. I would probably move the logs = LogManager() line outside of the while loop and change listQueue to an instance variable, self.list_queue, created in the class's __init__ method. This way you'll have one LogManager with one queue that just calls its run method each loop.
That said, if you're keeping it inside the loop because your class will have further functionality that requires a new instance each loop, then using a class variable to track a list between instances is exactly what class variables are for. They're not inadvisable at all; the example given in the documentation you link to is bad because they're using a class variable for data that should be instance-specific.
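A minimal sketch of that refactor (the threading logic from run is stubbed out to keep it short):
from collections import deque

class LogManager:
    def __init__(self):
        # Instance variable: each LogManager now owns its own retry queue.
        self.list_queue = deque()

    def run(self, log):
        # Stand-in for the send/resend threading logic shown above.
        self.list_queue.append(log)

if __name__ == "__main__":
    logs = LogManager()  # created once, outside the loop
    while True:
        logs.run({"some log": "test logs"})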

Get Deferred from dbpool.runQuery instead of data with Twisted and Oracle

I am trying to get some data from Oracle, using Twisted and runQuery, and I keep getting a Deferred back instead of the actual data.
How can this be solved?
Some code (I excluded some unnecessary parts, but the idea should be clear):
from twisted.enterprise import adbapi
from twisted.internet import defer

import service_config

ORACLE_DSN = service_config.oracle_dsn
ORACLE_USER = service_config.oracle_user
ORACLE_PASSWORD = service_config.oracle_password

dbpool = adbapi.ConnectionPool('cx_Oracle',
                               user=ORACLE_USER,
                               password=ORACLE_PASSWORD,
                               dsn=ORACLE_DSN, port='49161')

@defer.inlineCallbacks
def ask_db():
    data = yield dbpool.runQuery("SELECT * FROM customer")

a = ask_db()
print(a)
I have the reactor running in another module, if that is important.
Thank you in advance.
UPDATE:
With the help of @notorious.no I got working code that returns data instead of a Deferred, using Python 3.5:
@defer.inlineCallbacks
def ask_db(request):
    data = yield dbpool.runQuery(request)
    defer.returnValue(data)
You get a Deferred because you're calling a function decorated with inlineCallbacks, which always returns a Deferred. You're also misinterpreting what yield does: it doesn't actually return a value from the inlineCallbacks function, it just waits for a result. Use defer.returnValue() to return a value (you can use a simple return if you're using Python 3.4+). This is what your code should look like:
from __future__ import print_function
# ...

@defer.inlineCallbacks
def ask_db():
    data = yield dbpool.runQuery("SELECT * FROM customer")
    defer.returnValue(data)  # actually return a value

a = ask_db()  # this returns a Deferred, so add callbacks!
a.addCallback(print)  # add a useful callback to process the query list
reactor.run()
The difference between what you had previously and this answer is that a callback is added, so when runQuery() returns with a value, the callback is executed and the Deferred from ask_db() actually fires with a value you care about.
References
Database Usage with Klein/Twisted

How to do a HTTP Get request with a deferred inside another (cascade defer)?

I want to do a GET request to check whether the return code is what I expect. This request occurs inside a function called by an addCallback of a general deferred chain, as shown in the code below.
My specific question is: how do I make the return value at line D arrive at line E?
It seems that the callback function cbResponse (line D) is never called. My first attempt was to do the request and return the result of the request to the callback chain (line A). It fails because the deferred object has no result attribute.
The second attempt (line B) was to return the deferred object itself. It doesn't return the result either.
The third attempt (line C) was to return nothing, but that obviously doesn't carry the response code of the request.
Thanks a lot!
from twisted.web.client import Agent
from twisted.web.http_headers import Headers
from twisted.internet import reactor, defer

class Test(object):
    @classmethod
    def getRequest(self, result):
        print "Function getRequest"
        agent = Agent(reactor)
        d2 = agent.request('GET',
                           'http://www.google.com',
                           Headers({'User-Agent': ['Twisted Web Client Example']}),
                           None)
        d2.addCallback(Test.cbResponse)
        # 1st attempt: return the result of d2.
        # Fails: exceptions.AttributeError: Deferred instance has no attribute 'result'
        return d2.result  # --> line A
        # 2nd attempt: return only the deferred object d2.
        # Doesn't fail, but I can't get the result of the above request
        ### return d2  # --> line B
        # 3rd attempt: return nothing (no return statement).
        # --> line C

    @classmethod
    def cbResponse(response):
        print 'Function cbResponse %s', response.code
        # This is the return value I want to pass back to the
        # deferredChain function (called at line E)
        return response.code  # --> line D

    @classmethod
    def deferredChain(self):
        d = defer.Deferred()
        d.addCallback(Test.getRequest)  # --> line E
        d.callback("success")
        return d.result  # --> line F

if __name__ == '__main__':
    tst = Test()
    rtn = tst.deferredChain()
    print "RTN: %s " % rtn
You're using Twisted's Agent, which requires a running reactor to work properly; see the linked examples in the docs. Your code sample will work just fine if you start the Twisted reactor.
if __name__ == '__main__':
    tst = Test()
    rtn = tst.deferredChain()
    reactor.run()
    print "RTN: %s " % rtn
Twisted's treq is an interesting library built on top of Agent; it promises a python-requests-like API for Twisted's async HTTP client.
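For illustration, a minimal treq sketch (assuming treq is installed; the URL is just an example) showing the requests-like flavour:
from twisted.internet import reactor
import treq

def print_code(response):
    print "Response code: %s" % response.code

d = treq.get('http://www.google.com')
d.addCallback(print_code)            # fires once the response arrives
d.addBoth(lambda _: reactor.stop())  # stop the reactor either way
reactor.run()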
You are calling tst.deferredChain() synchronously and trying to read d.result within it, which is not correct. The correct solution is to let it return a deferred as well and attach a callback to it, as sketched below.
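A minimal sketch of that correction (assuming the Test class above, with getRequest changed to return d2, the "line B" option, so the response code propagates down the chain):
@classmethod
def deferredChain(cls):
    d = defer.Deferred()
    d.addCallback(Test.getRequest)  # getRequest must return d2 (line B)
    d.callback("success")
    return d  # hand back the Deferred instead of reading d.result

if __name__ == '__main__':
    def printResult(code):
        print "RTN: %s" % code

    d = Test.deferredChain()
    d.addCallback(printResult)
    reactor.run()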

How to write functions to be called from within a @tornado.testing.gen_test?

So I've got some code I want to test, and I'm encountering what looks like a pretty horrible side effect of the yield/generator-based nature of the tests @tornado.testing.gen_test expects:
class GameTest(tornado.testing.AsyncHTTPTestCase):
    def new_game(self):
        ws = yield websocket_connect('address')
        ws.write_message('new_game')
        response = yield ws.read_message()
        # I want to say:
        # return response

    @tornado.testing.gen_test
    def test_new_game(self):
        response = self.new_game()
        # do some testing
The problem is that I can't return a value from a generator, so my natural instinct here is wrong. Furthermore, I can't do this:
class GameTest(tornado.testing.AsyncHTTPTestCase):
    def new_game(self):
        ws = yield websocket_connect('address')
        ws.write_message('new_game')
        response = yield ws.read_message()
        yield response, True

    @tornado.testing.gen_test
    def test_new_game(self):
        for i in self.new_game():
            if isinstance(i, tuple):
                response, success = i
                break
        # do some testing
Because then I encounter the error:
AttributeError: 'NoneType' object has no attribute 'write_message'
Obviously, I can include the entire test generation code in the test, but that's really ugly, hard to maintain, etc. Does this testing pattern really make indirection so difficult?
You should use @gen.coroutine on asynchronous functions to be called by @gen_test methods, just like in non-test code. @gen_test is an adapter for your top-level test function that makes it possible to use asynchronous code in the synchronous unittest interface.
@gen.coroutine
def new_game(self):
    ws = yield websocket_connect('address')
    ws.write_message('new_game')
    response = yield ws.read_message()
    raise gen.Return(response)

@tornado.testing.gen_test
def test_new_game(self):
    response = yield self.new_game()
    # do some testing
In Python 3.3+, you can use return response instead of raise gen.Return(response). You can even omit the @gen.coroutine if you use yield from at the call site.
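A minimal sketch of those Python 3.3+ variants (same hypothetical new_game helper as above):
# Variant 1: a plain return works inside a coroutine on Python 3.3+
@gen.coroutine
def new_game(self):
    ws = yield websocket_connect('address')
    ws.write_message('new_game')
    return (yield ws.read_message())

# Variant 2: an undecorated generator, delegated to with "yield from"
def new_game_plain(self):
    ws = yield websocket_connect('address')
    ws.write_message('new_game')
    return (yield ws.read_message())

@tornado.testing.gen_test
def test_new_game(self):
    response = yield from self.new_game_plain()
    # do some testing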

How to make the spdylay module work like httplib/http.client?

I have to test a server based on Jetty. This server can work with its own protocol, HTTP, HTTPS, and most recently it started to support SPDY. I have some stress tests which are based on httplib/http.client: each thread starts with a similar URL (some data in the query string is variable), adds the execution time to a global variable, and every few seconds shows some statistics. The code looks like:
t_start = time.time()
connection.request("GET", path)
resp = connection.getresponse()
t_stop = time.time()
check_response(resp)
QRY_TIMES.append(t_stop - t_start)
The client for the native protocol shares the httplib API, so connection may be a native connection, an HTTPConnection, or an HTTPSConnection.
Now I want to add a SPDY test using the spdylay module, but its interface is opaque and I don't know how to wrap it into something resembling the httplib interface. I made a test client based on the example, but since the 2nd argument to spdylay.urlfetch() is a class name and not an object, I don't know how to use it with my tests. I have already added tests to the on_close() method of my class, which extends spdylay.BaseSPDYStreamHandler, but that is not compatible with the other tests. If it were an instance, I could use it outside the spdylay.urlfetch() call.
How can I use spdylay in code built around the httplib interface?
My only idea is to use a global dictionary with the URL as key and the handler object as value. It is not ideal because:
new queries with the same URL will overwrite the previous response
it is easy to forget to remove the handler from the global dictionary
But it works!
import sys
import spdylay

CLIENT_RESULTS = {}

class MyStreamHandler(spdylay.BaseSPDYStreamHandler):
    def __init__(self, url, fetcher):
        super().__init__(url, fetcher)
        self.headers = []
        self.whole_data = []

    def on_header(self, nv):
        self.headers.append(nv)

    def on_data(self, data):
        self.whole_data.append(data)

    def get_response(self, charset='UTF8'):
        return (b''.join(self.whole_data)).decode(charset)

    def on_close(self, status_code):
        CLIENT_RESULTS[self.url] = self

def spdy_simply_get(url):
    spdylay.urlfetch(url, MyStreamHandler)
    data_handler = CLIENT_RESULTS[url]
    result = data_handler.get_response()
    del CLIENT_RESULTS[url]
    return result

if __name__ == '__main__':
    if '--test' in sys.argv:
        spdy_response = spdy_simply_get('https://localhost:8443/test_spdy/get_ver_xml.hdb')
I hope somebody can implement spdy_simply_get(url) better.
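One possible improvement, sketched under the assumption (taken from the example above) that urlfetch instantiates whatever handler class it is given: build a throwaway subclass inside the function so the result lands in a local variable instead of the global dictionary:
import spdylay

def spdy_simply_get(url, charset='UTF8'):
    captured = []  # local holder, so there is no global dict to clean up

    class CapturingHandler(MyStreamHandler):
        def on_close(self, status_code):
            captured.append(self)  # close over the local list

    spdylay.urlfetch(url, CapturingHandler)
    return captured[0].get_response(charset)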
