Running twisted reactor in iPython - python

I'm aware this is normally done with twistd, but I want to use iPython to test code 'live' against Twisted code.
How to start twisted's reactor from ipython asked basically the same thing, but the first solution no longer works with current iPython/Twisted, while the second is also unusable (the thread raises multiple errors).
https://gist.github.com/kived/8721434 has something called TPython which purports to do this; running it seems to work, except clients never connect to the server (while the same clients connect fine from the plain Python shell).
Do I have to use Conch Manhole, or is there a way to get iPython to play nice (probably with _threadedselect)?
For reference, I'm using iPython 5.0.0, Python 2.7.12, and Twisted 16.4.1.

Async code in general can be troublesome to run in a live interpreter. It's usually best to run the async script in the background and do your iPython work in a separate interpreter, communicating between the two via files or TCP. If that went over your head, it's because this isn't always simple, and it may be best to avoid the hassle if possible.
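The file handoff mentioned above can be as simple as the background script periodically dumping its state to a JSON file that the interactive session reads at leisure. A minimal stdlib-only sketch (the file name, the `processed` field, and the worker itself are made up for illustration, with a thread standing in for the background process):

```python
import json
import os
import tempfile
import threading
import time

def background_worker(path, stop):
    """Stand-in for the async script: periodically dump progress to a JSON file."""
    count = 0
    while not stop.is_set():
        count += 1
        tmp = path + '.tmp'
        with open(tmp, 'w') as f:
            json.dump({'processed': count}, f)
        os.replace(tmp, path)  # atomic rename, so readers never see a partial file
        time.sleep(0.02)

path = os.path.join(tempfile.mkdtemp(), 'state.json')
stop = threading.Event()
t = threading.Thread(target=background_worker, args=(path, stop))
t.start()

# In the interactive session you would just read the file whenever you like:
time.sleep(0.2)
with open(path) as f:
    state = json.load(f)

stop.set()
t.join()
print(state['processed'] >= 1)  # → True
```

The write-to-temp-then-rename step is what makes this safe to poll from another process without any locking.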
However, you'll be happy to know there is an awesome project called crochet for using Twisted in non-async applications. It truly is one of my favorite modules and I'm shocked it's not more widely used (you can change that ;D though). crochet provides a run_in_reactor decorator that runs the decorated function in a Twisted reactor managed by crochet in a separate thread. Here is a quick class example that makes requests to a Star Wars RESTful API, then stores each JSON response in a list.
from __future__ import print_function
import json

from crochet import run_in_reactor, setup as setup_crochet
from twisted.web.client import getPage

setup_crochet()

class StarWarsPeople(object):
    people_id = [_id for _id in range(1, 89)]
    people = []

    @run_in_reactor
    def requestPeople(self):
        """
        Request Star Wars JSON data from the SWAPI site.
        This occurs in a Twisted reactor in a separate thread.
        """
        for _id in self.people_id:
            url = 'http://swapi.co/api/people/{0}'.format(_id).encode('utf-8')
            d = getPage(url)
            d.addCallback(self.appendJSON)

    def appendJSON(self, response):
        """
        A callback which takes the response from the getPage() request,
        converts it to JSON, then appends it to self.people, which can be
        accessed outside of the crochet thread.
        """
        response_json = json.loads(response.decode('utf-8'))
        #print(response_json)    # uncomment if you want to see the output
        self.people.append(response_json)
Save this in a file (example: swapi.py), open iPython, import the newly created module, then run a quick test like so:
from swapi import StarWarsPeople
from time import sleep

testing = StarWarsPeople()
testing.requestPeople()

for x in range(5):
    print(len(testing.people))
    sleep(2)
As you can see it runs in the background and stuff can still occur in the main thread. You can continue using the iPython interpreter as you usually do. You can even have a manhole running in the background for some cool hacking too!
References
https://crochet.readthedocs.io/en/1.5.0/introduction.html#crochet-use-twisted-anywhere

While this doesn't answer the question I thought I had, it does (sort of) answer the question I posted. Embedding iPython works in the sense that you get access to business objects while the reactor is running.
from twisted.internet import reactor
from twisted.internet.endpoints import serverFromString

from myfactory import MyFactory

class MyClass(object):
    def __init__(self, **kwargs):
        super(MyClass, self).__init__(**kwargs)
        server = serverFromString(reactor, 'tcp:12345')
        server.listen(MyFactory(self))

        def interact():
            import IPython
            IPython.embed()

        reactor.callInThread(interact)

if __name__ == "__main__":
    myclass = MyClass()
    reactor.run()
Call the above with python myclass.py or similar.

Related

How does gevent ensure that the same thread-local variables are not shared between multiple coroutines

I have a Python 2 Django project which is started with gunicorn, and the code contains a lot of threading.currentThread().xxxxxx = 'some value' assignments.
Because coroutines reuse the same thread, I am curious how gevent guarantees that a currentThread attribute set in coroutine A (thread 1) will not affect coroutine B (also thread 1).
After all, the code is written as:

import threading
threading.currentThread().xxxxx = 'ABCD'

instead of:

import gevent
gevent.currentCoroutine().xxxxx = 'ABCD'  # simulating my guess

Thanks for your help.
It doesn't, as far as I'm aware. Normal gevent coroutines run in the same thread: if you modify something on that thread in one coroutine, the modification is visible in the other coroutines as well.
If this is really a question about gunicorn, that's a different matter, and the following answer has some great detail on it: https://stackoverflow.com/a/41696500/7970018.
You should create a threading.local in the main thread.
After monkey patching, gevent replaces threading.local with its greenlet-local equivalent (gevent.local.local), so you can save per-coroutine data via:

import threading

threadlocal = threading.local()

def func_in_thread():
    # set data
    setattr(threadlocal, "_key", "_value")
    # do something
    # do something
    getattr(threadlocal, "_key", None)
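For contrast, here is how threading.local behaves with real OS threads, stdlib only (no gevent): each thread sees its own independent copy. Under gevent's monkey patching, greenlets sharing one thread get the same isolation from the patched class.

```python
import threading

local = threading.local()
results = {}

def worker(name):
    # Each OS thread gets an independent view of `local`,
    # so concurrent writers never see each other's value.
    local.value = name
    results[name] = local.value

threads = [threading.Thread(target=worker, args=('t%d' % i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # → {'t0': 't0', 't1': 't1', 't2': 't2'}
```

Attributes set on `threading.currentThread()` itself, as in the question, get none of this protection: that is one shared thread object.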

How do you understand the ioloop in tornado?

I am looking for a way to understand the ioloop in Tornado. I have read the official docs several times but still can't understand it, specifically why it exists.
from tornado.concurrent import Future
from tornado.httpclient import AsyncHTTPClient
from tornado.ioloop import IOLoop
def async_fetch_future():
    http_client = AsyncHTTPClient()
    future = Future()
    fetch_future = http_client.fetch(
        "http://mock.kite.com/text")
    fetch_future.add_done_callback(
        lambda f: future.set_result(f.result()))
    return future

response = IOLoop.current().run_sync(async_fetch_future)
# why get the current IO loop of this thread? display IO, hard drive IO, or network IO?
print response.body
I know what IO is: input and output, e.g. reading a hard drive, displaying a graph on the screen, getting keyboard input.
By definition, IOLoop.current() returns the current IO loop of this thread.
There are many IO devices on the laptop running this Python code. Which IO does IOLoop.current() refer to? I have never heard of an IO loop in JavaScript/Node.js.
Furthermore, why should I care about this low-level thing if I just want to do a database query or read a file?
I never heard of IO loop in javascript nodejs.
In node.js, the equivalent concept is the event loop. The node event loop is mostly invisible because all programs use it - it's what's running in between your callbacks.
In Python, most programs don't use an event loop, so when you want one, you have to run it yourself. This can be a Tornado IOLoop, a Twisted Reactor, or an asyncio event loop (all of these are specific types of event loops).
Tornado's IOLoop is perhaps confusingly named - it doesn't do any IO directly. Instead, it coordinates all the different IO (mainly network IO) that may be happening in the program. It may help you to think of it as an "event loop" or "callback runner".
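The "callback runner" idea can be made concrete with a toy loop. This is not Tornado's implementation, just a stdlib sketch of what an event loop does between your callbacks: keep a schedule, sleep until the next deadline, run the callback, repeat (a real loop would wait on sockets with select/epoll instead of plain sleeping).

```python
import heapq
import time

class TinyLoop:
    """A toy 'callback runner': no real IO, only timed callbacks."""
    def __init__(self):
        self._queue = []  # heap of (deadline, seq, callback)
        self._seq = 0     # tie-breaker so callbacks are never compared

    def call_later(self, delay, callback):
        heapq.heappush(self._queue, (time.time() + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._queue:
            deadline, _, cb = heapq.heappop(self._queue)
            wait = deadline - time.time()
            if wait > 0:
                time.sleep(wait)  # a real loop would select() on sockets here
            cb()  # your code only ever runs inside these callbacks

loop = TinyLoop()
order = []
loop.call_later(0.02, lambda: order.append('second'))
loop.call_later(0.0, lambda: order.append('first'))
loop.run()
print(order)  # → ['first', 'second']
```

Everything Tornado adds on top (fd watching, futures, coroutine scheduling) is elaboration of this core cycle.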
Rather than IOLoop, the name EventLoop might be clearer.
IOLoop.current() doesn't return an IO device, just a pure-Python event loop, which is essentially the same thing as asyncio.get_event_loop() or the underlying event loop in Node.js.
The reason you need an event loop just to do a database query is that you are using an event-driven structure to perform the query (in your example, an HTTP request).
Most of the time you do not need to care about this low-level structure. Instead, you just use the async and await keywords.
Let's say there is a lib which supports asynchronous database access:
async def get_user(user_id):
    user = await async_cursor.execute("select * from user where user_id = %s" % user_id)
    return user
Then you just need to use this function in your handler:
class YourHandler(tornado.web.RequestHandler):
    async def get(self):
        user = await get_user(self.get_cookie("user_id"))
        if user is None:
            return self.finish("No such user")
        return self.finish("You are %s" % user.user_name)
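Since the answer above notes that IOLoop.current() is essentially asyncio.get_event_loop(), the same await pattern can be demonstrated with the stdlib alone. Here asyncio.sleep() stands in for the asynchronous database call, and the `user_name` value is made up:

```python
import asyncio

async def get_user(user_id):
    # Stand-in for an async DB query: while this coroutine is suspended
    # on the await, the event loop is free to run other coroutines.
    await asyncio.sleep(0.01)
    return {'user_id': user_id, 'user_name': 'luke'}

async def handler():
    user = await get_user(1)
    return 'You are %s' % user['user_name']

# run_sync() in Tornado plays the same role as asyncio.run() here:
# start the loop, drive the coroutine to completion, return its result.
result = asyncio.run(handler())
print(result)  # → You are luke
```

The handler code never touches the loop directly; it only awaits, and the loop underneath interleaves all pending coroutines.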

Why gevent can speed up requests to download?

I think requests.get should block, so there should be no difference between run and run2.
import sys

import gevent
import requests
from gevent import monkey
monkey.patch_all()

def download():
    requests.get('http://www.baidu.com').status_code

def run():
    ls = [gevent.spawn(download) for i in range(100)]
    gevent.joinall(ls)

def run2():
    for i in range(100):
        download()

if __name__ == '__main__':
    from timeit import Timer
    t = Timer(stmt="run();", setup="from __main__ import run")
    print('good', t.timeit(3))
    t = Timer(stmt="run2();", setup="from __main__ import run2")
    print('bad', t.timeit(3))
    sys.exit(0)
but result is:
good 5.006664161000117
bad 29.077525214999696
So can all kinds of reads and writes be sped up by gevent?
PS: I ran this on mac / Python 3 / requests 2.10.0 / gevent 1.1.2
From the gevent website:
Fast event loop based on libev (epoll on Linux, kqueue on FreeBSD).
Lightweight execution units based on greenlet.
API that re-uses concepts from the Python standard library (for example there are gevent.event.Events and gevent.queue.Queues).
Cooperative sockets with SSL support
DNS queries performed through threadpool or c-ares.
Monkey patching utility to get 3rd party modules to become cooperative
Basically, looping over a bunch of requests.get() calls is slow because each call blocks until its response arrives before the next one starts. Spawning the same calls as gevent greenlets isn't slow because monkey.patch_all() makes the underlying sockets cooperative: whenever one greenlet is waiting on the network, gevent's event loop switches to another, so the 100 waits overlap instead of adding up. This is also the answer to your last question: gevent speeds up workloads that spend their time waiting on IO, not ones that spend it computing.
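The effect is easy to reproduce without gevent or a network, using the stdlib: a sleep stands in for the network wait, and a thread pool stands in for the greenlets. The waits overlap in the concurrent version just as they do in run() above:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download():
    # Stand-in for requests.get(): the time is spent waiting, not computing.
    time.sleep(0.05)

# Sequential, like run2(): 20 waits of 0.05s add up to about 1 second.
start = time.time()
for _ in range(20):
    fake_download()
sequential = time.time() - start

# Concurrent, like run(): the 20 waits overlap, finishing in roughly 0.05s.
start = time.time()
with ThreadPoolExecutor(max_workers=20) as pool:
    list(pool.map(lambda _: fake_download(), range(20)))
concurrent = time.time() - start

print(sequential > concurrent * 2)  # → True
```

Replace the sleep with a CPU-bound loop and the gap largely disappears, which is exactly why gevent helps downloads but not number crunching.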

Profiling an application that uses reactors/websockets and threads

Hi, I wrote a Python program that should run unattended. It basically fetches data via HTTP GET requests in a couple of threads, and also fetches data via websockets and the autobahn framework. Running it for 2 days shows me that it has a growing memory demand, and it eventually stops without any notice.
The documentation says I have to run the reactor as the last line of code in the app.
I read that yappi is capable of profiling threaded applications.
Here is some pseudo code
from twisted.internet import reactor, ssl
from autobahn.twisted.websocket import WebSocketClientFactory, connectWS

if __name__ == "__main__":
    # setting up a thread
    # start the thread
    Consumer.start()

    xfactory = WebSocketClientFactory("wss://url")
    xfactory.protocol = socket

    ## SSL client context: default
    ##
    if xfactory.isSecure:
        contextFactory = ssl.ClientContextFactory()
    else:
        contextFactory = None

    connectWS(xfactory, contextFactory)
    reactor.run()
The example from the yappi project site is the following:
import yappi

def a():
    for i in range(10000000): pass

yappi.start()
a()
yappi.get_func_stats().print_all()
yappi.get_thread_stats().print_all()
So I could put yappi.start() at the beginning and yappi.get_func_stats().print_all() plus yappi.get_thread_stats().print_all() after reactor.run(), but since reactor.run() never returns, that code is never executed.
So how do I profile a program like this?
Regards
It's possible to use twistd's profilers in the following way:
twistd -n --profile=profiling_results.txt --savestats --profiler=hotshot your_app
hotshot is the default profiler; you can also use cProfile.
Or you can run twistd from your Python script by means of:
from twisted.scripts.twistd import run
run()
and add the necessary parameters to the script via sys.argv[1:1] = ["--profile=profiling_results.txt", ...]
Afterwards you can convert the hotshot output to calltree format by means of:
hotshot2calltree profiling_results.txt > calltree_profiling
And open generated calltree_profiling file:
kcachegrind calltree_profiling
There is also a project for profiling asynchronous execution time: twisted-theseus.
You can also try PyCharm's thread concurrency visualization tool.
There is a related question on Stack Overflow as well.
You can also run your profiling function via:
reactor.callWhenRunning(your_function, *parameters_list)
or via reactor.addSystemEventTrigger() with an event description and your profiling function call.
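If you just want a quick look without twistd's machinery, the stdlib cProfile module can wrap any entry point by hand; a reactor shutdown hook could then dump the stats. A minimal sketch, with a dummy function standing in for your reactor-driven code:

```python
import cProfile
import io
import pstats

def busy():
    # Stand-in for the work your app does; profile your real entry point.
    total = 0
    for i in range(100000):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# Render the top 5 functions by cumulative time into a string.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(5)
report = out.getvalue()
print('busy' in report)  # → True
```

With twistd's --savestats output you would instead load the stats file with pstats.Stats('profiling_results.txt') and inspect it the same way.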

Adapting celery.task.http.URL for tornado

Celery includes a module that is able to make asynchronous HTTP requests using amqp or some other celery backend. I am using the tornado-celery producer for asynchronous message publishing. As I understand it, tornado-celery uses pika for this. The question is how to adapt celery.task.http.URL for tornado (make it non-blocking). There are basically two places which have to be refined:
HttpDispatch.make_request() has to be implemented using the tornado async HTTP client;
URL.get_async(**kw) or URL.post_async(**kw) must be reimplemented with corresponding non-blocking code using the tornado API. For instance:

class NonBlockingURL(celery.task.http.URL):
    @gen.coroutine
    def post_async(self, **kwargs):
        async_res = yield gen.Task(self.dispatcher.delay,
                                   str(self), 'POST', **kwargs)
        raise gen.Return(async_res)

But I could not understand how to do it in a proper and concise way. How can I make it fully non-blocking and asynchronous? By the way, I am using the amqp backend.
Please, provide me nice guideline or even better, an example.
In fact, you have to decide whether to use the async machinery of Tornado or a queue like Celery. There is no point in using both: the queue responds quickly when asked about a task's status, so Tornado gains nothing from doing other work while waiting for that answer. To decide between the two solutions, I would say:
Celery: more modular; easy to distribute across different cores or different machines; the tasks can be used by consumers other than Tornado; but you have to install and keep running extra software (amqp, celery workers, ...)
Async in Tornado: more monolithic; one program does everything; shorter code; one program to run
To use the async method of Tornado, refer to the documentation.
Here is a short solution using celery and tornado together:
task.py
import time

from celery import Celery, current_task

celery = Celery('tasks', broker='amqp://', backend='amqp')

@celery.task
def MyTask(url, resid):
    for i in range(10):
        time.sleep(1)
        current_task.update_state(state='running', meta={'i': i})
    return 'done'
server.py
import tasks
from tornado.web import Application, RequestHandler

dictasks = {}

class runtask(RequestHandler):
    def post(self):
        i = len(dictasks)
        dictasks[i] = tasks.MyTask.delay(self.get_argument('url', ''), i)
        self.write(str(i))

class chktask(RequestHandler):
    def get(self, i):
        i = int(i)
        if dictasks[i].ready():
            self.write(dictasks[i].result)
            del dictasks[i]
        else:
            self.write(dictasks[i].state + ' i: %s' % dictasks[i].info.get('i', -1))

application = Application([
    (r"/runtask", runtask),
    (r"/chktask/([0-9]+)", chktask),
    # etc.
])
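The submit-then-poll pattern above can be sketched without Celery or Tornado at all; a thread stands in for the worker, and a small class mimics the .ready()/.result interface of Celery's AsyncResult (all names here are made up for illustration):

```python
import threading
import time
import uuid

tasks = {}  # plays the role of dictasks in the handlers above

class FakeAsyncResult:
    """A toy stand-in for celery's AsyncResult."""
    def __init__(self):
        self._done = threading.Event()
        self.result = None

    def run(self):
        time.sleep(0.05)        # stand-in for the worker doing real work
        self.result = 'done'
        self._done.set()

    def ready(self):
        return self._done.is_set()

def submit():
    # Like the runtask handler: start the work, hand back an id to poll with.
    task_id = str(uuid.uuid4())
    res = FakeAsyncResult()
    tasks[task_id] = res
    threading.Thread(target=res.run).start()
    return task_id

tid = submit()
print(tasks[tid].ready())   # typically False right after submitting
time.sleep(0.2)
print(tasks[tid].ready())   # → True
print(tasks[tid].result)    # → done
```

The chktask handler is exactly this second half: look the id up, report state if unfinished, return the result (and clean up) once ready.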
