Could someone explain to me python-twisted like I'm five? - python

I've read the docs. I've played with examples. But I'm still unable to grasp what exactly asynchronous means, when it is useful, and where the magic is that lots of people seem so crazy about.
If it is only for avoiding waiting on I/O, why not simply run in threads? Why is Deferred needed?
I think I'm missing some fundamental knowledge about computing, hence these questions. If so, what is it?

like you're five... ok: threads bad, async good!
now, seriously: threads incur overhead - both in locking and switching of the interpreter, and in memory consumption and code complexity. when your program is IO bound, and does a lot of waiting for other services (APIs, databases) to return a response, you're basically waiting on idle, and wasting resources.
the point of async IO is to mitigate the overhead of threads while keeping the concurrency, and keeping your program simple, avoiding deadlocks and reducing complexity.
think for example about a chat server. you have thousands of connections on the server, and you want some people to receive some messages based on which room they are in. doing this with threads will be much more complicated than doing it the async way.
re deferred - it's just a way of simplifying your code: instead of giving every function a callback to return to when the operation it's waiting for is ready, you chain the callbacks onto the deferred.
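to make that concrete, here's a toy, stripped-down sketch of the Deferred idea (this is NOT Twisted's real class, just an illustration of how callback chaining works):

```python
class MiniDeferred:
    """Toy stand-in for Twisted's Deferred (not the real API).

    Callbacks are chained: each one receives the result of the
    previous one, and they all fire once the result arrives."""

    def __init__(self):
        self._callbacks = []
        self.called = False
        self.result = None

    def addCallback(self, fn):
        if self.called:
            self.result = fn(self.result)  # already fired: run now
        else:
            self._callbacks.append(fn)     # fire later
        return self                        # allow chaining

    def callback(self, result):
        # some async operation finished; push the result down the chain
        self.called = True
        self.result = result
        for fn in self._callbacks:
            self.result = fn(self.result)

d = MiniDeferred()
d.addCallback(lambda x: x + 1).addCallback(lambda x: x * 2)
d.callback(10)
print(d.result)  # 22
```

the point: the code that *uses* the result is registered up front, and the event loop just fires callback() whenever the IO completes.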
another note - if you want a much simpler and elegant async IO framework, try tornado, which is basically an async web server, with async http client and a replacement for deferred. it's very nicely written and can be used as a general purpose async IO framework.
see http://tornadoweb.org

Related

What's the best way to use IB / TWS in algorithmic trading for Python?

I was wondering what is the most efficient way, from a performance perspective, to use the TWS/IB API in Python? I want to compute and update my strategies based on real-time data (Python has a lot of libraries that may be helpful, in contrast to Java I think) and, based on that, send buy/sell orders. These strategy computations may involve quite some processing time, so in that sense I was thinking about implementing some sort of threading/concurrency (for Java it uses 3 threads if I understand correctly, see *1).
I know there is IBpy (I think it is the same, only with some things wrapped up for convenience). I came across IB-insync as an alternative to threading in Python due to Python's concurrency limitations, if I understand correctly:
https://ib-insync.readthedocs.io/api.html
which implements the IB API asynchronously and single-threaded.
Reading about concurrency in Python here:
https://realpython.com/python-concurrency/
async has some major advantages if I understand correctly, since Python uses a Global Interpreter Lock (GIL): only one thread can hold control of the Python interpreter at a time. However, the IB-insync library may have some limitations too (which can be worked around by adapting code, as suggested below):
If, for example, the user code spends much time in a calculation, or uses time.sleep() with a long delay, the framework will stop spinning, messages accumulate and things may go awry.
If a user operation takes a long time then it can be farmed out to a different process. Alternatively the operation can be made such that it periodically calls IB.sleep(0); this will let the framework handle any pending work and return when finished. The operation should be aware that the current state may have been updated during the sleep(0) call.
For introducing a delay, never use time.sleep() but use sleep() instead.
Would a multi-threading solution be better just like Java (I do not know if there is a Java Async equivalent which can be combined with a lot of easy tools/libs that manipulate data)? Or should I stick to Python Async? Other suggestions are welcome, too. With regard to multiple threads in Python (and Java), the following site:
https://interactivebrokers.github.io/tws-api/connection.html
mentions (*1):
API programs always have at least two threads of execution. One thread is used for sending messages to TWS, and another thread is used for reading returned messages. The second thread uses the API EReader class to read from the socket and add messages to a queue. Every time a new message is added to the message queue, a notification flag is triggered to let other threads know that there is a message waiting to be processed. In the two-thread design of an API program, the message queue is also processed by the first thread. In a three-thread design, an additional thread is created to perform this task.
The phrase "The two-threaded design is used in the IB Python sample Program.py..." suggests that there are already two threads involved, which is a little bit confusing to me, since the second reference mentions Python being single-threaded.
Python is not technically single-threaded: you can create multiple threads in Python, but there is the GIL, which only allows one thread to execute at a time, and that is why it is sometimes called single-threaded! But the GIL handles the switching so efficiently that it often doesn't feel single-threaded! I have used multi-threading in Python and it works well. The GIL does all the orchestration of switching and swapping threads; this gives single-threaded programs a small speed boost, but makes (CPU-bound) multi-threaded programs a bit slower.
I am also searching for a multi-threaded SDK for the IB API! I have not found one yet, except the native one, which is a bit complicated for me.
And IB_Insync does not allow multi-threading :(
Btw, I am new to Stack Overflow, so don't mind me ...

how to speed up python backend making lots of http requests

I'm trying to learn some network/backend stuff.
I now want to build an API that makes an HTTP request, does some processing, sends back a response. Not very useful, but it's for learning.
I noticed that the GET request is a huge bottleneck. I think it is an I/O problem, because the responses are very small.
Now I thought I could maybe do the downloading on multiple threads. If a fictional client of mine makes a request, a URL would need to be added to a pool, then fetched by some worker thread, processed, and the result sent back. Or something like that...
I'm really not an expert and maybe nothing what I just said made any sense... but I would really appreciate a little help:)
Multiple solutions exist.
You can use threading (thread pools) or multiprocessing (multiprocessing pools) to perform multiple requests in parallel.
Or you could use libraries like asyncio (or twisted) to perform multiple requests within one thread, in such a way that waiting for IO is no longer the blocking point.
I suggest you look at:
https://docs.python.org/3/library/threading.html for threading
or https://docs.python.org/3/library/multiprocessing.html?highlight=multiprocessing#module-multiprocessing for multiprocessing.
Asynchronous programming is, in my opinion, much more difficult, but if you're curious, look at
https://docs.python.org/3/library/asyncio.html?highlight=asyncio#module-asyncio for asyncio basics and at https://docs.aiohttp.org/en/stable/ for performing multiple http requests in 'parallel' with asyncio
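As a sketch of the thread-pool route, here is what fanning requests out over worker threads looks like (fetch here is a hypothetical stand-in for a real requests.get call, so the example runs without a network):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # stand-in for `return requests.get(url).text` (hypothetical)
    return "response for %s" % url

urls = ["https://example.com/page/%d" % i for i in range(5)]

# three worker threads perform the (I/O-bound) fetches concurrently;
# map() still returns the results in input order
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(fetch, urls))

print(results[0])  # response for https://example.com/page/0
```

With a real fetch, the threads spend most of their time blocked on the network, which is exactly where threading pays off for I/O-bound work.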
Afterwards, after playing a little, you will probably have much more precise questions.
Just post your code then, explain the issues, and you will get more help.

How can I implement async requests without 3rd party packages in Python 2.7

What I want is in the title. The background is that I have thousands of requests to send to a very slow Restful interface, in a program where no 3rd party packages may be imported, except requests.
The speed of MULTITHREADING AND MULTIPROCESSING is limited by the GIL and by the 4-core computer the program will run on.
I know you can implement an incomplete coroutine in Python 2.7 with a generator and the yield keyword, but how can I make it possible to do thousands of requests with that incomplete coroutine ability?
Example
url_list = ["https://www.example.com/rest?id={}".format(num) for num in range(10000)]
results = request_all(url_list) # do asynchronously
First, you're starting from an incorrect premise.
The speed of multiprocessing is not limited by the GIL at all.
The speed of multiprocessing is only limited by the number of cores for CPU-bound work, which yours is not. And async doesn't work at all for CPU-bound work, so multiprocessing would be 4x better than async, not worse.
The speed of multithreading is only limited by the GIL for CPU-bound code, which, again, yours is not.
The speed of multithreading is barely affected by the number of cores. If your code is CPU-bound, the threads mostly end up serialized on a single core. But again, async is even worse here, not better.
The reason people use async is not that it solves any of these problems; in fact, it only makes them worse. The main advantage is that if you have a ton of workers that are doing almost no work, you can schedule a ton of waiting-around coroutines more cheaply than a ton of waiting-around threads or processes. The secondary advantage is that you can tie the selector loop to the scheduler loop and eliminate a bit of overhead coordinating them.
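In Python 3 terms, that "ton of cheap waiting-around workers" point looks like this (a minimal asyncio sketch, with asyncio.sleep standing in for a slow service):

```python
import asyncio
import time

async def waiter(i):
    await asyncio.sleep(0.05)  # pretend we're waiting on a slow service
    return i

async def main():
    # a thousand parked coroutines cost almost nothing compared with a
    # thousand blocked threads: the loop just schedules timer wakeups
    return await asyncio.gather(*(waiter(i) for i in range(1000)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(len(results))  # 1000, and they all finish in roughly one 0.05s wait
```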
Second, you can't use requests with asyncio in the first place. It expects to be able to block the whole thread on socket reads. There was a project to rewrite it around an asyncio-based transport adapter, but it was abandoned unfinished.
The usual way around that is to use it in threads, e.g., with run_in_executor. But if the only thing you're doing is requests, building an event loop just to dispatch things to a thread pool executor is silly; just use the executor directly.
Third, I doubt you actually need to have thousands of requests running in parallel. Although of course the details depend on your service or your network or whatever the bottleneck is, it's almost always more efficient to have a thread pool that can run, say, 12 or 64 requests in parallel, with the other thousands queued up behind them.
Handling thousands of concurrent connections (and therefore workers) is usually something you only have to do on a server. Occasionally you have to do it on a client that's aggregating data from a huge number of different services. But if you're just hitting a single service, there's almost never any benefit to that much concurrency.
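Concretely, an executor already gives you that bounded-pool-plus-queue shape for free (fetch below is a hypothetical stand-in so the sketch runs offline):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # stand-in for `requests.get(url)` (hypothetical, no real network)
    return len(url)

url_list = ["https://www.example.com/rest?id={}".format(num)
            for num in range(10000)]

# 32 worker threads run in parallel; the other ~9,968 submissions just
# sit in the executor's internal queue until a worker frees up
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(fetch, url_list))

print(len(results))  # 10000
```

Note that concurrent.futures is Python 3 stdlib (in Python 2.7 it's the third-party futures backport, which the question rules out; plain threading.Thread plus Queue.Queue gets you the same shape there).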
Fourth, if you really do want a coroutine-based event loop in Python 2, by far the easiest way is to use gevent or greenlets or another such library.
Yes, they give you an event loop hidden under the covers where you can't see it, and "magic" coroutines where the yielding happens inside methods like socket.send and Thread.join instead of being explicitly visible with await or yield from, but the plus side is that they already work—and, in fact, the magic means they work with requests, which anything you build will not.
Of course you don't want to use any third-party libraries. Building something just like greenlets yourself on top of Stackless or PyPy is pretty easy; building it for CPython is a lot more work. And then you still have to do all the monkeypatching that gevent does to make libraries like sockets work like magic, or rewrite requests around explicit greenlets.
Anyway, if you really want to build an event loop on top of just plain yield, you can.
In Greg Ewing's original papers on why Python needed to add yield from, he included examples of a coroutine event loop with just yield, and a better one that uses an explicit trampoline to yield to—with a simple networking-driven example. He even wrote an automatic translator from code for the (at the time not implemented) yield from to Python 3.1.
Notice that having to bounce every yield off a trampoline makes things a lot less efficient. There's really no way around that. That's a good part of the reason we have yield from in the language.
But that's just the scheduler part with a bit of toy networking. You still need to integrate a selectors loop and then write coroutines to replace all of the socket functions you need. Consider how long asyncio took Guido to build when he knew Python inside and out and had yield from to work with… but then you can steal most of his design, so it won't be quite that bad. Still, it's going to be a lot of work.
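To show just the scheduler kernel of that idea, here is a toy round-robin loop over plain generators (no trampoline, no selectors, no networking; each yield is a point where a "coroutine" hands control back to the loop):

```python
from collections import deque

def run(tasks):
    """Toy cooperative scheduler over plain generators.

    Avoids `yield from` and generator return values, so the same
    trick works in Python 2.7 as well."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)        # resume until the task's next yield
        except StopIteration:
            continue          # task finished; drop it
        ready.append(task)    # still running; requeue it

log = []

def worker(name, steps):
    for i in range(steps):
        log.append((name, i))
        yield                 # cooperative yield point

run([worker("a", 2), worker("b", 3)])
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('b', 2)]
```

A real event loop replaces the unconditional requeue with "requeue only when the fd this task is waiting on becomes ready", which is where the selectors integration comes in.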
(Oh, and you don't have selectors in Python 2. If you don't care about Windows, it's pretty easy to build the part you need out of the select module, but if you do care about Windows, it's a lot more work.)
And remember, because requests won't work with your code, you're also going to need to reimplement most of it as well. Or, maybe better, port aiohttp from asyncio to your framework.
And, in the end, I'd be willing to give you odds that the result is not going to be anywhere near as efficient as aiohttp in Python 3, or requests on top of gevent in Python 2, or just requests on top of a thread pool in either.
And, of course, you'll be the only person in the world using it. asyncio had hundreds of bugs to fix between tulip and going into the stdlib, which were only detected because dozens of early adopters (including people who are serious experts on this kind of thing) were hammering on it. And requests, aiohttp, gevent, etc. are all used by thousands of servers handling zillions of dollars worth of business, so you benefit from all of those people finding bugs and needing fixes. Whatever you build almost certainly won't be nearly as reliable as any of those solutions.
All this for something you're probably going to need to port to Python 3 anyway, since Python 2 hits end-of-life in less than a year and a half, and distros and third-party libraries are already disengaging from it. For a relevant example, requests 3.0 is going to require at least Python 3.5; if you want to stick with Python 2.7, you'll be stuck with requests 2.1 forever.

Using defer module in python to execute a bunch of tasks async

I saw this page suggesting the usage of the defer module to execute a series of tasks asynchronously.
I want to use it for my Project:
Calculate the median of each list of numbers I have (Got a list, containing lists of numbers)
Get the minimum and maximum medians, of all medians.
But as a matter of fact, I did not quite understand how to use it.
I would love some explanation about defer in Python, and whether you think it is the appropriate way of achieving my goal (considering the Global Interpreter Lock).
Thanks in advance!
No, using asynchronous programming (cooperative routines, aka coroutines) will not help your use case. Async is great for I/O-intensive workloads, or anything else that has to wait for slower, external events to fire.
Coroutines work because they give up control (yield) to other coroutines whenever they have to wait for something (usually for some I/O to take place). If they do this frequently, the event loop can alternate between loads of coroutines, often far more than what threading could achieve, with a simpler programming model (no need to lock data structures all the time).
Your use-case is not waiting for I/O however; you have a computationally heavy workload. Such workloads do not have obvious places to yield, and because they don't need wait for external events, there is no reason to do so anyway. For such a workload, use a multiprocessing model to do work in parallel on different CPU cores.
Asynchronous programming does not defeat the GIL either, but does give the event loop the opportunity to move the waiting for I/O parts to C code that can unlock the GIL and handle all that I/O processing in parallel while other Python code (in a different coroutine) can execute.
See this talk by my colleague Łukasz Langa at PyCON 2016 for a good introduction to async programming.

Understanding asynchronous IO vs asynchronous programming

I'm having a difficult time understanding asynchronous IO so I hope to clear up some of my misunderstanding because the word "asynchronous" seems to be thrown in a lot. If it matters, my goal is to get into twisted python but I want a general understanding of the underlying concepts.
What exactly is asynchronous programming? Is it programming with a language and OS that support Asynchronous IO? Or is it something more general? In other words, is asynchronous IO a separate concept from asynchronous programming?
Asynchronous IO means the application isn't blocked while your computer is waiting for something. Waiting here means not processing. Waiting for a webserver? Waiting for a network connection? Waiting for a hard drive to respond with data on a platter? All of this is IO.
Normally, you write this in a very simple fashion synchronously:
let file = fs.readFileSync('file');
console.log(`got file ${file}`);
This will block, and nothing will happen until readFileSync returns with what you asked for. Alternatively, you can do this asynchronously which won't block. This compiles totally differently. Under the hood it may be using interrupts. It may be polling handles with select statements. It typically uses a different binding to a low level library, such as libc. That's all you need to know. That'll get your feet wet. Here is what it looks like to us,
fs.readFile(
    'file',
    function (file) { console.log(`got file ${file}`); }
);
In this you're providing a "callback". fs.readFile requests the file immediately, and when it gets the file back it calls your callback (here, a function that takes a single argument, file).
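A rough Python analogue of that callback style (illustrative only; the "async" here is just a worker thread doing the blocking read and firing your callback when it's done):

```python
import os
import tempfile
import threading

def read_file_async(path, callback):
    # toy callback-style read: the blocking open/read runs on a worker
    # thread, and `callback` fires with the contents when it finishes
    def worker():
        with open(path) as f:
            callback(f.read())
    t = threading.Thread(target=worker)
    t.start()
    return t

# set up a small file to read (sample data)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("hello")
    path = f.name

done = threading.Event()
result = {}

def on_read(contents):           # our "callback"
    result["file"] = contents
    done.set()

read_file_async(path, on_read)   # returns immediately; read happens elsewhere
done.wait()
print("got file %s" % result["file"])  # got file hello
os.unlink(path)
```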
There are difficulties writing things asynchronously:
Creates pyramid code if using callbacks.
Errors can be harder to pinpoint.
Garbage collection isn't always as clean.
Performance overhead, and memory overhead.
Can create hard to debug situations if mixed with synchronous code.
All of that is the art of asynchronous programming.
