I'm having a difficult time understanding asynchronous IO, so I hope to clear up some of my misunderstanding, because the word "asynchronous" seems to be thrown around a lot. If it matters, my goal is to get into Twisted for Python, but I want a general understanding of the underlying concepts.
What exactly is asynchronous programming? Is it programming with a language and OS that support asynchronous IO? Or is it something more general? In other words, is asynchronous IO a separate concept from asynchronous programming?
Asynchronous IO means the application isn't blocked while your computer is waiting for something. "Waiting" here means not processing: waiting for a web server, waiting for a network connection, waiting for a hard drive to return data from a platter. All of this is IO.
Normally, you write this in a very simple fashion synchronously:
const fs = require('fs');
let file = fs.readFileSync('file');
console.log(`got file ${file}`);
This will block, and nothing will happen until readFileSync returns with what you asked for. Alternatively, you can do this asynchronously, which won't block. Under the hood it works very differently: it may use interrupts, it may poll handles with select(), and it typically goes through a different binding to a low-level library such as libc. That's all you need to know to get your feet wet. Here is what it looks like to us:
fs.readFile('file', function (err, file) {
  if (err) throw err;
  console.log(`got file ${file}`);
});
In this you're providing a "callback". The call to fs.readFile requests the file immediately and returns right away; when it gets the file back, it calls your callback (here, a function taking an error argument and the file contents).
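Since the original question is about Python, here is the same non-blocking idea as a minimal asyncio sketch (the hosts are just placeholders): while one request waits on the network, the other can make progress.

import asyncio

async def fetch_status(host):
    # Opening the connection and reading are awaited, so the event loop
    # is free to run other tasks while this one waits on the network.
    reader, writer = await asyncio.open_connection(host, 80)
    writer.write(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
    await writer.drain()
    status_line = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return host, status_line.decode().strip()

async def main():
    # Both requests are in flight at the same time.
    results = await asyncio.gather(
        fetch_status("example.com"),
        fetch_status("example.org"),
    )
    print(results)

asyncio.run(main())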
There are difficulties writing things asynchronously:
Creates pyramid code if using callbacks.
Errors can be harder to pinpoint.
Garbage collection isn't always as clean.
Performance overhead, and memory overhead.
Can create hard to debug situations if mixed with synchronous code.
All of that is the art of asynchronous programming.
I was wondering what the most efficient way is, from a performance perspective, to use the TWS/IB API in Python. I want to compute and update my strategies based on real-time data (Python has a lot of libraries that may be helpful, in contrast to Java I think) and, based on that, send buy/sell orders. These strategy computations may involve quite some processing time, so I was thinking about implementing some sort of threading/concurrency (Java uses three threads if I understand correctly, see *1).
I know there is IBpy (I think it is essentially the same, just with some things wrapped up for convenience). I came across IB-insync as an alternative to threading in Python, given Python's concurrency limitations, if I understand correctly:
https://ib-insync.readthedocs.io/api.html
which implements the IB API asynchronously in a single thread.
Reading about concurrency in Python here:
https://realpython.com/python-concurrency/
async has some major advantages, if I understand correctly, since Python was designed with a Global Interpreter Lock (GIL) (only one thread can hold control of the Python interpreter at a time). However, the IB-insync library may have some limitations too (which can be worked around by adapting the code, as suggested below):
If, for example, the user code spends much time in a calculation, or uses time.sleep() with a long delay, the framework will stop spinning, messages accumulate and things may go awry.
If a user operation takes a long time then it can be farmed out to a different process. Alternatively the operation can be made such that it periodically calls IB.sleep(0); This will let the framework handle any pending work and return when finished. The operation should be aware that the current state may have been updated during the sleep(0) call.
For introducing a delay, never use time.sleep() but use sleep() instead.
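To make the quoted advice concrete, here is a rough sketch of that pattern, assuming a locally running TWS/Gateway and that ib_insync is installed (the connection parameters and the calculation itself are placeholders):

from ib_insync import IB

ib = IB()
ib.connect('127.0.0.1', 7497, clientId=1)  # placeholder host/port/clientId

def long_calculation(values):
    """CPU-heavy strategy step that stays friendly to the event loop."""
    total = 0.0
    for i, v in enumerate(values):
        total += v * v  # stand-in for the real strategy maths
        if i % 10_000 == 0:
            # Per the docs quoted above: let the framework handle pending
            # messages; state may have been updated when this returns.
            ib.sleep(0)
    return total

print(long_calculation(range(1_000_000)))
ib.disconnect()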
Would a multi-threading solution be better, just like in Java (I do not know if there is a Java async equivalent that can be combined with as many easy data-manipulation tools/libraries)? Or should I stick with Python async? Other suggestions are welcome, too. With regard to multiple threads in Python (and Java), the following site:
https://interactivebrokers.github.io/tws-api/connection.html
mentions (*1):
API programs always have at least two threads of execution. One thread is used for sending messages to TWS, and another thread is used for reading returned messages. The second thread uses the API EReader class to read from the socket and add messages to a queue. Every time a new message is added to the message queue, a notification flag is triggered to let other threads know that there is a message waiting to be processed. In the two-thread design of an API program, the message queue is also processed by the first thread. In a three-thread design, an additional thread is created to perform this task.
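To make the quoted two-thread design concrete, here is a minimal sketch using the native ibapi package, assuming a locally running TWS/Gateway (the port and clientId are placeholders). In recent ibapi versions connect() starts the EReader thread internally, and run() drains the message queue on the calling thread.

from ibapi.client import EClient
from ibapi.wrapper import EWrapper

class App(EWrapper, EClient):
    """Two-thread design: connect() starts the reader thread,
    run() processes the message queue on the main thread."""

    def __init__(self):
        EClient.__init__(self, self)

    def nextValidId(self, orderId):
        # First callback after connecting; a good place to send requests.
        print("connected, next order id:", orderId)
        self.reqCurrentTime()

    def currentTime(self, time_):
        print("server time:", time_)
        self.disconnect()

app = App()
app.connect("127.0.0.1", 7497, clientId=0)  # placeholder port/clientId
app.run()  # main thread drains the message queue until disconnect()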
The phrase "The two-threaded design is used in the IB Python sample Program.py..." suggests that there are already two threads involved, which is a little bit confusion to me since the second reference mentions Python being single-threaded.
Python is not technically single-threaded; you can create multiple threads in Python, but the GIL only allows one thread to run Python bytecode at a time, which is why it is sometimes described as single-threaded. The GIL handles the switching and swapping of threads efficiently enough that it often doesn't feel single-threaded, and I have used multi-threading in Python with good results. The trade-off is that single-threaded programs get a small speed benefit from the GIL's simple locking, while CPU-bound multi-threaded programs can end up a bit slower.
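As a quick illustration of what the GIL means in practice, here is a small sketch that times the same CPU-bound work sequentially and on two threads (the counts are arbitrary and the timings will vary by machine):

import threading
import time

def count(n):
    # Pure-Python CPU-bound work; the GIL lets only one thread run it at a time.
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
count(N)
count(N)
print("sequential:", round(time.perf_counter() - start, 2), "s")

start = time.perf_counter()
threads = [threading.Thread(target=count, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("two threads:", round(time.perf_counter() - start, 2), "s")
# On CPython the threaded version is usually no faster (often slightly slower)
# for CPU-bound work; threads still help when the work is I/O-bound.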
I am also searching for a multi-threaded SDK for the IB API! I have not found one yet, except the native one, which is a bit complicated for me.
And IB_Insync does not allow multi-threading :(
Btw, I am new to Stack Overflow, so don't mind me ...
I want to write a Python program on Linux that reads a log file in real time as it is being written, for the purpose of sending an alarm if it detects certain things in the log. I want this to use asyncio for several reasons - I'm trying to build a framework that does many things at the same time based on asyncio, and I need the practice.
Since I'm using asyncio, I obviously don't want to use a blocking read to wait at the end of the input file for more lines to be written to it. I suspect I'll have to end up using select, but I'm not sure.
I suspect that this is pretty simple, but I have a hard time finding an example of how to do this, or coming up with one of my own even though I've dabbled a little bit in asyncio before. I can read and mostly understand other asyncio examples I find, but for some reason I find it difficult to write asyncio code of my own.
Therefore, I'd be very grateful if someone could point me to an example. Bonus points if the same technique also works for reading from stdin rather than a file.
I suspect I'll have to end up using select, but I'm not sure. I suspect that this is pretty simple, but I have a hard time finding an example of how to do this
With asyncio, the idea is that you don't need to select() yourself because asyncio selects for you - after all, a select() or equivalent is at the heart of every event loop. Asyncio provides abstractions like streams that implement a coroutine facade over the async programming model. There are also the lower-level methods that allow you to hook into select() yourself, but normally you should work with streams.
In the case of tail -f, select() won't help you because regular files are always readable. When there is no more data, a read simply returns EOF and you are expected to try again later. This is why tail -f has historically used reads with pauses, with the option of notification APIs like inotify where available.
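Here is a minimal sketch of that read-with-pauses approach as an asyncio async generator (the log path and the "ERROR" test are placeholders):

import asyncio

async def tail(path):
    # Regular files are always "readable", so there is nothing to select() on;
    # instead, read what is there and pause with asyncio.sleep() when at EOF.
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file, like tail -f
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                await asyncio.sleep(0.5)  # yields to other asyncio tasks

async def watch(path):
    async for line in tail(path):
        if "ERROR" in line:  # placeholder alarm condition
            print("alarm:", line.rstrip())

asyncio.run(watch("/var/log/syslog"))  # placeholder path

For the stdin bonus: stdin is a pipe rather than a regular file, so there the event loop can genuinely wait for readability, for example via loop.connect_read_pipe() or the asyncio stream helpers.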
I've created a socket object for Telnet communication, and I'm using it to communicate with an API, sending and receiving data. I need to configure it in such a way that I can send and receive data at the same time. By that, I mean data should be sent as soon as the application tries to send it, and data should be processed immediately on receipt. Currently, I have a configuration which allows receipt to be instant, and sending to be second priority with a very short delay.
Currently the best way I have found to do this is by having an event queue and pushing data to send into it, then having a response queue into which I put messages from the server. I have a thread which polls the buffer every 0.1 seconds to check for new data; if there isn't any, it then checks the request queue and processes anything there, and that runs in a continuous loop. I then have threads insert data into the request queue and read data from the response queue. Everything is just about linear enough that this works fine.
This is not "asynchronous", in the sense that I've had to make it as asynchronous as possible without actually achieving it. Is there a proper way to do this? Or is anything under the hood going to be doing exactly the same as I am?
Other things I have investigated as a solution to this problem:
A callback system, where I might call socket.on_receipt(handle_message, args) to call the method handle_message with args as a parameter, passing the received data into the method. The only way I could find to achieve this is by implementing what I already have, then registering a callback for it (in fact, this is very close to what I do already have).
Please note: I am approaching this as a learning exercise to understand better how asynchronous systems work, not to understand how to use a particular library, so please do not suggest an existing library unless it contains very clear code which is simple to understand and answers the question fully and concisely.
This seems like a pretty straightforward use case for asyncio. I wouldn't consider using asyncio as "using a particular library" since socket programming paired with asyncio's event loop is pretty low-level and the concept is very transparent if you have experience with other languages and just want to see how async programming works in Python.
You can use this async chat as an example: https://gist.github.com/gregvish/7665915
Essentially, you create a non-blocking socket; see the standard library reference on socket.setblocking(0):
https://docs.python.org/3/library/socket.html#socket.socket.setblocking
I'd also suggest this amazing session by David Beazley as a must-see for async Python programming. He explains the concurrency concepts in Python using sockets, exactly what you need: https://www.youtube.com/watch?v=MCs5OvhV9S4
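As a minimal sketch of that idea with asyncio streams (host, port, and the messages are placeholders), one task handles incoming data the moment it arrives while another sends whatever the application queues:

import asyncio

async def receive_loop(reader):
    # Handle data the moment it arrives, independently of any sending.
    while True:
        data = await reader.read(4096)
        if not data:
            break  # connection closed
        print("received:", data)  # placeholder for your message handler

async def send_loop(writer, outgoing):
    # Send requests as soon as the application puts them on the queue.
    while True:
        message = await outgoing.get()
        writer.write(message)
        await writer.drain()

async def main():
    # Placeholder host/port for the Telnet-style service.
    reader, writer = await asyncio.open_connection("127.0.0.1", 2323)
    outgoing = asyncio.Queue()
    await outgoing.put(b"hello\r\n")
    await asyncio.gather(receive_loop(reader), send_loop(writer, outgoing))

asyncio.run(main())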
I'm just getting my head around Twisted, threading, stackless, etc., and would appreciate some high-level advice.
Suppose I have remote clients 1 and 2, connected via a websocket running in a page on their browsers. Here is the ideal goal:
for cl in (1,2):
    guess[cl] = show(cl, choice("Pick a number:", range(1,11)))
checkpoint()
if guess[1] == guess[2]:
    show((1,2), display("You picked the same number!"))
Ignoring the mechanics of show, choice and display, the point is that I want the show call to be asynchronous. Each client gets shown the choice. The code waits at checkpoint() for all the threads (or whatever) to rejoin.
I would be interested in hearing answers even if they involve hairy things like rewriting the source code. I'd also be interested in less hairy answers which involve compromising a bit on the syntax.
The simplest solution code-wise is to use a framework like Autobahn, which supports remote procedure calls (RPC). That means you can call some JavaScript in the browser and wait for the result.
If you want to call two clients, you will have to use threads.
You can also do it manually. The approach works along these lines:
You need to pass a callback to show().
show() needs to register the callback with some kind of string ID in a global dict
show() must send this ID to the client
When the client sends the answer, it must include the ID.
The Python handler can then remove the callback from the global dict and invoke it with the answer
The callback needs to collect the results.
When it has enough results (two in your case), it must send status updates to the client.
You can simplify the code using yield, but the theory behind it is a bit complex to understand: see "What does the 'yield' keyword do in Python?" and coroutines.
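Here is how those steps can look in code. This is a minimal sketch using asyncio futures rather than Twisted Deferreds; send_to_client is a stand-in for whatever actually writes to each client's websocket, and the simulated answers at the end just exercise the flow:

import asyncio
import uuid

pending = {}  # request id -> Future for that client's answer (the "global dict")

async def send_to_client(client, payload):
    # Placeholder transport: in real code this writes to the client's websocket.
    print(f"-> client {client}: {payload}")

async def show(client, prompt):
    # Register a callback (here a Future) under a unique ID, send the ID
    # with the prompt, and wait until the answer with that ID comes back.
    request_id = str(uuid.uuid4())
    fut = asyncio.get_running_loop().create_future()
    pending[request_id] = fut
    await send_to_client(client, {"id": request_id, "prompt": prompt})
    return await fut

def on_client_answer(message):
    # Called by whatever receives websocket messages; the answer echoes the ID.
    pending.pop(message["id"]).set_result(message["answer"])

async def main():
    clients = (1, 2)
    # The "checkpoint": gather() waits until both clients have answered.
    task = asyncio.gather(*(show(c, "Pick a number:") for c in clients))
    await asyncio.sleep(0)  # let both show() calls register their IDs
    # Simulate both browsers replying with the same number.
    for request_id in list(pending):
        on_client_answer({"id": request_id, "answer": 7})
    guesses = await task
    if guesses[0] == guesses[1]:
        for c in clients:
            await send_to_client(c, {"display": "You picked the same number!"})

asyncio.run(main())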
In Python, the most widely-used approach to async/event-based network programming that hides that model from the programmer is probably gevent.
Beware: this kind of trickery works by making tasks yield control implicitly, which encourages the same sorts of surprising bugs that tend to appear when OS threads are involved. Local reasoning about such problems is significantly harder than with explicit yielding, and the convenience of avoiding callbacks might not be worth the trouble introduced by the inherent pitfalls. Perhaps just as important to a library author like yourself: this approach is not pure Python, and would force dependencies and interpreter restrictions on the users of your library.
A lot of discussion about this topic sprouted up (especially between the gevent and twisted camps) while Guido was working on the asyncio library, which was called tulip at the time. He summarized the main issues here.
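To make the "implicit yielding" concrete, here is a minimal gevent sketch (the URLs are placeholders); after monkey-patching, ordinary-looking blocking calls cooperate with gevent's event loop behind the scenes:

import gevent
from gevent import monkey

# Patch the standard library so blocking calls (sockets, time.sleep, ...)
# yield to gevent's event loop implicitly instead of blocking the process.
monkey.patch_all()

import urllib.request

def fetch(url):
    # Looks like ordinary blocking code, but the patched socket module
    # switches greenlets whenever this would otherwise block.
    return url, urllib.request.urlopen(url).status

jobs = [gevent.spawn(fetch, u) for u in ("http://example.com", "http://example.org")]
gevent.joinall(jobs, timeout=10)
print([job.value for job in jobs])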
I am writing an implementation of a NAT. My algorithm is as follows:
Packet comes in
Check against lookup table if external, add to lookup table if internal
Swap the source address and send the packet on its way
I have been reading about Twisted. I was curious whether Twisted takes advantage of multicore CPUs. Assume the system has thousands of users and one packet comes right after the other. With Twisted, can the lookup table operations take place at the same time on each core? I hear that with threads the GIL will not allow this anyway. Perhaps I could benefit from multiprocessing?
Nginx is asynchronous and happily serves thousands of users at the same time.
Using threads with Twisted is discouraged. It has very good performance when used asynchronously, but the code you write for request handlers must not block. If your handler is a fairly big piece of code, break it up into smaller parts and use Twisted's famous Deferreds to attach the other parts via callbacks. It certainly requires somewhat different thinking than most programmers are used to, but it has benefits. If the code has blocking parts, like database operations or accessing other resources over the network, try to find asynchronous libraries for those tasks too, so you can use Deferreds in those cases as well. If you can't find asynchronous libraries, you can fall back on the deferToThread function, which runs the function you want to call in a different thread, returns a Deferred for it, and fires your callback when it finishes; but it's better to use that as a last resort, if nothing else can be done.
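As a small illustration of that last-resort pattern, here is a minimal sketch using deferToThread with a made-up blocking lookup (the addresses and table are placeholders, and stopping the reactor is only for the demo):

from twisted.internet import reactor
from twisted.internet.threads import deferToThread

# Placeholder for a blocking lookup (e.g. a database or a slow table scan).
NAT_TABLE = {"10.0.0.5": "203.0.113.7"}

def blocking_lookup(internal_addr):
    return NAT_TABLE.get(internal_addr, internal_addr)

def rewrite_and_send(external_addr):
    print("rewriting source address to", external_addr)

def handle_packet(internal_addr):
    # Run the blocking part in Twisted's thread pool; the reactor thread
    # stays free, and the Deferred fires our callback with the result.
    d = deferToThread(blocking_lookup, internal_addr)
    d.addCallback(rewrite_and_send)
    d.addCallback(lambda _: reactor.stop())  # only so the demo exits

reactor.callWhenRunning(handle_packet, "10.0.0.5")
reactor.run()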
Here is the official tutorial for Deferreds:
http://twistedmatrix.com/documents/10.1.0/core/howto/deferredindepth.html
And another nice guide, which can help you get used to thinking in "async mode":
http://ezyang.com/twisted/defer2.html