(Preamble: I use python-telegram-bot to run a Telegram bot that registers users' messages in Google Sheets. None of this is strictly relevant to the question, but it may provide context for the source of the trouble. The point is that the Google Sheets API rate-limits access, so if many users try to write at once, I need to process their requests with some delay.)
I know it is considered very bad practice to use the threading module to process tasks and dodge the locking imposed by the GIL. But by the nature of my task, I receive a stream of requests from users and would like to process them with some delay (from 1 to 10 seconds after they were actually received). (Right now I use celery+redis to process delayed tasks, but it looks like overkill for something as trivial as delayed execution; I may be wrong.)
So I wonder: can I use concurrent.futures.ProcessPoolExecutor (as explained, for example, here: https://idolstarastronomer.com/two-futures.html), or will it end in the kind of disaster promised by most of the people who warn against threading in Python?
Here is purely hypothetical code that runs something with a delay using ProcessPoolExecutor. Will it end in disaster under some conditions (too many delayed requests, for instance)?
import concurrent.futures
import time
import random

def register_with_delay():
    time.sleep(random.randint(0, 10))
    print("I'm in the delayed registration")

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(register_with_delay) for _ in range(10)]
        for _ in range(10):
            print("I'm in the main loop")
            time.sleep(random.randint(0, 1))

if __name__ == '__main__':
    main()
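Since the work here is I/O-bound (sleeping, then a network call to the Sheets API), a thread pool arguably fits this better than a process pool. A minimal sketch of the same idea with ThreadPoolExecutor; the delay range, pool size, and the print standing in for the Sheets call are illustrative:

import concurrent.futures
import random
import time

def register_with_delay(message):
    # Threads release the GIL while sleeping and during network I/O,
    # so a thread pool handles delayed, I/O-bound work fine.
    time.sleep(random.randint(1, 10))
    print('registering:', message)  # here you would call the Sheets API

def main():
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
        for n in range(10):
            executor.submit(register_with_delay, 'message %d' % n)
        # Leaving the with-block waits for all pending registrations.

if __name__ == '__main__':
    main()

Note that max_workers also caps how many registrations run at once, which doubles as a crude rate limit for the API.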
I am somewhat new to both threading and multiprocessing in Python, as well as to the concept of the GIL. I have a situation with time-consuming, fire-and-forget tasks that I need the server to run, but the server should immediately reply to the client, basically "okay, your thing was submitted", so that the client does not hang waiting for the thing to complete. An example of what one of the "things" might do is pull some data down from a database or two, compare that data, and then write the result to another database. The databases are remote, not on the same host as the server itself. Another example is crunching some data and then sending a text as a result. The client does not care about the data, but someone will receive a text later with information that results from the data crunching across the various dictionaries and database entries. However, there could be many such requests pouring in from many clients. The goal here is to spawn a thread or process that essentially kills itself, because we don't care at all about returning any data from it.
At a glance, my understanding is that both multiprocessing and threading can achieve similar results for this use case. My main concern is being able to launch the function immediately to go do its own thing, and return to the client quickly so it does not hang. There are many, many requests coming in simultaneously from many, many clients in this scenario. My understanding is therefore that multiprocessing may be better, so that these tasks would not have to execute as sequential threads because of the GIL. However, I am unsure how to make the processes end themselves when they are done with their task, rather than having to wait for them.
An example of the problem
@route('/api/example', methods=["POST"])
def example_request(self, request):
    request_data = request.get_json()
    crunch_data_and_send_text(request_data)  # Takes maybe 5-10 seconds, doesn't return data
    return  # Would like to return to the client immediately rather than waiting
Would threading or multiprocessing be better here? And how can I make the process (or thread) effectively .join() itself when it is done, rather than having to join it before I can return to the client?
I have also considered asyncio, which I think could also improve this, but the existing codebase I have inherited is so large that rewriting it as async is infeasible for the time being, and replacement libraries might need to be found in that case, so it is not an option.
# Threading
from threading import Thread

@route('/api/example', methods=["POST"])
def example_request(self, request):
    request_data = request.get_json()
    fire_and_forget = Thread(target=crunch_data_and_send_text, args=(request_data,))
    fire_and_forget.start()
    return  # Would like to return to the client immediately rather than waiting
# Multiprocessing
from multiprocessing import Process

@route('/api/example', methods=["POST"])
def example_request(self, request):
    request_data = request.get_json()
    fire_and_forget = Process(target=crunch_data_and_send_text, args=(request_data,))
    fire_and_forget.start()
    return  # Would like to return to the client immediately rather than waiting
Which of these is better for this use case? Is there a way to have them .join() themselves automatically when they finish, rather than having to sit in the function and wait for them to complete before returning to the client?
To be clear, asyncio is unfortunately NOT an option for me.
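(One detail worth noting: a worker in a concurrent.futures pool is recycled by the pool when its task finishes, so nothing ever needs an explicit .join(). A minimal runnable sketch; POOL, the pool size, and the stand-in body of crunch_data_and_send_text are illustrative, not from the question:)

from concurrent.futures import ThreadPoolExecutor
import time

# Module-level pool: a worker picks up each submitted task and is reused
# by the pool afterwards, so no explicit join() is ever needed.
POOL = ThreadPoolExecutor(max_workers=8)

def crunch_data_and_send_text(request_data):
    time.sleep(5)  # stand-in for the real 5-10 second job
    print('done with', request_data)

def example_request(request_data):
    POOL.submit(crunch_data_and_send_text, request_data)  # fire and forget
    return "Accepted"  # returns immediately; the task keeps running

print(example_request({'n': 1}))
POOL.shutdown(wait=True)  # at process exit, let outstanding tasks finish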
I suggest using the Advanced Python Scheduler (APScheduler).
Instead of running your function in a thread, schedule it to run and immediately return to the client.
After setting up your Flask app, set up Flask-APScheduler and then schedule your function to run in the background.
from datetime import datetime

from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler({
    # --- setup the scheduler ---
})
scheduler.start()

@route('/api/example', methods=["POST"])
def example_request(self, request):
    request_data = request.get_json()
    job = scheduler.add_job(crunch_data_and_send_text, 'date', run_date=datetime.utcnow())
    return "The request is being processed ..."
To pass arguments to crunch_data_and_send_text you can do:
scheduler.add_job(lambda: crunch_data_and_send_text(request_data), 'date', run_date=datetime.utcnow())
Here is the User Guide.
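APScheduler's add_job also accepts args and kwargs parameters, which avoids closing over request_data. A minimal self-contained sketch; the stand-in job body and the example payload are illustrative:

import time
from datetime import datetime

from apscheduler.schedulers.background import BackgroundScheduler

def crunch_data_and_send_text(request_data):
    print('crunching', request_data)  # stand-in for the real job

scheduler = BackgroundScheduler()
scheduler.start()

# args carries the payload, so no lambda/closure is needed.
scheduler.add_job(crunch_data_and_send_text, 'date',
                  run_date=datetime.utcnow(),
                  args=[{'user': 'example'}])

time.sleep(2)  # keep the process alive long enough for the job to fire
scheduler.shutdown()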
So I've read this nice article about async threads in Python. Though, the latter have some trouble with the GIL, and threads are not as effective as they may seem.
Luckily, Python incorporates multiprocessing, which is designed not to be affected by this trouble.
I'd like to understand how to implement a multiprocessing queue (with a Pipe open for each process) in an async manner, so it wouldn't hang a running async webserver.
I've read this topic; however, I'm not looking for performance but rather for boxing out a big calculation that hangs my webserver. Those calculations require pictures, so they might have significant I/O exchange, but to my understanding this is something async handles pretty well.
All the calcs are separate from each other, so they are not meant to be mixed.
I'm trying to build this in front of a WebSocket handler.
If you sense heresy in this, please let me know as well :)
This is re-sourced from an article, after someone nice on the #python IRC channel hinted to me about async executors, plus another answer on reddit:
Using ProcessPoolExecutor
“The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.”
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(num):
    print('entering cpu_heavy', num)
    time.sleep(10)  # stand-in for a CPU-bound computation
    print('leaving cpu_heavy', num)
    return num

async def main(loop):
    print('entering main')
    executor = ProcessPoolExecutor(max_workers=3)
    data = await asyncio.gather(*(loop.run_in_executor(executor, cpu_heavy, num)
                                  for num in range(3)))
    print('got result', data)
    print('leaving main')

loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
In the middle of a function, I'd like to be able to fire off a call to the DB (discarding the results) while the function keeps running, so that I don't run into an I/O bottleneck. This is NOT a web application; everything is offline.
A snippet for explanatory purposes:
a = list(range(100))
for i in a:
    my_output = very_long_function(i)
    # I'd like to_sql to run in a "fire-and-forget" fashion
    function_of_choice_to_sql(my_output)
I was wondering whether I would be better off with the threading library, asyncio, or other tools. I was unsuccessful in this particular endeavour with all of them; I'll take any working solution.
Any help?
P.S.: there will likely be no problems with concurrency/locking and the like, since in my case the time my function takes to compute is far larger than the time it takes to write to the database.
You could use a ThreadPoolExecutor; it provides a simple interface for scheduling callables to a pool of workers. In particular, you might be interested in the map method:
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=4) as executor:
    # Lazy map (python3 only)
    outputs = map(very_long_function, range(10))
    # Asynchronous map
    results = executor.map(function_of_choice_to_sql, outputs)
    # Wait for the results
    print(list(results))
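Since the question asks for fire-and-forget rather than collecting results, a submit-based variant may read more naturally. A minimal sketch; the two stand-in function bodies are illustrative, standing in for the user's own very_long_function and function_of_choice_to_sql:

from concurrent.futures import ThreadPoolExecutor
import time

def very_long_function(i):
    return i * i  # stand-in for the slow computation

def function_of_choice_to_sql(output):
    time.sleep(0.1)  # stand-in for the database write
    print('wrote', output)

with ThreadPoolExecutor(max_workers=4) as executor:
    for i in range(10):
        my_output = very_long_function(i)
        # Hand the write off to a worker thread and keep computing;
        # the returned future is deliberately ignored (fire and forget).
        executor.submit(function_of_choice_to_sql, my_output)
    # Exiting the with-block waits for any still-pending writes.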
I'm afraid I'm still a bit confused (despite checking other threads) about whether:
all asynchronous code is multi-threaded
all multi-threaded functions are asynchronous
My initial guess is no to both, and that proper asynchronous code should be able to run in one thread; however, it can be improved by adding threads, for example like so:
So I constructed this toy example:
from threading import Thread, current_thread
from queue import Queue
import time

def do_something_with_io_lag(in_work):
    out = in_work
    # Imagine we do some work that involves sending
    # something over the internet and processing the output
    # once it arrives
    time.sleep(0.5)  # simulate IO lag
    print("Hello, bee number: ",
          str(current_thread().name).replace("Thread-", ""))

class WorkerBee(Thread):
    def __init__(self, q):
        Thread.__init__(self)
        self.daemon = True  # let the program exit even though run() never returns
        self.q = q

    def run(self):
        while True:
            # Get some work from the queue
            work_todo = self.q.get()
            # This function will simulate I/O lag
            do_something_with_io_lag(work_todo)
            # Remove task from the queue
            self.q.task_done()

if __name__ == '__main__':
    def time_me(nmbr):
        number_of_worker_bees = nmbr
        worktodo = ['some input for work'] * 50
        # Create a queue
        q = Queue()
        # Fill with work
        [q.put(onework) for onework in worktodo]
        # Launch worker threads
        for _ in range(number_of_worker_bees):
            t = WorkerBee(q)
            t.start()
        # Block until queue is empty
        q.join()

    # Run this code in serial mode (just one worker)
    %time time_me(nmbr=1)
    # Wall time: 25 s
    # Basically 50 requests * 0.5 seconds IO lag
    # For me everything gets processed by bee number: 59
    # Run this code using multi-tasking (launch 50 workers)
    %time time_me(nmbr=50)
    # Wall time: 507 ms
    # Basically the 0.5 second IO lag + 0.07 seconds it took to launch them
    # Now everything gets processed by different bees
Is it asynchronous?
To me this code does not seem asynchronous, because it matches Figure 3 in my example diagram: the I/O call blocks the thread (although we don't feel it, because the threads are blocked in parallel).
However, if this is the case, I am confused about why requests-futures is considered asynchronous, since it is a wrapper around ThreadPoolExecutor:
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
Can this run on just one thread?
Especially when compared to asyncio, which can run single-threaded:
There are only two ways to have a program on a single processor do "more than one thing at a time." Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, that lets you have nearly all the advantages of multi-threading, without actually using multiple threads. It's really only practical if your program is largely I/O bound. If your program is processor bound, then pre-emptive scheduled threads are probably what you really need. Network servers are rarely processor bound, however.
First of all, one note: concurrent.futures.Future is not the same as asyncio.Future. Basically it's just an abstraction: an object that allows you to refer to a job's result (or exception, which is also a result) after you have submitted the job but before it has completed. It's similar to assigning an ordinary function's result to a variable.
Multithreading: Regarding your example, when using multiple threads you can say that your code is "asynchronous", as several operations are performed in different threads at the same time without waiting for each other to complete, and you can see it in the timing results. And you're right, your function is blocking due to sleep: it blocks the worker thread for the specified amount of time. But when you use several threads, those threads are blocked in parallel. So if you had one job with sleep and another without, and ran multiple threads, the one without sleep would perform calculations while the other slept. With a single thread, the jobs are performed serially, one after the other, so when one job sleeps, the other jobs wait for it; in fact, they just don't exist until it's their turn. All this is pretty much proven by your time tests. The thing that happened with print has to do with "thread safety": print uses standard output, which is a single shared resource. So when your multiple threads tried to print at the same time, the switching happened in the middle and you got your strange output. (This also shows the "asynchronicity" of your multithreaded example.) To prevent such errors there are locking mechanisms, e.g. locks, semaphores, etc.
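For illustration, a minimal sketch of guarding shared stdout with a lock; safe_print and print_lock are illustrative names, not part of the original example:

from threading import Lock

print_lock = Lock()  # one lock shared by all worker threads

def safe_print(*args):
    # Only one thread at a time may enter this block, so whole lines
    # reach stdout without interleaving.
    with print_lock:
        print(*args)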
Asyncio: To better understand the purpose, note the "IO" part: it's not 'async computation' but 'async input/output'. When talking about asyncio, you usually don't think about threads at first. Asyncio is about an event loop and generators (coroutines). The event loop is the arbiter that governs the execution of the coroutines (and their callbacks) registered with it. Coroutines are implemented as generators, i.e. functions that perform some actions iteratively, saving state at each iteration, 'returning', and continuing with the saved state on the next call. So basically the event loop is a while True: loop that calls all the coroutines/generators assigned to it, one after another, and they provide a result or no result on each such call; this is what makes "asynchronicity" possible. (This is a simplification; there are scheduling mechanisms that optimize this behavior.) The event loop in this situation can run in a single thread, and if the coroutines are non-blocking, it gives you true "asynchronicity"; but if they are blocking, it's basically linear execution.
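A deliberately naive sketch of that "while True: loop over generators" idea (a toy model, not how asyncio is actually implemented):

def job(name, steps):
    # A "coroutine" as a plain generator: it does a bit of work,
    # then yields control back to the loop.
    for step in range(steps):
        print(name, 'step', step)
        yield

def toy_loop(coroutines):
    # The "event loop": keep calling each registered generator in turn
    # until all of them are exhausted.
    while coroutines:
        for coro in list(coroutines):
            try:
                next(coro)  # run until the next yield
            except StopIteration:
                coroutines.remove(coro)  # this job has finished

toy_loop([job('a', 2), job('b', 3)])  # output interleaves 'a' and 'b' steps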
You can achieve the same thing with explicit multithreading, but threads are costly: they require memory, switching them takes time, etc. On the other hand, the asyncio API lets you abstract away from the actual implementation and just consider your jobs to be performed asynchronously. The implementation may differ; it includes calling the OS API, and the OS decides what to do, e.g. DMA, additional threads, some specific microcontroller use, etc. The point is that it works well for IO thanks to lower-level mechanisms and hardware. On the other hand, performing a computation would require explicitly breaking the algorithm into pieces to use as asyncio coroutines, so a separate thread might be a better choice, as you can launch the whole computation there in one go. (I'm not talking about algorithms that are designed for parallel computing.) But an asyncio event loop can be explicitly told to use separate threads for coroutines, and then it's asyncio with multithreading.
Regarding your example: if you implement your function with sleep as an asyncio coroutine and schedule and run 50 of them single-threaded, you'll get a time similar to your first test, i.e. around 25 s, because it blocks. If you change it to something like yield from asyncio.sleep(0.5) (which is itself a coroutine) and schedule and run 50 of them single-threaded, they will be called asynchronously: while one coroutine sleeps, another is started, and so on. The jobs will complete in a time similar to your second, multithreaded test, i.e. close to 0.5 s. If you add print here, you'll get good output, as it is used by a single thread serially, though the output might be in a different order than the order in which the coroutines were assigned to the loop, because the coroutines may run in a different order. If you use multiple threads, the result will obviously be close to the last one anyway.
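A single-threaded sketch of that coroutine version, using the modern async/await spelling instead of yield from; the 0.5 s delay mirrors the toy example above:

import asyncio

async def bee(number):
    # Non-blocking sleep: while this coroutine waits, the loop runs others.
    await asyncio.sleep(0.5)
    print("Hello, bee number:", number)

async def main():
    # 50 "bees" on one thread; total wall time stays close to 0.5 s.
    await asyncio.gather(*(bee(n) for n in range(50)))

asyncio.run(main())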
Simplification: the difference between multithreading and asyncio is in blocking/non-blocking, so basically blocking multithreading somewhat approaches non-blocking asyncio, but there are a lot of differences:
Multithreading for computations (i.e. CPU bound code)
Asyncio for input/output (i.e. I/O bound code)
Regarding your original statements:
all asynchronous code is multi-threaded
all multi-threaded functions are asynchronous
I hope I was able to show that:
asynchronous code can be either single-threaded or multi-threaded
all multi-threaded functions could be called "asynchronous"
I think the main confusion comes from the meaning of asynchronous. From the Free Online Dictionary of Computing, "A process [...] whose execution can proceed independently" is asynchronous. Now, apply that to what your bees do:
Retrieve an item from the queue. Only one at a time can do that, while the order in which they get an item is undefined. I wouldn't call that asynchronous.
Sleep. Each bee does so independently of all the others, i.e. the sleep durations elapse concurrently for all of them; otherwise the total time wouldn't go down with multiple bees. I'd call that asynchronous.
Call print(). While the calls are independent, at some point the data is funneled into the same output target, and at that point a sequence is enforced. I wouldn't call that asynchronous. Note however that the two arguments to print() and also the trailing newline are handled independently, which is why they can be interleaved.
Lastly, the call to q.join(). Here of course the calling thread is blocked until the queue is empty, so some kind of synchronization is enforced and wanted. I don't see why this "seems to break" for you.
I'm trying to implement a Python app that uses async functions to receive and emit messages with NATS, using a client based on Tornado. Once a message is received, a blocking function must be called, which I'm trying to run on a separate thread so that the reception and publication of messages can continue, putting incoming messages in a Tornado queue for later processing by the blocking function.
I'm very new to Tornado (and to Python multithreading), but after reading the Tornado documentation and other sources several times, I've been able to put together a working version of the code, which looks like this:
import tornado.gen
import tornado.ioloop
from tornado.queues import Queue
from concurrent.futures import ThreadPoolExecutor
from nats.io.client import Client as NATS

messageQueue = Queue()
nc = NATS()
EXECUTOR = ThreadPoolExecutor(max_workers=4)  # pool for the blocking calls

@tornado.gen.coroutine
def consumer():
    def processMessage(currentMessage):
        # process the message ...
        pass

    while True:
        currentMessage = yield messageQueue.get()
        try:
            # execute the call in a separate thread to prevent blocking the queue
            EXECUTOR.submit(processMessage, currentMessage)
        finally:
            messageQueue.task_done()

@tornado.gen.coroutine
def producer():
    @tornado.gen.coroutine
    def enqueueMessage(currentMessage):
        yield messageQueue.put(currentMessage)

    yield nc.subscribe("new_event", "", enqueueMessage)

@tornado.gen.coroutine
def main():
    tornado.ioloop.IOLoop.current().spawn_callback(consumer)
    yield producer()

if __name__ == '__main__':
    main()
    tornado.ioloop.IOLoop.current().start()
My questions are:
1) Is this the correct way of using Tornado to call a blocking function?
2) What's the best practice for implementing a consumer/producer scheme that is always listening? I'm afraid my while True: statement is actually blocking the processor...
3) How can I inspect the Queue to make sure a burst of calls is being enqueued? I've tried using Queue().qSize(), but it always returns zero, which makes me wonder if the enqueuing is done correctly or not.
The general rule (credits to NYKevin) is:
multiprocessing for CPU- and GPU-bound computations.
Event-driven stuff for non-blocking I/O (which should be preferred over blocking I/O where possible, since it scales much more effectively).
Threads for blocking I/O (you can also use multiprocessing, but the per-process overhead probably isn't worth it).
Use ThreadPoolExecutor for I/O, ProcessPoolExecutor for CPU. Both have an internal queue, and both scale to at most the specified max_workers. More info about concurrent executors is in the docs.
So the answers are:
1) Reimplementing a pool is overhead. Thread or process depends on what you plan to do.
2) while True is not blocking if it contains some yielded async calls (even yield gen.sleep(0.01)); it gives control back to the ioloop.
3) qsize() is the right method to call, but note that it has to be called on the shared messageQueue instance; Queue().qsize() constructs a brand-new, empty queue, which is why it always returns zero.
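For point 3, a minimal sketch of watching the queue depth from the IOLoop itself; report_queue_depth is an illustrative name, and messageQueue here stands in for the shared module-level queue from the snippet above:

import tornado.ioloop
from tornado.queues import Queue

messageQueue = Queue()  # in the real code, use the shared module-level queue

def report_queue_depth():
    # qsize() on the shared queue shows how many messages await the consumer.
    print("queue depth:", messageQueue.qsize())

# Run the reporter every 1000 ms on the IOLoop.
tornado.ioloop.PeriodicCallback(report_queue_depth, 1000).start()
tornado.ioloop.IOLoop.current().start()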