Alternative Python library for managing threads

I had some annoyances with spawning subprocesses, like capturing output correctly and so on. A wrapper library, envoy, solved all of that for me with an easy-to-use interface that hides most of the pitfalls.
Using threading, I sometimes struggle with hanging processes that do not end, external programs launched within threads that I can no longer reach, and so on.
Is there any "threading for dummies" Python library out there? Thanks

Is there any "threading for dummies" python library out there?
No, there is not. The threading module is pretty simple to use in simple cases. You use it to introduce concurrency into your program, i.e. whenever you want two or more actions to happen at the same time.
This is how you can let Peter build a house and let Igor drive to Moscow at the same time:
from threading import Thread
import time

def drive_bus():
    time.sleep(1)
    print "Igor: I'm Igor and I'm driving to... Moscow!"
    time.sleep(9)
    print "Igor: Yei, Moscow!"

def build_house():
    print "Peter: Let's start building a large house..."
    time.sleep(10.1)
    print "Peter: Urks, we have no tools :-("

threads = [Thread(target=drive_bus), Thread(target=build_house)]

for t in threads:
    t.start()

for t in threads:
    t.join()
Isn't that simple? Define the function to be run in another thread. Create a threading.Thread instance with that function as target. Nothing has happened so far, until you invoke start. It fires off the thread and returns immediately.
Before letting your main thread exit, you should wait for all the threads you have spawned to finish. This is what t.join() does: it blocks and waits for the thread t to finish, and only then does it return.

I would recommend reading up on the threading module in the standard library - it is simple enough. Your problem with hanging threads, provided it prevents your application from exiting, may be solved by using daemon threads.
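For illustration, a minimal sketch of the daemon-thread idea (the worker below is a hypothetical stand-in for whatever is hanging): a daemon thread does not keep the interpreter alive, so a stuck worker no longer prevents the program from exiting.
import threading
import time

def possibly_hanging_worker():
    # imagine this never returns, e.g. it is stuck on an external program
    while True:
        time.sleep(1)

t = threading.Thread(target=possibly_hanging_worker)
t.daemon = True    # must be set before start(); daemon threads die with the main thread
t.start()

print("main thread exits normally; the daemon thread will not keep the process alive")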
What kind of task are you trying to achieve? If you are trying to run a task in parallel without managing the threads yourself, you may find the multiprocessing package useful. Furthermore, there is an interesting piece of information on the Python wiki about parallel processing.
Could you elaborate a bit more on the task please?

Related

Stop multithreaded Python script on Windows

I am having trouble with a simple multithreaded Python looping program. It should loop infinitely and stop with Ctrl+C. Here is an implementation using threading:
from threading import Thread, Event
from time import sleep

stop = Event()

def loop():
    while not stop.is_set():
        print("looping")
        sleep(2)

try:
    thread = Thread(target=loop)
    thread.start()
    thread.join()
except KeyboardInterrupt:
    print("stopping")
    stop.set()
This MWE is extracted from a more complex code (obviously, I do not need multithreading to create an infinite loop).
It works as expected on Linux, but not on Windows: the Ctrl+C event is not intercepted and the loop continues infinitely. According to the Python Dev mailing list, the different behaviors are due to the way Ctrl+C is handled by the two OSs.
So, it appears that one cannot simply rely on Ctrl+C with threading on Windows. My question is: what are the other ways to stop a multithreaded Python script on this OS with Ctrl+C?
As explained by Nathaniel J. Smith in the link from your question, at least as of CPython 3.7, Ctrl-C cannot wake your main thread on Windows:
The end result is that on Windows, control-C almost never works to wake up a blocked Python process, with a few special exceptions where someone did the work to implement this. On Python 2 the only functions that have this implemented are time.sleep() and multiprocessing.Semaphore.acquire; on Python 3 there are a few more (you can grep the source for _PyOS_SigintEvent to find them), but Thread.join isn't one of them.
So, what can you do?
One option is to just not use Ctrl-C to kill your program, and instead use something that calls, e.g., TerminateProcess, such as the builtin taskkill tool, or a Python script using the os module. But you don't want that.
And obviously, waiting until they come up with a fix in Python 3.8 or 3.9 or never before you can Ctrl-C your program is not acceptable.
So, the only thing you can do is not block the main thread on Thread.join, or anything else non-interruptible.
The quick&dirty solution is to just poll join with a timeout:
while thread.is_alive():
    thread.join(0.2)
Now, your program is briefly interruptible while it's doing the while loop and calling is_alive, before going back to an uninterruptible sleep for another 200ms. Any Ctrl-C that comes in during that 200ms will just wait for you to process it, so that isn't a problem.
Except that 200ms is already long enough to be noticeable and maybe annoying.
And it may be too short as well as too long. Sure, it's not wasting much CPU to wake up every 200ms and execute a handful of Python bytecodes, but it's not nothing, and it's still getting a timeslice in the scheduler, and that may be enough to, e.g., keep a laptop from going into one of its long-term low-power modes.
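A variant of the same polling idea (a sketch of an assumption on my part, not from the answer above) is to make the worker a daemon thread and keep the main thread blocked in time.sleep, which, per the quote above, is one of the few calls that Ctrl-C can actually wake on Windows:
from threading import Thread, Event
from time import sleep

stop = Event()

def loop():
    while not stop.is_set():
        print("looping")
        sleep(2)

thread = Thread(target=loop)
thread.daemon = True          # don't let the worker keep the process alive on its own
thread.start()
try:
    while thread.is_alive():
        sleep(0.5)            # time.sleep in the main thread is interruptible by Ctrl-C
except KeyboardInterrupt:
    print("stopping")
    stop.set()
thread.join()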
The clean solution is to find another function to block on. As Nathaniel J. Smith says:
you can grep the source for _PyOS_SigintEvent to find them
But there may not be anything that fits very well. It's hard to imagine how you'd design your program to block on multiprocessing.Semaphore.acquire in a way that wouldn't be horribly confusing to the reader…
In that case, you might want to drag in the Win32 API directly, whether via PyWin32 or ctypes. Look at how functions like time.sleep and multiprocessing.Semaphore.acquire manage to be interruptible, block on whatever they're using, and have your thread signal whatever it is you're blocking on at exit.
If you're willing to use undocumented internals of CPython, it looks like, at least in 3.7, the hidden _winapi module has a wrapper function around WaitForMultipleObjects that appends the magic _PyOS_SigintEvent for you when you're doing a wait-first rather than wait-all.
One of the things you can pass to WaitForMultipleObjects is a Win32 thread handle, which has the same effect as a join, although I'm not sure if there's an easy way to get the thread handle out of a Python thread.
Alternatively, you can manually create some kind of kernel sync object (I don't know the _winapi module very well, and I don't have a Windows system, so you'll probably have to read the source yourself, or at least poke around in the interactive interpreter, to see what wrappers it offers), WaitForMultipleObjects on that, and have the thread signal it.

"Correct" process for sampling from external device (multithreading/timing) in Python

I am developing a Python script in which I am sampling data from a BLE device at a rate of about 50-200Hz. As of right now, I am doing this synchronously:
while True:
    if time_now > time_before + (1 / sample_rate):
        do_stuff()
This works great, except that it blocks the thread and the rest of the application completely (if I want to combine it with a Qt GUI, for example). What is the correct way to go about this issue?
Can I easily implement a multithreading setup, where each "sampler" (while-loop) gets its own thread?
Should I implement timer operations, and how do I make sure the script is not killed while waiting for a new sample?
My problem is similar to this, which, however, is for C#.
If the sampling itself doesn't take much time, you could use a QTimer and do the sampling in a slot on timeout. If it takes a lot of time blocking on I/O and not executing python code, you should probably use a Thread for the polling and send the result to the main thread using a signal.
If the sampling spends a lot of time executing Python code, you are out of luck with most Python implementations because of the GIL (Global Interpreter Lock): only one thread can actively execute Python code at a time. So real parallelism in Python is usually achieved by creating new processes instead of new threads.
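As a rough illustration of the QTimer approach (assuming PyQt5; sample_once is a hypothetical stand-in for the BLE read):
import sys
from PyQt5.QtCore import QTimer
from PyQt5.QtWidgets import QApplication

def sample_once():
    # read one sample from the device here
    print("sampled")

app = QApplication(sys.argv)
timer = QTimer()
timer.timeout.connect(sample_once)   # the slot runs on every timeout
timer.start(10)                      # ~100 Hz; the Qt event loop stays responsive between ticks
sys.exit(app.exec_())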
I love the JavaScript setInterval pattern; my guess is this is more like what you want.
import threading

def setInterval(func, time):
    e = threading.Event()
    while not e.wait(time):
        func()

def foo():
    print "do poll here"

# using
setInterval(foo, 5)
https://stackoverflow.com/a/39842247/1598412
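Note that setInterval as written runs in the calling thread and never returns. A small sketch (an addition, not part of the linked answer) runs it on a daemon thread so the rest of the program, e.g. a GUI event loop, keeps going:
import threading

poller = threading.Thread(target=setInterval, args=(foo, 5))
poller.daemon = True    # let the program exit even while the poller keeps looping
poller.start()
# ... the main program / GUI event loop continues here ...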

How do I handle multiple requests to python program using zeromq and threading or async?

I have a little program which does some calculations in the background when I call it through the zerorpc module in Python 2.7.
Here is my code:
import zerorpc

is_busy = False

class Server(object):
    def calculateSomeStuff(self):
        global is_busy
        if is_busy:
            return 'I am busy!'
        is_busy = True
        # calculate some stuff
        is_busy = False
        print 'Done!'
        return

    def getIsBusy(self):
        return is_busy

s = zerorpc.Server(Server())
s.bind("tcp://0.0.0.0:66666")
s.run()
What should I change to make this program return is_busy when I call the .getIsBusy() method after .calculateSomeStuff() has started doing its job?
As far as I know, there is no way to make it asynchronous in Python 2.
You need multi-threading for real concurrency and to exploit more than one CPU core, if that is what you are after. See the Python threading module, the details of the GIL and possible workarounds, and the related literature.
If you want a cooperative solution, read on.
zerorpc uses gevent for asynchronous input/output. With gevent you write coroutines (also called greenlets or userland threads) which all run cooperatively on a single thread: the thread in which the gevent input/output loop is running. The gevent ioloop takes care of resuming coroutines that are waiting for some I/O event.
The key here is the word cooperative. Compare that to threads running on a single-core machine: effectively nothing runs concurrently, but the operating system preempts a running thread to execute the next one, and so on, so that every thread gets a fair chance of moving forward. This happens fast enough that it feels like all the threads are running at the same time.
If you write your code cooperatively with the gevent input/output loop, you can achieve the same effect by being careful of calling gevent.sleep(0) often enough to give a chance for the gevent ioloop to run other coroutines.
It's literally cooperative multithreading; I've heard it was like that in Windows 2 or something.
So, in your example, in the heavy computation part, you likely have some loop going on. Make sure to call gevent.sleep(0) a couple of times per second and you will have the illusion of multi-threading.
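As a rough sketch of that idea applied to the question's method (the loop and do_one_chunk are illustrative assumptions about where the heavy work lives):
import gevent

def calculateSomeStuff(self):
    global is_busy
    if is_busy:
        return 'I am busy!'
    is_busy = True
    for chunk in range(1000):
        do_one_chunk(chunk)    # hypothetical slice of the calculation
        gevent.sleep(0)        # yield so concurrent getIsBusy() calls get served
    is_busy = False
    return 'Done!'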
I hope my answer wasn't too confusing.

time.sleep that allows parent application to still evaluate?

I've run into situations as of late, when writing scripts for both Maya and Houdini, where I need to wait for aspects of the GUI to update before I can call the rest of my Python code. I thought calling time.sleep in both situations would fix my problem, but it seems that time.sleep just holds up the parent application as well. This means my script evaluates exactly the same regardless of whether or not the sleep is in there; it just pauses part way through.
I have a thought to run my script in a separate thread in Python to see if that will free up the application to still run during the sleep, but I haven't had time to test this yet.
Thought I would ask in the meantime if anybody knows of some other solution to this scenario.
Maya - or more precisely Maya Python - is not really multithreaded (Python itself has a dodgy kind of multithreading because all threads fight for the dreaded global interpreter lock, but that's not your problem here). You can run threaded code just fine in Maya using the threading module; try:
import time
import threading

def test():
    for n in range(0, 10):
        print "hello"
        time.sleep(1)

t = threading.Thread(target=test)
t.start()
That will print 'hello' to your listener 10 times at one second intervals without shutting down interactivity.
Unfortunately, many parts of Maya - including, most notably, ALL user-created UI and most kinds of scene manipulation - can only be run from the "main" thread, the one that owns the Maya UI. So you could not use the technique above in a script that changes the contents of a text box in a window (to make it worse, you'll get misleading error messages: code that works when you run it from the listener errors out when you call it from the thread, and returns completely wrong error codes). You can do things like network communication, writing to a file, or long calculations in a separate thread with no problem, but UI work and many common scene tasks will fail if you try to do them from a thread.
Maya has a partial workaround for this in the maya.utils module. You can use the functions executeDeferred and executeInMainThreadWithResult. These will wait for an idle time to run (which means, for example, that they won't run if you're playing back an animation) and then fire as if you'd done them in the main thread. The example from the Maya docs gives the idea:
import maya.utils
import maya.cmds

def doSphere(radius):
    maya.cmds.sphere(radius=radius)

maya.utils.executeInMainThreadWithResult(doSphere, 5.0)
This gets you most of what you want, but you need to think carefully about how to break up your task into threading-friendly chunks. And, of course, running threaded programs is always harder than the single-threaded alternative: you need to design the code so that things won't break if another thread messes with a variable while you're working. Good parallel programming is a whole big kettle of fish, although it boils down to a couple of basic ideas:
1) establish exclusive control over objects (for short operations) using RLocks when needed
2) put shared data into safe containers, like the Queue in #dylan's example (sketched below)
3) be really clear about which objects are shareable (they should be few!) and which aren't
Here's a decent (long) overview.
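For illustration, a small sketch of point 2 combined with executeDeferred (do_heavy_work and update_ui are hypothetical placeholders for the long task and the UI refresh):
import threading
import Queue                      # "queue" on Python 3
import maya.utils

results = Queue.Queue()

def long_task():
    data = do_heavy_work()        # safe in a worker thread: no UI, no scene edits
    results.put(data)
    # hand the UI update back to Maya's main thread at the next idle moment
    maya.utils.executeDeferred(update_ui)

threading.Thread(target=long_task).start()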
As for Houdini, I don't know for sure, but this article makes it sound like similar issues arise there.
A better solution, rather than sleep, is a while loop. Set up a while loop to check a shared value (or even a thread-safe structure like a Queue). The parent processes you are waiting on (or children; it's not important who spawns what) can do their work, and when they finish, they send a true/false/0/1/whatever to the Queue/variable, letting the other processes know that they may continue.

Non-blocking, non-concurrent tasks in Python

I am working on an implementation of a very small library in Python that has to be non-blocking.
On some production code, at some point, a call to this library will be done and it needs to do its own work, in its most simple form it would be a callable that needs to pass some information to a service.
This "passing information to a service" is a non-intensive task, probably sending some data to an HTTP service or something similar. It also doesn't need to be concurrent or to share information, however it does need to terminate at some point, possibly with a timeout.
I have used the threading module before and it seems the most appropriate thing to use, but the application where this library will be used is so big that I am worried about hitting the threading limit.
On local testing I was able to hit that limit at around ~2,500 threads spawned.
There is a good possibility (given the size of the application) that I could hit that limit easily. It also makes me wary of using a Queue, given the memory implications of placing tasks in it at a high rate.
I have also looked at gevent but I couldn't see an example of being able to spawn something that would do some work and terminate without joining. The examples I went through were calling .join() on a spawned Greenlet or on an array of greenlets.
I don't need to know the result of the work being done! It just needs to fire off and try to talk to the HTTP service and die with a sensible timeout if it didn't.
Have I misinterpreted the guides/tutorials for gevent? Is there any other way to spawn a callable in a fully non-blocking fashion that won't hit the ~2,500 limit?
This is a simple example in Threading that does work as I would expect:
from threading import Thread

class Synchronizer(Thread):
    def __init__(self, number):
        self.number = number
        Thread.__init__(self)

    def run(self):
        # Simulating some work
        import time
        time.sleep(5)
        print self.number

for i in range(4000):  # totally doesn't get past 2,500
    sync = Synchronizer(i)
    sync.setDaemon(True)
    sync.start()
    print "spawned a thread, number %s" % i
And this is what I've tried with gevent, where it obviously blocks at the end to see what the workers did:
import gevent

def task(pid):
    """
    Some non-deterministic task
    """
    gevent.sleep(1)
    print('Task', pid, 'done')

for i in range(100):
    gevent.spawn(task, i)
EDIT:
My problem stemmed from my lack of familiarity with gevent. While the Thread code was indeed spawning threads, it also prevented the script from terminating while it did some work.
gevent doesn't really do that in the code above unless you add a .join(). All I had to do to see the gevent code do some work with the spawned greenlets was to make it a long-running process. This definitely fixes my problem, as the code that needs to spawn the greenlets runs within a framework that is itself a long-running process.
Nothing requires you to call join in gevent, if you're expecting your main thread to last longer than any of your workers.
The only reason for the join call is to make sure the main thread lasts at least as long as all of the workers (so that the program doesn't terminate early).
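For example, a minimal fire-and-forget sketch inside a long-running gevent process (send_to_service is a hypothetical stand-in for the HTTP call):
import gevent

def send_to_service(data):
    with gevent.Timeout(5, False):    # give up silently after 5 seconds
        pass                          # perform the HTTP request here

def fire_and_forget(data):
    # no join(): the long-running main process outlives the greenlet
    gevent.spawn(send_to_service, data)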
Why not spawn a subprocess with a connected pipe or similar, and then, instead of a callable, just drop your data on the pipe and let the subprocess handle it completely out of band?
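A rough sketch of that approach (worker.py is a hypothetical script that reads lines from stdin and talks to the HTTP service on its own):
import json
import subprocess
import sys

worker = subprocess.Popen([sys.executable, "worker.py"], stdin=subprocess.PIPE)

def send(data):
    # drop the data on the pipe; worker.py handles delivery and timeouts out of band
    worker.stdin.write((json.dumps(data) + "\n").encode())
    worker.stdin.flush()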
As explained in Understanding Asynchronous/Multiprocessing in Python, the asyncoro framework supports asynchronous, concurrent processes. You can run tens or hundreds of thousands of concurrent processes; for reference, running 100,000 simple processes takes about 200 MB. If you want to, you can mix threads in the rest of the system with coroutines in asyncoro (provided the threads and coroutines don't share variables, but use coroutine interface functions to send messages etc.).
