Variable loops and synchronization

Variable loops and synchronization - python

def start(self):
self.running = True
while self.running:
pass
def shut_down(self):
self.running = False
Hi i want to knew a good way, to synchronise variable running. I want to have fast solution but i dont knew what will be better semaphores, mutexs or locks. I assume that shut_down is not often used.
This is my best solution but i think we can do this better.
def start(self):
self.__lock__.acquire()
self.running = True
while self.running:
self.__lock__.release()
self.__lock__.acquire()
def shut_down(self):
self.__lock__.acquire()
self.running = False
self.__lock__.release()

For your simple example with a primitive value like a Boolean flag, no synchronization is necessary in Python. At least, not in CPython (the standard interpreter you can download from python.org).
That's because the whole interpreter is covered by the "Global Interpreter Lock" so only one thread can be running at a time at the Python level (multiple threads could potentially be doing stuff at the same time in extension modules, if those modules are set up to release the GIL at appropriate times). So, when your worker thread looping in the start function checks the running attribute, it is guaranteed that that object is in a sane state. If no code other than shut_down modifies it, you can even be sure that it will be True or False.
So, if you are going to be using CPython and you're sticking to very simple logic (like a Boolean flag that is only ever written to by one thread), your first example code will work just fine.
If you need more complicated logic, like a counter that can be incremented by any of several threads, then you will need a bit of synchronization to avoid race conditions like TOCTTOU.
One of the easiest synchronization tools that Python offers is the queue module, which allows synchronized communication between threads in a FIFO manner. I can't speak to its performance compared to lower level stuff, but it's really easy to get code working correctly with queues (whereas its pretty easy to mess up manual locking, ending up with deadlocks or race conditions that are a nightmare to debug).

Related

Correctness of modified consumer/producer

I am creating a Sound class to play notes and would like feedback on the correctness and conciseness of my design. This class differs from the typical consumer/producer in two ways:
The consumer should respond to events, such as to shut down the thread, or otherwise continue forever. The typical consumer/producer exits when the queue is empty. For example, a thread waiting in queue.get cannot handle additional notifications.
Each set of notes submitted by the producer should overwrite any unprocessed notes remaining on the queue.
Originally I had the consumer process one note at a time using the queue module. I found continually acquiring and releasing the lock without any competition to be inefficient, and as previously noted, queue.get prevents waiting on additional events. So instead of building upon that, I rewrote it into:
import threading
queue = []
condition = threading.Condition()
interrupt = threading.Event()
stop = threading.Event()
def producer():
while some_condition:
ns = get_notes() # [(float,float)]
with condition:
queue.clear()
queue.append(ns)
interrupt.set()
condition.notify()
with condition:
stop.set()
condition.notify()
consumer.join()
def consumer():
while not stop.is_set():
with condition:
while not (queue or stop.is_set()):
condition.wait()
if stop.is_set():
break
interrupt.clear()
ns = queue.pop()
ss = gen_samples(ns) # iterator/fast
for b in grouper(ss, size/2):
if interrupt.is_set() or stop.is_set()
break
stream.write(b)
thread = threading.Thread(target=consumer)
thread.start()
producer()
My questions are as follows:
Is this thread-safe? I want to specifically point out my use of is_set without locks or synchronization (in the for-loop).
Can the events be replaced with boolean variables? I believe so as conflicting writes in both threads (data race) are guarded by the condition variable. There is a race condition between setting and checking events but I do not believe it affects program flow.
Is there a more efficient approach/algorithm utilizing different synchronization primitives from the threading module?
edit: Found and fixed a possible deadlock described in Why does Python threading.Condition() notify() require a lock?

Analyzing thread-safety in Python can take into account the Global Interpreter Lock (GIL): no two threads will execute Python code simultaneously. Assignments to variables or object fields are effectively atomic (there are no half-assigned variables) and changes propagate effectively immediately to other threads.
This means that your use of Event.is_set() is already equivalent to using plain booleans. An event is a bool guarded by a Condition. The is_set() method checks the boolean directly. The set() method acquires the Condition, sets the boolean, and notifies all waiting threads. The wait() methods waits until the set() method is invoked. The clear() method acquires the Condition and unsets the boolean. Since you never wait() for any Event, and setting the boolean is atomic, the Condition in the Event is effectively unused.
This might get rid of a couple of locks, but isn't really a huge efficiency win. A Condition is still an abstraction over a lock, but the built-in Queue type uses locks directly. Thus, I would assume that the built-in queue is no less performant than your solution, even for a single consumer.
Your main issue with the built-in queue is that “continually acquiring and releasing the lock without any competition [is] inefficient”. This is wrong on two counts:
Due to Python's GIL, there is little competition in either case.
Acquiring uncontested locks is very efficient.
So while your solution is probably sufficiently correct (I can see no opportunity for deadlock) it is unlikely to be particularly efficient. (There are just some small mistakes, like using stop instead of stop.is_set() and some syntax errors.)
If you are seeing poor performance with Python threads that's probably because of CPython, not because of the Queue type. I already mentioned that only one thread can run at a time due to the GIL. If multiple threads want to run, they must be scheduled by the operating system to do so and acquire the GIL. Each thread will wait for 5ms before asking the running thread to give up the GIL (in a manner quite similar to your interrupt flag). And then the thread can do useful work like acquiring a lock for a critical section that must not be interrupted by other threads.
Possibly, the solution could be to avoid CPython's threads.
If you have multiple CPU-bound tasks, you must use multiple processes. CPython's threads will not run in parallel. However, communication between processes is more expensive.
Consider whether you can combine the producer+consumer directly, possibly using features such as generators.
For an easier time with juggling multiple tasks in the same thread, consider using async/await. Event loops are provided by the asyncio module. This is just as fast as Python's threads, with the caveat that tasks don't pre-empt (interrupt) each other. But this can be advantage: since a task can only be suspended at an await, you don't need most locks and it is easier to reason about correctness of the code. The downside is that async/await might have even higher latency than using threads.
Python has a concept of “executors” that make it easy and efficient to run tasks in separate threads (for I/O-bound tasks) or separate processes (for CPU-bound tasks).
For communicating between multiple processes, use the types from the multiprocessing module (e.g. Queue, Connection, or Value).

threading.Lock() performance issues

I have multiple threads:
dispQ = Queue.Queue()
stop_thr_event = threading.Event()
def worker (stop_event):
while not stop_event.wait(0):
try:
job = dispQ.get(timeout=1)
job.waitcount -= 1
dispQ.task_done()
except Queue.Empty, msg:
continue
# create job objects and put into dispQ here
for j in range(NUM_OF_JOBS):
j = Job()
dispQ.put(j)
# NUM_OF_THREADS could be 10-20 ish
running_threads = []
for t in range(NUM_OF_THREADS):
t1 = threading.Thread( target=worker, args=(stop_thr_event,) )
t1.daemon = True
t1.start()
running_threads.append(t1)
stop_thr_event.set()
for t in running_threads:
t.join()
The code above was giving me some very strange behavior.
I've ended up finding out that it was due to decrementing waitcount with out a lock
I 've added an attribute to Job class self.thr_lock = threading.Lock()
Then I've changed it to
with job.thr_lock:
job.waitcount -= 1
This seems to fix the strange behavior but it looks like it has degraded in performance.
Is this expected? is there way to optimize locking?
Would it be better to have one global lock rather than one lock per job object?

About the only way to "optimize" threading would be to break the processing down in blocks or chunks of work that can be performed at the same time. This mostly means doing input or output (I/O) because that is the only time the interpreter will release the Global Interpreter Lock, aka the GIL.
In actuality there is often no gain or even a net slow-down when threading is added due to the overhead of using it unless the above condition is met.
It would probably be worse if you used a single global lock for all the shared resources because it would make parts of the program wait when they really didn't need to do so since it wouldn't distinguish what resource was needed so unnecessary waiting would occur.
You might find the PyCon 2015 talk David Beasley gave titled Python Concurrency From the Ground Up of interest. It covers threads, event loops, and coroutines.

It's hard to answer your question based on your code. Locks do have some inherent cost, nothing is free, but normally it is quite small. If your jobs are very small, you might want to consider "chunking" them, that way you have many fewer acquire/release calls relative to the amount of work being done by each thread.
A related but separate issue is one of threads blocking each other. You might notice large performance issues if many threads are waiting on the same lock(s). Here your threads are sitting idle waiting on each other. In some cases this cannot be avoided because there is a shared resource which is a performance bottlenecking. In other cases you can re-organize your code to avoid this performance penalty.
There are some things in your example code that make me thing that it might be very different from actual application. First, your example code doesn't share job objects between threads. If you're not sharing job objects you shouldn't need locks on them. Second, as written your example code might not empty the queue before finishing. It will exit as soon as you hit stop_thr_event.set() leaving any remaining jobs in queue, is this by design?

How Do I Queue My Python Locks?

Is there a way to make python locks queued? I have been assuming thus far in my code that threading.lock operates on a queue. It looks like it just gives the lock to a random locker. This is bad for me, because the program (game) I'm working is highly dependent on getting messages in the right order. Are there queued locks in python? If so, how much will I lose on processing time?

I wholly agree with the comments claiming that you're probably thinking about this in an unfruitful way. Locks provide serialization, and aren't at all intended to provide ordering. The bog-standard, easy, and reliable way to enforce an order is to use a Queue.Queue
CPython leaves it up to the operating system to decide in which order locks are acquired. On most systems, that will appear to be more-or-less "random". That cannot be changed.
That said, I'll show a way to implement a "FIFO lock". It's neither hard nor easy - somewhere in between - and you shouldn't use it ;-) I'm afraid only you can answer your "how much will I lose on processing time?" question - we have no idea how heavily you use locks, or how much lock contention your application provokes. You can get a rough feel by studying this code, though.
import threading, collections
class QLock:
def __init__(self):
self.lock = threading.Lock()
self.waiters = collections.deque()
self.count = 0
def acquire(self):
self.lock.acquire()
if self.count:
new_lock = threading.Lock()
new_lock.acquire()
self.waiters.append(new_lock)
self.lock.release()
new_lock.acquire()
self.lock.acquire()
self.count += 1
self.lock.release()
def release(self):
with self.lock:
if not self.count:
raise ValueError("lock not acquired")
self.count -= 1
if self.waiters:
self.waiters.popleft().release()
def locked(self):
return self.count > 0
Here's a little test driver, which can be changed in the obvious way to use either this QLock or a threading.Lock:
def work(name):
qlock.acquire()
acqorder.append(name)
from time import sleep
if 0:
qlock = threading.Lock()
else:
qlock = QLock()
qlock.acquire()
acqorder = []
ts = []
for name in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
t = threading.Thread(target=work, args=(name,))
t.start()
ts.append(t)
sleep(0.1) # probably enough time for .acquire() to run
for t in ts:
while not qlock.locked():
sleep(0) # yield time slice
qlock.release()
for t in ts:
t.join()
assert qlock.locked()
qlock.release()
assert not qlock.locked()
print "".join(acqorder)
On my box just now, 3 runs using threading.Lock produced this output:
BACDEFGHIJKLMNOPQRSTUVWXYZ
ABCDEFGHIJKLMNOPQRSUVWXYZT
ABCEDFGHIJKLMNOPQRSTUVWXYZ
So it's certainly not random, but neither is it wholly predictable. Running it with the QLock instead, the output should always be:
ABCDEFGHIJKLMNOPQRSTUVWXYZ

I stumbled upon this post because I had a similar requirement. Or at least I thought so.
My fear was that if the locks didn't release in FIFO order, thread starvation would be likely to happen, and that would be terrible for my software.
After reading a bit, I dismissed my fears and realized what everyone was saying: if you want this, you're doing it wrong. Also, I was convinced that you can rely on the OS to do its job and not let your thread starve.
To get to that point, I did a bit of digging to understand better how locks worked under Linux. I started by taking a look at the pthreads (Posix Threads) glibc source code and specifications, because I was working in C++ on Linux. I don't know if Python uses pthreads under the hood, but I'm gonna assume it probably is.
I didn't find any specification, on the multiple pthreads references around, relating to the order of unlocks.
What I found is: locks in pthreads on Linux are implemented using a kernel feature called futex.
http://man7.org/linux/man-pages/man2/futex.2.html
http://man7.org/linux/man-pages/man7/futex.7.html
A link on the references of the first of these pages leads you to this PDF:
https://www.kernel.org/doc/ols/2002/ols2002-pages-479-495.pdf
It explains a bit about unlocking strategies, and about how futexes work and are implemented in the Linux kernel, and a lot lot more.
And there I found what I wanted. It explains that futexes are implemented in the kernel in a way such that unlocks are mostly done in FIFO order (to increase fairness). However, that is not guaranteed, and it is possible that one thread might jump the line one in a while. They allow this to not complicate too much the code and allow for the good performance they achieved without losing it due to extreme measures to enforce the FIFO order.
So basically, what you have is:
The POSIX standard doesn't impose any requirement as to the order of locking and unlocking of mutexes. Any implementation is free to do as they want, so if you rely on this order, your code won't be portable (not even between different versions of the same platform).
The Linux implementation of the pthreads library rely on a feature/technique called futex to implement mutexes, and it mostly tries to do a FIFO style unlocking of mutexes, but it is not guaranteed that it will be done in that order.

Yes you could create a FIFO queue using a list of the thread IDs:
FIFO = [5,79,3,2,78,1,9...]
You would try to acquire the lock and if you can't, then push the attempting thread's ID (FIFO.insert(0,threadID)) onto the front of the queue and each time you release the lock, make sure that if a thread wants to acquire the lock it must have the thread ID at the end of the queue (threadID == FIFO[-1]). If the thread does have the thread ID at the end of the queue, then let it acquire the lock and then pop it off (FIFO.pop()). Repeat as necessary.

Python: time a method call and stop it if time is exceeded

I need to dynamically load code (comes as source), run it and get the results. The code that I load always includes a run method, which returns the needed results. Everything looks ridiculously easy, as usual in Python, since I can do
exec(source) #source includes run() definition
result = run(params)
#do stuff with result
The only problem is, the run() method in the dynamically generated code can potentially not terminate, so I need to only run it for up to x seconds. I could spawn a new thread for this, and specify a time for .join() method, but then I cannot easily get the result out of it (or can I). Performance is also an issue to consider, since all of this is happening in a long while loop
Any suggestions on how to proceed?
Edit: to clear things up per dcrosta's request: the loaded code is not untrusted, but generated automatically on the machine. The purpose for this is genetic programming.

The only "really good" solutions -- imposing essentially no overhead -- are going to be based on SIGALRM, either directly or through a nice abstraction layer; but as already remarked Windows does not support this. Threads are no use, not because it's hard to get results out (that would be trivial, with a Queue!), but because forcibly terminating a runaway thread in a nice cross-platform way is unfeasible.
This leaves high-overhead multiprocessing as the only viable cross-platform solution. You'll want a process pool to reduce process-spawning overhead (since presumably the need to kill a runaway function is only occasional, most of the time you'll be able to reuse an existing process by sending it new functions to execute). Again, Queue (the multiprocessing kind) makes getting results back easy (albeit with a modicum more caution than for the threading case, since in the multiprocessing case deadlocks are possible).
If you don't need to strictly serialize the executions of your functions, but rather can arrange your architecture to try two or more of them in parallel, AND are running on a multi-core machine (or multiple machines on a fast LAN), then suddenly multiprocessing becomes a high-performance solution, easily paying back for the spawning and IPC overhead and more, exactly because you can exploit as many processors (or nodes in a cluster) as you can use.

You could use the multiprocessing library to run the code in a separate process, and call .join() on the process to wait for it to finish, with the timeout parameter set to whatever you want. The library provides several ways of getting data back from another process - using a Value object (seen in the Shared Memory example on that page) is probably sufficient. You can use the terminate() call on the process if you really need to, though it's not recommended.

You could also use Stackless Python, as it allows for cooperative scheduling of microthreads. Here you can specify a maximum number of instructions to execute before returning. Setting up the routines and getting the return value out is a little more tricky though.

I could spawn a new thread for this, and specify a time for .join() method, but then I cannot easily get the result out of it
If the timeout expires, that means the method didn't finish, so there's no result to get. If you have incremental results, you can store them somewhere and read them out however you like (keeping threadsafety in mind).
Using SIGALRM-based systems is dicey, because it can deliver async signals at any time, even during an except or finally handler where you're not expecting one. (Other languages deal with this better, unfortunately.) For example:
try:
# code
finally:
cleanup1()
cleanup2()
cleanup3()
A signal passed up via SIGALRM might happen during cleanup2(), which would cause cleanup3() to never be executed. Python simply does not have a way to terminate a running thread in a way that's both uncooperative and safe.
You should just have the code check the timeout on its own.
import threading
from datetime import datetime, timedelta
local = threading.local()
class ExecutionTimeout(Exception): pass
def start(max_duration = timedelta(seconds=1)):
local.start_time = datetime.now()
local.max_duration = max_duration
def check():
if datetime.now() - local.start_time > local.max_duration:
raise ExecutionTimeout()
def do_work():
start()
while True:
check()
# do stuff here
return 10
try:
print do_work()
except ExecutionTimeout:
print "Timed out"
(Of course, this belongs in a module, so the code would actually look like "timeout.start()"; "timeout.check()".)
If you're generating code dynamically, then generate a timeout.check() call at the start of each loop.

Consider using the stopit package that could be useful in some cases you need timeout control. Its doc emphasizes the limitations.
https://pypi.python.org/pypi/stopit

a quick google for "python timeout" reveals a TimeoutFunction class

Executing untrusted code is dangerous, and should usually be avoided unless it's impossible to do so. I think you're right to be worried about the time of the run() method, but the run() method could do other things as well: delete all your files, open sockets and make network connections, begin cracking your password and email the result back to an attacker, etc.
Perhaps if you can give some more detail on what the dynamically loaded code does, the SO community can help suggest alternatives.

How do threads work in Python, and what are common Python-threading specific pitfalls?

I've been trying to wrap my head around how threads work in Python, and it's hard to find good information on how they operate. I may just be missing a link or something, but it seems like the official documentation isn't very thorough on the subject, and I haven't been able to find a good write-up.
From what I can tell, only one thread can be running at once, and the active thread switches every 10 instructions or so?
Where is there a good explanation, or can you provide one? It would also be very nice to be aware of common problems that you run into while using threads with Python.

Yes, because of the Global Interpreter Lock (GIL) there can only run one thread at a time. Here are some links with some insights about this:
http://www.artima.com/weblogs/viewpost.jsp?thread=214235
http://smoothspan.wordpress.com/2007/09/14/guido-is-right-to-leave-the-gil-in-python-not-for-multicore-but-for-utility-computing/
From the last link an interesting quote:
Let me explain what all that means.
Threads run inside the same virtual
machine, and hence run on the same
physical machine. Processes can run
on the same physical machine or in
another physical machine. If you
architect your application around
threads, you’ve done nothing to access
multiple machines. So, you can scale
to as many cores are on the single
machine (which will be quite a few
over time), but to really reach web
scales, you’ll need to solve the
multiple machine problem anyway.
If you want to use multi core, pyprocessing defines an process based API to do real parallelization. The PEP also includes some interesting benchmarks.

Python's a fairly easy language to thread in, but there are caveats. The biggest thing you need to know about is the Global Interpreter Lock. This allows only one thread to access the interpreter. This means two things: 1) you rarely ever find yourself using a lock statement in python and 2) if you want to take advantage of multi-processor systems, you have to use separate processes. EDIT: I should also point out that you can put some of the code in C/C++ if you want to get around the GIL as well.
Thus, you need to re-consider why you want to use threads. If you want to parallelize your app to take advantage of dual-core architecture, you need to consider breaking your app up into multiple processes.
If you want to improve responsiveness, you should CONSIDER using threads. There are other alternatives though, namely microthreading. There are also some frameworks that you should look into:
stackless python
greenlets
gevent
monocle

Below is a basic threading sample. It will spawn 20 threads; each thread will output its thread number. Run it and observe the order in which they print.
import threading
class Foo (threading.Thread):
def __init__(self,x):
self.__x = x
threading.Thread.__init__(self)
def run (self):
print str(self.__x)
for x in xrange(20):
Foo(x).start()
As you have hinted at Python threads are implemented through time-slicing. This is how they get the "parallel" effect.
In my example my Foo class extends thread, I then implement the run method, which is where the code that you would like to run in a thread goes. To start the thread you call start() on the thread object, which will automatically invoke the run method...
Of course, this is just the very basics. You will eventually want to learn about semaphores, mutexes, and locks for thread synchronization and message passing.

Note: wherever I mention thread i mean specifically threads in python until explicitly stated.
Threads work a little differently in python if you are coming from C/C++ background. In python, Only one thread can be in running state at a given time.This means Threads in python cannot truly leverage the power of multiple processing cores since by design it's not possible for threads to run parallelly on multiple cores.
As the memory management in python is not thread-safe each thread require an exclusive access to data structures in python interpreter.This exclusive access is acquired by a mechanism called GIL ( global interpretr lock ).
Why does python use GIL?
In order to prevent multiple threads from accessing interpreter state simultaneously and corrupting the interpreter state.
The idea is whenever a thread is being executed (even if it's the main thread), a GIL is acquired and after some predefined interval of time the
GIL is released by the current thread and reacquired by some other thread( if any).
Why not simply remove GIL?
It is not that its impossible to remove GIL, its just that in prcoess of doing so we end up putting mutiple locks inside interpreter in order to serialize access, which makes even a single threaded application less performant.
so the cost of removing GIL is paid off by reduced performance of a single threaded application, which is never desired.
So when does thread switching occurs in python?
Thread switch occurs when GIL is released.So when is GIL Released?
There are two scenarios to take into consideration.
If a Thread is doing CPU Bound operations(Ex image processing).
In Older versions of python , Thread switching used to occur after a fixed no of python instructions.It was by default set to 100.It turned out that its not a very good policy to decide when switching should occur since the time spent executing a single instruction can
very wildly from millisecond to even a second.Therefore releasing GIL after every 100 instructions regardless of the time they take to execute is a poor policy.
In new versions instead of using instruction count as a metric to switch thread , a configurable time interval is used.
The default switch interval is 5 milliseconds.you can get the current switch interval using sys.getswitchinterval().
This can be altered using sys.setswitchinterval()
If a Thread is doing some IO Bound Operations(Ex filesystem access or
network IO)
GIL is release whenever the thread is waiting for some for IO operation to get completed.
Which thread to switch to next?
The interpreter doesn’t have its own scheduler.which thread becomes scheduled at the end of the interval is the operating system’s decision. .

Use threads in python if the individual workers are doing I/O bound operations. If you are trying to scale across multiple cores on a machine either find a good IPC framework for python or pick a different language.

One easy solution to the GIL is the multiprocessing module. It can be used as a drop in replacement to the threading module but uses multiple Interpreter processes instead of threads. Because of this there is a little more overhead than plain threading for simple things but it gives you the advantage of real parallelization if you need it.
It also easily scales to multiple physical machines.
If you need truly large scale parallelization than I would look further but if you just want to scale to all the cores of one computer or a few different ones without all the work that would go into implementing a more comprehensive framework, than this is for you.

Try to remember that the GIL is set to poll around every so often in order to do show the appearance of multiple tasks. This setting can be fine tuned, but I offer the suggestion that there should be work that the threads are doing or lots of context switches are going to cause problems.
I would go so far as to suggest multiple parents on processors and try to keep like jobs on the same core(s).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.