What could possibly go wrong when not locking the write operation here?

Assume a class WorkerThread that has a field running indicating whether the thread should continue working after it has been started.
import threading

class WorkerThread(threading.Thread):
    running = False

    def run(self):
        self.running = True
        while self.running:
            # .. do some important stuff
            pass

def main():
    t = WorkerThread()
    t.start()
    # .. do other important stuff
    t.running = False
    t.join()
Is there something that could possibly go wrong when modifying t.running from the main thread, without locking the read and write operations to this field? What is it?

The main thread and the worker thread could run on cores that do not share a cache. In the absence of synchronization, the write to t.running might never be propagated from the main thread's cache to the worker thread's cache.
What synchronization means is not just "I want exclusive access". It also means "I want to publish my writes to other threads, and to see the writes from other threads". No synchronization means that you do not require those things. Not synchronizing doesn't prevent them from happening (and on some systems/architectures they will happen more often than on others); it just fails to guarantee that they will happen.
In practice you might find that, provided CPython is taking the GIL at regular intervals, these things sort themselves out even on architectures that, unlike Intel, do not have coherent caches.

For your requirements, use a threading.Event object instead of a bare flag.
import threading

class WorkerThread(threading.Thread):
    def __init__(self):
        super(WorkerThread, self).__init__()
        self.running = threading.Event()

    def run(self):
        self.running.set()
        while self.running.is_set():
            # .. do some important stuff
            pass

    def halt(self):
        self.running.clear()

def main():
    t = WorkerThread()
    t.start()
    # .. do other important stuff
    t.halt()
    t.join()
And to check whether the thread is still running, use t.is_alive().

The field "running" is shared state, you need to guard it with some kind of monitor. In the absence of synchronizing access to this shared state, its visibility semantics are difficult to reason about and you'll get unexpected results.


On Ctrl-D, call Close() like what happens with file objects

I've written a class that inherits from object and has instances of sub-objects that use some threads for tasks. There are two socket listeners that create other threads for each accepted connection. They do what they have to do. To finish, they watch a threading.Event object that tells them to stop.
I've noticed that when I exit the Python console they are not notified (or don't catch the notification), and the exit doesn't return control to the bash console unless Close() has been called first.
My first idea to fix it was to implement the '__del__' method, relying on the garbage collector to clean things up at exit.
class ServiceProvider(object):
    def __init__(self):
        super(ServiceProvider, self).__init__()
        #...
        self.Open()

    def Open(self):
        #... Some threads are created.
        pass

    def Close(self):
        #... threading.Event used to tell the threads to finish
        pass

    def __del__(self):
        self.Close()
But the behaviour is the same. If I place a print in those methods, it is written neither in '__del__' nor in 'Close'. Only if Close() has already been called does the print in '__del__' get written.
Then I implemented the '__enter__' and '__exit__' methods to support the with statement. The exit then behaves as expected: when the with block ends, things are released. But what I really want is something like file descriptors, where even if file.close() is not called, it is executed when the program exits.
class ServiceProvider(object):
    #...
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.Close()
Searching for more solutions, I tried atexit, but with similar results that don't fix the issue. Even if I collect all the objects created from this class, doOnExit only writes its print if the objects in the list have already been closed.
import atexit

objects2Close = []

@atexit.register
def doOnExit():
    for obj in objects2Close:
        obj.Close()

class ServiceProvider(object):
    def __init__(self):
        super(ServiceProvider, self).__init__()
        objects2Close.append(self)
It's usually a good idea to use with when you have resources that you don't want to leak (files, connections, whatever else you care about).
Somewhere, just outside your main loop you should have something like:
with ServiceProvider(some_params) as service_provider:
    rest_of_the_code()
What this does is that regardless of how you exit rest_of_the_code() (except for kill -9), it will call service_provider.Close() at the end. This works for exceptions and interrupts as well. kill -9 doesn't work because the process is killed at the OS level and doesn't get a chance to clean up.
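For reference, the with block above behaves roughly like this try/finally (a sketch, not the exact expansion the interpreter performs):

service_provider = ServiceProvider(some_params)
try:
    rest_of_the_code()
finally:
    # Runs on normal return, exceptions, and Ctrl-C, but not on kill -9,
    # exactly as described above.
    service_provider.Close()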
I've found a solution for this issue. The information posted in this question was not related to the real problem.
It is as simple as using daemon threads.
As the implementation uses some threads to listen for remote connections, they have to finish their execution when the program exits. But the program only ends once every non-daemon thread has finished.
By mistake, those listeners and talkers were not set as daemons, and that's why exit waits for them.
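A minimal sketch of that fix: mark the listener threads as daemons before starting them, so the interpreter does not wait for them at exit (the listen function here is an illustrative stand-in):

import threading

def listen():
    # .. accept remote connections forever
    pass

listener = threading.Thread(target=listen)
listener.daemon = True  # daemon threads don't keep the process alive
listener.start()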

Reactive event loop in Python

I'm trying to build a system that collects data from several sources using I/O (HDD, network, ...).
For this, I have a class (controller) that launches the collectors.
Each collector is an infinite loop with a classic ETL process (extract, transform and load).
I want to send some commands to the collectors (stop, reload settings, ...) from an interface (CLI, web, ...), and I'm not sure how to do it.
For example, this is the skeleton for a collector:
class Collector(object):
    def __init__(self):
        self.reload_settings()

    def reload_settings(self):
        # Get the settings
        # Set the settings as attributes
        pass

    def process_data(self, data):
        # Do something
        pass

    def run(self):
        while True:
            data = retrieve_data()
            self.process_data(data)
And this is the skeleton for the controller:
class Controller(object):
    def __init__(self, collectors):
        self.collectors = collectors

    def run(self):
        for collector in self.collectors:
            collector.run()

    def reload_settings(self):
        ??

    def stop(self):
        ??
Is there a classic design pattern that solves this problem (Publish–subscribe, event loop, reactor...)? What is the best way to solve this problem?
PS: Obviously, this will be a multiprocess application and will run on a single machine.
There are multiple choices here, but they boil down to two major kinds: cooperative (event loop/reactor/coroutine/explicit greenlet), or preemptive (implicit greenlet/thread/multiprocess).
The first requires a lot more restructuring of your collectors. It can be a nice way to make the nondeterminism explicit, or to achieve massive concurrency, but neither of those seems relevant here. The second just requires sticking the collectors on threads, and using some synchronization mechanism for both communication and shared data. It seems like you have no shared data, and your communication is trivial and not highly time-sensitive. So, I'd go with threads.
Assuming you want to go with threads in the general sense, and given that your collectors are I/O-bound and you don't have dozens of them, I'd go with actual threads.
So, here's one way you can write it:
import threading

class Collector(threading.Thread):
    def __init__(self):
        super(Collector, self).__init__()
        # Create the events before the first settings load, which clears
        # the reload flag.
        self._need_reload = threading.Event()
        self._need_stop = threading.Event()
        self._reload_settings()

    def _reload_settings(self):
        # Get the settings
        # Set the settings as attributes
        self._need_reload.clear()

    def reload_settings(self):
        self._need_reload.set()

    def stop(self):
        self._need_stop.set()

    def process_data(self, data):
        # Do something
        pass

    def run(self):
        while not self._need_stop.is_set():
            if self._need_reload.is_set():
                self._reload_settings()
            data = retrieve_data()
            self.process_data(data)

class Controller(object):
    def __init__(self, collectors):
        self.collectors = collectors

    def run(self):
        for collector in self.collectors:
            collector.start()

    def reload_settings(self):
        for collector in self.collectors:
            collector.reload_settings()

    def stop(self):
        for collector in self.collectors:
            collector.stop()
        for collector in self.collectors:
            collector.join()
(Although I'd call the Controller.run method start, because that fits better with the naming used not only by Thread, but also by the stdlib server classes and other similar things.)
I'd look at the possibility of adapting your case to a socket-based client-server architecture, where the Controller would instantiate the required number of Collectors, each listening on its own port and handling received data more elegantly through the server's handle() method. The fact that data comes from various I/O sources speaks even more for this solution: you could use the client part of this architecture to standardize the DataSource -> Collector protocol.
https://docs.python.org/2/library/socketserver.html
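A minimal sketch of that suggestion using the stdlib SocketServer module (the handler name, port, and process_data hook are illustrative assumptions, not part of the question):

import SocketServer

class CollectorHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        # Each line received from a data source is one unit of work.
        for line in self.rfile:
            process_data(line)  # hypothetical ETL hook

server = SocketServer.TCPServer(('localhost', 9000), CollectorHandler)
server.serve_forever()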

correct way to update attributes on one thread from another

I can best explain this with example code first:
class reciever(threading.Thread, simple_server):
    def __init__(self, callback):
        threading.Thread.__init__(self)
        self.callback = callback

    def run(self):
        self.serve_forever(self.callback)

class sender(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.parameter = 50

    def run(self):
        while True:
            # do some processing in general
            # ....
            # send some udp messages derived from self.parameter
            send_message(self.parameter)

if __name__ == '__main__':
    osc_send = sender()
    osc_send.start()

    def update_parameter(val):
        osc_send.parameter = val

    osc_recv = reciever(update_parameter)
    osc_recv.start()
The pieces I have left out are hopefully self-explanatory from the code that's there.
My question is, is this a safe way to use a server running in a thread to update the attributes on a separate thread that could be reading the value at any time?
The way you're updating that parameter is actually thread-safe already, because of the Global Interpreter Lock (GIL). The GIL means that Python only allows one thread to execute byte-code at a time, so it is impossible for one thread to be reading from parameter at the same moment another thread is writing to it. Reading and setting an attribute are each a single, atomic byte-code operation; one will always start and complete before the other can happen. You would only need to introduce synchronization primitives if you needed to do operations that span more than one byte-code operation from more than one thread (e.g. incrementing parameter from multiple threads).
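To illustrate that caveat: an increment is a read-modify-write, so it does need a lock. A minimal sketch:

import threading

lock = threading.Lock()

def increment(obj):
    # 'obj.parameter += 1' compiles to several byte-code operations
    # (load, add, store), so another thread could interleave between
    # them without the lock.
    with lock:
        obj.parameter += 1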

Why does the instance need to be recreated when restarting a thread?

Imagine the following classes:
from time import sleep
import threading

class Object(threading.Thread):
    # some initialisation blabla
    def run(self):
        while True:
            # do something
            sleep(1)

class Checker():
    def check_if_thread_is_alive(self):
        o = Object()
        o.start()
        while True:
            if not o.is_alive():
                o.start()
I want to restart the thread in case it is dead. This doesn't work, because a thread can only be started once. First question: why is this?
As far as I know, I have to recreate each instance of Object and call start() to start the thread again. In the case of complex objects this is not very practical: I have to read the current values of the old object, create a new one, and set the parameters in the new object with the old values. Second question: can this be done in a smarter, easier way?
The reason threading.Thread is implemented that way is to keep the correspondence between a thread object and the operating system's thread. In major OSs, threads cannot be restarted, but you may create another thread with another thread id.
If recreation is a problem, there is no need to inherit your class from threading.Thread; just pass a target parameter to Thread's constructor, like this:
import threading

class MyObj(object):
    def __init__(self):
        self.thread = threading.Thread(target=self.run)

    def run(self):
        # ... the thread's work goes here
        pass
Then you may access the thread member to control your thread's execution, and recreate it as needed. No MyObj recreation is required.
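A sketch of how that recreation might look in use (assuming the MyObj above):

obj = MyObj()
obj.thread.start()
obj.thread.join()
# Recreate only the Thread; the MyObj instance and its state survive.
obj.thread = threading.Thread(target=obj.run)
obj.thread.start()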
See here:
http://docs.python.org/2/library/threading.html#threading.Thread.start
It must be called at most once per thread object. It arranges for the
object’s run() method to be invoked in a separate thread of control.
This method will raise a RuntimeError if called more than once on the
same thread object.
A thread isn't intended to run more than once. You might want to use a thread pool.
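As a sketch, a pool via concurrent.futures (stdlib in Python 3; the same API is available on Python 2 from the 'futures' backport) reuses long-lived workers instead of restarting a Thread; the task function is illustrative:

from concurrent.futures import ThreadPoolExecutor

def task(n):
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    # Tasks are distributed over the pool's worker threads.
    results = list(pool.map(task, range(10)))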
I believe that has to do with how the Thread class is implemented. It wraps a real OS thread, so restarting the thread would actually change its identity, which might be confusing.
A better way to deal with threads is actually through target functions/callables:
from threading import Thread

class Worker(object):
    """ Implements the logic to be run in separate threads """
    def __call__(self):
        # do useful stuff and change the state
        pass

class Supervisor():
    def run(self, worker):
        thr = None
        while True:
            if not thr or not thr.is_alive():
                thr = Thread(target=worker)
                thr.daemon = True
                thr.start()
            thr.join(1)  # give it some time
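Wiring the two together might look like this (a sketch; Worker's actual state and exit condition are up to you):

supervisor = Supervisor()
supervisor.run(Worker())  # blocks, respawning the worker whenever it dies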

Python threading question

I have a Python application with two threads. Each thread operates within a separate GUI. The GUIs need to operate independently without blocking. I am trying to figure out how to make thread_1 trigger an event in thread_2.
I want function foo to trigger function bar in the simplest, most elegant way, as quickly as possible and without consuming unnecessary resources. Below is what I've come up with.
import threading
import time

bar_trigger = False  # global trigger for function bar
lock = threading.Lock()

class Thread_2(threading.Thread):
    def run(self):
        global bar_trigger
        while True:
            lock.acquire()
            if bar_trigger == True:
                self.bar()  # function I want to happen
                bar_trigger = False
            lock.release()
            time.sleep(100)  # sleep to preserve resources
            # would like to preserve as much resources as possible
            # and sleep as little as possible

    def bar(self):
        print "Bar!"

class Thread_1(threading.Thread):
    def foo(self):
        global bar_trigger
        lock.acquire()
        bar_trigger = True  # trigger for bar in thread2
        lock.release()
Is there a better way to accomplish this? I'm not a threading expert, so any advice on how to best trigger a method in thread_2 from within thread_1 is appreciated.
Without knowing what you're doing and what GUI framework you're using, I can't go into much more detail, but from your code snippet it sounds like you're looking for something called condition variables.
Python includes them by default in the threading module, as threading.Condition. You might be interested in threading.Event as well.
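A minimal sketch of the condition-variable approach (the trigger flag and bar function come from the question; the loop structure is an assumption):

import threading

cond = threading.Condition()
bar_trigger = False

def waiter():  # runs in Thread_2
    global bar_trigger
    while True:
        with cond:
            while not bar_trigger:
                cond.wait()  # sleeps until notified; no polling
            bar_trigger = False
        bar()

def trigger():  # called from Thread_1
    global bar_trigger
    with cond:
        bar_trigger = True
        cond.notify()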
How are these threads instantiated? There should really be a main thread that oversees the workers. For example,
import time
import threading

class Worker(threading.Thread):
    def __init__(self, stopper):
        threading.Thread.__init__(self)
        self.stopper = stopper

    def run(self):
        while not self.stopper.is_set():
            print 'Hello from Worker!'
            time.sleep(1)

stop = threading.Event()
worker = Worker(stop)
worker.start()
# ...
stop.set()
Using a shared Event object is just one way of synchronizing and sending messages between threads. There are others, and their usages depend on the specifics.
One option would be to share a queue between the threads. Thread 1 would push an instruction into the queue, and thread 2 would poll it. When thread 2 sees the queue is non-empty, it reads off the first instruction and calls the appropriate function. This has the additional benefit of being fairly loosely coupled, which can make testing each thread in isolation easier.
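A minimal sketch of the queue approach, using Python 2's Queue module (the instruction names are illustrative):

import Queue

commands = Queue.Queue()

# Thread 1 pushes an instruction:
commands.put('bar')

# Thread 2 blocks until something arrives, then dispatches:
def worker_loop():
    while True:
        instruction = commands.get()  # blocks; no sleep/poll needed
        if instruction == 'bar':
            print "Bar!"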
