I am trying to create a global state variable, which is written in a callback method (event handler).
However, the callback apparently works on a (deep) copy at another memory location, which (of course) is not seen by the other methods.
Here is the situation:
from flask import Flask, request

class Server:
    def __init__(self):
        self.callbacks = []
        # create a web server instance to listen to requests
        self.app = Flask("test")

    def add_callback(self, func):
        self.callbacks.append(func)
        self.app.add_url_rule("/test", "test", self.handle_http_request)

    def handle_http_request(self):
        content = request.get_json(silent=True)
        for ca in self.callbacks:
            ca(content)

    def start_server(self):
        # some stuff starting flask here...
class SomeModule:
    def __init__(self):
        self.ws = Server()
        self.ws.add_callback(self.callback)
        self.callback_called = False

    def callback(self, content):
        print "callback executing---"
        print "var addr before callback assign: " + str(hex(id(self.callback_called)))
        self.callback_called = True
        print "var addr after callback assign: " + str(hex(id(self.callback_called)))

    def start(self):
        self.ws.start_server()
        # send a request to the server using the requests library, which invokes all the callbacks
        # check the state variable:
        print "var addr before check: " + str(hex(id(self.callback_called)))
        if not self.callback_called:
            raise Exception("error...")

if __name__ == '__main__':
    sm = SomeModule()
    sm.start()
The output is then:
callback executing---
var addr before callback assign: 0x927910
var addr after callback assign: 0x927930
var addr before check: 0x927910
Can anyone suggest a way to avoid this?
In C++ this would be straightforward with a pointer and a mutex. Here, however, I have not found a way to do a safe write to the variable...
Thanks a lot in advance!
Since your tags mention multithreading and not multiprocessing, I'll go ahead and post this answer; it might be helpful for some.
Some objects are immutable, so assigning to them rebinds the name rather than updating the object in place, which is what you are seeing. Others, such as dictionaries, are mutable and can be modified in place from functions or other threads.
If you pass a dict as a parameter to a thread, for instance, that dict can be modified inside the thread and the changes affect the original object.
However, doing this is risky: there may be update collisions and race conditions, and in general it is hard to keep track of where things happen.
But here's an example of how to pass a dict into a thread and present the change. It's crude, but gives you a working example.
from threading import *

class server(Thread):
    def __init__(self, o):
        self.o = o
        Thread.__init__(self)
        self.start()

    def run(self):
        for i in range(3):
            self.o['test'] = i

test_var = {'test': 0}
server(test_var)

# Stupid and oversimplified wait for threads to end.
# Note: enumerate() here is threading.enumerate(), pulled in by the star import.
while len(enumerate()) > 1:
    pass

print(test_var)
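If several threads may write to the shared dict at the same time, the update collisions mentioned above can be avoided with a threading.Lock. A minimal sketch (the names are illustrative, not from the question):

import threading

lock = threading.Lock()
shared = {'test': 0}

def worker(n):
    for _ in range(1000):
        with lock:              # serialize the read-modify-write of the shared dict
            shared['test'] += n

threads = [threading.Thread(target=worker, args=(1,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared)  # {'test': 4000}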
The problem was actually somewhere else. :( The reason was that Flask (the web server) was started in a separate thread, and there some instances were copied for some reason. I did not dig any further; I just avoided Flask, switched to CherryPy, and started it in non-blocking mode.
Things then started to work as expected.
@torxed, thank you very much for your help. It indeed pointed me in the right direction!
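For reference, a minimal sketch of starting CherryPy in non-blocking mode so that the calling code keeps running; the endpoint name and port are illustrative, not taken from the original setup:

import cherrypy

class TestEndpoint:
    @cherrypy.expose
    @cherrypy.tools.json_in()
    def test(self):
        content = cherrypy.request.json   # parsed JSON body of the request
        # ... hand `content` to the registered callbacks here ...
        return "ok"

cherrypy.config.update({'server.socket_port': 8080})
cherrypy.tree.mount(TestEndpoint(), '/')
cherrypy.engine.start()    # returns immediately; the server runs in background threads
# ... do other work, send test requests, check the state variable ...
# cherrypy.engine.exit()   # shut the server down when finished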
Related
I have a script, let's say "sensors.py", in which I have a class "Meas" that reads measurements from several sensors. These come from serial ports; the program makes some calculations on them and updates the class's self.variable_a value, along with other self variables. The readings are in continuous mode, i.e. the program automatically waits for a message from the sensor on the serial port and reads the whole line (done via the pyserial library). Some sensors transmit readings at a frequency of 10Hz, others 20Hz or 100Hz. This is a really big and messy class, therefore I put it in a separate file.
In my "main.py" script I import this "sensors" file and instantiate the "Meas" class. Now I have a problem. How can I immediately run some "on_changed_var_a" function in the "main" script, only when "variable_a" in the "Meas" object has changed, without consuming CPU power in a while loop (constantly checking whether the variable has changed) or waiting with time.sleep()? I need to react to changes in the sensor readings and then run other functions in the "main" script in the most efficient way, as fast as possible. Thanks in advance!
EDIT: added example files
"sensors.py" file:
import random
import time
import threading

running = True

class Meas1:
    def __init__(self, xyz):
        self.xyz = xyz
        self.var_a = None
        thr1 = threading.Thread(target=self.readings, daemon=True)
        thr1.start()

    def readings(self):
        while running:
            # simulating 5Hz sensor readings:
            self.var_a = self.xyz * random.randint(1, 1000)
            print(self.var_a)
            time.sleep(0.2)
"main.py" file:
import time
import sensors
import threading

class MainClass:
    def __init__(self):
        print("started")
        self.sensor1 = sensors.Meas1(xyz=7)
        thr_ksr = threading.Thread(target=self.thr_keep_script_running, daemon=True)
        thr_ksr.start()
        # in this part I would like to run the on_changed_var_a function,
        # immediately when var_a changes
        thr_ksr.join()

    def on_changed_var_a(self):
        print("var_a changed: ", self.sensor1.var_a)

    def thr_keep_script_running(self, t=10):
        time.sleep(t)
        sensors.running = False
        print("stopped, sleeping 1 sec")
        time.sleep(1)

mc = MainClass()
Not sure why this is tagged multithreading. Do you need this function to run on a different thread?
On to the problem. The easiest way would be to make Meas call a function that you pass to it.
You could make variable_a a property and then, in its setter, call the function you want. The function could be passed in and assigned to a self.call_on_a_change attribute, for example (see the sketch below).
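A minimal sketch of that idea; the attribute and callback names are illustrative:

class Meas1:
    def __init__(self, xyz, call_on_a_change=None):
        self.xyz = xyz
        self.call_on_a_change = call_on_a_change
        self._var_a = None

    @property
    def var_a(self):
        return self._var_a

    @var_a.setter
    def var_a(self, value):
        self._var_a = value
        if self.call_on_a_change is not None:
            self.call_on_a_change(value)   # note: runs in the reading thread

# usage: sensor = Meas1(xyz=7, call_on_a_change=my_handler)
# every `self.var_a = ...` assignment in readings() now triggers my_handler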
Edit:
I don't think there is a way to make a function execute on an already running thread (well, you could start a new one for that purpose, which sounds like a great solution to me).
Another problem with threads is that you give control to the system. It decides when, and for how long, each thread runs. So "as fast as possible" is constrained by that.
Nonetheless, you could create a threading.Lock and try to acquire it from the main thread. Then, in the reading thread, upon a change you release the Lock and allow the main thread to execute all the call_on_a_change callbacks. Something like this:
import time
import threading

lock = threading.Lock()
# change to locked
lock.acquire()

a_change_callbacks = []

def on_changed_var_a(new_a):
    print(new_a)

def readings():
    a_change_callbacks.append(lambda: on_changed_var_a('first `a` change'))
    lock.release()
    time.sleep(5)
    a_change_callbacks.append(lambda: on_changed_var_a('second `a` change'))
    lock.release()
    time.sleep(5)
    a_change_callbacks.append(lambda: on_changed_var_a('third `a` change'))
    lock.release()

thr = threading.Thread(target=readings, daemon=True)
thr.start()

while True:
    lock.acquire()
    for callback in list(a_change_callbacks):
        callback()
        a_change_callbacks.remove(callback)
    if not thr.is_alive():
        break
It's not your class model, but I hope it's enough to show the idea :D
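As a side note, a somewhat more conventional primitive for this kind of hand-off is threading.Event (or a queue.Queue), which avoids pairing acquire/release calls across threads. A minimal sketch, not the answerer's code:

import threading

a_changed = threading.Event()
latest_a = None

def readings_sketch():
    global latest_a
    latest_a = 'new value'   # illustrative change of var_a
    a_changed.set()          # wake up the main thread

threading.Thread(target=readings_sketch, daemon=True).start()

a_changed.wait()             # main thread sleeps here until set() is called
a_changed.clear()
print('var_a changed:', latest_a)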
I'm new to Python so please bear with my question.
Let's say my application has a module named message_printer which simply defines a print_message function to print the message. Now in my main file, I create two threads which call the print function in message_printer.
My question is: How can I set a different message per thread and access it in message_printer?
message_printer:
import threading

threadLocal = threading.local()

def print_message():
    name = getattr(threadLocal, 'name', None)
    print name
    return
main:
import threading
import message_printer

threadLocal = threading.local()

class Executor(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self)
        threadLocal.name = name

    def run(self):
        message_printer.print_message()

A = Executor("A")
A.start()
B = Executor("B")
B.start()
This just outputs None and None, while I expect A and B. I also tried accessing the threadLocal object directly inside the print_message function, but that doesn't work either.
Note that this is just an example. In my application, the exact use case is logging. Main launches a bunch of threads which call other modules. I want a different logger per thread (each thread should log to its own file), and each logger needs to be configured in Main. So I'm trying to instantiate a logger per thread and set it in thread-local storage, which can then be accessed in other modules.
What am I doing wrong? I'm following this question as an example: Thread local storage in Python
The problem with your code is that you are not assigning your name to the correct local() context. Your __init__() method is run in the main thread, before you start your A and B threads by calling .start().
Your first thread creation, A = Executor("A"), will create a new thread A but update the local context of the main thread. Then, when you start A by calling A.start(), you enter A's context, with a separate local context. Here name is not defined and you end up with None as output. The same then happens for B.
In other words, to access the thread local variables you should be running the current thread, which you are when running .start() (which will call your .run() method), but not when creating the objects (running __init__()).
To get your current code working, you could store the data in each object (using self references) and then, when each thread is running, copy the content to the thread local context:
import threading

threadLocal = threading.local()

def print_message():
    name = getattr(threadLocal, 'name', None)
    print name
    return

class Executor(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self)
        # Store name in object using self reference
        self.name = name

    def run(self):
        # Here we copy from object to local context,
        # since the thread is running
        threadLocal.name = self.name
        print_message()

A = Executor("A")
A.start()
B = Executor("B")
B.start()
Note, though, that in this situation it is somewhat overkill to use the thread local context, since we already store the separate data values in the different objects. Using the values directly from the objects would require a small rewrite of print_message(), though (sketched below).
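A minimal sketch of that rewrite, with print_message taking the value as a parameter instead of reading thread-local storage (Python 2 print syntax, matching the question):

# message_printer.py
def print_message(name):
    print name

# main.py
import threading
import message_printer

class Executor(threading.Thread):
    def __init__(self, name):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        # pass the per-thread value explicitly instead of via thread-local storage
        message_printer.print_message(self.name)

A = Executor("A")
A.start()
B = Executor("B")
B.start()

The same approach would work for the logging use case: configure each logger in Main, pass it to its Executor, and hand it to the other modules as an argument.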
I think this may be helpful for your use case. It is another way that thread-local storage can be done across files/modules.
I'm trying to build a system that collects data from some sources using I/O (HDD, network...)
For this, I have a class (controller) that launches the collectors.
Each collector is an infinite loop with a classic ETL process (extract, transform and load).
I want to send some commands to the collectors (stop, reload settings...) from an interface (CLI, web...) and I'm not sure how to do it.
For example, this is the skeleton for a collector:
class Collector(object):
    def __init__(self):
        self.reload_settings()

    def reload_settings(self):
        # Get the settings
        # Set the settings as attributes

    def process_data(self, data):
        # Do something

    def run(self):
        while True:
            data = retrieve_data()
            self.process_data(data)
And this is the skeleton for the controller:
class Controller(object):
    def __init__(self, collectors):
        self.collectors = collectors

    def run(self):
        for collector in self.collectors:
            collector.run()

    def reload_settings(self):
        ??

    def stop(self):
        ??
Is there a classic design pattern that solves this problem (Publish–subscribe, event loop, reactor...)? What is the best way to solve this problem?
PS: Obviously, this will be a multiprocess application and will run on a single machine.
There are multiple choices here, but they boil down to two major kinds: cooperative (event loop/reactor/coroutine/explicit greenlet), or preemptive (implicit greenlet/thread/multiprocess).
The first requires a lot more restructuring of your collectors. It can be a nice way to make the nondeterminism explicit, or to achieve massive concurrency, but neither of those seems relevant here. The second just requires sticking the collectors on threads, and using some synchronization mechanism for both communication and shared data. It seems like you have no shared data, and your communication is trivial and not highly time-sensitive. So, I'd go with threads.
Assuming you want to go with threads in the general sense, and that your collectors are I/O-bound and you don't have dozens of them, I'd go with actual threads.
So, here's one way you can write it:
import threading

class Collector(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self._need_reload = threading.Event()
        self._need_stop = threading.Event()
        self._reload_settings()

    def _reload_settings(self):
        # Get the settings
        # Set the settings as attributes
        self._need_reload.clear()

    def reload_settings(self):
        self._need_reload.set()

    def stop(self):
        self._need_stop.set()

    def process_data(self, data):
        # Do something

    def run(self):
        while not self._need_stop.is_set():
            if self._need_reload.is_set():
                self._reload_settings()
            data = retrieve_data()
            self.process_data(data)

class Controller(object):
    def __init__(self, collectors):
        self.collectors = collectors

    def run(self):
        for collector in self.collectors:
            collector.start()

    def reload_settings(self):
        for collector in self.collectors:
            collector.reload_settings()

    def stop(self):
        for collector in self.collectors:
            collector.stop()
        for collector in self.collectors:
            collector.join()
(Although I'd call the Controller.run method start, because that fits in better with the naming used not only by Thread, but also by the stdlib server classes and other similar things.)
I'd look at the possibility of adapting your case to a socket-based client-server architecture, where the Controller would instantiate the required number of Collectors, each listening on its own port and handling received data in a more elegant way through the handle() method of the server (see the sketch after the link below). The fact that data comes from various I/O sources speaks even more for this solution - you could use the Client part of this architecture to standardize the DataSource -> Collector protocol
https://docs.python.org/2/library/socketserver.html
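A minimal sketch of that idea using the stdlib socketserver module (Python 3 module name; the ports and class names are illustrative, not from the question):

import socketserver
import threading

class CollectorHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # each line received from a data source is one "extract" step
        for line in self.rfile:
            data = line.decode().strip()
            # transform / load would go here
            print('collected:', data)

class Collector(threading.Thread):
    def __init__(self, port):
        threading.Thread.__init__(self, daemon=True)
        self.server = socketserver.TCPServer(('localhost', port), CollectorHandler)

    def run(self):
        self.server.serve_forever()

    def stop(self):
        self.server.shutdown()   # safe to call from the Controller's thread

# controller side (illustrative):
collectors = [Collector(port) for port in (9001, 9002)]
for c in collectors:
    c.start()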
I am trying to create a class that can run a separate process to go do some work that takes a long time, launch a bunch of these from a main module and then wait for them all to finish. I want to launch the processes once and then keep feeding them things to do rather than creating and destroying processes. For example, maybe I have 10 servers running the dd command, then I want them all to scp a file, etc.
My ultimate goal is to create a class for each system that keeps track of the information for the system it is tied to, like IP address, logs, runtime, etc. But that class must be able to launch a system command and then return execution back to the caller while that system command runs, to follow up with the result of the system command later.
My attempt is failing because I cannot send an instance method of a class over the pipe to the subprocess via pickle. Those are not pickleable. I therefore tried to fix it in various ways but I can't figure it out. How can my code be patched to do this? What good is multiprocessing if you can't send over anything useful?
Is there any good documentation of multiprocessing being used with class instances? The only way I can get the multiprocessing module to work is on simple functions. Every attempt to use it within a class instance has failed. Maybe I should pass events instead? I don't understand how to do that yet.
import multiprocessing
import sys
import re

class ProcessWorker(multiprocessing.Process):
    """
    This class runs as a separate process to execute worker's commands in parallel
    Once launched, it remains running, monitoring the task queue, until "None" is sent
    """

    def __init__(self, task_q, result_q):
        multiprocessing.Process.__init__(self)
        self.task_q = task_q
        self.result_q = result_q
        return

    def run(self):
        """
        Overloaded function provided by multiprocessing.Process. Called upon start() signal
        """
        proc_name = self.name
        print '%s: Launched' % (proc_name)
        while True:
            next_task_list = self.task_q.get()
            if next_task_list is None:
                # Poison pill means shutdown
                print '%s: Exiting' % (proc_name)
                self.task_q.task_done()
                break
            next_task = next_task_list[0]
            print '%s: %s' % (proc_name, next_task)
            args = next_task_list[1]
            kwargs = next_task_list[2]
            answer = next_task(*args, **kwargs)
            self.task_q.task_done()
            self.result_q.put(answer)
        return
# End of ProcessWorker class

class Worker(object):
    """
    Launches a child process to run commands from derived classes in separate processes,
    which sit and listen for something to do
    This base class is called by each derived worker
    """

    def __init__(self, config, index=None):
        self.config = config
        self.index = index

        # Launch the ProcessWorker for anything that has an index value
        if self.index is not None:
            self.task_q = multiprocessing.JoinableQueue()
            self.result_q = multiprocessing.Queue()

            self.process_worker = ProcessWorker(self.task_q, self.result_q)
            self.process_worker.start()
            print "Got here"
            # Process should be running and listening for functions to execute
        return

    def enqueue_process(target):  # No self, since it is a decorator
        """
        Used to place a command target from this class object into the task_q
        NOTE: Any function decorated with this must use fetch_results() to get the
        target task's result value
        """
        def wrapper(self, *args, **kwargs):
            self.task_q.put([target, args, kwargs])  # FAIL: target is a class instance method and can't be pickled!
        return wrapper

    def fetch_results(self):
        """
        After all processes have been spawned by multiple modules, this command
        is called on each one to retrieve the results of the call.
        This blocks until the execution of the item in the queue is complete
        """
        self.task_q.join()           # Wait for it to finish
        return self.result_q.get()   # Return the result

    @enqueue_process
    def run_long_command(self, command):
        print "I am running command %s as process %s" % (command, self.name)
        # In here, I will launch a subprocess to run a long-running system command
        # p = Popen(command), etc
        # p.wait(), etc
        return

    def close(self):
        self.task_q.put(None)
        self.task_q.join()

if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7
    workers = []
    for i in range(5):
        worker = Worker(config, index)
        worker.run_long_command("ls /")
        workers.append(worker)
    for worker in workers:
        worker.fetch_results()

    # Do more work... (this would actually be done in a distributor in another class)

    for worker in workers:
        worker.close()
Edit: I tried to move the ProcessWorker class and the creation of the multiprocessing queues outside of the Worker class and then tried to manually pickle the worker instance. Even that doesn't work, and I get the error
RuntimeError: Queue objects should only be shared between processes through inheritance
But I am only passing references to those queues into the worker instance?? I am missing something fundamental. Here is the modified code from the main section:
if __name__ == '__main__':
    config = ["some value", "something else"]
    index = 7
    workers = []
    for i in range(1):
        task_q = multiprocessing.JoinableQueue()
        result_q = multiprocessing.Queue()
        process_worker = ProcessWorker(task_q, result_q)
        worker = Worker(config, index, process_worker, task_q, result_q)
        something_to_look_at = pickle.dumps(worker)  # FAIL: Doesn't like queues??
        process_worker.start()
        worker.run_long_command("ls /")
So, the problem was that I was assuming that Python was doing some sort of magic that is somehow different from the way that C++/fork() works. I somehow thought that Python only copied the class, not the whole program into a separate process. I seriously wasted days trying to get this to work because all of the talk about pickle serialization made me think that it actually sent everything over the pipe. I knew that certain things could not be sent over the pipe, but I thought my problem was that I was not packaging things up properly.
This all could have been avoided if the Python docs gave me a 10,000 ft view of what happens when this module is used. Sure, they tell me what the methods of the multiprocessing module do and give me some basic examples, but what I want to know is the "Theory of Operation" behind the scenes! Here is the kind of information I could have used. Please chime in if my answer is off. It will help me learn.
When you start a process using this module, the whole program is copied into another process. But since it is not the "__main__" process and my code was checking for that, it doesn't fire off yet another process infinitely. It just stops and sits out there waiting for something to do, like a zombie. Everything that was initialized in the parent at the time of calling multiprocess.Process() is all set up and ready to go. Once you put something in the multiprocess.Queue or shared memory, or pipe, etc. (however you are communicating), then the separate process receives it and gets to work. It can draw upon all imported modules and setup just as if it was the parent. However, once some internal state variables change in the parent or separate process, those changes are isolated. Once the process is spawned, it now becomes your job to keep them in sync if necessary, either through a queue, pipe, shared memory, etc.
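A small illustrative sketch of that isolation (not from the original post): state changed in the child process is not visible in the parent's copy, so a Queue (or pipe, shared memory, etc.) is needed to communicate the change back.

import multiprocessing

state = {'value': 0}

def work(result_q):
    state['value'] += 1           # changes only the child's copy of the module state
    result_q.put(state['value'])  # explicitly report the new value back

if __name__ == '__main__':
    result_q = multiprocessing.Queue()
    p = multiprocessing.Process(target=work, args=(result_q,))
    p.start()
    p.join()
    print(result_q.get())    # 1, reported back by the child
    print(state['value'])    # still 0 in the parent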
I threw out the code and started over, but now I am only putting one extra function out in the ProcessWorker, an "execute" method that runs a command line. Pretty simple. I don't have to worry about launching and then closing a bunch of processes this way, which has caused me all kinds of instability and performance issues in the past in C++. When I switched to launching processes at the beginning and then passing messages to those waiting processes, my performance improved and it was very stable.
BTW, I looked at this link to get help, which threw me off because the example made me think that methods were being transported across the queues: http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html
The second example of the first section used "next_task()" that appeared (to me) to be executing a task received via the queue.
Instead of attempting to send a method itself (which is impractical), try sending the name of a method to execute.
Provided that each worker runs the same code, it's a matter of a simple getattr(self, task_name).
I'd pass tuples (task_name, task_args), where task_args is a dict to be fed directly to the task method:
next_task_name, next_task_args = self.task_q.get()
if next_task_name:
    task = getattr(self, next_task_name)
    answer = task(**next_task_args)
    ...
else:
    # poison pill, shut down
    break
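On the sending side, the question's enqueue_process decorator and close() could then put picklable data on the queue instead of the bound method. A sketch following the question's naming; the poison pill becomes a (None, None) tuple so the unpacking above still works:

def enqueue_process(target):  # decorator, defined inside Worker as in the question
    def wrapper(self, *args, **kwargs):
        # send the method *name* plus its keyword arguments; both pickle fine
        self.task_q.put((target.__name__, kwargs))
    return wrapper

def close(self):
    self.task_q.put((None, None))
    self.task_q.join()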
REF: https://stackoverflow.com/a/14179779
Answer on Jan 6 at 6:03 by David Lynch is not factually correct when he says that he was misled by
http://www.doughellmann.com/PyMOTW/multiprocessing/communication.html.
The code and examples provided are correct and work as advertised. next_task() is executing a task received via the queue -- try and understand what the Task.__call__() method is doing.
In my case, what tripped me up was syntax errors in my implementation of run(). It seems that the sub-process will not report this and just fails silently, leaving things stuck in weird loops! Make sure you have some kind of syntax checker running, e.g. Flymake/Pyflakes in Emacs.
Debugging via multiprocessing.log_to_stderr() helped me narrow down the problem.
I'm trying to send a signal from a non-main thread in PyQt, but I don't know what I am doing wrong! When I execute the program it fails with this error:
QObject::connect: Cannot queue arguments of type 'QTextCursor'
(Make sure 'QTextCursor' is registered using qRegisterMetaType().)
here is my code:
class Sender(QtCore.QThread):
    def __init__(self, q):
        super(Sender, self).__init__()
        self.q = q

    def run(self):
        while True:
            try:
                line = q.get_nowait()
                # or q.get(timeout=.1)
            except Empty:
                pass
            else:
                self.emit(QtCore.SIGNAL('tri()'))

class Workspace(QMainWindow, Ui_MainWindow):
    """ This class is for managing the whole GUI `Workspace'.
        Currently a Workspace is similar to a MainWindow
    """

    def __init__(self):
        try:
            from Queue import Queue, Empty
        except ImportError:
            while True:
                # from queue import Queue, Empty  # python 3.x
                print "error"

        ON_POSIX = 'posix' in sys.builtin_module_names

        def enqueue_output(out, queue):
            for line in iter(out.readline, b''):
                queue.put(line)
            out.close()

        p = Popen(["java -Xmx256m -jar bin/HelloWorld.jar"],
                  cwd=r'/home/karen/sphinx4-1.0beta5-src/sphinx4-1.0beta5/',
                  stdout=PIPE, shell=True, bufsize=4024)
        q = Queue()
        t = threading.Thread(target=enqueue_output, args=(p.stdout, q))
        t.daemon = True  # thread dies with the program
        t.start()

        self.sender = Sender(q)
        self.connect(self.sender, QtCore.SIGNAL('tri()'), self.__action_About)
        self.sender.start()
I think that my way of sending a parameter to the thread is wrong...
I need to know how to send parameters to a thread; in my case I need to send q to the worker thread.
Quite new to PyQt5, but this appears to happen when you try to do a GUI operation from a thread which is not the "application thread". I put this in quotes because it appears to be a mistake to think that, even in a fairly simple PyQt5 app, QApplication.instance().thread() will always return the same object.
The thing to do is to use the signal/slot mechanism to send any kind of data from a worker thread (a thread created in my case by extending QtCore.QRunnable, another pattern apparently being QtCore.QThread and QtCore.QObject.moveToThread, see here); a sketch of the QRunnable variant follows below.
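A minimal sketch of that pattern, assuming PyQt5; the class and signal names are illustrative. Since QRunnable is not a QObject, the signal lives on a small helper QObject:

from PyQt5 import QtCore

class WorkerSignals(QtCore.QObject):
    message = QtCore.pyqtSignal(str)   # carries data back to the GUI thread

class Worker(QtCore.QRunnable):
    def __init__(self):
        super().__init__()
        self.signals = WorkerSignals()

    def run(self):
        # runs in a pool thread; never touch widgets here,
        # just emit and let the connected slot run in the application thread
        self.signals.message.emit('work finished')

# in the GUI class:
#   worker = Worker()
#   worker.signals.message.connect(self.append_message)  # delivered in the app thread
#   QtCore.QThreadPool.globalInstance().start(worker)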
Then also include a check in all your slot methods which are likely to receive data from a non-"application thread". Example which logs messages visually during execution:
def append_message(self, message):
    # this "instance" method is very useful!
    app_thread = QtWidgets.QApplication.instance().thread()
    curr_thread = QtCore.QThread.currentThread()
    if app_thread != curr_thread:
        raise Exception('attempt to call MainWindow.append_message from non-app thread')
    ms_now = datetime.datetime.now().isoformat(sep=' ', timespec='milliseconds')
    self.messages_text_box.insertPlainText(f'{ms_now}: {message}\n')
    # scroll to bottom
    self.messages_text_box.moveCursor(QtGui.QTextCursor.End)
It's all too easy to just call this inadvertently and directly from a non-"application thread".
Having such a mistake raise an exception is good, because it gives you a stack trace showing the culprit call. Then change the call so that it instead sends a signal to the GUI class, the slot for which could be the method in the GUI class (here append_message), or alternatively one which then in turn calls append_message.
In my example I've included the "scroll to bottom" line above because it was only when I added that line that these "cannot queue" errors started happening. In other words, it is perfectly possible to get away with a certain amount of non-compliant handling (in this case adding some more text with each call) without any error being raised... and only later do you then run into difficulties. To prevent this, I suggest that EVERY method in a GUI class with GUI functionality should include such a check!
Make sure 'QTextCursor' is registered using qRegisterMetaType().
Did you try to use the qRegisterMetaType function?
The official manual says:
The class is used as a helper to marshall types in QVariant and in
queued signals and slots connections. It associates a type name to a
type so that it can be created and destructed dynamically at run-time.
Declare new types with Q_DECLARE_METATYPE() to make them available to
QVariant and other template-based functions. Call qRegisterMetaType()
to make type available to non-template based functions, such as the
queued signal and slot connections.
I would like to add the following notes to @mike rodent's post, which solved my problem (I'm using PyQt5):
Custom signals and slots can be used to avoid directly modifying the GUI from a thread other than the "application thread" (I'm using the Python threading module, and the equivalent there is probably the "main thread"). I find this website very useful for basic custom signal and slot setup. Pay attention to using a class (and not an instance) attribute.
To avoid the QObject::connect: Cannot queue arguments of type 'QTextCursor' message I needed to find the following locations and add some code (a minimal sketch putting these pieces together follows after the list):
Before the function __init__ of the class MainWindow: definition of class attribute; I needed to use something like class_attribute = pyqtSignal(str).
In the function __init__: self.class_attribute.connect(self.slot_name)
Inside of a thread (I mean the thread which is not the main thread): self.class_attribute.emit(str)
In the slot inside the main thread: the "safety mechanism" proposed by @mike rodent.
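Putting those four points together, a minimal sketch (assuming PyQt5; the widget, signal and slot names are illustrative):

import threading
from PyQt5 import QtWidgets
from PyQt5.QtCore import pyqtSignal

class MainWindow(QtWidgets.QMainWindow):
    # 1. class attribute, defined before __init__
    message_signal = pyqtSignal(str)

    def __init__(self):
        super().__init__()
        self.text_box = QtWidgets.QPlainTextEdit(self)
        self.setCentralWidget(self.text_box)
        # 2. connect the signal to the slot in __init__
        self.message_signal.connect(self.append_message)
        threading.Thread(target=self.worker, daemon=True).start()

    def worker(self):
        # 3. from the non-main thread, only emit; never touch widgets here
        self.message_signal.emit('hello from the worker thread')

    def append_message(self, message):
        # 4. the slot runs in the application thread, so GUI calls are safe here
        self.text_box.appendPlainText(message)

app = QtWidgets.QApplication([])
window = MainWindow()
window.show()
app.exec_()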