I'm currently working on a project that involves three components,
an observer that check for changes in a directory, a worker and an command line interface.
What I want to achieve is:
The observer, when a change happens send a string to the worker (add a job to the worker's queue).
The worker has a queue of jobs and forever works on his queue.
Now I want the possibility to run a python script to check the status of the worker (number of active jobs, errors and so on)
I don't know how to achieve this with python in terms of which component to use and how to link the three components.
I though as a singleton worker where the observer add a job to a queue but 1) I was not able to write a working code and 2) How can I fit the checker in?
Another solution that I thought of may be multiple child processes from a father that has the queue but I'm a bit lost...
Thanks for any advices
I'd use some kind of observer pattern or publish-subscribe pattern. For the former you can use for example the Python version of ReactiveX. But for a more basic example let's stay with the Python core. Parts of your program can subscribe to the worker and receive updates from the process via queues for example.
import itertools as it
from queue import Queue
from threading import Thread
import time
class Observable(Thread):
def __init__(self):
super().__init__()
self._observers = []
def notify(self, msg):
for obs in self._observers:
obs.put(msg)
def subscribe(self, obs):
self._observers.append(obs)
class Observer(Thread):
def __init__(self):
super().__init__()
self.updates = Queue()
class Watcher(Observable):
def run(self):
for i in it.count():
self.notify(i)
time.sleep(1)
class Worker(Observable, Observer):
def run(self):
while True:
task = self.updates.get()
self.notify((str(task), 'start'))
time.sleep(1)
self.notify((str(task), 'stop'))
class Supervisor(Observer):
def __init__(self):
super().__init__()
self._statuses = {}
def run(self):
while True:
status = self.updates.get()
print(status)
self._statuses[status[0]] = status[1]
# Do something based on status updates.
if status[1] == 'stop':
del self._statuses[status[0]]
watcher = Watcher()
worker = Worker()
supervisor = Supervisor()
watcher.subscribe(worker.updates)
worker.subscribe(supervisor.updates)
supervisor.start()
worker.start()
watcher.start()
However many variations are possible and you can check the various patterns which suits you most.
Related
I have class Sender that has a function add_task(task) to add the task to the thread pool. The task needs to be handled by one of the 10 worker threads. Each thread has to have a unique information, say their own address (needs to be loaded form the config file) to do the job.
Currently I was thinking of using ThreadPoolExecutor & threading.Local to store address for each worker thread. Something like this:
class Sender:
# addresses of len 10
def __init__(self, addresses):
self.executor = ThreadPoolExecutor(max_workers=10, initializer=initialize, initargs=(addresses))
def add_task(task):
fut = self.executor.submit(task)
fut.add_done_callback(dummy_callback)
def initialize(addresses):|
# ... thread name is <pool prefix>_i
retrieve that i
data = threading.Local()
data.address = addresses[i]
def task():
# do something with data.address
print(data.address)
def dummy_callback():
pass
I am not sure if I am going in the right direction. Could you give any hints on that?
I created a small flask app to download images and text from pages, this can take verly long time, so
I would like to execute my requests in parell. I create threaded tasks. I would like this tasks to be able to download text or images from sites. I keep my tasks in a list of workers.
However I would like to select a method which thread will execute and then start whole thread.
How can I pass my method to thread run method()? Will this be a sub daemon thread?
import threading
import time
workers = []
class SavePage:
def get_text(self):
print("Getting text")
def get_images(self):
print("Getting images")
class Task(threading.Thread):
def __init__(self):
super().__init__()
self.save_page = SavePage()
def get_text_from_page(self):
self.save_page.get_text()
def get_images_from_page(self):
self.save_page.get_images()
if __name__ == '__main__':
task = Task()
task.get_images_from_page() # Why this executes, when I didn't put task.start() ?
# Moreover, is this really threaded? or just uses a method from class Task?
workers.append(task) # I want this list to be empty, after job is finished
print("".join(str(worker.is_alive()) for worker in workers)) #
print(workers)
task.get_images_from_page() # Why this executes, when I didn't put task.start() ?
# Moreover, is this really threaded? or just uses a method from class Task?
It's not threaded. It's just a normal method call in the main thread.
Thread.start is the method that will start Thread.run function inside another thread.
You could set some state in __init__ to choose which function to execute:
class Task(threading.Thread):
def __init__(self, action):
super().__init__()
self.save_page = SavePage()
self.action = action
def get_text_from_page(self):
self.save_page.get_text()
def get_images_from_page(self):
self.save_page.get_images()
def run(self):
if self.action == "text":
self.get_text_from_page()
elif self.action == "images":
self.get_images_from_page()
Keep in mind that threads can be run in simpler way by passing target function:
def target_func():
save_page = SavePage()
save_page.get_images()
t = threading.Thread(target=target_func)
t.start()
# or in this simple case:
save_page = SavePage()
t = threading.Thread(target=save_page.get_images)
t.start()
I've two classes - MessageProducer and MessageConsumer.
MessageConsumer does the following:
receives messages and puts them in its message list "_unprocessed_msgs"
on a separate worker thread, moves the messages to internal list "_in_process_msgs"
on the worker thread, processes messages from "_in_process_msgs"
On my development environment, I'm facing issue with #2 above - after adding a message by performing step#1, when worker thread checks length of "_unprocessed_msgs", it gets it as zero.
When step #1 is repeated, the list properly shows 2 items on the thread on which the item was added. But in step #2, on worker thread, again the len(_unprocessed_msgs) returns zero.
Not sure why this is happening. Would really appreciate help any help on this.
I'm using Ubuntu 16.04 having Python 2.7.12.
Below is the sample source code. Please let me know if more information is required.
import threading
import time
class MessageConsumerThread(threading.Thread):
def __init__(self):
super(MessageConsumerThread, self).__init__()
self._unprocessed_msg_q = []
self._in_process_msg_q = []
self._lock = threading.Lock()
self._stop_processing = False
def start_msg_processing_thread(self):
self._stop_processing = False
self.start()
def stop_msg_processing_thread(self):
self._stop_processing = True
def receive_msg(self, msg):
with self._lock:
LOG.info("Before: MessageConsumerThread::receive_msg: "
"len(self._unprocessed_msg_q)=%s" %
len(self._unprocessed_msg_q))
self._unprocessed_msg_q.append(msg)
LOG.info("After: MessageConsumerThread::receive_msg: "
"len(self._unprocessed_msg_q)=%s" %
len(self._unprocessed_msg_q))
def _queue_unprocessed_msgs(self):
with self._lock:
LOG.info("MessageConsumerThread::_queue_unprocessed_msgs: "
"len(self._unprocessed_msg_q)=%s" %
len(self._unprocessed_msg_q))
if self._unprocessed_msg_q:
LOG.info("Moving messages from unprocessed to in_process queue")
self._in_process_msg_q += self._unprocessed_msg_q
self._unprocessed_msg_q = []
LOG.info("Moved messages from unprocessed to in_process queue")
def run(self):
while not self._stop_processing:
# Allow other threads to add messages to message queue
time.sleep(1)
# Move unprocessed listeners to in-process listener queue
self._queue_unprocessed_msgs()
# If nothing to process continue the loop
if not self._in_process_msg_q:
continue
for msg in self._in_process_msg_q:
self.consume_message(msg)
# Clean up processed messages
del self._in_process_msg_q[:]
def consume_message(self, msg):
print(msg)
class MessageProducerThread(threading.Thread):
def __init__(self, producer_id, msg_receiver):
super(MessageProducerThread, self).__init__()
self._producer_id = producer_id
self._msg_receiver = msg_receiver
def start_producing_msgs(self):
self.start()
def run(self):
for i in range(1,10):
msg = "From: %s; Message:%s" %(self._producer_id, i)
self._msg_receiver.receive_msg(msg)
def main():
msg_receiver_thread = MessageConsumerThread()
msg_receiver_thread.start_msg_processing_thread()
msg_producer_thread = MessageProducerThread(producer_id='Producer-01',
msg_receiver=msg_receiver_thread)
msg_producer_thread.start_producing_msgs()
msg_producer_thread.join()
msg_receiver_thread.stop_msg_processing_thread()
msg_receiver_thread.join()
if __name__ == '__main__':
main()
Following is the log the I get:
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=0
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=1**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=1
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=2**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
This is not a good desing for you application.
I spent some time trying to debug this - but threading code is naturally complicated, so we should try to descomplicate it, instead of getting it even more confure.
When I see threading code in Python, I usually see it written a in a procedural form: a normal function that is passed to threading.Thread as the target argument that drives each thread. That way, you don't need to write code for a new class that will have a single instance.
Another thing is that, although Python's global interpreter lock itself guarantees lists won't get corrupted if modified in two separate threads, lists are not a recomended "thread data passing" data structure. You probably should look at threading.Queue to do that
The thing is wrong in this code at first sight is probably not the cause of your problem due to your use of locks, but it might be. Instead of
self._unprocessed_msg_q = []
which will create a new list object, the other thread have momentarily no reference too (so it might write data to the old list), you should do:
self._unprocessed_msg_q[:] = []
Or just the del slice thing you do on the other method.
But to be on the safer side, and having mode maintanable and less surprising code, you really should change to a procedural approach there, assuming Python threading. Assume "Thread" is the "final" object that can do its thing, and then use Queues around:
# coding: utf-8
from __future__ import print_function
from __future__ import unicode_literals
from threading import Thread
try:
from queue import Queue, Empty
except ImportError:
from Queue import Queue, Empty
import time
import random
TERMINATE_SENTINEL = object()
NO_DATA_SENTINEL = object()
class Receiver(object):
def __init__(self, queue):
self.queue = queue
self.in_process = []
def receive_data(self, data):
self.in_process.append(data)
def consume_data(self):
print("received data:", self.in_process)
del self.in_process[:]
def receiver_loop(self):
queue = self.queue
while True:
try:
data = queue.get(block=False)
except Empty:
print("got no data from queue")
data = NO_DATA_SENTINEL
if data is TERMINATE_SENTINEL:
print("Got sentinel: exiting receiver loop")
break
self.receive_data(data)
time.sleep(random.uniform(0, 0.3))
if queue.empty():
# Only process data if we have nothing to receive right now:
self.consume_data()
print("sleeping receiver")
time.sleep(1)
if self.in_process:
self.consume_data()
def producer_loop(queue):
for i in range(10):
time.sleep(random.uniform(0.05, 0.4))
print("putting {0} in queue".format(i))
queue.put(i)
def main():
msg_queue = Queue()
msg_receiver_thread = Thread(target=Receiver(msg_queue).receiver_loop)
time.sleep(0.1)
msg_producer_thread = Thread(target=producer_loop, args=(msg_queue,))
msg_receiver_thread.start()
msg_producer_thread.start()
msg_producer_thread.join()
msg_queue.put(TERMINATE_SENTINEL)
msg_receiver_thread.join()
if __name__ == '__main__':
main()
note that since you want multiple methods in the recever thread to do things with data, I used a class - but it does not inherit from Thread, and does not have to worry about its workings. All its methods are called within the same thread: no need of locks, no worries about race conditions within the receiver class itself. For communicating outside the class, the Queue class is structured to handle any race conditions for us.
The producer loop, as it is just a dummy producer, has no need at all to be written in class form. But it would look just the same, if it had more methods.
(The random sleeps help visualize what would happen in "real world" message receiving)
Also, you might want to take a look at something like:
https://www.thoughtworks.com/insights/blog/composition-vs-inheritance-how-choose
Finally I was able to solve the issue. In the actual code, I've a Manager class that is responsible for instantiating MessageConsumerThread as its last thing in the initializer:
class Manager(object):
def __init__(self):
...
...
self._consumer = MessageConsumerThread(self)
self._consumer.start_msg_processing_thread()
The problem seems to be with passing 'self' in MessageConsumerThread initializer when Manager is still executing its initializer (eventhough those are last two steps). The moment I moved the creation of consumer out of initializer, consumer thread was able to see the elements in "_unprocessed_msg_q".
Please note that the issue is still not reproducible with the above sample code. It is manifesting itself in the production environment only. Without the above fix, I tried queue and dictionary as well but observed the same issue. After the fix, tried with queue and list and was able to successfully execute the code.
I really appreciate and thank #jsbueno and #ivan_pozdeev for their time and help! Community #stackoverflow is very helpful!
I was looking for some good example of managing worker process from Qt GUI created in Python. I need this to be as complete as possible, including reporting progress from the process, including aborting the process, including handling of possible errors coming from the process.
I only found some semi-finished examples which only did part of work but when I tried to make them complete I failed. My current design comes in three layers:
1) there is the main thread in which resides the GUI and ProcessScheduler which controls that only one instance of worker process is running and can abort it
2) there is another thread in which I have ProcessObserver which actually runs the process and understands the stuff coming from queue (which is used for inter-process communication), this must be in non-GUI thread to keep GUI responsive
3) there is the actual worker process which executes a given piece of code (my future intention is to replace multiprocessing with multiprocess or pathos or something else what can pickle function objects, but this is not my current issue) and report progress or result to the queue
Currently I have this snippet (the print functions in the code are just for debugging and will be deleted eventually):
import multiprocessing
from PySide import QtCore, QtGui
QtWidgets = QtGui
N = 10000000
# I would like this to be a function object
# but multiprocessing cannot pickle it :(
# so I will use multiprocess in the future
CODE = """
# calculates sum of numbers from 0 to n-1
# reports percent progress of finished work
sum = 0
progress = -1
for i in range(n):
sum += i
p = i * 100 // n
if p > progress:
queue.put(["progress", p])
progress = p
queue.put(["result", sum])
"""
class EvalProcess(multiprocessing.Process):
def __init__(self, code, symbols):
super(EvalProcess, self).__init__()
self.code= code
self.symbols = symbols # symbols must contain 'queue'
def run(self):
print("EvalProcess started")
exec(self.code, self.symbols)
print("EvalProcess finished")
class ProcessObserver(QtCore.QObject):
"""Resides in worker thread. Its role is to understand
to what is received from the process via the queue."""
progressChanged = QtCore.Signal(float)
finished = QtCore.Signal(object)
def __init__(self, process, queue):
super(ProcessObserver, self).__init__()
self.process = process
self.queue = queue
def run(self):
print("ProcessObserver started")
self.process.start()
try:
while True:
# this loop keeps running and listening to the queue
# even if the process is aborted
result = self.queue.get()
print("received from queue:", result)
if result[0] == "progress":
self.progressChanged.emit(result[1])
elif result[0] == "result":
self.finished.emit(result[1])
break
except Exception as e:
print(e) # QUESTION: WHAT HAPPENS WHEN THE PROCESS FAILS?
self.process.join() # QUESTION: DO I NEED THIS LINE?
print("ProcessObserver finished")
class ProcessScheduler(QtCore.QObject):
"""Resides in the main thread."""
sendText = QtCore.Signal(str)
def __init__(self):
super(ProcessScheduler, self).__init__()
self.observer = None
self.thread = None
self.process = None
self.queue = None
def start(self):
if self.process: # Q: IS THIS OK?
# should kill current process and start a new one
self.abort()
self.queue = multiprocessing.Queue()
self.process = EvalProcess(CODE, {"n": N, "queue": self.queue})
self.thread = QtCore.QThread()
self.observer = ProcessObserver(self.process, self.queue)
self.observer.moveToThread(self.thread)
self.observer.progressChanged.connect(self.onProgressChanged)
self.observer.finished.connect(self.onResultReceived)
self.thread.started.connect(self.observer.run)
self.thread.finished.connect(self.onThreadFinished)
self.thread.start()
self.sendText.emit("Calculation started")
def abort(self):
self.process.terminate()
self.sendText.emit("Aborted.")
self.onThreadFinished()
def onProgressChanged(self, percent):
self.sendText.emit("Progress={}%".format(percent))
def onResultReceived(self, result):
print("onResultReceived called")
self.sendText.emit("Result={}".format(result))
self.thread.quit()
def onThreadFinished(self):
print("onThreadFinished called")
self.thread.deleteLater() # QUESTION: DO I NEED THIS LINE?
self.thread = None
self.observer = None
self.process = None
self.queue = None
if __name__ == '__main__':
app = QtWidgets.QApplication([])
scheduler = ProcessScheduler()
window = QtWidgets.QWidget()
layout = QtWidgets.QVBoxLayout(window)
startButton = QtWidgets.QPushButton("sum(range({}))".format(N))
startButton.pressed.connect(scheduler.start)
layout.addWidget(startButton)
abortButton = QtWidgets.QPushButton("Abort")
abortButton.pressed.connect(scheduler.abort)
layout.addWidget(abortButton)
console = QtWidgets.QPlainTextEdit()
scheduler.sendText.connect(console.appendPlainText)
layout.addWidget(console)
window.show()
app.exec_()
It works kind of OK but it still lacks proper error handling and aborting of process. Especially I am now struggling with the aborting. The main problem is that the worker thread keeps running (in the loop listening to the queue) even if the process has been aborted/terminated in the middle of calculation (or at least it prints this error in the console QThread: Destroyed while thread is still running). Is there a way to solve this? Or any alternative approach? Or, if possible, any real-life and compete example of such task fulfilling all the requirements mentioned above? Any comment would be much appreciated.
I am working on a xmlrpc server which has to perform certain tasks cyclically. I am using twisted as the core of the xmlrpc service but I am running into a little problem:
class cemeteryRPC(xmlrpc.XMLRPC):
def __init__(self, dic):
xmlrpc.XMLRPC.__init__(self)
def xmlrpc_foo(self):
return 1
def cycle(self):
print "Hello"
time.sleep(3)
class cemeteryM( base ):
def __init__(self, dic): # dic is for cemetery
multiprocessing.Process.__init__(self)
self.cemRPC = cemeteryRPC()
def run(self):
# Start reactor on a second process
reactor.listenTCP( c.PORT_XMLRPC, server.Site( self.cemRPC ) )
p = multiprocessing.Process( target=reactor.run )
p.start()
while not self.exit.is_set():
self.cemRPC.cycle()
#p.join()
if __name__ == "__main__":
import errno
test = cemeteryM()
test.start()
# trying new method
notintr = False
while not notintr:
try:
test.join()
notintr = True
except OSError, ose:
if ose.errno != errno.EINTR:
raise ose
except KeyboardInterrupt:
notintr = True
How should i go about joining these two process so that their respective joins doesn't block?
(I am pretty confused by "join". Why would it block and I have googled but can't find much helpful explanation to the usage of join. Can someone explain this to me?)
Regards
Do you really need to run Twisted in a separate process? That looks pretty unusual to me.
Try to think of Twisted's Reactor as your main loop - and hang everything you need off that - rather than trying to run Twisted as a background task.
The more normal way of performing this sort of operation would be to use Twisted's .callLater or to add a LoopingCall object to the Reactor.
e.g.
from twisted.web import xmlrpc, server
from twisted.internet import task
from twisted.internet import reactor
class Example(xmlrpc.XMLRPC):
def xmlrpc_add(self, a, b):
return a + b
def timer_event(self):
print "one second"
r = Example()
m = task.LoopingCall(r.timer_event)
m.start(1.0)
reactor.listenTCP(7080, server.Site(r))
reactor.run()
Hey asdvawev - .join() in multiprocessing works just like .join() in threading - it's a blocking call the main thread runs to wait for the worker to shut down. If the worker never shuts down, then .join() will never return. For example:
class myproc(Process):
def run(self):
while True:
time.sleep(1)
Calling run on this means that join() will never, ever return. Typically to prevent this I'll use an Event() object passed into the child process to allow me to signal the child when to exit:
class myproc(Process):
def __init__(self, event):
self.event = event
Process.__init__(self)
def run(self):
while not self.event.is_set():
time.sleep(1)
Alternatively, if your work is encapsulated in a queue - you can simply have the child process work off of the queue until it encounters a sentinel (typically a None entry in the queue) and then shut down.
Both of these suggestions means that prior to calling .join() you can send set the event, or insert the sentinel and when join() is called, the process will finish it's current task and then exit properly.