Python - Non-empty shared list on separate thread appears empty

Python - Non-empty shared list on separate thread appears empty - python

I've two classes - MessageProducer and MessageConsumer.
MessageConsumer does the following:
receives messages and puts them in its message list "_unprocessed_msgs"
on a separate worker thread, moves the messages to internal list "_in_process_msgs"
on the worker thread, processes messages from "_in_process_msgs"
On my development environment, I'm facing issue with #2 above - after adding a message by performing step#1, when worker thread checks length of "_unprocessed_msgs", it gets it as zero.
When step #1 is repeated, the list properly shows 2 items on the thread on which the item was added. But in step #2, on worker thread, again the len(_unprocessed_msgs) returns zero.
Not sure why this is happening. Would really appreciate help any help on this.
I'm using Ubuntu 16.04 having Python 2.7.12.
Below is the sample source code. Please let me know if more information is required.
import threading
import time
class MessageConsumerThread(threading.Thread):
def __init__(self):
super(MessageConsumerThread, self).__init__()
self._unprocessed_msg_q = []
self._in_process_msg_q = []
self._lock = threading.Lock()
self._stop_processing = False
def start_msg_processing_thread(self):
self._stop_processing = False
self.start()
def stop_msg_processing_thread(self):
self._stop_processing = True
def receive_msg(self, msg):
with self._lock:
LOG.info("Before: MessageConsumerThread::receive_msg: "
"len(self._unprocessed_msg_q)=%s" %
len(self._unprocessed_msg_q))
self._unprocessed_msg_q.append(msg)
LOG.info("After: MessageConsumerThread::receive_msg: "
"len(self._unprocessed_msg_q)=%s" %
len(self._unprocessed_msg_q))
def _queue_unprocessed_msgs(self):
with self._lock:
LOG.info("MessageConsumerThread::_queue_unprocessed_msgs: "
"len(self._unprocessed_msg_q)=%s" %
len(self._unprocessed_msg_q))
if self._unprocessed_msg_q:
LOG.info("Moving messages from unprocessed to in_process queue")
self._in_process_msg_q += self._unprocessed_msg_q
self._unprocessed_msg_q = []
LOG.info("Moved messages from unprocessed to in_process queue")
def run(self):
while not self._stop_processing:
# Allow other threads to add messages to message queue
time.sleep(1)
# Move unprocessed listeners to in-process listener queue
self._queue_unprocessed_msgs()
# If nothing to process continue the loop
if not self._in_process_msg_q:
continue
for msg in self._in_process_msg_q:
self.consume_message(msg)
# Clean up processed messages
del self._in_process_msg_q[:]
def consume_message(self, msg):
print(msg)
class MessageProducerThread(threading.Thread):
def __init__(self, producer_id, msg_receiver):
super(MessageProducerThread, self).__init__()
self._producer_id = producer_id
self._msg_receiver = msg_receiver
def start_producing_msgs(self):
self.start()
def run(self):
for i in range(1,10):
msg = "From: %s; Message:%s" %(self._producer_id, i)
self._msg_receiver.receive_msg(msg)
def main():
msg_receiver_thread = MessageConsumerThread()
msg_receiver_thread.start_msg_processing_thread()
msg_producer_thread = MessageProducerThread(producer_id='Producer-01',
msg_receiver=msg_receiver_thread)
msg_producer_thread.start_producing_msgs()
msg_producer_thread.join()
msg_receiver_thread.stop_msg_processing_thread()
msg_receiver_thread.join()
if __name__ == '__main__':
main()
Following is the log the I get:
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=0
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=1**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=1
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=2**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**

This is not a good desing for you application.
I spent some time trying to debug this - but threading code is naturally complicated, so we should try to descomplicate it, instead of getting it even more confure.
When I see threading code in Python, I usually see it written a in a procedural form: a normal function that is passed to threading.Thread as the target argument that drives each thread. That way, you don't need to write code for a new class that will have a single instance.
Another thing is that, although Python's global interpreter lock itself guarantees lists won't get corrupted if modified in two separate threads, lists are not a recomended "thread data passing" data structure. You probably should look at threading.Queue to do that
The thing is wrong in this code at first sight is probably not the cause of your problem due to your use of locks, but it might be. Instead of
self._unprocessed_msg_q = []
which will create a new list object, the other thread have momentarily no reference too (so it might write data to the old list), you should do:
self._unprocessed_msg_q[:] = []
Or just the del slice thing you do on the other method.
But to be on the safer side, and having mode maintanable and less surprising code, you really should change to a procedural approach there, assuming Python threading. Assume "Thread" is the "final" object that can do its thing, and then use Queues around:
# coding: utf-8
from __future__ import print_function
from __future__ import unicode_literals
from threading import Thread
try:
from queue import Queue, Empty
except ImportError:
from Queue import Queue, Empty
import time
import random
TERMINATE_SENTINEL = object()
NO_DATA_SENTINEL = object()
class Receiver(object):
def __init__(self, queue):
self.queue = queue
self.in_process = []
def receive_data(self, data):
self.in_process.append(data)
def consume_data(self):
print("received data:", self.in_process)
del self.in_process[:]
def receiver_loop(self):
queue = self.queue
while True:
try:
data = queue.get(block=False)
except Empty:
print("got no data from queue")
data = NO_DATA_SENTINEL
if data is TERMINATE_SENTINEL:
print("Got sentinel: exiting receiver loop")
break
self.receive_data(data)
time.sleep(random.uniform(0, 0.3))
if queue.empty():
# Only process data if we have nothing to receive right now:
self.consume_data()
print("sleeping receiver")
time.sleep(1)
if self.in_process:
self.consume_data()
def producer_loop(queue):
for i in range(10):
time.sleep(random.uniform(0.05, 0.4))
print("putting {0} in queue".format(i))
queue.put(i)
def main():
msg_queue = Queue()
msg_receiver_thread = Thread(target=Receiver(msg_queue).receiver_loop)
time.sleep(0.1)
msg_producer_thread = Thread(target=producer_loop, args=(msg_queue,))
msg_receiver_thread.start()
msg_producer_thread.start()
msg_producer_thread.join()
msg_queue.put(TERMINATE_SENTINEL)
msg_receiver_thread.join()
if __name__ == '__main__':
main()
note that since you want multiple methods in the recever thread to do things with data, I used a class - but it does not inherit from Thread, and does not have to worry about its workings. All its methods are called within the same thread: no need of locks, no worries about race conditions within the receiver class itself. For communicating outside the class, the Queue class is structured to handle any race conditions for us.
The producer loop, as it is just a dummy producer, has no need at all to be written in class form. But it would look just the same, if it had more methods.
(The random sleeps help visualize what would happen in "real world" message receiving)
Also, you might want to take a look at something like:
https://www.thoughtworks.com/insights/blog/composition-vs-inheritance-how-choose

Finally I was able to solve the issue. In the actual code, I've a Manager class that is responsible for instantiating MessageConsumerThread as its last thing in the initializer:
class Manager(object):
def __init__(self):
...
...
self._consumer = MessageConsumerThread(self)
self._consumer.start_msg_processing_thread()
The problem seems to be with passing 'self' in MessageConsumerThread initializer when Manager is still executing its initializer (eventhough those are last two steps). The moment I moved the creation of consumer out of initializer, consumer thread was able to see the elements in "_unprocessed_msg_q".
Please note that the issue is still not reproducible with the above sample code. It is manifesting itself in the production environment only. Without the above fix, I tried queue and dictionary as well but observed the same issue. After the fix, tried with queue and list and was able to successfully execute the code.
I really appreciate and thank #jsbueno and #ivan_pozdeev for their time and help! Community #stackoverflow is very helpful!

Related

How can I send a value from a python script to another script?

For example the first script:
from secondScript import Second
---
""
""
""
while True:
lastResult = <a list> --> I need to send this result to other script
---
My other script
class Second:
def __init__(self):
""
""
""
self.dum = Thread(target=self.func1)
self.dum.deamon = True
self.dum.start()
self.tis = Thread(target=self.func2, args= <a list>)
self.tis.deamon = True
self.tis.start()
def func1(self):
while True:
""
""
""
def func2(self, lastResult):
while True:
print(lastResult)
As a result, I want to send the value which I found in the first script to a infinity thread function in script 2. I can't import first script to second because I am also getting another values from script 2 to script 1.
Edit:
We can think of it like: There is a part of my program that is already running. We can say that I am getting real time images from the camera. While the whole code is running, it also generates a number value continuously and uninterruptedly. All of these operations are done in the 1st file. While the 1st file continues to work, it needs to continuously send this number to the 2nd file. In the second code, 2 different infinite loop functions are running at the same time. In the 1st function, the data from the arduino is constantly being read continuously and uninterruptedly. The 2nd function should print the number which coming from the 1st code. So actually there is nothing I can change in code 1. I am generating the number value. I need to send it to code 2 somehow. I'm not sure how to edit the code you wrote. Any sleep etc. I can't use any interrupt method because in code 1 the camera should work without interruption.

Ok this answers your question, but I changed a bit the structure. I would do it as follows:
EDIT: If you have streams of continuous data like the CameraFeed you mentioned, you can use a Queue with a pattern like this (You don't really need the Second class in this case, you can implement the CameraFeed and CameraDataConsumer in different classes).
If the dum thread does not send data to the tis thread, you can use the send_data method of tis to send data to it through main function and remove the queue from CameraFeed.
from threading import Event, Thread
from queue import Queue, Full
import time
class CameraFeed(Thread):
def __init__(self, queue):
super().__init__()
# This event is used to stop the thread
# Initially it is unset (False)
self._stopped_event = Event()
self._queue = queue
self._data = []
def run(self):
# sample_data is for demo purposes remove it and
# fetch data however you do it
sample_data = iter(range(10**10))
# Loop as long as the stopped event is not set
while not self._stopped_event.is_set():
# Get data from camera this is mock data
data = next(sample_data)
# Put data in the list for data to be sent
self._data.append(data)
# Get the next available item from the list
data = self._data.pop(0)
try:
# This tries to puts in the queue
# if queue is at max capacity raises a Full Exception
self._queue.put_nowait(data)
print('CameraFeed, sent data:', data)
except Full:
# If exception occures, put the data back to list
self._data.insert(0, data)
def stop(self):
# Sets the stopped event, so the thread exits the run loop
self._stopped_event.set()
class CameraDataConsumer(Thread):
def __init__(self, queue):
super().__init__()
self._stopped_event = Event()
self._queue = queue
def run(self):
while not self._stopped_event.is_set():
# Waits for data from queue
data = self._queue.get(block=True)
# If data is None then do nothing
if data is None:
continue
print('CameraConsumer, got data:', data)
def send_data(self, data):
"""Method to send data to this thread from main probably"""
self._queue.put(data, block=True)
def stop(self):
# Set the stopped event flag
self._stopped_event.set()
# Try to put data to queue, to wake up the thread
try:
self._queue.put_nowait(Data(None, EventType.OPERATION))
except Full:
# If queue is full, don't do anything it is probably
# safe to assume that setting the stop flag is sufficient
print('Queue is full')
# Create a Queue with capacity 1000
queue = Queue(maxsize=1000)
dum = CameraFeed(queue)
dum.start()
tis = CameraDataConsumer(queue)
tis.start()
# time.sleep is for demo purposes
time.sleep(1)
tis.stop()
dum.stop()
tis.join()
dum.join()

In first script:
print(lastResult, end='\n', file=sys.stdout, flush=True)
In other script and other thread:
second = Second()
...
for lastResult in sys.stdin:
lastResult = lastResult[:-1]
second.func2(lastResult)
...

Python socketio with multiprocessing

So I have been struggling with this one error of pickle which is driving me crazy. I have the following master Engine class with the following code :
import eventlet
import socketio
import multiprocessing
from multiprocessing import Queue
from multi import SIOSerever
class masterEngine:
if __name__ == '__main__':
serverObj = SIOSerever()
try:
receiveData = multiprocessing.Process(target=serverObj.run)
receiveData.start()
receiveProcess = multiprocessing.Process(target=serverObj.fetchFromQueue)
receiveProcess.start()
receiveData.join()
receiveProcess.join()
except Exception as error:
print(error)
and I have another file called multi which runs like the following :
import multiprocessing
from multiprocessing import Queue
import eventlet
import socketio
class SIOSerever:
def __init__(self):
self.cycletimeQueue = Queue()
self.sio = socketio.Server(cors_allowed_origins='*',logger=False)
self.app = socketio.WSGIApp(self.sio, static_files={'/': 'index.html',})
self.ws_server = eventlet.listen(('0.0.0.0', 5000))
#self.sio.on('production')
def p_message(sid, message):
self.cycletimeQueue.put(message)
print("I logged : "+str(message))
def run(self):
eventlet.wsgi.server(self.ws_server, self.app)
def fetchFromQueue(self):
while True:
cycle = self.cycletimeQueue.get()
print(cycle)
As you can see I can trying to create two processes of def run and fetchFromQueue which i want to run independently.
My run function starts the python-socket server to which im sending some data from a html web page ( This runs perfectly without multiprocessing). I am then trying to push the data received to a Queue so that my other function can retrieve it and play with the data received.
I have a set of time taking operations that I need to carry out with the data received from the socket which is why im pushing it all into a Queue.
On running the master Engine class I receive the following :
Can't pickle <class 'threading.Thread'>: it's not the same object as threading.Thread
I ended!
[Finished in 0.5s]
Can you please help with what I am doing wrong?

From multiprocessing programming guidelines:
Explicitly pass resources to child processes
On Unix using the fork start method, a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.
Apart from making the code (potentially) compatible with Windows and the other start methods this also ensures that as long as the child process is still alive the object will not be garbage collected in the parent process. This might be important if some resource is freed when the object is garbage collected in the parent process.
Therefore, I slightly modified your example by removing everything unnecessary, but showing an approach where the shared queue is explicitly passed to all processes that use it:
import multiprocessing
MAX = 5
class SIOSerever:
def __init__(self, queue):
self.cycletimeQueue = queue
def run(self):
for i in range(MAX):
self.cycletimeQueue.put(i)
#staticmethod
def fetchFromQueue(cycletimeQueue):
while True:
cycle = cycletimeQueue.get()
print(cycle)
if cycle >= MAX - 1:
break
def start_server(queue):
server = SIOSerever(queue)
server.run()
if __name__ == '__main__':
try:
queue = multiprocessing.Queue()
receiveData = multiprocessing.Process(target=start_server, args=(queue,))
receiveData.start()
receiveProcess = multiprocessing.Process(target=SIOSerever.fetchFromQueue, args=(queue,))
receiveProcess.start()
receiveData.join()
receiveProcess.join()
except Exception as error:
print(error)
0
1
...

Python Multi Threading not working properly

I can't seem to get this Multi Threading code to work with my already structured Python script of a simple IP Pining script with a few other features.
After testing the Multi Threading code i though i was ready to implement onto my code, however i can't seem to be able to call a new thread correctly. I know this because if Multi Threading was working properly my GUI interface would not stop responding when the scanall() function gets executed upon pressing the Scan all IPs button on the GUI interface.
I'm also not getting anymore errors after finishing the implementation, so it's hard to know now what to proceed with. This extremely frustrating thank you for the help guys, i would love to tackle this one down!
This is the Multi Threading code:
class ThreadManager:
"""Multi Threading manager"""
def __init__(self):
pass
def start(self, threads):
thread_refs = []
for i in range(threads):
t = MyThread(i) # Thread(args=(1,)) # target=test(),
t.daemon = True
print('starting thread %i' % i)
t.start()
for t in thread_refs:
t.join()
class MyThread(Thread):
"""Multi Threading"""
def __init__(self, i):
Thread.__init__(self)
self.i = i
def run(self):
while True:
print('thread # {}'.format(self.i))
time.sleep(.25)
break
And This is the code that executes the multi threading:
print("[Debug] Main Thread has been started")
self.manager = ThreadManager()
self.manager.start(1)
This is the Github for the entire script code and the Multi Threading implementation.
https://github.com/Hontiris1/IPPing

As you are not adding the value of t to thread_refs array. Its empty and is not waiting for the threads to join.
Change you start function like this:
def start(self, threads):
thread_refs = []
for i in range(threads):
t = MyThread(i) # Thread(args=(1,)) # target=test(),
t.daemon = True
print('starting thread %i' % i)
t.start()
thread_refs.append(t)
for t in thread_refs:
t.join()
secondly you might want to remove the break statement from your while loop in the run function. Otherwise it will exit after printing thread 0 once.

Is there a python library for notification and waiting?

I'm using python-zookeeper for locking, and I'm trying to figure out a way of getting the execution to wait for notification when it's watching a file, because zookeeper.exists() returns immediately, rather than blocking.
Basically, I have the code listed below, but I'm unsure of the best way to implement the notify() and wait_for_notification() functions. It could be done with os.kill() and signal.pause(), but I'm sure that's likely to cause problems if I later have multiple locks in one program - is there a specific Python library that is good for this sort of thing?
def get_lock(zh):
lockfile = zookeeper.create(zh,lockdir + '/guid-lock-','lock', [ZOO_OPEN_ACL_UNSAFE], zookeeper.EPHEMERAL | zookeeper.SEQUENCE)
while(True):
# this won't work for more than one waiting process, fix later
children = zookeeper.get_children(zh, lockdir)
if len(children) == 1 and children[0] == basename(lockfile):
return lockfile
# yeah, there's a problem here, I'll fix it later
for child in children:
if child < basename(lockfile):
break
# exists will call notify when the watched file changes
if zookeeper.exists(zh, lockdir + '/' + child, notify):
# Process should wait here until notify() wakes it
wait_for_notification()
def drop_lock(zh,lockfile):
zookeeper.delete(zh,lockfile)
def notify(zh, unknown1, unknown2, lockfile):
pass
def wait_for_notification():
pass

The Condition variables from Python's threading module are probably a very good fit for what you're trying to do:
http://docs.python.org/library/threading.html#condition-objects
I've extended to the example to make it a little more obvious how you would adapt it for your purposes:
#!/usr/bin/env python
from collections import deque
from threading import Thread,Condition
QUEUE = deque()
def an_item_is_available():
return bool(QUEUE)
def get_an_available_item():
return QUEUE.popleft()
def make_an_item_available(item):
QUEUE.append(item)
def consume(cv):
cv.acquire()
while not an_item_is_available():
cv.wait()
print 'We got an available item', get_an_available_item()
cv.release()
def produce(cv):
cv.acquire()
make_an_item_available('an item to be processed')
cv.notify()
cv.release()
def main():
cv = Condition()
Thread(target=consume, args=(cv,)).start()
Thread(target=produce, args=(cv,)).start()
if __name__ == '__main__':
main()

My answer may not be relevant to your question, but it is relevant to the question title.
from threading import Thread,Event
locker = Event()
def MyJob(locker):
while True:
#
# do some logic here
#
locker.clear() # Set event state to 'False'
locker.wait() # suspend the thread until event state is 'True'
worker_thread = Thread(target=MyJob, args=(locker,))
worker_thread.start()
#
# some main thread logic here
#
locker.set() # This sets the event state to 'True' and thus it resumes the worker_thread
More information here: https://docs.python.org/3/library/threading.html#event-objects

python multithreading synchronization

I am having a synchronization problem while threading with cPython. I have two files, I parse them and return the desired result. However, the code below acts strangely and returns three times instead of two plus doesn't return in the order I put them into queue. Here's the code:
import Queue
import threading
from HtmlDoc import Document
OUT_LIST = []
class Threader(threading.Thread):
"""
Start threading
"""
def __init__(self, queue, out_queue):
threading.Thread.__init__(self)
self.queue = queue
self.out_queue = out_queue
def run(self):
while True:
if self.queue.qsize() == 0: break
path, host = self.queue.get()
f = open(path, "r")
source = f.read()
f.close()
self.out_queue.put((source, host))
self.queue.task_done()
class Processor(threading.Thread):
"""
Process threading
"""
def __init__(self, out_queue):
self.out_queue = out_queue
self.l_first = []
self.f_append = self.l_first.append
self.l_second = []
self.s_append = self.l_second.append
threading.Thread.__init__(self)
def first(self, doc):
# some code to to retrieve the text desired, this works 100% I tested it manually
def second(self, doc):
# some code to to retrieve the text desired, this works 100% I tested it manually
def run(self):
while True:
if self.out_queue.qsize() == 0: break
doc, host = self.out_queue.get()
if host == "first":
self.first(doc)
elif host == "second":
self.second(doc)
OUT_LIST.extend(self.l_first + self.l_second)
self.out_queue.task_done()
def main():
queue = Queue.Queue()
out_queue = Queue.Queue()
queue.put(("...first.html", "first"))
queue.put(("...second.html", "second"))
qsize = queue.qsize()
for i in range(qsize):
t = Threader(queue, out_queue)
t.setDaemon(True)
t.start()
for i in range(qsize):
dt = Processor(out_queue)
dt.setDaemon(True)
dt.start()
queue.join()
out_queue.join()
print '<br />'.join(OUT_LIST)
main()
Now, when I print, I'd like to print the content of the "first" first of all and then the content of the "second". Can anyone help me?
NOTE: I am threading because actually I will have to connect more than 10 places at a time and retrieve its results. I believe that threading is the most appropriate way to accomplish such a task

I am threading because actually I will have to connect more than 10 places at a time and retrieve its results. I believe that threading is the most appropriate way to accomplish such a task
Threading is actually one of the most error-prone ways to manage multiple concurrent connections. A more powerful, more debuggable approach is to use event-driven asynchronous networking, such as implemented by Twisted. If you're interested in using this model, you might want to check out this introduction.

I dont share the same opinion that threading is the best way to do this (IMO some events/select mechanism would be better) but problem with your code could be in variables t and dt. You have the assignements in the cycle and object instances are to stored anywhere - so it may be possible that your new instance of Thread/Processor get deleted at the end of the each cycle.
It would be more clarified if you show us precise output of this code.

1) You cannot control order of job completion. It depends on execution time, so to return results as you want you can create global dictionary with job objects, like job_results : {'first' : None, 'second' : None} and store results here, than you can fetch data on desired order
2) self.first and self.second should be cleared after each processed doc, else you will have duplicates in OUT_LIST
3) You may use multi-processing with subprocess module and put all result data to CSV files for example and them sort them as you wish.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python - Non-empty shared list on separate thread appears empty - python

Related

How can I send a value from a python script to another script?

Python socketio with multiprocessing

Python Multi Threading not working properly

Is there a python library for notification and waiting?

python multithreading synchronization

Categories

Resources