Handling requests with multiple threads in Python

I have a class Sender that has a function add_task(task) to add a task to the thread pool. The task needs to be handled by one of 10 worker threads. Each thread has to have unique information, say its own address (which needs to be loaded from the config file), to do the job.
Currently I am thinking of using ThreadPoolExecutor and threading.local to store the address for each worker thread. Something like this:
from concurrent.futures import ThreadPoolExecutor
import threading

class Sender:
    # addresses of len 10
    def __init__(self, addresses):
        self.executor = ThreadPoolExecutor(max_workers=10,
                                           initializer=initialize,
                                           initargs=(addresses,))

    def add_task(self, task):
        fut = self.executor.submit(task)
        fut.add_done_callback(dummy_callback)

def initialize(addresses):
    # ... thread name is <pool prefix>_i
    # retrieve that i
    data = threading.local()
    data.address = addresses[i]

def task():
    # do something with data.address
    print(data.address)

def dummy_callback(fut):
    pass
I am not sure if I am going in the right direction. Could you give any hints on that?
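A minimal runnable sketch of that idea, for reference (an assumption, not an accepted answer): instead of recovering the worker index i from the thread name, the initializer can pull one address from a shared queue, so each worker claims exactly one. This requires Python 3.7+ for the initializer/initargs parameters of ThreadPoolExecutor.
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch: each worker's initializer pops its own address from a
# queue and stores it in a module-level threading.local.
_worker_data = threading.local()

def initialize(address_queue):
    # Runs once in every worker thread before it handles tasks.
    _worker_data.address = address_queue.get_nowait()

class Sender:
    def __init__(self, addresses):  # addresses of len 10
        address_queue = queue.Queue()
        for address in addresses:
            address_queue.put(address)
        self.executor = ThreadPoolExecutor(max_workers=len(addresses),
                                           initializer=initialize,
                                           initargs=(address_queue,))

    def add_task(self, task):
        fut = self.executor.submit(task)
        fut.add_done_callback(dummy_callback)

def task():
    # Each worker thread sees only the address assigned to it.
    print(_worker_data.address)

def dummy_callback(fut):
    pass

if __name__ == '__main__':
    sender = Sender(['address-%d' % i for i in range(10)])
    for _ in range(20):
        sender.add_task(task)
    sender.executor.shutdown(wait=True)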

Is it right to init multiprocess in class __init__?

import json
import logging
import os
import threading
from collections import deque

import redis
from flask import Flask, request
from multiprocessing.dummy import Pool as ThreadPool

app = Flask(__name__)  # Flask setup implied by the routes below; not shown in the original excerpt

class TSNew:
    def __init__(self):
        self.redis_client = redis.StrictRedis(host="172.17.31.147", port=4401, db=0)
        self.global_switch = 0
        self.pool = ThreadPool(40) # init pool
        self.dnn_model = None
        self.nnf = None
        self.md5sum_nnf = "initialize"
        self.thread = threading.Thread(target=self.load_model_item)
        self.ts_picked_ids = None
        self.thread.start()

        self.memory = deque(maxlen=3000)
        self.process = threading.Thread(target=self.process_user_dict)
        self.process.start()

    def load_model_item(self):
        '''
        code
        '''

    def predict_memcache(self, user_dict):
        '''
        code
        '''

    def process_user_dict(self):
        while True:
            '''
            code to generate user_dicts which is a list
            '''
            results = self.pool.map(self.predict_memcache, user_dicts)
            '''
            code
            '''

TSNew_ = TSNew()

def get_user_result():
    logging.info("----------------come in ------------------")
    if request.method == 'POST':
        user_dict_json = request.get_data()  # userid
        if user_dict_json == '' or user_dict_json is None:
            logging.info("----------------user_dict_json is ''------------------")
            return ''
        try:
            user_dict = json.loads(user_dict_json)
        except:
            logging.info("json load error, pass")
            return ''
        TSNew_.memory.append(user_dict)
        logging.info('add to deque TSNew_.memory size: %d PID: %d', len(TSNew_.memory), os.getpid())
        logging.info("add to deque userid: %s, nation: %s \n", user_dict['user_id'], user_dict['user_country'])
        return 'SUCCESS\n'

@app.route('/', methods=['POST'])
def get_ts_gbdt_id():
    return get_user_result()

from werkzeug.contrib.fixers import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=4444)
I create a thread pool in the class __init__ and use self.pool to map the function predict_memcache.
I have two doubts:
(a) Should I initialize the pool in __init__, or create it right before
results = self.pool.map(self.predict_memcache, user_dicts)
(b) Since pool.map is a multithreaded operation and it is executed inside the process_user_dict thread, is there any hidden error?
Thanks.
Question (a):
It depends. If you need to run process_user_dict more than once, then it makes sense to start the pool in the constructor and keep it running. Creating a thread pool always comes with some overhead and by keeping the pool alive between calls to process_user_dict you would avoid that additional overhead.
If you just want to process one set of input, you can as well create your pool right inside process_user_dict. But probably not right before results = self.pool.map(self.predict_memcache, user_dicts) because that would create a pool for every iteration of your surrounding while loop.
In your specific case, it does not make any difference. You create your TSNew_ object on module-level, so that it remains alive (and with it the thread pool) while your app is running; the same thread pool from the same TSNew instance is used to process all the requests during the lifetime of app.run().
Since you seem to be using that construct with self.process = threading.Thread(target=self.process_user_dict) as some sort of listener on self.memory, creating the pool in the constructor is functionally equivalent to creating the pool inside of process_user_dict (but outside the loop).
Question (b):
Technically, there is no hidden error by default when creating a thread inside a thread. In the end, any additional thread's ultimate parent is always the MainThread, which is implicitly created for every instance of a Python interpreter. Basically, every time you create a thread inside a Python program, you create a thread in a thread.
Actually, your code does not even create a thread inside a thread. Your self.pool is created inside the MainThread. When the pool is instantiated via self.pool = ThreadPool(40) it creates the desired number (40) of worker threads, plus one worker handler thread, one task handler thread and one result handler thread. All of these are child threads of the MainThread. All you do with regards to your pool inside your thread under self.process is calling its map method to assign tasks to it.
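If you are curious, that thread count is easy to check with threading.enumerate() (a hypothetical snippet, not part of the question or of this answer's code):
import threading
from multiprocessing.dummy import Pool as ThreadPool

before = len(threading.enumerate())   # usually just the MainThread
pool = ThreadPool(40)
after = len(threading.enumerate())
print(after - before)                 # expected 43: 40 workers + 3 handler threads
pool.close()
pool.join()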
However, I do not really see the point of what you are doing with that self.process here.
Making a guess, I would say that you want the loop in process_user_dict to act as a kind of listener on self.memory, so that the pool starts processing user_dicts as soon as they show up in the deque in self.memory. From what I see you doing in get_user_result, you seem to get one user_dict per request. I understand that you might have concurrent user sessions passing in these dicts, but do you really see a benefit from process_user_dict running in an infinite loop over simply calling TSNew_.process_user_dict() after TSNew_.memory.append(user_dict)? You could even omit self.memory completely and pass the dict directly to process_user_dict, unless I am missing something you did not show us.
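A rough sketch of that last suggestion, reusing names from the question (the surrounding Flask app, json and request imports are assumed to come from the question's code and are not shown here):
from multiprocessing.dummy import Pool as ThreadPool

class TSNew:
    def __init__(self):
        self.pool = ThreadPool(40)  # created once, reused for every request

    def predict_memcache(self, user_dict):
        '''
        prediction code from the question
        '''

    def process_user_dict(self, user_dicts):
        # called per request instead of looping over self.memory
        return self.pool.map(self.predict_memcache, user_dicts)

TSNew_ = TSNew()

@app.route('/', methods=['POST'])
def get_ts_gbdt_id():
    user_dict = json.loads(request.get_data())
    TSNew_.process_user_dict([user_dict])
    return 'SUCCESS\n'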

python design pattern queue with workers

I'm currently working on a project that involves three components: an observer that checks for changes in a directory, a worker, and a command line interface.
What I want to achieve is:
The observer, when a change happens, sends a string to the worker (adds a job to the worker's queue).
The worker has a queue of jobs and forever works on its queue.
Now I want the possibility to run a Python script to check the status of the worker (number of active jobs, errors and so on).
I don't know how to achieve this with Python in terms of which components to use and how to link the three components.
I thought of a singleton worker where the observer adds a job to a queue, but 1) I was not able to write working code and 2) how can I fit the checker in?
Another solution I thought of may be multiple child processes spawned from a parent that holds the queue, but I'm a bit lost...
Thanks for any advice.
I'd use some kind of observer pattern or publish-subscribe pattern. For the former you can use for example the Python version of ReactiveX. But for a more basic example let's stay with the Python core. Parts of your program can subscribe to the worker and receive updates from the process via queues for example.
import itertools as it
from queue import Queue
from threading import Thread
import time


class Observable(Thread):
    def __init__(self):
        super().__init__()
        self._observers = []

    def notify(self, msg):
        for obs in self._observers:
            obs.put(msg)

    def subscribe(self, obs):
        self._observers.append(obs)


class Observer(Thread):
    def __init__(self):
        super().__init__()
        self.updates = Queue()


class Watcher(Observable):
    def run(self):
        for i in it.count():
            self.notify(i)
            time.sleep(1)


class Worker(Observable, Observer):
    def run(self):
        while True:
            task = self.updates.get()
            self.notify((str(task), 'start'))
            time.sleep(1)
            self.notify((str(task), 'stop'))


class Supervisor(Observer):
    def __init__(self):
        super().__init__()
        self._statuses = {}

    def run(self):
        while True:
            status = self.updates.get()
            print(status)
            self._statuses[status[0]] = status[1]
            # Do something based on status updates.
            if status[1] == 'stop':
                del self._statuses[status[0]]


watcher = Watcher()
worker = Worker()
supervisor = Supervisor()

watcher.subscribe(worker.updates)
worker.subscribe(supervisor.updates)

supervisor.start()
worker.start()
watcher.start()
However, many variations are possible, and you can check which of the various patterns suits you best.

Python - Non-empty shared list on separate thread appears empty

I've two classes - MessageProducer and MessageConsumer.
MessageConsumer does the following:
receives messages and puts them in its message list "_unprocessed_msgs"
on a separate worker thread, moves the messages to internal list "_in_process_msgs"
on the worker thread, processes messages from "_in_process_msgs"
In my development environment, I'm facing an issue with #2 above: after adding a message in step #1, when the worker thread checks the length of "_unprocessed_msgs", it gets zero.
When step #1 is repeated, the list properly shows 2 items on the thread where the item was added. But in step #2, on the worker thread, len(_unprocessed_msgs) again returns zero.
I'm not sure why this is happening. I would really appreciate any help on this.
I'm using Ubuntu 16.04 with Python 2.7.12.
Below is the sample source code. Please let me know if more information is required.
import logging
import threading
import time

# logging setup (not shown in the original sample)
logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger(__name__)


class MessageConsumerThread(threading.Thread):
    def __init__(self):
        super(MessageConsumerThread, self).__init__()
        self._unprocessed_msg_q = []
        self._in_process_msg_q = []
        self._lock = threading.Lock()
        self._stop_processing = False

    def start_msg_processing_thread(self):
        self._stop_processing = False
        self.start()

    def stop_msg_processing_thread(self):
        self._stop_processing = True

    def receive_msg(self, msg):
        with self._lock:
            LOG.info("Before: MessageConsumerThread::receive_msg: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))
            self._unprocessed_msg_q.append(msg)
            LOG.info("After: MessageConsumerThread::receive_msg: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))

    def _queue_unprocessed_msgs(self):
        with self._lock:
            LOG.info("MessageConsumerThread::_queue_unprocessed_msgs: "
                     "len(self._unprocessed_msg_q)=%s" %
                     len(self._unprocessed_msg_q))
            if self._unprocessed_msg_q:
                LOG.info("Moving messages from unprocessed to in_process queue")
                self._in_process_msg_q += self._unprocessed_msg_q
                self._unprocessed_msg_q = []
                LOG.info("Moved messages from unprocessed to in_process queue")

    def run(self):
        while not self._stop_processing:
            # Allow other threads to add messages to message queue
            time.sleep(1)
            # Move unprocessed listeners to in-process listener queue
            self._queue_unprocessed_msgs()
            # If nothing to process continue the loop
            if not self._in_process_msg_q:
                continue
            for msg in self._in_process_msg_q:
                self.consume_message(msg)
            # Clean up processed messages
            del self._in_process_msg_q[:]

    def consume_message(self, msg):
        print(msg)


class MessageProducerThread(threading.Thread):
    def __init__(self, producer_id, msg_receiver):
        super(MessageProducerThread, self).__init__()
        self._producer_id = producer_id
        self._msg_receiver = msg_receiver

    def start_producing_msgs(self):
        self.start()

    def run(self):
        for i in range(1, 10):
            msg = "From: %s; Message:%s" % (self._producer_id, i)
            self._msg_receiver.receive_msg(msg)


def main():
    msg_receiver_thread = MessageConsumerThread()
    msg_receiver_thread.start_msg_processing_thread()
    msg_producer_thread = MessageProducerThread(producer_id='Producer-01',
                                                msg_receiver=msg_receiver_thread)
    msg_producer_thread.start_producing_msgs()
    msg_producer_thread.join()
    msg_receiver_thread.stop_msg_processing_thread()
    msg_receiver_thread.join()


if __name__ == '__main__':
    main()
Following is the log that I get:
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=0
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=1**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: len(self._unprocessed_msg_q)=0
INFO: Before: MessageConsumerThread::receive_msg: len(self._unprocessed_msg_q)=1
INFO: After: MessageConsumerThread::receive_msg: **len(self._unprocessed_msg_q)=2**
INFO: MessageConsumerThread::_queue_unprocessed_msgs: **len(self._unprocessed_msg_q)=0**
This is not a good design for your application.
I spent some time trying to debug this - but threading code is naturally complicated, so we should try to simplify it instead of making it even more confusing.
When I see threading code in Python, I usually see it written in a procedural form: a normal function that is passed to threading.Thread as the target argument, and that function drives each thread. That way, you don't need to write code for a new class that will have a single instance.
Another thing is that, although Python's global interpreter lock itself guarantees lists won't get corrupted if modified in two separate threads, lists are not a recommended "thread data passing" data structure. You should probably look at queue.Queue for that.
The thing that is wrong in this code at first sight is probably not the cause of your problem, due to your use of locks, but it might be. Instead of
self._unprocessed_msg_q = []
which creates a new list object that the other thread momentarily has no reference to (so it might still write data to the old list), you should do:
self._unprocessed_msg_q[:] = []
Or just use the del-slice approach you use in the other method.
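A tiny illustration of the difference between rebinding and clearing in place (hypothetical names, not from the question):
shared = [1, 2, 3]
alias = shared      # another thread might hold this reference

shared = []         # rebinds the name; `alias` still sees [1, 2, 3]
print(alias)        # -> [1, 2, 3]

shared = alias
shared[:] = []      # clears the same list object in place
print(alias)        # -> []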
But to be on the safer side, and to have more maintainable and less surprising code, you really should change to a procedural approach there. Treat the Thread target as the "final" callable that does its thing, and then use Queues around it:
# coding: utf-8

from __future__ import print_function
from __future__ import unicode_literals

from threading import Thread
try:
    from queue import Queue, Empty
except ImportError:
    from Queue import Queue, Empty

import time
import random


TERMINATE_SENTINEL = object()
NO_DATA_SENTINEL = object()


class Receiver(object):
    def __init__(self, queue):
        self.queue = queue
        self.in_process = []

    def receive_data(self, data):
        self.in_process.append(data)

    def consume_data(self):
        print("received data:", self.in_process)
        del self.in_process[:]

    def receiver_loop(self):
        queue = self.queue
        while True:
            try:
                data = queue.get(block=False)
            except Empty:
                print("got no data from queue")
                data = NO_DATA_SENTINEL
            if data is TERMINATE_SENTINEL:
                print("Got sentinel: exiting receiver loop")
                break
            self.receive_data(data)
            time.sleep(random.uniform(0, 0.3))
            if queue.empty():
                # Only process data if we have nothing to receive right now:
                self.consume_data()
                print("sleeping receiver")
                time.sleep(1)
        if self.in_process:
            self.consume_data()


def producer_loop(queue):
    for i in range(10):
        time.sleep(random.uniform(0.05, 0.4))
        print("putting {0} in queue".format(i))
        queue.put(i)


def main():
    msg_queue = Queue()
    msg_receiver_thread = Thread(target=Receiver(msg_queue).receiver_loop)
    time.sleep(0.1)
    msg_producer_thread = Thread(target=producer_loop, args=(msg_queue,))

    msg_receiver_thread.start()
    msg_producer_thread.start()

    msg_producer_thread.join()
    msg_queue.put(TERMINATE_SENTINEL)
    msg_receiver_thread.join()


if __name__ == '__main__':
    main()
Note that since you want multiple methods in the receiver thread to do things with the data, I used a class - but it does not inherit from Thread, and does not have to worry about its workings. All its methods are called within the same thread: no need for locks, no worries about race conditions within the receiver class itself. For communicating outside the class, the Queue class is structured to handle any race conditions for us.
The producer loop, as it is just a dummy producer, has no need at all to be written in class form. But it would look just the same if it had more methods.
(The random sleeps help visualize what would happen in "real world" message receiving)
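For illustration, this is roughly what that class form could look like (a hypothetical sketch reusing names from the block above):
class Producer(object):
    def __init__(self, queue, count=10):
        self.queue = queue
        self.count = count

    def producer_loop(self):
        for i in range(self.count):
            time.sleep(random.uniform(0.05, 0.4))
            print("putting {0} in queue".format(i))
            self.queue.put(i)

# usage: msg_producer_thread = Thread(target=Producer(msg_queue).producer_loop)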
Also, you might want to take a look at something like:
https://www.thoughtworks.com/insights/blog/composition-vs-inheritance-how-choose
Finally I was able to solve the issue. In the actual code, I have a Manager class that is responsible for instantiating MessageConsumerThread as the last thing in its initializer:
class Manager(object):
    def __init__(self):
        ...
        ...
        self._consumer = MessageConsumerThread(self)
        self._consumer.start_msg_processing_thread()
The problem seems to be with passing 'self' to the MessageConsumerThread initializer while Manager is still executing its own initializer (even though those are the last two steps). The moment I moved the creation of the consumer out of the initializer, the consumer thread was able to see the elements in "_unprocessed_msg_q".
Please note that the issue is still not reproducible with the above sample code. It manifests itself in the production environment only. Without the above fix, I tried a queue and a dictionary as well, but observed the same issue. After the fix, I tried with a queue and a list and was able to successfully execute the code.
I really appreciate and thank @jsbueno and @ivan_pozdeev for their time and help! The Stack Overflow community is very helpful!

Django, Queue and Thread - don't free memory

A user visits http://example.com/url/ and invokes page_parser from views.py. page_parser creates an instance of class Foo from script.py.
Each time http://example.com/url/ is visited, I see that memory usage goes up and up. I guess the garbage collector doesn't collect the instantiated Foo objects. Any ideas why that is?
Here is the code:
views.py:
from django.http import HttpResponse
from script import Foo
from script import urls


# When user visits http://example.com/url/ I run `page_parser`
def page_parser(request):
    Foo(urls)
    return HttpResponse("alldone")
script.py:
import requests
from queue import Queue
from threading import Thread


class Newthread(Thread):
    def __init__(self, queue, result):
        Thread.__init__(self)
        self.queue = queue
        self.result = result

    def run(self):
        while True:
            url = self.queue.get()
            data = requests.get(url)  # Download image at url
            self.result.append(data)
            self.queue.task_done()


class Foo:
    def __init__(self, urls):
        self.result = list()
        self.queue = Queue()
        self.startthreads()
        for url in urls:
            self.queue.put(url)
        self.queue.join()

    def startthreads(self):
        for x in range(3):
            worker = Newthread(queue=self.queue, result=self.result)
            worker.daemon = True
            worker.start()


urls = [
    "https://static.pexels.com/photos/106399/pexels-photo-106399.jpeg",
    "https://static.pexels.com/photos/164516/pexels-photo-164516.jpeg",
    "https://static.pexels.com/photos/206172/pexels-photo-206172.jpeg",
    "https://static.pexels.com/photos/32870/pexels-photo.jpg",
    "https://static.pexels.com/photos/106399/pexels-photo-106399.jpeg",
    "https://static.pexels.com/photos/164516/pexels-photo-164516.jpeg",
    "https://static.pexels.com/photos/206172/pexels-photo-206172.jpeg",
    "https://static.pexels.com/photos/32870/pexels-photo.jpg",
    "https://static.pexels.com/photos/32870/pexels-photo.jpg",
    "https://static.pexels.com/photos/106399/pexels-photo-106399.jpeg",
    "https://static.pexels.com/photos/164516/pexels-photo-164516.jpeg",
    "https://static.pexels.com/photos/206172/pexels-photo-206172.jpeg",
    "https://static.pexels.com/photos/32870/pexels-photo.jpg"]
There are several moving parts involved, but what I think happens is the following:
WSGI processes are not killed after each request, so things may persist.
You create 3 new threads, but never let them finish and join the main thread again, for example when the queue is empty.
Since the reference count of Foo.queue never reaches zero (the threads are still alive, waiting for new queue items), it cannot be garbage collected.
So you keep creating new threads and new Foo instances, and none of them can be freed.
I'm not an expert on queue.Queue, but my theory can be verified if you watch the number of threads in the WSGI process go up by 3 with each request (for example using top(1)).
As a side note, this is a side effect of your class design. You do everything in __init__, which should really only assign instance attributes.
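A hedged sketch of one way to fix the leak described above (my assumption, not the answerer's code): give each worker a sentinel so the threads exit and can be joined, letting Foo and its queue be collected after the request.
import requests
from queue import Queue
from threading import Thread

_SENTINEL = object()

class Newthread(Thread):
    def __init__(self, queue, result):
        Thread.__init__(self)
        self.queue = queue
        self.result = result

    def run(self):
        while True:
            url = self.queue.get()
            if url is _SENTINEL:
                self.queue.task_done()
                break
            self.result.append(requests.get(url))  # Download image at url
            self.queue.task_done()

class Foo:
    def __init__(self, urls, workers=3):
        self.result = []
        self.queue = Queue()
        threads = [Newthread(queue=self.queue, result=self.result) for _ in range(workers)]
        for t in threads:
            t.start()
        for url in urls:
            self.queue.put(url)
        for _ in threads:
            self.queue.put(_SENTINEL)  # one sentinel per worker so every thread exits
        for t in threads:
            t.join()                   # no lingering threads, no lingering references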

Running a class method multiple times in parallel in Python

I have implemented a Python socket server. It sends image data from multiple cameras to a client. My request handler class looks like:
class RequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        while True:
            data = self.request.recv(1024)
            if data.endswith('0000000050'):  # client requests data
                for camera_id, camera_path in _video_devices.iteritems():
                    message = self.create_image_transfer_message(camera_id, camera_path)
                    self.request.sendto(message, self.client_address)

    def create_image_transfer_message(self, camera_id, camera_path):
        # somecode ...
I am forced to stick with the socket server because of the client. It works; however, it runs sequentially, so there are large delays between the camera images being uploaded. I would like to create the transfer messages in parallel, with a small delay between the calls.
I tried to use the pool class from multiprocessing:
import multiprocessing

class RequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        ...
        pool = multiprocessing.Pool(processes=4)
        messages = [pool.apply(self.create_image_transfer_message, args=(camera_id, camera_path))
                    for camera_id, camera_path in _video_devices.iteritems()]
But this throws:
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I want to know if there is an another way to create those transfer messages in parallel with a defined delay between the calls?
EDIT:
I create the response messages using data from multiple cameras. The problem is that if I run the image-grabbing routines too close to each other, I get image artifacts because the USB bus is overloaded. I figured out that calling the image grabbing sequentially with a 0.2 s delay solves the problem. The cameras are not sending data for the whole time the image-grabbing function is running, so the delayed parallel calls result in good images with only a small delay between them.
I think you're on the right path already, no need to throw away your work.
Here's an answer on how to use a class method with multiprocessing that I found via Google after searching for "multiprocessing class method":
from multiprocessing import Pool
import time

pool = Pool(processes=2)

def unwrap_self_f(arg, **kwarg):
    return RequestHandler.create_image_transfer_message(*arg, **kwarg)

class RequestHandler(SocketServer.BaseRequestHandler):

    @classmethod
    def create_image_transfer_message(cls, camera_id, camera_path):
        # your logic goes here
        pass

    def handle(self):
        while True:
            data = self.request.recv(1024)
            if not data.endswith('0000000050'):  # client requests data
                continue
            pool.map(unwrap_self_f,
                     (
                         (camera_id, camera_path)
                         for camera_id, camera_path in _video_devices.iteritems()
                     ))
Note: if you want to return values from the workers, you'll need to explore using a shared resource; see this answer: How can I recover the return value of a function passed to multiprocessing.Process?
This code did the trick for me:
class RequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        while True:
            data = self.request.recv(1024)
            if data.endswith('0000000050'):  # client requests data
                process_manager = multiprocessing.Manager()
                messaging_queue = process_manager.Queue()
                jobs = []
                for camera_id, camera_path in _video_devices.iteritems():
                    p = multiprocessing.Process(target=self.create_image_transfer_message,
                                                args=(camera_id, camera_path, messaging_queue))
                    jobs.append(p)
                    p.start()
                    time.sleep(0.3)

                # wait for all processes to finish
                for p in jobs:
                    p.join()

                while not messaging_queue.empty():
                    self.request.sendto(messaging_queue.get(), self.client_address)
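For completeness, a thread-based alternative sketch (my own assumption, not part of either answer above): plain threads sidestep the pickling problem entirely because nothing has to cross a process boundary, and a sleep between thread starts keeps the image grabs staggered. RequestHandler, create_image_transfer_message and _video_devices are the names from the question.
import threading
import time

class RequestHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        while True:
            data = self.request.recv(1024)
            if not data.endswith('0000000050'):  # client requests data
                continue
            threads, messages = [], []
            lock = threading.Lock()

            def grab(camera_id, camera_path):
                message = self.create_image_transfer_message(camera_id, camera_path)
                with lock:
                    messages.append(message)

            for camera_id, camera_path in _video_devices.iteritems():
                t = threading.Thread(target=grab, args=(camera_id, camera_path))
                t.start()
                threads.append(t)
                time.sleep(0.3)  # stagger the grabs so the USB bus is not overloaded
            for t in threads:
                t.join()
            for message in messages:
                self.request.sendto(message, self.client_address)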
