I include an example usage of multiprocessing below. This is a process pool model. It is not as simple as it might be, but is relatively close in structure to the code I'm actually using. It also uses sqlalchemy, sorry.
My question is this: I currently have a relatively long-running Python script which executes a number of functions that each look like the code below, so the parent process is the same in all cases. In other words, multiple pools are created by one Python script. (I don't have to do it this way, I suppose, but the alternative is to use something like os.system and subprocess.) The problem is that these processes hang around and hold on to memory. The docs say these daemon processes are supposed to stick around till the parent process exits, but what if the parent process then goes on to generate another pool of processes and doesn't exit immediately?
Calling terminate() works, but this doesn't seem terribly polite. Is there a good way to ask the processes to terminate nicely? I.e. clean up after yourself and go away now, I need to start up the next pool?
I also tried calling join() on the processes. According to the documentation this means wait for the processes to terminate. What if they don't plan to terminate? What actually happens is that the process hangs.
Thanks in advance.
Regards, Faheem.
import multiprocessing, time

class Worker(multiprocessing.Process):
    """Process executing tasks from a given tasks queue"""
    def __init__(self, queue, num):
        multiprocessing.Process.__init__(self)
        self.num = num
        self.queue = queue
        self.daemon = True

    def run(self):
        import traceback
        while True:
            func, args, kargs = self.queue.get()
            try:
                print "trying %s with args %s"%(func.__name__, args)
                func(*args, **kargs)
            except:
                traceback.print_exc()
            self.queue.task_done()

class ProcessPool:
    """Pool of worker processes consuming tasks from a queue"""
    def __init__(self, num_threads):
        self.queue = multiprocessing.JoinableQueue()
        self.workerlist = []
        self.num = num_threads
        for i in range(num_threads):
            self.workerlist.append(Worker(self.queue, i))

    def add_task(self, func, *args, **kargs):
        """Add a task to the queue"""
        self.queue.put((func, args, kargs))

    def start(self):
        for w in self.workerlist:
            w.start()

    def wait_completion(self):
        """Wait for completion of all the tasks in the queue"""
        self.queue.join()
        for worker in self.workerlist:
            print worker.__dict__
            #worker.terminate() <--- terminate used here
            worker.join()       # <--- join used here
start = time.time()

from sqlalchemy import *
from sqlalchemy.orm import *

dbuser = ''
password = ''
dbname = ''
dbstring = "postgres://%s:%s@localhost:5432/%s"%(dbuser, password, dbname)
db = create_engine(dbstring, echo=True)
m = MetaData(db)

def make_foo(i):
    t1 = Table('foo%s'%i, m, Column('a', Integer, primary_key=True))

conn = db.connect()
for i in range(10):
    conn.execute("DROP TABLE IF EXISTS foo%s"%i)
conn.close()

for i in range(10):
    make_foo(i)

m.create_all()

def do(i, dbstring):
    dbstring = "postgres://%s:%s@localhost:5432/%s"%(dbuser, password, dbname)
    db = create_engine(dbstring, echo=True)
    Session = scoped_session(sessionmaker())
    Session.configure(bind=db)
    Session.execute("ALTER TABLE foo%s SET ( autovacuum_enabled = false );"%i)
    Session.execute("ALTER TABLE foo%s SET ( autovacuum_enabled = true );"%i)
    Session.commit()

pool = ProcessPool(5)
for i in range(10):
    pool.add_task(do, i, dbstring)
pool.start()
pool.wait_completion()
My way of dealing with this was:
import multiprocessing

for prc in multiprocessing.active_children():
    prc.terminate()
I like this more because I don't have to pollute the worker function with an if clause.
You know multiprocessing already has classes for worker pools, right?
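For reference, a minimal sketch of the built-in pool (do_work is a placeholder task, not a function from the question):

import multiprocessing

def do_work(i):
    # placeholder task function
    return i * i

if __name__ == '__main__':
    pool = multiprocessing.Pool(5)               # five worker processes
    results = pool.map(do_work, range(10))       # blocks until all tasks are done
    pool.close()                                 # no more tasks will be submitted
    pool.join()                                  # workers drain their work and exit cleanly
    print(results)

The close()/join() pair is the "polite" shutdown the question asks about: the workers finish their pending work and exit on their own, so a second pool can be created afterwards without leftover processes.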
The standard way is to send your workers a quit signal (a "poison pill") through the queue:

queue.put(("QUIT", None, None))

Then check for it at the top of the worker's run() loop:

if func == "QUIT":
    return
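Wired into the Worker/ProcessPool pattern from the question, that might look roughly like this sketch (stop() is a method name I'm introducing for illustration, not something from the original code):

import multiprocessing, traceback

class Worker(multiprocessing.Process):
    def __init__(self, queue):
        multiprocessing.Process.__init__(self)
        self.queue = queue
        self.daemon = True

    def run(self):
        while True:
            func, args, kargs = self.queue.get()
            if func == "QUIT":              # poison pill: leave the loop
                self.queue.task_done()
                return
            try:
                func(*args, **kargs)
            except Exception:
                traceback.print_exc()
            self.queue.task_done()

class ProcessPool:
    def __init__(self, num_workers):
        self.queue = multiprocessing.JoinableQueue()
        self.workerlist = [Worker(self.queue) for _ in range(num_workers)]

    def start(self):
        for w in self.workerlist:
            w.start()

    def add_task(self, func, *args, **kargs):
        self.queue.put((func, args, kargs))

    def stop(self):
        for _ in self.workerlist:           # one pill per worker
            self.queue.put(("QUIT", None, None))
        self.queue.join()                   # wait until every task and pill is processed
        for w in self.workerlist:
            w.join()                        # returns promptly once each worker has seen its pill

Calling stop() after the tasks have been added lets each worker drain the queue, see its pill, and return on its own, so join() comes back without terminate().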
Related
What's the proper way to tell a looping thread to stop looping?
I have a fairly simple program that pings a specified host in a separate threading.Thread class. In this class it sleeps 60 seconds, then runs again until the application quits.
I'd like to implement a 'Stop' button in my wx.Frame to ask the looping thread to stop. It doesn't need to end the thread right away, it can just stop looping once it wakes up.
Here is my threading class (note: I haven't implemented looping yet, but it would likely fall under the run method in PingAssets)
class PingAssets(threading.Thread):
    def __init__(self, threadNum, asset, window):
        threading.Thread.__init__(self)
        self.threadNum = threadNum
        self.window = window
        self.asset = asset

    def run(self):
        config = controller.getConfig()
        fmt = config['timefmt']
        start_time = datetime.now().strftime(fmt)
        try:
            if onlinecheck.check_status(self.asset):
                status = "online"
            else:
                status = "offline"
        except socket.gaierror:
            status = "an invalid asset tag."
        msg = ("{}: {} is {}. \n".format(start_time, self.asset, status))
        wx.CallAfter(self.window.Logger, msg)
And in my wxPython Frame I have this function called from a Start button:
def CheckAsset(self, asset):
    self.count += 1
    thread = PingAssets(self.count, asset, self)
    self.threads.append(thread)
    thread.start()
Threaded stoppable function
Instead of subclassing threading.Thread, one can modify the function to allow stopping by a flag.
We need an object, accessible to the running function, on which we set the flag to stop running.
We can use the threading.currentThread() object.
import threading
import time

def doit(arg):
    t = threading.currentThread()
    while getattr(t, "do_run", True):
        print("working on %s" % arg)
        time.sleep(1)
    print("Stopping as you wish.")

def main():
    t = threading.Thread(target=doit, args=("task",))
    t.start()
    time.sleep(5)
    t.do_run = False

if __name__ == "__main__":
    main()
The trick is that additional attributes can be attached to the running thread object. The solution builds on two assumptions:
the thread has a property "do_run" with a default value of True;
the driving parent can set the property "do_run" of the started thread to False.
Running the code, we get the following output:
$ python stopthread.py
working on task
working on task
working on task
working on task
working on task
Stopping as you wish.
Pill to kill - using Event
Another alternative is to pass a threading.Event as a function argument. It is False by default, but the controlling thread can "set" it (to True), and the function can learn about that using the event's wait(timeout) method.
We can wait with a zero timeout, but we can also use it as the sleep timer (as done below).
def doit(stop_event, arg):
    while not stop_event.wait(1):
        print("working on %s" % arg)
    print("Stopping as you wish.")

def main():
    pill2kill = threading.Event()
    t = threading.Thread(target=doit, args=(pill2kill, "task"))
    t.start()
    time.sleep(5)
    pill2kill.set()
    t.join()
Edit: a note for Python 3 (tested on 3.6): stop_event.wait() called with no timeout blocks until the event is set, while stop_event.wait(timeout), as used above, returns True if the event is set and False when the timeout expires, so the loop exits on the next one-second tick after the pill is set. For a non-blocking check, use stop_event.is_set().
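A quick snippet (mine, not from the answer) showing the return values on Python 3:

import threading

e = threading.Event()
print(e.wait(0.1))   # False: timeout expired and the flag is not set
e.set()
print(e.wait(0.1))   # True: flag is set, returns immediately
print(e.is_set())    # True: non-blocking check of the same flag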
Stopping multiple threads with one pill
The advantage of the pill to kill is seen best when we have to stop multiple threads at once, as one pill works for all of them.
doit does not change at all; only main handles the threads a bit differently.
def main():
    pill2kill = threading.Event()
    tasks = ["task ONE", "task TWO", "task THREE"]

    def thread_gen(pill2kill, tasks):
        for task in tasks:
            t = threading.Thread(target=doit, args=(pill2kill, task))
            yield t

    threads = list(thread_gen(pill2kill, tasks))
    for thread in threads:
        thread.start()
    time.sleep(5)
    pill2kill.set()
    for thread in threads:
        thread.join()
This has been asked before on Stack. See the following links:
Is there any way to kill a Thread in Python?
Stopping a thread after a certain amount of time
Basically you just need to set up the thread with a stop function that sets a sentinel value that the thread will check. In your case, you'll have something in your loop check the sentinel value to see if it has changed, and if it has, the loop can break and the thread can die.
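A minimal sketch of that idea (StoppableLoop and stop() are names I've made up for illustration, not taken from the linked answers):

import threading
import time

class StoppableLoop(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self._stop_requested = False    # the sentinel the loop checks

    def stop(self):
        self._stop_requested = True     # ask the loop to finish

    def run(self):
        while not self._stop_requested:
            # do one unit of work, then sleep briefly
            time.sleep(1)

t = StoppableLoop()
t.start()
time.sleep(3)
t.stop()     # the loop breaks on its next check and the thread dies
t.join()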
I read the other questions on Stack but I was still a little confused on communicating across classes. Here is how I approached it:
I use a list to hold all my threads in the __init__ method of my wxFrame class: self.threads = []
As recommended in How to stop a looping thread in Python?, I use a signal flag in my thread class which is set to True when the thread is initialized.
class PingAssets(threading.Thread):
    def __init__(self, threadNum, asset, window):
        threading.Thread.__init__(self)
        self.threadNum = threadNum
        self.window = window
        self.asset = asset
        self.signal = True

    def run(self):
        while self.signal:
            do_stuff()
            sleep()
and I can stop these threads by iterating over my threads:
def OnStop(self, e):
    for t in self.threads:
        t.signal = False
I had a different approach. I subclassed the Thread class and created an Event object in the constructor. Then I wrote a custom join() method, which first sets this event and then calls the parent's version of itself.
Here is my class, which I'm using for serial port communication in a wxPython app:
import wx, threading, serial, Events, Queue

class PumpThread(threading.Thread):

    def __init__ (self, port, queue, parent):
        super(PumpThread, self).__init__()
        self.port = port
        self.queue = queue
        self.parent = parent

        self.serial = serial.Serial()
        self.serial.port = self.port
        self.serial.timeout = 0.5
        self.serial.baudrate = 9600
        self.serial.parity = 'N'

        self.stopRequest = threading.Event()

    def run (self):
        try:
            self.serial.open()
        except Exception, ex:
            print ("[ERROR]\tUnable to open port {}".format(self.port))
            print ("[ERROR]\t{}\n\n{}".format(ex.message, ex.traceback))
            self.stopRequest.set()
        else:
            print ("[INFO]\tListening port {}".format(self.port))
            self.serial.write("FLOW?\r")

        while not self.stopRequest.isSet():
            msg = ''
            if not self.queue.empty():
                try:
                    command = self.queue.get()
                    self.serial.write(command)
                except Queue.Empty:
                    continue

            while self.serial.inWaiting():
                char = self.serial.read(1)
                if '\r' in char and len(msg) > 1:
                    char = ''
                    #~ print('[DATA]\t{}'.format(msg))
                    event = Events.PumpDataEvent(Events.SERIALRX, wx.ID_ANY, msg)
                    wx.PostEvent(self.parent, event)
                    msg = ''
                    break
                msg += char
        self.serial.close()

    def join (self, timeout=None):
        self.stopRequest.set()
        super(PumpThread, self).join(timeout)

    def SetPort (self, serial):
        self.serial = serial

    def Write (self, msg):
        if self.serial.is_open:
            self.queue.put(msg)
        else:
            print("[ERROR]\tPort {} is not open!".format(self.port))

    def Stop(self):
        if self.isAlive():
            self.join()
The Queue is used for sending messages to the port, and the main loop takes responses back. I did not use the serial.readline() method because of a different end-of-line character, and I found the usage of the io classes to be too much fuss.
It depends on what you run in that thread.
If it's your code, then you can implement a stop condition (see the other answers).
However, if what you want is to run someone else's code, then you should fork and start a separate process, like this:
import multiprocessing
proc = multiprocessing.Process(target=your_proc_function, args=())
proc.start()
Now, whenever you want to stop that process, send it a SIGTERM like this:
proc.terminate()
proc.join()
And it's not slow: fractions of a second.
Enjoy :)
My solution is:
import threading, time

def a():
    t = threading.currentThread()
    while getattr(t, "do_run", True):
        print('Do something')
        time.sleep(1)

def getThreadByName(name):
    threads = threading.enumerate()  # threads list
    for thread in threads:
        if thread.name == name:
            return thread

threading.Thread(target=a, name='228').start()  # init thread
t = getThreadByName('228')  # get thread by name
time.sleep(5)
t.do_run = False  # signal to stop thread
t.join()
I find it useful to have a class, derived from threading.Thread, to encapsulate my thread functionality. You simply provide your own main loop in an overridden version of run() in this class. Calling start() arranges for the object’s run() method to be invoked in a separate thread.
Inside the main loop, periodically check whether a threading.Event has been set. Such an event is thread-safe.
Inside this class, you have your own join() method that sets the stop event object before calling the join() method of the base class. It can optionally take a time value to pass to the base class's join() method to ensure your thread is terminated in a short amount of time.
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, sleep_time=0.1):
        self._stop_event = threading.Event()
        self._sleep_time = sleep_time
        # call base class constructor
        super().__init__()

    def run(self):
        """main control loop"""
        while not self._stop_event.isSet():
            # do work
            print("hi")
            self._stop_event.wait(self._sleep_time)

    def join(self, timeout=None):
        """set stop event and join within a given time period"""
        self._stop_event.set()
        super().join(timeout)

if __name__ == "__main__":
    t = MyThread()
    t.start()
    time.sleep(5)
    t.join(1)  # wait 1s max
Having a small sleep inside the main loop before checking the threading.Event is less CPU intensive than looping continuously. You can have a default sleep time (e.g. 0.1s), but you can also pass the value in the constructor.
Sometimes you don't have control over the running target. In those cases you can use signal.pthread_kill to send a stop signal.
from signal import pthread_kill, SIGTSTP
from threading import Thread
from itertools import count
from time import sleep
def target():
    for num in count():
        print(num)
        sleep(1)
thread = Thread(target=target)
thread.start()
sleep(5)
pthread_kill(thread.ident, SIGTSTP)
result
0
1
2
3
4
[14]+ Stopped
I'm using Python multiprocessing for RabbitMQ consumers.
On application start I create 4 WorkerProcesses.
def start_workers(num=4):
    for i in xrange(num):
        process = WorkerProcess()
        process.start()
Below you'll find my worker class.
The logic works so far: I create 4 parallel consumer processes.
But the problem comes after a process gets killed and I want to create a new one. The problem with the logic below is that the new process is created as a child of the old one, and after a while the memory runs out of space.
Is there any way with Python multiprocessing to start a new process and kill the old one correctly?
class WorkerProcess(multiprocessing.Process):

    def __init__(self):
        super(WorkerProcess, self).__init__()
        app.logger.info('%s: Starting new Thread!', self.name)

    def shutdown(self):
        process = WorkerProcess()
        process.start()
        return True

    def kill(self):
        start_workers(1)
        self.terminate()

    def run(self):
        try:
            # Connect to RabbitMQ
            credentials = pika.PlainCredentials(app.config.get('RABBIT_USER'), app.config.get('RABBIT_PASS'))
            connection = pika.BlockingConnection(
                pika.ConnectionParameters(host=app.config.get('RABBITMQ_SERVER'), port=5672, credentials=credentials))
            channel = connection.channel()

            # Declare the Queue
            channel.queue_declare(queue='screenshotlayer',
                                  auto_delete=False,
                                  durable=True)

            app.logger.info('%s: Start to consume from RabbitMQ.', self.name)
            channel.basic_qos(prefetch_count=1)
            channel.basic_consume(callback, queue='screenshotlayer')
            channel.start_consuming()

            app.logger.info('%s: Thread is going to sleep!', self.name)

            # do what channel.start_consuming() does but with a stopping signal
            #while self.stop_working.is_set():
            #    channel.transport.connection.process_data_events()

            channel.stop_consuming()
            connection.close()
        except Exception as e:
            self.shutdown()
        return 0
Thank You
In the main process, keep track of your subprocesses (in a list) and loop over them with .join(timeout=50) (https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.join).
Then check whether each one is alive (https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.is_alive).
If it is not, replace it with a fresh one.
def start_workers(n):
    wks = []
    for _ in range(n):
        wks.append(WorkerProcess())
        wks[-1].start()

    while True:
        # Wait on each worker with a timeout so the loop does not spin
        for p in wks:
            p.join(timeout=50)
        # Remove all terminated processes
        wks = [p for p in wks if p.is_alive()]
        # Start new processes
        for i in range(n - len(wks)):
            wks.append(WorkerProcess())
            wks[-1].start()
I would not handle the process pool management myself. Instead, I would use the ProcessPoolExecutor from the concurrent.futures module.
There is no need to make WorkerProcess inherit from the Process class. Just write your actual code as a plain function or class and then submit it to a process pool executor. The executor keeps a pool of processes always ready to execute your tasks.
This way you can keep things simple, with less headache for you.
You can read more about in my blog post here: http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html
Example Code:
from concurrent.futures import ProcessPoolExecutor
from time import sleep

def return_after_5_secs(message):
    sleep(5)
    return message

pool = ProcessPoolExecutor(3)

future = pool.submit(return_after_5_secs, ("hello"))
print(future.done())
sleep(5)
print(future.done())
print("Result: " + future.result())
I have a python script which has a line that makes a post request as shown below:
rsp = requests.post(img_url, data=img_json_data, headers=img_headers)
print rsp # just for debugging
But suppose I don't want my script to keep waiting for the response, but instead run the above lines asynchronously in parallel to the rest of the code. What would be the easiest way to do so?
This is a class that allows easy parallel execution on multiple workers.
Basically it creates worker threads that wait for jobs in a Queue.
Once you put a task, they execute it and put the result in another Queue.
get_results() joins the task queue to wait until everything is done, then empties the results queue and returns the data as a dict.
from Queue import Queue
import logging
from threading import Thread

logger = logging.getLogger(__name__)

class Parallel(object):

    def __init__(self, thread_num=10):
        # create queues
        self.tasks_queue = Queue()
        self.results_queue = Queue()

        # create a threading pool
        self.pool = []
        for i in range(thread_num):
            worker = Worker(i, self.tasks_queue, self.results_queue)
            self.pool.append(worker)
            worker.start()

        logger.debug('Created %s workers', thread_num)

    def add_task(self, task_id, func, *args, **kwargs):
        """
        Add task to queue, they will be started as soon as added
        :param func: function to execute
        :param args: args to transmit
        :param kwargs: kwargs to transmit
        """
        logger.debug('Adding one task to queue (%s)', func.__name__)
        # add task to queue
        self.tasks_queue.put_nowait((task_id, func, args, kwargs))

    def get_results(self):
        logger.debug('Waiting for processes to end')
        self.tasks_queue.join()
        logger.debug('Processes terminated, fetching results')

        results = []
        while not self.results_queue.empty():
            results.append(self.results_queue.get())

        logger.debug('Results fetched, returning data')
        return dict(results)


class Worker(Thread):

    def __init__(self, thread_id, tasks, results):
        super(Worker, self).__init__()
        self.id = thread_id
        self.tasks = tasks
        self.results = results
        self.daemon = True

    def run(self):
        logger.debug('Worker %s launched', self.id)
        while True:
            task_id, func, args, kwargs = self.tasks.get()
            logger.debug('Worker %s start to work on %s', self.id, func.__name__)
            try:
                self.results.put_nowait((task_id, func(*args, **kwargs)))
            except Exception as err:
                logger.debug('Thread(%s): error with task %s\n%s', self.id, repr(func.__name__), err)
            finally:
                logger.debug('Worker %s finished work on %s', self.id, func.__name__)
                self.tasks.task_done()
import requests

# create parallel instance with 4 workers
parallel = Parallel(4)

# launch jobs
for i in range(20):
    parallel.add_task(i, requests.post, img_url, data=img_json_data, headers=img_headers)

# wait for all jobs to return data
print parallel.get_results()
You can use Celery for this. With Celery the processing will be asynchronous, and you can check the status as well as the result. See the Celery documentation for further info.
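A minimal sketch of what that could look like, assuming Celery is installed, a Redis broker and result backend are running locally, and a worker has been started with "celery -A tasks worker"; tasks.py and send_post are names I've made up for illustration:

# tasks.py
from celery import Celery
import requests

app = Celery('tasks',
             broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task
def send_post(img_url, img_json_data, img_headers):
    # runs in a Celery worker process, not in the calling script
    rsp = requests.post(img_url, data=img_json_data, headers=img_headers)
    return rsp.status_code

# caller (a separate script): returns immediately, result can be checked later
from tasks import send_post

async_result = send_post.delay(img_url, img_json_data, img_headers)
print(async_result.ready())    # poll the status without blocking
# async_result.get()           # block for the result if/when it is needed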
You need to queue this task for asynchronous processing. There are multiple options here:
Celery, which has a larger learning curve for a newbie (see its docs).
python-rq, which is relatively lightweight and a good go-to library (see the sketch below).
You can use any of the message queues underneath, such as Redis or RabbitMQ.
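For example, a minimal python-rq sketch, assuming a local Redis server and an rq worker (started with "rq worker") are running; jobs.py and send_post are placeholder names of mine:

# jobs.py -- the function must be importable by the rq worker
import requests

def send_post(img_url, img_json_data, img_headers):
    return requests.post(img_url, data=img_json_data, headers=img_headers).status_code

# caller (a separate script): enqueue the job and move on
from redis import Redis
from rq import Queue
from jobs import send_post

q = Queue(connection=Redis())     # default queue on the local Redis
job = q.enqueue(send_post, img_url, img_json_data, img_headers)
print(job.get_status())           # e.g. 'queued', then later 'finished'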
Since I use a similar pattern in my work a lot, I decided to write a class that abstracts very simple worker concurrency via job queue / threading. I know there are already things out there that solve this, but I also wanted to use this as an opportunity to hone my multithreading skills.
The main challenge I've given myself is that I want this to be able to let processes finish, even if they are not explicitly blocked by Queue.join(). "A process finishing" is defined by the input function returning a value (or None). The way I have attempted to accomplish this is by having each job create its own results queue rq, which is then checked by _wait_for_results in a non-daemon thread; this blocks the automatic exit of all the other daemonized threads until rq is filled by the worker in add_to_queue.
Here is the full class:
from Queue import Queue   # on Python 3: from queue import Queue
from threading import Thread

class EasyPool(object):

    def __init__(self, concurrency, always_finish=True):

        def add_to_queue(q):
            while True:
                func_data, rq = q.get()
                func, args, kwargs = func_data
                if not args:
                    args = []
                if not kwargs:
                    kwargs = {}
                result = func(*args, **kwargs)
                rq.put(result)
                q.task_done()

        self.rqs = []
        self.always_finish = always_finish
        self.q = Queue(maxsize=0)
        self.workers = []
        for i in range(concurrency):
            worker = Thread(target=add_to_queue, args=(self.q,))
            self.workers.append(worker)
            worker.setDaemon(True)
            worker.start()

    def _wait_for_results(self, rq):
        rq.not_empty.acquire()
        rq.not_empty.wait()
        rq.not_empty.notify()
        rq.not_empty.release()

    def add_job(self, func, *args, **kwargs):
        rq = Queue()
        if self.always_finish:
            blocker = Thread(target=self._wait_for_results, args=(rq,))
            blocker.setDaemon(False)
            blocker.start()
        to_add = []
        [ to_add.append(i) if i else to_add.append(None) for i in [func, args, kwargs] ]
        self.q.put((to_add, rq))
        return rq.get
When a job is created via the .add_job instance method, it immediately returns a promise-like object, which is a reference to the .get method of the results queue. The problem I'm facing is that there seems to be a race condition between this .get and the _wait_for_results method. I think the answer probably involves a Lock or a Condition, but I'm not really sure. Any help is much appreciated :)
I have an idle background process to process data in a queue, which I've implemented in the following way. The data passed in this example is just an integer, but I will be passing lists with up to 1000 integers and putting up to 100 lists on the queue per sec. Is this the correct approach, or should I be looking at more elaborate RPC and server methods?
import multiprocessing
import Queue
import time

class MyProcess(multiprocessing.Process):

    def __init__(self, queue, cmds):
        multiprocessing.Process.__init__(self)
        self.q = queue
        self.cmds = cmds

    def run(self):
        exit_flag = False
        while True:
            try:
                obj = self.q.get(False)
                print obj
            except Queue.Empty:
                if exit_flag:
                    break
                else:
                    pass
            if not exit_flag and self.cmds.poll():
                cmd = self.cmds.recv()
                if cmd == -1:
                    exit_flag = True
            time.sleep(.01)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    proc2main, main2proc = multiprocessing.Pipe(duplex=False)
    p = MyProcess(queue, proc2main)
    p.start()
    for i in range(5):
        queue.put(i)
    main2proc.send(-1)
    proc2main.close()
    main2proc.close()
    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
It depends on how long it will take to process the data. I can't tell because I don't have a sample of the data, but in general it is better to move to more elaborate RPC and server methods when you need things like load balancing, guaranteed uptime, or scalability. Just remember that these things add complexity, which may make your application harder to deploy, debug, and maintain. They will also increase the latency of processing a task (which may or may not be a concern for you).
I would test it with some sample data, and determine if you need the scalability that multiple servers provide.