I need to run multiple background asynchronous functions using multiprocessing. I have a working Popen solution, but it looks a bit unnatural. Example:
from time import sleep
from multiprocessing import Process, Value
import subprocess

def worker_email(keyword):
    subprocess.Popen(["python", "mongoworker.py", str(keyword)])
    return True

keywords_list = ['apple', 'banana', 'orange', 'strawberry']

if __name__ == '__main__':
    for keyword in keywords_list:
        # Do work
        p = Process(target=worker_email, args=(keyword,))
        p.start()
        p.join()
If I try not to use Popen, like:
def worker_email(keyword):
    print('Before:' + keyword)
    sleep(10)
    print('After:' + keyword)
    return True
The functions run one by one, not asynchronously. So how can I run all the functions at the same time without using Popen?
UPD: I'm using multiprocessing.Value to return results from Process, like:
def worker_email(keyword, func_result):
    sleep(10)
    print('Yo:' + keyword)
    func_result.value = 1
    return True

func_result = Value('i', 0)
p = Process(target=worker_email, args=(doc['check_id'],func_result))
p.start()
# Change status
if func_result.value == 1:
    stream.update_one({'_id': doc['_id']}, {"$set": {"status": True}}, upsert=False)
But it doesn't work without .join(). Any ideas on how to make this work, or on a similar approach? :)
If you just remove the line p.join(), it should work.
You only need p.join() if you want to wait for the process to finish before continuing. At the end of the program, Python waits for all non-daemon processes to finish before exiting, so you don't need to worry about that.
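A minimal sketch of that pattern, assuming the real work can be stood in for by a short sleep (run_all and the shortened delay are illustrative names and values, not from the question). Every process is started before any of them is joined, so the workers overlap instead of running one by one:

```python
from multiprocessing import Process
from time import sleep


def worker_email(keyword, delay=0.2):
    # Stand-in for the real work; the delay is shortened for the demo.
    print('Before:' + keyword)
    sleep(delay)
    print('After:' + keyword)
    return True


def run_all(keywords):
    # Start every process before joining any of them, so they overlap.
    processes = [Process(target=worker_email, args=(kw,)) for kw in keywords]
    for p in processes:
        p.start()
    # Optional: join only if the caller must wait before moving on.
    for p in processes:
        p.join()
    return [p.exitcode for p in processes]


if __name__ == '__main__':
    print(run_all(['apple', 'banana', 'orange', 'strawberry']))
```

The second loop can be dropped entirely if nothing after it depends on the workers having finished.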
I solved the problem of getting the Process result by moving the result check and the status update into the worker function. Something like:
from pymongo import MongoClient

# Update task status if work is done
def update_status(task_id, func_result):
    # Connect to DB
    client = MongoClient('mongodb://localhost:27017/')
    db = client.admetric
    stream = db.stream
    # Update task status if OK
    if func_result:
        stream.update_one({'_id': task_id}, {"$set": {"status": True}}, upsert=False)
    # Close DB connection
    client.close()

# Do work
def yo_func(keyword):
    sleep(10)
    print('Yo:' + keyword)
    return True

# Worker function
def worker_email(keyword, task_id):
    update_status(task_id, yo_func(keyword))
Related
I was thinking of using the multiprocess package to run a function in parallel, but I need to pass a different parameter value on every run (every 1 sec).
For example:
def foo(list):
    while True:
        <do something with list>
        sleep(1000)

def main():
    process = multiprocess.Process(target=foo, args=(lst))
    process.start()
    <keep updating lst>
This causes foo() to keep running with the same parameter value over and over. How can I work around this?
Armed with the knowledge of what you're actually trying to do, i.e.
The foo function does an http post call to save some logs (batch) to the storage. The main function is getting text logs (save log to the batch) while running a given shell script. Basically, I'm trying to do batching for logging.
the answer is to use a thread and a queue for message passing (multiprocessing.Process and multiprocessing.Queue would also work, but aren't really necessary):
import threading
import time
from queue import Queue

def send_batch(batch):
    print("Sending", batch)

def save_worker(queue: Queue):
    while True:
        batch = queue.get()
        if batch is None:  # stop signal
            break
        send_batch(batch)
        queue.task_done()

def main():
    batch_queue = Queue()
    save_thread = threading.Thread(target=save_worker, args=(batch_queue,))
    save_thread.start()
    log_batch = []
    for x in range(42):  # pretend this is the shell script outputting things
        message = f"Message {x}"
        print(message)
        log_batch.append(message)
        time.sleep(0.1)
        if len(log_batch) >= 7:  # could also look at wallclock
            batch_queue.put(log_batch)
            log_batch = []
    if log_batch:
        batch_queue.put(log_batch)  # send the last batch
    print("Script stopped, waiting for worker to finish")
    batch_queue.put(None)  # stop signal
    save_thread.join()

if __name__ == "__main__":
    main()
import threading
import time

def run_every_second(param):
    # your function code here
    print(param)

# create a list of parameters to pass to the function
params = [1, 2, 3, 4]
# keep track of the started threads so they can be joined later
threads = []
# create and start a thread for each parameter
for param in params:
    t = threading.Thread(target=run_every_second, args=(param,))
    t.start()
    threads.append(t)
    time.sleep(1)
# wait for all threads to complete
for t in threads:
    t.join()
This creates a new thread for each parameter, and each thread runs the run_every_second function with the corresponding parameter. The threads run concurrently, so the function calls can overlap in time. There is a 1-second pause between the start of each thread.
I am preparing a Python multiprocessing tool where I use Process and Queue. The queue feeds another script to a process so that it runs in parallel. As a sanity check, I want to detect whether any error happens in that other script and return a flag/message if there was one (status = os.system() runs the process, and status is the error flag). But I can't get errors from the queue/child in the consumer process back to the parent process. The following are the main parts of my code (shortened):
import os
import time
from multiprocessing import Process, Queue, Lock

command_queue = Queue()
lock = Lock()
p = Process(target=producer, args=(command_queue, lock, test_config_list_path))
for i in range(consumer_num):
    c = Process(target=consumer, args=(command_queue, lock))
    consumers.append(c)
p.daemon = True
p.start()
for c in consumers:
    c.daemon = True
    c.start()
p.join()
for c in consumers:
    c.join()
if error_flag:
    Stop_this_process_and_send_a_message!

def producer(queue, lock, ...):
    for config_path in test_config_list_path:
        queue.put((config_path, process_to_be_queued))

def consumer(queue, lock):
    while True:
        elem = queue.get()
        if elem is None:
            return
        status = os.system(elem[1])
        if status:
            error_flag = 1
    time.sleep(3)
Now I want to get that error_flag and use it in the main code to handle things. But it seems I can't pass error_flag from the consumer (child) back to the main part of the code. I'd appreciate it if someone could help with this.
Given your update, I also pass a multiprocessing.Event instance to your to_do process. This allows you to simply call wait on the event in the main process, which will block until set is called on it. Naturally, when to_do or one of its threads detects a script error, it would call set on the event after setting error_flag.value to True. This will wake up the main process, which can then call the terminate method on the process, which will do what you want. On a normal completion of to_do, it is still necessary to call set on the event, since the main process is blocking until the event has been set. But in this case the main process will just call join on the process.
Using a multiprocessing.Value instance alone would have required periodically checking its value in a loop, so I think waiting on a multiprocessing.Event is better. I have also made a couple of other updates to your code with comments, so please review them:
import multiprocessing
from ctypes import c_bool
...

def to_do(event, error_flag):
    # Run the tests
    wrapper_threads.main(event, error_flag)
    # on error or normal process completion:
    event.set()

def git_pull_change(path_to_repo):
    repo = Repo(path_to_repo)  # use the function's parameter
    current = repo.head.commit
    repo.remotes.origin.pull()
    if current == repo.head.commit:
        print("Repo not changed. Sleep mode activated.")
        # Call to time.sleep(some_number_of_seconds) should go here, right?
        return False
    else:
        print("Repo changed. Start running the tests!")
        return True

def main():
    while True:
        status = git_pull_change(git_path)
        if status:
            # The repo was just pulled, so no point in doing it again:
            #repo = Repo(git_path)
            #repo.remotes.origin.pull()
            event = multiprocessing.Event()
            error_flag = multiprocessing.Value(c_bool, False, lock=False)
            process = multiprocessing.Process(target=to_do, args=(event, error_flag))
            process.start()
            # wait for an error or normal process completion:
            event.wait()
            if error_flag.value:
                print('Error! breaking the process!!!!!!!!!!!!!!!!!!!!!!!')
                process.terminate()  # Kill the process
            else:
                process.join()
            break
You should always tag multiprocessing questions with the platform you are running on. Since I do not see your process-creating code within an if __name__ == '__main__': block, I have to assume you are running on a platform that uses OS fork calls to create new processes, such as Linux.
That means your newly created processes inherit the value of error_flag when they are created but for all intents and purposes, if a process modifies this variable, it is modifying a local copy of this variable that exists in an address space that is unique to that process.
You need to create error_flag in shared memory and pass it as an argument to your process:
from multiprocessing import Value
from ctypes import c_bool
...
error_flag = Value(c_bool, False, lock=False)
for i in range(consumer_num):
    c = Process(target=consumer, args=(command_queue, lock, error_flag))
    consumers.append(c)
...
if error_flag.value:
    ...
    #Stop_this_process_and_send_a_message!

def consumer(queue, lock, error_flag):
    while True:
        elem = queue.get()
        if elem is None:
            return
        status = os.system(elem[1])
        if status:
            error_flag.value = True
    time.sleep(3)
But I have a few questions/comments for you. Your original code contains the following statement:
if error_flag:
Stop_this_process_and_send_a_message!
But this statement is located after you have already joined all the started processes. So what processes are there to stop, and where are you sending a message to? (You potentially have multiple consumers, any of which might be setting the error_flag -- by the way, there is no need to do this under a lock, since setting the value to True is an atomic action.) And since you are joining all your processes, i.e. waiting for them to complete, I am not sure why you are making them daemon processes. You are also passing a Lock instance to your producer and consumers, but it is not being used at all.
Your consumers return when they get a None record from the queue. So if you have N consumers, the last N elements of test_config_list_path need to be None.
I also see no need for having the producer process. The main process could just as well write all the records to the queue either before or even after it starts the consumer processes.
The call to time.sleep(3) you have at the end of function consumer is unreachable.
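Putting these comments together, here is a rough sketch (not the asker's actual code) in which the main process fills the queue itself, puts one None sentinel per consumer, and reads the shared flag after joining. The run_commands helper and the demo commands are hypothetical, and subprocess.call with an argument list is swapped in for os.system so the example does not depend on shell quoting:

```python
import subprocess
import sys
from ctypes import c_bool
from multiprocessing import Process, Queue, Value


def consumer(queue, error_flag):
    while True:
        command = queue.get()
        if command is None:  # sentinel: no more work for this consumer
            return
        # A non-zero exit status sets the shared flag.
        if subprocess.call(command) != 0:
            error_flag.value = True


def run_commands(commands, consumer_num=2):
    queue = Queue()
    error_flag = Value(c_bool, False, lock=False)
    consumers = [Process(target=consumer, args=(queue, error_flag))
                 for _ in range(consumer_num)]
    for c in consumers:
        c.start()
    # No separate producer process: the main process writes the records.
    for command in commands:
        queue.put(command)
    for _ in consumers:
        queue.put(None)  # one sentinel per consumer
    for c in consumers:
        c.join()
    return bool(error_flag.value)


if __name__ == '__main__':
    ok = [sys.executable, '-c', 'pass']
    bad = [sys.executable, '-c', 'raise SystemExit(3)']
    print(run_commands([ok, bad, ok]))
```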
So the above code summary is the inner process to run some tests in parallel. I removed the def function part from it, but just assume that is the wrapper_threads in the following code summary. Here I'll add the parent process which is checking a variable (let's assume a commit in my git repo). The following process is meant to run indefinitely and when there is a change it will trigger the multiprocess in the main question:
def to_do():
    # Run the tests
    wrapper_threads.main()

def git_pull_change(path_to_repo):
    repo = Repo(path)
    current = repo.head.commit
    repo.remotes.origin.pull()
    if current == repo.head.commit:
        print("Repo not changed. Sleep mode activated.")
        return False
    else:
        print("Repo changed. Start running the tests!")
        return True

def main():
    process = None
    while True:
        status = git_pull_change(git_path)
        if status:
            repo = Repo(git_path)
            repo.remotes.origin.pull()
            process = multiprocessing.Process(target=to_do)
            process.start()
            if error_flag.value:
                print('Error! breaking the process!!!!!!!!!!!!!!!!!!!!!!!')
                os.system('pkill -U user XXX')
                break
Now I want to propagate that error_flag from the child process to this process and stop process XXX. The problem is that I don't know how to bring that error_flag to this (grand)parent process.
I have a process running in a thread in my Python script, and I'd like to know if it stops for any reason so that I can try to run it again.
How can I do that?
My code:
main.py
from functions import Functions
func = Functions()
func.checkRoiProcess(roi_list)  # this function starts a thread
while True:
    # this is where I need to check if the thread is still running
    #if func.checkRoiProcess still running:
    #    do something
    #else:
    #    execute thread again
In my functions.py:
def checkRoiProcess(self,ROI):
    Thread(target = self.checkRoiChanges, args = (ROI,), daemon=True).start()
Can anyone help me?
Return the Thread object from the checkRoiProcess method.
def checkRoiProcess(self,ROI):
    roi_thread = Thread(target = self.checkRoiChanges, args = (ROI,), daemon=True)
    roi_thread.start()
    return roi_thread
And use the is_alive method to check if the thread is still running:
roi_thread = func.checkRoiProcess(roi_list)
while True:
    if roi_thread.is_alive():
        # do something
        pass
    else:
        # execute thread again
        roi_thread = func.checkRoiProcess(roi_list)
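As a self-contained illustration of the same idea, the sketch below polls is_alive and restarts the worker a bounded number of times. The start_worker and supervise helpers, the restart limit and the polling interval are all assumptions made for the example, not part of the original code:

```python
import threading
import time


def start_worker(target, args=()):
    # Mirrors checkRoiProcess: start a daemon thread and hand it back.
    thread = threading.Thread(target=target, args=args, daemon=True)
    thread.start()
    return thread


def supervise(target, args=(), max_restarts=3, poll_interval=0.05):
    # Restart the worker each time it stops, up to max_restarts times.
    restarts = 0
    thread = start_worker(target, args)
    while restarts < max_restarts:
        if thread.is_alive():
            time.sleep(poll_interval)  # still running: check again later
        else:
            restarts += 1
            thread = start_worker(target, args)
    thread.join()  # wait for the last restarted worker
    return restarts


if __name__ == '__main__':
    runs = []
    print(supervise(lambda: runs.append(1)), len(runs))
```

Sleeping between checks keeps the supervising loop from spinning at full CPU while the worker is healthy.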
I am using multiprocessing in Python and trying to kill a running process after a timeout. But it doesn't work, and I don't know why.
I followed an example that seemed easy: start the process and, after 2 seconds, terminate it. But it doesn't work for me.
Could you please help me figure it out? Thanks for your help!
from amazonproduct import API
import multiprocessing
import time

AWS_KEY = '...'
SECRET_KEY = '...'
ASSOC_TAG = '...'

def crawl():
    api = API(AWS_KEY, SECRET_KEY, 'us', ASSOC_TAG)
    for root in api.item_search('Beauty', Keywords='maybelline',
                                ResponseGroup='Large'):
        # extract paging information
        nspace = root.nsmap.get(None, '')
        products = root.xpath('//aws:Item',
                              namespaces={'aws' : nspace})
        for product in products:
            print product.ASIN,

if __name__ == '__main__':
    p = multiprocessing.Process(target = crawl())
    p.start()
    if time.sleep(2.0):
        p.terminate()
Well, this won't work:
if time.sleep(2.0):
    p.terminate()
time.sleep does not return anything, so the above statement is always equivalent to if None:. None is falsy in a boolean context, so p.terminate() is never called.
If you want it to always terminate, take out that if statement: call time.sleep(2.0) on its own line and then call p.terminate().
Also, bug:
p = multiprocessing.Process(target = crawl())
This isn't doing what you think it's doing. You need to specify target=crawl, NOT target=crawl(). The latter calls the function in your main process, before the Process is even created, and passes its return value as the target. The former passes the function itself to Process, which will then execute it in parallel.
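Both fixes can be combined into one small sketch. The crawl function here is only a stand-in loop for the real crawler, and run_with_timeout is a hypothetical helper. Using join with a timeout, rather than a bare sleep, is a design choice of this sketch: it returns early when the process finishes before the deadline.

```python
import multiprocessing
import time


def crawl():
    # Stand-in for the long-running crawl in the question.
    while True:
        time.sleep(0.1)


def run_with_timeout(target, timeout):
    # Pass the function itself (target=crawl), not the result of calling it.
    p = multiprocessing.Process(target=target)
    p.start()
    p.join(timeout)          # wait at most `timeout` seconds
    timed_out = p.is_alive()
    if timed_out:
        p.terminate()        # still running after the timeout: stop it
        p.join()             # reap the terminated process
    return timed_out


if __name__ == '__main__':
    print(run_with_timeout(crawl, 2.0))
```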
I need to make a blocking XML-RPC call from my Python script to several physical servers simultaneously, and perform actions based on the response from each server independently.
To explain in detail, assume the following pseudocode:
while True:
    response=call_to_server1() #blocking and takes very long time
    if response==this:
        do that
I want to do this for all the servers simultaneously and independently, but from the same script.
Use the threading module.
Boilerplate threading code (I can tailor this if you give me a little more detail on what you are trying to accomplish)
import threading

def run_me(func):
    while not stop_event.is_set():
        response = func()  # blocking and takes very long time
        if response == this:
            do that

def call_to_server1():
    # code to call server 1...
    return magic_server1_call()

def call_to_server2():
    # code to call server 2...
    return magic_server2_call()

# used to stop your loop.
stop_event = threading.Event()

# args must be a tuple, hence the trailing comma
t = threading.Thread(target=run_me, args=(call_to_server1,))
t.start()
t2 = threading.Thread(target=run_me, args=(call_to_server2,))
t2.start()

# wait for threads to return.
t.join()
t2.join()
# we are done....
You can use the multiprocessing module:
import multiprocessing

def call_to_server(ip,port):
    ....
    ....

process = []
for i in range(server_count):
    process.append( multiprocessing.Process(target=call_to_server,args=(ip,port)))
    process[i].start()

# wait for the processes to stop
for p in process:
    p.join()
You can use multiprocessing plus queues. Here is an example with a single sub-process:
import multiprocessing
import time

def processWorker(input, result):
    def remoteRequest( params ):
        ## this is my remote request
        return True
    while True:
        work = input.get()
        if 'STOP' in work:
            break
        result.put( remoteRequest(work) )

if __name__ == '__main__':
    input = multiprocessing.Queue()
    result = multiprocessing.Queue()
    p = multiprocessing.Process(target = processWorker, args = (input, result))
    p.start()
    requestlist = ['1', '2']
    for req in requestlist:
        input.put(req)
    for i in range(len(requestlist)):
        res = result.get(block = True)
        print('retrieved ', res)
    input.put('STOP')
    time.sleep(1)
    print('done')
To have more than one sub-process, simply use a list to store all the sub-processes you start.
The multiprocessing queue is safe to share between processes.
You can then keep track of which request is being executed by each sub-process by storing the request together with a workid (the workid can be a counter incremented whenever the queue is filled with new work). Using multiprocessing.Queue is robust, since you do not need to rely on parsing stdout/stderr, and you also avoid the related limitations.
You can also set a timeout on how long a get call should wait at most, e.g.:
import queue
try:
    res = result.get(block = True, timeout = 10)
except queue.Empty:
    print('timed out waiting for a result')
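The pieces described above (a list of sub-processes, a workid per request, and a timeout on get) could fit together roughly as follows. This is only a sketch: remote_request is a hypothetical stand-in for the blocking call, run_requests is an illustrative helper, and None is used as the stop signal instead of the 'STOP' string.

```python
import multiprocessing
import queue


def remote_request(params):
    # Hypothetical stand-in for the blocking remote call.
    return params.upper()


def process_worker(work_queue, result_queue):
    while True:
        item = work_queue.get()
        if item is None:  # stop signal
            break
        work_id, params = item
        result_queue.put((work_id, remote_request(params)))


def run_requests(requests, worker_count=2, timeout=10):
    work_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=process_worker,
                                       args=(work_queue, result_queue))
               for _ in range(worker_count)]
    for w in workers:
        w.start()
    # The work ID is a counter assigned as the queue is filled.
    for work_id, params in enumerate(requests):
        work_queue.put((work_id, params))
    results = {}
    try:
        for _ in requests:
            work_id, value = result_queue.get(timeout=timeout)
            results[work_id] = value
    except queue.Empty:
        print('timed out waiting for a result')
    finally:
        for _ in workers:
            work_queue.put(None)  # one stop signal per worker
        for w in workers:
            w.join()
    return results


if __name__ == '__main__':
    print(run_requests(['a', 'b', 'c']))
```

Because each result carries its work ID, the caller can match results to requests even when workers finish out of order.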
Use Twisted.
It has a lot of useful tools for working with networks. It is also very good at working asynchronously.