Running multiprocessing on two different functions in Python 2.7 - python

I have 2 different functions that I want to use multiprocessing for: makeFakeTransactions and elasticIndexing. The function makeFakeTransactions returns a list of dictionaries, which is then added to the async_results list. So essentially, async_results is a list of lists. I want to use this list of lists as input for the elasticIndexing function, but I must wait for the first p.apply_async to finish first before I use the list of lists. How do I ensure that the first batch of multiprocessing is finished before I initiate the next one?
Also, when I run the program as is, it skips the second p.apply_async and just terminates. Do I have to declare a separate multiprocessing.Pool variable to do another multiprocessing operation?
store_num = 1
process_number = 6
num_transactions = 10
p = multiprocessing.Pool(process_number)
async_results = [p.apply_async(makeFakeTransactions, args = (store_num, num_transactions,)) for store_num in xrange(1, 10, 5)]
results = [ar.get() for ar in async_results]
async_results = [p.apply_async(elasticIndexing, args = (result_list,)) for result_list in results]
EDIT:
I tried using p.join() after async_results, but it gives this error:
Traceback (most recent call last):
File "C:\Users\workspace\Proj\test.py", line 210, in <module>
p.join()
File "C:\Python27\lib\multiprocessing\pool.py", line 460, in join
assert self._state in (CLOSE, TERMINATE)
AssertionError

Related

Python multiprocessing is not giving expected results

I am new to multiprocessing with python, I was following a course and i find thatthe code is not working as they say in the tutorials. for example:
this code :
import multiprocessing
# empty list with global scope
result = []
def square_list(mylist):
"""
function to square a given list
"""
global result
# append squares of mylist to global list result
for num in mylist:
result.append(num * num)
# print global list result
print("Result(in process p1): {}".format(result))
if __name__ == "__main__":
# input list
mylist = [1,2,3,4]
# creating new process
p1 = multiprocessing.Process(target=square_list, args=(mylist,))
# starting process
p1.start()
# wait until process is finished
p1.join()
# print global result list
print("Result(in main program): {}".format(result))
should print this result as they say in the tutorial:
Result(in process p1): [1, 4, 9, 16]
Result(in main program): []
but when I run it it prints
Result(in main program): []
I think the prosses did not even start.
I am using python 3.7.9 from anaconda.
how to fix this ?
Do not use global Variables which you access at the same time. Global Variables are most of the time a very bad idea and should be used very carefully.
The easiest way is to use p.map. (you don't have to start/join the processes)
with Pool(5) as p:
result=p.map(square_list,mylist)
If you do not want to use p.map you can use also q.put() to return the value and q.get() to get the value from the function
You can find also examples for getting the result in multiprocessed function here:
https://docs.python.org/3/library/multiprocessing.html

How to pass argument in multiprocessing function and how to use multiprocessing list?

I am trying to use the multiprocessing in python. I've created a function that appends the value to the list passed to that function (check_m_process). I am trying to pass a list (m) which is defined outside. Since the normal list variable won't update itself outside the multiprocessing function, I've used a multiprocessing list to see changes made my function to the list.
While executing the function it shows argument error as shown in the below output, instead of passing the argument.
import multiprocessing
# common list
m = multiprocessing.Manager().list()
def check_m_process(m):
print('m before - ',list(m))
for i in range(5):
m = m + [i]
print('m in function - ',list(m))
p1 = multiprocessing.Process(target = check_m_process, args=(m))
p1.start()
p1.join()
OUTPUT ERROR:
Process Process-37:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
TypeError: check_m_process() takes exactly 1 argument (0 given)
However, the above function does execute when executed without multiprocessing as check_m_process([]). But when I add some extra parameter, the function does execute as shown in the below output. I am confused about how an argument in multiprocessing function works or how it should actually pass like how to pass just a single argument with multiprocessing function.
def check_m_process(tmp_str,m):
print('m before - ',list(m))
for i in range(5):
m = m + [i]
print('m in function - ',list(m))
p1 = multiprocessing.Process(target = check_m_process, args=('',m))
p1.start()
p1.join()
OUTPUT:
('m before - ', [])
('m in function - ', [0, 1, 2, 3, 4])
So after the execution of the function, I was hoping the list defined (m) must have updated now after function execution as per the shown above output.
print('m outside function - ',list(m))
OUTPUT:
('m outside function - ', [])
But after printing the value of list m, it shows empty instead of defining the variable as a multiprocessing list in the beginning.
Can someone help me with how to pass a single parameter in the multiprocessing function and how to use the common list throughout the multiprocessing function? Or is there any other way to deal with it?
For your first problem, you need to pass (m,) at the argument (note the trailing comma). That is the syntax required to create a single-element tuple in Python. When you just surround a single item with parenthesis, no tuple is created.
For your second problem, you need to just append items to the multiprocessing.Manager().list(), instead of re-assigning the variable repeatedly:
for i in range(5):
m.append(i)
The way you're currently doing it is actually creating a regular Python list, and assigning m to that, not updating your multiprocessing list
>>> i = multiprocessing.Manager().list()
>>> i
<ListProxy object, typeid 'list' at 0x7fa18483b590>
>>> i = i + ['a']
>>> i
['a']
Notice how i is no longer a ListProxy after I concatenate it with ['a'], it's just a regular list. Using append avoids this:
>>> i = multiprocessing.Manager().list()
>>> i
<ListProxy object, typeid 'list' at 0x7fa18483b6d0>
>>> i.append('a')
>>> i
<ListProxy object, typeid 'list' at 0x7fa18483b6d0>
>>> str(i)
"['a']"

Python - apply_async doesn't execute function

Hi I'm trying to use multiprocessing to speed up my code. However, the apply_async doesn't work for me. I tried to do a simple example like:
from multiprocessing.pool import Pool
t = [0, 1, 2, 3, 4, 5]
def cube(x):
t[x] = x**3
pool = Pool(processes=4)
for i in range(6):
pool.apply_async(cube, args=(i, ))
for x in t:
print(x)
It does not really change t as I would expect.
My real code is like:
from multiprocessing.pool import Pool
def func(a, b, c, d):
#some calculations
#save result to files
#no return value
lt = #list of possible value of a
#set values to b, c, d
p = Pool()
for i in lt:
p.apply_async(func, args=(i, b, c, d, ))
Where are the problems here?
Thank you!
Update: Thanks to the comments and answers, now I understand why my simple example won't work. However, I'm still in trouble with my real code. I have checked that my func does not rely on any global variable, so it seems not to be the same problem as my example code.
As suggested, I added a return value to my func, now my code is:
f = Flux("reactor")
d = Detector("Ge")
mv = arange(-6, 1.5, 0.5)
p = Pool()
lt = ["uee", "dee"]
for i in lt:
re = p.apply_async(res, args=(i, d, f, mv, ))
print(re.get())
p.close()
p.join()
Now I get the following error:
Traceback (most recent call last):
File "/Users/Shu/Documents/Programming/Python/Research/debug.py", line 35, in <module>
print(re.get())
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/pool.py", line 385, in _handle_tasks
put(task)
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'Flux.__init__.<locals>.<lambda>'
EDIT: the first example you provided will not work for a simple reason: processes do not share memory. Therefore, the change t[x] = x**3 will not be applied to the parent process leaving the values of the list t unchanged.
You need to actually return the value from the computation and build a new list from that.
def cube(x):
return x**3
t = [0, 1, 2, 3, 4, 5]
p = Pool()
t = p.map(cube, t)
print(t)
If, as you claim in the second example, the results are supposed not to be returned but to be independently stored within files and this does not happen, I'd recommend to check the return value of your function to see whether the function itself is raising an exception or not.
I'd recommend you to get the actual results and see what happens:
p = Pool()
for i in lt:
res = p.apply_async(func, args=(i, b, c, d, ))
print(res.get()) # this will raise an exception if it happens within func
p.close() # do not accept any more tasks
p.join() # wait for the completion of all scheduled jobs
Function quits too soon, try add at the end of your script this code:
import time
time.sleep(3)

Execute a list of process without multiprocessing pool map

import multiprocessing as mp
if __name__ == '__main__':
#pool = mp.Pool(M)
p1 = mp.Process(target= target1, args= (arg1,))
p2 = mp.Process(target= target2, args= (arg1,))
...
p9 = mp.Process(target= target9, args= (arg9,))
p10 = mp.Process(target= target10, args= (arg10,))
...
pN = mp.Process(target= targetN, args= (argN,))
processList = [p1, p2, .... , p9, p10, ... ,pN]
I have N different target functions which consume unequal non-trivial amount of time to execute.
I am looking for a way to execute them in parallel such that M (1 < M < N) processes are running simultaneously. And as soon as a process is finished next process should start from the list, until all the processes in processList are completed.
As I am not calling the same target function, I could not use Pool.
I considered doing something like this:
for i in range(0, N, M):
limit = i + M
if(limit > N):
limit = N
for p in processList[i:limit]:
p.join()
Since my target functions consume unequal time to execute, this method is not really efficient.
Any suggestions? Thanks in advance.
EDIT:
Question title has been changed to 'Execute a list of process without multiprocessing pool map' from 'Execute a list of process without multiprocessing pool'.
You can use proccess Pool:
#!/usr/bin/env python
# coding=utf-8
from multiprocessing import Pool
import random
import time
def target_1():
time.sleep(random.uniform(0.5, 2))
print('done target 1')
def target_2():
time.sleep(random.uniform(0.5, 2))
print('done target 1')
def target_3():
time.sleep(random.uniform(0.5, 2))
print('done target 1')
def target_4():
time.sleep(random.uniform(0.5, 2))
print('done target 1')
pool = Pool(2) # maximum two processes at time.
pool.apply_async(target_1)
pool.apply_async(target_2)
pool.apply_async(target_3)
pool.apply_async(target_4)
pool.close()
pool.join()
Pool is created specifically for what you need to do - execute many tasks in limited number of processes.
I also suggest you take a look at concurrent.futures library and it's backport to Python 2.7. It has a ProcessPoolExecutor, which has roughly same capabilities, but it's methods returns Future objects, and they has a nicer API.
Here is a way to do it in Python 3.4, which could be adapted for Python 2.7 :
targets_with_args = [
(target1, arg1),
(target2, arg2),
(target3, arg3),
...
]
with concurrent.futures.ProcessPoolExecutor(max_workers=20) as executor:
futures = [executor.submit(target, arg) for target, arg in targets_with_args]
results = [future.result() for future in concurrent.futures.as_completed(futures)]
I would use a Queue. adding processes to it from processList, and as soon as a process is finished i would remove it from the queue and add another one.
a pseudo code will look like:
from Queue import Queue
q = Queue(m)
# add first process to queue
i = 0
q.put(processList[i])
processList[i].start()
i+=1
while not q.empty():
p=q.get()
# check if process is finish. if not return it to the queue for later checking
if p.is_alive():
p.put(t)
# add another process if there is space and there are more processes to add
if not q.full() and i < len(processList):
q.put(processList[i])
processList[i].start()
i+=1
A simple solution would be to wrap the functions target{1,2,...N} into a single function forward_to_target that forwards to the appropriate target{1,2,...N} function according to the argument that is passed in. If you cannot infer the appropriate target function from the arguments you currently use, replace each argument with a tuple (argX, X), then in the forward_to_target function unpack the tuple and forward to the appropriate function indicated by the X.
You could have two lists of targets and arguments, zip the two together - and send them to a runner function (here it's run_target_on_args):
#!/usr/bin/env python
import multiprocessing as mp
# target functions
targets = [len, str, len, zip]
# arguments for each function
args = [["arg1"], ["arg2"], ["arg3"], [["arg5"], ["arg6"]]]
# applies target function on it's arguments
def run_target_on_args(target_args):
return target_args[0](*target_args[1])
pool = mp.Pool()
print pool.map(run_target_on_args, zip(targets, args))

Python Multiprocessing with a single function

I have a simulation that is currently running, but the ETA is about 40 hours -- I'm trying to speed it up with multi-processing.
It essentially iterates over 3 values of one variable (L), and over 99 values of of a second variable (a). Using these values, it essentially runs a complex simulation and returns 9 different standard deviations. Thus (even though I haven't coded it that way yet) it is essentially a function that takes two values as inputs (L,a) and returns 9 values.
Here is the essence of the code I have:
STD_1 = []
STD_2 = []
# etc.
for L in range(0,6,2):
for a in range(1,100):
### simulation code ###
STD_1.append(value_1)
STD_2.append(value_2)
# etc.
Here is what I can modify it to:
master_list = []
def simulate(a,L):
### simulation code ###
return (a,L,STD_1, STD_2 etc.)
for L in range(0,6,2):
for a in range(1,100):
master_list.append(simulate(a,L))
Since each of the simulations are independent, it seems like an ideal place to implement some sort of multi-threading/processing.
How exactly would I go about coding this?
EDIT: Also, will everything be returned to the master list in order, or could it possibly be out of order if multiple processes are working?
EDIT 2: This is my code -- but it doesn't run correctly. It asks if I want to kill the program right after I run it.
import multiprocessing
data = []
for L in range(0,6,2):
for a in range(1,100):
data.append((L,a))
print (data)
def simulation(arg):
# unpack the tuple
a = arg[1]
L = arg[0]
STD_1 = a**2
STD_2 = a**3
STD_3 = a**4
# simulation code #
return((STD_1,STD_2,STD_3))
print("1")
p = multiprocessing.Pool()
print ("2")
results = p.map(simulation, data)
EDIT 3: Also what are the limitations of multiprocessing. I've heard that it doesn't work on OS X. Is this correct?
Wrap the data for each iteration up into a tuple.
Make a list data of those tuples
Write a function f to process one tuple and return one result
Create p = multiprocessing.Pool() object.
Call results = p.map(f, data)
This will run as many instances of f as your machine has cores in separate processes.
Edit1: Example:
from multiprocessing import Pool
data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]
def f(t):
name, a, b, c = t
return (name, a + b + c)
p = Pool()
results = p.map(f, data)
print results
Edit2:
Multiprocessing should work fine on UNIX-like platforms such as OSX. Only platforms that lack os.fork (mainly MS Windows) need special attention. But even there it still works. See the multiprocessing documentation.
Here is one way to run it in parallel threads:
import threading
L_a = []
for L in range(0,6,2):
for a in range(1,100):
L_a.append((L,a))
# Add the rest of your objects here
def RunParallelThreads():
# Create an index list
indexes = range(0,len(L_a))
# Create the output list
output = [None for i in indexes]
# Create all the parallel threads
threads = [threading.Thread(target=simulate,args=(output,i)) for i in indexes]
# Start all the parallel threads
for thread in threads: thread.start()
# Wait for all the parallel threads to complete
for thread in threads: thread.join()
# Return the output list
return output
def simulate(list,index):
(L,a) = L_a[index]
list[index] = (a,L) # Add the rest of your objects here
master_list = RunParallelThreads()
Use Pool().imap_unordered if ordering is not important. It will return results in a non-blocking fashion.

Categories