I am new to multiprocessing with Python. I was following a course and I find that the code does not work the way the tutorial says it should. For example, this code:
import multiprocessing

# empty list with global scope
result = []

def square_list(mylist):
    """
    function to square a given list
    """
    global result
    # append squares of mylist to global list result
    for num in mylist:
        result.append(num * num)
    # print global list result
    print("Result(in process p1): {}".format(result))

if __name__ == "__main__":
    # input list
    mylist = [1, 2, 3, 4]
    # creating new process
    p1 = multiprocessing.Process(target=square_list, args=(mylist,))
    # starting process
    p1.start()
    # wait until process is finished
    p1.join()
    # print global result list
    print("Result(in main program): {}".format(result))
should print this result, as the tutorial says:
Result(in process p1): [1, 4, 9, 16]
Result(in main program): []
but when I run it, it only prints
Result(in main program): []
I think the process did not even start.
I am using Python 3.7.9 from Anaconda.
How do I fix this?
Do not use global variables that several processes access at the same time. Global variables are a bad idea most of the time and should be used very carefully.
The easiest way is to use p.map (you do not have to start/join the processes yourself). Note that map calls the worker once per element of the list, so the worker should square a single number rather than the whole list:
def square(num):
    return num * num
with Pool(5) as p:
    result = p.map(square, mylist)
If you do not want to use p.map, you can also use a multiprocessing.Queue: call q.put() in the worker to hand back the value, and q.get() in the main process to collect it.
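Here is a minimal sketch of that queue approach (my own illustration, not the tutorial's code): the child process computes the squares and puts the list on the queue, and the main process reads it back.

import multiprocessing

def square_list(mylist, q):
    # compute the squares in the child process and hand the list back via the queue
    q.put([num * num for num in mylist])

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=square_list, args=([1, 2, 3, 4], q))
    p1.start()
    result = q.get()   # blocks until the child has put its result
    p1.join()
    print("Result (in main program): {}".format(result))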
You can also find examples of getting results back from multiprocessed functions here:
https://docs.python.org/3/library/multiprocessing.html
I have created a shared class with a shared variable. My function is supposed to run two parallel processes and find the total number of perfect-square integers. I am able to get the count of perfect squares in each array, but when the processes are done, I am not able to get the sum of both counts. Could you check where I went wrong? Creating the Shared class was unnecessary, but I did it to check whether it would work.
Here is my execution:
from multiprocessing import *
import multiprocessing
import math

class Shared:
    def __init__(self) -> None:
        self.total = multiprocessing.Value('f', 0)

    def setMP(self, value):
        self.total.value = value

    def getMP(self):
        return self.total

# global total
# total = multiprocessing.Value('f', 0) # using a synchronized value for all processes
shared = Shared()
shared.setMP(0)

# function to determine if the number is a perfect square
def is_perfect(number):
    if float(math.sqrt(number)) * 2 == int(math.sqrt(number)) * 2:
        return True  # the number is a perfect square
    return False

# function to find the total number of perfect squares
def find_perfect(array):
    # loop through each element in the array
    for element in array:
        if is_perfect(element):
            # get value
            shared.getMP().acquire()
            i = shared.getMP().value + 1
            shared.setMP(i)
            shared.getMP().release()
    print(shared.getMP())

def perfectSquares(listA, listB):
    # multiprocess
    p1 = Process(target=find_perfect, args=(listA,))
    p2 = Process(target=find_perfect, args=(listB,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return shared.getMP()

if __name__ == '__main__':
    list1 = [7, 8, 23, 64, 2, 3]
    list2 = [64, 54, 32, 35, 36]
    total = perfectSquares(list1, list2)
    print(total)
You are running under Windows, a platform that uses spawn rather than fork to create new processes. What this means is that when a new process is created, execution starts at the very top of the program. This is the reason why the code that creates the new process must be within an if __name__ == '__main__': block (if it weren't, you would get into a recursive loop creating new processes). But it also means that each new process you create re-executes any code that is at global scope and therefore creates its own shared instance.
The easiest fix is to move the creation of shared into function perfectSquares and then pass shared as an argument to find_perfect. Be aware that you have two processes running in parallel, but one must finish before the other. The first process to finish will most likely print a count of 1.0 or 2.0 depending on which process completes first (although it could even be 3.0 if the two processes finish very close together), and the second process to finish must print a count of 3.0.
from multiprocessing import *
import multiprocessing
import math

class Shared:
    def __init__(self) -> None:
        self.total = multiprocessing.Value('f', 0)

    def setMP(self, value):
        self.total.value = value

    def getMP(self):
        return self.total

# function to determine if the number is a perfect square
def is_perfect(number):
    if float(math.sqrt(number)) * 2 == int(math.sqrt(number)) * 2:
        return True  # the number is a perfect square
    return False

# function to find the total number of perfect squares
def find_perfect(array, shared):
    # loop through each element in the array
    for element in array:
        if is_perfect(element):
            # get value
            shared.getMP().acquire()
            i = shared.getMP().value + 1
            shared.setMP(i)
            shared.getMP().release()
    print(shared.getMP())

def perfectSquares(listA, listB):
    # global total
    # total = multiprocessing.Value('f', 0) # using a synchronized value for all processes
    shared = Shared()
    shared.setMP(0)
    # multiprocess
    p1 = Process(target=find_perfect, args=(listA, shared))
    p2 = Process(target=find_perfect, args=(listB, shared))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    return shared.getMP()

if __name__ == '__main__':
    list1 = [7, 8, 23, 64, 2, 3]
    list2 = [64, 54, 32, 35, 36]
    total = perfectSquares(list1, list2)
    print(total)
Prints:
<Synchronized wrapper for c_float(1.0)>
<Synchronized wrapper for c_float(3.0)>
<Synchronized wrapper for c_float(3.0)>
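For what it is worth, the Shared wrapper class is not strictly needed: a multiprocessing.Value can be passed to the workers directly. Here is a minimal sketch of that variation (my own illustration, with a simplified perfect-square test; not the original poster's code):

from multiprocessing import Process, Value
import math

def is_perfect(number):
    root = int(math.sqrt(number))
    return root * root == number

def find_perfect(array, total):
    # count the perfect squares in this array, updating the shared counter
    for element in array:
        if is_perfect(element):
            with total.get_lock():   # the Value carries its own lock
                total.value += 1

if __name__ == '__main__':
    total = Value('i', 0)
    p1 = Process(target=find_perfect, args=([7, 8, 23, 64, 2, 3], total))
    p2 = Process(target=find_perfect, args=([64, 54, 32, 35, 36], total))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    print(total.value)   # 3 perfect squares in total (64, 64 and 36)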
I have two loops in Python. Here is some pseudocode. I would like to run both functions, and each iteration of each of those functions, at the same time. So in this example, there would be 8 processes going on at once. I know you can use Process, but I just don't know how to incorporate an iterable. Please let me know, thanks!
import multiprocessing
from multiprocessing import freeze_support

def example1(iteration):
    print('stuff')

def example2(iteration):
    print('stuff')

if __name__ == '__main__':
    freeze_support()
    pool = multiprocessing.Pool(4)
    iteration = [1, 2, 3, 4]
    pool.map(example1, iteration)
Assuming they don't need to be kicked off at exactly the same time, I think map_async is what you want.
In the example below we can print the result from example2 before example1 has finished, even though example1 was kicked off first.
import multiprocessing
import time

def example1(iteration):
    time.sleep(1)
    return 1

def example2(iteration):
    return 2

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    iteration = [1, 2, 3, 4]
    result1 = pool.map_async(example1, iteration)
    result2 = pool.map_async(example2, iteration)
    print(result2.get())
    print(result1.get())
I have the following simple code:
from multiprocessing import Pool

x = []

def func(a):
    print(x, a)

def main():
    a = [1, 2, 3, 4, 5]
    pool = Pool(1)
    global x
    x = [1, 2, 3, 4]
    ans = pool.map(func, a)
    print(x)
It gives me this result:
[] 1
[] 2
[] 3
[] 4
[] 5
[1, 2, 3, 4]
I expected the result to reflect the change in the global variable x.
It seems that the change to the global variable x is not applied before the pool call. I would like to ask what the cause of this is.
So I did what GuangshengZuo suggested, and sadly the result was not what I wanted. After looking deeper into it, I realized the problem was not the script, but rather the OS.
On Windows there is no os.fork(), hence the change to the global variable is not copied to the child processes. On a Unix machine, the script works fine.
I think it is because this is multiprocessing, not multithreading: the main process and the new process do not share the same global variable. The new process gets a copy of the main process's state while x is still []; after the worker is created, the main process changes x's value, but that does not change the new process's copy of x.
If you change the code to this:
from multiprocessing import Pool

x = []

def func(a):
    print(x, a)

def main():
    a = [1, 2, 3, 4, 5]
    global x
    x = [1, 2, 3, 4]
    pool = Pool(1)
    ans = pool.map(func, a)
    print(x)
then the output will be what you want.
Notice the position of pool = Pool(1): the pool is now created after x has been updated, so the worker process inherits the new value.
Two separate processes will not share the same global variables. A multiprocessing pool abstracts away the fact that you are using separate processes, which makes this tough to recognise.
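Because the workers cannot see changes made to the parent's globals after they are spawned, the most robust pattern is to pass the data each call needs as an argument. Here is a minimal sketch of that idea (my own illustration, using functools.partial to bind the extra argument):

from functools import partial
from multiprocessing import Pool

def func(x, a):
    # x arrives as an explicit argument, so the worker always sees the current value
    print(x, a)

def main():
    a = [1, 2, 3, 4, 5]
    x = [1, 2, 3, 4]
    with Pool(1) as pool:
        pool.map(partial(func, x), a)   # each call becomes func(x, element)

if __name__ == '__main__':
    main()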
I have a simulation that is currently running, but the ETA is about 40 hours -- I'm trying to speed it up with multi-processing.
It iterates over 3 values of one variable (L) and over 99 values of a second variable (a). Using these values, it runs a complex simulation and returns 9 different standard deviations. Thus (even though I haven't coded it that way yet) it is essentially a function that takes two values as inputs (L, a) and returns 9 values.
Here is the essence of the code I have:
STD_1 = []
STD_2 = []
# etc.

for L in range(0, 6, 2):
    for a in range(1, 100):
        ### simulation code ###
        STD_1.append(value_1)
        STD_2.append(value_2)
        # etc.
Here is what I can modify it to:
master_list = []

def simulate(a, L):
    ### simulation code ###
    return (a, L, STD_1, STD_2)  # etc.

for L in range(0, 6, 2):
    for a in range(1, 100):
        master_list.append(simulate(a, L))
Since each of the simulations is independent, this seems like an ideal place to implement some sort of multithreading/multiprocessing.
How exactly would I go about coding this?
EDIT: Also, will everything be returned to the master list in order, or could it possibly be out of order if multiple processes are working?
EDIT 2: This is my code -- but it doesn't run correctly. It asks if I want to kill the program right after I run it.
import multiprocessing

data = []
for L in range(0, 6, 2):
    for a in range(1, 100):
        data.append((L, a))
print(data)

def simulation(arg):
    # unpack the tuple
    a = arg[1]
    L = arg[0]
    STD_1 = a**2
    STD_2 = a**3
    STD_3 = a**4
    # simulation code #
    return (STD_1, STD_2, STD_3)

print("1")
p = multiprocessing.Pool()
print("2")
results = p.map(simulation, data)
EDIT 3: Also, what are the limitations of multiprocessing? I've heard that it doesn't work on OS X. Is this correct?
Wrap the data for each iteration up into a tuple.
Make a list, data, of those tuples.
Write a function f to process one tuple and return one result.
Create a p = multiprocessing.Pool() object.
Call results = p.map(f, data)
This will run as many instances of f, in separate processes, as your machine has cores.
Edit1: Example:
from multiprocessing import Pool

data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]

def f(t):
    name, a, b, c = t
    return (name, a + b + c)

p = Pool()
results = p.map(f, data)
print(results)
Edit2:
Multiprocessing should work fine on UNIX-like platforms such as OSX. Only platforms that lack os.fork (mainly MS Windows) need special attention. But even there it still works. See the multiprocessing documentation.
Here is one way to run it in parallel threads:
import threading

L_a = []
for L in range(0, 6, 2):
    for a in range(1, 100):
        L_a.append((L, a))
        # Add the rest of your objects here

def RunParallelThreads():
    # Create an index list
    indexes = range(0, len(L_a))
    # Create the output list
    output = [None for i in indexes]
    # Create all the parallel threads
    threads = [threading.Thread(target=simulate, args=(output, i)) for i in indexes]
    # Start all the parallel threads
    for thread in threads: thread.start()
    # Wait for all the parallel threads to complete
    for thread in threads: thread.join()
    # Return the output list
    return output

def simulate(list, index):
    (L, a) = L_a[index]
    list[index] = (a, L)  # Add the rest of your objects here

master_list = RunParallelThreads()
Use Pool().imap_unordered if ordering is not important. It will return results in a non-blocking fashion.
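As a rough sketch of how imap_unordered would fit the simulation above (my own illustration; the simulation body is only a placeholder), results are yielded as soon as each worker finishes rather than in submission order:

from multiprocessing import Pool

def simulation(arg):
    L, a = arg                        # unpack the (L, a) tuple
    return (L, a, a**2, a**3, a**4)   # placeholder for the real standard deviations

if __name__ == '__main__':
    data = [(L, a) for L in range(0, 6, 2) for a in range(1, 100)]
    with Pool() as p:
        for result in p.imap_unordered(simulation, data):
            print(result)             # printed in completion order, not input order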
I have three functions, each returning a list. The problem is that running each function takes around 20-30 seconds. So running the entire script ends up taking about 2 min.
I want to use multiprocessing or multithreading (whichever is easier to implement) to have all three functions running at the same time.
The other hurdle I ran into is that I'm not sure how to return the list from each of the functions.
def main():
    masterlist = get_crs_in_snow()
    noop_crs = get_noops_in_snow()
    made_crs = get_crs_in_git()

    # take the prod master list in SNOW, subtract what's been made or is in the noop list
    create_me = [obj for obj in masterlist if obj not in made_crs and obj not in noop_crs]

    print "There are {0} crs in Service Now not in Ansible".format(len(create_me))
    for cr in create_me:
        print str(cr[0]),

if __name__ == '__main__':
    main()
I figure I can get some significant improvements in run time just by multithreading or multiprocessing the following lines:
masterlist = get_crs_in_snow()
noop_crs = get_noops_in_snow()
made_crs = get_crs_in_git()
How do I have these three functions run at the same time?
This is completely untested since I don't have the rest of your code, but it may give you an idea of what can be done. I have adapted your code into the multiprocessing pattern:
from multiprocessing import Pool

def dispatcher(n):
    if n == 0:
        return get_crs_in_snow()
    if n == 1:
        return get_noops_in_snow()
    if n == 2:
        return get_crs_in_git()

def main():
    pool = Pool(processes=3)
    v = pool.map(dispatcher, range(3))
    masterlist = v[0]
    noop_crs = v[1]
    made_crs = v[2]

    # take the prod master list in SNOW, subtract what's been made or is in the noop list
    create_me = [obj for obj in masterlist if obj not in made_crs and obj not in noop_crs]

    print "There are {0} crs in Service Now not in Ansible".format(len(create_me))
    for cr in create_me:
        print str(cr[0]),

if __name__ == '__main__':
    main()
Try the threading library.
import threading
threading.Thread(target=get_crs_in_snow).start()
threading.Thread(target=get_noops_in_snow).start()
threading.Thread(target=get_crs_in_git).start()
As far as getting their return values goes, you could wrap the calls to recommon in some class methods and have them save the result to a member variable. Or you could wrap the calls to recommon in some local functions and simply pass a mutable object (a list or dictionary) into the function, and have the function modify that mutable object.
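Here is a minimal sketch of the mutable-object approach (my own illustration; it assumes the three functions from the question exist): each thread stores its list in a shared dictionary under its own key, so nothing needs to be returned directly.

import threading

def run_into(results, key, func):
    # store the function's return value in the shared dict under its key
    results[key] = func()

results = {}
threads = [
    threading.Thread(target=run_into, args=(results, 'master', get_crs_in_snow)),
    threading.Thread(target=run_into, args=(results, 'noop', get_noops_in_snow)),
    threading.Thread(target=run_into, args=(results, 'made', get_crs_in_git)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

masterlist, noop_crs, made_crs = results['master'], results['noop'], results['made']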
Or, as others have stated, multiprocessing may be a good way to do what you want.