I want to run a "main" function n times. While it runs, this function starts other functions.
The "main" function is called "repeat". When it runs, it first calls the function "copula_sim", from which I get an output called "total_summe_liste". This list is appended to "mega_summe_list", which collects the outputs of all n runs. The sorted "total_summe_liste" is saved as "RM_list", which is the input for the functions "VaR_func", "CVaR_func" and "power_func"; each of these produces an output that is stored in the corresponding list "RM_VaR_list", "RM_CVaR_list" or "RM_PSRM_list". After that, "RM_list" and "total_summe_liste" are cleared before the next run begins.
In the end I have "mega_summe_list", "RM_VaR_list", "RM_CVaR_list" and "RM_PSRM_list", which are used to generate a plot and a dataframe.
Now I want to run the "repeat" function in parallel. For example, when I want to run it n=10 times, I want to run it on 10 CPU cores at the same time. The reason is that "copula_sim" is a Monte Carlo simulation which takes a while for a big simulation.
What I have is this:
total_summe_liste = []
RM_VaR_list = []
RM_CVaR_list = []
RM_PSRM_list = []
mega_summe_list = []

def repeat():
    global RM_list
    global total_summe_liste
    global RM_VaR_list
    global RM_CVaR_list
    global RM_PSRM_list
    global mega_summe_list
    copula_sim(runs_sim, rand_x, rand_y, mu, full_log=False)
    mega_summe_list += total_summe_liste
    RM_list = sorted(total_summe_liste)
    VaR_func(alpha)
    RM_VaR_list.append(VaR)
    CVaR_func(alpha)
    RM_CVaR_list.append(CVaR)
    power_func(gamma)
    RM_PSRM_list.append(risk)
    RM_list = []
    total_summe_liste = []

n = 10
for i in range(0, n):
    repeat()
which works so far.
I tried:
import multiprocessing as mp

if __name__ == '__main__':
    jobs = []
    for i in range(0, 10):
        p = mp.Process(target=repeat)
        jobs.append(p)
        p.start()
But when I run this, "mega_summe_list" is empty. When I add "print(VaR)" to "repeat", it shows me all 10 VaR values when it is done, so the parallel part itself is working.
What is the problem?
The reason for this issue is that the list mega_summe_list is not shared between the processes.
When you invoke parallel processing in Python, all the functions and variables are imported and run independently in different processes.
So, for instance, when you start 5 processes, 5 different copies of these variables are imported and run independently. So, when you access mega_summe_list in main it is still empty, because it is empty in this process.
To enable synchronization between processes, you can use a list proxy from the multiprocessing package.
A multiprocessing Manager maintains an independent server process in which these Python objects are held.
Below is the code used to create a multiprocessing Manager list:
from multiprocessing import Manager

mega_summe_list = Manager().list()

The code above can be used instead of mega_summe_list = [] when using multiprocessing.
Below is an example:
from multiprocessing.pool import Pool
from multiprocessing import Manager

def repeat_test(_):
    # Note: relying on globals like this only works with the 'fork' start method
    # (the default on Linux); under 'spawn' the workers would not see b or mp_list.
    global b, mp_list
    a = [1, 2, 3]
    b += a
    mp_list += a  # multiprocessing Manager list
    a = []

if __name__ == "__main__":
    b = []
    mp_list = Manager().list()
    p = Pool(5)
    p.map(repeat_test, range(5))
    print("b: {0}, \n mp_list: {1}".format(b, mp_list))
Output:
b: [],
mp_list: [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
Hope this solves your problem.
You should use a multiprocessing Pool; then you can do something like:
p = Pool(10)
p.map(repeat, range(10))
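For the question above, that could look roughly like the sketch below. It assumes the simulation and risk functions are refactored to take their inputs and return their results instead of using globals, which is not how they are defined in the question:

from multiprocessing import Pool

def repeat(_):
    # Assumption: copula_sim returns the simulated list instead of filling a global.
    total_summe_liste = copula_sim(runs_sim, rand_x, rand_y, mu, full_log=False)
    RM_list = sorted(total_summe_liste)
    # Assumption: each risk function takes the sorted list and returns one number.
    return total_summe_liste, VaR_func(RM_list, alpha), CVaR_func(RM_list, alpha), power_func(RM_list, gamma)

if __name__ == '__main__':
    with Pool(10) as p:
        results = p.map(repeat, range(10))
    mega_summe_list = [x for summe, _, _, _ in results for x in summe]
    RM_VaR_list = [var for _, var, _, _ in results]
    RM_CVaR_list = [cvar for _, _, cvar, _ in results]
    RM_PSRM_list = [psrm for _, _, _, psrm in results]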
I solved the problem this way:
This is the function I want to repeat n times in parallel:
from multiprocessing import Process
from multiprocessing import Manager
from multiprocessing.pool import Pool

def repeat(shared_list, VaR_list, CVaR_list, PSRM_list, i):
    global RM_list
    global total_summe_liste
    copula_sim(runs_sim, rand_x, rand_y, mu, full_log=False)
    shared_list += total_summe_liste
    RM_list = sorted(total_summe_liste)
    VaR_func(alpha)
    VaR_list.append(VaR)
    CVaR_func(alpha)
    CVaR_list.append(CVaR)
    power_func(gamma)
    PSRM_list.append(risk)
    RM_list = []
    total_summe_liste = []
This part manages the shared lists and does the parallelization. Thanks @noufel13!
RM_VaR_list = []
RM_CVaR_list = []
RM_PSRM_list = []
mega_summe_list = []

if __name__ == "__main__":
    with Manager() as manager:
        shared_list = manager.list()
        VaR_list = manager.list()
        CVaR_list = manager.list()
        PSRM_list = manager.list()
        processes = []
        for i in range(12):
            p = Process(target=repeat, args=(shared_list, VaR_list, CVaR_list, PSRM_list, i))  # Passing the list
            p.start()
            processes.append(p)
        for p in processes:
            p.join()
        RM_VaR_list += VaR_list
        RM_CVaR_list += CVaR_list
        RM_PSRM_list += PSRM_list
        mega_summe_list += shared_list
        RM_frame_func()
        plotty_func()
Thank you!
The only question left is how to handle big arrays. Is there a way to do this more efficiently? One of the 12 shared lists can have more than 100,000,000 items, so in total mega_summe_list has about 1,200,000,000 items...
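One possible direction for the very large case (my own sketch, not from the answers above): have each worker return its chunk and concatenate in the parent, for example as a NumPy array, so the data comes back in one piece instead of going through the Manager proxy item by item. Here one_run is a hypothetical stand-in for a single simulation run:

from multiprocessing import Pool
import numpy as np

def one_run(seed):
    # Hypothetical stand-in for copula_sim: return one run's simulated sums as an array.
    rng = np.random.default_rng(seed)
    return rng.normal(size=1_000_000)

if __name__ == "__main__":
    with Pool(12) as pool:
        chunks = pool.map(one_run, range(12))
    mega_summe = np.concatenate(chunks)  # one contiguous array instead of a huge Manager list
    print(mega_summe.shape)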
Related
I have a certain class that has an attribute list. There are some functions in the class that write to, but never read from, this list. I initialize the class with a list and call the functions from multiple processes; however, after waiting for all of them to finish, the list remains empty.
The value order in the list does not matter.
from multiprocessing import Process

class TestClass():
    def __init__(self, vals):
        self.vals = vals

    def linearrun(self):
        Is = range(2000)
        Js = range(2000)
        for i in Is:
            for j in Js:
                self.vals.append(i + j)

if __name__ == "__main__":
    vals = []
    instantiated_class = TestClass(vals)
    processes = []
    for _ in range(10):
        new_process = Process(target=instantiated_class.linearrun)
        processes.append(new_process)
        new_process.start()
    for p in processes:
        p.join()
    print(vals)
In the code below I expect print('q.count', q.count) to print 2, since count is a variable initialised once with q = QueueFun() and then incremented in the read_queue method; instead, print('q.count', q.count) prints 0. What is the correct way to share a counter between multiprocessing Processes?
Complete code:
from multiprocessing import Process, Queue, Pool, Lock

class QueueFun():
    def __init__(self):
        self.count = 0
        self.lock = Lock()

    def write_queue(self, work_tasks, max_size):
        for i in range(0, max_size):
            print("Writing to queue")
            work_tasks.put(1)

    def read_queue(self, work_tasks, max_size):
        while self.count != max_size:
            self.lock.acquire()
            self.count += 1
            self.lock.release()
            print('self.count', self.count)
            print('')
            print('Reading from queue')
            work_tasks.get()

if __name__ == '__main__':
    q = QueueFun()
    max_size = 1
    work_tasks = Queue()
    write_processes = []
    for i in range(0, 2):
        write_processes.append(Process(target=q.write_queue,
                                       args=(work_tasks, max_size)))
    for p in write_processes:
        p.start()
    read_processes = []
    for i in range(0, 2):
        read_processes.append(Process(target=q.read_queue,
                                      args=(work_tasks, max_size)))
    for p in read_processes:
        p.start()
    for p in read_processes:
        p.join()
    for p in write_processes:
        p.join()
    print('q.count', q.count)
Unlike threads, different processes have different address spaces: they do not share memory with each other. Writing to a variable in one process will not change an (unshared) variable in another process.
In the original example, the count was 0 at the end because the main process never changed it (no matter what the other spawned processes did).
It is probably better to communicate between processes with a Queue. If it is really necessary, Value or Array could be used:
17.2.1.5. Sharing state between processes
As mentioned above, when doing concurrent programming it is usually
best to avoid using shared state as far as possible. This is
particularly true when using multiple processes.
However, if you really do need to use some shared data then
multiprocessing provides a couple of ways of doing so.
Shared memory: Data can be stored in a shared memory map using Value or Array.
...
These shared objects will be process and thread-safe.
multiprocessing.Value
Operations like += which involve a read and write are not atomic.
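As a small illustration of that point (not from the original answer), incrementing a shared Value from several processes only adds up reliably when the increment is guarded by get_lock():

from multiprocessing import Process, Value

def bump(counter):
    for _ in range(100_000):
        with counter.get_lock():  # guard the read-modify-write
            counter.value += 1

if __name__ == "__main__":
    counter = Value('i', 0)
    workers = [Process(target=bump, args=(counter,)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 400000 with the lock; without it the total can come up short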
A slightly modified version of the question's code:
from multiprocessing import Process, Queue, Value

class QueueFun():
    def __init__(self):
        self.readCount = Value('i', 0)
        self.writeCount = Value('i', 0)

    def write_queue(self, work_tasks, MAX_SIZE):
        with self.writeCount.get_lock():
            if self.writeCount.value != MAX_SIZE:
                self.writeCount.value += 1
                work_tasks.put(1)

    def read_queue(self, work_tasks, MAX_SIZE):
        with self.readCount.get_lock():
            if self.readCount.value != MAX_SIZE:
                self.readCount.value += 1
                work_tasks.get()

if __name__ == '__main__':
    q = QueueFun()
    MAX_SIZE = 2
    work_tasks = Queue()
    write_processes = []
    for i in range(MAX_SIZE):
        write_processes.append(Process(target=q.write_queue,
                                       args=(work_tasks, MAX_SIZE)))
    for p in write_processes: p.start()
    read_processes = []
    for i in range(MAX_SIZE):
        read_processes.append(Process(target=q.read_queue,
                                      args=(work_tasks, MAX_SIZE)))
    for p in read_processes: p.start()
    for p in read_processes: p.join()
    for p in write_processes: p.join()
    print('q.writeCount.value', q.writeCount.value)
    print('q.readCount.value', q.readCount.value)
Note: printing to standard output from multiple processes can result in output getting mixed up (not synchronized).
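If that matters, one common workaround (not part of the original answer) is to pass a Lock to the workers and hold it while printing:

from multiprocessing import Process, Lock

def report(lock, i):
    with lock:  # only one process writes to stdout at a time
        print("worker", i, "done")

if __name__ == "__main__":
    lock = Lock()
    workers = [Process(target=report, args=(lock, i)) for i in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()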
I have this program where everything is built in a class object. There is a function that does 50 computations of another function, each with a different input, so I decided to use multiprocessing to speed it up. However, the list that needs to be returned at the end always comes back empty. Any ideas? Here is a simplified version of my problem. The output of main_function() should be a list containing the numbers 0-9; however, the list comes back empty.
import multiprocessing

class MyClass(object):
    def __init__(self):
        self.arr = list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        jobs = []
        for i in range(0, 10):
            p = multiprocessing.Process(target=self.helper_function, args=(i,))
            jobs.append(p)
            p.start()
        for job in jobs:
            job.join()
        print(self.arr)
arr is a list that's not going to be shared across subprocess instances.
For that you have to use a Manager object to create a managed list that is aware of the fact that it's shared between processes.
The key is:
self.arr = multiprocessing.Manager().list()
full working example:
import multiprocessing

class MyClass(object):
    def __init__(self):
        self.arr = multiprocessing.Manager().list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        jobs = []
        for i in range(0, 10):
            p = multiprocessing.Process(target=self.helper_function, args=(i,))
            jobs.append(p)
            p.start()
        for job in jobs:
            job.join()
        print(self.arr)

if __name__ == "__main__":
    a = MyClass()
    a.main_function()
this code now prints: [7, 9, 2, 8, 6, 0, 4, 3, 1, 5]
(well of course the order cannot be relied on between several executions, but all numbers are here which means that all processes contributed to the result)
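If the order does matter, Pool.map returns its results in the same order as the inputs; a small self-contained sketch (not part of the original answer):

import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        # Results come back in input order: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
        print(pool.map(square, range(10)))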
multiprocessing is touchy.
For simple multiprocessing tasks, I would recommend:
from multiprocessing.dummy import Pool as ThreadPool

class MyClass(object):
    def __init__(self):
        self.arr = list()

    def helper_function(self, n):
        self.arr.append(n)

    def main_function(self):
        pool = ThreadPool(4)
        pool.map(self.helper_function, range(10))
        print(self.arr)

if __name__ == '__main__':
    c = MyClass()
    c.main_function()
The idea of using map instead of complicated multithreading calls is from one of my favorite blog posts: https://chriskiehl.com/article/parallelism-in-one-line
I'm a beginner with the multiprocessing module in Python and I want to use concurrent execution ONLY for my def func. Moreover, I'm using some constants in my code and I have a problem with them.
The code is (python 3.6.8):
from multiprocessing import Pool

FIRST_COUNT = 10
print("Enter your path")
PATH = input()
some_list = []
for i in range(10000):
    some_list.append(i)

def func(some_list):
    .....

if __name__ == "__main__":
    chunks = [some_list[i::4] for i in range(4)]
    pool = Pool(processes=4)
    pool.map(func, chunks)
When I try to start this program, I see the message Enter your path 5 times and I need to input my path 5 times, i.e. this code executes 1 + 4 times (once in the main process and once for each worker process).
I want to use FIRST_COUNT, PATH and some_list as constants, and use multiprocessing only for func. How can I do this? Please help me.
You should put code inside if __name__ == "__main__": to execute it only once
if __name__ == "__main__":
FIRST_COUNT = 10
PATH = input("Enter your path: ")
some_list = list(range(10000))
#some_list = []
#for i in range(10000):
# some_list.append(i)
chunks = [some_list[i::4] for i in range(4)]
pool = Pool(processes=4)
results = pool.map(func, chunks)
print(results)
If you want to use FIRST_COUNT and PATH, then it is better to send them to func as arguments.
You will have to create tuples with FIRST_COUNT and PATH in chunks:
chunks = [(FIRST_COUNT, PATH, some_list[i::4]) for i in range(4)]
and the function will have to receive them as a tuple and unpack it:
def func(args):
    first_count, path, some_list = args
Working example
from multiprocessing import Pool

def func(args):
    first_count, path, some_list = args
    result = sum(some_list)
    print(first_count, path, result)
    return result

if __name__ == "__main__":
    FIRST_COUNT = 10
    PATH = input("Enter your path: ")
    some_list = list(range(10000))
    #some_list = []
    #for i in range(10000):
    #    some_list.append(i)

    chunks = [(FIRST_COUNT, PATH, some_list[i::4]) for i in range(4)]
    pool = Pool(processes=4)
    all_results = pool.map(func, chunks)
    print('all results:', all_results)
EDIT: You can also use starmap() instead of map()
all_results = pool.starmap(func, chunks)
and then you can use (without unpacking arguments)
def func(first_count, path, some_list):
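Put together from the pieces above, the starmap() variant might look like this:

from multiprocessing import Pool

def func(first_count, path, some_list):
    result = sum(some_list)
    print(first_count, path, result)
    return result

if __name__ == "__main__":
    FIRST_COUNT = 10
    PATH = input("Enter your path: ")
    some_list = list(range(10000))

    chunks = [(FIRST_COUNT, PATH, some_list[i::4]) for i in range(4)]
    pool = Pool(processes=4)
    all_results = pool.starmap(func, chunks)  # each tuple is unpacked into func's arguments
    print('all results:', all_results)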
I would like to create and run at most N processes at once.
As soon as a process is finished, a new one should take its place.
The following code works (assuming Dostuff is the function to execute). The problem is that I am using a loop and need time.sleep to allow the processes to do their work. This is rather inefficient.
What's the best method for this task?
import time, multiprocessing

if __name__ == "__main__":
    Jobs = []
    for i in range(10):
        while len(Jobs) >= 4:
            NotDead = []
            for Job in Jobs:
                if Job.is_alive():
                    NotDead.append(Job)
            Jobs = NotDead
            time.sleep(0.05)
        NewJob = multiprocessing.Process(target=Dostuff)
        Jobs.append(NewJob)
        NewJob.start()
After a bit of tinkering, I thought about creating new threads and then
launching my processes from these threads like so:
import threading, multiprocessing, time

def processf(num):
    print("in process:", num)
    now = time.perf_counter()  # time.clock() was removed in Python 3.8
    while time.perf_counter() - now < 2:
        pass  # ..intensive processing..

def main():
    z = [0]
    lock = threading.Lock()

    def threadf():
        while z[0] < 20:
            lock.acquire()
            work = multiprocessing.Process(target=processf, args=(z[0],))
            z[0] = z[0] + 1
            lock.release()
            work.start()
            work.join()

    activet = []
    for i in range(2):
        newt = threading.Thread(target=threadf)
        activet.append(newt)
        newt.start()
    for i in activet:
        i.join()

if __name__ == "__main__":
    main()
This solution is better (it doesn't slow down the launched processes); however, I wouldn't really trust code that I wrote in a field I don't know. I've had to use a list (z = [0]) since an integer is immutable.
Is there a way to embed processf into main()? I'd prefer not to need an additional global variable. If I simply copy/paste the function inside, I get a nasty error (AttributeError: can't pickle local object 'main.<locals>.processf').
Why not use concurrent.futures.ThreadPoolExecutor?

from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=20)
res = executor.submit(any_def)
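For CPU-bound work like the Dostuff example, the process-based counterpart ProcessPoolExecutor gives the same bounded-concurrency behaviour: at most max_workers processes run at once, and a finished worker immediately picks up the next task. A minimal self-contained sketch, where do_stuff is a hypothetical stand-in for Dostuff:

from concurrent.futures import ProcessPoolExecutor
import time

def do_stuff(i):
    # Hypothetical stand-in for Dostuff
    time.sleep(0.5)
    return i * i

if __name__ == "__main__":
    # At most 4 worker processes run at the same time.
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(do_stuff, range(10)))
    print(results)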