Python generators for concurrency

I am following slides from Python guru David Beazley. They state: "Generators are also used for concurrency." Here is the example given:
from collections import deque

def countdown(n):
    while n > 0:
        print("T-minus", n)
        yield
        n -= 1

def countup(n):
    x = 0
    while x > n:    # note: 0 > 20 is False, so countup(20) yields nothing and stops immediately
        print("Up we go", x)
        yield
        x += 1

# instantiate some tasks in a queue
tasks = deque([countdown(10),
               countdown(5),
               countup(20)])

# run a little scheduler
while tasks:
    t = tasks.pop()            # get a task
    try:
        next(t)                # run it until it yields
        tasks.appendleft(t)    # reschedule
    except StopIteration:
        pass
Here is the output:
T-minus 5
T-minus 10
T-minus 4
T-minus 9
T-minus 3
T-minus 8
T-minus 2
T-minus 7
T-minus 1
T-minus 6
T-minus 5
T-minus 4
T-minus 3
T-minus 2
T-minus 1
My question is: how do generators introduce concurrency here, and how does it manifest?

This bit of code implements the concept of "green threads": cooperative, userland threading (as opposed to preemptive, kernel threading).
The "threads" are the generators, i.e. each function with yield or yield from in it. The scheduler is the while tasks: loop at the bottom, which drives them.
So, let's imagine that instead of generators we have regular lists, each containing a sequence of functions:
def doThis(): pass
def sayThis(): pass
def doThat(): pass
...
myThread = [doThis, doThat, doAnother]
yourThread = [sayThis, sayThat, sayAnother]
We could run all of the functions in order:
for thread in [myThread, yourThread]:
    for stmt in thread:
        stmt()
Or we could do them in some other order:
for myStmt, yourStmt in zip(myThread, yourThread):
    myStmt()
    yourStmt()
In the first "scheduler", we exhaust the first thread, and then proceed to the second thread. In the second scheduler, we interleave statements out of both threads, first mine, then yours, then back to mine.
It's because we interleave "statements" across multiple "threads" before exhausting those threads that we can say the second scheduler gives concurrency.
Note that concurrency doesn't necessarily mean parallelism. It's not simultaneous execution, just overlapping (interleaved) execution.
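To push the analogy a bit further (this is my own sketch, not from the slides): zip stops at the shortest list, but the same deque-based scheduler from the question works on plain list "threads" of any length. Here the doThis/sayThis-style functions just print so the interleaving is visible:

from collections import deque

def doThis(): print("doThis")
def doThat(): print("doThat")
def sayThis(): print("sayThis")
def sayThat(): print("sayThat")
def sayMore(): print("sayMore")

myThread = [doThis, doThat]
yourThread = [sayThis, sayThat, sayMore]

# Round-robin: take one "statement" from a thread, then move that thread to the back.
ready = deque(iter(t) for t in [myThread, yourThread])
while ready:
    thread = ready.popleft()
    try:
        next(thread)()        # run one statement of this thread
        ready.append(thread)  # reschedule it behind the others
    except StopIteration:
        pass                  # this thread is exhausted

The output interleaves doThis, sayThis, doThat, sayThat, sayMore, even though the two lists have different lengths.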

Here is an example to clarify:
from collections import deque

def coro1():
    for i in range(1, 10):
        yield i

def coro2():
    for i in range(1, 10):
        yield i*10

print('Async behaviour'.center(60, '#'))

tasks = deque()
tasks.extend([coro1(), coro2()])

while tasks:
    task = tasks.popleft()  # select and remove a task (coro1/coro2).
    try:
        print(next(task))
        tasks.append(task)  # put the task back so the two coroutines keep alternating.
    except StopIteration:
        pass
Out:
######################Async behaviour#######################
1
10
2
20
3
30
4
40
5
50
6
60
7
70
8
80
9
90

Related

Non-monotonic evolution of runtime with increasing parallelization

I'm running some runtime tests to understand what I can gain from parallelization and how it affects runtime (linearly?).
For a given integer n I successively compute the n-th Fibonacci number, varying the degree of parallelization by computing each Fibonacci number i in {0, 1, ..., n} with up to 16 parallel processes.
import pandas as pd
import time
import multiprocessing as mp

# n-th Fibonacci number
def f(n: int):
    if n in {0, 1}:
        return n
    return f(n - 1) + f(n - 2)

if __name__ == "__main__":
    K = range(1, 16 + 1)
    n = 100
    N = range(n)
    df_dauern = pd.DataFrame(index=K, columns=N)
    for _n in N:
        _N = range(_n)
        print(f'\nn = {_n}')
        for k in K:
            start = time.time()
            pool = mp.Pool(k)
            pool.map(f, _N)
            pool.close()
            pool.join()
            ende = time.time()
            dauer = ende - start
            m, s = divmod(dauer, 60)
            h, m = divmod(m, 60)
            h, m, s = round(h), round(m), round(s)
            df_dauern.loc[k, _n] = f'{h}:{m}:{s}'
            print(f'... k = {k:02d}, Dauer: {h}:{m}:{s}')
        df_dauern.to_excel('Dauern.xlsx')
In the following DataFrame I display the duration (h:m:s) for n in {45, 46, 47}.
 k (processes)   n=45      n=46      n=47
  1              0:9:40    0:15:24   0:24:54
  2              0:7:24    0:13:23   0:22:59
  3              0:5:3     0:9:37    0:19:7
  4              0:7:18    0:7:19    0:15:29
  5              0:7:21    0:7:17    0:15:35
  6              0:3:41    0:9:34    0:9:36
  7              0:3:40    0:9:46    0:9:34
  8              0:3:41    0:9:33    0:9:33
  9              0:3:39    0:9:33    0:9:33
 10              0:3:39    0:9:32    0:9:32
 11              0:3:39    0:9:34    0:9:45
 12              0:3:40    0:6:4     0:9:37
 13              0:3:39    0:5:54    0:9:32
 14              0:3:39    0:5:55    0:9:32
 15              0:3:40    0:5:53    0:9:33
 16              0:3:39    0:5:55    0:9:33
In my opinion the results are odd in two respects. First, the duration does not decrease monotonically with increasing parallelization, and second, the runtime does not decrease linearly (that is, double the processes, half the runtime).
Is this behavior to be expected?
Is this behavior due to the chosen example of computing Fibonacci numbers?
How is it even possible that runtime increases with increasing parallelization (e.g. always when moving from 2 to 3 parallel processes)?
How come it does not make a difference whether I use 6 or 16 parallel processes?
It's because of multiprocessing's chunking of the work and the fact that the cost of a single task grows exponentially with i (naive recursive Fibonacci). By default the pool chooses a chunksize relative to the number of workers.
Basically, multiprocessing splits the work into equal chunks to reduce serialization overhead; the chunksize is given by:
chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
chunksize += bool(extra)
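To see what that gives for the numbers in the question (47 tasks when _n == 47, since _N = range(47)), here is a small helper of mine that mirrors the formula:

def chunksize_for(n_tasks, n_workers):
    # mirrors Pool.map's default: divmod(len(iterable), len(self._pool) * 4)
    chunksize, extra = divmod(n_tasks, n_workers * 4)
    return chunksize + bool(extra)

for workers in (4, 5, 6, 8):
    print(workers, chunksize_for(47, workers))
# prints: 4 3 / 5 3 / 6 2 / 8 2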
For 4 and 5 workers the chunksize is the same (3), and 99.9% of the time is taken by the last 3 tasks, which are scheduled on the same core (because they sit in one chunk), so one core ends up doing 99.9% of the work regardless of the core count; the extra 3 seconds are most likely scheduling overhead (more workers means more scheduling). You'll get a speedup if you set chunksize=1 in the pool.map parameters manually, as each of those 3 tasks will then be scheduled to a different core.
For worker counts of 6 and above, the chunksize is calculated to be 2, but you have an odd number of tasks, which means you always wait for the last chunk that is scheduled, which is the longest one. The entire 3:40 minutes are spent in a single function call that cannot be broken down further, so it doesn't matter whether you launch 6 workers or 100: you are still limited by the slowest task (or, more precisely, the slowest chunk).
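A minimal sketch of that chunksize=1 suggestion, reusing the questioner's f and the n = 47 workload (the exact speedup will depend on the machine):

import multiprocessing as mp

def f(n: int):
    if n in {0, 1}:
        return n
    return f(n - 1) + f(n - 2)

if __name__ == "__main__":
    _N = range(47)  # the workload behind the n = 47 column above
    with mp.Pool(4) as pool:
        # chunksize=1: indices are handed out one at a time, so the three most
        # expensive calls (f(44), f(45), f(46)) can run on different workers.
        results = pool.map(f, _N, chunksize=1)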

Increase number of CPUs (ncores) has negative impact on multiprocessing pool

I have the following code and I want to spread the task across multiple processes. After experimenting, I realized that increasing the number of CPU cores negatively impacts the execution time.
I have 8 cores on my machine.
Case 1: without using multiprocessing
Execution time: 106 minutes
Case 2: with multiprocessing using ncores = 4
Execution time: 37 minutes
Case 3: with multiprocessing using ncores = 7
Execution time: 40 minutes
Here is the code:
import time
import functools  # needed for functools.partial below
import multiprocessing as mp

def _fun(i, args1=10):
    # Sort matrix W
    # For loop 1 on matrix M
    # For loop 2 on matrix Y
    return value

def run1(ncores=mp.cpu_count()):
    ncores = ncores - 4  # use 4 and 1 to have ncores = 4 and 7
    _f = functools.partial(_fun, args1=x)
    with mp.Pool(ncores) as pool:
        result = pool.map(_f, range(n))
    return [t for t in result]

start = time.time()
list1 = run1()
end = time.time()
print('time {0} minutes '.format((end - start) / 60))
My question is: what is the best practice for using multiprocessing? My understanding was that the more CPU cores we use, the faster it should be.

Python multiprocessing Process ID

I'm using multiprocessing.Pool to run different processes (e.g. 4 processes) and I need to ID each process so I can do different things in each one.
As I have the pool running inside a while loop, for the first iteration I can know the ID of each process; however, for the second and later iterations this ID changes, or at least I can't find a property that stays the same for each process across iterations.
The relevant part of the code is as follows:
while i <= maxiter:
    print('\n' + 'Iteration: %r' % i + '\n')
    pool = mp.Pool(processes=numprocs)
    swarm = pool.map_async(partial(proxy, costf=costFunc, i=i), Swarm)
    pool.close()
    pool.join()
    Swarm = swarm.get()
I've tried with the following properties to properly ID the processes but it's not working for me:
print(mp.Process().name)
print(mp.current_process().name)
With this the output is:
Iteration: 1
Process-2:1
Process-1:1
ForkPoolWorker-1
ForkPoolWorker-2
Process-3:1
ForkPoolWorker-3
Process-2:2
ForkPoolWorker-2
Process-3:2
Process-2:3
ForkPoolWorker-3
ForkPoolWorker-2
Process-1:2
ForkPoolWorker-1
Process-4:1
Process-3:3
ForkPoolWorker-4
ForkPoolWorker-3
Process-2:4
ForkPoolWorker-2
Iteration: 2
Process-5:1
ForkPoolWorker-5
Process-5:2
Process-7:1
ForkPoolWorker-7
Process-6:1
ForkPoolWorker-5
ForkPoolWorker-6
Process-5:3
ForkPoolWorker-5
Process-7:2
ForkPoolWorker-7
Process-5:4
ForkPoolWorker-5
Process-6:2
ForkPoolWorker-6
Process-7:3
ForkPoolWorker-7
Process-8:1
ForkPoolWorker-8
Any ideas how can I ID each process the same way every time?
EDIT 1:
I've simplified the program to this but the idea is the same:
import random, numpy as np, time
import multiprocessing as mp

def costFunc(i):
    print(mp.current_process().name, mp.Process().name)
    return i*1

class PSO():
    def __init__(self, maxiter, numprocs):
        # Begin optimization Loop
        i = 1
        self.Evol = []
        while i <= maxiter:
            print('\n' + 'Iteration: %r' % i + '\n')
            pool = mp.Pool(processes=numprocs)
            swarm = pool.map_async(costFunc, (i,))
            pool.close()
            pool.join()
            Swarm = swarm.get()
            i += 1

if __name__ == "__main__":
    # mp.set_start_method('spawn')
    PSO(10, 1)
OUTPUT:
Iteration: 1
ForkPoolWorker-1 Process-1:1
Iteration: 2
ForkPoolWorker-2 Process-2:1
Iteration: 3
ForkPoolWorker-3 Process-3:1
Iteration: 4
ForkPoolWorker-4 Process-4:1
Iteration: 5
ForkPoolWorker-5 Process-5:1
Iteration: 6
ForkPoolWorker-6 Process-6:1
Iteration: 7
ForkPoolWorker-7 Process-7:1
Iteration: 8
ForkPoolWorker-8 Process-8:1
Iteration: 9
ForkPoolWorker-9 Process-9:1
Iteration: 10
ForkPoolWorker-10 Process-10:1
You are creating a new pool in each iteration of the loop, so processes in the pool are never re-used.
Move pool = mp.Pool(processes = numprocs) (and pool.close() and pool.join()) out of the while loop to re-use processes in the pool.
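A minimal sketch of that restructuring, based on the simplified example above (I'm assuming the pool should only be shut down once the loop is done):

import multiprocessing as mp

def costFunc(i):
    print(mp.current_process().name)
    return i * 1

class PSO():
    def __init__(self, maxiter, numprocs):
        self.Evol = []
        pool = mp.Pool(processes=numprocs)  # created once, so its workers are re-used
        i = 1
        while i <= maxiter:
            print('\n' + 'Iteration: %r' % i + '\n')
            swarm = pool.map_async(costFunc, (i,))
            Swarm = swarm.get()             # wait for this iteration's results
            i += 1
        pool.close()                        # shut the workers down after the loop
        pool.join()

if __name__ == "__main__":
    PSO(10, 1)

With this, every iteration prints the same small set of ForkPoolWorker names instead of a fresh one each time.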

Python Multiprocessing reading input iterator all at once

Using Python 3.4.3, I have a generator function foo that yields data to be processed in parallel. Passing this function to multiprocessing.Pool.map with n processes, I expected the generator to be consumed n items at a time:
from multiprocessing import Pool
import time

now = time.time

def foo(n):
    for i in range(n):
        print("%f get %d" % (now(), i))
        yield i

def bar(i):
    print("%f start %d" % (now(), i))
    time.sleep(1)
    print("%f end %d" % (now(), i))

pool = Pool(2)
pool.map(bar, foo(6))
pool.close()
pool.join()
Unfortunately, the generator is exhausted immediately, before any processing starts. The output is this:
1440713274.290760 get 0
1440713274.290827 get 1
1440713274.290839 get 2
1440713274.290849 get 3
1440713274.290858 get 4
1440713274.290867 get 5
1440713274.291526 start 0
1440713274.291654 start 1
1440713275.292680 end 0
1440713275.292803 end 1
1440713275.293056 start 2
1440713275.293129 start 3
1440713276.294106 end 2
1440713276.294182 end 3
1440713276.294344 start 4
1440713276.294390 start 5
1440713277.294803 end 4
1440713277.294859 end 5
But I had hoped to get something more like:
1440714272.612041 get 0
1440714272.612078 get 1
1440714272.612090 start 0
1440714272.612100 start 1
1440714273.613174 end 0
1440714273.613247 end 1
1440714273.613264 get 2
1440714273.613276 get 3
1440714273.613287 start 2
1440714273.613298 start 3
1440714274.614357 end 2
1440714274.614423 end 3
1440714274.614432 get 4
1440714274.614437 get 5
1440714274.614443 start 4
1440714274.614448 start 5
1440714275.615475 end 4
1440714275.615549 end 5
(Reason is that foo is going to read a large amount of data into memory.)
I got the same results with pool.imap(bar, foo(6), 2) and
for i in foo(6):
    pool.apply_async(bar, args=(i,))
What is the easiest way to make this work?
I had faced a similar problem, where I needed to read a large amount of data and process parts of it in parallel. I solved it by sub-classing the multiprocessing.Process and using queues. I think you will benefit from reading about embarrassingly parallel problems. I have given sample code below:
import multiprocessing
import time
import logging

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)-8s %(message)s',
                    datefmt='%m-%d %H:%M:%S')

# Producer class
class foo(multiprocessing.Process):
    def __init__(self, n, queues):
        super(foo, self).__init__()
        self.n = n
        self.queues = queues

    def run(self):
        logging.info('Starting foo producer')
        for i in range(self.n):
            logging.info('foo: Sending "%d" to a consumer' % (i))
            self.queues[i % len(self.queues)].put(i)
            time.sleep(1)  # Unnecessary sleep to demonstrate order of events
        for q in self.queues:
            q.put('end')
        logging.info('Ending foo producer')
        return

# Consumer class
class bar(multiprocessing.Process):
    def __init__(self, idx, queue):
        super(bar, self).__init__()
        self.idx = idx
        self.queue = queue

    def run(self):
        logging.info("Starting bar %d consumer" % (self.idx))
        while True:
            fooput = self.queue.get()
            if type(fooput) == str and fooput == 'end':
                break
            logging.info('bar %d: Got "%d" from foo' % (self.idx, fooput))
            time.sleep(2)  # Unnecessary sleep to demonstrate order of events
        logging.info("Ending bar %d consumer" % (self.idx))
        return

if __name__ == '__main__':
    # make queues to put data read by foo
    count_queues = 2
    queues = []
    for i in range(count_queues):
        q = multiprocessing.Queue(2)
        # Give queue size according to your buffer requirements
        queues.append(q)

    # make reader for reading data. Lets call this object Producer
    foo_object = foo(6, queues)

    # make receivers for the data. Lets call these Consumers
    # Each consumer is assigned a queue
    bar_objects = []
    for idx, q in enumerate(queues):
        bar_object = bar(idx, q)
        bar_objects.append(bar_object)

    # start the consumer processes
    for bar_object in bar_objects:
        bar_object.start()

    # start the producer process
    foo_object.start()

    # Join all started processes
    for bar_object in bar_objects:
        bar_object.join()
    foo_object.join()
The best I can come up with myself is this:
pool_size = 2
pool = Pool(pool_size)
count = 0
for i in foo(6):
    count += 1
    if count % pool_size == 0:
        pool.apply(bar, args=(i,))
    else:
        pool.apply_async(bar, args=(i,))
pool.close()
pool.join()
for pool_size=2 it outputs:
1440798963.389791 get 0
1440798963.490108 get 1
1440798963.490683 start 0
1440798963.595587 start 1
1440798964.491828 end 0
1440798964.596687 end 1
1440798964.597137 get 2
1440798964.697373 get 3
1440798964.697629 start 2
1440798964.798024 start 3
1440798965.698719 end 2
1440798965.799108 end 3
1440798965.799419 get 4
1440798965.899689 get 5
1440798965.899984 start 4
1440798966.001016 start 5
1440798966.901050 end 4
1440798967.002097 end 5
for pool_size=3 it outputs:
1440799101.917546 get 0
1440799102.018438 start 0
1440799102.017869 get 1
1440799102.118868 get 2
1440799102.119903 start 1
1440799102.219616 start 2
1440799103.019600 end 0
1440799103.121066 end 1
1440799103.220746 end 2
1440799103.221124 get 3
1440799103.321402 get 4
1440799103.321664 start 3
1440799103.422589 get 5
1440799103.422824 start 4
1440799103.523286 start 5
1440799104.322934 end 3
1440799104.423878 end 4
1440799104.524350 end 5
However, it would take 3 new items from the iterator as soon as the apply finishes. If the processing takes variable time, this won't work as well.
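A different way to avoid that (not from the answers above, just a common throttling pattern) is to gate apply_async with a semaphore that is released from the result callback, so the generator is only advanced when a worker slot is actually free. A sketch, reusing foo and bar from the question:

from multiprocessing import Pool
import threading
import time

now = time.time

def foo(n):
    for i in range(n):
        print("%f get %d" % (now(), i))
        yield i

def bar(i):
    print("%f start %d" % (now(), i))
    time.sleep(1)
    print("%f end %d" % (now(), i))

if __name__ == "__main__":
    pool_size = 2
    pool = Pool(pool_size)
    # At most pool_size items are "in flight"; a slot is freed when a task finishes.
    slots = threading.BoundedSemaphore(pool_size)

    def release(_):
        slots.release()

    results = []
    for i in foo(6):
        slots.acquire()  # blocks here, so foo is not advanced until a worker is free
        results.append(pool.apply_async(bar, args=(i,),
                                        callback=release,
                                        error_callback=release))
    pool.close()
    pool.join()

Because the blocking happens per item rather than per batch, this also behaves well when bar takes a variable amount of time.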

running multiple processes simultaneously

I am attempting to create a program in python that runs multiple instances (15) of a function simultaneously over different processors. I have been researching this, and have the below program set up using the Process tool from multiprocessing.
Unfortunately, the program executes each instance of the function sequentially (it seems to wait for one to finish before moving onto the next part of the loop).
from __future__ import print_function
from multiprocessing import Process
import sys
import os
import re

for i in range(1, 16):
    exec("path%d = 0" % (i))
    exec("file%d = open('%d-path', 'a', 1)" % (i, i))

def stat(first, last):
    for j in range(1, 40000):
        input_string = "water" + str(j) + ".xyz.geocard"
        if os.path.exists('./%s' % input_string) == True:
            exec("out%d = open('output%d', 'a', 1)" % (first, first))
            exec('print("Processing file %s...", file=out%d)' % (input_string, first))
            with open('./%s' % input_string, 'r') as file:
                for line in file:
                    for i in range(first, last):
                        search_string = " " + str(i) + " path:"
                        for result in re.finditer(r'%s' % search_string, line):
                            exec("path%d += 1" % i)
            for i in range(first, last):
                exec("print(path%d, file=file%d)" % (i, i))

processes = []
for m in range(1, 16):
    n = m + 1
    p = Process(target=stat, args=(m, n))
    p.start()
    processes.append(p)

for p in processes:
    p.join()
I am reasonably new to programming, and have no experience with parallelization - any help would be greatly appreciated.
I have included the entire program above, replacing "Some Function" with the actual function, to demonstrate that this is not a timing issue. The program can take days to cycle through all 40,000 files (each of which is quite large).
I think what is happening is that you are not doing enough in some_function to observe work happening in parallel. It spawns a process, and it completes before the next one gets spawned. If you introduce a random sleep time into some_function, you'll see that they are in fact running in parallel.
from multiprocessing import Process
import random
import time

def some_function(first, last):
    time.sleep(random.randint(1, 3))
    print(first, last)

processes = []
for m in range(1, 16):
    n = m + 1
    p = Process(target=some_function, args=(m, n))
    p.start()
    processes.append(p)

for p in processes:
    p.join()
Output
2 3
3 4
5 6
12 13
13 14
14 15
15 16
1 2
4 5
6 7
9 10
8 9
7 8
11 12
10 11
Are you sure? I just tried it and it worked for me; the results are out of order on every execution, so they're being executed concurrently.
Have a look at your function. It takes "first" and "last", so is its execution time smaller for lower values? If so, you would expect the smaller-numbered arguments to finish sooner, making the output come back in order even though the processes run in parallel.
ps ux | grep python | grep -v grep | wc -l
> 16
If you execute the code repeatedly (i.e. using a bash script) you can see that every process is starting up. If you want to confirm this, import os and have the function print out os.getpid() so you can see they have a different process ID.
So yeah, double check your results because it seems to me like you've written it concurrently just fine!
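For example, a quick sketch of that PID check (adapting the some_function example above; ordering and exact PIDs will vary):

from multiprocessing import Process
import os
import random
import time

def some_function(first, last):
    time.sleep(random.randint(1, 3))
    # Each worker prints its own PID; distinct PIDs confirm separate processes.
    print("pid %d handled %d %d" % (os.getpid(), first, last))

if __name__ == "__main__":
    processes = [Process(target=some_function, args=(m, m + 1)) for m in range(1, 16)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()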
The code below runs 10 processes in parallel, each printing the numbers from 0 to 99.
The if __name__ == "__main__": guard is needed to run processes on Windows:
from multiprocessing import Process

def test():
    for i in range(0, 100):
        print(i)

if __name__ == "__main__":  # Here
    process_list = []
    for _ in range(0, 10):
        process = Process(target=test)
        process_list.append(process)
    for process in process_list:
        process.start()
    for process in process_list:
        process.join()
And the code below is the shorthand list-comprehension version of the above, again running 10 processes in parallel, each printing the numbers from 0 to 99:
from multiprocessing import Process

def test():
    [print(i) for i in range(0, 100)]

if __name__ == "__main__":
    process_list = [Process(target=test) for _ in range(0, 10)]
    [process.start() for process in process_list]
    [process.join() for process in process_list]
This is the result below:
...
99
79
67
71
67
89
81
99
80
68
...
