I have the sketch of my code as follows:
def func1(c):
    return a, b

def func2(c, x):
    if condition:
        a, b = func1(c)
        x.append((a, b))
        func2(a, x)
        func2(b, x)
    return x

x = []
y = func2(c, x)
The problem, as you might have figured out from the code, is that I would like func2(b) to be computed in parallel with func2(a) whenever condition is true, i.e. before b is replaced by a new b from func2(a). But as my algorithm stands, this clearly cannot happen because of the new b's.
I do think such a problem might be perfect for a parallel computing approach. But I have not used it before and my knowledge about it is quite limited. I did try the suggestion from How to do parallel programming in Python, though. But I got the same result as the sketch above.
Caveat: threading might not be parallel enough for you (see the note on the Global Interpreter Lock at https://docs.python.org/2/library/threading.html), so you might have to use the multiprocessing library instead (https://docs.python.org/2/library/multiprocessing.html).
...So I've cheated/been-lazy & used a thread/process neutral term "job". You'll need to pick either threading or multiprocessing for everywhere that I use "job".
def func1(c):
    return a, b

def func2(c, x):
    if condition:
        a, b = func1(c)
        x.append((a, b))

        a_job = None
        if number_active_jobs() >= NUM_CPUS:
            # do a and b sequentially
            func2(a, x)
        else:
            a_job = fork_job(func2, a, x)
        func2(b, x)
        if a_job is not None:
            join(a_job)

x = []
func2(c, x)
# all results are now in x (don't need y)
...that will be best if you need a,b pairs to finish together for some reason.
If you're willing to let the scheduler go nuts, you could "job" them all & then join at the end:
def func1(c):
    return a, b

def func2(c, x):
    if condition:
        a, b = func1(c)
        x.append((a, b))

        if number_active_jobs() >= NUM_CPUS:
            # do a and b sequentially
            func2(a, x)
        else:
            all_jobs.append(fork_job(func2, a, x))
        # TODO: the same job-or-sequential for func2(b, x)

all_jobs = []
x = []
func2(c, x)
for j in all_jobs:
    join(j)
# all results are now in x (don't need y)
The NUM_CPUS check could be done with threading.activeCount() instead of a full-blown worker thread pool (python - how to get the number of active threads started by specific class?).
But with multiprocessing you'd have more work to do with JoinableQueue and a fixed-size Pool of workers.
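For concreteness, here is one hedged way the placeholder helpers used above (fork_job, join, number_active_jobs are names invented for this answer, not library functions) could be realized with the threading module:

from multiprocessing import cpu_count
import threading

NUM_CPUS = cpu_count()

def number_active_jobs():
    # discount the main thread itself
    return threading.active_count() - 1

def fork_job(fn, *args):
    # start fn(*args) in a background thread and return a handle to it
    job = threading.Thread(target=fn, args=args)
    job.start()
    return job

def join(job):
    job.join()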
From your explanation I have a feeling that it is not b that gets updated (which it is not, as DouglasDD explained), but x. To let both recursive calls work on the same x, you need to take some sort of snapshot of x. The simplest way is to pass the index of the newly appended tuple, along the lines of:
def func2(c, x, length):
    ...
    x.append((a, b))
    func2(a, x, length + 1)
    func2(b, x, length + 1)
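A slightly fuller, hypothetical sketch of that idea (condition and func1 are placeholders from the question's sketch): each recursive call receives the length of x at the moment it was spawned and therefore works against a stable prefix of x.

def func2(c, x, length):
    if condition:                # hypothetical condition, as in the question
        a, b = func1(c)
        x.append((a, b))
        snapshot = len(x)        # recursive calls only look at x[:snapshot]
        func2(a, x, snapshot)
        func2(b, x, snapshot)
    return x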
Related
I have a problem where I need to produce something which is naturally computed recursively, but where I also need to be able to interrogate the intermediate steps in the recursion if needed.
I know I can do this by passing and mutating a list or similar structure. However, this looks ugly to me and I'm sure there must be a neater way, e.g. using generators. What I would ideally love to be able to do is something like:
intermediate_results = [f(x) for x in range(T)]
final_result = intermediate_results[T-1]
in an efficient way. While my solution is not performance critical, I can't justify the massive amount of redundant effort in that first line. It looks to me like a generator would be perfect for this except for the fact that f is fundamentally much more suited to recursion in my case (which at least in my mind is the complete opposite of a generator, but maybe I'm just not thinking far enough outside of the box).
Is there a neat Pythonic way of doing something like this that I just don't know about, or do I just need to just capitulate and pollute my function f by passing it an intermediate_results list which I then mutate as a side-effect?
I have a generic solution for you using a decorator. We create a Memoize class which stores the results of previous executions of the function (including recursive calls). If the arguments given have already been seen, the cached result is used to quickly look up the answer.
The custom class has the benefit over lru_cache that you can inspect the stored results.
from functools import wraps

class Memoize:
    def __init__(self):
        self.store = {}

    def save(self, fun):
        @wraps(fun)
        def wrapper(*args):
            if args not in self.store:
                self.store[args] = fun(*args)
            return self.store[args]
        return wrapper

m = Memoize()

@m.save
def fibo(n):
    if n <= 0: return 0
    elif n == 1: return 1
    else: return fibo(n-1) + fibo(n-2)
Then after running different things you can see what the cache contains. When you run future function calls, m.store will be used as a lookup so calculation doesn't need to be redone.
>>> fibo(8)
21
>>> m.store
{(1,): 1,
(0,): 0,
(2,): 1,
(3,): 2,
(4,): 3,
(5,): 5,
(6,): 8,
(7,): 13,
(8,): 21}
You could modify the save function to use the name of the function and the args as the key, so that multiple function results can be stored in the same Memoize class.
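A hedged sketch of that modification (the class name and details are mine, not the answer's): the cache key becomes (function name, args), so several decorated functions can share one store.

from functools import wraps

class MultiMemoize:
    """Like Memoize, but keys the cache on (function name, args)."""
    def __init__(self):
        self.store = {}

    def save(self, fun):
        @wraps(fun)
        def wrapper(*args):
            key = (fun.__name__,) + args
            if key not in self.store:
                self.store[key] = fun(*args)
            return self.store[key]
        return wrapper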
You can use your existing solution that makes many "redundant" calls to f, but employ function caching to save the results of previous calls to f.
In other words, when f(x1) is called, its input arguments and corresponding return value are saved, and the next time it is called with the same argument, the result is simply pulled from the cache.
See functools.lru_cache for the standard-library solution to this, i.e.:
from functools import lru_cache

@lru_cache(maxsize=None)
def f(x):
    ...  # your recursive definition of f goes here

intermediate_results = [f(x) for x in range(T)]
final_result = intermediate_results[T-1]
Note, however, that f must be a pure function (no side effects, a deterministic mapping from arguments to result) for this to work properly.
Having considered your comments, I'll now try to give another perspective on the problem.
So, let's consider a concrete example:
def f(x):
    a = 2
    return g(x) + a if x != 0 else 0

def g(x):
    b = 1
    return h(x) - b

def h(x):
    c = 1/2
    return f(x-1)*(1+c)
I
First of all, it should be mentioned that (in our particular case) the algorithm has the form f(x) = p(f(x - 1)) for some p. It follows that f(x) = p^x(f(0)) = p^x(0). That means we just apply p to 0 x times to get the desired result, which can be done in an iterative process, so this can be written without recursion. Though I believe that your real case is much harder, and it would be too boring and uninformative to stop here.
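To make that observation concrete, here is a minimal iterative sketch for the f/g/h example above (one application of p is v*(1 + 1/2) - 1 + 2):

def f_iterative(x):
    v = 0                        # f(0) = 0
    for _ in range(x):
        v = v*(1 + 1/2) - 1 + 2  # one application of p
    return v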
II
Generally speaking, we can divide all possible solutions into two groups: the ones that require refactoring (i.e. rewriting the functions f, g, h) and the ones that do not. I have little to offer from the latter group (and I don't think anyone does). Consider the following, however:
def fk(x, k):
    a = 2
    return k(gk(x, k) + a if x != 0 else 0)

def gk(x, k):
    b = 1
    return k(hk(x, k) - b)

def hk(x, k):
    c = 1/2
    return k(fk(x-1, k)*(1+c))

def printret(x):
    print(x)
    return x

fk(4, printret)  # see what happens
Inspired by continuation-passing style, but that's totally not it.
What's the point? It's something between your idea of passing a list to write down all the computations and memoizing. This k carries additional behaviour with it, such as printing or writing to a list (you can make a function that writes to some list, why not?). But if you look carefully you'll see that it leaves the inner code of these functions practically untouched (only each function's input and output are affected), so one can produce a decorator associated with a function like printret that does essentially the same thing for f, g, h.
Pros: no need to modify code, much more flexible than passing a list, no additional work (like in memoizing).
Cons: impure (printing or modifying something), not as flexible as we would like.
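A hedged sketch of that decorator idea (the name traced and the wiring are mine, not from the answer): each function body stays as in the original f, g, h; only its return value is routed through the extra behaviour.

def traced(hook):
    """Route every return value of the decorated function through hook."""
    def decorate(fun):
        def wrapper(*args, **kwargs):
            return hook(fun(*args, **kwargs))
        return wrapper
    return decorate

def printret(x):
    print(x)
    return x

@traced(printret)
def f(x):
    a = 2
    return g(x) + a if x != 0 else 0

@traced(printret)
def g(x):
    b = 1
    return h(x) - b

@traced(printret)
def h(x):
    c = 1/2
    return f(x-1)*(1+c)

f(4)  # prints each intermediate value, like fk(4, printret) above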
III
Now let's see how modifying function bodies can help. Don't be afraid of what's written below, take your time and play with that thing a little.
class Logger:
    def __init__(self, lst, cur_val):
        self.lst = lst
        self.cur_val = cur_val

    def bind(self, f):
        res = f(self.cur_val)
        return Logger([self.cur_val] + res.lst + self.lst, res.cur_val)

    def __repr__(self):
        return "Logger( " + repr({'value': self.cur_val, 'lst': self.lst}) + " )"

def unit(x):
    return Logger([], x)

# you can also play with lala
def lala(x):
    if x <= 0:
        return unit(1)
    else:
        return lala(x - 1).bind(lambda y: unit(2*y))

def f(x):
    a = 2
    if x == 0:
        return unit(0)
    else:
        return g(x).bind(lambda y: unit(y + a))

def g(x):
    b = 1
    return h(x).bind(lambda y: unit(y - b))

def h(x):
    c = 1/2
    return f(x-1).bind(lambda y: unit(y*(1+c)))

f(4)  # see for yourself
Logger is called a monad. I'm not very familiar with this concept myself, but I think I'm doing everything right. f, g, h are functions that take a number and return a Logger instance. Logger's bind takes a function (like f) and returns a Logger with a new value (computed by f) and updated 'logs'. The key point, as I see it, is the ability to do whatever we want with the collected values in the order the resulting value was calculated.
Afterword
I'm not at all some kind of 'guru' of functional programming, and I believe I'm missing a lot of things here. But what I've understood is that functional programming is about inverting the flow of the program. That's why, for instance, I totally agree with your opinion about generators being opposed to functional programming. When we use a generator gen in, say, a function func, we yield values one by one to func, and func does something with them in e.g. a loop. The functional approach would be to make gen a function taking func as a parameter and make func perform computations on the 'yielded' values. It's like gen and func exchanged their places, so the flow is inverted! And there are plenty of other ways of inverting the flow. Monads are one of them.
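A tiny illustration of that inversion (my own toy example, not from the question):

# Generator style: the consumer pulls values and does the work in its own loop.
def gen(n):
    for i in range(n):
        yield i*i

def func_pull(n):
    return sum(v for v in gen(n))

# "Inverted" style: the producer is handed func and pushes each value into it.
def gen_push(n, func):
    for i in range(n):
        func(i*i)

totals = []
gen_push(5, totals.append)
assert func_pull(5) == sum(totals) == 30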
itertools.islice takes an iterable, a start value and a stop value, and gives you the elements between start and stop as a generator. If islice is not clear, you can check the docs here: https://docs.python.org/3/library/itertools.html
import itertools

intermediate_result = map(f, range(T))
final_result = next(itertools.islice(intermediate_result, T-1, T))
I have a program that I run with multiprocessing. I would like to have a progress indicator using print.
This is what I came up with:
import multiprocessing as mp
import os

global counter
global size

def f(x):
    global counter
    global size
    print("{} / {}".format(counter, size))
    counter += 1
    return x**2

size = 4
counter = 1
result = list()
for x in [1,2,3,4]:
    result.append(f(x))
This one works. However, if you replace the bottom part with:
with mp.Pool(processes = 2) as p:
    p.starmap(f, [1,2,3,4])
It doesn't. I don't understand why; can anyone help me get it up and running? Thanks :)
N.B: This is of course a dummy example.
EDIT:
OK, a new issue appears with your solution. I'll make an example:
fix1 = 1
fix2 = 2
dynamic = [1,2,3,4,5]

def f(x, y, z):
    return x**2 + y + z

size = len(dynamic)
counter = 1
with mp.Pool(processes = 2) as p:
    for output in p.starmap(f, [(x, fix1, fix2) for x in dynamic]):
        print("{} / {}".format(counter, size))
        counter += 1
This one works, but it does all the printing at the end.
with mp.Pool(processes = 2) as p:
    for output in p.imap_unordered(f, [(x, fix1, fix2) for x in dynamic]):
        print("{} / {}".format(counter, size))
        counter += 1
This one doesn't work and says that f() is missing 2 required positional arguments (fix1 and fix2).
Any idea why I get this behavior?
N.B: I'm running on windows.
On a forking system like Linux, subprocesses share a copy-on-write view of the parent memory space. If one side updates memory, it gets its own private copy of the changed pages. On other systems, a new process is created and a new Python interpreter is executed. In either case, neither side sees the changes the others make. And that means that everyone is updating their own private copy of counter and doesn't see the additions made by the others.
To complicate things further, stdout is not synchronized. If workers print, you're likely to get garbled messages.
An alternative is to count the results as they come back to the parent pool. The parent tracks the count and the parent is the only one printing. If you don't care about the order of the returned data, then imap_unordered will work well for you.
import multiprocessing as mp

def f(x):
    return x**2

if __name__ == "__main__":   # guard needed on Windows, where workers are spawned
    data = [1,2,3,4]
    result = []
    with mp.Pool(processes = 2) as p:
        for val in p.imap_unordered(f, data):
            result.append(val)
            print("progress", len(result)/len(data))
I would like to integrate a system of differential equations for several parameter combinations using Python's multiprocessing module. So, the system should get integrated and the parameter combination should be stored as well as its index and the final value of one of the variables.
While that works fine when I use apply_async - which is already faster than doing it in a simple for-loop - I fail to implement the same thing using map_async which seems to be faster than apply_async. The callback function is never called and I have no clue why. Could anyone explain why this happens and how to get the same output using map_async instead of apply_async?!
Here is my code:
from pylab import *
import multiprocessing as mp
from scipy.integrate import odeint
import time

#my system of differential equations
def myODE(yn, tvec, allpara):
    (x, y, z) = yn
    a, b = allpara['para']
    dx = -x + a*y + x*x*y
    dy = b - a*y - x*x*y
    dz = x*y
    return (dx, dy, dz)

#returns the index of the parameter combination, the parameters and the integrated solution
#this way I know which parameter combination belongs to which outcome in the asynch-case
def runMyODE(yn, tvec, allpara):
    return allpara['index'], allpara['para'], transpose(odeint(myODE, yn, tvec, args=(allpara,)))

#for reproducibility
seed(0)

#time settings for integration
dt = 0.01
tmax = 50
tval = arange(0, tmax, dt)

numVar = 3   #number of variables (x, y, z)
numPar = 2   #number of parameters (a, b)
numComb = 5  #number of parameter combinations

INIT = zeros((numComb, numVar))  #initial conditions will be stored here
PARA = zeros((numComb, numPar))  #parameter combinations for a and b will be stored here

#create some initial conditions and random parameters
for combi in range(numComb):
    INIT[combi,:] = append(10*rand(2), 0)  #initial conditions for x and y are randomly chosen, z is 0
    PARA[combi,:] = 10*rand(2)             #parameters a and b are chosen randomly

#################################using loop over apply####################

#results will be stored in here
asyncResultsApply = []

#my callback function
def saveResultApply(result):
    # storing the index, a, b and the final value of z
    asyncResultsApply.append((result[0], result[1], result[2][2,-1]))

#start the multiprocessing part
pool = mp.Pool(processes=4)
for combi in range(numComb):
    pool.apply_async(runMyODE, args=(INIT[combi,:], tval, {'para': PARA[combi,:], 'index': combi}), callback=saveResultApply)
pool.close()
pool.join()

for res in asyncResultsApply:
    print res[0], res[1], res[2]  #printing the index, a, b and the final value of z

#######################################using map#####################
#the only difference is that the for loop is replaced by a "map_async" call
print "\n\nnow using map\n\n"

asyncResultsMap = []

#my callback function which is never called
def saveResultMap(result):
    # storing the index, a, b and the final value of z
    asyncResultsMap.append((result[0], result[1], result[2][2,-1]))

pool = mp.Pool(processes=4)
pool.map_async(lambda combi: runMyODE(INIT[combi,:], tval, {'para': PARA[combi,:], 'index': combi}), range(numComb), callback=saveResultMap)
pool.close()
pool.join()

#this does not work yet
for res in asyncResultsMap:
    print res[0], res[1], res[2]  #printing the index, a, b and the final value of z
If I understood you correctly, this stems from something that confuses people quite often. apply_async's callback is called after its single operation, and so is map_async's: it does not call the callback on each element, but rather once on the entire result list.
You are correct in noting that map is faster than apply_async. If you want something to happen after each result, there are a few ways to go:
You can effectively add the callback to the operation you want to be performed on each element, and map using that.
You could use imap (or imap_unordered) in a loop, and do the callback within the loop body. Of course, this means that all the callbacks will be performed in the parent process, but the nature of stuff written as callbacks means that's usually not a problem (they tend to be cheap functions). YMMV.
For example, suppose you have the functions f and cb, and you'd like to map f on es with cb for each op. Then you could either do:
import multiprocessing

def look_ma_no_cb(e):
    r = f(e)
    cb(r)
    return r

p = multiprocessing.Pool()
p.map(look_ma_no_cb, es)
or
for r in p.imap(f, es):
    cb(r)
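Applied to the ODE question above, a hedged sketch (reusing runMyODE, INIT, PARA, tval and numComb from that code): replace the lambda, which cannot be pickled, with a module-level helper and do the "callback" work in the loop body:

def run_combi(combi):
    # module-level helper, so it can be pickled and sent to the workers
    return runMyODE(INIT[combi,:], tval, {'para': PARA[combi,:], 'index': combi})

asyncResultsMap = []
pool = mp.Pool(processes=4)
for index, para, sol in pool.imap_unordered(run_combi, range(numComb)):
    asyncResultsMap.append((index, para, sol[2,-1]))  # index, (a, b), final value of z
pool.close()
pool.join()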
I would like to perform a calculation using Python where the current value of the equation (i) is based on the previous value of the equation (i-1), which is really easy to do in a spreadsheet, but I would rather learn to code it.
I have noticed that there is loads of information on finding the previous value from a list, but I don't have a list, I need to create it! My equation is shown below.
h = (2*b) - h[i-1]
Can anyone tell me a method to do this?
I tried this sort of thing, but it will not work: when I try to do the equation I'm calling a value I haven't created yet, and if I set h = 0 then I get an error that I am out of index range.
i = 1
for i in range(1, len(b)):
    h = []
    h = (2*b) - h[i-1]
    x += 1
h = [b[0]]
for val in b[1:]:
    h.append(2 * val - h[-1])  # As you add to h, you keep up with its tail
For a large b list (brr, one-letter identifiers), use islice to avoid creating a large slice:
from itertools import islice  # for a big list it will keep the code less wasteful

for val in islice(b, 1, None):
    ....
As pointed out by @pad, you simply need to handle the base case of receiving the first sample.
However, your equation makes no use of i other than to retrieve the previous result. It's looking more like a running filter than something which needs to maintain a list of past values (with an array which might never stop growing).
If that is the case, and you only ever want the most recent value, then you might want to go with a generator instead.
def gen():
    def eqn(b):
        eqn.h = 2*b - eqn.h
        return eqn.h
    eqn.h = 0
    return eqn
And then use it thus:
>>> f = gen()
>>> f(2)
4
>>> f(3)
2
>>> f(2)
2
>>>
The same effect could be achieved with a true generator using yield and send.
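A hedged sketch of that yield/send version (my own illustration), which tracks the same sequence as the example above:

def running_filter(h0=0):
    h = h0
    b = yield            # prime with next(); each send(b) then returns the new h
    while True:
        h = 2*b - h
        b = yield h

>>> g = running_filter()
>>> next(g)              # prime the generator
>>> g.send(2)
4
>>> g.send(3)
2
>>> g.send(2)
2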
First off, do you need all the intermediate values? That is, do you want a list h from 0 to i? Or do you just want h[i]?
If you just need the i-th value, you could use recursion:
def get_h(i):
    if i > 0:
        return (2*b) - get_h(i-1)
    else:
        return h_0
But be aware that this will not work for large i, as it will exceed the maximum recursion depth. (Thanks for pointing this out kdopen) In that case a simple for-loop or a generator is better.
Even better is to use a (mathematically) closed form of the equation (for your example that is possible, it might not be in other cases):
def get_h(i):
    if i % 2 == 0:
        return h_0
    else:
        return (2*b) - h_0
In both cases h_0 is the initial value that you start out with.
h = []
for i in range(len(b)):
    if i > 0:
        h.append(2*b - h[i-1])
    else:
        h.append(h_0)  # handle the i=0 case here; h_0 is the initial value
You are successively applying a function (equation) to the result of a previous application of that function; the process needs a seed to start it. Your result looks like this: [seed, f(seed), f(f(seed)), f(f(f(seed))), ...]. This concept is function composition. You can create a generalized function that will do this for any sequence of functions; in Python, functions are first-class objects and can be passed around just like any other object. If you need to preserve the intermediate results, use a generator.
def composition(functions, x):
    """ yields f(x), f(f(x)), f(f(f(x))) ....
        for each f in functions
        functions is an iterable of callables taking one argument
    """
    for f in functions:
        x = f(x)
        yield x
Your specs require a seed and a constant,
seed = 0
b = 10
The equation/function,
def f(x, b=b):
    return 2*b - x
f is applied b times.
functions = [f]*b
Usage
print list(composition(functions, seed))
If the intermediate results are not needed composition can be redefined as
def composition(functions, x):
    """ Returns f(x), g(f(x)), h(g(f(x))) ....
        for each function in functions
        functions is an iterable of callables taking one argument
    """
    for f in functions:
        x = f(x)
    return x
print composition(functions, seed)
Or more generally, with no limitations on call signature:
def compose(funcs):
    '''Return a callable composed of successive application of functions

       funcs is an iterable producing callables
       for [f, g, h] returns f(g(h(*args, **kwargs)))
    '''
    def outer(f, g):
        def inner(*args, **kwargs):
            return f(g(*args, **kwargs))
        return inner
    return reduce(outer, funcs)

def plus2(x):
    return x + 2

def times2(x):
    return x * 2

def mod16(x):
    return x % 16

funcs = (mod16, plus2, times2)
eq = compose(funcs)  # mod16(plus2(times2(x)))
print eq(15)
While the process definition appears to be recursive, I resisted the temptation so I could stay out of maximum recursion depth hades.
I got curious, searched SO for function composition and, of course, there are numerous relevant Q&As.
How do I parallelize a recursive function in Python?
My function looks like this:
def f(x, depth):
    if x == 0:
        return ...
    else:
        return [x] + map(lambda x: f(x, depth-1), list_of_values(x))

def list_of_values(x):
    # Heavy compute, pure function
When trying to parallelize it with multiprocessing.Pool.map, Windows opens an infinite number of processes and hangs.
What's a good (preferably simple) way to parallelize it (for a single multicore machine)?
Here is the code that hangs:
from multiprocessing import Pool

pool = Pool(processes=4)

def f(x, depth):
    if x == 0:
        return ...
    else:
        return [x] + pool.map(lambda x: f(x, depth-1), list_of_values(x))

def list_of_values(x):
    # Heavy compute, pure function
OK, sorry for the problems with this.
I'm going to answer a slightly different question where f() returns the sum of the values in the list. That is because it's not clear to me from your example what the return type of f() would be, and using an integer makes the code simple to understand.
This is complex because there are two different things happening in parallel:
the calculation of the expensive function in the pool
the recursive expansion of f()
I am very careful to only use the pool to calculate the expensive function. In that way we don't get an "explosion" of processes, but because this is asynchronous we need to postpone a lot of work for the callback that the worker calls once the expensive function is done.
More than that, we need to use a countdown latch so that we know when all the separate sub-calls to f() are complete.
There may be a simpler way (I am pretty sure there is, but I need to do other things), but perhaps this gives you an idea of what is possible:
from multiprocessing import Pool, Value, RawArray, RLock
from time import sleep

class Latch:

    '''A countdown latch that lets us wait for a job of "n" parts'''

    def __init__(self, n):
        self.__counter = Value('i', n)
        self.__lock = RLock()

    def decrement(self):
        with self.__lock:
            self.__counter.value -= 1
            print('dec', self.read())
            return self.read() == 0

    def read(self):
        with self.__lock:
            return self.__counter.value

    def join(self):
        while self.read():
            sleep(1)


def list_of_values(x):
    '''An expensive function'''
    print(x, ': thinking...')
    sleep(1)
    print(x, ': thought')
    return list(range(x))


pool = Pool()


def async_f(x, on_complete=None):
    '''Return the sum of the values in the expensive list'''
    if x == 0:
        on_complete(0)  # no list, return 0
    else:
        n = x  # need to know size of result beforehand
        latch = Latch(n)  # wait for n entries to be calculated
        result = RawArray('i', n+1)  # where we will assemble the map

        def delayed_map(values):
            '''This is the callback for the pool async process - it runs
               in a separate thread within this process once the
               expensive list has been calculated and orchestrates the
               mapping of f over the result.'''
            result[0] = x  # first value in list is x
            for (i, v) in enumerate(values):
                def callback(fx, i=i):
                    '''This is the callback passed to f() and is called when
                       the function completes. If it is the last of all the
                       calls in the map then it calls on_complete() (ie another
                       instance of this function) for the calling f().'''
                    result[i+1] = fx
                    if latch.decrement():  # have completed list
                        # at this point result contains [x]+map(f, ...)
                        on_complete(sum(result))  # so return sum
                async_f(v, callback)

        # Ask worker to generate list then call delayed_map
        pool.apply_async(list_of_values, [x], callback=delayed_map)


def run():
    '''Tie into the same mechanism as above, for the final value.'''
    result = Value('i')
    latch = Latch(1)

    def final_callback(value):
        result.value = value
        latch.decrement()

    async_f(6, final_callback)
    latch.join()  # wait for everything to complete
    return result.value


print(run())
PS: I am using Python 3.2 and the ugliness above is because we are delaying computation of the final results (going back up the tree) until later. It's possible something like generators or futures could simplify things.
Also, I suspect you need a cache to avoid needlessly recalculating the expensive function when called with the same argument as earlier.
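A hedged sketch of that caching idea (note that with multiprocessing each worker process keeps its own cache, so this only avoids repeats within a single process):

from functools import lru_cache
from time import sleep

@lru_cache(maxsize=None)
def list_of_values(x):
    '''The expensive pure function from above, now cached per process.'''
    sleep(1)
    return list(range(x))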
See also yaniv's answer - which seems to be an alternative way to reverse the order of the evaluation by being explicit about depth.
After thinking about this, I found a simple, not complete, but good enough answer:
# A partially parallel solution. Just do the first level of recursion in parallel. It might be enough work to fill all cores.
import multiprocessing

def f_helper(data):
    return f(x=data['x'], depth=data['depth'], recursion_depth=data['recursion_depth'])

def f(x, depth, recursion_depth):
    if depth == 0:
        return ...
    else:
        if recursion_depth == 0:
            pool = multiprocessing.Pool(processes=4)
            result = [x] + pool.map(f_helper, [{'x': _x, 'depth': depth-1, 'recursion_depth': recursion_depth+1} for _x in list_of_values(x)])
            pool.close()
        else:
            result = [x] + list(map(f_helper, [{'x': _x, 'depth': depth-1, 'recursion_depth': recursion_depth+1} for _x in list_of_values(x)]))
        return result

def list_of_values(x):
    # Heavy compute, pure function
I store the main process ID initially and pass it to the subprograms.
When I need to start a multiprocessing job, I check the number of children of the main process. If it is less than or equal to half of my CPU count, then I run it in parallel. If it is greater than half of my CPU count, then I run it serially. This way it avoids bottlenecks and uses CPU cores effectively. You can tune the number of cores for your case; for example, you can set it to the exact number of CPU cores, but you should not exceed it.
import multiprocessing
import psutil

def subProgramhWrapper(func, args):
    func(*args)

# main_process_id, subProgram, input_params and MyPool (a custom Pool class)
# are assumed to be defined elsewhere in the program.
parent = psutil.Process(main_process_id)
children = parent.children(recursive=True)
num_cores = int(multiprocessing.cpu_count()/2)

if num_cores >= len(children):
    # parallel run
    pool = MyPool(num_cores)
    results = pool.starmap(subProgram, input_params)
    pool.close()
    pool.join()
else:
    # serial run
    for input_param in input_params:
        subProgramhWrapper(subProgram, input_param)