I'm trying to implement a recursive Fibonacci series which returns the value at an index. It's a homework assignment and needs to be done using multi-threading. This is what I've done so far. My question is: how do I add the results from live_thread1 and live_thread2? The threads have to be created at every level in the recursion.
def Recursive(n):
    if n < 2:
        return n
    else:
        return Recursive(n - 1) + Recursive(n - 2)

def FibonacciThreads(n):
    if n < 2:
        return n
    else:
        thread1 = threading.Thread(target=FibonacciThreads, args=(n - 1,))
        thread2 = threading.Thread(target=FibonacciThreads, args=(n - 2,))
        thread1.start()
        thread2.start()
        thread1.join()
        thread2.join()
        return live_thread1 + live_thread2
This is not possible, because you cannot retrieve the return value of a function executed in another thread.
To implement the behavior you want, you have to make FibonacciThreads a callable object that stores the result as a member variable:
class FibonacciThreads(object):
    def __init__(self):
        self.result = None

    def __call__(self, n):
        # implement the logic here as above, but
        # instead of a return, store the result in self.result
You can use instances of this class like functions:
fib = FibonacciThreads() # create instance
fib(23) # calculate the number
print fib.result # retrieve the result
Note that, as I said in my comment, this is not a very smart use of threads. If this really is your assignment, it is a bad one.
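For illustration, here is a minimal sketch of how the completed callable might look, with one new FibonacciThreads instance and one thread per recursive call (the names left, right, t1, and t2 are illustrative):

import threading

class FibonacciThreads(object):
    def __init__(self):
        self.result = None

    def __call__(self, n):
        if n < 2:
            self.result = n
            return
        # one callable instance and one thread per recursive call
        left, right = FibonacciThreads(), FibonacciThreads()
        t1 = threading.Thread(target=left, args=(n - 1,))
        t2 = threading.Thread(target=right, args=(n - 2,))
        t1.start()
        t2.start()
        t1.join()
        t2.join()
        self.result = left.result + right.result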
You can pass a mutable object to the thread to use for storing the result. If you don't want to introduce a new data type, you can for example just use a single element list:
def fib(n, r):
    if n < 2:
        r[0] = n
    else:
        r1 = [None]
        r2 = [None]
        # Start fib() threads that use r1 and r2 for results.
        ...
        # Sum the results of the threads.
        r[0] = r1[0] + r2[0]

def FibonacciThreads(n):
    r = [None]
    fib(n, r)
    return r[0]
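Filled in, a complete threaded version along these lines might look like this (a sketch; the thread handling mirrors the question's code):

import threading

def fib(n, r):
    if n < 2:
        r[0] = n
    else:
        r1 = [None]
        r2 = [None]
        # Start fib() threads that use r1 and r2 for results.
        t1 = threading.Thread(target=fib, args=(n - 1, r1))
        t2 = threading.Thread(target=fib, args=(n - 2, r2))
        t1.start()
        t2.start()
        t1.join()
        t2.join()
        # Sum the results of the threads.
        r[0] = r1[0] + r2[0]

def FibonacciThreads(n):
    r = [None]
    fib(n, r)
    return r[0]

print(FibonacciThreads(10))  # 55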
Is it possible to create an iterator/generator which will decide on the next value based on some result of the previous iteration?
i.e.
y = None
for x in some_iterator(ll, y):
    y = some_calculation_on(x)
I would like the logic for choosing the next x to depend on the calculation result, allowing different logic for different results, much like in a search problem.
I also want to keep the choice of the next x and the calculation on x as separate as possible.
Did you know that you can send a value to a generator using generator.send? So yes, you can have a generator change its behaviour based on feedback from the outside world. From the docs:
generator.send(value)
Resumes the execution and “sends” a value into the generator function.
The value argument becomes the result of the current yield expression.
The send() method returns the next value yielded by the generator
[...]
Example
Here is a counter that will increment only if told to do so.
def conditionalCounter(start=0):
    while True:
        should_increment = yield start
        if should_increment:
            start += 1
Usage
Since iteration with a for-loop does not allow you to use generator.send, you have to use a while-loop.
import random

def some_calculation_on(value):
    return random.choice([True, False])

g = conditionalCounter()
last_value = next(g)
while last_value < 5:
    last_value = g.send(some_calculation_on(last_value))
    print(last_value)
Output
0
0
1
2
3
3
4
4
5
Make it work in a for-loop
You can make the above work in a for-loop by crafting a YieldReceive class.
class YieldReceive:
    stop_iteration = object()

    def __init__(self, gen):
        self.gen = gen
        self.next = next(gen, self.stop_iteration)

    def __iter__(self):
        return self

    def __next__(self):
        if self.next is self.stop_iteration:
            raise StopIteration
        else:
            return self.next

    def send(self, value):
        try:
            self.next = self.gen.send(value)
        except StopIteration:
            self.next = self.stop_iteration
Usage
it = YieldReceive(...)
for x in it:
    # Do stuff
    it.send(some_result)
It's possible but confusing. If you want to keep the sequence of x values and the calculations on x separate, you should do this explicitly by not involving x with an iterator.
def next_value(x):
    """Custom iterator"""
    # Bunch of code defining a new x
    yield new_x

x = None
while True:
    x = next_value(x)
    x = some_calculation_on(x)
    # Break when you're done
    if finished and done:
        break
If you want the loop to execute exactly i times, then use a for loop:
for step in range(i):
    x = next_value(x)
    x = some_calculation_on(x)
    # No break
def conditional_iterator(y):
    # stuff to create new values
    yield x if (expression involving y) else another_x

for x in conditional_iterator(y):
    y = some_computation(x)
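To make that concrete, here is a hedged sketch. The names conditional_iterator and some_computation follow the answer, but the feedback is kept in a one-element list, because simply rebinding y inside the loop would not be visible to the generator:

import random

def conditional_iterator(feedback):
    # feedback is a one-element list that the loop updates in place,
    # so the generator sees the latest result on each iteration.
    while True:
        yield 1 if feedback[0] else -1

def some_computation(x):
    return random.choice([True, False])

feedback = [False]
for step, x in zip(range(5), conditional_iterator(feedback)):
    feedback[0] = some_computation(x)
    print(x, feedback[0])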
I know that you can use .send(value) to send values to a generator. I also know that you can iterate over a generator in a for loop. Is it possible to pass values to a generator while iterating over it in a for loop?
What I'm trying to do is
def example():
    previous = yield
    for i in range(0, 10):
        previous = yield previous * i

t = example()
for value in t:  # ...pass in a value?...
    "...do something with the result..."
You technically could, but the results would be confusing, e.g.:
def example():
    previous = (yield)
    for i in range(1, 10):
        received = (yield previous)
        if received is not None:
            previous = received * i

t = example()
for i, value in enumerate(t):
    t.send(i)
    print value
Outputs:
None
0
2
8
18
Dave Beazley wrote an amazing article on coroutines (tl;dr: don't mix generators and coroutines in the same function).
OK, so I figured it out. The trick is to create an additional generator expression that wraps t.send(value): (t.send(value) for value in [...]).
def example():
    previous = yield
    for i in range(0, 10):
        previous = yield previous * i

t = example()
t.send(None)
for i in (t.send(i) for i in ["list of objects to pass in"]):
    print i
I'm slowly wrapping my head around Python generators.
While it's not a real-life problem for now, I'm still wondering why I can't return a generator from a function.
When I define a function with yield, it acts as a generator. But if I define it inside another function and try to return that instead, I get an ordinary function, i.e. not a generator with a next method.
In other words, why the give_gen() approach in code below does not work?
#!/usr/bin/python
import time

def gen(d):
    n = 0
    while True:
        n = n + d
        time.sleep(0.5)
        yield n

def give_gen(d):
    def fn():
        n = 0
        while True:
            n = n + d
            time.sleep(0.5)
            yield n
    return fn

if __name__ == '__main__':
    g = give_gen(3)  # does not work
    g = gen(3)       # works well
    while True:
        print g.next()
        # AttributeError: 'function' object has no attribute 'next'
        # in case of give_gen
Why can't I return a generator from a function?
A generator function returns a generator only when called. Call fn to create the generator object:
return fn()
or call the returned object:
g = give_gen(3)()
You did call gen(); had you referred to just gen without calling it you'd have a reference to that function.
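A minimal sketch of the fixed give_gen, assuming the same 0.5 second delay as in the question:

import time

def give_gen(d):
    def fn():
        n = 0
        while True:
            n = n + d
            time.sleep(0.5)
            yield n
    return fn()  # call fn so a generator object is returned

g = give_gen(3)
print(next(g))  # 3
print(next(g))  # 6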
How do I parallelize a recursive function in Python?
My function looks like this:
def f(x, depth):
    if x == 0:
        return ...
    else:
        return [x] + map(lambda x: f(x, depth - 1), list_of_values(x))

def list_of_values(x):
    # Heavy compute, pure function
When trying to parallelize it with multiprocessing.Pool.map, Windows opens an infinite number of processes and hangs.
What's a good (preferably simple) way to parallelize it (for a single multicore machine)?
Here is the code that hangs:
from multiprocessing import Pool
pool = Pool(processes=4)

def f(x, depth):
    if x == 0:
        return ...
    else:
        return [x] + pool.map(lambda x: f(x, depth - 1), list_of_values(x))

def list_of_values(x):
    # Heavy compute, pure function
OK, sorry for the problems with this.
I'm going to answer a slightly different question where f() returns the sum of the values in the list. That is because it's not clear to me from your example what the return type of f() would be, and using an integer makes the code simple to understand.
This is complex because there are two different things happening in parallel:
- the calculation of the expensive function in the pool
- the recursive expansion of f()
I am very careful to only use the pool to calculate the expensive function. That way we don't get an "explosion" of processes, but because this is asynchronous we need to postpone a lot of work to the callback that the worker calls once the expensive function is done.
More than that, we need to use a countdown latch so that we know when all the separate sub-calls to f() are complete.
There may be a simpler way (I am pretty sure there is, but I need to do other things), but perhaps this gives you an idea of what is possible:
from multiprocessing import Pool, Value, RawArray, RLock
from time import sleep

class Latch:

    '''A countdown latch that lets us wait for a job of "n" parts'''

    def __init__(self, n):
        self.__counter = Value('i', n)
        self.__lock = RLock()

    def decrement(self):
        with self.__lock:
            self.__counter.value -= 1
            print('dec', self.read())
            return self.read() == 0

    def read(self):
        with self.__lock:
            return self.__counter.value

    def join(self):
        while self.read():
            sleep(1)


def list_of_values(x):
    '''An expensive function'''
    print(x, ': thinking...')
    sleep(1)
    print(x, ': thought')
    return list(range(x))


pool = Pool()


def async_f(x, on_complete=None):
    '''Return the sum of the values in the expensive list'''
    if x == 0:
        on_complete(0)  # no list, return 0
    else:
        n = x                          # need to know size of result beforehand
        latch = Latch(n)               # wait for n entries to be calculated
        result = RawArray('i', n + 1)  # where we will assemble the map

        def delayed_map(values):
            '''This is the callback for the pool async process - it runs
            in a separate thread within this process once the
            expensive list has been calculated and orchestrates the
            mapping of f over the result.'''
            result[0] = x  # first value in list is x
            for (i, v) in enumerate(values):
                def callback(fx, i=i):
                    '''This is the callback passed to f() and is called when
                    the function completes. If it is the last of all the
                    calls in the map then it calls on_complete() (ie another
                    instance of this function) for the calling f().'''
                    result[i + 1] = fx
                    if latch.decrement():  # have completed list
                        # at this point result contains [x]+map(f, ...)
                        on_complete(sum(result))  # so return sum
                async_f(v, callback)

        # Ask worker to generate list then call delayed_map
        pool.apply_async(list_of_values, [x], callback=delayed_map)


def run():
    '''Tie into the same mechanism as above, for the final value.'''
    result = Value('i')
    latch = Latch(1)

    def final_callback(value):
        result.value = value
        latch.decrement()

    async_f(6, final_callback)
    latch.join()  # wait for everything to complete
    return result.value


print(run())
PS: I am using Python 3.2 and the ugliness above is because we are delaying computation of the final results (going back up the tree) until later. It's possible something like generators or futures could simplify things.
Also, I suspect you need a cache to avoid needlessly recalculating the expensive function when called with the same argument as earlier.
See also yaniv's answer - which seems to be an alternative way to reverse the order of the evaluation by being explicit about depth.
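As a minimal sketch of the caching idea mentioned above, functools.lru_cache (available from Python 3.2) can memoize the expensive function; note that each pool worker process keeps its own cache:

from functools import lru_cache

@lru_cache(maxsize=None)
def list_of_values(x):
    # Expensive, pure function: safe to memoize because the result
    # depends only on x. Returning a tuple avoids handing out a shared
    # mutable list from the cache.
    return tuple(range(x))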
After thinking about this, I found a simple, not complete, but good enough answer:
# A partially parallel solution. Just do the first level of recursion in
# parallel. It might be enough work to fill all cores.
import multiprocessing

def f_helper(data):
    return f(x=data['x'], depth=data['depth'], recursion_depth=data['recursion_depth'])

def f(x, depth, recursion_depth):
    if depth == 0:
        return ...
    else:
        if recursion_depth == 0:
            pool = multiprocessing.Pool(processes=4)
            result = [x] + pool.map(f_helper,
                                    [{'x': _x, 'depth': depth - 1, 'recursion_depth': recursion_depth + 1}
                                     for _x in list_of_values(x)])
            pool.close()
        else:
            result = [x] + map(f_helper,
                               [{'x': _x, 'depth': depth - 1, 'recursion_depth': recursion_depth + 1}
                                for _x in list_of_values(x)])
        return result

def list_of_values(x):
    # Heavy compute, pure function
I store the main process ID initially and pass it to the subprograms.
When I need to start a multiprocessing job, I check the number of children of the main process. If it is less than or equal to half of my CPU count, I run the job in parallel; if it is greater than half of my CPU count, I run it serially. This avoids bottlenecks and uses CPU cores effectively. You can tune the core count for your case; for example, you can set it to the exact number of CPU cores, but you should not exceed it.
import multiprocessing
import psutil

def subProgramhWrapper(func, args):
    func(*args)

# main_process_id, subProgram, input_params and MyPool (a custom Pool
# subclass) are defined elsewhere in the program.
parent = psutil.Process(main_process_id)
children = parent.children(recursive=True)

num_cores = int(multiprocessing.cpu_count() / 2)

if num_cores >= len(children):
    # parallel run
    pool = MyPool(num_cores)
    results = pool.starmap(subProgram, input_params)
    pool.close()
    pool.join()
else:
    # serial run
    for input_param in input_params:
        subProgramhWrapper(subProgram, input_param)
A statistical accumulator allows one to perform incremental calculations. For instance, to compute the arithmetic mean of a stream of numbers given at arbitrary times, one could make an object which keeps track of the current number of items given, n, and their sum, sum. When one requests the mean, the object simply returns sum/n.
An accumulator like this allows you to compute incrementally in the sense that, when given a new number, you don't need to recompute the entire sum and count.
Similar accumulators can be written for other statistics (cf. the Boost library for a C++ implementation).
How would you implement accumulators in Python? The code I came up with is:
class Accumulator(object):
    """
    Used to accumulate the arithmetic mean of a stream of
    numbers. This implementation does not allow one to remove items
    already accumulated, but it could easily be modified to do
    so. Also, other statistics could be accumulated.
    """
    def __init__(self):
        # upon initialization, the number of items currently
        # accumulated (_n) and the total sum of the items accumulated
        # (_sum) are set to zero because nothing has been accumulated
        # yet.
        self._n = 0
        self._sum = 0.0

    def add(self, item):
        # 'add' is used to add an item to this accumulator
        try:
            # try to convert the item to a float. If you are
            # successful, add the float to the current sum and
            # increase the number of accumulated items
            self._sum += float(item)
            self._n += 1
        except ValueError:
            # if you fail to convert the item to a float, simply
            # ignore the exception (pass on it and do nothing)
            pass

    @property
    def mean(self):
        # the property 'mean' returns the current mean accumulated in
        # the object
        if self._n > 0:
            # if you have more than zero items accumulated, then return
            # their arithmetic average
            return self._sum / self._n
        else:
            # if you have no items accumulated, return None (you could
            # also raise an exception)
            return None

# using the object:

# Create an instance of the object "Accumulator"
my_accumulator = Accumulator()
print my_accumulator.mean
# prints None because there are no items accumulated

# add one (a number)
my_accumulator.add(1)
print my_accumulator.mean
# prints 1.0

# add two (a string - it will be converted to a float)
my_accumulator.add('2')
print my_accumulator.mean
# prints 1.5

# add a 'NA' (will be ignored because it cannot be converted to float)
my_accumulator.add('NA')
print my_accumulator.mean
# prints 1.5 (notice that it ignored the 'NA')
Interesting design questions arise:
- How to make the accumulator thread-safe?
- How to safely remove items? (a minimal sketch follows below)
- How to architect it in a way that allows other statistics to be plugged in easily (a factory for statistics)?
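For the second point, here is a minimal sketch of how removal could work for the mean, assuming the caller only removes values it previously added (the class name MeanAccumulator is illustrative):

class MeanAccumulator(object):
    """Running mean that also supports removing previously added items."""

    def __init__(self):
        self._n = 0
        self._sum = 0.0

    def add(self, item):
        self._sum += float(item)
        self._n += 1

    def remove(self, item):
        # Reverses the bookkeeping; only call with values that were added.
        self._sum -= float(item)
        self._n -= 1

    @property
    def mean(self):
        return self._sum / self._n if self._n else None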
For a generalized, threadsafe higher-level function, you could use something like the following in combination with the Queue.Queue class and some other bits:
from Queue import Empty

def Accumulator(f, q, storage):
    """Yields successive values of `f` over the accumulation of `q`.

    `f` should take a single iterable as its parameter.

    `q` is a Queue.Queue or derivative.

    `storage` is a persistent sequence that provides an `append` method.
    `collections.deque` may be particularly useful, but a `list` is
    quite acceptable.

    >>> from Queue import Queue
    >>> from collections import deque
    >>> from threading import Thread
    >>> def mean(it):
    ...     vals = tuple(it)
    ...     return sum(vals) / len(vals)
    >>> value_queue = Queue()
    >>> LastThreeAverage = Accumulator(mean, value_queue, deque((), 3))
    >>> def add_to_queue(it, queue):
    ...     for value in it:
    ...         queue.put(value)
    >>> putting_thread = Thread(target=add_to_queue,
    ...                         args=(range(0, 12, 2), value_queue))
    >>> putting_thread.start()
    >>> list(LastThreeAverage)
    [0, 1, 2, 4, 6, 8]
    """
    try:
        while True:
            storage.append(q.get(timeout=0.1))
            q.task_done()
            yield f(storage)
    except Empty:
        pass
This generator function evades most of its purported responsibility by delegating it to other entities:
- It relies on Queue.Queue to supply its source elements in a thread-safe manner.
- A collections.deque object can be passed in as the value of the storage parameter; this provides, among other things, a convenient way to use only the last n (in this case 3) values.
- The function itself (in this case mean) is passed as a parameter. This will result in less-than-optimally efficient code in some cases, but it is readily applicable to all sorts of situations.
Note that there is a possibility of the accumulator timing out if your producer thread takes longer than 0.1 seconds per value. This is easily remedied by passing a longer timeout or by removing the timeout parameter entirely. In the latter case the function will block indefinitely at the end of the queue; this usage makes more sense in a case where it's being used in a sub thread (usually a daemon thread). Of course you can also parametrize the arguments that are passed to q.get as a fourth argument to Accumulator.
If you want to communicate end of queue, i.e. that there are no more values to come, from the producer thread (here putting_thread), you can pass and check for a sentinel value or use some other method. There is more info in this thread; I opted to write a subclass of Queue.Queue called CloseableQueue that provides a close method.
There are various other ways you could customize the behaviour of such a function, for example by limiting the queue size; this is just an example of usage.
edit
As mentioned above, this loses some efficiency because of the necessity of recalculation and also, I think, doesn't really answer your question.
A generator function can also accept values through its send method. So you can write a mean generator function like
def meangen():
    """Yields the accumulated mean of sent values.

    >>> g = meangen()
    >>> g.send(None)  # Initialize the generator
    >>> g.send(4)
    4.0
    >>> g.send(10)
    7.0
    >>> g.send(-2)
    4.0
    """
    sum = yield(None)
    count = 1
    while True:
        sum += yield(sum / float(count))
        count += 1
Here the yield expression both brings values (the arguments to send) into the function and passes the calculated values out as the return value of send.
You can pass the generator returned by a call to that function to a more optimizable accumulator generator function like this one:
def EfficientAccumulator(g, q):
    """Similar to Accumulator but sends values to a generator `g`.

    >>> from Queue import Queue
    >>> from threading import Thread
    >>> value_queue = Queue()
    >>> g = meangen()
    >>> g.send(None)
    >>> mean_accumulator = EfficientAccumulator(g, value_queue)
    >>> def add_to_queue(it, queue):
    ...     for value in it:
    ...         queue.put(value)
    >>> putting_thread = Thread(target=add_to_queue,
    ...                         args=(range(0, 12, 2), value_queue))
    >>> putting_thread.start()
    >>> list(mean_accumulator)
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    """
    try:
        while True:
            yield(g.send(q.get(timeout=0.1)))
            q.task_done()
    except Empty:
        pass
If I were doing this in Python, there are two things I would do differently:
- Separate out the functionality of each accumulator.
- Not use @property in the way you did.
For the first one, I would likely want to come up with an API for performing an accumulation, perhaps something like:
def add(self, num) # add a number
def compute(self) # compute the value of the accumulator
Then I would create an AccumulatorRegistry that holds onto these accumulators, and allows the user to call actions and add to all of them. The code may look like:
class Accumulators(object):
    _accumulator_library = {}

    def __init__(self):
        self.accumulator_library = {}
        for key, value in Accumulators._accumulator_library.items():
            self.accumulator_library[key] = value()

    @staticmethod
    def register(name, accumulator):
        Accumulators._accumulator_library[name] = accumulator

    def add(self, num):
        for accumulator in self.accumulator_library.values():
            accumulator.add(num)

    def compute(self, name):
        return self.accumulator_library[name].compute()

    @staticmethod
    def register_decorator(name):
        def _inner(cls):
            Accumulators.register(name, cls)
            return cls
        return _inner


@Accumulators.register_decorator("Mean")
class Mean(object):
    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, num):
        self.count += 1
        self.total += num

    def compute(self):
        return self.total / float(self.count)
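A short usage sketch of the registry (assuming the Mean class above has been registered via the decorator):

accumulators = Accumulators()
for n in (1, 2, 3, 4):
    accumulators.add(n)
print(accumulators.compute("Mean"))  # 2.5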
I should probably speak to your thread-safety question. Python's GIL protects you from a lot of threading issues. There are a few things you may want to do to protect yourself though:
- If these objects are localized to one thread, use threading.local.
- If not, you can wrap the operations in a lock, using the with context syntax to deal with holding the lock for you, as in the sketch below.
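For the second case, a minimal sketch of wrapping the operations in a lock (LockingAccumulators is an illustrative name, built on the Accumulators class above):

import threading

class LockingAccumulators(Accumulators):
    def __init__(self):
        super(LockingAccumulators, self).__init__()
        self._lock = threading.Lock()

    def add(self, num):
        # Hold the lock so concurrent add() calls do not interleave.
        with self._lock:
            super(LockingAccumulators, self).add(num)

    def compute(self, name):
        with self._lock:
            return super(LockingAccumulators, self).compute(name)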