I have an application where I have a set of objects that do a lot of setting up (this takes up to 30s-1minute per object). Once they have been set-up, I want to pass a parameter vector (small, <50 floats) and return a couple of small arrays back. Computations per object are "very fast" compared to setting up the object. I need to run this fast method many, many times, and have a cluster which I can use.
My idea is that this should be "pool" of workers where each worker gets an object initialised with its own particular configuration (which will be different), and stays in memory in the worker. Subsequently, a method of this object on each worker gets a vector and returns some arrays. These are then combined by the main program (order is not important).
Some MWE that works is as follows (serial version first, dask version second):
import datetime as dt
import itertools
import numpy as np
from dask.distributed import Client, LocalCluster
# Set up demo dask on local machine
cluster = LocalCluster()
client = Client(cluster)
class Model(object):
def __init__(self, power):
self.power = power
def powpow(self, x):
return np.power(x, self.power)
def neg(self, x):
return -x
def compute_me(self, x):
return sum(self.neg(self.powpow(x)))
# Set up the objects locally
bag = [Model(power)
for power in [1, 2, 3, 4, 5]
]
x = [13, 23, 37]
result = [obj.compute_me(x) for obj in bag]
# Using dask
# Wrapper function to pass the local object
# and parameter
def wrap(obj, x):
return obj.compute_me(x)
res = []
for obj,xx in itertools.product(bag, [x,]):
res.append(client.submit(wrap, obj, xx))
result_dask = [r.result() for r in res]
np.allclose(result, result_dask)
In my real-world case, the Model class does a lot of initialisation, pre-calculations, etc, and it takes possibly 10-50 times longer to initialise than to run the compute_me method. Basically, in my case, it'd be beneficial to have each worker have a pre-defined instance of Model locally, and have dask deliver the input to compute_me.
This post (2nd answer) initialising and storing in namespace, but the example doesn't show how pass different initialisation arguments to each worker. Or is there some other way of doing this?
Related
I am working on building a computation graph with Dask. Some of the intermediate values will be used multiple times, but I would like those calculations to only run once. I must be making a trivial mistake, because that's not what happens. Here is a minimal example:
In [1]: import dask
dask.__version__
Out [1]: '1.0.0'
In [2]: class SumGenerator(object):
def __init__(self):
self.sources = []
def register(self, source):
self.sources += [source]
def generate(self):
return dask.delayed(sum)([s() for s in self.sources])
In [3]: sg = SumGenerator()
In [4]: #dask.delayed
def source1():
return 1.
#dask.delayed
def source2():
return 2.
#dask.delayed
def source3():
return 3.
In [5]: sg.register(source1)
sg.register(source1)
sg.register(source2)
sg.register(source3)
In [6]: sg.generate().visualize()
Sadly I am unable to post the resulting graph image, but basically I see two separate nodes for the function source1 that was registered twice. Therefore the function is called twice. I would rather like to have it called once, the result remembered and added twice in the sum. What would be the correct way to do that?
You need to call the dask.delayed decorator by passing the pure=True argument.
From the dask delayed docs
delayed also accepts an optional keyword pure. If False, then subsequent calls will always produce a different Delayed
If you know a function is pure (output only depends on the input, with no global state), then you can set pure=True.
So using that
import dask
class SumGenerator(object):
def __init__(self):
self.sources = []
def register(self, source):
self.sources += [source]
def generate(self):
return dask.delayed(sum)([s() for s in self.sources])
#dask.delayed(pure=True)
def source1():
return 1.
#dask.delayed(pure=True)
def source2():
return 2.
#dask.delayed(pure=True)
def source3():
return 3.
sg = SumGenerator()
sg.register(source1)
sg.register(source1)
sg.register(source2)
sg.register(source3)
sg.generate().visualize()
Output and Graph
Using print(dask.compute(sg.generate())) gives (7.0,) which is the same as the one you wrote but without the extra node as seen in the image.
I've been searching around but have not found a solution. I've been working in Dask dictionary but the team is working in delayed object. I need to convert my dsk{} to the last step delayed object.
What I do now:
def add(x, y):
return x+y
dsk = {
'step1' : (add, 1, 2),
'step2' : (add, 'step1', 3),
'final' : (add, 'step2', 'step1'),
}
dask.visualize(dsk)
client.get(dsk, 'final')
In this way of working, all my functions are normal python functions. However, this is different than our team.
What the team is doing:
#dask.delayed
def add(x, y)
return x+y
step1 = add(1, 2)
step2 = add(step1, 3)
final = add(step2, step1)
final.visualize()
client.submit(final)
Then they are going to further schedule the work using the final step delayed object. How to convert the dsk last step final to the delayed object?
My current thinking (not working yet)
from dask.optimization import cull
outputs = ['final']
dsk1, dependencies = cull(dsk, outputs) # remove unnecessary tasks from the graph
After that, I'm not sure how to construct a delayed object.
Thank you!
Finally, I found a workaround. The idea is to iterate through the dsk to create delayed objects and dependencies.
# Covnert dsk dictionary to dask.delayed objects
for dsk_name, dsk_values in dsk.items():
args = []
dsk_function = dsk_values[0]
dsk_arguments = dsk_values[1:]
for arg in dsk_arguments:
if isinstance(arg, str):
# try to find the arguments in globals and return dependent dask object
args.append( globals().get(arg, arg) )
else:
args.append(arg)
globals()[dsk_name] = dask.delayed(dsk_function)(*args)
We generally recommend that people use Dask delayed. It is less error prone. Today, dictionaries are usually used mostly be people working on Dask itself. That said, if you want to convert a dictionary into a delayed object I recommend looking at the dask.Delayed object.
In [1]: from dask.delayed import Delayed
In [2]: Delayed?
Init signature: Delayed(key, dsk, length=None)
Docstring:
Represents a value to be computed by dask.
Equivalent to the output from a single key in a dask graph.
File: ~/workspace/dask/dask/delayed.py
Type: type
Subclasses: DelayedLeaf, DelayedAttr
So in your case you want
value = Delayed("final", dsk)
I am using joblib to run some simulations in parallel (for a MWE see my answer to this question). The simulator is a class instance which gets called at every step with an external action. If I am running 4 simulations, I want 4 different independent simulators. However, my call to joblib is something like this
simulator = Simulator(...)
results = Parallel(n_jobs=4)([delayed(run_simulation)(simulator) for i in range(4)])
The same Simulator instance is passed as argument in all cases. Is this instance duplicated before being given to the workers (which is what I want), or do they all use the same instance (which I do not want)?
I have answered my own question, although I am not sure of the answer. Any further insights would be appreciated.
This MWE of the problem described in the question appears to suggest that it does
from joblib import Parallel, delayed
class A:
def __init__(self):
self.c = 1
def call(self, a):
self.c += a
def get(self):
return self.c
a = A()
def call_a(inst):
inst.call(1)
return inst.get()
result = Parallel(n_jobs=4)([delayed(call_a)(a) for i in range(10)])
When you run it you obtain
In [2]: result
Out[2]: [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
If the instance a was shared, you would have obtained the numbers from 2 to 11.
I am trying to reproduce the reactive extensions "shared" observable concept with Python generators.
Say I have an API that gives me an infinite stream that I can use like this:
def my_generator():
for elem in the_infinite_stream():
yield elem
I could use this generator multiple times like so:
stream1 = my_generator()
stream2 = my_generator()
And the_infinite_stream() will be called twice (once for each generator).
Now say that the_infinite_stream() is an expensive operation. Is there a way to "share" the generator between multiple clients? It seems like tee would do that, but I have to know in advance how many independent generators I want.
The idea is that in other languages (Java, Swift) using the reactive extensions (RxJava, RxSwift) "shared" streams, I can conveniently duplicate the stream on the client side. I am wondering how to do that in Python.
Note: I am using asyncio
I took tee implementation and modified it such you can have various number of generators from infinite_stream:
import collections
def generators_factory(iterable):
it = iter(iterable)
deques = []
already_gone = []
def new_generator():
new_deque = collections.deque()
new_deque.extend(already_gone)
deques.append(new_deque)
def gen(mydeque):
while True:
if not mydeque: # when the local deque is empty
newval = next(it) # fetch a new value and
already_gone.append(newval)
for d in deques: # load it to all the deques
d.append(newval)
yield mydeque.popleft()
return gen(new_deque)
return new_generator
# test it:
infinite_stream = [1, 2, 3, 4, 5]
factory = generators_factory(infinite_stream)
gen1 = factory()
gen2 = factory()
print(next(gen1)) # 1
print(next(gen2)) # 1 even after it was produced by gen1
print(list(gen1)) # [2, 3, 4, 5] # the rest after 1
To cache only some amount of values you can change already_gone = [] into already_gone = collections.deque(maxlen=size) and add size=None parameter to generators_factory.
Consider simple class attributes.
Given
def infinite_stream():
"""Yield a number from a (semi-)infinite iterator."""
# Alternatively, `yield from itertools.count()`
yield from iter(range(100000000))
# Helper
def get_data(iterable):
"""Print the state of `data` per stream."""
return ", ".join([f"{x.__name__}: {x.data}" for x in iterable])
Code
class SharedIterator:
"""Share the state of an iterator with subclasses."""
_gen = infinite_stream()
data = None
#staticmethod
def modify():
"""Advance the shared iterator + assign new data."""
cls = SharedIterator
cls.data = next(cls._gen)
Demo
Given a tuple of client streams (A, B and C),
# Streams
class A(SharedIterator): pass
class B(SharedIterator): pass
class C(SharedIterator): pass
streams = A, B, C
let us modify and print the state of one iterator shared between them:
# Observe changed state in subclasses
A.modify()
print("1st access:", get_data(streams))
B.modify()
print("2nd access:", get_data(streams))
C.modify()
print("3rd access:", get_data(streams))
Output
1st access: A: 0, B: 0, C: 0
2nd access: A: 1, B: 1, C: 1
3rd access: A: 2, B: 2, C: 2
Although any stream can modify the iterator, the class attribute is shared between sub-classes.
See Also
Docs on asyncio.Queue - an async alternative to shared container
Post on the Observer Pattern + asyncio
You can call "tee" repeatedly to create multiple iterators as needed.
it = iter([ random.random() for i in range(100)])
base, it_cp = itertools.tee(it)
_, it_cp2 = itertools.tee(base)
_, it_cp3 = itertools.tee(base)
Sample: http://tpcg.io/ZGc6l5.
You can use single generator and "subscriber generators":
subscribed_generators = []
def my_generator():
while true:
elem = yield
do_something(elem) # or yield do_something(elem) depending on your actual use
def publishing_generator():
for elem in the_infinite_stream():
for generator in subscribed_generators:
generator.send(elem)
subscribed_generators.extend([my_generator(), my_generator()])
# Next is just ane example that forces iteration over `the_infinite_stream`
for elem in publishing_generator():
pass
Instead of generator-function you may also create a class with methods: __next__, __iter__, send, throw. That way you can modify MyGenerator.__init__ method to automatically add new instances of it to subscribed_generators.
This is somewhat similar to event-based approach with a "dumb implementation":
for elem in the_infinite_stream is similar to emitting event
for generator ...: generator.send is similar to sending event to each subscriber.
So one way to implement a "more complex but structured solution" would be to use event-based approach:
For example you can use asyncio.Event
Or some third-party solution like aiopubsub
For any of those approaches you should emit event for each element from the_infinite_stream, and your instances of my_generator should be subscribed to those events.
And other approaches can also be used and the best choice depends: on details of your task, on how are you using event-loop in asyncio. For example:
You can implement the_infinite_stream (or wrapper for it) as some class with "cursors" (objects that track current position in the stream for different subscribers); then each my_generator registers new cursor and uses it to get next item in the infinite stream. In this approach event-loop will not automatically revisit my_generator instances, which might be required if those instances "are not equal" (for example have some "priority balancing")
Intermediate generator calling all the instances of my_generator (as described earlier). In this approach each instance of my_generator is automatically revisited by event-loop. Most likely this approach is thread-safe.
Event-based approaches:
using asyncio.Event. Similar to use of intermediate generator. Not
thread-safe
aiopubsub.
something that uses Observer pattern
Make the_infinite_generator (or wrapper for it) to be "Singleton" that "caches" latest event. Some approaches were described in other answers. Another "caching" solutions can be used:
emit the same element once for each instance of the_infinite_generator (use class with custom __new__ method that tracks instances, or use the same instance of class that has a method returning "shifted" iterator over the_infinite_loop) until someone calls special method on
instance of the_infinite_generator (or on class): infinite_gen.next_cycle. In
this case there should always be some "last finalizing
generator/processor" that at the end of each event-loop's cycle will
do the_infinite_generator().next_cycle()
Similar to previous but same event is allowed to fire multiple times in the same my_generator instance (so they should watch for this case). In this approach the_infinite_generator().next_cycle() can be called "periodically" with loop.call_later or loop.cal_at. This approach might be needed if "subscribers" should be able to handle/analyze: delays, rate-limits, timeouts between events, etc.
Many other solutions are possible. It's hard to propose something specific without looking at your current implementation and without knowing what is the desired behavior of generators that use the_infinite_loop
If I understand your description of "shared" streams correctly, that you really need "one" the_infinite_stream generator and a "handler" for it. Example that tries to do this:
class StreamHandler:
def __init__(self):
self.__real_stream = the_infinite_stream()
self.__sub_streams = []
def get_stream(self):
sub_stream = [] # or better use some Queue/deque object. Using list just to show base principle
self.__sub_streams.append(sub_stream)
while True:
while sub_stream:
yield sub_stream.pop(0)
next(self)
def __next__(self):
next_item = next(self.__real_stream)
for sub_stream in self.__sub_steams:
sub_stream.append(next_item)
some_global_variable = StreamHandler()
# Or you can change StreamHandler.__new__ to make it singleton, or you can create an instance at the point of creation of event-loop
def my_generator():
for elem in some_global_variable.get_stream():
yield elem
But if all your my_generator objects are initialized at the same point of infinite stream, and "equally" iterated inside the loop, then this approach will introduce "unnecessary" memory overhead for each "sub_stream" (used as queue). Unnecessary: because those queues will always be the same (but that can be optimized: if there are some existing "empty" sub_stream than it can be re-used for new sub_streams with some changes to "pop-logic"). And many-many other implementations and nuances can be discussed
If you have a single generator, you can use one queue per "subscriber" and route events to each subscriber as the primary generator produces results.
This has the advantage of allowing the subscribers to move at their own pace, and it can be dropped in existing code with very little changes to the original source.
For example:
def my_gen():
...
m1 = Muxer(my_gen)
m2 = Muxer(my_gen)
consumer1(m1).start()
consumer2(m2).start()
As items are pulled from the primary generator they are inserted into queues for each listener. Listeners can subscribe any time by constructing a new Muxer():
import queue
from threading import Lock
from collections import namedtuple
class Muxer():
Entry = namedtuple('Entry', 'genref listeners, lock')
already = {}
top_lock = Lock()
def __init__(self, func, restart=False):
self.restart = restart
self.func = func
self.queue = queue.Queue()
with self.top_lock:
if func not in self.already:
self.already[func] = self.Entry([func()], [], Lock())
ent = self.already[func]
self.genref = ent.genref
self.lock = ent.lock
self.listeners = ent.listeners
self.listeners.append(self)
def __iter__(self):
return self
def __next__(self):
try:
e = self.queue.get_nowait()
except queue.Empty:
with self.lock:
try:
e = self.queue.get_nowait()
except queue.Empty:
try:
e = next(self.genref[0])
for other in self.listeners:
if not other is self:
other.queue.put(e)
except StopIteration:
if self.restart:
self.genref[0] = self.func()
raise
return e
Original source code, including test suite:
https://gist.github.com/earonesty/cafa4626a2def6766acf5098331157b3
The unit tests run many threads concurrently processing the same generated events in sequence. The code is order preserving, with a lock acquired during the single generator's access.
Caveats: the version here uses a singleton to gate access, otherwise it would be possible to accidentally evade its control over the contained generators. It also allows the contained generators to be "restartable", which was a useful feature for me a the time. There is no "close()" feature, simply because I didn't need it. This is an appropriate use case for __del__ however, since the last reference to a listener is the right time to clean up.
I'm attempting to speed up a multivariate fixed-point iteration algorithm using multiprocessing however, I'm running issues dealing with shared data. My solution vector is actually a named dictionary rather than a vector of numbers. Each element of the vector is actually computed using a different formula. At a high level, I have an algorithm like this:
current_estimate = previous_estimate
while True:
for state in all_states:
current_estimate[state] = state.getValue(previous_estimate)
if norm(current_estimate, previous_estimate) < tolerance:
break
else:
previous_estimate, current_estimate = current_estimate, previous_estimate
I'm trying to parallelize the for-loop part with multiprocessing. The previous_estimate variable is read-only and each process only needs to write to one element of current_estimate. My current attempt at rewriting the for-loop is as follows:
# Class and function definitions
class A(object):
def __init__(self,val):
self.val = val
# representative getValue function
def getValue(self, est):
return est[self] + self.val
def worker(state, in_est, out_est):
out_est[state] = state.getValue(in_est)
def worker_star(a_b_c):
""" Allow multiple arguments for a pool
Taken from http://stackoverflow.com/a/5443941/3865495
"""
return worker(*a_b_c)
# Initialize test environment
manager = Manager()
estimates = manager.dict()
all_states = []
for i in range(5):
a = A(i)
all_states.append(a)
estimates[a] = 0
pool = Pool(process = 2)
prev_est = estimates
curr_est = estimates
pool.map(worker_star, itertools.izip(all_states, itertools.repeat(prev_est), itertools.repreat(curr_est)))
The issue I'm currently running into is that the elements added to the all_states array are not the same as those added to the manager.dict(). I keep getting key value errors when trying to access elements of the dictionary using elements of the array. And debugging, I found that none of the elements are the same.
print map(id, estimates.keys())
>>> [19558864, 19558928, 19558992, 19559056, 19559120]
print map(id, all_states)
>>> [19416144, 19416208, 19416272, 19416336, 19416400]
This is happening because the objects you're putting into the estimates DictProxy aren't actually the same objects as those that live in the regular dict. The manager.dict() call returns a DictProxy, which is proxying access to a dict that actually lives in a completely separate manager process. When you insert things into it, they're really being copied and sent to a remote process, which means they're going to have a different identity.
To work around this, you can define your own __eq__ and __hash__ functions on A, as described in this question:
class A(object):
def __init__(self,val):
self.val = val
# representative getValue function
def getValue(self, est):
return est[self] + self.val
def __hash__(self):
return hash(self.__key())
def __key(self):
return (self.val,)
def __eq__(x, y):
return x.__key() == y.__key()
This means the key look ups for items in the estimates will just use the value of the val attribute to establish identity and equality, rather than the id assigned by Python.