Load local (unserializable) objects on workers - python

I'm trying to use Dataflow in conjunction with TensorFlow for predictions. The predictions happen on the workers, and I'm currently loading the model through start_bundle(), like here:
class PredictDoFn(beam.DoFn):
    def start_bundle(self):
        self.model = load_model_from_file()

    def process(self, element):
        ...
My current problem is that even when I process 1000 elements, start_bundle() is called multiple times (at least 10), not once per worker as I had hoped. This slows down the pipeline significantly because the model needs to be reloaded many times and each load takes 30 seconds.
Is there a way to load the model on the workers at initialisation, rather than every time in start_bundle()?
Thanks in advance!
Dimitri

The easiest thing would be for you to add an if self.model is None: self.model = load_model_from_file() check, but even that may not reduce the number of times your model is reloaded.
This is because DoFn instances are not currently reused across bundles. This means that your model will be forgotten after every work item is executed.
You could also create a global variable where you keep the model. This would reduce the number of reloads, but it is rather unorthodox (though it may solve your use case).
A global variable approach should work something like this:
# Module-level placeholder so the global lookup below does not raise NameError.
my_model = None

class MyModelDoFn(beam.DoFn):
    def process(self, elem):
        global my_model
        if my_model is None:
            my_model = load_model_from_file()
        yield my_model.apply_to(elem)
An approach that relies on a thread-local variable would look like this. Note that this loads the model once per thread, so the number of times your model is loaded depends on the runner implementation (it will work in Dataflow):
import threading

class MyModelDoFn(beam.DoFn):
    _thread_local = threading.local()

    @property
    def model(self):
        model = getattr(MyModelDoFn._thread_local, 'model', None)
        if not model:
            MyModelDoFn._thread_local.model = load_model_from_file()
        return MyModelDoFn._thread_local.model

    def process(self, elem):
        yield self.model.apply_to(elem)
I guess you can load the model from the start_bundle call as well.
Note: This approach is very unorthodox, and is not guaranteed to work in newer versions, nor all runners.

Related

How many times are DoFn's constructed?

I'm using the Apache Beam Python SDK and Dataflow to write an inference pipeline for making predictions with TensorFlow models. I have the prediction step in a DoFn, but I don't want to have to load the model every time I process a bundle because that's very expensive. From the docs here: "If required, a fresh instance of the argument DoFn is created on a worker, and the DoFn.Setup method is called on this instance. This may be through deserialization or other means. A PipelineRunner may reuse DoFn instances for multiple bundles. A DoFn that has terminated abnormally (by throwing an Exception) will never be reused." I've noticed that if I write my code like this:
i = 0  # module-level counter, used only to log how often the model is loaded

class StatefulGetEmbeddingsDoFn(beam.DoFn):
    def __init__(self, model_dir):
        self.model = None  # initialize
        self.model_dir = model_dir

    def process(self, element):
        if not self.model:  # load model if it hasn't been loaded yet
            global i
            i += 1
            logging.info('Getting model: {}'.format(i))
            self.model = Model(saved_model_dir=self.model_dir)
        ids, b64 = element
        embeddings = self.model.predict(b64)
        res = [
            {
                'image': _id,
                'embeddings': embedding.tolist()
            } for _id, embedding in zip(ids, embeddings)
        ]
        return res
It seems like the model is being loaded more than once on every worker (I've got a cluster of ~30-40 machines). Is there a way of preventing the model from being loaded more than once? I would've expected this DoFn to only be constructed once on every machine but from the logs, it seems like that's not the case...
I know this is an older question, but my initial thoughts are to use the setup and start_bundle methods.
https://beam.apache.org/releases/pydoc/2.22.0/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.setup
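For illustration, here is a minimal sketch of moving the expensive load into DoFn.setup(), which is called once per DoFn instance when a worker initialises it; the model_dir parameter and the load_model_from_file helper are placeholders borrowed from the questions above, not part of the original answer:

import apache_beam as beam

class PredictDoFn(beam.DoFn):
    def __init__(self, model_dir):
        self.model_dir = model_dir
        self.model = None

    def setup(self):
        # Runs once per DoFn instance on the worker, so the expensive
        # model load is not repeated for every bundle.
        self.model = load_model_from_file(self.model_dir)

    def process(self, element):
        yield self.model.predict(element)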

Initializing state on dask-distributed workers

I am trying to do something like this:
from dask.distributed import Client

resource = MyResource()

def fn(x):
    something = dosomething(x, resource)
    return something

client = Client()
results = client.map(fn, data)
The issue is that resource is not serializable and is expensive to construct.
Therefore I would like to construct it once on each worker and have it available for use by fn.
How do I do this?
Or is there some other way to make resource available on all workers?
You can always construct a lazy resource, something like
class GiveAResource():
    resource = [None]

    def get_resource(self):
        if self.resource[0] is None:
            self.resource[0] = MyResource()
        return self.resource[0]
An instance of this will serialise between processes fine, so you can include it as an input to any function to be executed on workers, and then calling .get_resource() on it will get your local expensive resource (which will get remade on any worker which appears later on).
This class would be best defined in a module rather than dynamic code.
There is no locking here, so if several threads ask for the resource at the same time when it has not been needed so far, you will get redundant work.
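As a rough usage sketch under the answer's assumptions (MyResource, dosomething and data are the question's own placeholders), the lazy wrapper can be passed to client.map like any other argument:

from dask.distributed import Client

resource_holder = GiveAResource()

def fn(x, holder):
    # The wrapper itself serialises cheaply; the real resource is built
    # lazily on the worker the first time get_resource() is called there.
    resource = holder.get_resource()
    return dosomething(x, resource)

client = Client()
futures = client.map(fn, data, holder=resource_holder)
results = client.gather(futures)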

Ray: How to run many actors on one GPU?

I have only one GPU, and I want to run many actors on that GPU. Here's what I do using Ray, following https://ray.readthedocs.io/en/latest/actors.html
First, define the network on the GPU:
class Network():
    def __init__(self, ***some args here***):
        self._graph = tf.Graph()
        os.environ['CUDA_VISIBLE_DEVICES'] = ','.join([str(i) for i in ray.get_gpu_ids()])
        with self._graph.as_default():
            with tf.device('/gpu:0'):
                # network, loss, and optimizer are defined here
                sess_config = tf.ConfigProto(allow_soft_placement=True)
                sess_config.gpu_options.allow_growth = True
                self.sess = tf.Session(graph=self._graph, config=sess_config)
                self.sess.run(tf.global_variables_initializer())
                atexit.register(self.sess.close)
                self.variables = ray.experimental.TensorFlowVariables(self.loss, self.sess)
Then define the worker class:
@ray.remote(num_gpus=1)
class Worker(Network):
    # do something
    pass
Define the learner class:
@ray.remote(num_gpus=1)
class Learner(Network):
    # do something
    pass
And the train function:
def train():
    ray.init(num_gpus=1)
    learner = Learner.remote(...)
    workers = [Worker.remote(...) for i in range(10)]
    # do something
This process works fine when I don't try to make it run on the GPU. That is, it works fine when I remove all the with tf.device('/gpu:0') blocks and (num_gpus=1). The trouble arises when I keep them: it seems that only the learner is created, but none of the workers are constructed. What should I do to make it work?
When you define an actor class using the decorator @ray.remote(num_gpus=1), you are saying that any actor created from this class must have one GPU reserved for it for the duration of the actor's lifetime. Since you have only one GPU, you will only be able to create one such actor.
If you want to have multiple actors sharing a single GPU, then you need to specify that each actor requires less than 1 GPU, for example, if you wish to share one GPU among 4 actors, then you can have each actor require 1/4th of a GPU. This can be done by declaring the actor class with
@ray.remote(num_gpus=0.25)
In addition, you need to make sure that each actor actually respects the limits that you are placing on it. For example, if you declare an actor with @ray.remote(num_gpus=0.25), then you should also make sure that TensorFlow uses at most one quarter of the GPU memory. See the answers to How to prevent tensorflow from allocating the totality of a GPU memory? for example.
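As an illustrative sketch combining these two points (the 0.25 fraction and the TensorFlow 1.x session-config API are assumptions carried over from the question and answer above, not code from the original post):

import ray
import tensorflow as tf

@ray.remote(num_gpus=0.25)
class Worker(object):
    def __init__(self):
        # Ask TensorFlow for at most roughly a quarter of the GPU memory,
        # matching the num_gpus=0.25 reservation declared above.
        config = tf.ConfigProto()
        config.gpu_options.per_process_gpu_memory_fraction = 0.25
        self.sess = tf.Session(config=config)

ray.init(num_gpus=1)
workers = [Worker.remote() for _ in range(4)]  # four actors share the single GPU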

Implementing luigi dynamic graph configuration

I am new to Luigi; I came across it while designing a pipeline for our ML efforts. Though it wasn't a perfect fit for my particular use case, it had so many extra features that I decided to make it fit.
Basically, what I was looking for was a way to persist a custom-built pipeline so that its results are repeatable and easier to deploy. After reading most of the online tutorials, I tried to implement my serialization using the existing luigi.cfg configuration and command-line mechanisms. That might have sufficed for the tasks' parameters, but it provided no way of serializing the DAG connectivity of my pipeline, so I decided to have a WrapperTask that receives a JSON config file, creates all the task instances, and connects all the input/output channels of the Luigi tasks (does all the plumbing).
I hereby enclose a small test program for your scrutiny:
import random
import luigi
import time
import os


class TaskNode(luigi.Task):
    i = luigi.IntParameter()  # node ID

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.required = []

    def set_required(self, required=None):
        self.required = required  # set the dependencies
        return self

    def requires(self):
        return self.required

    def output(self):
        return luigi.LocalTarget('{0}{1}.txt'.format(self.__class__.__name__, self.i))

    def run(self):
        with self.output().open('w') as outfile:
            outfile.write('inside {0}{1}\n'.format(self.__class__.__name__, self.i))
        self.process()

    def process(self):
        raise NotImplementedError(self.__class__.__name__ + " must implement this method")


class FastNode(TaskNode):
    def process(self):
        time.sleep(1)


class SlowNode(TaskNode):
    def process(self):
        time.sleep(2)


# This WrapperTask builds all the nodes
class All(luigi.WrapperTask):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        num_nodes = 513
        classes = TaskNode.__subclasses__()
        self.nodes = []
        for i in reversed(range(num_nodes)):
            cls = random.choice(classes)
            dependencies = random.sample(self.nodes, (num_nodes - i) // 35)
            obj = cls(i=i)
            if dependencies:
                obj.set_required(required=dependencies)
            else:
                obj.set_required(required=None)
            # delete existing output, causing a build-all
            if obj.output().exists():
                obj.output().remove()
            self.nodes.append(obj)

    def requires(self):
        return self.nodes


if __name__ == '__main__':
    luigi.run()
So, basically, as the question's title states, this focuses on the dynamic dependencies and generates a 513-node dependency DAG with p=1/35 connectivity probability. It also defines the All (as in make all) class as a WrapperTask that requires all nodes to be built before it is considered done (I have a version which only connects it to the heads of connected DAG components, but I didn't want to overcomplicate things).
Is there a more standard (Luigic) way of implementing this? Note especially the not-so-pretty complication with the TaskNode __init__ and set_required methods; I only did it this way because receiving parameters in the __init__ method somehow clashes with the way Luigi registers parameters. I tried several other approaches, but this was basically the most decent one that worked.
If there isn't a standard way I'd still love to hear any insights you have on the way I plan to go before I finish implementing the framework.
I answered a similar question yesterday with a demo. I based that almost entirely on the example in the docs. In the docs, assigning dynamic dependencies by yielding tasks seems to be the preferred way.
luigi.Config and dynamic dependencies can probably give you a pipeline of almost infinite flexibility. They also describe a dummy Task that calls multiple dependency chains here, which could give you even more control.
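For reference, a minimal sketch of that dynamic-dependency pattern (the task names and the ten-node fan-out are invented for illustration, not taken from the question):

import luigi


class ProcessNode(luigi.Task):
    i = luigi.IntParameter()

    def output(self):
        return luigi.LocalTarget('node{}.txt'.format(self.i))

    def run(self):
        with self.output().open('w') as f:
            f.write('inside node {}\n'.format(self.i))


class BuildGraph(luigi.Task):
    def output(self):
        return luigi.LocalTarget('graph_done.txt')

    def run(self):
        # Dependencies discovered at run time are yielded; Luigi schedules
        # them and resumes this task once their outputs exist.
        deps = [ProcessNode(i=i) for i in range(10)]
        yield deps
        with self.output().open('w') as f:
            f.write('built {} nodes\n'.format(len(deps)))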

Get existing or create new App Engine Syntax

I came across this syntax while browsing through code for examples. From its surrounding code, it looked like it would a) get the entity with the given key name, or b) if the entity did not exist, create a new entity that could be saved. Assume my model class is called MyModel.
my_model = MyModel(key_name='mymodelkeyname',
                   kwarg1='first arg', kwarg2='second arg')
I'm now running into issues, but only in certain situations. Is my assumption about what this snippet does correct? Or should I always do the following?
my_model = MyModel.get_by_key_name('mymodelkeyname')
if not my_model:
    my_model = MyModel(key_name='mymodelkeyname',
                       kwarg1='first arg', kwarg2='second arg')
else:
    # do something with my_model
    pass
The constructor, which is what you're using, always constructs a new entity. When you store it, it overwrites any other entity with the same key.
The alternate code you propose also has an issue: it's susceptible to race conditions. Two instances of that code running simultaneously could both determine that the entity does not exist, and each create it, resulting in one overwriting the work of the other.
What you want is the Model.get_or_insert method, which is syntactic sugar for this:
def get_or_insert(cls, key_name, **kwargs):
    def _tx():
        model = cls.get_by_key_name(key_name)
        if not model:
            model = cls(key_name=key_name, **kwargs)
            model.put()
        return model
    return db.run_in_transaction(_tx)
Because the get operation and the conditional insert take place in a transaction, the race condition is not possible.
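A usage sketch with the names from the question (MyModel and the key name are the asker's own placeholders):

# Atomically fetches the existing entity, or creates and stores it if absent.
my_model = MyModel.get_or_insert('mymodelkeyname',
                                 kwarg1='first arg', kwarg2='second arg')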
Is this what you are looking for -> http://code.google.com/appengine/docs/python/datastore/modelclass.html#Model_get_or_insert
