I'm using the Apache Beam Python SDK and Dataflow to write an inference pipeline that makes predictions with TensorFlow models. I have the prediction step in a DoFn, but I don't want to load the model every time a bundle is processed because that's very expensive. From the docs here:

"If required, a fresh instance of the argument DoFn is created on a worker, and the DoFn.Setup method is called on this instance. This may be through deserialization or other means. A PipelineRunner may reuse DoFn instances for multiple bundles. A DoFn that has terminated abnormally (by throwing an Exception) will never be reused."

I've noticed that if I write my code like this:
import logging

import apache_beam as beam

i = 0  # module-level counter used only to log how often the model gets loaded

class StatefulGetEmbeddingsDoFn(beam.DoFn):
    def __init__(self, model_dir):
        self.model = None  # initialize
        self.model_dir = model_dir

    def process(self, element):
        if not self.model:  # load the model if it hasn't been loaded yet
            global i
            i += 1
            logging.info('Getting model: {}'.format(i))
            self.model = Model(saved_model_dir=self.model_dir)
        ids, b64 = element
        embeddings = self.model.predict(b64)
        res = [
            {
                'image': _id,
                'embeddings': embedding.tolist()
            } for _id, embedding in zip(ids, embeddings)
        ]
        return res
It seems like the model is being loaded more than once on every worker (I've got a cluster of ~30-40 machines). Is there a way of preventing the model from being loaded more than once? I would've expected this DoFn to only be constructed once on every machine but from the logs, it seems like that's not the case...
I know this is an older question, but my initial thought would be to use the setup and start_bundle methods:
https://beam.apache.org/releases/pydoc/2.22.0/apache_beam.transforms.core.html#apache_beam.transforms.core.DoFn.setup
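A minimal sketch of that idea, reusing the Model and model_dir names from the question and assuming a Beam SDK recent enough to provide DoFn.setup(), could look like this:

import logging

import apache_beam as beam

class GetEmbeddingsDoFn(beam.DoFn):
    def __init__(self, model_dir):
        self.model_dir = model_dir
        self.model = None

    def setup(self):
        # Called once per DoFn instance, before any bundles are processed,
        # so the expensive load no longer happens once per bundle.
        logging.info('Loading model from %s', self.model_dir)
        self.model = Model(saved_model_dir=self.model_dir)

    def process(self, element):
        ids, b64 = element
        embeddings = self.model.predict(b64)
        return [
            {'image': _id, 'embeddings': embedding.tolist()}
            for _id, embedding in zip(ids, embeddings)
        ]

Since a DoFn instance may still be constructed more than once per machine, setup() reduces the number of loads rather than guaranteeing exactly one per worker.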
Sorry about the vague title but I'm not sure exactly how to describe it.
I am currently running tests on a model written in tensorflow.compat.v1. When it is used for inference, it must be restored as follows:
import tensorflow as tf
from os.path import join

class Model:
    def __init__(self, filepath):
        ...
        self.sess, self.saver = self.setup_tf()
        self.merge = tf.compat.v1.summary.merge(
            [tf.compat.v1.summary.scalar('loss', self.loss)])

    def setup_tf(self):
        sess = tf.compat.v1.Session()  # TF session
        saver = tf.compat.v1.train.Saver(max_to_keep=1)
        latest_snapshot = tf.train.latest_checkpoint(join("../", self.model_dir))
        saver.restore(sess, latest_snapshot)
        return sess, saver
I have two of these tests, each involving restoring a model and performing some inference with it. After the first model loads and runs inference successfully, the second test fails to restore its model. The specific error is:
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
However, when I run each test individually (commenting the other out), there is no problem restoring the model. Even when I swap the order in which the models are loaded, the first one always succeeds and the second one fails.

I figure that I should be using setUp and tearDown for each test, but I'm confused about what exactly I should be 'tearing down'. Is it garbage collection? I have tried gc.collect(), but the same error occurs.

Here is the testing class, if it helps:

import unittest

class TestInference(unittest.TestCase):
    def setUp(self):
        pass

    def tearDown(self):
        pass

    def test1(self):
        model = Model(filepath)
        model.infer()

    def test2(self):
        model = Model(filepath)
        model.infer()
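Not something confirmed above, but one common cause of this NotFoundError is that both tests build their variables into the same default graph, so the second model's variables get renamed and the Saver no longer finds matching names in the checkpoint. A sketch of a setUp/tearDown that clears that state between tests, assuming leftover graph state really is the culprit:

import unittest
import tensorflow as tf

class TestInference(unittest.TestCase):
    def setUp(self):
        # Start each test with a fresh default graph so the second model's
        # variables don't get uniquified to avoid clashing with the first's.
        tf.compat.v1.reset_default_graph()

    def tearDown(self):
        tf.compat.v1.reset_default_graph()

    def test1(self):
        model = Model(filepath)  # Model and filepath as defined above
        model.infer()

    def test2(self):
        model = Model(filepath)
        model.infer()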
I am having trouble writing a custom predict method using MLFlow and pyspark (2.4.0). What I have so far is a custom transformer that changes the data into the format I need.
from pyspark.ml import Transformer
from pyspark.sql.functions import explode, split

class CustomGroupBy(Transformer):
    def __init__(self):
        pass

    def _transform(self, dataset):
        df = dataset.select("userid", explode(split("widgetid", ',')).alias("widgetid"))
        return df
Then I built a custom estimator to run one of the pyspark machine learning algorithms
from pyspark.ml import Estimator
from pyspark.ml.fpm import FPGrowth
from pyspark.ml.param.shared import HasInputCol
from pyspark.ml.util import DefaultParamsReadable, DefaultParamsWritable

class PipelineFPGrowth(Estimator, HasInputCol, DefaultParamsReadable, DefaultParamsWritable):
    def __init__(self, inputCol=None, minSupport=0.005, minConfidence=0.01):
        super(PipelineFPGrowth, self).__init__()
        self.minSupport = minSupport
        self.minConfidence = minConfidence

    def setInputCol(self, value):
        return self._set(inputCol=value)

    def _fit(self, dataset):
        c = self.getInputCol()
        fpgrowth = FPGrowth(itemsCol=c, minSupport=self.minSupport,
                            minConfidence=self.minConfidence)
        model = fpgrowth.fit(dataset)
        return model
This runs in the MLFlow pipeline:

from pyspark.ml import Pipeline

# The stages need to be instances; the items column is assumed to be "widgetid",
# matching the column produced by CustomGroupBy.
pipeline = Pipeline(stages=[CustomGroupBy(),
                            PipelineFPGrowth().setInputCol("widgetid")]).fit(df)
This all works. If I create a new pyspark dataframe with new data to predict on, I get predictions.
newDF = spark.createDataFrame([(123456,['123ABC', '789JSF'])], ["userid", "widgetid"])
pipeline.stages[1].transform(newDF).show(3, False)
# How to access frequent itemset.
pipeline.stages[1].freqItemsets.show(3, False)
Where I run into problems is writing a custom predict. I need to append the frequent itemset that FPGrowth generates to the end of the predictions. I have written the logic for that, but I am having a hard time figuring out how to put it into a custom method. I have tried adding it to my custom estimator but this didn't work. Then I wrote a separate class to take in the returned model and give the extended predictions. This was also unsuccessful.
Eventually I need to log and save the model so I can Dockerize it, which means I will need a custom flavor and to use mlflow.pyfunc. Does anyone have a hint on how to extend the predict method and then log and save the model?
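Not a confirmed recipe, but one way to get a custom predict that can later be logged and Dockerized is to wrap the fitted model in an mlflow.pyfunc.PythonModel. In the sketch below, the saved FPGrowth model path, the append_freq_itemsets helper, and the availability of a spark session are all assumptions standing in for your own logic:

import mlflow.pyfunc

class FPGrowthWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Reload the fitted Spark FPGrowth model saved earlier as an artifact.
        from pyspark.ml.fpm import FPGrowthModel
        self.fp_model = FPGrowthModel.load(context.artifacts["fp_model"])

    def predict(self, context, model_input):
        # pyfunc hands the input over as a pandas DataFrame.
        spark_df = spark.createDataFrame(model_input)  # `spark` session assumed in scope
        preds = self.fp_model.transform(spark_df)
        # Hypothetical helper holding your "append the frequent itemsets" logic.
        return append_freq_itemsets(preds, self.fp_model.freqItemsets)

# Log the wrapper together with the saved Spark model so it can be served later.
mlflow.pyfunc.log_model(
    artifact_path="fpgrowth_pyfunc",
    python_model=FPGrowthWrapper(),
    artifacts={"fp_model": "path/to/saved/fpgrowth_model"},  # assumed save location
)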
After creating a FastText model using Gensim, I want to load it but am running into errors seemingly related to callbacks.
The code used to create the model is
TRAIN_EPOCHS = 30
WINDOW = 5
MIN_COUNT = 50
DIMS = 256

vocab_model = gensim.models.FastText(sentences=model_input,
                                     size=DIMS,
                                     window=WINDOW,
                                     iter=TRAIN_EPOCHS,
                                     workers=6,
                                     min_count=MIN_COUNT,
                                     callbacks=[EpochSaver("./ftchkpts/")])
vocab_model.save('ft_256_min_50_model_30eps')
and the callback EpochSaver is defined as
import os

from gensim.models.callbacks import CallbackAny2Vec

class EpochSaver(CallbackAny2Vec):
    '''Callback to save the model after each epoch and show training parameters.'''
    def __init__(self, savedir):
        self.savedir = savedir
        self.epoch = 0
        os.makedirs(self.savedir, exist_ok=True)

    def on_epoch_end(self, model):
        savepath = os.path.join(self.savedir, f"ft256_{self.epoch}e")
        model.save(savepath)
        print(f"Epoch saved: {self.epoch + 1}")
        if os.path.isfile(os.path.join(self.savedir, f"ft256_{self.epoch - 1}e")):
            os.remove(os.path.join(self.savedir, f"ft256_{self.epoch - 1}e"))
            print("Previous model deleted")
        self.epoch += 1
Aside from the type of model, this is identical to my process for Word2Vec, which worked without issue. However, when I open another file and try to load the model with
from gensim.models import FastText
vocab = FastText.load(r'vocab/ft_256_min_50_model_30eps')
I'm greeted with the error
AttributeError: Can't get attribute 'EpochSaver' on <module '__main__'>
What can I do to get the vocabulary to load so I can create the embedding layer for my keras model? If it's relevant, this is happening in JupyterLab.
This extra difficulty loading models with custom callbacks is a known, open issue (at least through gensim-3.8.1 and October 2019).
You can see discussions of possible workarounds and fixes there, and the gensim team is considering simply disabling the auto-saving of callbacks altogether, requiring them to be re-specified for each later train() (or similar) call that needs them.
You may be able to load existing models saved with your custom callbacks by importing those same callback classes, under the same names, into the code context where you're doing the load().
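A minimal sketch of that workaround, assuming EpochSaver has been moved into (or copied to) a module the loading script or notebook can import:

from gensim.models import FastText
# Importing the class into the loading namespace lets pickle resolve the
# reference that was stored when the model was saved.
from callbacks import EpochSaver  # hypothetical module containing the class

vocab = FastText.load(r'vocab/ft_256_min_50_model_30eps')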
You could save callback-free versions of your trained models by blanking the model's callbacks property to its empty default value just before you save(), e.g.:
model.callbacks = ()
model.save(save_path)
Then, you wouldn't need to do any special importing of custom classes before a load(). (Of course, if you need callback functionality again on the re-loaded model, the callbacks would have to be explicitly re-established after load().)
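For example, a later session could re-specify the callback only where it is needed; in this sketch, EpochSaver is assumed to live in an importable module (e.g. callbacks.py) and more_sentences stands in for whatever corpus you continue training on:

from gensim.models import FastText
from callbacks import EpochSaver  # hypothetical module containing the class

model = FastText.load('ft_256_min_50_model_30eps')
model.train(more_sentences,
            total_examples=len(more_sentences),
            epochs=5,
            callbacks=[EpochSaver("./ftchkpts/")])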
I have only one GPU, and I want to run many actors on it. Here's what I do using Ray, following https://ray.readthedocs.io/en/latest/actors.html

First, define the network on the GPU:
import atexit
import os

import ray
import tensorflow as tf

class Network():
    def __init__(self, ***some args here***):
        self._graph = tf.Graph()
        os.environ['CUDA_VISIBLE_DEVICES'] = ','.join([str(i) for i in ray.get_gpu_ids()])
        with self._graph.as_default():
            with tf.device('/gpu:0'):
                # network, loss, and optimizer are defined here
                sess_config = tf.ConfigProto(allow_soft_placement=True)
                sess_config.gpu_options.allow_growth = True
                self.sess = tf.Session(graph=self._graph, config=sess_config)
                self.sess.run(tf.global_variables_initializer())
                atexit.register(self.sess.close)
                self.variables = ray.experimental.TensorFlowVariables(self.loss, self.sess)
Then define the worker class:

@ray.remote(num_gpus=1)
class Worker(Network):
    pass  # do something

Define the learner class:

@ray.remote(num_gpus=1)
class Learner(Network):
    pass  # do something
Train function:

def train():
    ray.init(num_gpus=1)
    learner = Learner.remote(...)
    workers = [Worker.remote(...) for i in range(10)]
    # do something
This process works fine when I don't try to make it run on the GPU, that is, when I remove all the with tf.device('/gpu:0') blocks and the (num_gpus=1). The trouble arises when I keep them: it seems that only the learner is created, and none of the workers are constructed. What should I do to make it work?
When you define an actor class using the decorator @ray.remote(num_gpus=1), you are saying that any actor created from this class must have one GPU reserved for it for the duration of the actor's lifetime. Since you have only one GPU, you will only be able to create one such actor.
If you want multiple actors to share a single GPU, then you need to specify that each actor requires less than one GPU. For example, if you wish to share one GPU among four actors, you can have each actor require a quarter of a GPU. This can be done by declaring the actor class with

@ray.remote(num_gpus=0.25)

In addition, you need to make sure that each actor actually respects the limits you are placing on it. For example, if you declare an actor with @ray.remote(num_gpus=0.25), then you should also make sure that TensorFlow uses at most one quarter of the GPU memory. See the answers to How to prevent tensorflow from allocating the totality of a GPU memory? for examples.
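Putting those two pieces together, a minimal sketch (the 0.25 fraction is just illustrative, and Network is the class from the question) could look like:

import ray

# Inside Network.__init__, cap TensorFlow's GPU memory so each actor really
# stays within its share of the card, e.g. instead of allow_growth:
#   sess_config = tf.ConfigProto(allow_soft_placement=True)
#   sess_config.gpu_options.per_process_gpu_memory_fraction = 0.25

@ray.remote(num_gpus=0.25)  # four such actors can share the single GPU
class Worker(Network):
    pass

@ray.remote(num_gpus=0.25)
class Learner(Network):
    pass

ray.init(num_gpus=1)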
I'm trying to use Dataflow in conjunction with TensorFlow for predictions. Those predictions happen on the workers, and I'm currently loading the model in start_bundle(), like here:
class PredictDoFn(beam.DoFn):
    def start_bundle(self):
        self.model = load_model_from_file()

    def process(self, element):
        ...
My current problem is that even if I process 1000 elements, the start_bundle() function is called multiple times (at least 10), and not once per worker as I'd hoped. This slows down the pipeline significantly, because the model needs to be loaded many times and each load takes 30 seconds.
Are there any ways to load the model on the workers on initialisation and not every time in the start_bundle()?
Thanks in advance!
Dimitri
The easiest thing would be for you to add an if self.model is None: self.model = load_model_from_file() check, but this may not reduce the number of times your model is reloaded.
This is because DoFn instances are not currently reused across bundles. This means that your model will be forgotten after every work item is executed.
You could also create a global variable where you keep the model. This would reduce the amount of reloads, but it would be really unorthodox (though it may solve your use case).
A global variable approach should work something like this:
import apache_beam as beam

my_model = None  # module-level cache shared by all DoFn instances in the process

class MyModelDoFn(beam.DoFn):
    def process(self, elem):
        global my_model
        if my_model is None:
            my_model = load_model_from_file()
        yield my_model.apply_to(elem)
An approach that relies on a thread-local variable would look like this. Consider that it will load the model once per thread, so the number of times your model is loaded depends on the runner implementation (it will work in Dataflow):
import threading

import apache_beam as beam

class MyModelDoFn(beam.DoFn):
    _thread_local = threading.local()

    @property
    def model(self):
        model = getattr(MyModelDoFn._thread_local, 'model', None)
        if not model:
            MyModelDoFn._thread_local.model = load_model_from_file()
        return MyModelDoFn._thread_local.model

    def process(self, elem):
        yield self.model.apply_to(elem)
I guess you can load the model from the start_bundle call as well.
Note: This approach is very unorthodox, and it is not guaranteed to work in newer versions or with all runners.