PyTorch sparse loss (Python)

I would like to create a simple loss for two sparse tensors in
def criterion_sparse(x,y):
return torch.sparse.sum(
It is giving me the error
NotImplementedError: Could not run 'aten::is_coalesced' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit for possible resolutions. 'aten::is_coalesced' is only available for these backends: [SparseCPU, SparseCUDA, BackendSelect, Python, Named, Conjugate, Negative, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, UNKNOWN_TENSOR_TYPE_ID, Autocast, Batched, VmapMode].
What can I do in this situation? It seems to be a very simple problem.
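One possible reading of the traceback: `aten::is_coalesced` is only registered for the sparse backends, and the call arrived with a tensor from the dense 'CPU' backend, which suggests that at least one tensor reaching the sparse operation was dense rather than sparse. Below is a minimal sketch under that assumption. The body of the original loss is cut off above, so the squared-error reduction here is only a stand-in, and the conversion with `to_sparse()` is an illustrative choice rather than the asker's actual code.

```python
import torch


def criterion_sparse(x, y):
    """Sum of squared differences between two tensors, computed on sparse data.

    Assumption: the loss is a squared error; the original body is not shown.
    """
    # Convert dense inputs so that only sparse tensors reach sparse-only ops.
    if not x.is_sparse:
        x = x.to_sparse()
    if not y.is_sparse:
        y = y.to_sparse()
    # Subtraction of two sparse COO tensors yields a sparse tensor;
    # coalesce() merges duplicate indices so values() is well defined.
    diff = (x - y).coalesce()
    # Only the stored (non-zero) entries contribute to the sum.
    return diff.values().pow(2).sum()


x = torch.tensor([[0.0, 1.0], [2.0, 0.0]]).to_sparse()
y = torch.tensor([[0.0, 3.0], [0.0, 0.0]])  # dense on purpose
loss = criterion_sparse(x, y)
```

Checking `x.is_sparse` and `y.is_sparse` right before the failing call is a quick way to confirm or rule out this explanation.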


TensorFlow RuntimeError: "Attempting to capture an EagerTensor without building a function"

I am trying to build a neural network in Python for solving PDEs, and, as such, I have had to write custom training steps. My training function looks like this:
class PDENet:
    def train_step():
        input = self.input
        with tf.GradientTape() as tape, tf.Session() as sess:
            output = self.model(input)
            self.loss = self.pde_loss(output) # (network does not use training data)
        grad = tape.gradient(self.loss, self.model.trainable_weights)
        self.optimizer.apply_gradients([(grad, self.model)])
Due to my hardware, I have no choice but to use tensorflow==1.12.0 and keras==2.2.4.
When I run this code, I get "RuntimeError: Attempting to capture an EagerTensor without building a function". I have seen other posts about this, but all of the answers say either to update tensorflow/keras, which I can't do; to use "tf.enable_eager_execution()", which I've already done; or to use "tf.disable_v2_behavior()", which does not exist in older versions of tensorflow. Is there anything else I can do to solve this problem? The error makes me think tensorflow wants me to add @tf.function, but that feature also doesn't seem to exist in tensorflow 1.

Running multiple TensorRT optimized models in Tensorflow

My project uses multiple Keras models. Their inputs can have different batch sizes, varying from 1 to 24. I decided to optimize those models using TF-TRT.
I tried 2 conversion approaches:
from tensorflow.python.compiler.tensorrt import trt_convert as trt
The first approach converts the model but does not create TensorRT engines for it:
conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
converter = trt.TrtGraphConverterV2(
The second approach converts the model and builds a TensorRT engine for every possible input shape:
def input_function():
    input_shapes = [(x, MODEL_INPUT_H, MODEL_INPUT_W, 3) for x in range(1, 25)]
    for shape in input_shapes:
        yield [np.random.normal(size=shape).astype(np.float32)]
conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
converter = trt.TrtGraphConverterV2(
In the script that uses my models, I run them consecutively:
some loop:
When the first conversion approach is used to optimize the models, I am able to load all models, but at runtime Tensorflow rebuilds TensorRT engines every time a model execution context changes. This causes a large performance overhead, which I was trying to overcome by caching TensorRT engines for those models (second conversion approach).
The problem is that when I try to load more than one TensorRT-optimized model with pre-built engines, Tensorflow throws the following error:
2020-04-01 09:11:44.820866: W tensorflow/core/common_runtime/] BaseCollectiveExecutor::StartAbort Internal: Expect engine cache to be empty, but got 24 entries.
[[{{node StatefulPartitionedCall/InitializeTRTResource}}]]
Error - Expect engine cache to be empty, but got 24 entries.
[[{{node StatefulPartitionedCall/InitializeTRTResource}}]] [Op:__inference_restored_function_body_64832]
Function call stack:
The same error occurs when only one engine is saved for each model.
I use the following code to load TensorRT optimized SavedModel:
saved_model_loaded = tf.saved_model.load(
graph_func = saved_model_loaded.signatures['serving_default']
I also tried to convert graph_func to frozen_func, but this didn't make any difference:
graph_func = saved_model_loaded.signatures[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
frozen_func = convert_to_constants.convert_variables_to_constants_v2(
I am using a Docker container to optimize and run the models.
Is it possible at all to run multiple TensorRT-optimized models with pre-built engines simultaneously using Tensorflow? Or can this only be done using TensorRT inference server?
If it is a valid usage scenario, what am I missing in my workflow?
You can avoid the issue with "Error - Expect engine cache to be empty, but got X entries."
You need to perform TensorRT optimizations and build engines for both models in the same Python file.
if __name__ == "__main__":

Tensorflow Dataset .map() API

A couple of questions about this:
For occasions when I'd like to do something like the following in Tensorflow (assume I'm creating training examples by loading WAV files):
import tensorflow as tf
def _some_audio_preprocessing_func(filename):
    # ... some logic here which mostly uses Tensorflow ops ...
    with tf.Session(graph=tf.Graph()) as sess:
        wav_filename_placeholder = tf.placeholder(tf.string, [])
        wav_loader = io_ops.read_file(wav_filename_placeholder)
        wav_decoder = contrib_audio.decode_wav(wav_loader, desired_channels=1)
        data =
        feed_dict={wav_filename_placeholder: filename})
        return data
dataset ='*.wav')
dataset =
If I have a parse_image() function that uses tensor ops, should this be part of the main Graph? Following the example set in Google's own audio TF tutorial, it looks like they create a separate graph! Doesn't this ruin the point of using Tensorflow to make things faster?
Do I use tf.py_func() any time any single line isn't from the tensorflow library? Again, I wonder what the performance implications are and when I should use this...
When you use Dataset.map(), TensorFlow defines a subgraph for all the ops created in the function map_func, and arranges to execute it efficiently in the same session as the rest of your graph. There is almost never any need to create a tf.Graph or tf.Session inside map_func: if your parsing function is made up of TensorFlow ops, these ops can be embedded directly in the graph that defines the input pipeline.
The modified version of the code using Dataset.map() would look like this:
import tensorflow as tf
from tensorflow.contrib.framework.python.ops import audio_ops as contrib_audio
def _some_audio_preprocessing_func(filename):
    wav_loader = tf.read_file(filename)
    return contrib_audio.decode_wav(wav_loader, desired_channels=1)
dataset ='*.wav')
dataset =
If your map_func contains non-TensorFlow operations that you want to apply to each element, you should wrap them in a tf.py_func() (or Dataset.from_generator(), if the data generation process is defined in Python logic). The main performance implication is that any code running in a tf.py_func() is subject to the Global Interpreter Lock, so I would generally recommend trying to find a native TensorFlow implementation for anything that is performance critical.

How to Deploy Amazon-SageMaker Locally in Python

I trained my model in Amazon-SageMaker and downloaded it to my local computer. Unfortunately, I don't have any idea how to run the model locally.
The Model is in a directory with files like:
Would anyone know how to run this locally with Python, or be able to point me to a resource that could help? I am trying to avoid calling the model using the Amazon API.
Edit: The model I used was created with code very similar to this example.
Any help is appreciated, I will award the bounty to whoever is most helpful, even if they don't completely solve the question.
This is not a complete answer: I do not have SageMaker set up, and I do not know MXNet, so I cannot test this approach in practice. Treat it as a pointer toward a possible solution rather than a full answer.
The Assumption -
You mentioned that your model is very similar to the notebook link you provided. If you read the text in the notebook carefully, you will see at some point there is something like this -
"In this demo, we are using Caltech-256 dataset, which contains 30608 images of 256 objects. For the training and validation data, we follow the splitting scheme in this MXNet example."
See the mention of MXNet there? Let us assume that you did not change a lot and hence your model is built using MXNet as well.
The Approach -
Assuming what I just mentioned, if you search the documentation of the AWS SageMaker Python SDK you will see a section about serialization of the modules, which itself starts with another assumption -
"If your train function returns a Module object, it will be serialized by the default Module serialization system, unless you've specified a custom save function."
Assuming that this is true for your case, further reading in the same document tells us that "model-shapes.json" is a JSON-serialized representation of your models, "model-symbol.json" is the serialization of the module symbols created by calling the 'save' function on the 'symbol' property of the module, and finally "module.params" is the serialized form of the module parameters (I am not sure whether it is a text or binary format).
Equipped with this knowledge we go and look into the documentation of MXNet. And Voila! We see here how we can save and load models with MXNet. So as you already have those saved files, you just need to load them in a local installation of MXNet and then run them to predict the unknown.
I hope this will help you to find a direction to solve your problem.
Bonus -
I am not sure whether this can do the same job (it was also mentioned by @Seth Rothschild in the comments), but it should: the AWS SageMaker Python SDK also has a way to load models from saved ones.
Following SRC's advice, I was able to get it to work by following the instructions in this question and this doc, which describe how to load an MXNet model.
I loaded the model like so:
lenet_model = mx.mod.Module.load('model_directory/image-classification',5)
image_l = 64
image_w = 64
lenet_model.bind(for_training=False, data_shapes=[('data',(1,3,image_l,image_w))],label_shapes=lenet_model._label_shapes)
Then predicted using the slightly modified helper functions in the previously linked documentation:
import mxnet as mx
import matplotlib.pyplot as plot
import cv2
import numpy as np
from import DataBatch
def get_image(url, show=False):
    # download and show the image
    fname =
    img = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2RGB)
    if img is None:
        return None
    if show:
    # convert into format (batch, RGB, width, height)
    img = cv2.resize(img, (64, 64))
    img = np.swapaxes(img, 0, 2)
    img = np.swapaxes(img, 1, 2)
    img = img[np.newaxis, :]
    return img
def predict(url, labels):
    img = get_image(url, show=True)
    # compute the predict probabilities
    prob = lenet_model.get_outputs()[0].asnumpy()
    # print the top-5
    prob = np.squeeze(prob)
    a = np.argsort(prob)[::-1]
    for i in a[0:5]:
        print('probability=%f, class=%s' %(prob[i], labels[i]))
Finally I called the prediction with this code:
labels = ['a','b','c', 'd','e', 'f']
predict('https://eximagesite/img_tst_a.jpg', labels )
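The two array manipulations in the helpers above, reordering the image axes and picking the top classes, can be checked in isolation with NumPy alone. This is only a sketch: `hwc_to_nchw` and `top_k` are hypothetical helper names, and the small arrays are made-up data rather than real images or model output.

```python
import numpy as np


def hwc_to_nchw(img):
    """Reorder an (H, W, C) image to (1, C, H, W), as get_image does."""
    img = np.swapaxes(img, 0, 2)   # (H, W, C) -> (C, W, H)
    img = np.swapaxes(img, 1, 2)   # (C, W, H) -> (C, H, W)
    return img[np.newaxis, :]      # add the batch axis


def top_k(prob, k=5):
    """Return the indices of the k largest probabilities, largest first."""
    return np.argsort(prob)[::-1][:k]


# Made-up 4x5 "image" with 3 channels, and made-up class scores.
img = np.arange(4 * 5 * 3).reshape(4, 5, 3)
batch = hwc_to_nchw(img)
prob = np.array([0.1, 0.6, 0.3])
best = top_k(prob, k=2)
```

The two `swapaxes` calls together are equivalent to `np.transpose(img, (2, 0, 1))`, which may read more directly as "channels first".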
If you want to host your trained model locally, and you are using Apache MXNet as your model framework (as you have in the above example), the simplest way is to use MXNet Model Server:
Once you have installed it locally, you can start serving using:
mxnet-model-server \
--models squeezenet=
and then call the local endpoint with the image
curl -O
curl -X POST -F "data=@kitten.jpg"

where do I find bidirectional_rnn in tensorflow 1.0.0?

I am using some code from here: with tensorflow. I think the code was written for an older version of tensorflow, I am using version 1.0.0. I used to upgrade in that github repos, but I am still getting the error:
output, _, _ = contrib_rnn.bidirectional_rnn(fw_cell, bw_cell,
AttributeError: 'module' object has no attribute 'bidirectional_rnn'
This is after I changed the bidirectional_rnn call to use contrib_rnn, which is imported as:
from tensorflow.contrib.rnn.python.ops import core_rnn as contrib_rnn
The old call was
output, _, _ = tf.nn.bidirectional_rnn(fw_cell, bw_cell,
    tf.unpack(tf.transpose(self.input_data, perm=[1, 0, 2])),
    dtype=tf.float32, sequence_length=self.length)
which also doesn't work.
I had to change the LSTMCell, DropoutWrapper, etc. to rnn.LSTMCell, but they seem to work fine. It is the bidirectional_rnn that I can't figure out how to change.
In TensorFlow 1.0, you have the choice of two bidirectional RNN functions:
Maybe you can try to reimplement a bidirectional RNN by simply wrapping into a single class two monodirectional RNNs with the parameter "go_backwards=True" set on one of them. Then you can also have control over the type of merge done with the outputs. Maybe taking a look at the implementation in (see the class Bidirectional) could get you started.
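The wrapping idea described above can be sketched without any framework. The following is a toy NumPy illustration, not TensorFlow or Keras code: `simple_rnn`, `bidirectional`, the weight shapes and the `merge` options are all hypothetical names chosen for this sketch, and the cell is a plain tanh recurrence rather than an LSTM.

```python
import numpy as np


def simple_rnn(inputs, w_x, w_h, go_backwards=False):
    """Run a tanh RNN over inputs of shape (time, features).

    Returns the hidden state at every step, shape (time, units),
    always aligned with the original time order.
    """
    steps = inputs[::-1] if go_backwards else inputs
    h = np.zeros(w_h.shape[0])
    outputs = []
    for x_t in steps:
        h = np.tanh(x_t @ w_x + h @ w_h)
        outputs.append(h)
    outputs = np.stack(outputs)
    # The backward pass produced states in reversed order; flip them back.
    return outputs[::-1] if go_backwards else outputs


def bidirectional(inputs, fw_weights, bw_weights, merge="concat"):
    """Combine a forward and a backward RNN with a chosen merge mode."""
    fw = simple_rnn(inputs, *fw_weights)
    bw = simple_rnn(inputs, *bw_weights, go_backwards=True)
    if merge == "concat":
        return np.concatenate([fw, bw], axis=-1)
    if merge == "sum":
        return fw + bw
    raise ValueError("unknown merge mode: %s" % merge)


rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 3))                       # 5 steps, 3 features
fw_weights = (rng.normal(size=(3, 4)), rng.normal(size=(4, 4)))
bw_weights = (rng.normal(size=(3, 4)), rng.normal(size=(4, 4)))
out = bidirectional(inputs, fw_weights, bw_weights)    # shape (5, 8)
```

Keeping the two directions as separate calls is what gives control over the merge step: concatenation doubles the feature size, while a sum keeps it unchanged.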
