I am running a Generative Adversarial Network on my personal system and I am getting the error shown below. It may be caused by a GPU access problem, as explained in this link: (Function call stack: keras_scratch_graph Error)
Since I want to run the code on my personal system, which does not have a GPU, how can I make sure the code does not try to access the GPU?
The Python code is provided in this link: (https://github.com/eriklindernoren/Keras-GAN/tree/master/pix2pix), where the running code is in the pix2pix.py file.
The produced error is as follows:
C:\ProgramData\Anaconda2\envs\GaitRecognitionCNN-master13\lib\site-packages\keras\engine\training.py:297: UserWarning: Discrepancy between trainable weights and collected trainable weights, did you set `model.trainable` without calling `model.compile` after ?
'Discrepancy between trainable weights and collected trainable'
2020-04-08 17:42:33.366720: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Failed precondition: Error while reading resource variable _AnonymousVar131 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar131/class tensorflow::Var does not exist.
[[{{node mul_33/ReadVariableOp}}]]
Traceback (most recent call last):
File "H:/data_rar_and_others/Code_For_GAN5/pix2pix.py", line 217, in <module>
gan.train(epochs=220, batch_size=4, sample_interval=50)
File "H:/data_rar_and_others/Code_For_GAN5/pix2pix.py", line 165, in train
d_loss_real = self.discriminator.train_on_batch([imgs_A, imgs_B], valid)
File "C:\ProgramData\Anaconda2\envs\GaitRecognitionCNN-master13\lib\site-packages\keras\engine\training.py", line 1514, in train_on_batch
outputs = self.train_function(ins)
File "C:\ProgramData\Anaconda2\envs\GaitRecognitionCNN-master13\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3727, in __call__
outputs = self._graph_fn(*converted_inputs)
File "C:\ProgramData\Anaconda2\envs\GaitRecognitionCNN-master13\lib\site-packages\tensorflow_core\python\eager\function.py", line 1551, in __call__
return self._call_impl(args, kwargs)
File "C:\ProgramData\Anaconda2\envs\GaitRecognitionCNN-master13\lib\site-packages\tensorflow_core\python\eager\function.py", line 1591, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "C:\ProgramData\Anaconda2\envs\GaitRecognitionCNN-master13\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "C:\ProgramData\Anaconda2\envs\GaitRecognitionCNN-master13\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
ctx=ctx)
File "C:\ProgramData\Anaconda2\envs\GaitRecognitionCNN-master13\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable _AnonymousVar131 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar131/class tensorflow::Var does not exist.
[[node mul_33/ReadVariableOp (defined at C:\ProgramData\Anaconda2\envs\GaitRecognitionCNN-master13\lib\site-packages\keras\backend\tensorflow_backend.py:3009) ]] [Op:__inference_keras_scratch_graph_6898]
Function call stack:
keras_scratch_graph
This error was resolved when I changed
from keras.optimizers import Adam
to the following:
from tensorflow.keras.optimizers import Adam
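For completeness, since the original question asks how to keep TensorFlow off the GPU entirely: a common approach (a minimal sketch, independent of the pix2pix code) is to hide all GPU devices via an environment variable before TensorFlow is imported:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide all GPUs from TensorFlow

import tensorflow as tf  # must be imported after the variable is set
print(tf.test.is_gpu_available())  # sanity check: should print False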
Related
How can I fix this error? I downloaded this code from GitHub.
predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].numpy()
throws the error
AttributeError: 'Tensor' object has no attribute 'numpy'
Please help me fix this!
I used:
sess = tf.Session()
with sess.as_default():
    predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].eval()
And I get this error. Can someone help me get this working?
D:\Python>python TextGenOut.py
File "TextGenOut.py", line 72
predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].eval()
^
IndentationError: unexpected indent
D:\Python>python TextGenOut.py
2018-09-16 21:50:57.008663: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-09-16 21:50:57.272973: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1275] OP_REQUIRES failed at resource_variable_ops.cc:480 : Not found: Container localhost does not exist. (Could not find resource: localhost/model/embedding/embeddings)
Traceback (most recent call last):
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1278, in _do_call
return fn(*args)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable model/dense/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/model/dense/kernel)
[[Node: model/dense/MatMul/ReadVariableOp = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/dense/kernel)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "TextGenOut.py", line 72, in <module>
predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].eval()
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 680, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 4951, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 877, in run
run_metadata_ptr)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1272, in _do_run
run_metadata)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable model/dense/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/model/dense/kernel)
[[Node: model/dense/MatMul/ReadVariableOp = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/dense/kernel)]]
Caused by op 'model/dense/MatMul/ReadVariableOp', defined at:
File "TextGenOut.py", line 66, in <module>
predictions, hidden = model(input_eval, hidden)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\base_layer.py", line 736, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "TextGenOut.py", line 39, in call
x = self.fc(output)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\base_layer.py", line 736, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\layers\core.py", line 943, in call
outputs = gen_math_ops.mat_mul(inputs, self.kernel)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_math_ops.py", line 4750, in mat_mul
name=name)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 510, in _apply_op_helper
preferred_dtype=default_dtype)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1094, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1045, in _dense_var_to_tensor
return var._dense_var_to_tensor(dtype=dtype, name=name, as_ref=as_ref) # pylint: disable=protected-access
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1000, in _dense_var_to_tensor
return self.value()
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 662, in value
return self._read_variable_op()
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 745, in _read_variable_op
self._dtype)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_resource_variable_ops.py", line 562, in read_variable_op
"ReadVariableOp", resource=resource, dtype=dtype, name=name)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 3155, in create_op
op_def=op_def)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()
FailedPreconditionError (see above for traceback): Error while reading resource variable model/dense/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/model/dense/kernel)
[[Node: model/dense/MatMul/ReadVariableOp = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/dense/kernel)]]
I suspect the place where you copied the code from had eager execution enabled, i.e. had invoked tf.enable_eager_execution() at the start of the program.
You could do the same.
UPDATE: Note that eager execution is enabled by default in TensorFlow 2.0, so the answer above applies only to TensorFlow 1.x.
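In TensorFlow 1.x that looks like this (a minimal sketch; predictions here is just a stand-in for the model output):
import tensorflow as tf
tf.enable_eager_execution()  # call once, at the very start of the program

predictions = tf.constant([[1.0, 2.0, 3.0]])  # stand-in for the model output
# With eager execution enabled, tf.multinomial returns an EagerTensor,
# so .numpy() works:
predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].numpy()
print(predicted_id)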
Since the accepted answer did not solve the problem for me, I thought this might be helpful for people who face this problem and already have TensorFlow version >= 2.2.0 with eager execution enabled.
The issue seems to be that for certain functions called during model.fit(),
the @tf.function decorator prohibits the execution of functions like tensor.numpy() for performance reasons.
The solution for me was to pass the flag run_eagerly=True to model.compile(), like this:
model.compile(..., run_eagerly=True)
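In context (a sketch with a hypothetical toy model; the only relevant part is the run_eagerly flag):
import tensorflow as tf

# Hypothetical toy model; any model works the same way.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
# With run_eagerly=True, tensors inside custom losses/metrics/train steps are
# eager tensors during fit(), so calling .numpy() on them works.
model.compile(optimizer="adam", loss="mse", run_eagerly=True)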
TensorFlow 2 has a config option to run functions "eagerly", which enables getting Tensor values via the .numpy() method. To enable eager execution of tf.functions, use the following command:
tf.config.run_functions_eagerly(True)
Note that this is useful mainly for debugging.
See also: https://www.tensorflow.org/api_docs/python/tf/config/run_functions_eagerly
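For example (a minimal sketch):
import tensorflow as tf

@tf.function
def double(x):
    print(x.numpy())  # would raise AttributeError in graph mode
    return x * 2

tf.config.run_functions_eagerly(True)   # tf.functions now run eagerly
double(tf.constant(3))                  # .numpy() works inside the function
tf.config.run_functions_eagerly(False)  # restore graph mode afterwards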
This can also happen in TF 2.0 if your code is wrapped in a @tf.function or inside a Keras layer. Both of those run in graph mode. There's a lot of secretly broken code out there, because behavior differs between eager and graph modes and people are not aware that they're switching contexts, so be careful!
This happens in older versions of TF. So try pip install tensorflow --upgrade.
Otherwise, run:
import tensorflow as tf
tf.enable_eager_execution()
If you are using Jupyter notebook, restart the Kernel.
tf.multinomial returns a Tensor object that contains a 2D list with drawn samples of shape [batch_size, num_samples]. Calling .eval() on that tensor object is expected to return a numpy ndarray.
Something like this:
predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].eval()
You also need to ensure that a session is active (it doesn't make a lot of sense otherwise):
sess = tf.Session()
with sess.as_default():
    predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].eval()
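Note that if the graph contains variables, as a trained model does, they also have to be initialized in that session first; otherwise you get exactly the FailedPreconditionError from the question. A minimal self-contained TF 1.x sketch (predictions again stands in for the model output):
import tensorflow as tf

predictions = tf.constant([[1.0, 2.0, 3.0]])  # stand-in for the model output

sess = tf.Session()
with sess.as_default():
    # Initialize all variables first; skipping this is a common cause of the
    # "variable was uninitialized" FailedPreconditionError above.
    sess.run(tf.global_variables_initializer())
    predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].eval()
print(predicted_id)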
I saw a similar error when I ran code like the following:
tensor = tf.multiply(ndarray, 42)
tensor.numpy() # throw AttributeError: 'Tensor' object has no attribute 'numpy'
I use Anaconda 3 with TensorFlow 1.14.0. I upgraded TensorFlow with the command below:
conda update tensorflow
Now TensorFlow is 2.0.0 and the issue is fixed. Try this to see if it resolves your issue.
I had the same issue inside a tf.function. What worked for me was to transform the NumPy array into a TensorFlow tensor via tf.convert_to_tensor (see the docs) and then go ahead with TensorFlow ops. Maybe this trick could be useful for someone...
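Roughly like this (a minimal sketch):
import numpy as np
import tensorflow as tf

arr = np.array([1.0, 2.0, 3.0])

@tf.function
def scale(x):
    # Convert once and stay with tensors instead of calling .numpy() inside.
    t = tf.convert_to_tensor(x, dtype=tf.float32)
    return t * 42.0

print(scale(arr))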
You can also use tf.get_static_value() to obtain the value of a tensor. This has the benefit of not needing eager mode. See docs here.
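For example (a minimal sketch; it returns None when the value cannot be determined statically):
import tensorflow as tf

t = tf.constant([1, 2, 3])
value = tf.get_static_value(t)  # numpy array if statically known, else None
print(value)                    # [1 2 3]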
As mentioned, I want to use a pre-trained model, but only as a feature extractor. My idea is to build a new model and load the weights into it:
model = get_tiny_darknet_model() #get the structure of the model
transfer_layer = model.get_layer('global_average_pooling2d')
feature_extractor = Model(inputs=model.input,
                          outputs=transfer_layer.output)
feature_extractor.load_weights('checkpoints\\checkpoint.h5', by_name=True)
But when I try to use this feature extractor to make predictions, an error occurs.
train_data = raw_preprocess(train_data)  # use an ImageDataGenerator to preprocess the data
train_samples = feature_extractor.predict(train_data)
And after I run this code, the following error occurs:
2020-06-12 16:22:09.409465: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-06-12 16:22:10.830022: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-06-12 16:22:10.830519: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-06-12 16:22:10.830794: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node conv2d/Conv2D}}]]
Traceback (most recent call last):
File "E:/Studium/Thesis/loss_test.py", line 35, in <module>
test_mmd()
File "E:/Studium/Thesis/loss_test.py", line 24, in test_mmd
c_train_samples = feature_extractor.predict(train_data)
File "C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 909, in predict
use_multiprocessing=use_multiprocessing)
File "C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\keras\engine\training_generator.py", line 648, in predict
use_multiprocessing=use_multiprocessing)
File "C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\keras\engine\training_generator.py", line 265, in model_iteration
batch_outs = batch_function(*batch_data)
File "C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\keras\engine\training_generator.py", line 535, in predict_on_batch
return model.predict_on_batch(x)
File "C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 1157, in predict_on_batch
outputs = self.predict_function(inputs)
File "C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3740, in __call__
outputs = self._graph_fn(*converted_inputs)
File "C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\eager\function.py", line 1081, in __call__
return self._call_impl(args, kwargs)
File "C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\eager\function.py", line 1121, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\eager\function.py", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\eager\function.py", line 511, in call
ctx=ctx)
File "C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node conv2d/Conv2D (defined at C:\ProgramData\Miniconda3\envs\TF_2G\lib\site-packages\tensorflow_core\python\framework\ops.py:1751) ]] [Op:__inference_keras_scratch_graph_2495]
Function call stack:
keras_scratch_graph
Process finished with exit code 1
However, if I just load the model and use it directly like:
model = load_model('checkpoints\\checkpoint.h5')
train_data = raw_preprocess(train_data)
pred = model.predict(train_data)
Then everything works just fine and no error occurs.
What could possibly be the reason for that? How can I solve this problem?
I also checked the output while running and saw something like a warning:
2020-06-12 21:18:36.913051: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
And when I try to reload the original model, a new problem occurs:
WARNING:tensorflow:Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.
WARNING:tensorflow:sample_weight modes were coerced from
...
to
['...']
This is a problem that I've never seen before when loading a model. What could cause it, and how can I solve it? I've tried building a new conda environment, but it doesn't help.
If you're using IPython (Spyder or Jupyter), this pretty frequently comes up when you have forgotten to properly close an old kernel that was using TensorFlow.
I have the following problem:
I have created a model using DeepChem, which is a wrapped Keras model, trained it, and reloaded it. I can predict using this model without a problem.
Now I want to make a copy of this model which has one fewer input, since one input is always constant in my use scenario and always passing it leads to errors in a function I can't edit.
data = np.array(data.data, dtype=float32)
with tf.Graph().as_default() as temp_graph:
    tf.import_graph_def(self.model.session.graph.as_graph_def(),
                        input_map={self.model._input_placeholders[1].name:
                                   tf.constant(np.array([0], dtype=float32))})
    #self.model.session.graph = temp_graph
    #for deep explainer: replace all switched dropouts with dropouts
    #get input tensor for this graph
    tensors = tf.contrib.graph_editor.get_tensors(temp_graph)
    for t in tensors:
        if "input_1" in t.name:
            input_tensor = t
            break
    #reshape output --> only singletask!
    output = tf.reshape(tensors[-1], [-1, 1])
    model = (input_tensor, output)
sess = tf.Session(graph=temp_graph)
feed_dict = dict(zip([input_tensor], [data]))
print(sess.run(output, feed_dict))
With this code fragment I was able to load the graph of my model and pass a constant into its input. Now, obviously, I can't run this new model in the same session, since that session contains the old model. The way the model is run with the feed dict can't be changed, since it is in another package in the real scenario. I get the following error message:
Error while reading resource variable dense_2/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist.
The full trace is:
Traceback (most recent call last):
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable dense_2/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense_2/bias)
[[{{node import/model/dense_2/BiasAdd/ReadVariableOp}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 490, in <module>
main()
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 478, in main
evaluate()
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 445, in evaluate
reader.explain()
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 1534, in explain
self.explain()
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 1519, in explain
self._explain_Gradient_SHAP(self.df)
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 2047, in _explain_Gradient_SHAP
print(sess.run(output, feed_dict))
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable dense_2/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense_2/bias)
[[node import/model/dense_2/BiasAdd/ReadVariableOp (defined at /eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py:2033) ]]
Original stack trace for 'import/model/dense_2/BiasAdd/ReadVariableOp':
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 490, in <module>
main()
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 478, in main
evaluate()
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 445, in evaluate
reader.explain()
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 1534, in explain
self.explain()
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 1519, in explain
self._explain_Gradient_SHAP(self.df)
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 2033, in _explain_Gradient_SHAP
tf.constant(np.array([0], dtype=float32)),})
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 443, in import_graph_def
_ProcessNewOps(graph)
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 236, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3751, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3751, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3641, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
I am using TensorFlow 1.14 and Python 3.6 (this can't be changed either).
So my problem could be solved in two different ways: either I get the second graph to run with the information that is in the old session, or I get the old session to use one constant input.
Thanks for any help in advance!
best regards
Tobias
Edit:
I eventually fixed this by wrapping the class I was trying to use and overwriting some methods. I think another idea could have been to replace one Keras input with a Keras constant.
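That second idea would look roughly like this (a hedged sketch with a hypothetical two-input Keras model standing in for the wrapped DeepChem model, not the actual DeepChem internals):
import tensorflow as tf
from tensorflow.keras import layers, Model

# Hypothetical original two-input model.
inp_a = layers.Input(shape=(10,))
inp_b = layers.Input(shape=(1,))
out = layers.Dense(1)(layers.Concatenate()([inp_a, inp_b]))
original = Model([inp_a, inp_b], out)

# New single-input model: derive a constant second input (zeros here) from
# inp_a so the batch dimension matches, then reuse the original's weights.
const_b = layers.Lambda(lambda t: tf.zeros_like(t[:, :1]))(inp_a)
reduced = Model(inp_a, original([inp_a, const_b]))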
This error is a little tricky. Here are a couple of suggestions that spring to mind:
DeepChem HEAD is now running on TensorFlow 2.X. If your problem would be easier to handle in Eager mode, that might be one option. Of course, HEAD isn't stable and there might be other issues that crop up there.
DeepChem models are, under the hood, just made of Keras layers. If you can make a Keras model from the constituent layers of your model, then you can possibly avoid the DeepChem wrapper and solve the problem directly in Keras.
It might also help to add more information on the DeepChem model you're trying to use and the downstream function you're seeing an error in.
export_saved_model used on TPUEstimator raises TypeError: Failed to convert object of type to Tensor with TensorFlow 1.12.0. Am I using it incorrectly, or, if it is a bug, is there some workaround?
I would like to train a model on TPU using TPUEstimator and then use the trained model locally on CPU. I cannot use the graph saved during training directly, but I need to use export_saved_model instead (GitHub issue).
export_saved_model on TPUEstimator works correctly with TensorFlow 1.13.0rc0, but it fails with the current TensorFlow 1.12.0 (another GitHub issue). At the moment, however, TPUs with TensorFlow 1.13 are not available on Google Cloud, and TPUs with TensorFlow 1.12 are not compatible with it, so upgrading TensorFlow to 1.13 is not an option.
The relevant code is:
def serving_input_receiver_fn():
    feature = tf.placeholder(tf.float32, shape=[None, None, None, 2])
    return tf.estimator.export.TensorServingInputReceiver(feature, feature)

estimator.export_saved_model(FLAGS.export_dir, serving_input_receiver_fn)
Expected result:
The model should be exported correctly. This happens with TensorFlow 1.13.0rc0 or with TPUEstimator replaced by Estimator. The former can be reproduced using this colab.
Actual result:
Exporting fails with TypeError: Failed to convert object of type, with the traceback included below. This can be reproduced with this colab.
...
WARNING:tensorflow:From /Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py:1044: calling SavedModelBuilder.add_meta_graph_and_variables (from tensorflow.python.saved_model.builder_impl) with legacy_init_op is deprecated and will be removed in a future version.
Instructions for updating:
Pass your op to the equivalent parameter main_op instead.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
WARNING:tensorflow:rewrite_for_inference (from tensorflow.contrib.tpu.python.tpu.tpu) is experimental and may change or be removed at any time, and without warning.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running infer on CPU
ERROR:tensorflow:Operation of type Placeholder (policy_labels) is not supported on the TPU. Execution will fail if this op is used in the graph.
ERROR:tensorflow:Operation of type Placeholder (sat_labels) is not supported on the TPU. Execution will fail if this op is used in the graph.
INFO:tensorflow:Done calling model_fn.
Traceback (most recent call last):
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 527, in make_tensor_proto
str_values = [compat.as_bytes(x) for x in proto_values]
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 527, in <listcomp>
str_values = [compat.as_bytes(x) for x in proto_values]
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/util/compat.py", line 61, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got dict_values([<tf.Tensor 'sat_prob:0' shape=(?,) dtype=float32>, <tf.Tensor 'policy_prob:0' shape=(?, ?, 2) dtype=float32>])
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "neurosat_tpu.py", line 253, in <module>
tf.app.run()
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "neurosat_tpu.py", line 248, in main
estimator.export_saved_model(FLAGS.export_dir, serving_input_receiver_fn)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 734, in export_saved_model
strip_default_attrs=True)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 663, in export_savedmodel
mode=model_fn_lib.ModeKeys.PREDICT)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 789, in _export_saved_model_for_mode
strip_default_attrs=strip_default_attrs)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 907, in _export_all_saved_models
mode=model_fn_lib.ModeKeys.PREDICT)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2188, in _add_meta_graph_for_mode
check_variables=False))
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 984, in _add_meta_graph_for_mode
config=self.config)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2192, in _call_model_fn
return self._call_model_fn_for_inference(features, labels, mode, config)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2253, in _call_model_fn_for_inference
new_tensors.append(array_ops.identity(t))
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 81, in identity
return gen_array_ops.identity(input, name=name)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3454, in identity
"Identity", input=input, name=name)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 513, in _apply_op_helper
raise err
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 510, in _apply_op_helper
preferred_dtype=default_dtype)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1146, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 229, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 208, in constant
value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/Users/michal/.virtualenvs/deepsat/lib/python3.6/site-packages/tensorflow/python/framework/tensor_util.py", line 531, in make_tensor_proto
"supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'dict_values'> to Tensor. Contents: dict_values([<tf.Tensor 'sat_prob:0' shape=(?,) dtype=float32>, <tf.Tensor 'policy_prob:0' shape=(?, ?, 2) dtype=float32>]). Consider casting elements to a supported type.
Adding the argument export_to_tpu=False to the TPUEstimator constructor prevents the error in TensorFlow 1.12:
estimator = tf.contrib.tpu.TPUEstimator(..., export_to_tpu=False)
export_to_tpu=False disables exporting the TPU version of the model, but the CPU version is still exported, and that is sufficient to run the model locally. In TensorFlow 1.13 the bug is fixed and the flag is no longer necessary.
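In context (a sketch; model_fn and run_config stand in for your existing setup):
estimator = tf.contrib.tpu.TPUEstimator(
    model_fn=model_fn,        # your existing model_fn
    config=run_config,        # your existing tf.contrib.tpu.RunConfig
    train_batch_size=1024,
    export_to_tpu=False)      # export only the CPU version of the model

estimator.export_saved_model(FLAGS.export_dir, serving_input_receiver_fn)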
The answer is based on the Github thread linked in the question.
I am running some TensorFlow code that restores and restarts training from a checkpoint. Whenever I restore from a CPU build, it seems to work perfectly fine. But if I try to restore when I run my code with a GPU, it seems not to work. In particular, I get the error:
Traceback (most recent call last):
File "/home_simulation_research/hbf_tensorflow_code/tf_experiments_scripts/batch_main.py", line 482, in <module>
large_main_hp.main_large_hp_ckpt(arg)
File "/usr/local/lib/python3.4/dist-packages/my_tf_pkg/main_large_hp_checkpointer.py", line 212, in main_large_hp_ckpt
run_hyperparam_search(arg)
File "/usr/local/lib/python3.4/dist-packages/my_tf_pkg/main_large_hp_checkpointer.py", line 231, in run_hyperparam_search
main_hp.main_hp(arg)
File "/usr/local/lib/python3.4/dist-packages/my_tf_pkg/main_hp.py", line 258, in main_hp
with tf.Session(graph=graph) as sess:
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1186, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 551, in __init__
self._session = tf_session.TF_NewDeprecatedSession(opts, status)
File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
next(self.gen)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
E tensorflow/core/common_runtime/direct_session.cc:135] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 18446744073709551615
I see that it says I am running out of memory, but when I increase the memory to, say, 10 GB, it doesn't really change anything. This only happens with my GPU build; the CPU one restores perfectly fine.
Does anyone have any ideas, or starting points, for what might be causing this?
The GPUs are essentially assigned automatically, so I'm not quite sure what might be causing it or what the first steps to debug this would even be.
Full error:
E tensorflow/core/common_runtime/direct_session.cc:135] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 18446744073709551615
Traceback (most recent call last):
File "/home_simulation_research/hbf_tensorflow_code/tf_experiments_scripts/batch_main.py", line 482, in <module>
large_main_hp.main_large_hp_ckpt(arg)
File "/usr/local/lib/python3.4/dist-packages/my_tf_pkg/main_large_hp_checkpointer.py", line 212, in main_large_hp_ckpt
run_hyperparam_search(arg)
File "/usr/local/lib/python3.4/dist-packages/my_tf_pkg/main_large_hp_checkpointer.py", line 231, in run_hyperparam_search
main_hp.main_hp(arg)
File "/usr/local/lib/python3.4/dist-packages/my_tf_pkg/main_hp.py", line 258, in main_hp
with tf.Session(graph=graph) as sess:
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 1186, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/client/session.py", line 551, in __init__
self._session = tf_session.TF_NewDeprecatedSession(opts, status)
File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
next(self.gen)
File "/usr/local/lib/python3.4/dist-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
TensorFlow on the CPU benefits from both physical and virtual memory, giving you almost unlimited memory to manipulate your models. Your first step in debugging is to build a smaller model, by simply removing some weights/layers, and run it on the GPU to ensure you have no coding errors. Then slowly increase the layers/weights until you run out of memory again. This will confirm that you have a memory issue on the GPU. I would recommend initially building your graph on the GPU; that way you know it will fit later when you train on it. If you need the large graph, then consider allocating parts of the graph to different GPUs if you have them.
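Allocating parts of the graph to different GPUs looks roughly like this in TF 1.x (a minimal sketch):
import tensorflow as tf

# Pin large variables to different GPUs so no single device holds everything.
with tf.device('/gpu:0'):
    w1 = tf.Variable(tf.random_normal([1000, 1000]), name='w1')
with tf.device('/gpu:1'):
    w2 = tf.Variable(tf.random_normal([1000, 1000]), name='w2')

y = tf.matmul(w1, w2)  # TensorFlow inserts the cross-device copies

# allow_soft_placement lets TF fall back to another device if one is missing.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(y)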