I'm stuck restoring a pre-trained network with TensorFlow.
import tensorflow as tf
import os
saver = tf.train.import_meta_graph('./model/20170512-110547/model-20170512-110547.meta')
I'd like to use a pre-trained network that was trained for face recognition, and then add some layers for transfer learning.
(I downloaded the model from https://github.com/davidsandberg/facenet.)
When I execute the code above, I get the following error:
WARNING:tensorflow:The saved meta_graph is possibly from an older release:
'model_variables' collection should be of type 'byte_list', but instead is of type 'node_list'.
Traceback (most recent call last):
File "/Users/user/Desktop/desktop/Python/HCR/Transfer_face/test.py", line 7, in <module>
File "/Users/user/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1560, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/Users/user/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 895, in run
File "/Users/user/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/Users/user/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/Users/user/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./model/20170512-110547/
[[Node: save/RestoreV2_491 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_491/tensor_names, save/RestoreV2_491/shape_and_slices)]]
Caused by op u'save/RestoreV2_491', defined at:
File "/Users/user/Desktop/desktop/Python/HCR/Transfer_face/test.py", line 6, in <module>
saver = tf.train.import_meta_graph('./model/20170512-110547/model-20170512-110547.meta')
File "/Users/user/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1698, in import_meta_graph
File "/Users/user/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/meta_graph.py", line 656, in import_scoped_meta_graph
File "/Users/user/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 313, in import_graph_def
File "/Users/user/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/user/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for ./model/20170512-110547/
[[Node: save/RestoreV2_491 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_491/tensor_names, save/RestoreV2_491/shape_and_slices)]]
I can't understand why TensorFlow can't find the pre-trained data.
The directory structure is as follows:
USER-no-MacBook-Pro:Transfer_face user$ ls -R
model test.py
Import the .pb file.
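import tensorflow as tf
from tensorflow.python.framework import tensor_util

# Read the frozen graph (TensorFlow 1.x API)
with tf.gfile.GFile('20170512-110547.pb', "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# import into default graph
tf.import_graph_def(graph_def, name='')

# print some data: in a frozen graph the weights are stored as Const nodes
wts = [n for n in graph_def.node if n.op == 'Const']
for n in wts:
    print("Name of the node - %s" % n.name)
    print("Value - ", tensor_util.MakeNdarray(n.attr['value'].tensor))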
You need to use the checkpoint path "./model/20170512-110547/model-20170512-110547.ckpt-250000" instead of the folder path.
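The point is that saver.restore() expects a checkpoint prefix (the file name without the .index / .data-00000-of-00001 suffix), not a directory. If you would rather not hard-code the step number, a small pure-Python helper can derive the prefix from the .index file. This is a sketch that assumes the directory holds exactly one TF V2 checkpoint; the helper name find_checkpoint_prefix is hypothetical.

```python
import glob
import os


def find_checkpoint_prefix(model_dir):
    """Return the checkpoint prefix to pass to saver.restore().

    Hypothetical helper: assumes model_dir contains exactly one TF V2
    checkpoint, i.e. files like '<prefix>.index' and
    '<prefix>.data-00000-of-00001'.
    """
    index_files = glob.glob(os.path.join(model_dir, "*.index"))
    if len(index_files) != 1:
        raise ValueError(
            "expected one .index file in %r, found %d"
            % (model_dir, len(index_files))
        )
    # Strip the '.index' suffix: what remains is the prefix Saver expects.
    return index_files[0][: -len(".index")]


# Usage with the facenet directory from the question (TensorFlow 1.x):
#   saver = tf.train.import_meta_graph(meta_path)
#   saver.restore(sess, find_checkpoint_prefix('./model/20170512-110547'))
```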
How can I fix this error? I downloaded this code from GitHub.
predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].numpy()
throws the error
AttributeError: 'Tensor' object has no attribute 'numpy'
Please help me fix this!
I used:
sess = tf.Session()
with sess.as_default():
    predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].eval()
And I get this error. Can someone help? I just want it to work; why is this so hard?
D:\Python>python TextGenOut.py
File "TextGenOut.py", line 72
predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].eval()
IndentationError: unexpected indent
D:\Python>python TextGenOut.py
2018-09-16 21:50:57.008663: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-09-16 21:50:57.272973: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1275] OP_REQUIRES failed at resource_variable_ops.cc:480 : Not found: Container localhost does not exist. (Could not find resource: localhost/model/embedding/embeddings)
Traceback (most recent call last):
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1278, in _do_call
return fn(*args)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1350, in _call_tf_sessionrun
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable model/dense/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/model/dense/kernel)
[[Node: model/dense/MatMul/ReadVariableOp = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/dense/kernel)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "TextGenOut.py", line 72, in <module>
predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].eval()
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 680, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 4951, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 877, in run
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1272, in _do_run
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\client\session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable model/dense/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/model/dense/kernel)
[[Node: model/dense/MatMul/ReadVariableOp = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/dense/kernel)]]
Caused by op 'model/dense/MatMul/ReadVariableOp', defined at:
File "TextGenOut.py", line 66, in <module>
predictions, hidden = model(input_eval, hidden)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\base_layer.py", line 736, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "TextGenOut.py", line 39, in call
x = self.fc(output)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\engine\base_layer.py", line 736, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\keras\layers\core.py", line 943, in call
outputs = gen_math_ops.mat_mul(inputs, self.kernel)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_math_ops.py", line 4750, in mat_mul
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 510, in _apply_op_helper
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1094, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1045, in _dense_var_to_tensor
return var._dense_var_to_tensor(dtype=dtype, name=name, as_ref=as_ref) # pylint: disable=protected-access
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 1000, in _dense_var_to_tensor
return self.value()
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 662, in value
return self._read_variable_op()
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\resource_variable_ops.py", line 745, in _read_variable_op
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\ops\gen_resource_variable_ops.py", line 562, in read_variable_op
"ReadVariableOp", resource=resource, dtype=dtype, name=name)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\util\deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 3155, in create_op
File "C:\Users\fried\AppData\Roaming\Python\Python36\site-packages\tensorflow\python\framework\ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()
FailedPreconditionError (see above for traceback): Error while reading resource variable model/dense/kernel from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/model/dense/kernel)
[[Node: model/dense/MatMul/ReadVariableOp = ReadVariableOp[dtype=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/dense/kernel)]]
I suspect the place where you copied the code from had eager execution enabled, i.e. had invoked tf.enable_eager_execution() at the start of the program.
You could do the same.
UPDATE: Note that eager execution is enabled by default in TensorFlow 2.0, so the answer above applies only to TensorFlow 1.x.
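In TensorFlow 2.x the original line also needs a rename: tf.multinomial was deprecated in favor of tf.random.categorical and is only reachable as tf.compat.v1.multinomial in 2.x. A minimal eager sketch, assuming TF 2.x and using made-up logits as a stand-in for the model's predictions:

```python
import tensorflow as tf  # assumes TensorFlow 2.x, where eager execution is on by default

# Made-up logits standing in for the model's `predictions`:
# a batch of one example over a 5-token vocabulary.
predictions = tf.constant([[0.1, 2.0, 0.3, -1.0, 0.5]])

# tf.random.categorical draws num_samples indices per row; shape is [1, 1].
samples = tf.random.categorical(predictions, num_samples=1)

# Because this runs eagerly, .numpy() is available on the result.
predicted_id = samples[0][0].numpy()
print(predicted_id)
```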
Since the accepted answer did not solve the problem for me, I thought this might help people who face the same problem and already have TensorFlow >= 2.2.0 with eager execution enabled.
The issue seems to be that, during fitting with model.fit(), certain functions are wrapped in the @tf.function decorator, which prohibits calls like tensor.numpy() for performance reasons.
The solution for me was to pass the flag run_eagerly=True to the model.compile() like this:
model.compile(..., run_eagerly=True)
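A minimal sketch of where the flag goes; the tiny Dense model and the zero-filled training data are hypothetical placeholders. Treat run_eagerly=True mainly as a debugging aid, since it gives up the speed of graph execution.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy model: 3 input features -> 1 output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(1),
])

# run_eagerly=True tells Keras not to wrap the train/eval steps in
# tf.function, so tensors seen during fit() are eager and support .numpy().
model.compile(optimizer="sgd", loss="mse", run_eagerly=True)

x = np.zeros((8, 3), dtype="float32")
y = np.zeros((8, 1), dtype="float32")
history = model.fit(x, y, epochs=1, verbose=0)
print(model.run_eagerly)
```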
TensorFlow 2 has a config option to run functions "eagerly", which enables getting Tensor values via the .numpy() method. To enable it, use the following command:
tf.config.run_functions_eagerly(True)
Note that this is useful mainly for debugging.
See also: https://www.tensorflow.org/api_docs/python/tf/config/run_functions_eagerly
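A minimal sketch of the effect, assuming TF 2.3 or later; the function name to_python_int is hypothetical. With the flag on, the body of the @tf.function runs as ordinary Python, so .numpy() works. The try/finally restores the default afterwards because the setting is global.

```python
import tensorflow as tf


@tf.function
def to_python_int(x):
    # .numpy() exists only on eager tensors. When this function is traced
    # into a graph (the default), x is symbolic and this line fails.
    return int(x.numpy()) + 1


tf.config.run_functions_eagerly(True)  # debugging aid: skip graph tracing
try:
    result = to_python_int(tf.constant(41))
finally:
    tf.config.run_functions_eagerly(False)  # restore the default

print(result)  # 42
```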
This can also happen in TF 2.0 if your code is wrapped in a @tf.function or runs inside a Keras layer. Both of those run in graph mode. There's a lot of secretly broken code out there because behavior differs between eager and graph modes and people are not aware that they're switching contexts, so be careful!
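A minimal sketch of the context switch, assuming TF 2.x with default settings; the function names are made up. Inside the @tf.function, use TF ops such as tf.print to inspect values, and call .numpy() only on the result once it is back in eager code.

```python
import tensorflow as tf


@tf.function
def scale(x):
    y = x * 42.0
    # Here y is a symbolic graph tensor, so y.numpy() would raise.
    # tf.print is a TF op, so it prints the actual values when the graph runs.
    tf.print("y =", y)
    return y


@tf.function
def is_eager_inside():
    # Evaluated at trace time; False because the body is being traced.
    return tf.constant(tf.executing_eagerly())


out = scale(tf.constant([1.0, 2.0]))
print(out.numpy())                # outside the function: an eager tensor
print(is_eager_inside().numpy())  # False (unless run_functions_eagerly is on)
```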
It happens in older versions of TF, so try
pip install tensorflow --upgrade
Otherwise, run
import tensorflow as tf
tf.enable_eager_execution()
If you are using Jupyter notebook, restart the Kernel.
tf.multinomial returns a Tensor object containing the drawn samples as a 2-D tensor of shape [batch_size, num_samples]. Calling .eval() on that tensor object is expected to return a NumPy ndarray.
Something like this:
predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].eval()
You also need to ensure that you have an active session (it doesn't make much sense otherwise):
sess = tf.Session()
with sess.as_default():
    predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].eval()
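For completeness, a minimal graph-and-session sketch written against tf.compat.v1 so it also runs on TF 2.x. The logits variable is a made-up stand-in for the model. The initializer step matters: reading a variable that was never initialized in this session is the kind of situation the FailedPreconditionError in the question ("This could mean that the variable was uninitialized") points to.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TF 1.x API, also available in TF 2.x

graph = tf.Graph()
with graph.as_default():
    # Made-up logits variable standing in for the model's predictions.
    logits = tf.Variable([[0.1, 2.0, 0.3, -1.0, 0.5]], name="logits")
    samples = tf1.multinomial(logits, num_samples=1)  # shape [1, 1]
    predicted = samples[0][0]
    init = tf1.global_variables_initializer()

with tf1.Session(graph=graph) as sess:
    sess.run(init)  # without this, reading `logits` raises FailedPreconditionError
    predicted_id = sess.run(predicted)  # already a NumPy integer, no .numpy() needed

print(predicted_id)
```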
I saw a similar error when I ran code like the following:
tensor = tf.multiply(ndarray, 42)
tensor.numpy() # throw AttributeError: 'Tensor' object has no attribute 'numpy'
I was using Anaconda 3 with TensorFlow 1.14.0. I upgraded TensorFlow with the command below:
conda update tensorflow
Now TensorFlow is 2.0.0 and the issue is fixed. Try this to see if it resolves your issue.
I had the same issue in a tf.function(). What worked for me was to convert the NumPy array into a TensorFlow tensor via tf.convert_to_tensor (see its documentation) and then continue with TensorFlow ops. Maybe this trick is useful for someone.
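A minimal sketch of that trick, assuming TF 2.x; weights and weighted_sum are made-up names. The NumPy data is moved into TensorFlow with tf.convert_to_tensor and the computation stays in TF ops, instead of converting tensors back to NumPy inside the function.

```python
import numpy as np
import tensorflow as tf

weights = np.array([1.0, 2.0, 3.0], dtype=np.float32)  # made-up NumPy data


@tf.function
def weighted_sum(x):
    # Bring the NumPy array into the graph as a tensor and keep working with
    # TF ops; calling x.numpy() here would fail because x is symbolic.
    w = tf.convert_to_tensor(weights)
    return tf.reduce_sum(w * x)


result = weighted_sum(tf.constant([1.0, 1.0, 1.0]))
print(result.numpy())  # 6.0
```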
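You can also use tf.get_static_value() to obtain the value of a tensor. This has the benefit of not needing eager mode, although it only works for values that can be determined without running the graph (it returns None otherwise). See the tf.get_static_value documentation.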
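A minimal sketch, assuming TF 2.x; add_constant and the seen dict are made up for illustration. Inside a traced tf.function, a constant's value is recoverable at graph-construction time, while a function argument is not.

```python
import tensorflow as tf

seen = {}  # filled in while the function is traced


@tf.function
def add_constant(x):
    c = tf.constant([1, 2, 3])
    # There is no .numpy() here, but values known at graph-construction
    # time can still be recovered:
    seen["constant"] = tf.get_static_value(c)  # NumPy array [1, 2, 3]
    seen["input"] = tf.get_static_value(x)     # None: x is only known at run time
    return x + c


result = add_constant(tf.constant([10, 20, 30]))
```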
I have the following problem:
I created a model using DeepChem (a wrapped Keras model), trained it, and reloaded it. I can predict with this model without a problem.
Now I want to make a copy of this model with one fewer input, since one input is always constant in my use case and always passing it leads to errors in a function I can't edit.
data = np.array(data.data, dtype=float32)
with tf.Graph().as_default() as temp_graph:
tf.constant(np.array([0], dtype=float32)),})
#self.model.session.graph = temp_graph
#for deep explainer: replace all switched dropouts with dropouts
#get input tensor for this graph
tensors = tf.contrib.graph_editor.get_tensors(temp_graph)
for t in tensors:
    if "input_1" in t.name:
        input_tensor = t
#reshape output --> only singletask!
output = tf.reshape(tensors[-1], [-1, 1])
model = (input_tensor, output)
sess = tf.Session(graph=temp_graph)
feed_dict = dict(zip([input_tensor], [data]))
print(sess.run(output, feed_dict))
In this code fragment I was able to load the graph of my model and feed a constant into one of its inputs. Obviously I can't run this new model in the same session, since that session contains the old model. The way the model is run with the feed dict can't be changed, since in the real scenario it lives in another package. I get the following error message:
Error while reading resource variable dense_2/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist.
The full trace is:
Traceback (most recent call last):
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable dense_2/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense_2/bias)
[[{{node import/model/dense_2/BiasAdd/ReadVariableOp}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 490, in <module>
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 478, in main
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 445, in evaluate
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 1534, in explain
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 1519, in explain
File "/EXT/Tobha/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 2047, in _explain_Gradient_SHAP
print(sess.run(output, feed_dict))
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 950, in run
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
File "/EXT/Tobha/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Error while reading resource variable dense_2/bias from Container: localhost. This could mean that the variable was uninitialized. Not found: Container localhost does not exist. (Could not find resource: localhost/dense_2/bias)
[[node import/model/dense_2/BiasAdd/ReadVariableOp (defined at /eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py:2033) ]]
Original stack trace for 'import/model/dense_2/BiasAdd/ReadVariableOp':
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 490, in <module>
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 478, in main
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/Models.py", line 445, in evaluate
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 1534, in explain
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 1519, in explain
File "/eclipse-workspace/Bachelorarbeit/toolbox_dc_2_3_0/python_source/DataHandling.py", line 2033, in _explain_Gradient_SHAP
tf.constant(np.array([0], dtype=float32)),})
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 443, in import_graph_def
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 236, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3751, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3751, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3641, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/.conda/envs/test_BA_Tobias_std_deepchem-2-3-0_py36_20200114/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
self._traceback = tf_stack.extract_stack()
I am using TensorFlow 1.14 and Python 3.6 (this can't be changed either).
So my problem could be solved in two ways: either I manage to run the second graph with the information that is in the old session, or I tell the old session to use one constant input.
Thanks for any help in advance!
best regards
I eventually fixed this by wrapping the class I was trying to use and overriding some methods. Another idea might have been to replace one Keras input with a Keras constant.
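A minimal sketch of the second idea, written for TF 2.x Keras rather than the asker's TF 1.14 + DeepChem setup. The two-input base model and FixedInputWrapper are hypothetical stand-ins: the wrapper exposes only the first input and fills the second with a constant column.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the reloaded two-input model.
feat_in = tf.keras.Input(shape=(3,), name="features")
flag_in = tf.keras.Input(shape=(1,), name="flag")
merged = tf.keras.layers.Concatenate()([feat_in, flag_in])
base_model = tf.keras.Model([feat_in, flag_in], tf.keras.layers.Dense(1)(merged))


class FixedInputWrapper(tf.keras.Model):
    """Exposes only the first input; the second is filled with a constant."""

    def __init__(self, base, constant_value):
        super().__init__()
        self.base = base
        self.constant_value = constant_value

    def call(self, features):
        batch_size = tf.shape(features)[0]
        # One constant column per example, matching the (None, 1) flag input.
        flag = tf.fill([batch_size, 1], self.constant_value)
        return self.base([features, flag])


wrapper = FixedInputWrapper(base_model, 0.0)
x = tf.ones((2, 3))
print(wrapper(x).shape)
```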
This error is a little tricky. Here are a couple of suggestions that spring to mind:
DeepChem HEAD is now running on TensorFlow 2.X. If your problem would be easier to handle in Eager mode, that might be one option. Of course, HEAD isn't stable and there might be other issues that crop up there.
DeepChem models are underneath the hood just made of Keras layers. If you can make a Keras model from the constituent layers of your model, then you can possibly avoid the DeepChem wrapper and solve the problem directly in Keras.
It might also help to add more information on the DeepChem model you're trying to use and the downstream function you're seeing an error in.
I am trying to import a pretrained tensorflow model (the default sound recognition one in the tutorial) and I keep getting this error.
I tried importing using both a checkpoint file and a .pb file, and as a beginner I have no idea what this error means. Any help would be appreciated!
I have tried this on Debian and Windows 10, with Python 3.5 and Python 3.6, and with multiple versions of TensorFlow.
Traceback (most recent call last):
File "C:\tmp\speech_commands_train\Ztest.py", line 4, in <module>
saver = tf.train.import_meta_graph('conv.ckpt-18000.meta')
File "C:\Dev\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 1927, in import_meta_graph
File "C:\Dev\Python36\lib\site-packages\tensorflow\python\framework\meta_graph.py", line 741, in import_scoped_meta_graph
File "C:\Dev\Python36\lib\site-packages\tensorflow\python\util\deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "C:\Dev\Python36\lib\site-packages\tensorflow\python\framework\importer.py", line 457, in import_graph_def
_RemoveDefaultAttrs(op_dict, producer_op_list, graph_def)
File "C:\Dev\Python36\lib\site-packages\tensorflow\python\framework\importer.py", line 227, in _RemoveDefaultAttrs
op_def = op_dict[node.op]
KeyError: 'DecodeWav'
This is the code that I am using to import:
import tensorflow as tf
sess = tf.Session()
saver = tf.train.import_meta_graph('conv.ckpt-18000.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
I am running into an error "libhdfs.so: cannot open shared object file: No such file or directory" (stack trace below) while trying to run a python script invoking a Tensorflow reader on a file stored in HDFS. I am running the script on a node on the cluster which has Tensorflow in a virtualenv, activated at the time of execution. I set the following environment variables before execution:
export HADOOP_HDFS_HOME=$HADOOP_HDFS_HOME:/opt/cloudera/parcels/CDH
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/cloudera/parcels/CDH/lib/libhdfs.so
I execute the script as such:
CLASSPATH=$($LD_LIBRARY_PATH} classpath --glob) python TEST.py
This is the code in the script:
import tensorflow as tf

filename_queue = tf.train.string_input_producer([
    "hdfs://hostname:port/user/hdfs/test.avro"])
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)

with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
Below is the stack trace of the error. Any ideas on what is causing it are appreciated. (I have already checked that the LD_LIBRARY_PATH variable has an explicit pointer to the libhdfs.so file prior to execution, and I am unable to figure out why it still cannot find the file.)
Traceback (most recent call last):
File "TEST.py", line 25, in <module>
File "/home/username/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
File "/home/username/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/username/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/username/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: libhdfs.so: cannot open shared object file: No such file or directory
[[Node: ReaderReadV2 = ReaderReadV2[_device="/job:localhost/replica:0/task:0/cpu:0"](WholeFileReaderV2, input_producer)]]
Caused by op u'ReaderReadV2', defined at:
File "TEST.py", line 19, in <module>
key, value = reader.read(filename_queue)
File "/home/username/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/io_ops.py", line 272, in read
return gen_io_ops._reader_read_v2(self._reader_ref, queue_ref, name=name)
File "/home/username/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 410, in _reader_read_v2
queue_handle=queue_handle, name=name)
File "/home/username/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
File "/home/username/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/username/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()
NotFoundError (see above for traceback): libhdfs.so: cannot open shared object file: No such file or directory
I also encountered this issue; the solution for me was copying the libhdfs.so file to $HADOOP_HDFS_HOME/lib/native.
If you don't know where this file is, run the following commands to find its location:
sudo updatedb
locate libhdfs.so
This will give you the location of the file. Next copy the file to $HADOOP_HDFS_HOME/lib/native:
cp locationOflibhdfs.so $HADOOP_HDFS_HOME/lib/native
Note: Replace locationOflibhdfs.so with the location of the libhdfs.so file.
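The answer relies on TensorFlow looking for libhdfs.so under $HADOOP_HDFS_HOME/lib/native before falling back to the dynamic loader, whose LD_LIBRARY_PATH entries must be directories rather than file paths. A small pure-Python check of those locations can save a trial run. It is a sketch under that assumed lookup order (it ignores the loader's cache and default system directories), and the function name find_libhdfs is hypothetical. Because it treats HADOOP_HDFS_HOME as a single directory, a value built by colon-appending, as in the question's export line, will not match.

```python
import os


def find_libhdfs(env=None):
    """Return the first libhdfs.so that the assumed lookup order finds, or None.

    Assumed order: $HADOOP_HDFS_HOME/lib/native/libhdfs.so first, then each
    directory listed in LD_LIBRARY_PATH. The loader's cache and default
    system directories are not checked.
    """
    env = os.environ if env is None else env
    candidates = []
    hdfs_home = env.get("HADOOP_HDFS_HOME")
    if hdfs_home:
        candidates.append(os.path.join(hdfs_home, "lib", "native", "libhdfs.so"))
    for directory in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if directory:
            # An entry naming the .so file itself, rather than its directory,
            # turns into a path like .../libhdfs.so/libhdfs.so here.
            candidates.append(os.path.join(directory, "libhdfs.so"))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


print(find_libhdfs())
```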
im2txt trains for a few thousand steps then halts with the following error.
I've checked the training files and they appear OK.
Running on Ubuntu 16.04, TF r0.11, GPU mode (GTX 970, 4 GB).
Could it be a lack of RAM?
INFO:tensorflow:global step 56396: loss = 2.4654 (0.41 sec/step)
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors.DataLossError'>, truncated record at 369740238
[[Node: ReaderRead = ReaderRead[_class=["loc:@TFRecordReader", "loc:@filename_queue"], _device="/job:localhost/replica:0/task:0/cpu:0"](TFRecordReader, filename_queue)]]
Caused by op u'ReaderRead', defined at:
File "/home/john/Developer/tensorflow/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 114, in <module>
File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "/home/john/Developer/tensorflow/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/train.py", line 65, in main
File "/home/john/Developer/tensorflow/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/show_and_tell_model.py", line 352, in build
File "/home/john/Developer/tensorflow/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/show_and_tell_model.py", line 153, in build_inputs
File "/home/john/Developer/tensorflow/tensorflow/models/im2txt/bazel-bin/im2txt/train.runfiles/im2txt/im2txt/ops/inputs.py", line 115, in prefetch_input_data
_, value = reader.read(filename_queue)
File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/io_ops.py", line 277, in read
return gen_io_ops._reader_read(self._reader_ref, queue_ref, name=name)
File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/ops/gen_io_ops.py", line 211, in _reader_read
queue_handle=queue_handle, name=name)
File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 748, in apply_op
File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2403, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/john/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1305, in __init__
self._traceback = _extract_stack()
DataLossError (see above for traceback): truncated record at 369740238
[[Node: ReaderRead = ReaderRead[_class=["loc:@TFRecordReader", "loc:@filename_queue"], _device="/job:localhost/replica:0/task:0/cpu:0"](TFRecordReader, filename_queue)]]
INFO:tensorflow:global step 56397: loss = 2.5540 (0.40 sec/step)
I have the same problem and I'm not sure why. I did not see any errors when creating the TFRecords. During training, the error appears near the end of the records. By the way, I am using TF 0.11rc.
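"Truncated record at <offset>" means the reader reached end-of-file partway through a record. One way to double-check a .tfrecord file outside TensorFlow is to walk its framing: each record is an 8-byte little-endian length, a 4-byte masked CRC of that length, the payload, and a 4-byte masked CRC of the payload. The sketch below is a hypothetical helper that only checks the lengths against the file size; it does not verify the CRC32C values, so it catches truncation but not other corruption.

```python
import os
import struct


def find_truncated_record(path):
    """Return the byte offset of the first truncated record, or None if intact.

    Walks the TFRecord framing: uint64 length, uint32 masked CRC of the
    length, `length` bytes of data, uint32 masked CRC of the data.
    CRCs are not verified.
    """
    size = os.path.getsize(path)
    offset = 0
    with open(path, "rb") as f:
        while offset < size:
            header = f.read(12)  # 8-byte length + 4-byte length CRC
            if len(header) < 12:
                return offset  # file ends inside a record header
            (length,) = struct.unpack("<Q", header[:8])
            end = offset + 12 + length + 4  # header + payload + payload CRC
            if end > size:
                return offset  # file ends inside a payload
            f.seek(end)
            offset = end
    return None
```

Running it over each training shard and comparing any reported offset with the one in the error message would show whether the file itself is cut short.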