I'm exploring the use of TensorFlow with custom ops. I built a simple switch op and verified it as suggested in the TensorFlow documentation. Now I'm trying to build a graph and call the run() method of a TensorFlow Session. My code is below, followed by the error I get. What should I do to fix it? Do I need to reinstall TensorFlow every time I add a new custom op to /user_ops/?
import tensorflow as tf
# Create a constant op that produces an integer value
input1 = tf.constant(10)
# Create another op that produces an integer value
input2 = tf.constant(5)
# Create an op that produces 0 or 1 as the control input to the switch
input3 = tf.constant(1)
# Create a switch op that takes input1 and input2 as inputs and input3 as
# the control input to produce an output
out = tf.user_ops.simple_switch(input1, input2, input3)
# Launch a default graph
sess = tf.Session()
# Call the 'run()' method and get the result
result = sess.run(out)
print(result)
# Close the Session when we're done!
sess.close()
When I run it in the Python interpreter, I get the following error:
Traceback (most recent call last):
  File "tensorflow-switch.py", line 14, in <module>
    out = tf.simple_switch(input1, input2, input3)
AttributeError: 'module' object has no attribute 'simple_switch'
After adding a user-defined op (in TensorFlow 0.6.0 or earlier), to use it in the Python interpreter you must reinstall from the source repository. The easiest way to do this is to build and install a PIP package using Bazel. (The unit test would pass because running bazel test would cause TensorFlow to be rebuilt, and the rebuilt version to be used when running the tests.)
NOTE: This feature is experimental, and an improved workflow for adding user-defined ops is in development.
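A quick way to confirm that the installed package really lacks the op (rather than the call being misspelled) is to walk the attribute path and see where it stops. This is only a sketch: fake_tf is a stand-in object, not TensorFlow, and my_fact is a hypothetical op name used for illustration. With a real install you would pass the imported tensorflow module instead.

```python
import types


def find_attr(module, dotted_name):
    """Return the object at dotted_name under module, or None if any part is missing."""
    obj = module
    for part in dotted_name.split("."):
        obj = getattr(obj, part, None)
        if obj is None:
            return None
    return obj


# Stand-in for a package that was built before simple_switch was added.
# (Hypothetical: a real check would use the imported tensorflow module.)
fake_tf = types.SimpleNamespace(
    user_ops=types.SimpleNamespace(my_fact=lambda: 42)
)

has_fact = find_attr(fake_tf, "user_ops.my_fact") is not None
has_switch = find_attr(fake_tf, "user_ops.simple_switch") is not None
print(has_fact, has_switch)
```

If the lookup fails for the new op but succeeds for ops that shipped with the build, the Python package being imported predates the op and needs to be rebuilt and reinstalled as described above.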
I am trying to build a neural network in Python for solving PDEs, and, as such, I have had to write custom training steps. My training function looks like this:
...
tf.enable_eager_execution()
class PDENet:
    ...
    def train_step(self):
        input = self.input
        with tf.GradientTape() as tape, tf.Session() as sess:
            tape.watch(input)
            output = self.model(input)
            self.loss = self.pde_loss(output)  # (network does not use training data)
        grad = tape.gradient(self.loss, self.model.trainable_weights)
        self.optimizer.apply_gradients([(grad, self.model)])
...
Due to my hardware, I have no choice but to use tensorflow==1.12.0 and keras==2.2.4.
When I run this code, I get "RuntimeError: Attempting to capture an EagerTensor without building a function". I have seen other posts about this, but all of the answers say to update tensorflow/keras (which I can't), to use "tf.enable_eager_execution()" (which I've already done), or to use "tf.disable_v2_behavior()" (which doesn't exist in older versions of tensorflow). Is there anything else I can do to solve this problem? The error makes me think tensorflow wants me to add @tf.function, but that feature also doesn't seem to exist in tensorflow 1.
I have an old Python script (tf-1.15.2) that needs to run on TensorFlow 2.2.0 (I cannot use tf < 2.2). I have migrated most of the code to tf-2.2.0, but it still uses some tensorflow.contrib methods. I would therefore like to use the old tf-1.15 just for the lines that rely on tensorflow.contrib APIs.
I have installed tf-1.15.2 globally and tf-2.2.0 locally. How can I select a specific TensorFlow version at a specific point while the Python process is running?
Example code is below:
import tensorflow as tf  # version: tf-2.2.0 (local package is imported)
isess = tf.compat.v1.Session()
tf.compat.v1.disable_eager_execution()
# Creation of the required placeholders
p = []
for shape in input_shapes:
    p.append(tf.compat.v1.placeholder(shape=shape, dtype=input_dtype))
out = tf.einsum(equation, *p)
graph_def = isess.graph_def
# TODO: feed this (graph_def, feed_dict, output_tensors) to a session object of tf-1.15.2 and find the output
Now, to run the unit test given in tf_einsum_op_test in tf-1.15.2 after replacing einsum with the appropriate function (trace/dot_product/...), I would like to revert to tf-1.15.2 and check the execution.
The underlying need is to find out whether tf versions can be interchanged during the execution of a Python process. The einsum op is used as the example since it is not directly supported in tf-1.15.2.
After experimenting with the subprocess API, I found that it is possible to switch between tf versions during the execution of a Python process through a subprocess call.
# main.py
import tensorflow as tf  # version: tf-2.2.0 (local package is imported)
import subprocess
import os
isess = tf.compat.v1.Session()
tf.compat.v1.disable_eager_execution()
# Creation of the required placeholders
p = []
for shape in input_shapes:
    p.append(tf.compat.v1.placeholder(shape=shape, dtype=input_dtype))
out = tf.einsum(equation, *p)
graph_def = isess.graph_def
# TODO: Save the graph_def in graph.pb
# TODO: Save the feed_dict in input.npz
# TODO: Save the output_tensors
# Change the Python path to the global package
os.environ['PYTHONPATH'] = '/usr/local/lib/python3.6/dist-packages'
cmd = ['python3.6', 'run.py']
out = subprocess.check_output(cmd)  # Subprocess call
# run.py
import tensorflow as tf  # version: tf-1.15.2 (global package is imported)
import numpy as np
# TODO: Load the graph_def from graph.pb
# TODO: Load the feed_dict from input.npz
# TODO: Load the output tensors
# import_graph_def returns None when return_elements is not given; the nodes are
# added to the default graph, which tf.Session(graph=None) then uses.
g = tf.import_graph_def(graph_def, name='')
with tf.Session(graph=g) as sess:
    output = sess.run(output_tensors, feed_dict)
This works for me.
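The version switch above comes down to starting a fresh interpreter with a different PYTHONPATH. Here is a minimal standard-library sketch of just that part, without TensorFlow. The helper name run_with_pythonpath and the directory /tmp/other-site-packages are made up for illustration. It passes a copy of the environment to the child instead of changing os.environ in the parent, so later imports in the parent are not affected.

```python
import os
import subprocess
import sys
import tempfile


def run_with_pythonpath(script_source, extra_path):
    """Run script_source in a fresh interpreter whose PYTHONPATH is extra_path."""
    env = dict(os.environ)          # copy, so the parent process is left untouched
    env["PYTHONPATH"] = extra_path  # hypothetical directory holding the other install
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script_source)
        script_path = f.name
    try:
        return subprocess.check_output(
            [sys.executable, script_path], env=env, universal_newlines=True
        )
    finally:
        os.remove(script_path)


# The child only reports which PYTHONPATH it was started with.
child = "import os\nprint(os.environ['PYTHONPATH'])\n"
result = run_with_pythonpath(child, "/tmp/other-site-packages")
print(result.strip())
```

In the real workflow the child script would be run.py, and the graph, inputs and outputs would be exchanged through files as in the TODO comments above.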
The code below runs fine in the tutorial, but raises an error when I run it locally. Is this an installation problem or something else? Please help. This is the link to the tutorial:
https://colab.research.google.com/notebooks/mlcc/tensorflow_programming_concepts.ipynb?utm_source=mlcc&utm_campaign=colab-external&utm_medium=referral&utm_content=tfprogconcepts-colab&hl=en#scrollTo=Md8ze8e9geMi
And the code:
import tensorflow as tf
# create a graph
g = tf.Graph()
# establish the graph as the default graph
with g.as_default():
    x = tf.constant(8, name="x_const")
    y = tf.constant(5, name="y_const")
    my_sum = tf.add(x, y, name="x_y_sum")
# create the session
# this runs the default graph
with tf.Session() as sess:
    print(my_sum.eval())
Below is the error that occurs:
gunjan@gunjan-Inspiron-3558:~/Desktop$ python tf1.py
/home/gunjan/anaconda3/lib/python3.5/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-08-20 22:10:41.619062: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Traceback (most recent call last):
  File "tf1.py", line 15, in <module>
    print(my_sum.eval())
  File "/home/gunjan/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 680, in eval
    return _eval_using_default_session(self, feed_dict, self.graph, session)
  File "/home/gunjan/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 4942, in _eval_using_default_session
    raise ValueError("Cannot use the default session to evaluate tensor: "
ValueError: Cannot use the default session to evaluate tensor: the tensor's graph is different from the session's graph. Pass an explicit session to `eval(session=sess)`.
The problem is that you've created one graph (g), but the session (sess) is bound to a different one, the default graph. If you don't need two graphs, you can drop g and build everything in the default graph that sess uses:
x = tf.constant(8, name="x_const")
y = tf.constant(5, name="y_const")
my_sum = tf.add(x, y, name="x_y_sum")
# create the session
# this runs the default graph
with tf.Session() as sess:
    print(my_sum.eval())
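To simply get it working, you can pass the session explicitly, as suggested by the error message. Note that the session must itself have been created on the tensor's graph, for example with tf.Session(graph=g); otherwise the same graph mismatch is raised: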
print(my_sum.eval(session=sess))
To understand why it doesn't work exactly as the tutorial specifies it, you could start by comparing the versions of Python and TensorFlow to those used in the tutorial.
import tensorflow as tf
import platform
print("Python version: ", platform.python_version())
print("TensorFlow version", tf.__version__)
For the colab environment you linked, this prints:
Python version: 2.7.14
TensorFlow version 1.10.0
EDIT
Taking another look at your code sample, it's not an issue of version compatibility. The issue is that your copy of the tutorial did not properly preserve the indentation. The second with block needs to be enclosed in the first.
# Establish the graph as the "default" graph.
with g.as_default():
    # ...
    # Now create a session.
    # The session will run the default graph.
    with tf.Session() as sess:
        print(my_sum.eval())
This ensures that g is used as the default graph for the session. In your incorrectly indented version, as MatthewScarpino points out, the session ends up bound to a different graph.
If you create/use a Graph object explicitly rather than using the default graph, you need to either (a) pass the graph object to your Session constructor, or (b) create the session in the graph context.
graph = tf.Graph()
with graph.as_default():
    build_graph()
with tf.Session(graph=graph) as sess:
    do_stuff_with(sess)
or
graph = tf.Graph()
with graph.as_default():
    build_graph()
    with tf.Session() as sess:
        do_stuff_with(sess)
I am using some code from here: https://github.com/monikkinom/ner-lstm with tensorflow. I think the code was written for an older version of tensorflow; I am using version 1.0.0. I used tf_upgrade.py to upgrade model.py in that GitHub repo, but I am still getting the error:
output, _, _ = contrib_rnn.bidirectional_rnn(fw_cell, bw_cell,
AttributeError: 'module' object has no attribute 'bidirectional_rnn'
This is after I changed the bidirectional_rnn call to use contrib_rnn, which is imported as:
from tensorflow.contrib.rnn.python.ops import core_rnn as contrib_rnn
The old call was
output, _, _ = tf.nn.bidirectional_rnn(fw_cell, bw_cell,
    tf.unpack(tf.transpose(self.input_data, perm=[1, 0, 2])),
    dtype=tf.float32, sequence_length=self.length)
which also doesn't work.
I had to change LSTMCell, DropoutWrapper, etc. to rnn.LSTMCell and so on, and those seem to work fine. It is the bidirectional_rnn call that I can't figure out how to change.
In TensorFlow 1.0, you have the choice of two bidirectional RNN functions:
tf.nn.bidirectional_dynamic_rnn()
tf.contrib.rnn.static_bidirectional_rnn()
You could also try reimplementing a bidirectional RNN by wrapping two unidirectional RNNs in a single class, with the parameter "go_backwards=True" set on one of them. That also gives you control over how the two outputs are merged. The implementation in https://github.com/fchollet/keras/blob/master/keras/layers/wrappers.py (see the class Bidirectional) could get you started.
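As a rough illustration of the wrapping idea, here is a plain-Python sketch with no TensorFlow or Keras involved. The recurrence h_t = 0.5 * h_{t-1} + x_t is an arbitrary stand-in for a real cell, and run_rnn, bidirectional and the merge names are made up for this example. The key detail is that the backward pass reads the sequence in reverse and its outputs are reversed again so that position t lines up in both directions.

```python
def run_rnn(inputs, go_backwards=False):
    """Toy recurrence h_t = 0.5 * h_{t-1} + x_t; returns one output per step."""
    sequence = list(reversed(inputs)) if go_backwards else list(inputs)
    h, outputs = 0.0, []
    for x in sequence:
        h = 0.5 * h + x
        outputs.append(h)
    # Re-align backward outputs so index t refers to the same input position.
    return list(reversed(outputs)) if go_backwards else outputs


def bidirectional(inputs, merge="concat"):
    """Run the toy RNN in both directions and merge the per-step outputs."""
    fw = run_rnn(inputs)
    bw = run_rnn(inputs, go_backwards=True)
    if merge == "concat":
        return [(f, b) for f, b in zip(fw, bw)]
    if merge == "sum":
        return [f + b for f, b in zip(fw, bw)]
    raise ValueError("unknown merge mode: %r" % merge)


concat_out = bidirectional([1.0, 2.0, 3.0])
sum_out = bidirectional([1.0, 2.0, 3.0], merge="sum")
print(concat_out)
print(sum_out)
```

In a real model the two directions would be separate cells with their own weights; only the reverse, re-align and merge steps carry over from this sketch.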
I have tried to modify the CIFAR-10 example to run on the new TensorFlow distributed runtime. However, I get the following error when trying to run the program:
InvalidArgumentError: Cannot assign a device to node 'softmax_linear/biases/ExponentialMovingAverage':
Could not satisfy explicit device specification '/job:local/task:0/device:CPU:0'
I start the cluster using the following commands. On the first node I run:
bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server --cluster_spec='local|10.31.101.101:7777;10.31.101.224:7778' --job_name=local --task_id=0
...and on the second node I run:
bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server --cluster_spec='local|10.31.101.101:7777;10.31.101.224:7778' --job_name=local --task_id=1
For the CIFAR-10 multi-GPU code, I make the simple modifications, replacing two lines in the train() function. The following line:
with tf.Graph().as_default(), tf.device('/cpu:0'):
...is replaced with:
with tf.Graph().as_default(), tf.device('/job:local/task:0/cpu:0'):
and the following line:
with tf.device('/gpu:%d' % i):
...is replaced with:
with tf.device('/job:local/task:0/gpu:%d' % i):
In my understanding, the second substitution should take care of the model substitution. Running a simpler example, like the code below, works fine:
with tf.device('/job:local/task:0/cpu:0'):
    c = tf.constant("Hello, distributed TensorFlow!")
sess.run(c)
print(c)
I can't tell from your program, but my guess is that you also have to modify the line that creates the session to specify the address of one of your worker tasks. For example, given your configuration above, you might write:
sess = tf.Session(
    "grpc://10.31.101.101:7777",
    config=tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=FLAGS.log_device_placement))
As it happens, we've been trying to improve that error message to make it less confusing. If you update to the latest version in GitHub and run the same code, you should see an error message that explains why the device specification could not be satisfied.
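To keep the session target and the device strings consistent with the --cluster_spec passed to the servers, you could derive both from the same string. This is only a sketch in plain Python: the helper names are made up, and it assumes the single-job 'job|host:port;host:port' form shown in the question.

```python
def parse_cluster_spec(spec):
    """Parse 'job|host:port;host:port' into (job_name, [host:port, ...])."""
    job_name, _, hosts = spec.partition("|")
    if not job_name or not hosts:
        raise ValueError("expected 'job|host:port;...', got %r" % spec)
    return job_name, hosts.split(";")


def session_target(spec, task_id):
    """Return the grpc:// address of the given task, for use as the session target."""
    _, tasks = parse_cluster_spec(spec)
    return "grpc://" + tasks[task_id]


def device_name(spec, task_id, device="cpu:0"):
    """Return a device string such as '/job:local/task:0/cpu:0'."""
    job_name, _ = parse_cluster_spec(spec)
    return "/job:%s/task:%d/%s" % (job_name, task_id, device)


spec = "local|10.31.101.101:7777;10.31.101.224:7778"
target = session_target(spec, 0)
device = device_name(spec, 0)
print(target, device)
```

The resulting target is the string passed as the first argument to tf.Session above, and the device string matches the one used in the tf.device() calls in the question.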