I have a case where matrix multiplication of two matrices with certain dimensions works in NumPy but doesn't work in TensorFlow.
x = np.ndarray(shape=(10,20,30), dtype = float)
y = np.ndarray(shape=(30,40), dtype = float)
z = np.matmul(x,y)
print("np shapes: %s x %s = %s" % (np.shape(x), np.shape(y), np.shape(z)))
This works as expected and prints:
np shapes: (10, 20, 30) x (30, 40) = (10, 20, 40)
However, in TensorFlow, when I try to multiply a placeholder and a variable of the same shapes as the NumPy arrays above, I get an error:
x = tf.placeholder(tf.float32, shape=(10,20,30))
y = tf.Variable(tf.truncated_normal([30,40], name='w'))
print("tf shapes: %s x %s" % (x.get_shape(), y.get_shape()))
tf.matmul(x,y)
This results in:
tf shapes: (10, 20, 30) x (30, 40)
InvalidArgumentError:
Shape must be rank 2 but is rank 3 for 'MatMul_12'
(op: 'MatMul') with input shapes: [10,20,30], [30,40].
Why does this operation fail?
I don't know why tf.matmul does not support this kind of multiplication (maybe one of the core developers could provide a meaningful answer).
But if you just want to be able to multiply tensors in this way, take a look at the tf.einsum function. It can operate on tensors of arbitrary rank.
As suggested by Dmytro, tf.einsum can be used to multiply these two arrays.
x = np.ndarray(shape=(10,20,30), dtype = float)
y = np.ndarray(shape=(30,40), dtype = float)
These two operations produce exactly the same result:
np.einsum('ijk,kl->ijl', x, y)
np.matmul(x,y)
And the corresponding TensorFlow operation also works:
tf.einsum('ijk,kl->ijl', tf_x,tf_y)
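For completeness, here is a quick check (a sketch assuming TF 1.x, with small random arrays standing in for the real data) that the NumPy and TensorFlow results agree:

import numpy as np
import tensorflow as tf

x = np.random.rand(10, 20, 30).astype(np.float32)
y = np.random.rand(30, 40).astype(np.float32)

tf_x = tf.constant(x)
tf_y = tf.constant(y)
tf_z = tf.einsum('ijk,kl->ijl', tf_x, tf_y)

with tf.Session() as sess:
    # Both paths produce a (10, 20, 40) result with (numerically) equal values.
    print(np.allclose(np.matmul(x, y), sess.run(tf_z), atol=1e-5))  # True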
People already told you that you can use tf.einsum() to get the result you want.
import tensorflow as tf
x = tf.random_normal([10, 20, 30])
y = tf.random_normal([30, 40])
z = tf.einsum('ijk,kl->ijl', x, y)
The reason why tf.matmul() does not work the way you expected is written in the documentation.
The inputs must be matrices (or tensors of rank > 2, representing
batches of matrices), with matching inner dimensions, possibly after
transposition.
In your case you have a matrix y (rank 2) and a higher-rank tensor x (rank 3). tf.matmul treats rank > 2 inputs as batches of matrices, so both arguments need the same rank and matching batch dimensions. If you want them to match, you would need something like this:
import tensorflow as tf
a, b, c = 12, 50, 20
x = tf.random_normal([a, b, c])
y = tf.random_normal([a, c, b])
z = tf.matmul(x, y)
But clearly this does not compute what you want.
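If you do want the (10, 20, 40) result with tf.matmul anyway, one workaround (a sketch, not part of the original answers) is to flatten the leading dimensions before multiplying and reshape afterwards; tf.tensordot performs the same contraction in one call:

import tensorflow as tf

x = tf.placeholder(tf.float32, [10, 20, 30])
y = tf.Variable(tf.truncated_normal([30, 40]), name='w')

# Flatten (10, 20, 30) -> (200, 30), multiply by (30, 40), restore the leading dims.
z = tf.reshape(tf.matmul(tf.reshape(x, [-1, 30]), y), [10, 20, 40])

# Equivalent: contract the last axis of x with the first axis of y.
z2 = tf.tensordot(x, y, axes=1)  # shape (10, 20, 40)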
Related
I have successfully performed 2D interpolation in Python using the RectBivariateSpline method from scipy.interpolate. However, it operates on NumPy arrays, and I want to perform it on tensors, solely using TensorFlow.
This is what I have right now; it works if everything is a NumPy array, but I am having a hard time rewriting it in TensorFlow:
x_old = np.arange(0,256)
y_old = np.arange(0,256)
#x = tensor of shape [256,256]
#y = tensor of shape [256,256]
#in_im = tensor of shape [256,256,3]
#out_im = tensor of shape [256,256,3]
for d in range(0,3):
    interpf = RectBivariateSpline(x_old, y_old, in_im[:,:,d])
    out_im[:,:,d] = interpf.ev(x[:,:], y[:,:])
The resize operators in tf.image might be what you are looking for, e.g. tf.image.resize_bicubic (https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/image/resize_bicubic)
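For reference, a minimal sketch of that suggestion (assuming TF 1.x). Note that resizing on a regular grid is not the same as evaluating a spline at arbitrary (x, y) coordinates, so this only covers plain resampling:

import tensorflow as tf

in_im = tf.placeholder(tf.float32, [256, 256, 3])
batched = tf.expand_dims(in_im, 0)                           # -> [1, 256, 256, 3]
resized = tf.image.resize_bicubic(batched, size=[512, 512])  # bicubic upsampling
out_im = tf.squeeze(resized, axis=0)                         # -> [512, 512, 3]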
Converting the tensors into NumPy arrays is one solution.
This question about conversion might be helpful.
In short, any tensor returned by Session.run or eval() is a NumPy array.
Example code is below.
import tensorflow as tf
import numpy as np
from scipy.interpolate import RectBivariateSpline
x = tf.constant([1,2,3,4])
y = tf.constant([1,2,3,4,5])
vals = tf.constant([
    [4,1,4,4,2],
    [4,2,3,2,6],
    [3,7,4,3,5],
    [2,4,5,3,4]
])
sess = tf.Session()
x, y, vals = sess.run([x, y, vals])  # x, y, vals are now ndarrays
rect_B_spline = RectBivariateSpline(x, y, vals)
a = tf.constant([3.2, 3.8, 2.2])
b = tf.constant([2.4, 4.3, 3.3])
a, b = sess.run([a, b])  # a, b are now ndarrays
print(rect_B_spline.ev(a, b))
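If the interpolation has to stay inside the graph (for example because the coordinates are produced by other ops), another option, not covered above, is to wrap the SciPy call in tf.py_func. This is only a sketch under that assumption; py_func runs on the CPU and is not differentiable:

import tensorflow as tf
import numpy as np
from scipy.interpolate import RectBivariateSpline

x_old = np.arange(0, 256, dtype=np.float64)
y_old = np.arange(0, 256, dtype=np.float64)

def interp_channel(im_channel, xq, yq):
    # Runs as ordinary NumPy/SciPy code at session run time.
    spline = RectBivariateSpline(x_old, y_old, im_channel)
    return spline.ev(xq, yq).astype(np.float32)

in_channel = tf.placeholder(tf.float32, [256, 256])  # one channel of in_im
xq = tf.placeholder(tf.float32, [256, 256])
yq = tf.placeholder(tf.float32, [256, 256])
out_channel = tf.py_func(interp_channel, [in_channel, xq, yq], tf.float32)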
I am learning the statsmodels.api module to use Python for regression analysis, so I started with the simple OLS model.
In econometrics, the function is like: y = Xb + e
where X is NxK, b is Kx1, and e is Nx1, so y is Nx1. This is perfectly fine from a linear algebra point of view.
But I followed the tutorial from Statsmodels as the following:
import numpy as np
import statsmodels.api as sm

nsample = 100  # total obs is 100
x = np.linspace(0, 10, 100)  # using np.linspace(start, stop, number)
X = np.column_stack((x, x**2))
beta = np.array([1, 0.1, 10])
e = np.random.normal(size = nsample)  # draw numbers from a normal distribution,
                                      # default mu = 0, std.dev = 1; size set by user
# e is n x 1
# Now, we add the constant/intercept term to X
X = sm.add_constant(X)
# Now, we compute y
y = np.dot(X, beta) + e
This generates the correct answer, but I have a question about the creation of beta = np.array([1, 0.1, 10]). If we check this beta:
beta.shape
(3,)
It has shape (3,); the same goes for y and e, but not for X:
X.shape
(100,3)
e.shape
(100,)
y.shape
(100,)
So I tried initializing arrays in the following three ways:
o = np.array([1,2,3])
o1 = np.array([[1],[2],[3]])
o2 = np.array([[1,2,3]])
print(o.shape)
print(o1.shape)
print(o2.shape)
----------------
(3,)
(3, 1)
(1, 3)
If I use beta = np.array([[1],[2],[3]]), which has shape (3,1), then np.dot(X, beta) gives me a wrong answer, even though the dimensions seem to work.
If I use np.array([[1,2,3]]), which is a row vector, the dimensions don't match for the dot product, in NumPy or in linear algebra.
So I am wondering: for an NxK dot Kx1 product in NumPy, why do we have to use (N,K) dot (K,) instead of (N,K) dot (K,1)? What makes np.array([1, 0.1, 10]) work with numpy.dot() while np.array([[1], [0.1], [10]]) doesn't?
Thank you very much.
Some update
Sorry about the confusion; the data in the statsmodels code is randomly generated, so I fixed the input matrix and got the following:
f = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15]])
o = np.array([1,2,3])
o1 = np.array([[1],[2],[3]])
o2 = np.array([[1,2,3]])
print(o.shape)
print(o1.shape)
print(o2.shape)
print("---------")
print(np.dot(f,o))
print(np.dot(f,o1))
r1 = np.dot(f,o)
r2 = np.dot(f,o1)
type1 = type(np.dot(f,o))
type2 = type(np.dot(f,o1))
tf = type1 is type2
tf2 = type1 == type2
print(type1)
print(type2)
print(tf)
print(tf2)
-------------------------
(3,)
(3, 1)
(1, 3)
---------
[14 32 50 68 86]
[[14]
[32]
[50]
[68]
[86]]
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
True
True
Sorry again for the confusion and inconvenience; they both worked fine.
Python/NumPy is not a matrix-based language the way Matlab, Octave, or Scilab are. Those languages follow the rules of matrix multiplication strictly and have no 1-D arrays. So
np.dot(f,o1) ---------> f*o1 in Matlab/Octave/Scilab
np.dot(f,o) ---------> has no direct Matlab/Octave/Scilab equivalent, because o is a 1-D array, which those languages do not have
NumPy treats a 1-D array used as the second argument of np.dot as a column vector and returns a 1-D result; that is why np.dot(f,o) works at all. NumPy also has 'broadcasting', the rules for how arrays of different shapes are combined in element-wise operations. You will have to consult the docs for the details.
In Python/NumPy the * operator is not a matrix operator but an element-wise one. You can find out what broadcasting gives for
print(f*o)   # broadcasts o over each row of f: shape (5, 3)
print(f*o1)  # raises a ValueError: shapes (5,3) and (3,1) do not broadcast
print(f*o2)  # same result as f*o: shape (5, 3)
More recently, Python/NumPy introduced the matrix multiplication operator @. You might find out what happens with
print(f @ o)   # matrix-vector product: shape (5,)
print(f @ o1)  # matrix-matrix product: shape (5, 1)
print(f @ o2)  # raises a ValueError: (5,3) @ (1,3) has mismatched inner dimensions
Does this give you some idea of what is going on?
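As a side note (a sketch, not part of the original answer): the 'wrong answer' you saw with a (3,1) beta most likely comes not from np.dot itself but from adding e afterwards, because broadcasting the (100,1) result against the (100,) array e produces a (100,100) matrix:

import numpy as np

X = np.random.rand(100, 3)
e = np.random.normal(size=100)           # shape (100,)

beta_1d = np.array([1, 0.1, 10])         # shape (3,)
beta_col = np.array([[1], [0.1], [10]])  # shape (3, 1)

print((np.dot(X, beta_1d) + e).shape)    # (100,)     -- what the tutorial expects
print((np.dot(X, beta_col) + e).shape)   # (100, 100) -- broadcasting surprise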
I'm trying to write a chunk of reusable code that reads the shape of one tensor and then uses the resulting object to define the shape of other tensors. I have a choice of reading the dynamic shape of the tensor with tf.shape(tensor) or the static shape of the tensor with tensor.get_shape(). The toy example looks like this (with the two different strategies):
def my_function_strategy_1(x, y):
    x_shape = tf.shape(x)
    a = tf.reshape(y, x_shape)
    b = tf.zeros(x_shape)
    num_x_values = x_shape[0]
    c = tf.reshape(y, [num_x_values, 4])
    d = tf.zeros([num_x_values, 4])
    return a, b, c, d

def my_function_strategy_2(x, y):
    x_shape = x.get_shape()
    a = tf.reshape(y, x_shape)
    b = tf.zeros(x_shape)
    num_x_values = x_shape[0]
    c = tf.reshape(y, [num_x_values, 4])
    d = tf.zeros([num_x_values, 4])
    return a, b, c, d
I want to use this chunk of code in different graphs. Sometimes the shape of the input tensors will be known and sometimes they will be unknown:
graph_A = tf.Graph()
with graph_A.as_default():
    x = tf.placeholder(tf.float32, [2, 4])
    y = tf.placeholder(tf.float32, [8])
    a, b, c, d = my_function(x, y)

graph_B = tf.Graph()
with graph_B.as_default():
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    a, b, c, d = my_function(x, y)
The behavior I want is: (A) when the shapes of the input tensors are known (as in graph_A), I want TensorFlow to calculate all of the shapes in the graph at graph creation time (so it can efficiently allocate resources, etc.), and (B) when the shapes of the input tensors are unknown (as in graph_B), I want TensorFlow to wait until runtime to calculate all of the shapes in the graph.
The strategy_1 version of the function almost does this. It achieves (B), but it doesn't quite achieve (A) because TensorFlow leaves the shape of some tensors unknown. For example, in the toy example above, the shapes of a, b, and c are calculated at graph creation time, but the shape of d is left unknown (even though d uses very similar operations). You can check this by printing a.get_shape(), b.get_shape(), etc.
Conversely, the strategy_2 version of the function achieves (A) for all tensors in the graph, but doesn't achieve (B) because TensorFlow (understandably) throws an exception when it tries to use the (unknown) static shape of the input tensor to shape other tensors.
Is there a way to achieve both (A) and (B) in a single function? How/why does the strategy_1 version work for most tensors in the graph, but not all?
You can carefully pick the elements of the shape that you know to have a "best of both worlds" result:
def my_get_shape(tensor):
    if tensor.shape.ndims is None:
        # Fully dynamic
        return tf.shape(tensor)
    if tensor.shape.is_fully_defined():
        # Fully static
        return tensor.shape
    # Partially static
    dyn_shape = tf.shape(tensor)
    shape = []
    for i, d in enumerate(tensor.shape):
        shape.append(d.value if d.value is not None else dyn_shape[i])
    return shape

def my_function(x, y):
    x_shape = my_get_shape(x)  # Or just tf.shape(x)! - see edit
    a = tf.reshape(y, x_shape)
    b = tf.zeros(x_shape)
    num_x_values = x_shape[0]
    c = tf.reshape(y, [num_x_values, 4])
    d = tf.zeros([num_x_values, 4])
    return a, b, c, d
# Fully static
with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, [2, 4])
    y = tf.placeholder(tf.float32, [8])
    a, b, c, d = my_function(x, y)
    print('a:', a.shape, ', b:', b.shape, ', c:', c.shape, ', d:', d.shape)
    # a: (2, 4) , b: (2, 4) , c: (2, 4) , d: (2, 4)

# Fully dynamic
with tf.Graph().as_default():
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    a, b, c, d = my_function(x, y)
    print('a:', a.shape, ', b:', b.shape, ', c:', c.shape, ', d:', d.shape)
    # a: <unknown> , b: <unknown> , c: (?, 4) , d: (?, 4)

# Partially static
with tf.Graph().as_default():
    x = tf.placeholder(tf.float32, [None, 4])
    y = tf.placeholder(tf.float32)
    a, b, c, d = my_function(x, y)
    print('a:', a.shape, ', b:', b.shape, ', c:', c.shape, ', d:', d.shape)
    # a: (?, 4) , b: (?, 4) , c: (?, 4) , d: (?, 4)
EDIT:
Actually, replacing my_get_shape with tf.shape in the previous snippet works exactly the same. It seems that tf.shape should be the default (being careful not to cram the graph with it) unless you explicitly want to keep dimensions undefined.
I have investigated a bit, and I couldn't work the whole thing out completely. I don't know if this is useful, but here are some things I found out. Apparently TensorFlow has, at the C++ level (it seems it used to be in Python before, but not anymore), a "shape inference" mechanism. If you look, for example, in tensorflow/core/ops/array_ops.cc, you will see that every operation declaration includes a .SetShapeFn at the end, which is a function that uses InferenceContext to try to guess the output shape of the operation. This class can, among other things, check whether values in a tensor are already known, which is true for example for tf.shape when the given tensor is static, or for tf.fill (and related ops like tf.ones) with known values. The result of the shape inference algorithm is what gets set as the tensor shape in Python, and it can be called directly (although I don't see how that is useful) through call_cpp_shape_fn:
from tensorflow.python.framework.common_shapes import call_cpp_shape_fn

with tf.Graph().as_default():
    print(call_cpp_shape_fn(tf.reshape(tf.placeholder(tf.float32), tf.fill([2], 3)).op))
    # Shows this:
    # {
    #   'shapes': [dim { size: 3 } dim { size: 3 }],
    #   'handle_data': [None],
    #   'inputs_needed': b'\x12\x01\x01'
    # }
    print(call_cpp_shape_fn(tf.reshape(tf.placeholder(tf.float32), (2 * tf.fill([2], 3))).op))
    # Shows this:
    # {
    #   'shapes': [dim { size: -1 } dim { size: -1 }],
    #   'handle_data': [None],
    #   'inputs_needed': b'\x12\x01\x01'
    # }
You can see that, while tf.fill([2], 3) was correctly inspected, TensorFlow didn't work out that 2 * tf.fill([2], 3) is [6, 6], presumably because statically keeping track of operations like multiplication, even if operands are known constants, was deemed too expensive.
What I haven't found out is where ops declare that their values can be statically known, or where/how those values are retrieved exactly. It seems that, for example, tf.shape is able to specifically pick out the known values and leave the rest undefined.
So assuming I have this:
TensorShape([Dimension(None), Dimension(32)])
And I use tf.split on this tensor _X with the shape above:
_X = tf.split(_X, 128, 0)
What is the shape of this new tensor? The output is a list, so it's hard to know the shape of the new tensor.
tf.split() returns a list of tensor objects. You can get the shape of each tensor object as follows:
import tensorflow as tf

X = tf.random_uniform([256, 32])
Y = tf.split(X, 128, 0)
Y_shape = tf.shape(Y[1])

sess = tf.Session()
X_v, Y_v, Y_shape_v = sess.run([X, Y, Y_shape])

# numpy style
print(X_v.shape)
print(len(Y_v))
print(Y_v[100].shape)

# TF style
print(len(Y))
print(Y_shape_v)
Output :
(256, 32)
128
(2, 32)
128
[ 2 32]
I hope this helps!
tf.split(value, num_or_size_splits, axis) splits the tensor along the given axis into the given number of pieces.
For example, if we have a data_set x of size (10, 10),
then tf.split(x, 2, 0) will break x into 2 pieces of size (5, 10),
while tf.split(x, 2, 1)
will give 2 pieces of size (10, 5).
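A quick check of that axis behaviour (an illustrative sketch, assuming TF 1.x):

import tensorflow as tf

x = tf.random_uniform([10, 10])
rows = tf.split(x, 2, 0)   # 2 tensors, each of shape (5, 10)
cols = tf.split(x, 2, 1)   # 2 tensors, each of shape (10, 5)

print([t.get_shape().as_list() for t in rows])  # [[5, 10], [5, 10]]
print([t.get_shape().as_list() for t in cols])  # [[10, 5], [10, 5]]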
The newer version of TensorFlow defines the split function as follows:
tf.split(
    value,
    num_or_size_splits,
    axis=0,
    num=None,
    name='split'
)
However, when I try to run it in R:
X = tf$random_uniform(minval=0, maxval=10, shape(256, 32), name = "X")
Y = tf$split(X, num_or_size_splits = 2, axis = 0)
it reports the error message:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Rank-0 tensors are not supported as the num_or_size_splits argument to split. Argument provided: 2.0
I have seen that transpose and reshape together can help, but I don't know how to use them.
E.g. dimshuffle(0, 'x')
What is its equivalent by using transpose and reshape? or is there a better way?
Thank you.
There are three relevant ops for implementing Theano's dimshuffle in TensorFlow:
tf.transpose() is used to permute the dimensions of a tensor. If the pattern specified in the arguments to dimshuffle is a permutation of the input tensor's dimensions (i.e. there is no 'x' or missing dimension) you can use tf.transpose() to implement dimshuffle().
tf.expand_dims() is used to add one or more size-1 dimensions to a tensor. This handles the case where 'x' is specified as part of the dimshuffle() pattern, but does not reorder the existing dimensions.
tf.squeeze() is used to remove one or more size-1 dimensions from a tensor. This handles the case where a dimension is omitted from a dimshuffle() pattern, but it does not reorder the existing dimensions.
Assuming that the input is a vector, your example (dimshuffle(0, 'x')) can be expressed using tf.expand_dims() only:
input = tf.placeholder(tf.float32, [None]) # Defines an arbitrary-sized vector.
result = tf.expand_dims(input, 1)
print(result.get_shape())  # ==> TensorShape([Dimension(None), Dimension(1)])
Taking a more complicated example, dimshuffle(1, 'x', 0) applied to a matrix would be:
input = tf.placeholder(tf.float32, [128, 32]) # Defines a matrix.
output = tf.expand_dims(tf.transpose(input, [1, 0]), 1)
print(output.get_shape())
# ==> TensorShape([Dimension(32), Dimension(1), Dimension(128)])
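The tf.squeeze() case mentioned above works the same way. For example (a sketch not in the original answer), dimshuffle(2, 1) applied to a tensor of shape [1, 128, 32] drops the leading size-1 dimension and swaps the remaining two:

input = tf.placeholder(tf.float32, [1, 128, 32])
output = tf.transpose(tf.squeeze(input, [0]), [1, 0])
print(output.get_shape())
# ==> TensorShape([Dimension(32), Dimension(128)])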
I implemented dimshuffle for TensorFlow in our framework Returnn (here). The code is this:
def expand_multiple_dims(x, axes, name="expand_multiple_dims"):
  """
  :param tf.Tensor x:
  :param list[int]|tuple[int] axes: after completion, tf.shape(y)[axis] == 1 for axis in axes
  :param str name: scope name
  :return: y where we have a new broadcast axis for each axis in axes
  :rtype: tf.Tensor
  """
  with tf.name_scope(name):
    for i in sorted(axes):
      x = tf.expand_dims(x, axis=i, name="expand_axis_%i" % i)
    return x
def dimshuffle(x, axes, name="dimshuffle"):
  """
  Like Theano's dimshuffle.
  Combines tf.transpose, tf.expand_dims and tf.squeeze.

  :param tf.Tensor x:
  :param list[int|str]|tuple[int|str] axes:
  :param str name: scope name
  :rtype: tf.Tensor
  """
  with tf.name_scope(name):
    assert all([i == "x" or isinstance(i, int) for i in axes])
    real_axes = [i for i in axes if isinstance(i, int)]
    bc_axes = [i for (i, j) in enumerate(axes) if j == "x"]
    if x.get_shape().ndims is None:
      x_shape = tf.shape(x)
      x = tf.reshape(x, [x_shape[i] for i in range(max(real_axes) + 1)])  # will have static ndims
    assert x.get_shape().ndims is not None
    # First squeeze missing axes.
    i = 0
    while i < x.get_shape().ndims:
      if i not in real_axes:
        x = tf.squeeze(x, axis=i)
        real_axes = [(j if (j < i) else (j - 1)) for j in real_axes]
      else:
        i += 1
    # Now permute.
    assert list(sorted(real_axes)) == list(range(x.get_shape().ndims))
    if real_axes != list(range(x.get_shape().ndims)):
      x = tf.transpose(x, real_axes)
    # Now add broadcast dimensions.
    if bc_axes:
      x = expand_multiple_dims(x, bc_axes)
    assert len(axes) == x.get_shape().ndims
    return x
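A hypothetical usage example (not from the original code), assuming the two functions above are defined:

import tensorflow as tf

x = tf.placeholder(tf.float32, [128, 32])
y = dimshuffle(x, (1, "x", 0))
print(y.get_shape())
# ==> TensorShape([Dimension(32), Dimension(1), Dimension(128)])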
If TensorFlow is your backend:
from keras import backend as K
K.permute_dimensions should do it.
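A minimal sketch of that call, assuming Keras with the TensorFlow backend (the shape here is just for illustration):

from keras import backend as K

x = K.placeholder(shape=(128, 32))
y = K.permute_dimensions(x, (1, 0))  # swap the two axes
print(K.int_shape(y))  # (32, 128)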
tf.transpose is probably what you are looking for. It takes an arbitrary permutation.