Convert Tensor of hex strings to int - python

I have a dataset of tf.RaggedTensors with strings representing hexadecimal numbers that look like this:
[
[[b'F6EE', b'BFED', b'4EEA', b'00EE', b'77AE', b'1FBE', b'1A6E',
b'5AEB', b'6A0E', b'212F'],
...
[b'FFEE', b'FFED', b'FEED', b'FDEE', b'FAAE', b'FFBE', b'FA8E',
b'FAEB', b'FA0E', b'E12F']],
...
[[b'FFEE', b'FFED', b'FEED', b'FDEE', b'FAAE', b'FFBE', b'FA8E',
b'FAEB', b'FA0E', b'E12F'],
...
[b'B6EE', b'BFED', b'4EEA', b'00EE', b'77AE', b'1FBE', b'1A6E',
b'5AEB', b'6A0E', b'212F']]
]
I want to convert it into a Tensor of int values, but tf.strings.to_number(tensor, tf.int32) doesn't have an option to specify base 16. Are there any alternatives?
The dataset contains tf.RaggedTensors, but the target shape is (batch_size, 100, 10). I guess this could be helpful if we were to write a custom function for this.

I think you're looking for something like this.
First, I create an example 3D tensor like the one you have:
import tensorflow as tf

a = tf.convert_to_tensor(['F6EE', 'BFED', '4EEA', '00EE', '77AE', '1FBE', '1A6E',
                          '5AEB', '6A0E', '212F'])
b = tf.convert_to_tensor(['FFEE', 'FFED', 'FEED', 'FDEE', 'FAAE', 'FFBE', 'FA8E',
                          'FAEB', 'FA0E', 'E12F'])
tensor = tf.ragged.stack([[a, b]]).to_tensor()

tf.Tensor(
[[[b'F6EE' b'BFED' b'4EEA' b'00EE' b'77AE' b'1FBE' b'1A6E' b'5AEB'
   b'6A0E' b'212F']
  [b'FFEE' b'FFED' b'FEED' b'FDEE' b'FAAE' b'FFBE' b'FA8E' b'FAEB'
   b'FA0E' b'E12F']]], shape=(1, 2, 10), dtype=string)
Then, based on this answer, I created a custom function that I map over each value of the tensor in order to apply a transformation, in this case a base-16 cast:
def my_cast(t):
    val = tf.keras.backend.get_value(t)
    return int(val, 16)

shape = tf.shape(tensor)
elems = tf.reshape(tensor, [-1])
res = tf.map_fn(fn=lambda t: my_cast(t), elems=elems, fn_output_signature=tf.int32)
res = tf.reshape(res, shape)
print(res)
The output is the tensor:
tf.Tensor(
[[[63214 49133 20202 238 30638 8126 6766 23275 27150 8495]
[65518 65517 65261 65006 64174 65470 64142 64235 64014 57647]]],
shape=(1, 2, 10),
dtype=int32
)
Adding fn_output_signature=tf.int32 to tf.map_fn is important because it lets the output tensor have a different dtype than the input tensor.
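If calling tf.keras.backend.get_value on every element is too slow for a large dataset, a fully vectorized variant is also possible. The sketch below is my own suggestion rather than part of the answer above: it splits each string into hex characters, maps each character to its value with a tf.lookup.StaticHashTable, and combines the digits with powers of 16. It assumes uppercase hex digits and TF 2.x.

import tensorflow as tf

def hex_to_int(t):
    # Split every string into its hex characters (adds one ragged dimension).
    chars = tf.strings.bytes_split(t)
    # Map '0'..'F' to 0..15 with a static lookup table.
    keys = tf.constant(list("0123456789ABCDEF"))
    vals = tf.range(16, dtype=tf.int32)
    table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(keys, vals), default_value=0)
    digits = tf.ragged.map_flat_values(table.lookup, chars).to_tensor()
    # Combine the digits with powers of 16 (most significant digit first).
    n = tf.shape(digits)[-1]
    powers = 16 ** tf.range(n - 1, -1, -1)
    return tf.reduce_sum(digits * powers, axis=-1)

print(hex_to_int(tensor))  # same values as the map_fn approach above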

Related

Implement ConvND in Tensorflow

I need an N-D convolutional layer that also supports complex numbers, so I decided to code it myself.
I tested this code with NumPy alone and it worked, with several channels, in 2D and 1D, and with complex inputs. However, I have problems when I do it in TF.
This is my code so far:
def call(self, inputs):
    with tf.name_scope("ComplexConvolution_" + str(self.layer_number)) as scope:
        inputs = self._verify_inputs(inputs)  # Check inputs are of expected shape and format
        inputs = self.apply_padding(inputs)   # Add zeros if needed
        output_np = np.zeros(                 # I use np because tf does not support the assignment
            (inputs.shape[0],) +              # Per each image
            self.output_size,                 # Image out size
            dtype=self.input_dtype            # To support complex numbers
        )
        img_index = 0
        for image in inputs:
            for filter_index in range(self.filters):
                for i in range(int(np.prod(self.output_size[:-1]))):  # for each element in the output
                    index = np.unravel_index(i, self.output_size[:-1])
                    start_index = tuple([a * b for a, b in zip(index, self.stride_shape)])
                    end_index = tuple([a + b for a, b in zip(start_index, self.kernel_shape)])
                    # set_trace()
                    sector_slice = tuple(
                        [slice(start_index[ind], end_index[ind]) for ind in range(len(start_index))]
                    )
                    sector = image[sector_slice]
                    new_value = tf.reduce_sum(sector * self.kernels[filter_index]) + self.bias[filter_index]
                    # I use Tied Bias https://datascience.stackexchange.com/a/37748/75968
                    output_np[img_index][index][filter_index] = new_value  # The complicated line
            img_index += 1
        output = apply_activation(self.activation, output_np)
    return output
input_size is a tuple of shape (dim1, dim2, ..., dimN, channels). A 2D RGB conv, for example, will be (32, 32, 3) and inputs will have shape (None, 32, 32, 3).
The output size is calculated from an equation I found in this paper: A guide to convolution arithmetic for deep learning:
out_list = []
for i in range(len(self.input_size) - 1):  # -1 because the number of input channels is irrelevant
    out_list.append(int(np.floor((self.input_size[i] + 2 * self.padding_shape[i] - self.kernel_shape[i]) / self.stride_shape[i]) + 1))
out_list.append(self.filters)
Basically, I use np.zeros because if I use tf.zeros I cannot assign the new_value and I get:
TypeError: 'Tensor' object does not support item assignment
However, in its current state I am getting:
NotImplementedError: Cannot convert a symbolic Tensor (placeholder_1:0) to a numpy array.
on that same assignment. I don't see an easy fix; I think I should change the strategy of the code completely.
In the end, I did it in a very inefficient way, based on this comment (also commented here), but at least it works:
new_value = tf.reduce_sum(sector * self.kernels[filter_index]) + self.bias[filter_index]
indices = (img_index,) + index + (filter_index,)
mask = tf.Variable(tf.fill(output_np.shape, 1))
mask = mask[indices].assign(0)
mask = tf.cast(mask, dtype=self.input_dtype)
# Rebuild the whole output, keeping every position except `indices`, which receives new_value.
output_np = output_np * mask + (1 - mask) * new_value
I say inefficient because I create a whole new array for each assignment. My code is taking ages to compute at the moment, so I will keep looking for improvements and post here if I find something better.
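A lighter-weight alternative, not part of the original answer, is tf.tensor_scatter_nd_update, which writes values at given indices into a tensor and returns a new tensor, so no mask Variable is needed. A minimal sketch with hypothetical shapes, assuming the op supports your complex dtype in the TF version you use:

import tensorflow as tf

# Hypothetical output shape just for illustration: batch of 2, 4x4 output, 3 filters.
output = tf.zeros((2, 4, 4, 3), dtype=tf.complex64)

# One (img_index,) + index + (filter_index,) position, as in the code above.
indices = tf.constant([[0, 1, 2, 0]])
new_value = tf.constant([1.5 + 0.5j], dtype=tf.complex64)

# Returns a new tensor equal to `output` with `new_value` written at `indices`.
output = tf.tensor_scatter_nd_update(output, indices, new_value)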

Loop over a tensor and apply function to each element

I want to loop over a tensor which contains a list of ints, and apply a function to each of the elements.
In the function, every element will get its value from a Python dict.
I have tried the easy way with tf.map_fn, which works for an add function, as in the following code:
import tensorflow as tf

def trans_1(x):
    return x + 10

a = tf.constant([1, 2, 3])
b = tf.map_fn(trans_1, a)
with tf.Session() as sess:
    res = sess.run(b)
    print(str(res))
# output: [11 12 13]
But the following code throws a KeyError: <tf.Tensor 'map_8/while/TensorArrayReadV3:0' shape=() dtype=int32> exception:
import tensorflow as tf

kv_dict = {1: 11, 2: 12, 3: 13}
def trans_2(x):
    return kv_dict[x]

a = tf.constant([1, 2, 3])
b = tf.map_fn(trans_2, a)
with tf.Session() as sess:
    res = sess.run(b)
    print(str(res))
My tensorflow version is 1.13.1. Thanks ahead.
There is a simple way to achieve what you are trying to do.
The problem is that the function passed to map_fn must take tensors as its parameters and return a tensor. However, your function trans_2 takes a plain Python int as a parameter and returns another Python int. That's why your code doesn't work.
However, TensorFlow provides a simple way to wrap ordinary Python functions, tf.py_func, which you can use in your case as follows:
import tensorflow as tf

kv_dict = {1: 11, 2: 12, 3: 13}
def trans_2(x):
    return kv_dict[x]

def wrapper(x):
    return tf.cast(tf.py_func(trans_2, [x], tf.int64), tf.int32)

a = tf.constant([1, 2, 3])
b = tf.map_fn(wrapper, a)
with tf.Session() as sess:
    res = sess.run(b)
    print(str(res))
You can see I have added a wrapper function, which expects a tensor parameter and returns a tensor; that's why it can be used in map_fn. The cast is needed because Python uses 64-bit integers by default, whereas TensorFlow uses 32-bit integers.
You cannot use a function like that, because the parameter x is a TensorFlow tensor, not a Python value. So, in order for that to work, you would have to turn your dictionary into a tensor as well, but it's not so simple because keys in the dictionary may not be sequential.
You can instead solve this problem without mapping, doing something similar to what is proposed here for NumPy. In TensorFlow, you could implement it like this:
import tensorflow as tf

def replace_by_dict(x, d):
    # Get keys and values from dictionary
    keys, values = zip(*d.items())
    keys = tf.constant(keys, x.dtype)
    values = tf.constant(values, x.dtype)
    # Make a sequence for the range of values in the input
    v_min = tf.reduce_min(x)
    v_max = tf.reduce_max(x)
    r = tf.range(v_min, v_max + 1)
    r_shape = tf.shape(r)
    # Mask replacements that are out of the input range
    mask = (keys >= v_min) & (keys <= v_max)
    keys = tf.boolean_mask(keys, mask)
    values = tf.boolean_mask(values, mask)
    # Replace values in the sequence with the corresponding replacements
    scatter_idx = tf.expand_dims(keys, 1) - v_min
    replace_mask = tf.scatter_nd(
        scatter_idx, tf.ones_like(values, dtype=tf.bool), r_shape)
    replace_values = tf.scatter_nd(scatter_idx, values, r_shape)
    replacer = tf.where(replace_mask, replace_values, r)
    # Gather the replacement value or the same value if it was not modified
    return tf.gather(replacer, x - v_min)

# Test
kv_dict = {1: 11, 2: 12, 3: 13}
with tf.Graph().as_default(), tf.Session() as sess:
    a = tf.constant([1, 2, 3])
    print(sess.run(replace_by_dict(a, kv_dict)))
    # [11, 12, 13]
This will allow you to have values in the input tensor without replacements (they are left as they are), and it does not require all the dictionary keys to appear in the input. It should be efficient unless the minimum and maximum values in your input are very far apart.
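In newer TensorFlow versions (2.x), another option, not mentioned in the answers above, is tf.lookup.StaticHashTable, which turns the dictionary lookup into a single graph op. A minimal sketch:

import tensorflow as tf

kv_dict = {1: 11, 2: 12, 3: 13}
keys = tf.constant(list(kv_dict.keys()), dtype=tf.int32)
values = tf.constant(list(kv_dict.values()), dtype=tf.int32)

# default_value is returned for inputs that are not in the dictionary.
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(keys, values), default_value=-1)

a = tf.constant([1, 2, 3])
print(table.lookup(a))  # tf.Tensor([11 12 13], shape=(3,), dtype=int32)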

What is the output of tf.split?

So assuming I have this:
TensorShape([Dimension(None), Dimension(32)])
And I use tf.split on this tensor _X with the dimension above:
_X = tf.split(_X, 128, 0)
What is the shape of this new tensor? The output is a list, so it's hard to know the shape of this new tensor.
tf.split() returns a list of tensor objects. You can get the shape of each tensor object as follows:
import tensorflow as tf

X = tf.random_uniform([256, 32])
Y = tf.split(X, 128, 0)
Y_shape = tf.shape(Y[1])

sess = tf.Session()
X_v, Y_v, Y_shape_v = sess.run([X, Y, Y_shape])

# numpy style
print(X_v.shape)
print(len(Y_v))
print(Y_v[100].shape)
# TF style
print(len(Y))
print(Y_shape_v)
Output :
(256, 32)
128
(2, 32)
128
[ 2 32]
I hope this helps!
tf.split(value, num_or_size_splits, axis) splits a tensor into num_or_size_splits pieces along a single axis.
For example, if we have a data set x of size (10, 10),
then tf.split(x, 2, 0) will break x into 2 sets of size (5, 10),
while tf.split(x, 2, 1) will break it into 2 sets of size (10, 5).
Splitting along both axes (by applying tf.split twice) gives 4 sets of size (5, 5), as sketched below.
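A short sketch of splitting along both axes (TF 1.x style, to match the rest of this thread):

import tensorflow as tf

x = tf.random_uniform([10, 10])
rows = tf.split(x, 2, axis=0)                                  # two tensors of shape (5, 10)
quarters = [q for r in rows for q in tf.split(r, 2, axis=1)]   # four tensors of shape (5, 5)
print([q.get_shape().as_list() for q in quarters])             # [[5, 5], [5, 5], [5, 5], [5, 5]]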
Newer versions of TensorFlow define the split function as follows:
tf.split(
    value,
    num_or_size_splits,
    axis=0,
    num=None,
    name='split'
)
However, when I try to run it in R:
X = tf$random_uniform(minval = 0, maxval = 10, shape(256, 32), name = "X")
Y = tf$split(X, num_or_size_splits = 2, axis = 0)
it reports this error message:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Rank-0 tensors are not supported as the num_or_size_splits argument to split. Argument provided: 2.0

How TensorArray and while_loop work together in tensorflow?

I am trying to produce a very simple example combining TensorArray and while_loop:
# 1000 sequence in the length of 100
matrix = tf.placeholder(tf.int32, shape=(100, 1000), name="input_matrix")
matrix_rows = tf.shape(matrix)[0]
ta = tf.TensorArray(tf.float32, size=matrix_rows)
ta = ta.unstack(matrix)
init_state = (0, ta)
condition = lambda i, _: i < n
body = lambda i, ta: (i + 1, ta.write(i, ta.read(i) * 2))
# run the graph
with tf.Session() as sess:
    (n, ta_final) = sess.run(tf.while_loop(condition, body, init_state),
                             feed_dict={matrix: tf.ones(tf.float32, shape=(100, 1000))})
    print(ta_final.stack())
But I am getting the following error:
ValueError: Tensor("while/LoopCond:0", shape=(), dtype=bool) must be from the same graph as Tensor("Merge:0", shape=(), dtype=float32).
Does anyone have an idea what the problem is?
There are several things in your code to point out. First, you don't need to unstack the matrix into the TensorArray to use it inside the loop, you can safely reference the matrix Tensor inside the body and index it using matrix[i] notation. Another issue is the different data type between your matrix (tf.int32) and the TensorArray (tf.float32), based on your code you're multiplying the matrix ints by 2 and writing the result into the array so it should be int32 as well. Finally, when you wish to read the final result of the loop, the correct operation is TensorArray.stack() which is what you need to run in your session.run call.
Here's a working example:
import numpy as np
import tensorflow as tf

# 1000 sequence in the length of 100
matrix = tf.placeholder(tf.int32, shape=(100, 1000), name="input_matrix")
matrix_rows = tf.shape(matrix)[0]
ta = tf.TensorArray(dtype=tf.int32, size=matrix_rows)
init_state = (0, ta)
condition = lambda i, _: i < matrix_rows
body = lambda i, ta: (i + 1, ta.write(i, matrix[i] * 2))
n, ta_final = tf.while_loop(condition, body, init_state)
# get the final result
ta_final_result = ta_final.stack()
# run the graph
with tf.Session() as sess:
    # print the output of ta_final_result
    print(sess.run(ta_final_result, feed_dict={matrix: np.ones(shape=(100, 1000), dtype=np.int32)}))
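For completeness, here is a sketch of the unstack/read variant the question attempted, with the dtypes kept consistent as int32 and the results written to a second TensorArray carried as a loop variable. This is my own adaptation rather than part of the answer above:

import numpy as np
import tensorflow as tf

matrix = tf.placeholder(tf.int32, shape=(100, 1000), name="input_matrix")
matrix_rows = tf.shape(matrix)[0]
# Input TensorArray filled from the matrix; read-only inside the loop body.
ta_in = tf.TensorArray(dtype=tf.int32, size=matrix_rows).unstack(matrix)
# Output TensorArray passed through the loop state.
ta_out = tf.TensorArray(dtype=tf.int32, size=matrix_rows)
condition = lambda i, _: i < matrix_rows
body = lambda i, ta: (i + 1, ta.write(i, ta_in.read(i) * 2))
_, ta_final = tf.while_loop(condition, body, (0, ta_out))
result = ta_final.stack()
with tf.Session() as sess:
    print(sess.run(result, feed_dict={matrix: np.ones((100, 1000), dtype=np.int32)}))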

Difference in matrix multiplication tensorflow vs numpy

I have a case where matrix multiplication of two matrices with certain dimensions work in numpy, but doesn't work in tensorflow.
x = np.ndarray(shape=(10,20,30), dtype = float)
y = np.ndarray(shape=(30,40), dtype = float)
z = np.matmul(x,y)
print("np shapes: %s x %s = %s" % (np.shape(x), np.shape(y), np.shape(z)))
This works as expected and prints:
np shapes: (10, 20, 30) x (30, 40) = (10, 20, 40)
However, in TensorFlow, when I try to multiply a placeholder and a variable of the same shapes as the NumPy arrays above, I get an error:
x = tf.placeholder(tf.float32, shape=(10,20,30))
y = tf.Variable(tf.truncated_normal([30,40], name='w'))
print("tf shapes: %s x %s" % (x.get_shape(), y.get_shape()))
tf.matmul(x,y)
Results in
tf shapes: (10, 20, 30) x (30, 40)
InvalidArgumentError:
Shape must be rank 2 but is rank 3 for 'MatMul_12'
(op: 'MatMul') with input shapes: [10,20,30], [30,40].
Why does this operation fail?
I don't know why tf.matmul does not support this kind of multiplication (maybe one of the core developers could provide a meaningful answer).
But if you just want to be able to multiply tensors in this way, take a look at the tf.einsum function. It can operate on tensors of arbitrary rank.
As suggested by Dmytro, tf.einsum can be used to multiply these two arrays.
x = np.ndarray(shape=(10,20,30), dtype = float)
y = np.ndarray(shape=(30,40), dtype = float)
These two operations produce exactly the same result:
np.einsum('ijk,kl->ijl', x, y)
np.matmul(x,y)
And the corresponding TensorFlow operation also works:
tf.einsum('ijk,kl->ijl', tf_x,tf_y)
People already told you that you can use tf.einsum() to get the result you want.
import tensorflow as tf
x = tf.random_normal([10, 20, 30])
y = tf.random_normal([30, 40])
z = tf.einsum('ijk,kl->ijl', x, y)
The reason why tf.matmul() does not work the way you expected is written in the documentation.
The inputs must be matrices (or tensors of rank > 2, representing
batches of matrices), with matching inner dimensions, possibly after
transposition.
In your case you have a matrix y and a tensor x (rank 3 > 2), and the inner dimensions do not match. If you want them to match, you would need something like this:
import tensorflow as tf
a, b, c = 12, 50, 20
x = tf.random_normal([a, b, c])
y = tf.random_normal([a, c, b])
z = tf.matmul(x, y)
But clearly that does not calculate what you want.
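As an aside (not from the answers above), the same batched product can also be computed with a plain rank-2 matmul by flattening the batch dimensions and reshaping back; a minimal sketch:

import tensorflow as tf

x = tf.random_normal([10, 20, 30])
y = tf.random_normal([30, 40])

# Flatten the leading (batch) dimensions, do a rank-2 matmul, then restore the shape.
x2 = tf.reshape(x, [-1, 30])      # (10 * 20, 30)
z2 = tf.matmul(x2, y)             # (10 * 20, 40)
z = tf.reshape(z2, [10, 20, 40])  # (10, 20, 40), same values as the einsum result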
