Creating masks for duplicate elements in Tensorflow/Keras - python

I am trying to write a custom loss function for a person re-identification task that is trained in a multi-task learning setting along with object detection. The filtered label values have shape (batch_size, num_boxes). I would like to create a mask such that only the values which repeat in dim 1 are considered for further calculations. How do I do this in TF/Keras-backend?
Short Example:
Input labels = [[0,0,0,0,12,12,3,3,4], [0,0,10,10,10,12,3,3,4]]
Required output: [[0,0,0,0,1,1,1,1,0],[0,0,1,1,1,0,1,1,0]]
(Basically I want to keep only duplicated identities and discard unique ones for the loss function; 0 is a background/padding label and is always masked out.)
I guess a combination of tf.unique and tf.scatter could be used but I do not know how.

This code works:
import tensorflow as tf

x = tf.constant([[0, 0, 0, 0, 12, 12, 3, 3, 4], [0, 0, 10, 10, 10, 12, 3, 3, 4]])

def mark_duplicates_1D(x):
    # Mark every entry whose value occurs more than once in the row
    y, idx, count = tf.unique_with_counts(x)
    comp = tf.math.greater(count, 1)
    comp = tf.cast(comp, tf.int32)
    res = tf.gather(comp, idx)
    # Treat 0 as background/padding: never mark it, even when it repeats
    mult = tf.math.not_equal(x, 0)
    mult = tf.cast(mult, tf.int32)
    res *= mult
    return res

res = tf.map_fn(fn=mark_duplicates_1D, elems=x)
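Evaluated eagerly (TF 2.x), res matches the required output from the example above: [[0,0,0,0,1,1,1,1,0], [0,0,1,1,1,0,1,1,0]].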

Related

Python + Image Processing: Efficiently Assign Pixel Values to Nearest Predefined Value

I implemented an algorithm that uses opencv kmeans to quantize the unique brightness values present in a greyscale image. Quantizing the unique values helped avoid biases towards image backgrounds which are typically all the same value.
However, I struggled to find a way to utilize this data to quantize a given input image.
I implemented a very naive solution, but it is unusably slow for the required input sizes (4000x4000):
for x in range(W):
    for y in range(H):
        center_id = np.argmin([(arr[y, x] - center)**2 for center in centers])
        ret_labels2D[y, x] = sortorder.index(center_id)
        ret_qimg[y, x] = centers[center_id]
Basically, I am simply adjusting each pixel to the predefined level with the minimum squared error.
Is there any way to do this faster? I was trying to process an image of size 4000x4000 and this implementation was completely unusable.
Full code:
import numpy as np
import cv2

def unique_quantize(arr, K, eps=0.05, max_iter=100, max_tries=20):
    """@param arr: 2D numpy array of floats"""
    H, W = arr.shape
    unique_values = np.squeeze(np.unique(arr.copy()))
    unique_values = np.array(unique_values, float)
    if unique_values.ndim == 0:
        unique_values = np.array([unique_values], float)
    unique_values = np.ravel(unique_values)
    unique_values = np.expand_dims(unique_values, 1)
    Z = unique_values.astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, max_iter, eps)
    compactness, labels, centers = cv2.kmeans(Z, K, None, criteria, max_tries, cv2.KMEANS_RANDOM_CENTERS)
    labels = np.ravel(np.squeeze(labels))
    centers = np.ravel(np.squeeze(centers))
    sortorder = list(np.argsort(centers))  # old index --> index into sort order
    ret_center = centers[sortorder]
    ret_labels2D = np.zeros((H, W), int)
    ret_qimg = np.zeros((H, W), float)
    for x in range(W):
        for y in range(H):
            center_id = np.argmin([(arr[y, x] - center)**2 for center in centers])
            ret_labels2D[y, x] = sortorder.index(center_id)
            ret_qimg[y, x] = centers[center_id]
    return ret_center, ret_labels2D, ret_qimg
EDIT: I looked at the input file again. The size was actually 12000x12000.
As your image is grayscale (presumably 8 bits), a lookup table will be an efficient solution. It suffices to map all 256 gray levels to the nearest center once and for all, then use this as a conversion table. Even a 16-bit range (65,536 entries) would still be a significant speedup.
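A minimal sketch of that idea, assuming an 8-bit uint8 image img and the raw centers array returned by unique_quantize (the names img, labels2D and qimg here are illustrative, not from the original code):

import numpy as np

levels = np.arange(256, dtype=np.float32)                               # every possible gray level
nearest = np.argmin((levels[:, None] - centers[None, :]) ** 2, axis=1)  # 256 x K squared distances
order = np.argsort(centers)
inverse_order = np.argsort(order)                                       # raw center index -> sorted label
lut_labels = inverse_order[nearest]                                     # gray level -> sorted label
lut_values = centers[nearest]                                           # gray level -> quantized value

labels2D = lut_labels[img]   # fancy indexing applies the 256-entry table to all pixels at once
qimg = lut_values[img]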
I recently thought of a much better answer. This code is not extensively tested, but it worked for the use case in my project.
I made use of obscure fancy-indexing techniques in order to keep the entire algorithm contained within numpy functions.
import numpy as np
import cv2

def unique_quantize(arr, K, eps=0.05, max_iter=100, max_tries=20):
    """@param arr: 2D numpy array of floats"""
    H, W = arr.shape
    unique_values = np.squeeze(np.unique(arr.copy()))
    unique_values = np.array(unique_values, float)
    if unique_values.ndim == 0:
        unique_values = np.array([unique_values], float)
    unique_values = np.ravel(unique_values)
    unique_values = np.expand_dims(unique_values, 1)
    Z = unique_values.astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, max_iter, eps)
    compactness, labels, centers = cv2.kmeans(Z, K, None, criteria, max_tries, cv2.KMEANS_RANDOM_CENTERS)
    labels = np.ravel(np.squeeze(labels))
    centers = np.ravel(np.squeeze(centers))
    sortorder = np.argsort(centers)  # old index --> index into sort order
    inverse_sortorder = np.array([list(sortorder).index(i) for i in range(len(centers))], int)
    ret_center = centers[sortorder]
    # Squared error of every pixel against every center, stacked along a new axis
    errors = [np.power((arr - center), 2) for center in centers]
    errors = np.array(errors, float)
    classification = np.squeeze(np.argmin(errors, axis=0))
    ret_labels2D = inverse_sortorder[classification]
    ret_qimg = centers[classification]
    return np.array(ret_center, float), np.array(ret_labels2D, int), np.array(ret_qimg, float)
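One design note: the errors list materializes one full H x W float array per center, so for a 12000x12000 input the memory cost grows linearly with K. If that becomes a problem on large images with integer gray levels, the 256-entry lookup-table approach described above avoids it entirely.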

Implement ConvND in Tensorflow

So I need an N-D convolutional layer that also supports complex numbers, so I decided to code it myself.
I tested this code with numpy alone and it worked, with several channels, in 1D and 2D, and with complex inputs. However, I run into problems when I do it in TF.
This is my code so far:
def call(self, inputs):
    with tf.name_scope("ComplexConvolution_" + str(self.layer_number)) as scope:
        inputs = self._verify_inputs(inputs)  # Check inputs are of expected shape and format
        inputs = self.apply_padding(inputs)   # Add zeros if needed
        output_np = np.zeros(                 # I use np because tf does not support the assignment
            (inputs.shape[0],) +              # Per each image
            self.output_size,                 # Image out size
            dtype=self.input_dtype            # To support complex numbers
        )
        img_index = 0
        for image in inputs:
            for filter_index in range(self.filters):
                for i in range(int(np.prod(self.output_size[:-1]))):  # for each element in the output
                    index = np.unravel_index(i, self.output_size[:-1])
                    start_index = tuple([a * b for a, b in zip(index, self.stride_shape)])
                    end_index = tuple([a + b for a, b in zip(start_index, self.kernel_shape)])
                    # set_trace()
                    sector_slice = tuple(
                        [slice(start_index[ind], end_index[ind]) for ind in range(len(start_index))]
                    )
                    sector = image[sector_slice]
                    new_value = tf.reduce_sum(sector * self.kernels[filter_index]) + self.bias[filter_index]
                    # I use Tied Bias https://datascience.stackexchange.com/a/37748/75968
                    output_np[img_index][index][filter_index] = new_value  # The complicated line
            img_index += 1
        output = apply_activation(self.activation, output_np)
    return output
input_size is a tuple of shape (dim1, dim2, ..., dimN, channels). A 2D RGB conv, for example, will be (32, 32, 3), and inputs will have shape (None, 32, 32, 3).
The output size is calculated from an equation I found in this paper: A guide to convolution arithmetic for deep learning
out_list = []
for i in range(len(self.input_size) - 1):  # -1 because the number of input channels is irrelevant
    out_list.append(int(np.floor((self.input_size[i] + 2 * self.padding_shape[i] - self.kernel_shape[i]) / self.stride_shape[i]) + 1))
out_list.append(self.filters)
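For example, a 32-pixel dimension with no padding, a kernel of size 3 and stride 1 gives floor((32 + 2*0 - 3) / 1) + 1 = 30 output elements.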
Basically, I use np.zeros because if I use tf.zeros I cannot assign the new_value and I get:
TypeError: 'Tensor' object does not support item assignment
However, in this current state I am getting:
NotImplementedError: Cannot convert a symbolic Tensor (placeholder_1:0) to a numpy array.
On that same assignment. I don't see an easy fix, I think I should change the strategy of the code completely.
In the end, I did it in a very inefficient way based on this comment, also discussed here, but at least it works:
new_value = tf.reduce_sum(sector * self.kernels[filter_index]) + self.bias[filter_index]
indices = (img_index,) + index + (filter_index,)
mask = tf.Variable(tf.fill(output_np.shape, 1))
mask = mask[indices].assign(0)                           # 0 at the position to update, 1 everywhere else
mask = tf.cast(mask, dtype=self.input_dtype)
output_np = output_np * mask + (1 - mask) * new_value    # write new_value into that single position
I say inefficient because I create a whole new array for each assignment. My code is taking ages to compute for the moment so I will keep looking for improvements and post here if I get something better.
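As a possible alternative to rebuilding a mask for every assignment, tf.tensor_scatter_nd_update writes values at given indices in a single op. The sketch below uses purely illustrative shapes and float32 (whether it covers the complex-dtype case of this layer would still need to be verified); it is not the layer's actual code.

import tensorflow as tf

output = tf.zeros((2, 4, 4, 3), dtype=tf.float32)  # (batch, height, width, filters), illustrative only
indices = tf.constant([[0, 1, 2, 0]])              # one (img, row, col, filter) position to overwrite
updates = tf.constant([1.5])                       # the value computed for that position
output = tf.tensor_scatter_nd_update(output, indices, updates)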

Custom conv2d operation Pytorch

I have tried to write a custom Conv2d function which has to work like nn.Conv2d, but with the multiplication and addition used inside nn.Conv2d replaced by mymult(num1, num2) and myadd(num1, num2).
As per insight from very helpful forums 1, 2, what I can do is unfold the input and then do a matrix multiplication. The @ part given in the code below can then be done with loops using mymult() and myadd(), since I believe the @ is doing the matmul.
def convcheck():
    torch.manual_seed(123)
    batch_size = 2
    channels = 2
    h, w = 2, 2
    image = torch.randn(batch_size, channels, h, w)  # input image
    out_channels = 3
    kh, kw = 1, 1  # kernel size
    dh, dw = 1, 1  # stride
    size = int((h - kh + 2*0) / dh + 1)  # include padding in place of zero
    conv = nn.Conv2d(in_channels=channels, out_channels=out_channels, kernel_size=kw, padding=0, stride=dh, bias=False)
    out = conv(image)
    #print('out', out)
    #print('out.size()', out.size())
    #print('')
    filt = conv.weight.data
    imageunfold = F.unfold(image, kernel_size=kh, padding=0, stride=dh)
    print("Unfolded image", "\n", imageunfold, "\n", imageunfold.shape)
    kernels_flat = filt.view(out_channels, -1)
    print("Kernel Flat=", "\n", kernels_flat, "\n", kernels_flat.shape)
    res = kernels_flat @ imageunfold  # I have to replace this operation with mymult() and myadd()
    print(res, "\n", res.shape)
    #print(res.size(2),"\n",res.shape)
    res = res.view(-1, out_channels, size, size)
    #print("Same answer as builtin function", res)
    # res = kernels_flat @ imageunfold can be replaced with the loops below, although there can be
    # some other efficient implementation which I am looking to get help for.
    for m_batch in range(len(imageunfold)):
        # iterate through rows of X
        # iterate through columns of Y
        for j in range(imageunfold.size(2)):
            # iterate through rows of Y
            for k in range(imageunfold.size(1)):
                #print(result[m_batch][i][j]," +=", kernels_flat[i][k], "*", imageunfold[m_batch][k][j])
                result[m_batch][i][j] += kernels_flat[i][k] * imageunfold[m_batch][k][j]
Can someone please help me vectorize these three loops for faster execution.
The problem was with the dimensions: with kernels_flat[dim0_1, dim1_1] and imageunfold[batch, dim0_2, dim1_2], the result should have shape [batch, dim0_1, dim1_2].
# res = kernels_flat @ imageunfold can be replaced with this, although there can be some other efficient implementation.
for m_batch in range(len(imageunfold)):
    # iterate through rows of X
    # iterate through columns of Y
    for j in range(imageunfold.size(2)):
        # iterate through rows of Y
        for k in range(imageunfold.size(1)):
            #print(result[m_batch][i][j]," +=", kernels_flat[i][k], "*", imageunfold[m_batch][k][j])
            result[m_batch][i][j] += kernels_flat[i][k] * imageunfold[m_batch][k][j]
Your code for the matrix multiplication is missing a loop for iterating over the filters.
In the code below I fixed your implementation.
I am currently also looking for optimizations on the code. In my use case, the individual results of the multiplications (without performing addition) need to be accessible after computation. I will post here in case I find a faster solution than this.
for batch_image in range(imageunfold.shape[0]):
    for i in range(kernels_flat.shape[0]):
        for j in range(imageunfold.shape[2]):
            for k in range(kernels_flat.shape[1]):
                res_c[batch_image][i][j] += kernels_flat[i][k] * imageunfold[batch_image][k][j]
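If the custom multiplication can be applied elementwise, the four loops can also be collapsed with broadcasting while still keeping every individual product available before the reduction. This is only a sketch, assuming kernels_flat has shape (out_channels, k) and imageunfold has shape (batch, k, L) as above:

import torch

# (batch, out_channels, k, L): every single kernel-weight * patch-value product
products = kernels_flat[None, :, :, None] * imageunfold[:, None, :, :]
# Sum over the k dimension to get the convolution result of shape (batch, out_channels, L)
res_c = products.sum(dim=2)

If mymult/myadd cannot be expressed as elementwise tensor ops, the explicit loops above are still needed.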

Produce a dataset of strided slices from a tfrecords dataset

Continuing from this question and the discussion here - I am trying to use the Dataset API to take a dataset of variable length tensors and cut them into slices (segments) of equal length. Something like:
Dataset = tf.contrib.data.Dataset
segment_len = 6
batch_size = 16

with tf.Graph().as_default() as g:
    # get the tfrecords dataset
    dataset = tf.contrib.data.TFRecordDataset(filenames).map(
        partial(record_type.parse_single_example, graph=g)).batch(batch_size)
    # zip it with the number of segments we need to slice each tensor
    dataset2 = Dataset.zip((dataset, Dataset.from_tensor_slices(
        tf.constant(num_segments, dtype=tf.int64))))
    it2 = dataset2.make_initializable_iterator()

    def _dataset_generator():
        with g.as_default():
            while True:
                try:
                    (im, length), count = sess.run(it2.get_next())
                    dataset3 = Dataset.zip((
                        # repeat each tensor then use map to take a strided slice
                        Dataset.from_tensors((im, length)).repeat(count),
                        Dataset.range(count))).map(lambda x, c: (
                            x[0][:, c: c + segment_len],
                            x[0][:, c + 1: (c + 1) + segment_len],
                        ))
                    it = dataset3.make_initializable_iterator()
                    it_init = it.initializer
                    try:
                        yield it_init
                        while True:
                            yield sess.run(it.get_next())
                    except tf.errors.OutOfRangeError:
                        continue
                except tf.errors.OutOfRangeError:
                    return

    # Dataset.from_generator needs tensorflow > 1.3 !
    das_dataset = Dataset.from_generator(
        _dataset_generator,
        (tf.float32, tf.float32),
        # (tf.TensorShape([]), tf.TensorShape([]))
    )
    das_dataset_it = das_dataset.make_one_shot_iterator()

    with tf.Session(graph=g) as sess:
        while True:
            print(sess.run(it2.initializer))
            print(sess.run(das_dataset_it.get_next()))
Of course I do not want to pass the session into the generator, but that should be possible to work around with the trick given in the link (create a dummy dataset and map the iterator of the other). The code above fails with the biblical:
tensorflow.python.framework.errors_impl.InvalidArgumentError: TypeError: If shallow structure is a sequence, input must also be a sequence. Input has type: <class 'tensorflow.python.framework.ops.Operation'>.
[[Node: PyFunc = PyFunc[Tin=[DT_INT64], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_1"](arg0)]]
[[Node: IteratorGetNext = IteratorGetNext[output_shapes=[<unknown>, <unknown>], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](OneShotIterator)]]
which I guess is because I try to yield the initializer of the iterator, but my question is basically whether I can achieve what I am trying to do at all using the Dataset API.
The easiest way to build a Dataset from a nested Dataset is to use the Dataset.flat_map() transformation. This transformation applies a function to each element of the input dataset (dataset2 in your example), that function returns a nested Dataset (most likely dataset3 in your example), and then the transformation flattens all the nested datasets into a single Dataset.
dataset2 = ...  # As above.

def get_slices(im_and_length, count):
    im, length = im_and_length
    # Repeat each tensor, then use map to take a strided slice.
    return Dataset.zip((
        Dataset.from_tensors((im, length)).repeat(count),
        Dataset.range(count))).map(lambda x, c: (
            x[0][:, c: c + segment_len],
            x[0][:, c + 1: (c + 1) + segment_len],
        ))

das_dataset = dataset2.flat_map(get_slices)
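To see what flat_map does with a nested dataset, here is a tiny standalone illustration (written in TF 2.x eager style, unrelated to the TFRecord pipeline above):

import tensorflow as tf

ds = tf.data.Dataset.from_tensor_slices([1, 2, 3])
# Each element x is expanded into its own Dataset, and the results are concatenated in order:
flat = ds.flat_map(lambda x: tf.data.Dataset.range(tf.cast(x, tf.int64)))
print(list(flat.as_numpy_iterator()))  # [0, 0, 1, 0, 1, 2]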

TensorFlow: determine the shape of a Tensor at runtime

Does TensorFlow support determining the shape of a Tensor at runtime?
The problem is to build a constant tensor at runtime based on the input vector length_q. The number of columns of the target tensor is the sum of length_q. The code snippet is shown below; the length of length_q is fixed to 64.
T = tf.reduce_sum(length_q, 0)[0]
N = np.shape(length_q)[0]
wm = np.zeros((N, T), dtype=np.float32)
# Something irrelevant.
count = 0
for i in xrange(N):
    ones = np.ones(length_q[i])
    wm[i][count:count + length_q[i]] = ones
    count += length_q[i]
return tf.constant(wm)
Update
I want to create a dynamic Tensor according to the input length_q. length_q is some input vector (64x1). The shape of the new tensor I want to create depends on the sum of length_q, because the data in length_q changes in each batch. The current code snippet is as follows:
def some_matrix(length_q):
    T = tf.reduce_sum(length_q, 0)[0]
    N = np.shape(length_q)[0]
    wm = np.zeros((N, T), dtype=np.float32)
    count = 0
    return wm

def network_inference(length_q):
    wm = tf.constant(some_matrix(length_q))
    ...
The problem occurs because length_q is a placeholder, so its values (and therefore their sum) are not available as Python numbers at graph-construction time. Are there ways to solve this problem?
It sounds like the tf.fill() op is what you need. This op allows you to specify a shape as a tf.Tensor (i.e. a runtime value) along with a value:
def some_matrix(length_q):
    T = tf.reduce_sum(length_q, 0)[0]
    N = tf.shape(length_q)[0]
    wm = tf.fill([N, T], 0.0)
    return wm
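Building on that, here is a sketch (not from the original answer) of constructing the full mask dynamically, where row i has ones in columns [start_i, start_i + length_q[i]) and start_i is the cumulative sum of the preceding lengths. It assumes length_q is an integer tensor of shape (N,); squeeze it first if it is (N, 1).

import tensorflow as tf

def ones_block_mask(length_q):
    T = tf.reduce_sum(length_q)                   # total number of columns
    starts = tf.cumsum(length_q, exclusive=True)  # first column of each row's block of ones
    ends = starts + length_q
    cols = tf.range(T)
    # (N, T) boolean grid: True where the column falls inside the row's block
    in_block = (cols[None, :] >= starts[:, None]) & (cols[None, :] < ends[:, None])
    return tf.cast(in_block, tf.float32)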
It is not clear what you are calculating, but if you need an (N, T) tensor of ones, you can generate it like this:
T = tf.constant(20.0, tf.float32)  # tf value which is the reduced sum; 20.0 is an example float value
T = tf.cast(T, tf.int32)           # columns must be integer
N = 10                             # if it is a numpy integer - assuming np.shape gives 10
# N = tf.shape(length_q)[0]        # if it is a tensor, replace 'length_q' with your tensor name
wm = tf.ones([N, T], dtype=tf.float32)  # N rows and T columns
sess = tf.Session()
sess.run(tf.initialize_all_variables())
sess.run(wm)
