Semantic Segmentation to Bounding Boxes

Suppose you are performing semantic segmentation. For simplicity, let's assume this is 1D segmentation rather than 2D (i.e. we only care about finding objects with width).
So the desired output of our model might be something like:
[
[0, 0, 0, 0, 1, 1, 1], # label channel 1
[1, 1, 1, 0, 0, 1, 1], # label channel 2
[0, 0, 0, 1, 1, 1, 0], # label channel 3
#...
]
However, our trained, imperfect model might produce something more like:
[
[0.1, 0.1, 0.1, 0.4, 0.91, 0.81, 0.84], # label channel 1
[0.81, 0.79, 0.85, 0.1, 0.2, 0.61, 0.91], # label channel 2
[0.3, 0.1, 0.24, 0.87, 0.62, 1, 0 ], # label channel 3
#...
]
What would be a performant way, using Python, to get the boundaries of the labels (or bounding boxes)?
e.g. (zero-indexed)
[
[[4, 6]], # "objects" of label 1
[[0, 2], [5, 6]], # "objects" of label 2
[[3, 5]], # "objects" of label 3
]
If it helps, perhaps transforming it into a binary mask would be more useful?
import numpy as np
def binarize(arr, cutoff=0.5):
    return (arr > cutoff).astype(int)
With a binary mask, we just need to find the runs of consecutive integers among the indices of the nonzero values:
def consecutive(data, stepsize=1):
    # split wherever the gap between neighbouring indices is not `stepsize`
    return np.split(data, np.where(np.diff(data) != stepsize)[0] + 1)
Find the "runs" of labels:
def binary_boundaries(labels, cutoff=0.5):
    return [consecutive(channel.nonzero()[0]) for channel in binarize(labels, cutoff)]
Name the objects according to the channel name:
def binary_objects(labels, cutoff=0.5, channel_names=None):
    if channel_names is None:
        channel_names = ['channel {}'.format(i) for i in range(labels.shape[0])]
    return dict(zip(channel_names, binary_boundaries(labels, cutoff)))
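The split-based helpers above return the indices of each run rather than the [start, end] pairs asked for. A vectorized sketch (the name run_boundaries is hypothetical, and I have not benchmarked it against the np.split version) finds the edges of every run in all channels at once by differencing the zero-padded binary mask:

```python
import numpy as np


def run_boundaries(labels, cutoff=0.5):
    """Return the inclusive [start, end] pairs of every run, one list per channel.

    Hypothetical helper; `labels` has shape (channels, width).
    """
    mask = (np.asarray(labels) > cutoff).astype(np.int8)
    # Pad a zero column on each side so runs touching either edge still
    # produce both a rising and a falling edge.
    padded = np.pad(mask, ((0, 0), (1, 1)))
    edges = np.diff(padded, axis=1)
    # +1 marks the first index of a run; -1 marks one past its last index.
    # np.nonzero scans row by row, so starts and stops pair up in order.
    start_ch, start_pos = np.nonzero(edges == 1)
    _, stop_pos = np.nonzero(edges == -1)
    boxes = [[] for _ in range(mask.shape[0])]
    for ch, start, stop in zip(start_ch, start_pos, stop_pos):
        boxes[ch].append([int(start), int(stop) - 1])
    return boxes


pred = np.array([
    [0.1, 0.1, 0.1, 0.4, 0.91, 0.81, 0.84],    # label channel 1
    [0.81, 0.79, 0.85, 0.1, 0.2, 0.61, 0.91],  # label channel 2
    [0.3, 0.1, 0.24, 0.87, 0.62, 1, 0],        # label channel 3
])
print(run_boundaries(pred))  # [[[4, 6]], [[0, 2], [5, 6]], [[3, 5]]]
```

The pairing of starts and stops relies on every run having exactly one rising and one falling edge, which the zero padding guarantees; a channel with no objects simply yields an empty list.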

Your trained model returned a float image rather than the int image you were looking for (and it's not 'imperfect' just because the values are decimals), and yes, you do need to threshold it to get a binary image.
Once you have the binary image, let's do some work with skimage:
from skimage import measure
label_mask = measure.label(mask)
props = measure.regionprops(label_mask)
mask is your binary image, and props now holds the properties of all the labelled regions, which are your detected objects.
Among these properties is the bounding box (bbox)!
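A minimal sketch of that pipeline on channel 2 of the question's prediction, assuming a 0.5 threshold. As far as I know, regionprops only accepts 2-D or 3-D label images, so the 1-D channel is reshaped to shape (1, W) here; each bbox then comes back as (min_row, min_col, max_row, max_col) with the max values exclusive, and its columns give the [start, end] pair:

```python
import numpy as np
from skimage import measure

# Channel 2 of the example prediction, as a (1, W) image
prob = np.array([[0.81, 0.79, 0.85, 0.1, 0.2, 0.61, 0.91]])
mask = prob > 0.5                     # threshold to a binary image
label_mask = measure.label(mask)      # 0 = background, 1..N = objects
props = measure.regionprops(label_mask)

# bbox = (min_row, min_col, max_row, max_col); max_col is exclusive
boxes = [[int(p.bbox[1]), int(p.bbox[3]) - 1] for p in props]
print(boxes)  # [[0, 2], [5, 6]]
```

For real 2-D masks the full bbox tuple is usually what you want; the column slice here just maps it back to the question's 1-D format.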

Related

Why does my neural network have extremely low weights after a few epochs?

I just started learning about neural networks and this is my first one. The problem is that the more data I have, the lower the weights become after 2-3 epochs, which is unusual, and this causes my NN to learn nothing.
To reproduce
In the DataSet class, find the function CreateData and change nbofexample to something like 20. If you print the weights, you'll see that they are in a normal range (evenly spaced between -1 and 1). But if you set nbofexample to something like 200, then after only 2 or 3 epochs most of the weights of the last layer will be extremely close to 0, and they will stay in that zone for the rest of the training. Obviously, this causes the NN to fail.
By the way, my NN is basically analyzing arrays of numbers between 0 and 9, divided by 10 for normalization, to check whether the array is sorted. I put a lot of comments in the code below so it can be easily understood.
There is probably an easy fix but I just don't get it :(
Here is the complete code if you want to try it (it's in Python, by the way):
import numpy as np
import time
import random
#This class is only used for creating the data if needed
class DataSet():
    #check if sorted
    def checkPossibility(A):
        return sorted(A) == A
    #will be used later for more complex problems (taken from the faster answer of a coding challenge on LeetCode)
    #def checkPossibility(A):
    #    p = None
    #    for i in range(len(A) - 1):
    #        if A[i] > A[i+1]:
    #            if p is not None:
    #                return False
    #            p = i
    #    return (p is None or p == 0 or p == len(A)-2 or
    #            A[p-1] <= A[p+1] or A[p] <= A[p+2])
    #returns inputs and outputs using my poorly written algorithm
    def CreateData():
        #settings
        nbofchar=4
        nbofexample=200
        #initialize arrays
        inputs = [0]*nbofchar
        output = [1]
        #handling dumbness
        if nbofexample>pow(10,nbofchar):
            print("Too much data... resizing to max data")
            nbofexample=pow(10,nbofchar)
        elif nbofexample==0:
            print("You need examples to train! (Error nbofexample==0)")
        #if more than half of the max possible examples are requested, then create all possible examples and delete randomly until it's the requested size
        if nbofexample>pow(10,nbofchar)/2:
            #creating all possible examples
            for i in range(1,pow(10,nbofchar)):
                new_ex = [int(a) for a in str(i)]
                while len(new_ex)<nbofchar:
                    new_ex=[0]+new_ex
                inputs = np.vstack((inputs,np.dot(new_ex,1/10))) #normalization /10 so the value is between 0 and 1 ¯\_(ツ)_/¯
                output = np.vstack((output,[int(DataSet.checkPossibility(new_ex))]))
            #deleting (axis=0 removes a whole row instead of flattening the array)
            while len(inputs)>nbofexample:
                index = random.randint(0,len(inputs)-1)
                inputs = np.delete(inputs,index,axis=0)
                output = np.delete(output,index,axis=0)
            return inputs, output
        #if there is less than half (or half) then, create example randomly until it's the requested size
        else:
            i=1
            while i < nbofexample:
                new_ex = [random.randint(0,9) for a in range(nbofchar)]
                if sum(np.any(inputs)==new_ex)==0:
                    i+=1
                    inputs = np.vstack((inputs,np.dot(new_ex,1/10))) #normalization /10 so the value is between 0 and 1 ¯\_(ツ)_/¯
                    output = np.vstack((output,[int(DataSet.checkPossibility(new_ex))]))
            return inputs, output
#assigning weights to each layer
class NeuLayer():
    def __init__(self, nbofneuron, inputsperneuron):
        self.weight = 2 * np.random.random((inputsperneuron,nbofneuron))-1
#the actual neural network
class NeuNet():
    def __init__(self, layers):
        self.layers = layers
    def _sigmoid(self, x):
        k = 1
        return 1 / (1+np.exp(-x/k))
    def _sigmoid_derivative(self, x):
        return x * (1-x)
    def train(self, training_set_inputs, training_set_outputs, nboftime):
        #debug
        timer1 = 0
        if len(self.layers)<2: return
        for iteration in range(nboftime):
            delta = [0] * len(self.layers)
            error = [0] * len(self.layers)
            outputlayers = self.think(training_set_inputs)
            #find deltas for each layer "i" (to be able to properly change weights)
            for i in range(len(self.layers)-1,-1,-1):
                if i==len(self.layers)-1:
                    error[i] = training_set_outputs - outputlayers[i]
                else:
                    error[i] = np.dot(delta[i+1],self.layers[i+1].weight.T)
                delta[i] = error[i] * self._sigmoid_derivative(outputlayers[i])
            #assign weights for each layer "i"
            for i in range(len(self.layers)):
                if i==0:
                    self.layers[0].weight += np.dot(training_set_inputs.T,delta[0])
                else:
                    self.layers[i].weight += np.dot(outputlayers[i-1].T,delta[i])
            #display progression and the test result
            if Display_progression:
                if timer1<time.time():
                    timer1=time.time()+delay
                    value = ((iteration+1)/nboftime)*100
                    test_input = np.array([.1,.2,.1,.1])
                    print('%.2f'%value+"% test_input = " + str(test_input) + " test_output = "+ str(self.think(test_input)[-1]))
    #return output of each layer from an input
    def think(self, input):
        outforlayers = [None]*len(self.layers)
        outforlayer = input
        for i in range(len(self.layers)):
            outforlayer = self._sigmoid(np.dot(outforlayer, self.layers[i].weight))
            outforlayers[i] = outforlayer
        return outforlayers
#datamaker
creating_data=True
train = True
if creating_data:
    #creates files with inputs and their expected output
    print("Start creating data...")
    input, output = DataSet.CreateData()
    print("Data created!")
    file = open("data_input","wb")
    np.save(file, input)
    file.close()
    file = open("data_output","wb")
    np.save(file, output)
    file.close()
if train:
    default_data_set=False
    if default_data_set:
        #default training set
        inp_training = np.array([[0, 0, 0, 0, 0], [0.1, 0, 0, 0, 0], [0, 0.1, 0, 0, 0], [0.1, 0.1, 0, 0, 0], [0, 0, 0.1, 0, 0], [0.1, 0, 0.1, 0, 0], [0, 0.1, 0.1, 0, 0], [0.1, 0.1, 0.1, 0, 0],
                                 [0, 0, 0, 0.1, 0], [0.1, 0, 0, 0.1, 0], [0, 0.1, 0, 0.1, 0], [0.1, 0.1, 0, 0.1, 0], [0, 0, 0.1, 0.1, 0], [0.1, 0, 0.1, 0.1, 0], [0, 0.1, 0.1, 0.1, 0], [0.1, 0.1, 0.1, 0.1, 0],
                                 [0, 0, 0, 0, 0.1], [0.1, 0, 0, 0, 0.1], [0, 0.1, 0, 0, 0.1], [0.1, 0.1, 0, 0, 0.1], [0, 0, 0.1, 0, 0.1], [0.1, 0, 0.1, 0, 0.1], [0, 0.1, 0.1, 0, 0.1], [0.1, 0.1, 0.1, 0, 0.1],
                                 [0, 0, 0, 0.1, 0.1], [0.1, 0, 0, 0.1, 0.1], [0, 0.1, 0, 0.1, 0.1], [0.1, 0.1, 0, 0.1, 0.1], [0, 0, 0.1, 0.1, 0.1], [0.1, 0, 0.1, 0.1, 0.1], [0, 0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.1, 0.1]])
        out_training = np.array([[0,0,0,0,0,0,0,1,
                                  0,0,0,1,0,1,1,1,
                                  0,0,0,1,0,1,1,1,
                                  0,1,1,1,1,1,1,1]]).T
    else:
        print("Loading data files...")
        file = open("data_input","rb")
        inp_training = np.load(file)
        file.close()
        file = open("data_output","rb")
        out_training = np.load(file)
        file.close()
        print("Done reading from data files!")
    #debug
    Display_progression = True
    delay = 1 #seconds
    #initialize
    np.random.seed(5)
    netlayer_input = NeuLayer(10,len(inp_training[0]))
    netlayer2 = NeuLayer(10,10)
    netlayer3 = NeuLayer(10,10)
    netlayer4 = NeuLayer(10,10)
    netlayer_out = NeuLayer(len(out_training[0]),10)
    All_layers = [netlayer_input,netlayer2,netlayer3,netlayer4,netlayer_out]
    brain = NeuNet(All_layers)
    #train
    print("Start training...")
    brain.train(inp_training, out_training, 100000)
    print("Done!")
    #final test
    outputfinal = brain.think(np.array([0,.1,.3,.7]))
    #output
    a = outputfinal[-1] #[-1] so we get the last layer's output(s)
    print(a)
Note
This is my first time asking a question on Stack Overflow, so tell me if I'm missing crucial information for this question.

Neural networks can suffer from something known as the vanishing gradient problem, caused by the more classical activations like sigmoid or tanh.
In layman's terms, activations like sigmoid and tanh really squeeze their inputs. For example, sigmoid(10) and sigmoid(100) are roughly 0.99995 and 1 respectively. Even though the input has changed a lot, the output has barely changed: the function is effectively constant there. And where a function is almost constant, its derivative tends to zero (or a very small value). During backpropagation these very small derivatives/gradients are multiplied together across the layers and become effectively zero, preventing your model from learning anything at all: your weights get stuck and stop updating.
I suggest you do some further reading on this topic in your own time. Among several solutions, one is to use a different activation, like ReLU.
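To put numbers on this, here is a small NumPy sketch of the sigmoid derivative at a few inputs, using the same s * (1 - s) formula the question's _sigmoid_derivative applies to a layer's output, plus the best-case factor across the five sigmoid layers the question's network stacks (ignoring the weight matrices in between). The ReLU derivative is shown only for comparison:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_derivative_from_output(s):
    # Same formula as the question's _sigmoid_derivative: it receives the
    # sigmoid *output* s, not the pre-activation x
    return s * (1.0 - s)


def relu_derivative(x):
    return np.where(x > 0, 1.0, 0.0)


for x in (0.0, 2.0, 10.0, 100.0):
    s = sigmoid(x)
    print(f"x={x:6.1f}  sigmoid={s:.6f}  derivative={sigmoid_derivative_from_output(s):.2e}")

# The sigmoid derivative never exceeds 0.25 (reached at x = 0), so the
# activation factors alone multiply a gradient by at most 0.25**5 (about 0.001)
# across five sigmoid layers.
print("best case over 5 sigmoid layers:", 0.25 ** 5)
print("ReLU derivative for positive inputs:", relu_derivative(np.array([0.5, 10.0, 100.0])))
```

Once a unit saturates (x = 100 above), its derivative is exactly 0.0 in float64, so no gradient flows back through it at all.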

Is there a way to make multiple IndexedVertexLists refer to the same vertices while having different lists of indices

I want to have two (or more) IndexedVertexLists which refer to the same vertices while having different lists of indices. The problem I am having is that, if I were to create two (or more) IndexedVertexLists with the same vertices, it would take up twice the amount of the GPU's memory compared to what it actually needs.
What I mean:
import pyglet
vertices = [
0, 0,
0, 0.5,
0.5, 0,
0.5, 0.5
]
indices1 = [0, 1, 2]
indices2 = [0, 2, 3]
vertex_list_indexed_1 = pyglet.graphics.vertex_list_indexed(4, indices1, ('v2f', vertices))
vertex_list_indexed_2 = pyglet.graphics.vertex_list_indexed(4, indices2, ('v2f', vertices))
What I want would be something like this (this does not work, obviously):
import pyglet
vertices = [
0, 0,
0, 0.5,
0.5, 0,
0.5, 0.5
]
indices1 = [0, 1, 2]
indices2 = [0, 2, 3]
vertex_list = pyglet.graphics.vertex_list(4, ('v2f', vertices))
vertex_list_indexed_1 = pyglet.graphics.vertex_list_indexed(4, indices1, vertex_list)
vertex_list_indexed_2 = pyglet.graphics.vertex_list_indexed(4, indices2, vertex_list)
I couldn't find anything in the pyglet documentation that would solve my problem.

Superimpose objects on a video stream using Python and POVRAY

I am using Vapory, a Python wrapper library for POV-Ray. It allows using Python functions to manipulate typical POV-Ray operations.
I want to superimpose 3D models on every frame of my video stream. The way to do this in Vapory is the following:
from vapory import *
from moviepy.editor import VideoFileClip
from moviepy.video.io.ffmpeg_writer import ffmpeg_write_image
light = LightSource([10, 15, -20], [1.3, 1.3, 1.3])
wall = Plane([0, 0, 1], 20, Texture(Pigment('color', [1, 1, 1])))
ground = Plane( [0, 1, 0], 0,
Texture( Pigment( 'color', [1, 1, 1]),
Finish( 'phong', 0.1,
'reflection',0.4,
'metallic', 0.3)))
sphere1 = Sphere([-4, 2, 2], 2.0, Pigment('color', [0, 0, 1]),
Finish('phong', 0.8,
'reflection', 0.5))
sphere2 =Sphere([4, 1, 0], 1.0, Texture('T_Ruby_Glass'),
Interior('ior',2))
scene = Scene( Camera("location", [0, 5, -10], "look_at", [1, 3, 0]),
objects = [ ground, wall, sphere1, sphere2, light],
included=["glass.inc"] )
def embed_in_scene(image):
    ffmpeg_write_image("__temp__.png", image)
    image_ratio = 1.0*image.shape[1]/image.shape[0]
    screen = Box([0, 0, 0], [1, 1, 0], Texture(
                    Pigment( ImageMap('png', '"__temp__.png"', 'once')),
                    Finish('ambient', 1.2) ),
                 'scale', [10, 10/image_ratio,1],
                 'rotate', [0, 20, 0],
                 'translate', [-3, 1, 3])
    new_scene = scene.add_objects([screen])
    return new_scene.render(width=800, height=480, antialiasing=0.001)
clip = (VideoFileClip("bunny.mp4") # File containing the original video
.subclip(23, 47) # cut between t=23 and 47 seconds
.fl_image(embed_in_scene) # <= The magic happens
.fadein(1).fadeout(1)
.audio_fadein(1).audio_fadeout(1))
clip.write_videofile("bunny2.mp4",bitrate='8000k')
which results in a video stream as follows:
What I want, however, is for the movie box to fill the whole scene, with the spheres remaining where they are. My first thought was to remove the rotation from the code, and that did work; however, I still cannot stretch the movie frame to the corners of the actual scene.
Any thoughts?
EDIT: I was able to move the camera and get the object to the center. However, I still could not get the movie to fill the screen. This is because the camera object is told to look towards given coordinates, and I don't know which coordinates the camera should be directed at in order to get the picture full screen.

How can I select a row from a SparseTensor in TensorFlow?

Say I have two SparseTensors as follows:
[[1, 0, 0, 0],
[2, 0, 0, 0],
[1, 2, 0, 0]]
and
[[1.0, 0, 0, 0],
[1.0, 0, 0, 0],
[0.3, 0.7, 0, 0]]
and I want to extract the first two rows out of them. I need both the indices and the values of the non-zero entries as SparseTensors so that I can pass the result to tf.nn.embedding_lookup_sparse. How can I do this?
My application is:
I want to use word embeddings, which is quite straightforward in TensorFlow. But now I want to use sparse embeddings, i.e., common words have their own embeddings, while the embeddings of rare words are a sparse linear combination of the embeddings of common words.
So I need two cookbooks to indicate how the sparse embeddings are composed. In the aforementioned example, the cookbook says: for the first word, its embedding consists of its own embedding with weight 1.0. Things are similar for the second word. For the last word, it says: the embedding of this word is a linear combination of the embeddings of the first two words, with weights 0.3 and 0.7 respectively.
I need to extract a row, then feed the indices and weights to tf.nn.embedding_lookup_sparse to obtain the final embeddings. How can I do that in TensorFlow?
Or do I need to work around it, i.e., preprocess my data and deal with the cookbook outside of TensorFlow?

I checked in with one of the engineers here who knows more about this area, and here's what he passed on:
I am not sure if we have an efficient implementation of this, but here is a not-so-optimal implementation using dynamic_partition and gather ops.
import tensorflow as tf
def sparse_slice(indices, values, needed_row_ids):
    num_rows = tf.shape(indices)[0]
    partitions = tf.cast(tf.equal(indices[:,0], needed_row_ids), tf.int32)
    rows_to_gather = tf.dynamic_partition(tf.range(num_rows), partitions, 2)[1]
    slice_indices = tf.gather(indices, rows_to_gather)
    slice_values = tf.gather(values, rows_to_gather)
    return slice_indices, slice_values
with tf.Session().as_default():
    indices = tf.constant([[0,0], [1, 0], [2, 0], [2, 1]])
    values = tf.constant([1.0, 1.0, 0.3, 0.7], dtype=tf.float32)
    needed_row_ids = tf.constant([1])
    slice_indices, slice_values = sparse_slice(indices, values, needed_row_ids)
    print(slice_indices.eval(), slice_values.eval())
Update:
The engineer sent over an example that handles multiple rows too; thanks for pointing that out!
def sparse_slice(indices, values, needed_row_ids):
    needed_row_ids = tf.reshape(needed_row_ids, [1, -1])
    num_rows = tf.shape(indices)[0]
    partitions = tf.cast(tf.reduce_any(tf.equal(tf.reshape(indices[:,0], [-1, 1]), needed_row_ids), 1), tf.int32)
    rows_to_gather = tf.dynamic_partition(tf.range(num_rows), partitions, 2)[1]
    slice_indices = tf.gather(indices, rows_to_gather)
    slice_values = tf.gather(values, rows_to_gather)
    return slice_indices, slice_values
with tf.Session().as_default():
    indices = tf.constant([[0,0], [1, 0], [2, 0], [2, 1]])
    values = tf.constant([1.0, 1.0, 0.3, 0.7], dtype=tf.float32)
    needed_row_ids = tf.constant([0, 2])
    slice_indices, slice_values = sparse_slice(indices, values, needed_row_ids)
    print(slice_indices.eval(), slice_values.eval())

Let sp be your 2-D SparseTensor. You can first create an indicator tensor for the rows of your SparseTensor that you want to extract (here, the first two), namely
mask = tf.concat([tf.constant([True, True]),
                  tf.fill([sp.dense_shape[0] - 2], False)], axis=0)
Next, use tf.gather to propagate this to the sparse indices:
mask_sp = tf.gather(mask, sp.indices[:, 0])
Finally:
values = tf.boolean_mask(sp.values, mask_sp)
indices = tf.boolean_mask(sp.indices, mask_sp)
dense_shape = [2, sp.dense_shape[1]]  # two rows were kept
output_sp = tf.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)
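The same indicator-then-gather idea can be checked without TensorFlow. This NumPy transliteration generalizes "the first two rows" to any set of row ids; the helper select_rows is hypothetical. It also renumbers the kept rows 0..k-1 so the output dense shape is (k, n_cols), and it needs memory proportional to the number of non-zeros plus the number of rows rather than their product:

```python
import numpy as np


def select_rows(indices, values, dense_shape, row_ids):
    """Keep the COO entries whose row is in row_ids (hypothetical helper).

    Kept rows are renumbered 0..k-1 in their original order, so the result
    has dense shape (k, n_cols).
    """
    n_rows, n_cols = dense_shape
    # Row indicator, like `mask` above but for an arbitrary set of rows
    row_mask = np.zeros(n_rows, dtype=bool)
    row_mask[row_ids] = True
    # One flag per non-zero entry, like tf.gather(mask, sp.indices[:, 0])
    keep = row_mask[indices[:, 0]]
    # New row number for each original row (only meaningful where row_mask is True)
    new_row = np.cumsum(row_mask) - 1
    out_indices = indices[keep].copy()
    out_indices[:, 0] = new_row[out_indices[:, 0]]
    return out_indices, values[keep], (int(row_mask.sum()), n_cols)


indices = np.array([[0, 0], [1, 0], [2, 0], [2, 1]])
values = np.array([1.0, 1.0, 0.3, 0.7])
print(select_rows(indices, values, (3, 4), [0, 1]))
```

With row_ids = [0, 1] this reproduces the "first two rows" case, and the returned shape is (2, 4).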

Shouldn't it behave more like this?
This version keeps the order and frequency of the indices in selected_indices and, therefore, makes it possible to, e.g., select the same row multiple times:
import tensorflow as tf
tf.enable_eager_execution()
def sparse_gather(indices, values, selected_indices, axis=0):
    """
    indices: [[idx_ax0, idx_ax1, idx_ax2, ..., idx_axk], ... []]
    values:  [ value1, ..., valuen]
    """
    mask = tf.equal(indices[:, axis][tf.newaxis, :], selected_indices[:, tf.newaxis])
    to_select = tf.where(mask)[:, 1]
    return tf.gather(indices, to_select, axis=0), tf.gather(values, to_select, axis=0)
indices = tf.constant([[1, 0], [2, 0], [3, 0], [7, 0]])
values = tf.constant([1.0, 2.0, 3.0, 7.0], dtype=tf.float32)
needed_row_ids = tf.constant([7, 3, 2, 2, 3, 7])
slice_indices, slice_values = sparse_gather(indices, values, needed_row_ids)
print(slice_indices, slice_values)
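To see the order-preserving behaviour without TensorFlow, here is a NumPy transliteration of sparse_gather (sparse_gather_np is a hypothetical name). np.nonzero plays the role of tf.where and, like it, returns coordinates in row-major order, which is what groups the output by selected index. Note that it builds the same len(selected) × nnz comparison matrix as the TensorFlow version:

```python
import numpy as np


def sparse_gather_np(indices, values, selected, axis=0):
    """NumPy transliteration of sparse_gather (hypothetical name)."""
    # (len(selected), nnz) boolean matrix, the same comparison as tf.equal above
    mask = indices[:, axis][np.newaxis, :] == selected[:, np.newaxis]
    # np.nonzero lists True cells row by row, i.e. grouped by selected id,
    # so the order and repeats of `selected` are preserved
    to_select = np.nonzero(mask)[1]
    return indices[to_select], values[to_select]


indices = np.array([[1, 0], [2, 0], [3, 0], [7, 0]])
values = np.array([1.0, 2.0, 3.0, 7.0])
rows, vals = sparse_gather_np(indices, values, np.array([7, 3, 2, 2, 3, 7]))
print(vals)  # [7. 3. 2. 2. 3. 7.]
```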

I tried the answer by "Pete Warden", which only worked for small data. Given a SparseTensor A with m nonzero elements from which we would like to take out n rows, tf.equal would take m*n space, which is not acceptable in my task.
My suggestion is to use scipy.sparse instead of TensorFlow.
In detail:
Take all the data out of TF (indices and values) and build a scipy.sparse matrix in COO format.
If you need to take out rows, convert to CSR format; if you need to take out columns, use CSC (e.g. A[:, m]).
Transform the result back to COO.
Transform that back to TF.
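A minimal sketch of that round trip, assuming the indices, values and dense shape have already been pulled out of TensorFlow as NumPy arrays (for example with .numpy() in eager mode); the data is the cookbook from the question. The final step, building tf.SparseTensor(indices=out_indices, values=out_values, dense_shape=out_shape), is left out so the sketch runs without TensorFlow:

```python
import numpy as np
from scipy import sparse

# COO-style data as it would come out of a tf.SparseTensor (assumed already
# converted to NumPy): the weight cookbook from the question
indices = np.array([[0, 0], [1, 0], [2, 0], [2, 1]])
values = np.array([1.0, 1.0, 0.3, 0.7])
dense_shape = (3, 4)

# 1. Build a scipy COO matrix from the (row, col) pairs and the values
coo = sparse.coo_matrix((values, (indices[:, 0], indices[:, 1])), shape=dense_shape)

# 2. Row slicing is efficient on CSR (use .tocsc() instead to slice columns)
rows = coo.tocsr()[[0, 2]]

# 3. Back to COO to get indices/values in the SparseTensor layout again
out = rows.tocoo()
out_indices = np.stack([out.row, out.col], axis=1).astype(np.int64)
out_values = out.data
out_shape = out.shape
print(out_indices.tolist(), out_values.tolist(), out_shape)
```

As with the other approaches, the selected rows are renumbered, so row 2 of the input becomes row 1 of the output.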

Graph Tool's edge_gradient property

I would like to use the edge_gradient property on Graph Tool's gt.graph_draw() in order to better visualize the direction of connections in plots which are too crowded for markers such as arrows.
From the description in the docs, it seems this is what this property should do. Currently, however, it only lets me set the edges to a solid color.
I am using the property like so:
egradient = g.new_edge_property('vector<double>')
g.edge_properties['egradient'] = egradient
e = g.add_edge(v1, v2)
egradient[e] = (0.9, 0.329,0.282,0.478,1)
...
gt.graph_draw(g, ... edge_gradient=g.edge_properties["egradient"])
The appearance remains unchanged if I modify the first value in (0.9, 0.329,0.282,0.478,1) - and if I try to pass it a list of tuples I get this from the graph tool internals:
TypeError: float() argument must be a string or a number
How can I achieve what I am looking for in graph tool? If I can't, then what else is the first value in the edge gradient 5-tuple actually good for?

edge_gradient actually expects a flat list of numbers, not a list of tuples. I made the same mistake at first.
Example: if you want to go from white to black, your edge_gradient parameter should look like this:
#              o  r  g  b  a  o  r  g  b  a
edge_gradient=[0, 1, 1, 1, 1, 1, 0, 0, 0, 1]
That's what the docs mean by "Each group of 5 elements is interpreted as [o, r, g, b, a] where o is the offset in the range [0, 1] and the remaining values specify the colors."
It gets a little tough to read, so I separate my stop points and format them like this:
#              offset r  g  b  a
edge_gradient=[0,     1, 1, 1, 1,
               0.5,   0, 0, 0, 1,
               1,     1, 0, 0, 1]
Which fades from white to black to red... in theory, at least. I have had trouble getting edge_gradient to work with more than two gradient stop points. I always end up with some edges coloured like the list I pass to the edge_gradient property, and the rest with strange behaviour, like having the final colour in the middle.
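To keep the flat list readable without manual line breaks, a small plain-Python helper can flatten (offset, r, g, b, a) tuples into the layout described above. gradient_stops is hypothetical and not part of graph-tool:

```python
def gradient_stops(*stops):
    """Flatten (offset, r, g, b, a) stops into one flat list (hypothetical helper)."""
    flat = []
    for stop in stops:
        if len(stop) != 5:
            raise ValueError("each stop must be (offset, r, g, b, a)")
        flat.extend(float(v) for v in stop)
    return flat


# white -> black -> red, the same stops as the formatted list above
white_black_red = gradient_stops(
    (0.0, 1, 1, 1, 1),
    (0.5, 0, 0, 0, 1),
    (1.0, 1, 0, 0, 1),
)
print(white_black_red)
```

The result can then be passed as edge_gradient=white_black_red; I have not checked whether this changes the multi-stop behaviour described above.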

import numpy as np
import graph_tool.all as gt
# Set the gradients [must be same shape, not ragged array ex: (1, 15)]
num_edges = 2
grad_length = 15
## 3 Stops: red to grey to blue
egrad_1 = np.asarray([ 0, 1, 0, 0, 1,
0.5, 0.8, 0.8, 0.8, 1,
1, 0, 0, 1, 1])
## 3 Stops: grey to grey to grey
egrad_2 = np.asarray([ 0, 0.8, 0.8, 0.8, 1,
0.5, 0.8, 0.8, 0.8, 1,
1, 0.8, 0.8, 0.8, 1])
# Place into array of shape (num_edges, grad_length)
gradient_list = np.asarray([egrad_1, egrad_2])
# Create graph and add vertices and edges
g1 = gt.Graph(directed=False)
g1.ep.edge_gradient = g1.new_edge_property("vector<double>")
g1v1 = g1.add_vertex()
g1v2 = g1.add_vertex()
e1 = g1.add_edge(g1v1, g1v2)
e2 = g1.add_edge(g1v1, g1v1)
# Set property map
g1.ep.edge_gradient.set_2d_array(np.transpose(gradient_list))
# Draw the graph
gt.graph_draw(g1, edge_gradient=g1.ep.edge_gradient)