Using Tensorflow Estimator API with Images for SemSeg - python

I tried to implement my model with the Tensorflow Estimator API.
As a data input function I use
def input_fn():
train_in_np = sorted(io_utils.loadDataset(join(basepath, r"leftImg8bit/train/*/*")))
train_out_np = sorted(io_utils.loadDataset(join(basepath, r"gtFine/train/*/*_ignoreLabel.png")))
train_in = tf.constant(train_in_np)
train_out = tf.constant(train_out_np)
tr_data = tf.data.Dataset.from_tensor_slices((train_in, train_out))
tr_data = tr_data.shuffle(len(train_in_np))
tr_data = tr_data.repeat(epoch_cnt+1)
tr_data = tr_data.apply(tf.contrib.data.map_and_batch(parse_files, batch_size=batchSize))
tr_data = tr_data.prefetch(buffer_size=batchSize)
iterator = tr_data.make_initializable_iterator()
return iterator.get_next()
io_utils.loadDataset just returns a list of filepaths.
The data itself is parsed by
def parse_files(in_file, gt_file):
image_in = tf.read_file(in_file)
image_in = tf.image.decode_image(image_in, channels=3)
image_in.set_shape([None, None, 3])
image_in = tf.cast(image_in, tf.float32)
mean, std = tf.nn.moments(image_in, [0, 1])
image_in = image_in - mean
image_in = image_in / std
gt = tf.read_file(gt_file)
gt = tf.image.decode_image(gt, channels=1)
gt.set_shape([None, None, 1])
gt = tf.cast(gt, tf.int32)
return {'img':image_in}, gt
My estimator starts with
def estimator_fcn_model_fn(features, labels, mode, params):
x = tf.feature_column.input_layer(features, params['feature_columns'])
and the feature columns are defined as
my_feature_columns = []
my_feature_columns.append(tf.feature_column.numeric_column(key='img'))
the rest of the code I skipped for clearability.
My problem is with the shape of the feature:
Blockquote ValueError: ('Convolution not supported for input with rank', 2)
The print output of x, features and feature_columns:
Tensor("input_layer/concat:0", shape=(?, 1), dtype=float32)
{'img': tf.Tensor 'IteratorGetNext:0' shape=(?, ?, ?, 3) dtype=float32}
[_NumericColumn(key='img', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]
Has anyone an idea how to resolve this, I would guess it has to do with the kind of feature column, but I have no idea who to apply this for images.

As Y.Luo gave a hint tf.feature_column.input_layer is not suitable for images. An easier way is to directly use the feature dictionary by a key, which might be passed by params for more flexibility.
x = features[params['input_name']]
instead of
x = tf.feature_column.input_layer(features, params['feature_columns'])

Related

Feature importance in neural networks with multiple differently shaped inputs in pytorch and captum (classification)

I have developed a model with three inputs types. Image, categorical data and numerical data. For Image data I've used ResNet50 for the other two I develop my own network.
class MulticlassClassification(nn.Module):
def __init__(self, cat_size, num_col, output_size, layers, p=0.4):
super(MulticlassClassification, self).__init__()
# IMAGE: ResNet
self.cnn = models.resnet50(pretrained = True)
for param in self.cnn.parameters():
param.requires_grad = False
n_inputs = self.cnn.fc.in_features
self.cnn.fc = nn.Sequential(
nn.Linear(n_inputs, 250),
nn.ReLU(),
nn.Dropout(p),
nn.Linear(250, output_size),
nn.LogSoftmax(dim=1)
)
# TABULAR
self.all_embeddings = nn.ModuleList(
[nn.Embedding(categories, size) for categories, size in cat_size]
)
self.embedding_dropout = nn.Dropout(p)
self.batch_norm_num = nn.BatchNorm1d(num_col)
all_layers = []
num_cat_col = sum(e.embedding_dim for e in self.all_embeddings)
input_size = num_cat_col + num_col
for i in layers:
all_layers.append(nn.Linear(input_size, i))
all_layers.append(nn.ReLU(inplace=True))
all_layers.append(nn.BatchNorm1d(i))
all_layers.append(nn.Dropout(p))
input_size = i
all_layers.append(nn.Linear(layers[-1], output_size))
self.layers = nn.Sequential(*all_layers)
#combine
self.combine_fc = nn.Linear(output_size * 2, output_size)
def forward(self, image, x_categorical, x_numerical):
embeddings = []
for i, embedding in enumerate(self.all_embeddings):
print(x_categorical[:,i])
embeddings.append(embedding(x_categorical[:,i]))
x = torch.cat(embeddings, 1)
x = self.embedding_dropout(x)
x_numerical = self.batch_norm_num(x_numerical)
x = torch.cat([x, x_numerical], 1)
x = self.layers(x)
# img
x2 = self.cnn(image)
# combine
x3 = torch.cat([x, x2], 1)
x3 = F.relu(self.combine_fc(x3))
return x
Now after successful training I would like to calculate integrated gradients by using the captum library.
from captum.attr import IntegratedGradients
ig = IntegratedGradients(model)
testiter = iter(testloader)
img, stack_cat, stack_num, target = next(testiter)
attributions_ig = ig.attribute(inputs=(img.cuda(), stack_cat.cuda(), stack_num.cuda()), target=target.cuda())
And here I got an error:
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.cuda.FloatTensor instead (while checking arguments for embedding)
I figured out that captum injects a wrongly shaped tensor into my x_categorical input (with the print in my forward method). It seems like captum only sees the first input tensor and uses it's shape for all other inputs. How can I change this behaviour?
I've found the similar issue here (https://github.com/pytorch/captum/issues/439). It was recommended to use Interpretable Embedding for categorical data. When I used it I got this error:
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
I would be very grateful for any tips and advises how to combine all three inputs and to solve my problem.

Why my prediction function is giving error? ValueError: not enough values to unpack (expected 2, got 1)

I'm trying to make prediction using the pre-trained model for binary segmentation using UNET and pytorch. Here is my code:
model.eval() # Set model to evaluate mode
class SimDataset(Dataset):
def __init__(self, path, transform=None, isMask=False):
self.m = ("test")
self.path = path
self.transform = transform
self.isMask = isMask
def __len__(self):
return len(self.path)
def __getitem__(self, idx):
one_image = os.path.join(self.m, self.path[idx]) # preparing image path/location
img_temp = Image.open(one_image) # load RGB input image
if self.transform:
image = self.transform(img_temp)
input_image = np.array(img_temp).astype('float32') # converting one image to np array
input_image = np.transpose(input_image, (2, 0 ,1)) # converting from hwc to chw [(256,256,3) => (3, 256, 256)]
return [input_image]
testlist = list(os.listdir(r"test"))
len(testlist)
image_datasets = {
'testlist': testlist
}
dataset_sizes = {
x: len(image_datasets[x]) for x in image_datasets.keys()
}
test_dataset = SimDataset(testlist, transform = trans, isMask=False)
test_loader = DataLoader(test_dataset, batch_size=3, shuffle=False, num_workers=0)
inputs, labels = next(iter(test_loader))
inputs = inputs.to(device)
labels = labels.to(device)
pred = model(inputs)
pred = torch.sigmoid(pred)
pred = pred.data.cpu().numpy()
print(pred.shape)
But it is showing error saying -> ValueError: not enough values to unpack (expected 2, got 1).
Your code expects two outputs from the data loader:
inputs, labels = next(iter(test_loader))
However, your __getitem__ method in your dataset, returns only a single output:
return [input_image]
Either you return two outputs from __getitem__, both images and labels.
Or expect only a single output from the test_loader.

"Invalid argument: indices[0,0,0,0] = 30 is not in [0, 30)"

Error:
InvalidArgumentError: indices[0,0,0,0] = 30 is not in [0, 30)
[[{{node GatherV2}}]] [Op:IteratorGetNext]
History:
I have a custom data loader for a tf.keras based U-Net for semantic segmentation, based on this example. It is written as follows:
def parse_image(img_path: str) -> dict:
# read image
image = tf.io.read_file(img_path)
#image = tfio.experimental.image.decode_tiff(image)
if xf == "png":
image = tf.image.decode_png(image, channels = 3)
else:
image = tf.image.decode_jpeg(image, channels = 3)
image = tf.image.convert_image_dtype(image, tf.uint8)
#image = image[:, :, :-1]
# read mask
mask_path = tf.strings.regex_replace(img_path, "X", "y")
mask_path = tf.strings.regex_replace(mask_path, "X." + xf, "y." + yf)
mask = tf.io.read_file(mask_path)
#mask = tfio.experimental.image.decode_tiff(mask)
mask = tf.image.decode_png(mask, channels = 1)
#mask = mask[:, :, :-1]
mask = tf.where(mask == 255, np.dtype("uint8").type(NoDataValue), mask)
return {"image": image, "segmentation_mask": mask}
train_dataset = tf.data.Dataset.list_files(
dir_tls(myear = year, dset = "X") + "/*." + xf, seed = zeed)
train_dataset = train_dataset.map(parse_image)
val_dataset = tf.data.Dataset.list_files(
dir_tls(myear = year, dset = "X_val") + "/*." + xf, seed = zeed)
val_dataset = val_dataset.map(parse_image)
## data transformations--------------------------------------------------------
#tf.function
def normalise(input_image: tf.Tensor, input_mask: tf.Tensor) -> tuple:
input_image = tf.cast(input_image, tf.float32) / 255.0
return input_image, input_mask
#tf.function
def load_image_train(datapoint: dict) -> tuple:
input_image = tf.image.resize(datapoint["image"], (imgr, imgc))
input_mask = tf.image.resize(datapoint["segmentation_mask"], (imgr, imgc))
if tf.random.uniform(()) > 0.5:
input_image = tf.image.flip_left_right(input_image)
input_mask = tf.image.flip_left_right(input_mask)
input_image, input_mask = normalise(input_image, input_mask)
return input_image, input_mask
#tf.function
def load_image_test(datapoint: dict) -> tuple:
input_image = tf.image.resize(datapoint["image"], (imgr, imgc))
input_mask = tf.image.resize(datapoint["segmentation_mask"], (imgr, imgc))
input_image, input_mask = normalise(input_image, input_mask)
return input_image, input_mask
## create datasets-------------------------------------------------------------
buff_size = 1000
dataset = {"train": train_dataset, "val": val_dataset}
# -- Train Dataset --#
dataset["train"] = dataset["train"]\
.map(load_image_train, num_parallel_calls = tf.data.experimental.AUTOTUNE)
dataset["train"] = dataset["train"].shuffle(buffer_size = buff_size,
seed = zeed)
dataset["train"] = dataset["train"].repeat()
dataset["train"] = dataset["train"].batch(bs)
dataset["train"] = dataset["train"].prefetch(buffer_size = AUTOTUNE)
#-- Validation Dataset --#
dataset["val"] = dataset["val"].map(load_image_test)
dataset["val"] = dataset["val"].repeat()
dataset["val"] = dataset["val"].batch(bs)
dataset["val"] = dataset["val"].prefetch(buffer_size = AUTOTUNE)
print(dataset["train"])
print(dataset["val"])
Now I wanted to use a weighted version of tf.keras.losses.SparseCategoricalCrossentropy for my model and I found this tutorial, which is rather similar to the example above.
However, they also offered a weighted version of the loss, using:
def add_sample_weights(image, label):
# The weights for each class, with the constraint that:
# sum(class_weights) == 1.0
class_weights = tf.constant([2.0, 2.0, 1.0])
class_weights = class_weights/tf.reduce_sum(class_weights)
# Create an image of `sample_weights` by using the label at each pixel as an
# index into the `class weights` .
sample_weights = tf.gather(class_weights, indices=tf.cast(label, tf.int32))
return image, label, sample_weights
and
weighted_model.fit(
train_dataset.map(add_sample_weights),
epochs=1,
steps_per_epoch=10)
I combined those approaches since the latter tutorial uses previously loaded data, while I want to draw the images from disc (not enough RAM to load all at once).
Resulting in the code from the first example (long code block above) followed by
def add_sample_weights(image, segmentation_mask):
class_weights = tf.constant(inv_weights, dtype = tf.float32)
class_weights = class_weights/tf.reduce_sum(class_weights)
sample_weights = tf.gather(class_weights,
indices = tf.cast(segmentation_mask, tf.int32))
return image, segmentation_mask, sample_weights
(inv_weights are my weights, an array of 30 float64 values) and
model.fit(dataset["train"].map(add_sample_weights),
epochs = 45, steps_per_epoch = np.ceil(N_img/bs),
validation_data = dataset["val"],
validation_steps = np.ceil(N_val/bs),
callbacks = cllbs)
When I run
dataset["train"].map(add_sample_weights).element_spec
as in the second example, I get an output that looks reasonable to me (similar to the one in the example):
Out[58]:
(TensorSpec(shape=(None, 512, 512, 3), dtype=tf.float32, name=None),
TensorSpec(shape=(None, 512, 512, 1), dtype=tf.float32, name=None),
TensorSpec(shape=(None, 512, 512, 1), dtype=tf.float32, name=None))
However, when I try to fit the model or run something like
a, b, c = dataset["train"].map(add_sample_weights).take(1)
I will receive the error mentioned above.
So far, I have found quite some questions regarding this error (e.g., a, b, c, d), however, they all talk of "embedding layers" and things I am not aware of using.
Where does this error come from and how can I solve it?
Picture tf.gather as a fancy way to do indexing. The error you get is akin to the following example in python:
>>> my_list = [1,2,3]
>>> my_list[3]
IndexError: list index out of range
If you want to use tf.gather, then the range of value of your indices should not be bigger than the dimension size of the Tensor you are willing to index.
In your case, in the call tf.gather(class_weights,indices = tf.cast(segmentation_mask, tf.int32)), with class_weights being a Tensor of dimension (30,), the range of values of segmentation_mask should be between 0 and 29. As far as I can tell from your data pipeline, segmentation_mask has a range of value between 0 and 255. The fix will be problem dependent.

Solved: How to combine tf.gradients with tf.data.dataset and keras models

I'm trying to build a workflow that uses tf.data.dataset batches and an iterator. For performance reasons, I am really trying to avoid using the placeholder->feed_dict loop workflow.
The process I'm trying to implement involves grad-cam (which requires the gradient of the loss with respect to the final convolutional layer of a CNN) as an intermediate step, and ideally I'd like to be able to try it out on several Keras pre-trained models, including non-sequential ones like ResNet.
Most implementations of grad-cam that I've found rely on hand-crafting the CNN of interest in tensorflow. I found one implementation, https://github.com/jacobgil/keras-grad-cam, that is made for keras models, and following that example, I get
def safe_norm(x):
return x / tf.sqrt(tf.reduce_mean(x ** 2) + 1e-8)
vgg_ = VGG19()
dataset = tf.data.Dataset.from_tensor_slices((filenames))
#preprocessing...
it = dataset.make_one_shot_iterator()
files, batch = it.get_next()
conv5_4 = vgg_.layers[-6]
h_k, w_k, c_k = conv5_4.output.shape[1:]
vgg_model = Model(inputs=vgg_.input, outputs=vgg_.output)
conv_model = Model(inputs=vgg_.input, outputs=conv5_4.output)
probs = vgg_model(batch)
predicted_class = tf.argmax(probs, axis=-1)
layer_name = 'block5_conv4'
target_layer = lambda x: target_category_loss(x, predicted_class, n_categories)
x = Lambda(target_layer)(vgg_model.outputs[0])
model = Model(inputs=vgg_model.inputs[0], outputs=x)
loss = K.sum(model.output, axis=-1)
conv_output = [l for l in model.layers if l.name is layer_name][0].output
grads = Lambda(safe_norm)(K.gradients(loss, [conv_output])[0])
gradient_function = K.function([model.input], [conv_output, grads])
output, grads_val = gradient_function([batch])
weights = tf.reduce_mean(grads_val, axis = (1, 2))
cam = tf.ones([batch_size, h_k, w_k], dtype = tf.float32)
cam += tf.reduce_sum(output * tf.reshape(weights, [-1, 1, 1, weights.shape[-1]]), axis=-1)
cam = tf.squeeze(tf.image.resize_images(images=tf.expand_dims(cam, axis=-1), size=(224, 224)))
cam = tf.maximum(cam, 0)
heatmap = cam / tf.reshape(tf.reduce_max(cam, axis=[1, 2]), shape=[-1, 1, 1])
The problem is that gradient_function([batch]) returns a numpy array whose value is determined by the first batch, so that heatmap doesn't change with subsequent evaluations.
I've tried replacing K.function with a Model in various ways, but nothing seems to work. I usually end up either with an error suggesting that grads evaluates to None or that one model or another is expecting a feed_dict and not receiving one.
Is this code salvageable? Is there a better way to do this besides looping through the data several times (once to get all the grad-cams and then again once I have them) or using placeholders and feed_dicts?
Edit:
def safe_norm(x):
return x / tf.sqrt(tf.reduce_mean(x ** 2) + 1e-8)
vgg_ = VGG19()
dataset = tf.data.Dataset.from_tensor_slices((filenames))
#preprocessing...
it = dataset.make_one_shot_iterator()
files, batch = it.get_next()
conv5_4 = vgg_.layers[-6]
h_k, w_k, c_k = conv5_4.output.shape[1:]
vgg_model = Model(inputs=vgg_.input, outputs=vgg_.output)
conv_model = Model(inputs=vgg_.input, outputs=conv5_4.output)
probs = vgg_model(batch)
predicted_class = tf.argmax(probs, axis=-1)
layer_name = 'block5_conv4'
target_layer = lambda x: target_category_loss(x, predicted_class, n_categories)
x = Lambda(target_layer)(vgg_model.outputs[0])
model = Model(inputs=vgg_model.inputs[0], outputs=x)
loss = K.sum(model.output, axis=-1)
conv_output = [l for l in model.layers if l.name is layer_name][0].output
grads = Lambda(safe_norm)(K.gradients(loss, [conv_output])[0])
gradient_function = K.function([model.input], [conv_output, grads])
output, grads_val = gradient_function([batch])
weights = tf.reduce_mean(grads_val, axis = (1, 2))
cam = tf.ones([batch_size, h_k, w_k], dtype = tf.float32)
cam += tf.reduce_sum(output * tf.reshape(weights, [-1, 1, 1, weights.shape[-1]]), axis=-1)
cam = tf.squeeze(tf.image.resize_images(images=tf.expand_dims(cam, axis=-1), size=(224, 224)))
cam = tf.maximum(cam, 0)
heatmap = cam / tf.reshape(tf.reduce_max(cam, axis=[1, 2]), shape=[-1, 1, 1])
# other operations on heatmap and batch ...
# ...
output_function = K.function(model.input, [node1, ..., nodeN])
for batch in range(n_batches):
outputs1, ... , outputsN = output_function(batch)
Gives me the desired outputs for each batch.
Yes, K.function returns numpy arrays because it evaluates the symbolic computation in your graph. What I think you should do is to keep everything symbolic up to K.function, and after getting the gradients, perform all computations of the Grad-CAM weights and final saliency map using numpy.
Then you can iterate on your dataset, evaluate gradient_function on a new batch of data, and compute the saliency map.
If you want to keep everything symbolic, then you should not use K.function to produce the gradient function, but use the symbolic gradient (the output of K.gradient, without lambda) and convolutional feature maps (conv_output) and perform the saliency map computation on top of that, and then build a function (using K.function) that takes the model input, and outputs the saliency map.
Hope the explanation is enough.

TensorFlow model gets zero loss

import tensorflow as tf
import numpy as np
import os
import re
import PIL
def read_image_label_list(img_directory, folder_name):
# Input:
# -Name of folder (test\\\\train)
# Output:
# -List of names of files in folder
# -Label associated with each file
cat_label = 1
dog_label = 0
filenames = []
labels = []
dir_list = os.listdir(os.path.join(img_directory, folder_name)) # List of all image names in 'folder_name' folder
# Loop through all images in directory
for i, d in enumerate(dir_list):
if re.search("train", folder_name):
if re.search("cat", d): # If image filename contains 'Cat', then true
labels.append(cat_label)
else:
labels.append(dog_label)
filenames.append(os.path.join(img_dir, folder_name, d))
return filenames, labels
# Define convolutional layer
def conv_layer(input, channels_in, channels_out):
w_1 = tf.get_variable("weight_conv", [5,5, channels_in, channels_out], initializer=tf.contrib.layers.xavier_initializer())
b_1 = tf.get_variable("bias_conv", [channels_out], initializer=tf.zeros_initializer())
conv = tf.nn.conv2d(input, w_1, strides=[1,1,1,1], padding="SAME")
activation = tf.nn.relu(conv + b_1)
return activation
# Define fully connected layer
def fc_layer(input, channels_in, channels_out):
w_2 = tf.get_variable("weight_fc", [channels_in, channels_out], initializer=tf.contrib.layers.xavier_initializer())
b_2 = tf.get_variable("bias_fc", [channels_out], initializer=tf.zeros_initializer())
activation = tf.nn.relu(tf.matmul(input, w_2) + b_2)
return activation
# Define parse function to make input data to decode image into
def _parse_function(img_path, label):
img_file = tf.read_file(img_path)
img_decoded = tf.image.decode_image(img_file, channels=3)
img_decoded.set_shape([None,None,3])
img_decoded = tf.image.resize_images(img_decoded, (28, 28), method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
img_decoded = tf.image.per_image_standardization(img_decoded)
img_decoded = tf.cast(img_decoded, dty=tf.float32)
label = tf.one_hot(label, 1)
return img_decoded, label
tf.reset_default_graph()
# Define parameterspe
EPOCHS = 10
BATCH_SIZE_training = 64
learning_rate = 0.001
img_dir = 'C:/Users/tharu/PycharmProjects/cat_vs_dog/data'
batch_size = 128
# Define data
features, labels = read_image_label_list(img_dir, "train")
# Define dataset
dataset = tf.data.Dataset.from_tensor_slices((features, labels)) # Takes slices in 0th dimension
dataset = dataset.map(_parse_function)
dataset = dataset.batch(batch_size)
iterator = dataset.make_initializable_iterator()
# Get next batch of data from iterator
x, y = iterator.get_next()
# Create the network (use different variable scopes for reuse of variables)
with tf.variable_scope("conv1"):
conv_1 = conv_layer(x, 3, 32)
pool_1 = tf.nn.max_pool(conv_1, ksize=[1,2,2,1], strides=[1,2,2,1], padding="SAME")
with tf.variable_scope("conv2"):
conv_2 = conv_layer(pool_1, 32, 64)
pool_2 = tf.nn.max_pool(conv_2, ksize=[1,2,2,1], strides=[1,2,2,1], padding="SAME")
flattened = tf.contrib.layers.flatten(pool_2)
with tf.variable_scope("fc1"):
fc_1 = fc_layer(flattened, 7*7*64, 1024)
with tf.variable_scope("fc2"):
logits = fc_layer(fc_1, 1024, 1)
# Define loss function
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf.cast(y, dtype=tf.int32)))
# Define optimizer
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)
with tf.Session() as sess:
# Initiliaze all the variables
sess.run(tf.global_variables_initializer())
# Train the network
for i in range(EPOCHS):
# Initialize iterator so that it starts at beginning of training set for each epoch
sess.run(iterator.initializer)
print("EPOCH", i)
while True:
try:
_, epoch_loss = sess.run([train, loss])
except tf.errors.OutOfRangeError: # Error given when out of data
if i % 2 == 0:
# [train_accuaracy] = sess.run([accuracy])
# print("Step ", i, "training accuracy = %{}".format(train_accuaracy))
print(epoch_loss)
break
I've spent a few hours trying to figure out systematically why I've been getting 0 loss when I run this model.
Features = list of file locations for each image (e.g. ['\data\train\cat.0.jpg', /data\train\cat.1.jpg])
Labels = [Batch_size, 1] one_hot vector
Initially I thought it was because there was something wrong with my data. But I've viewed the data after being resized and the images seems fine.
Then I tried a few different loss functions because I thought maybe I'm misunderstanding what the the tensorflow function softmax_cross_entropy does, but that didn't fix anything.
I've tried running just the 'logits' section to see what the output is. This is just a small sample and the numbers seem fine to me:
[[0.06388957]
[0. ]
[0.16969752]
[0.24913025]
[0.09961276]]
Surely then the softmax_cross_entropy function should be able to compute this loss given that the corresponding labels are 0 or 1? I'm not sure if I'm missing something. Any help would be greatly appreciated.
As documented:
logits and labels must have the same shape, e.g. [batch_size, num_classes] and the same dtype (either float16, float32, or float64).
Since you mentioned your label is "[Batch_size, 1] one_hot vector", I would assume both your logits and labels are [Batch_size, 1] shape. This will certainly lead to zero loss. Conceptually speaking, you have only 1 class (num_classes=1) and your cannot be wrong (loss=0).
So at least for you labels, you should transform it: tf.one_hot(indices=labels, depth=num_classes). Your prediction logits should also have a shape [batch_size, num_classes] output.
Alternatively, you can use sparse_softmax_cross_entropy_with_logits, where:
A common use case is to have logits of shape [batch_size, num_classes] and labels of shape [batch_size]. But higher dimensions are supported.

Categories