In short, I am struggling to convert a per-pixel category mask in a tf.data.Dataset from an integer-class encoding to a one-hot encoding.
Consider the image segmentation tensorflow tutorial example here:
https://www.tensorflow.org/tutorials/images/segmentation.
The input is an image and the output is a per-pixel integer-labeled category mask. In their example, each pixel of the mask holds a category value represented by an integer in {0, 1, 2}.
The train and test variables are of type tf.data.Dataset and each sample is an (image,mask) tuple.
This form of mask/output is consistent with the sparse_categorical_crossentropy loss function in the tutorial. However, I would like to be able to use other loss functions that require a one-hot encoding instead.
I have been attempting to convert the datasets via the tf.keras.utils.to_categorical() function using a map() call, i.e.:

def mask_to_categorical(image, mask):
    mask = tf.keras.utils.to_categorical(mask, 3)
    return image, mask

train = train.map(mask_to_categorical)
However, this fails with an error such as:
{...}/tensorflow_core/python/keras/utils/np_utils.py:40 to_categorical
y = np.array(y, dtype='int')
TypeError: __array__() takes 1 positional argument but 2 were given
Note:
My searching thus far has pointed towards eager/non-eager issues as one possible cause. For what it's worth, I verified that I am running in eager mode via:
>>> print('tf.executing_eagerly() = ', tf.executing_eagerly())
tf.executing_eagerly() = True
Any suggestions? Thank you!
Try modifying your function for the one-hot encoding like this:
def mask_to_categorical(image, mask):
    mask = tf.one_hot(tf.cast(mask, tf.int32), 3)
    mask = tf.cast(mask, tf.float32)
    return image, mask
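For completeness, here is a minimal, self-contained sketch of the fix applied end to end; the random images and masks below are hypothetical stand-ins for the tutorial's data:

import tensorflow as tf

def mask_to_categorical(image, mask):
    # tf.one_hot appends a depth-3 axis: (H, W, 1) -> (H, W, 1, 3)
    mask = tf.one_hot(tf.cast(mask, tf.int32), 3)
    mask = tf.cast(mask, tf.float32)
    return image, mask

images = tf.random.uniform((4, 128, 128, 3))
masks = tf.random.uniform((4, 128, 128, 1), maxval=3, dtype=tf.int32)
train = tf.data.Dataset.from_tensor_slices((images, masks))

train = train.map(mask_to_categorical)
for image, mask in train.take(1):
    print(mask.shape)  # (128, 128, 1, 3); squeeze axis 2 if your loss expects (H, W, 3)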
I wish to take the FFT of the input dataset loaded using ImageDataGenerator. Taking the FFT will double the number of channels, as I stack the real and imaginary parts of the complex FFT output along the channels dimension. The preprocessing_function attribute of the ImageDataGenerator class should output a Numpy tensor with the same shape as the input, so I could not use that.
I tried applying tf.signal.fft2d directly on the ImageDataGenerator.flow_from_directory() output, but it consumed too much RAM, causing the program to crash on Google Colab. Another way I tried was to add a custom layer computing the FFT as the first layer of my neural network, but this adds to the training time. So I wish to do it as a preprocessing step.
Could anyone kindly suggest an efficient way to apply a function to the output of ImageDataGenerator?
You can write a custom ImageDataGenerator, but I have no reason to think it will be any faster than doing the FFT in the first layer. It is a costly operation either way, since tf.signal.fft2d takes complex64 or complex128 dtypes: the input needs casting to complex and the output needs casting back, because neural network weights are tf.float32 and other image processing functions don't take complex dtypes. An alternative is to skip ImageDataGenerator and build a tf.data pipeline:
import tensorflow as tf

labels = ['Cats', 'Dogs', 'Others']

def read_image(file_name):
    image = tf.io.read_file(file_name)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize_with_pad(image, target_height=224, target_width=224)
    image = tf.cast(image, tf.complex64)
    # note: tf.signal.fft2d transforms the innermost two axes; for a
    # channels-last (H, W, C) image that is (W, C), so transpose first
    # if you want the FFT over (H, W)
    image = tf.signal.fft2d(image)
    # derive the label from the parent directory name
    label = tf.strings.split(file_name, '\\')[-2]
    label = tf.where(tf.equal(label, labels))
    return image, label

ds = tf.data.Dataset.list_files(r'path\to\my\pictures\*\*.jpg')
ds = ds.map(read_image)
next(iter(ds))
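As a usage note, for training you would typically parallelize the map and add batching and prefetching. A sketch, assuming the same directory layout as above (tf.data.AUTOTUNE is tf.data.experimental.AUTOTUNE on older TF versions):

ds = tf.data.Dataset.list_files(r'path\to\my\pictures\*\*.jpg')
ds = ds.map(read_image, num_parallel_calls=tf.data.AUTOTUNE)
ds = ds.batch(32).prefetch(tf.data.AUTOTUNE)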
I may be mistaken, but it seems that PyTorch Transformers are autoregressive, which is what masking is for. However, I've seen some implementations where people use just the Encoder and output that directly to a Linear layer.
In my case, I'm trying to convert a spectrogram (rows are frequencies and columns are timesteps) to another spectrogram of the same dimensions. I'm having an impossible time trying to figure out how to do this.
For my model, I have:
class TransformerReconstruct(nn.Module):
    def __init__(self, feature_size=250, num_layers=1, dropout=0.1, nhead=10, output_dim=1):
        super(TransformerReconstruct, self).__init__()
        self.model_type = 'Transformer'
        self.src_mask = None
        self.pos_encoder = PositionalEncoding(feature_size)
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=feature_size, nhead=nhead, dropout=dropout)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_layers)
        self.decoder = nn.Linear(feature_size, output_dim)
        self.init_weights()

    def init_weights(self):
        initrange = 0.1
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, src):
        if self.src_mask is None or self.src_mask.size(0) != len(src):
            device = src.device
            mask = self._generate_square_subsequent_mask(len(src)).to(device)
            self.src_mask = mask
        src = self.pos_encoder(src)
        output = self.transformer_encoder(src, self.src_mask)
        output = self.decoder(output)
        return output

    def _generate_square_subsequent_mask(self, sz):
        mask = (torch.triu(torch.ones(sz, sz)) == 1).transpose(0, 1)
        mask = mask.float().masked_fill(mask == 0, float('-inf')).masked_fill(mask == 1, float(0.0))
        return mask
And when training, I have:
model = TransformerReconstruct(feature_size=128, nhead=8, output_dim=128, num_layers=6).to(device)
This returns the right shape, but doesn't seem to learn.
My basic training loop looks like:
for i in range(0, len(data_source) - 1, input_window):
    data, target = get_batch(data_source, i, 1)
    output = recreate_model(data)
and I'm using an MSELoss to learn a very simple identity, where the input and output are the same. However, the model is not learning. What could I be doing wrong? Thanks in advance.
Most of the models in Huggingface Transformers are some version of BERT and thus not autoregressive; the only exceptions are decoder-only models (GPT and similar) and sequence-to-sequence models.
There are two conceptually different types of masks. One is the input mask, which is specific to the input batch and whose purpose is to allow using sequences of different lengths in a single batch. When the sequences get padded to the same length, the self-attention should not attend to the padding positions. This is what you are supposed to use when you call self.transformer_encoder in the forward method.
In addition, the autoregressive Transformer decoder uses another type of mask: the triangular mask that prevents the self-attention from attending to tokens to the right of the current position (at inference time, words to the right of the current position are unknown before they are actually generated). This is what you have in the _generate_square_subsequent_mask method, and this is what makes the model autoregressive. It is constant and does not depend on the input batch.
To summarize: to have a bidirectional Transformer, just get rid of the triangular mask. If your input sequences are of different lengths, you should use batch-specific padding masking, as in the sketch below; if not, just pass no mask at all.
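For illustration, a minimal sketch of batch-specific padding masking with nn.TransformerEncoder; the batch and lengths here are hypothetical:

import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=128, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=6)

# batch of 2 sequences padded to length 10; shape is (seq_len, batch, d_model)
src = torch.randn(10, 2, 128)
lengths = torch.tensor([10, 7])  # hypothetical true sequence lengths

# src_key_padding_mask has shape (batch, seq_len); True marks padding positions
padding_mask = torch.arange(10).unsqueeze(0) >= lengths.unsqueeze(1)

out = encoder(src, src_key_padding_mask=padding_mask)  # (10, 2, 128)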
If you want the model to stop behaving in an autoregressive manner, you need to 'unhide' tokens to the right of the current token i.e. modify/remove _generate_square_subsequent_mask.
How you modify this depends on the task. Are you trying to recover 'corrupted' input sequences? Then mask a random subset of tokens and treat it as an autoencoder.
If you just wish to approximate the identity function, remove the mask completely, as in the following sketch.
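Concretely, a sketch of what that change could look like in the TransformerReconstruct model above: replace its forward method so that no mask is passed (everything else stays the same):

def forward(self, src):
    # bidirectional: no triangular mask, every position attends to every other
    src = self.pos_encoder(src)
    output = self.transformer_encoder(src)  # mask defaults to None
    output = self.decoder(output)
    return output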
I use the Segmentation Models library for multi-class (in my case 4-class) semantic segmentation. The model (UNet with a 'resnet34' backbone) is trained with 3000 RGB (224x224x3) images. The accuracy is around 92.80%.
1) Why does the model.predict() function require a (1,224,224,3)-shaped array as input? I didn't find the answer even in the Keras documentation. Actually, the code below works and I have no problem with it, but I want to understand the reason.

predictions = model.predict(test_image.reshape(-1, 224, 224, 3))
2) predictions is a (1,224,224,3)-shaped numpy array. Its data type is float32 and it contains some floating-point numbers. What is the meaning of the numbers inside this array? How can I visualize them? I mean, I assumed that the result array would contain one of 4 class labels (from 0 to 3) for every pixel, and then I would apply a color map for each class. In other words, the result should have been a prediction map, but I didn't get it. To understand better what I mean by a prediction map, please see Jeremy Jordan's blog about semantic segmentation.
result = predictions[0]
plt.imshow(result) # import matplotlib.pyplot as plt
3) What I finally want to do is what GitHub: mrgloom - Semantic Segmentation Categorical Crossentropy Example does in its visualy_inspect_result function.
1) The image input shape in your deep neural network architecture is (224,224,3): width = height = 224 with 3 color channels. You need an additional dimension in case you want to give more than one image at a time to your model. So the input shape is (1,224,224,3), or more generally (batch_size,224,224,3).
2) According to the docs of the Segmentation Models repo, you can specify the number of classes you want as output: model = Unet('resnet34', classes=4, activation='softmax'). Thus you should reshape your labelled image to have shape (1,224,224,4), where the last dimension is a mask channel indicating with a 0 or 1 whether pixel (i,j) belongs to class k. Then you can predict and access each output mask:

masked = model.predict(np.array([im]))[0]
mask_class0 = masked[:,:,0]
mask_class1 = masked[:,:,1]
3) Then, using matplotlib, you will be able to plot the semantic segmentation, or you can use scikit-image's color.label2rgb function.
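As a sketch of that last step (assuming predictions comes from model.predict as above, with one channel per class in the last axis):

import numpy as np
import matplotlib.pyplot as plt

# collapse the per-class scores into one label per pixel
pred_mask = np.argmax(predictions[0], axis=-1)  # shape (224, 224), values 0..3

plt.imshow(pred_mask, cmap='jet')  # each class gets its own color
plt.colorbar()
plt.show()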
I have a simple 2-layer dense NN which I want to use as a regression model to compute 4 numbers given ~700 features of an image. Unfortunately, I do not have ground-truth elements, so I use a custom loss function. Here's the source of the function:

def loss_function(logits, img, g, compare_img):
    final_img = img_pipeline(vga_8b=img, g=g)  # g is an external color gamma function
    with tf.name_scope('Loss'):
        loss = score(gt_image=compare_img, curr_img=final_img)
    return loss
Here logits are the currently evaluated 4 numbers, g is just an interpolated function used as the color gamma for the image, and img is an external grayscale image used to generate the final result image passed to the score function. compare_img is not a ground-truth image but some statistical values (kept in a Python dict) used in the score function to evaluate the currently produced image.
Unfortunately, I can't feed g and compare_img, as they are a Python function and a Python dictionary, which cannot be converted to tensors.
Is there a way to hack it somehow and achieve the desired result?
Thanks in advance!
You can call external Python functions from TensorFlow with tf.py_func (tf.py_function in TF2), but I'm afraid to say that TensorFlow is not able to calculate gradients through them, and your loss function needs to be differentiable in every case. So you have to write the function in TensorFlow ops.
For your dict values you can create a lookup table with:
table = tf.contrib.lookup.HashTable(
    tf.contrib.lookup.KeyValueTensorInitializer(keys, values), -1)
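A minimal usage sketch, assuming TF 1.x (where tf.contrib is available); the keys and values below are hypothetical stand-ins for your dict's contents:

import tensorflow as tf

stats = {'mean': 0.5, 'std': 0.2}  # hypothetical compare_img statistics
keys = tf.constant(list(stats.keys()))
values = tf.constant(list(stats.values()))

table = tf.contrib.lookup.HashTable(
    tf.contrib.lookup.KeyValueTensorInitializer(keys, values), -1.0)

lookup = table.lookup(tf.constant(['mean', 'std']))
with tf.Session() as sess:
    sess.run(table.init)  # the table must be initialized before use
    print(sess.run(lookup))  # [0.5 0.2]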
I am working on the Python/TensorFlow/MNIST tutorial.
For a few weeks now, using the original code from the TensorFlow web site, I have been getting a warning that the image dataset will soon be deprecated and that I should use the following one:
https://github.com/tensorflow/models/blob/master/official/mnist/dataset.py
I load it in my code using:
from tensorflow.models.official.mnist import dataset
trainfile = dataset.train(data_dir)
which returns:
tf.data.Dataset.zip((images, labels))
The issue is that I cannot find a way to separate them, for example like this:

trainfile = dataset.train(data_dir)
train_data = trainfile.images
train_label = trainfile.label
But this clearly does not work, because the attributes images and label do not exist; trainfile is a tf.data.Dataset.
Knowing that the dataset is made of int32 and float32 elements, I tried:

train_data = trainfile.map(lambda x, y: x.dtype == tf.float32)

But it returns an empty dataset.
I insist (but am open-minded) on doing it this way (two complete batches of images and labels), because this is how the tutorial works:
https://www.tensorflow.org/tutorials/estimators/cnn
I have seen a lot of solutions for getting elements from datasets, but nothing on undoing the zip operation that is done in the following code:
tf.data.Dataset.zip((images, labels))
Thank you in advance for your help.
I hope this helps:
inputs = tf.placeholder(tf.float32, shape=(None, 784), name='inputs')
outputs = tf.placeholder(tf.float32, shape=(None,), name='outputs')

# Prepare a tensorflow dataset
ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
ds = ds.shuffle(buffer_size=10, reshuffle_each_iteration=True)
ds = ds.batch(batch_size=batch_size, drop_remainder=True).repeat()

iter = ds.make_one_shot_iterator()
next = iter.get_next()
inputs = next[0]
outputs = next[1]
TensorFlow's get_single_element() is finally around, and it can be used to unzip features and labels from a dataset.
This avoids the need to generate and use an iterator via .map(), iter(), or one_shot_iterator() (which could be costly for big datasets).
get_single_element() returns a tensor (or a tuple or dict of tensors) encapsulating all the members of the dataset. To do so, we first batch all the members of the dataset into a single element.
This can be used to get features as a tensor-array, or features and labels as a tuple or dictionary (of tensor-arrays) depending upon how the original dataset was created.
Check this answer on SO for an example that unpacks features and labels into a tuple of tensor-arrays.
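For instance, a minimal sketch of unzipping a dataset this way (the random images and labels are hypothetical stand-ins; this requires a TF version where Dataset.get_single_element() exists, otherwise use tf.data.experimental.get_single_element(ds)):

import tensorflow as tf

images = tf.random.uniform((100, 28, 28))
labels = tf.random.uniform((100,), maxval=10, dtype=tf.int32)
ds = tf.data.Dataset.from_tensor_slices((images, labels))

# batch the whole dataset into a single element, then extract it
all_images, all_labels = ds.batch(100).get_single_element()
print(all_images.shape, all_labels.shape)  # (100, 28, 28) (100,)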
Instead of separating into two datasets, one for images and another for labels, it's best to make a single iterator which returns both the image and the label.
The reason why this is preferred is that it's a lot easier to ensure that you match each example with its label, even after a complicated series of shuffles, reorderings, and filterings, as you might have in a nontrivial input pipeline.
You can visualize images and find their associated labels:
ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
ds = ds.shuffle(buffer_size=10).batch(batch_size=batch_size)

iter = ds.make_one_shot_iterator()
next = iter.get_next()

def display(image, label):
    # display image
    ...
    plt.imshow(image)
    ...

with tf.Session() as sess:
    try:
        while True:
            image, label = sess.run(next)
            # image = numpy array (batch, image_size)
            # label = numpy array (batch, label)
            display(image[0], label[0])  # display first image in batch
    except tf.errors.OutOfRangeError:
        # the one-shot iterator raises OutOfRangeError when exhausted
        pass