I have a TensorFlow dataset containing nearly 15000 color images at 168*84 resolution, each with a label. Its type and shape look like this:
< ConcatenateDataset shapes: ((168, 84, 3), ()), types: (tf.float32, tf.int32)>
I need to use it to train my network, so I need to pass it as a parameter to the function in which I build my layers:
def cnn_model_fn(features, labels, mode):
    input_layer = tf.reshape(features["x"], [-1, 168, 84, 3])
    # Convolutional Layer #1
    conv1 = tf.layers.conv2d(
        inputs=input_layer,
        filters=32,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)
...
I tried to convert each tensor into an np.array (which is, I guess, the proper type for the function above) using tf.eval() and np.ravel(), but I failed.
So, how can I convert this dataset into the proper type to pass it to the function?
Plus
I am new to Python and TensorFlow, and I don't think I understand why datasets exist if we can't use them directly to build layers (I am following the tutorial on TensorFlow's website, by the way).
Thanks.
You could try eager execution; previously I gave an answer using session run (shown below). During eager execution, calling .numpy() on a tensor converts that tensor to a numpy array. Example code (from my use case):
#enable eager execution
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
tf.enable_eager_execution()
print('Is executing eagerly?',tf.executing_eagerly())
#load datasets
import tensorflow_datasets as tfds
dataset, metadata = tfds.load('cycle_gan/horse2zebra',
                              with_info=True, as_supervised=True)
train_horses, train_zebras = dataset['trainA'], dataset['trainB']
#load dataset in to numpy array
train_A = train_horses.batch(1000).make_one_shot_iterator().get_next()[0].numpy()
print(train_A.shape)
#preview one of the images
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
print(train_A.shape)
plt.imshow(train_A[1])
plt.show()
Old, session run, answer:
I recently had this problem, and I did it like this:
#load datasets
import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
dataset, metadata = tfds.load('cycle_gan/horse2zebra',
                              with_info=True, as_supervised=True)
train_horses, train_zebras = dataset['trainA'], dataset['trainB']
#load dataset in to numpy array
sess = tf.compat.v1.Session()
tra = train_horses.batch(1000).make_one_shot_iterator().get_next()
train_A = np.array(sess.run(tra)[0])
print(train_A.shape)
sess.close()
#preview one of the images
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
print(train_A.shape)
plt.imshow(train_A[1])
plt.show()
It doesn't sound like you set things up using the TensorFlow Dataset pipeline; here is the guide for doing so:
https://www.tensorflow.org/programmers_guide/datasets
You can either follow that (it's the right approach, but there's a small learning curve to get used to it), or you can just pass the numpy array to sess.run as part of the feed_dict parameter. If you go this way, you should create a tf.placeholder which will be populated by the value in feed_dict. Many of the basic tutorial examples here follow this approach:
https://github.com/aymericdamien/TensorFlow-Examples
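For illustration, a minimal sketch of the feed_dict approach, assuming TF 1.x and the 168x84x3 images from the question (the placeholder name and the reduce_mean stand-in are mine, not from the tutorial):
import numpy as np
import tensorflow as tf

# Placeholder to be populated via feed_dict; shape matches the question's images
x = tf.placeholder(tf.float32, shape=[None, 168, 84, 3])
output = tf.reduce_mean(x)  # stand-in for a real model's output

images = np.random.rand(4, 168, 84, 3).astype(np.float32)
with tf.Session() as sess:
    print(sess.run(output, feed_dict={x: images}))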
I also needed to accomplish this task (Dataset to array), but without turning on eager mode. I managed to come up with the following:
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])
tensor_array = tf.TensorArray(dtype=dataset.element_spec.dtype,
                              size=0,
                              dynamic_size=True,
                              element_shape=dataset.element_spec.shape)
tensor_array = dataset.reduce(tensor_array, lambda a, t: a.write(a.size(), t))
tensor = tf.reshape(tensor_array.concat(), (-1,) + tuple(dataset.element_spec.shape))
array = tf.Session().run(tensor)
print(type(array))
# <class 'numpy.ndarray'>
print(array)
# [[1 2]
# [3 4]]
What this does:
We start with a dataset containing 2 tensors of shape (2,).
Since eager is off, we need to run the dataset through a Tensorflow session. And since a session requires a tensor, we have to convert the dataset into a tensor.
To accomplish this, we use Dataset.reduce() to put all the elements into a TensorArray (symbolically).
We now use TensorArray.concat() to convert the whole array into a single tensor. However, when we do this, the whole dataset is flattened into a 1-D array. So we need tf.reshape() to restore the original tensors' shape, plus an extra leading dimension to stack them all.
Finally we take the tensor and run it through a session. This gives us our numpy ndarray.
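For comparison, if eager mode is available, the same conversion is a one-liner via Dataset.as_numpy_iterator() (which, as far as I know, requires TF 2.1+):
import numpy as np
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([[1, 2], [3, 4]])
array = np.array(list(dataset.as_numpy_iterator()))
print(array)
# [[1 2]
#  [3 4]]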
This was the simplest method for me for a supervised problem with (X, y) pairs.
def dataset_to_numpy(ds):
    """
    Convert a TensorFlow dataset to numpy arrays.
    """
    images = []
    labels = []
    # Iterate over the dataset
    for image, label in tfds.as_numpy(ds):
        images.append(image)
        labels.append(label)
    # Print the shapes of the first few examples
    for i, img in enumerate(images):
        if i < 3:
            print(img.shape, labels[i])
    return images, labels
Usage:
import tensorflow_datasets as tfds

ds = tfds.load('mnist', split='train', as_supervised=True)
images, labels = dataset_to_numpy(ds)
You can use the following method to get the images and the corresponding labels:
def separate_dataset(dataset):
    images, labels = tf.compat.v1.data.make_one_shot_iterator(dataset.batch(len(dataset))).get_next()
    return images, labels
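Usage might look like this, assuming ds is a labeled dataset such as the tfds.load result above (note that len(dataset) only works when the dataset's cardinality is known, TF 2.3+):
images, labels = separate_dataset(ds)
print(images.shape, labels.shape)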
Related
I'm trying to randomly translate 1D vectors as they get passed into my tensorflow model. I wanted to check how this would affect my data so I can scale the random translation amount properly, but every time I pass my data into the layer, the output is unchanged. Here is my standalone example:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
layer = tf.keras.layers.RandomTranslation(
    height_factor=0.1,
    width_factor=0.1,
    fill_mode='reflect',
    interpolation='bilinear',
    seed=None)
input_data = np.random.random((50,100,1,1))
check = layer(input_data)
check = check.numpy().reshape(-1, 100)
input_data = input_data.reshape(-1,100)
for i in range(2):
    plt.plot(input_data[i], 'blue')
    plt.plot(check[i], 'orange')
The resulting plot shows the orange (output) curves lying exactly on top of the blue (input) curves.
What do I need to do to get this layer to work? I've tried adding dimensions but it didn't help. Is this because the "model" isn't in training mode?
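One likely explanation, assuming standard Keras behavior: random preprocessing layers such as RandomTranslation are only active in training mode, and a direct call runs in inference mode, passing the data through unchanged. Forcing training behavior should show the translations:
# force the random augmentation to run even outside model.fit
check = layer(input_data, training=True)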
I would like to train with a different custom image augmentation during each epoch of training.
The wrong solution would be to save the augmented images and run the training on the saved images, because if you try to load hundreds of thousands of images for training, you will get a memory error.
The right solution has to apply the augmentation on the fly during the fit routine.
Can you please indicate me how to do it, pointing out a working example?
It won't create many images, and you won't get a memory error. While iterating through the dataset, it applies random transformations to each image on the fly, without "creating" new images that are saved in memory. So just do it normally:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
import tensorflow_datasets as tfds
[train_set_raw] = tfds.load('cats_vs_dogs', split=['train[:100]'], as_supervised=True)
def augment(tensor):
    tensor = tf.cast(x=tensor, dtype=tf.float32)
    tensor = tf.image.rgb_to_grayscale(images=tensor)
    tensor = tf.image.resize(images=tensor, size=(96, 96))
    tensor = tf.divide(x=tensor, y=tf.constant(255.))
    tensor = tf.image.random_flip_left_right(image=tensor)
    tensor = tf.image.random_brightness(image=tensor, max_delta=2e-1)
    tensor = tf.image.random_crop(value=tensor, size=(64, 64, 1))
    return tensor
train_set_raw = train_set_raw.shuffle(128).map(lambda x, y: (augment(x), y)).batch(16)
import matplotlib.pyplot as plt
plt.imshow((next(iter(train_set_raw))[0][0][..., 0].numpy()*255).astype(int))
plt.show()
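Because the map runs lazily each time the dataset is iterated, every epoch sees freshly augmented images. Assuming a compiled model (the model name here is hypothetical, not from the snippet above), training is simply:
model.fit(train_set_raw, epochs=10)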
I'm using Keras to train a model on SageMaker, but I hit this error:
MemoryError: Unable to allocate 381. MiB for an array with shape (25000, 2000) and data type float64
Here's the code:
import pandas as pd
import numpy as np
from keras.datasets import imdb
from keras import models, layers, optimizers, losses, metrics
import matplotlib.pyplot as plt

# load the preprocessed IMDB dataset
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(
    num_words=2000)

# one-hot encode each integer sequence into a binary matrix
def vectorize_sequences(sequences, dimension=2000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
Then I get the error.
The first time I ran this code it worked, but it failed when I tried to re-run it. How can I fix it by clearing the memory, or is there a way to get more memory on SageMaker?
I wouldn't know about SageMaker or AWS specifically, but something you can do is use float32 instead of float64, which takes half the memory. The big array is allocated inside vectorize_sequences, so create it as float32 there:
results = np.zeros((len(sequences), dimension), dtype=np.float32)
float32 is the default dtype of TensorFlow weights, so you don't need float64 anyway. Proof:
import tensorflow as tf
layer = tf.keras.layers.Dense(8)
print(layer(tf.random.uniform((10, 100), 0, 1)).dtype)
<dtype: 'float32'>
My other suggestions are to use fewer words from your dataset, or to skip the one-hot encoding. If you're planning to train a recurrent model with an embedding layer, you won't need it anyway.
I currently have the following situation where I want to use DataLoader to batch a numpy array:
import numpy as np
import torch
import torch.utils.data as data_utils
# Create toy data
x = np.linspace(start=1, stop=10, num=10)
x = np.array([np.random.normal(size=len(x)) for i in range(100)])
print(x.shape)
# >> (100,10)
# Create DataLoader
input_as_tensor = torch.from_numpy(x).float()
dataset = data_utils.TensorDataset(input_as_tensor)
dataloader = data_utils.DataLoader(dataset,
                                   batch_size=100)
batch = next(iter(dataloader))
print(type(batch))
# >> <class 'list'>
print(len(batch))
# >> 1
print(type(batch[0]))
# >> class 'torch.Tensor'>
I expect the batch to already be a torch.Tensor. As of now I index the batch like batch[0] to get a Tensor, but I feel this is not really pretty and makes the code harder to read.
I found that the DataLoader takes a batch-processing function called collate_fn. However, setting data_utils.DataLoader(..., collate_fn=lambda batch: batch[0]) only changes the list to a tuple (tensor([ 0.8454, ..., -0.5863]),) where the only entry is the batch as a Tensor.
You would help me a lot by showing me how to elegantly transform the batch into a tensor (even if that includes telling me that indexing the single entry in batch is okay).
Sorry for the confusion with my earlier answer. Actually, you don't have to create a Dataset from your tensor; you can pass a torch.Tensor directly, as it implements __getitem__ and __len__, so this is sufficient:
import numpy as np
import torch
import torch.utils.data as data_utils
# Create toy data
x = np.linspace(start=1, stop=10, num=10)
x = np.array([np.random.normal(size=len(x)) for i in range(100)])
# Create DataLoader
dataset = torch.from_numpy(x).float()
dataloader = data_utils.DataLoader(dataset, batch_size=100)
batch = next(iter(dataloader))
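A quick sanity check of what I would expect this to yield (the default collate stacks the row tensors back into a single tensor):
print(type(batch))
# <class 'torch.Tensor'>
print(batch.shape)
# torch.Size([100, 10])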
I am trying to run a Keras model in which I read 88 images from a folder into a numpy array. This array should be converted into a Keras tensor so that I can work with the data in the model. I am running the following code:
import os
import numpy as np
from PIL import Image
from keras import backend as K
current_dir = os.path.dirname(__file__)
image_names = os.listdir(os.path.join(current_dir, 'images'))
images = np.ndarray((len(image_names), 256, 256), dtype=np.uint8)
for i, filename in enumerate(image_names):
    images[i] = Image.open(os.path.join(current_dir,
                                        'images',
                                        filename)).resize((256, 256)).convert('L')
images = images.astype(K.floatx())
images *= 0.96/255
images += 0.02
images = images.reshape(images.shape[0], 256, 256, 1)
print(images.shape)
cats_q = K.variable(images)
print(type(cats_q))
print(K.is_keras_tensor(cats_q))
I am getting the following output:
(87, 256, 256, 1)
<class 'tensorflow.python.ops.variables.Variable'>
False
How can I convert the output into a Keras tensor? Any help would be much appreciated!
Many thanks,
Andi
You should build your model first, including an input tensor with the correct size to handle this data, then pass the numpy array to the Keras model when you call the fit function.
When you build a Keras model, the tensors are edges in the computation graph. You don't want to initialize a tensor with a value but with a size, then pass the value in when necessary.
This page on the keras functional API has some good examples of this.
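For illustration, a minimal sketch with the functional API, assuming a binary classification target (the architecture and the labels array are placeholders of mine, not from the question):
from keras import layers, models

# input tensor sized to match the (256, 256, 1) images prepared above
inputs = layers.Input(shape=(256, 256, 1))
x = layers.Conv2D(32, (3, 3), activation='relu')(inputs)
x = layers.Flatten()(x)
outputs = layers.Dense(1, activation='sigmoid')(x)

model = models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy')
# 'labels' would be a hypothetical numpy array of shape (87,)
# model.fit(images, labels, epochs=5)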