I have an image dataset that I retrieved by means of tf.data.Dataset.list_files().
In my .map() function, I read and decode images, like below:
def map_function(filepath):
    image = tf.io.read_file(filename=filepath)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [IMAGE_WIDTH, IMAGE_HEIGHT])
    return image
If I use the following (this works):
dataset = tf.data.Dataset.list_files(file_pattern=...)
dataset = dataset.map(map_function)
for image in dataset.as_numpy_iterator():
    # Correctly outputs the numpy array, no error is displayed/encountered
    print(image)
However, if I use the following (this throws an error):
dataset = tf.data.Dataset.list_files(file_pattern=...)
dataset = dataset.batch(32).map(map_function)
for image in dataset.as_numpy_iterator():
    # Error is displayed
    print(image)
ValueError: Shape must be rank 0 but is rank 1 for 'ReadFile' (op:
'ReadFile') with input shapes: [?].
Now, according to https://www.tensorflow.org/guide/data_performance#vectorizing_mapping, the code should not fail, and the preprocessing step should be optimized (batch processing vs. one-at-a-time processing).
Where is the mistake in my code?
Edit: If I use map().batch(), it works fine.
The error occurs because map_function expects unbatched elements, but in the second example you give it batched elements.
The example in https://www.tensorflow.org/guide/data_performance is being tricky by defining an increment function that applies equally to batched and unbatched elements, since adding 1 to a batched element like [1, 2, 3] results in [2, 3, 4].
def increment(x):
    return x + 1
To use vectorization, you would need to write a vectorized_map_function, which takes in a vector of unbatched elements, applies the map function to each element in the vector, then returns a vector of the results.
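For illustration, here is a minimal sketch of such a function, implemented with tf.map_fn (this wrapper is an assumption, not part of the original answer, and note that it still applies map_function element by element inside the graph rather than truly vectorizing the work):

def vectorized_map_function(filepaths):
    # Apply the per-element map_function to each filename in the batch;
    # dtype tells tf.map_fn the output dtype, which differs from the string input.
    return tf.map_fn(map_function, filepaths, dtype=tf.float32)

dataset = tf.data.Dataset.list_files(file_pattern=...)
dataset = dataset.batch(32).map(vectorized_map_function)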
In your case though, I don't think vectorizing will have a noticeable impact, since the cost of reading and decoding files is much higher than the overhead of calling a function. Vectorization is most impactful when the map function is extremely cheap, to the point where the time spent on function invocation is comparable to the time spent doing actual work in the map function.
Related
In Python I have a function that saturates the values stored in an image:
def saturateData(minMax, img):
    alpB = cal_alpB(minMax)
    img[img > minMax[1]] = minMax[1]
    img[img < 0] = 0
where minMax is a list that stores the saturation values to be applied to an image numpy array. I need to do the same operation on a cv::Mat object.
Is there any function that does the same operation? If not, how can I achieve this? (Since I am working with images it has to be fast; the function I wrote is O(N^2), which is inefficient, and that's why I am asking!)
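For comparison, on the NumPy side this saturation can be done in one vectorized pass with np.clip; a minimal sketch (leaving out the unrelated cal_alpB computation):

import numpy as np

def saturate_data(min_max, img):
    # Clip every value into [0, min_max[1]] in a single vectorized pass.
    return np.clip(img, 0, min_max[1])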
I have an image I've read from file with shape (m,n,3) (i.e. it has 3 channels). I also have a matrix to convert the color space with dimensions (3,3). I've already arrived at a few different ways of applying this matrix to each vector in the image; for example,
np.einsum('ij,...j', transform, image)
appears to produce the same results as the following (far slower) implementation.
def convert(im: np.array, transform: np.array) -> np.array:
    """Convert an image array to another colorspace."""
    dimensions = len(im.shape)
    axes = im.shape[:dimensions-1]
    # Create a new array (respecting mutability)
    new_ = np.empty(im.shape)
    for coordinate in np.ndindex(axes):
        pixel = im[coordinate]
        pixel_prime = transform @ pixel
        new_[coordinate] = pixel_prime
    return new_
However, I found that the following is even more efficient while testing on the example image with line_profiler.
np.moveaxis(np.tensordot(transform, image, axes=((-1), (-1))), 0, 2)
The problem I'm having here is using just np.tensordot, i.e. removing the need for np.moveaxis. I've spent a few hours attempting to find a solution (I'm guessing it resides in choosing the correct axes), so I thought I'd ask others for help.
You can do it concisely with tensordot if you make image the first argument:
np.tensordot(image, transform, axes=(-1, 1))
You can get better performance from einsum by using the argument optimize=True (requires numpy 1.12 or later):
np.einsum('ij,...j', transform, image, optimize=True)
Or (as Paul Panzer pointed out in a comment), you can simply use matrix multiplication:
image @ transform.T
They all take about the same time on my computer.
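A quick sanity check with hypothetical random data (the shapes here are assumptions, not from the original post) confirms the three expressions agree:

import numpy as np

image = np.random.rand(4, 5, 3)   # hypothetical (m, n, 3) image
transform = np.random.rand(3, 3)  # hypothetical 3x3 colorspace matrix

a = np.tensordot(image, transform, axes=(-1, 1))
b = np.einsum('ij,...j', transform, image, optimize=True)
c = image @ transform.T

assert np.allclose(a, b) and np.allclose(b, c)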
I'm saving a binary OpenCV Mat to an HDF5 file.
In OpenCV, Mat data is stored in memory with the channel as the first (fastest-varying) index, the x-coordinate as the second, and the y-coordinate as the third, so an address access looks like:
address = M.data + M.step[0]*y + M.step[1]*x + ch
Where M.step[0] = NUM_X*NUM_CH and M.step[1] = MAX_CH
The problem I am experiencing is that Matlab and Python interpret the data in the wrong way.
Though the dimensions of the read data are set correctly (channel, x, y), when I look into the data storage I see that, e.g., numpy reads the data backwards: first y is incremented, then x, and lastly the channel number. That means it assumes a planar configuration of the channel data, while it is actually interleaved, which results in images being displayed incorrectly.
Is there a way to tell numpy/Matlab to change the data access, without reordering the data?
Thanks in advance.
Edit:
I store everything in a rank 3 dataset in the hdf5 file, where dimension 1 is channel, dimension 2 is x-coordinate and dimension 3 is y-coordinate.
If I read that dataset and process it with OpenCV in C++, the correct image is displayed. OpenCV in Python doesn't work because of the error: (-206) Unrecognized or unsupported array type in function cvGetMat
I could solve this in Python by changing the shape and strides of the array, which had been calculated in the wrong way:
If I had a 3*1280*720 uint8 image, with 3 being the channel count, 1280 the x-coordinate, and 720 the y-coordinate, I would have to set the shape so it looks like data.shape = (720, 1280, 3), and the strides would have to be changed to data.strides = (3*1280, 3, 1).
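As a minimal sketch of that fix, assuming the raw interleaved buffer was read into a contiguous (3, 1280, 720) array (the zeros array is just a hypothetical stand-in for the HDF5 data):

import numpy as np

# Hypothetical stand-in for the interleaved buffer read from the HDF5 file.
data = np.zeros((3, 1280, 720), dtype=np.uint8)

# Reinterpret the same buffer as (y, x, channel) without copying or reordering.
data.shape = (720, 1280, 3)
data.strides = (3*1280, 3, 1)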
This link explains how numpy arrays work in memory:
Numpy doc
I am trying to implement BiLSTM-Max as described in the following paper:
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data.
I am using Tensorflow for my implementation. I started off from an original LSTM code but made modifications so that it can run with dynamic-length input and is also bidirectional (i.e. a dynamic Bi-LSTM).
# Bi-LSTM, returns a list of n_step outputs, each of shape [batch_size, 2*n_hidden]
outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)
# Change output back to [batch_size, n_step, 2*n_hidden]
outputs = tf.transpose(tf.stack(outputs), [1, 0, 2])
# Retrieve the last output corresponding to the length of the input sequence
batch_size_ = tf.shape(outputs)[0]
index = tf.range(0, batch_size_) * seq_max_len + (seqlen - 1)
outputs = tf.gather(tf.reshape(outputs, [-1, 2*n_hidden]), index)
Next, modifying it to Bi-LSTM-Max, I replaced taking the last output with finding the max across n_steps, as follows:
# Bi-LSTM, returns a list of n_step outputs, each of shape [batch_size, 2*n_hidden]
outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, x, dtype=tf.float32)
# Change output back to [batch_size, n_step, 2*n_hidden]
outputs = tf.transpose(tf.stack(outputs), [1, 0, 2])
# Retrieve the max output across n_steps
outputs = tf.reduce_max(outputs, reduction_indices=[1])
When I took the max across the n_steps dimension, I had assumed that the outputs at indices > seqlen are 0, so I could take the max across the entire dimension instead of only from 0 to seqlen. Upon closer inspection, I realised that the values at the unassigned indices may be non-zero, due to random initialization, or may just be whatever was last assigned at that memory location.
This operation is trivial with Python arrays; however, I can't find an easy way to do it with Tensor operations. Does anyone have an idea?
Probably the easiest thing to do would be to manually set the invalid outputs to zero or -∞ before finding the maximum. You can do that quite easily with tf.sequence_mask and tf.where:
seq_mask = tf.sequence_mask(seqlen, seq_max_len)
# Tile the mask to match the shape of outputs (tf.where needs matching shapes here)
seq_mask = tf.tile(tf.expand_dims(seq_mask, -1), [1, 1, 2*n_hidden])
# You can also use e.g. -np.inf * tf.ones_like(outputs)
outputs_masked = tf.where(seq_mask, outputs, tf.zeros_like(outputs))
outputs = tf.reduce_max(outputs_masked, axis=1)  # axis is preferred to reduction_indices
After doing some processing on an audio or image array, the result needs to be normalized within a range before it can be written back to a file. This can be done like so:
# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()
# Normalize image to between 0 and 255
image = image/(image.max()/255.0)
Is there a less verbose, convenience function way to do this? matplotlib.colors.Normalize() doesn't seem to be related.
# Normalize audio channels to between -1.0 and +1.0
audio /= np.max(np.abs(audio),axis=0)
# Normalize image to between 0 and 255
image *= (255.0/image.max())
Using /= and *= allows you to eliminate an intermediate temporary array, thus saving some memory. Multiplication is less expensive than division, so
image *= 255.0/image.max() # Uses 1 division and image.size multiplications
is marginally faster than
image /= image.max()/255.0 # Uses 1+image.size divisions
Since we are using basic numpy methods here, I think this is about as efficient a solution in numpy as can be.
In-place operations do not change the dtype of the container array. Since the desired normalized values are floats, the audio and image arrays need to have a floating-point dtype before the in-place operations are performed.
If they are not already of floating-point dtype, you'll need to convert them using astype. For example,
image = image.astype('float64')
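For example, a short end-to-end sketch with a hypothetical integer image:

import numpy as np

image = np.arange(12, dtype=np.uint8).reshape(3, 4)  # hypothetical integer image
image = image.astype('float64')  # convert first so the in-place op stays float
image *= 255.0 / image.max()     # normalize to [0, 255] in place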
If the array contains both positive and negative data, I'd go with:
import numpy as np
a = np.random.rand(3,2)
# Normalised [0,1]
b = (a - np.min(a))/np.ptp(a)
# Normalised [0,255] as integer: don't forget the parentheses before astype(int)
c = (255*(a - np.min(a))/np.ptp(a)).astype(int)
# Normalised [-1,1]
d = 2.*(a - np.min(a))/np.ptp(a)-1
If the array contains nan, one solution could be to just remove them:
def nan_ptp(a):
    return np.ptp(a[np.isfinite(a)])

b = (a - np.nanmin(a))/nan_ptp(a)
However, depending on the context you might want to treat nan differently, e.g. interpolate the value, replace it with e.g. 0, or raise an error.
Finally, even if it's not OP's question, standardization is worth mentioning:
e = (a - np.mean(a)) / np.std(a)
You can also rescale using sklearn. The advantages are that you can normalize the standard deviation, in addition to mean-centering the data, and that you can do this along either axis, by features, or by records.
from sklearn.preprocessing import scale
X = scale(X, axis=0, with_mean=True, with_std=True, copy=True)
The keyword arguments axis, with_mean, with_std are self explanatory, and are shown in their default state. The argument copy performs the operation in-place if it is set to False. Documentation here.
You are trying to min-max scale the values of audio between -1 and +1, and of image between 0 and 255.
Using sklearn.preprocessing.minmax_scale should easily solve your problem.
e.g.:
from sklearn.preprocessing import minmax_scale

audio_scaled = minmax_scale(audio, feature_range=(-1, 1))
and
shape = image.shape
image_scaled = minmax_scale(image.ravel(), feature_range=(0,255)).reshape(shape)
Note: not to be confused with the operation that scales the norm (length) of a vector to a certain value (usually 1), which is also commonly referred to as normalization.
This answer to a similar question solved the problem for me with
np.interp(a, (a.min(), a.max()), (-1, +1))
You can use the "i" (as in idiv, imul..) version, and it doesn't look half bad:
image /= (image.max()/255.0)
For the other case you can write a function to normalize an n-dimensional array by columns:
def normalize_columns(arr):
    rows, cols = arr.shape
    for col in range(cols):
        arr[:, col] /= abs(arr[:, col]).max()
A simple solution is using the scalers offered by the sklearn.preprocessing library.
import sklearn.preprocessing as sk

scaler = sk.MinMaxScaler(feature_range=(0, 250))
scaler = scaler.fit(X)
X_scaled = scaler.transform(X)
# Checking reconstruction
X_rec = scaler.inverse_transform(X_scaled)
The error X_rec - X will be zero. You can adjust the feature_range for your needs, or even use a standard scaler, sk.StandardScaler().
I tried following this, and got the error
TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''
The numpy array I was trying to normalize was an integer array. It seems this kind of implicit type casting was deprecated in versions > 1.10, and you have to use numpy.true_divide() to resolve that.
arr = np.array(img)
arr = np.true_divide(arr,[255.0],out=None)
img was a PIL.Image object.
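Put together, a minimal sketch (the file name is a hypothetical placeholder):

import numpy as np
from PIL import Image

img = Image.open('photo.jpg')               # hypothetical path
arr = np.true_divide(np.array(img), 255.0)  # float array, scaled to [0.0, 1.0]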