How to normalize a 4D numpy array? - python

I have a three-dimensional numpy array of images (CIFAR-10 dataset). The shape of a single image array looks like this:
a = np.random.rand(32, 32, 3)
Before I do any deep learning, I want to normalize the data to get better results. With a 1D array, I know we can do min-max normalization like this:
v = np.random.rand(6)
(v - v.min())/(v.max() - v.min())
Out[68]:
array([0.89502294, 0.        , 1.        , 0.65069468, 0.63657915,
       0.08932196])
However, when it comes to a 3D array, I am totally lost. Specifically, I have the following questions:
Along which axis do we take the min and max?
How do we implement this with the 3D array?
I appreciate your help!
EDIT:
It turns out I need to work with a 4D NumPy array of shape (202, 32, 32, 3), so the first dimension is the index of the image and the last 3 dimensions are the actual image. It would be great if someone could provide the code to normalize such a 4D array. Thanks!
EDIT 2:
Thanks to @Eric's code below, I've figured it out:
x_min = x.min(axis=(1, 2), keepdims=True)
x_max = x.max(axis=(1, 2), keepdims=True)
x = (x - x_min)/(x_max-x_min)
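Note that with axis=(1, 2) each channel of each image gets its own min and max. If you would rather normalize each image as a whole (all three channels together), a minimal variant, assuming x has the (202, 32, 32, 3) shape above, reduces over axes (1, 2, 3) instead:
x_min = x.min(axis=(1, 2, 3), keepdims=True)  # shape (202, 1, 1, 1), one min per image
x_max = x.max(axis=(1, 2, 3), keepdims=True)
x = (x - x_min) / (x_max - x_min)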

Assuming you're working with image data of shape (W, H, 3), you should probably normalize over each channel (axis=2) separately, as mentioned in the other answer.
You can do this with:
# keepdims makes the result shape (1, 1, 3) instead of (3,). This doesn't matter here, but
# would matter if you wanted to normalize over a different axis.
v_min = v.min(axis=(0, 1), keepdims=True)
v_max = v.max(axis=(0, 1), keepdims=True)
(v - v_min)/(v_max - v_min)

Along which axis do we take the min and max?
To answer this we probably need more information about your data, but in general, when discussing 3-channel images, for example, we would normalize using the per-channel min and max. This means that we would perform the normalization 3 times, once per channel.
Here's an example:
import numpy

img = numpy.random.randint(0, 100, size=(10, 10, 3)) # Generating some random numbers
img = img.astype(numpy.float32) # converting array of ints to floats
img_a = img[:, :, 0]
img_b = img[:, :, 1]
img_c = img[:, :, 2] # Extracting single channels from 3 channel image
# The above code could also be replaced with cv2.split(img), which returns 3 numpy arrays (using OpenCV)
# normalizing per channel data:
img_a = (img_a - numpy.min(img_a)) / (numpy.max(img_a) - numpy.min(img_a))
img_b = (img_b - numpy.min(img_b)) / (numpy.max(img_b) - numpy.min(img_b))
img_c = (img_c - numpy.min(img_c)) / (numpy.max(img_c) - numpy.min(img_c))
# putting the 3 channels back together:
img_norm = numpy.empty((10, 10, 3), dtype=numpy.float32)
img_norm[:, :, 0] = img_a
img_norm[:, :, 1] = img_b
img_norm[:, :, 2] = img_c
Edit: It just occurred to me that once you have the single-channel data (a 32x32 image, for instance) you can simply use:
from sklearn.preprocessing import normalize
img_a_norm = normalize(img_a)
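One caveat worth noting: sklearn.preprocessing.normalize performs unit-norm (L2 by default) scaling per row, which is not the same as min-max normalization. If you want min-max behaviour from scikit-learn, minmax_scale is closer; a rough sketch, assuming img_a is the single-channel array from above and that you want one min and max for the whole channel:
from sklearn.preprocessing import minmax_scale

# minmax_scale works column-wise by default, so flatten first
# to scale the whole channel with a single min and max
img_a_norm = minmax_scale(img_a.ravel()).reshape(img_a.shape)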
How do we work with the 3D array?
Well, this is a bit of a big question. If you need functions like array-wise min and max, I would use the NumPy versions. Indexing, for instance, is done per axis with comma-separated slices, as you can see in my example above.
Also, please refer to NumPy's ndarray documentation (https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html) to learn more. They really have an amazing set of tools for n-dimensional arrays.

There are different approaches here. You can either normalize over the whole batch of images or per single image, and for the mean you can use the mean of a single image, the mean of the whole batch, or a fixed mean from another dataset (e.g. the ImageNet mean).
If you want to do the same as TensorFlow's tf.image.per_image_standardization, you should normalize per single image with the mean of that image. So you loop through all images and do the normalization across all axes of each single image, like this:
import math
import numpy as np
from PIL import Image
# open images
image_1 = Image.open("your_image_1.jpg")
image_2 = Image.open("your_image_2.jpg")
images = [image_1, image_2]
images = np.array(images)
standardized_images = []
# standardize images
for image in images:
    mean = image.mean()
    stddev = image.std()
    adjusted_stddev = max(stddev, 1.0 / math.sqrt(image.size))
    standardized_image = (image - mean) / adjusted_stddev
    standardized_images.append(standardized_image)
standardized_images = np.array(standardized_images)
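If the images are already stacked in a single 4D array, the same standardization can be done without a Python loop by reducing with keepdims; a minimal sketch, assuming a batch of shape (N, H, W, C):
import numpy as np

images = np.random.rand(202, 32, 32, 3)  # hypothetical (N, H, W, C) batch
mean = images.mean(axis=(1, 2, 3), keepdims=True)
stddev = images.std(axis=(1, 2, 3), keepdims=True)
# same floor as tf.image.per_image_standardization: 1/sqrt(elements per image)
num_elements = np.prod(images.shape[1:])
adjusted_stddev = np.maximum(stddev, 1.0 / np.sqrt(num_elements))
standardized_images = (images - mean) / adjusted_stddev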

Related

Python cv2.remap from mesh creates pixelated distortions

I have a mesh that is 4 times smaller than an image. I want to distort the image with the information from the mesh, but when using cv2.remap the distortion comes out pixelated (see images below). How could I make a smoother distortion?
(Image comparison omitted: original, desired output, my output.)
My code:
import numpy as np
import cv2
from PIL import Image

img = np.array(Image.open('astronaut.jpg')) # Shape -> (512, 512, 3)
mesh = Mesh('astronaut.msh').get_uvs() # Shape -> (128, 128, 2), 2 channels for x and y
new_mesh = np.zeros((img.shape[1], img.shape[0], 2))
new_mesh[:,:,0] = np.repeat(np.repeat(mesh[:,:,0], 4, axis=0), 4, axis=1)
new_mesh[:,:,1] = np.repeat(np.repeat(mesh[:,:,1], 4, axis=0), 4, axis=1)
nh, nw = img.shape[:2]
xs, ys = np.meshgrid(np.arange(0, nw), np.arange(0, nh))
xs = xs + new_mesh[:,:,0] * 4 # multiply by constant to modulate distort strength
ys = ys + new_mesh[:,:,1] * 4
xs = np.float32(xs)
ys = np.float32(ys)
dst = cv2.remap(img.astype(np.uint8), xs, ys, cv2.INTER_CUBIC)
OpenCV is not to blame. It does exactly what you tell it to.
Those artefacts come from your use of np.repeat. That merely repeats each index element in your map array. You're not upsampling your mesh properly; you effectively just copy 4x4 patches with that code.
Properly upsample your mesh (np.repeat is wrong), then you get good results. You can easily do that using cv.resize and anything other than nearest-neighbor interpolation. If you need to control boundary behavior exactly, you'll need warpAffine and a custom transformation matrix. You could even use cv.pyrUp (twice).
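For illustration, a minimal sketch of the resize-based upsampling (not the answerer's final code), assuming the mesh has shape (128, 128, 2) and the image is 512 x 512:
import cv2
import numpy as np

# hypothetical stand-in for Mesh('astronaut.msh').get_uvs()
mesh = np.random.rand(128, 128, 2).astype(np.float32)
# smoothly interpolate the displacement field instead of repeating 4x4 blocks
new_mesh = cv2.resize(mesh, (512, 512), interpolation=cv2.INTER_LINEAR)
# new_mesh has shape (512, 512, 2) and can be used to build xs/ys as before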
Once you've presented an MRE (minimal reproducible example) with some data for the mesh, I'll update my answer with working code.

vectorized "by-layer" scaling of numpy array

I have a numpy array (let's say 100x64x64).
My goal is to scale each 64x64 layer independently and store a scaler for later use.
This is how it can be achieved with a for-loop solution:
from sklearn.preprocessing import MinMaxScaler
import joblib

scalers_dict = {}
for i in range(X.shape[0]):
    scalers_dict[i] = MinMaxScaler()
    # fit the scaler and transform this layer in place
    X[i, :, :] = scalers_dict[i].fit_transform(X[i, :, :])
# save the dict of scalers
joblib.dump(value=scalers_dict, filename="dict_of_scalers.scaler")
My real array is much bigger, and it takes quite a while to iterate through it.
Do you have some more vectorized solution in mind, or is a for-loop the only way?
If I understand correctly how MinMaxScaler works, it scales each column (feature) independently, computing the min and max along axis=0.
To make this useful for your case, you'd need to transform X into a (64 * 64, 100) array:
s = X.shape
X = np.moveaxis(X, 0, -1).reshape(-1, s[0])
Alternatively, you can write
X = X.reshape(s[0], -1).T
Now you can do the scaling with
M = MinMaxScaler()
X = M.fit_transform(X)
Since the fit is computed along the first axis (over the 64 * 64 rows), the resulting per-feature statistics have size 100, and they broadcast cleanly because the last dimension now has that same size.
To get the original shape back, invert the original transformation:
X = X.T.reshape(s)
When you are done, M will be a single scaler calibrated for 100 features. There is no need for a dictionary here; a dictionary keyed by consecutive integers is better expressed as a list or array, which is effectively what happens here.
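Putting the pieces together, a minimal end-to-end sketch along these lines (assuming X has shape (100, 64, 64), with a hypothetical filename):
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import joblib

X = np.random.rand(100, 64, 64)                 # hypothetical data
s = X.shape
M = MinMaxScaler()
# reshape to (64 * 64, 100) so each layer becomes one feature column, then undo
X_scaled = M.fit_transform(X.reshape(s[0], -1).T).T.reshape(s)
joblib.dump(M, "layer_scaler.joblib")           # one scaler for all 100 layers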
IIUC, you can manually scale:
import joblib

mm, MM = inputs.min(axis=(1, 2)), inputs.max(axis=(1, 2))
# save these for later use
joblib.dump((mm, MM), 'minmax.joblib')

def scale(inputs, mm, MM):
    return (inputs - mm[:, None, None]) / (MM - mm)[:, None, None]

# load the pre-saved min & max
mm, MM = joblib.load('minmax.joblib')
# scaled inputs
scale(inputs, mm, MM)

How to resize images once they are converted to numpy arrays

Suppose we only have images stored as .npy files. Is it possible to resize the images without converting them back to image objects? (I'm looking for a way that runs fast.)
For more context: I do have the original images, but I don't want to use them directly in the code because my dataset is too large and working with the image files is slow. Also, I'm not sure yet which size is best for my images, so I'm looking for a workflow that first converts the images to .npy and saves them, and then preprocesses the .npy files, for example resizing the image dimensions.
Try PIL, maybe it's fast enough for you.
import numpy as np
from PIL import Image
arr = np.load('img.npy')
img = Image.fromarray(arr)
img = img.resize(size=(100, 100))  # resize() returns a new image; it does not modify img in place
Note that you have to compute the aspect ratio if you want to keep it. Or you can use Image.thumbnail(), which can take an antialias filter.
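Since the question is about keeping everything as .npy, you can convert the resized image straight back to an array and save it again; a short follow-on sketch (img_resized.npy is a hypothetical filename):
arr_resized = np.asarray(img)
np.save('img_resized.npy', arr_resized)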
There's also scikit-image, which works directly on NumPy arrays:
import skimage.transform as st
st.resize(arr, (100, 100))
I guess the other option is OpenCV.
If you are only dealing with numpy arrays, I think slicing would be enough for downscaling.
Say the shape of the loaded numpy array is (m, n) (one channel) and the target shape is (a, b). Then the stride can be (s1, s2) = (m // a, n // b).
So the original array can be sliced by
new_array = old_array[::s1, ::s2]
EDIT
To scale up an array is also quite straightforward if you use masks (index arrays) for advanced slicing. For example, say the shape of the original array is (m, n) and the target shape is (a, b). Then, as an example:
import numpy as np

a, b = 300, 200
m, n = 3, 4
original = np.linspace(1, 12, 12).reshape(m, n)
(s1, s2) = (a // m, b // n)  # the integer scale factors
# the two index masks: repeat each source row/column index s1 or s2 times
mask_x = np.concatenate([np.ones(s1) * ind for ind in range(m)])
mask_y = np.concatenate([np.ones(s2) * ind for ind in range(n)])
# make sure the residual rows/columns are taken into account
if len(mask_x) < a: mask_x = np.concatenate([mask_x, np.ones(a - len(mask_x)) * (m - 1)])
if len(mask_y) < b: mask_y = np.concatenate([mask_y, np.ones(b - len(mask_y)) * (n - 1)])
mask_x = mask_x.astype(np.intp)
mask_y = mask_y.astype(np.intp)
canvas = original[mask_x, :]
canvas = canvas[:, mask_y]
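For integer scale factors, the same nearest-neighbour upscaling can be written more compactly with np.repeat; a minimal sketch using the same original, s1 and s2 as above (it does not handle the residual rows/columns when a and b are not exact multiples of m and n):
upscaled = np.repeat(np.repeat(original, s1, axis=0), s2, axis=1)
# shape is (m * s1, n * s2); pad or slice if the target (a, b) is not an exact multiple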

how to merge different dimensions arrays in python?

I am analyzing some image-based datasets using Keras, and I am stuck because I have two feature sets with different dimensions: features has 14637 images of shape (10, 10, 3) and features2 has shape (10, 10, 100) per image.
Is there any way I can merge/concatenate these two datasets together?
If features and features2 contain the features of the same batch of images, that is, features[i] corresponds to the same image as features2[i] for each i, then it would make sense to group the features into a single array using the numpy function concatenate():
newArray = np.concatenate((features, features2), axis=3)
Where 3 is the axis along which the arrays will be concatenated. In this case, you'll end up with a new array having dimension (14637, 10, 10, 103).
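A quick sanity check of the shapes, with small hypothetical arrays standing in for the real data:
import numpy as np

features = np.zeros((5, 10, 10, 3))     # stand-in for the (14637, 10, 10, 3) array
features2 = np.zeros((5, 10, 10, 100))  # stand-in for the (14637, 10, 10, 100) array
newArray = np.concatenate((features, features2), axis=3)
print(newArray.shape)                   # (5, 10, 10, 103)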
However, if they refer to completely different batches of images and you would like to merge them along the first axis, so that the 14637 images of features2 come after the 14637 images of features, then there is no way to end up with a single array, since the trailing dimensions (3 vs. 100) do not match: numpy arrays are rectangular, not lists of arbitrary objects.
For instance, if you try to execute:
a = np.array([[0, 1, 2]])  # shape = (1, 3)
b = np.array([[0, 1]])     # shape = (1, 2)
c = np.concatenate((a, b), axis=0)
Then, you'll get:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
since you are concatenating along axis = 0 but axis 1's dimensions differ.
If dealing with numpy arrays, you should be able to use the concatenate method and specify the axis along which the data should be merged, here the last (channel) axis. Basically: np.concatenate((array_a, array_b), axis=-1)
I think it would be better if you use a class:
class your_class:
    def __init__(self):
        self.array_1 = []
        self.array_2 = []

final_array = []
for x in range(len(your_previous_one_array)):
    temp_class = your_class()                        # create a new instance per element
    temp_class.array_1 = your_previous_one_array[x]
    temp_class.array_2 = your_previous_two_array[x]
    final_array.append(temp_class)

How to apply calculations to elements of a multidimensional matrix in Tensorflow?

I'm relatively new to Python and even more so to Tensorflow so I've been working through some tutorials such as this tutorial. A challenge given was to make an image greyscale. One approach taken here is to just take one colour channel value and duplicate it across all channels. Another is to take an average which can be achieved using tf.reduce_mean as done here. However there are many ways to make an image monochromatic as anyone who has played with GIMP or Photoshop will know. One standard method defined adjusts for the way humans perceive colour and requires that the three colour channels are individually adjusted this way:
Grey = (Red * 0.2126 + Green * 0.7152 + Blue * 0.0722)
Anyway I've achieved it by doing this:
import tensorflow as tf
import numpy as np
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
filename = "MarshOrchid.jpg"
raw_image_data = mpimg.imread(filename)
image = tf.placeholder("float", [None, None, 3])
r = tf.slice(image,[0,0,0],[-1,-1,1])
g = tf.slice(image,[0,0,1],[-1,-1,1])
b = tf.slice(image,[0,0,2],[-1,-1,1])
r = tf.scalar_mul(0.2126,r)
g = tf.scalar_mul(0.7152,g)
b = tf.scalar_mul(0.0722,b)
grey = tf.add(r,tf.add(g,b))
out = tf.concat(2, [grey, grey, grey])
out = tf.cast(out, tf.uint8)
with tf.Session() as session:
    result = session.run(out, feed_dict={image: raw_image_data})

plt.imshow(result)
plt.show()
This seems hugely inelegant to me: cutting up the data, applying calculations, and then recombining the pieces. A matrix multiplication on the individual RGB tuples would be efficient, or, barring that, a function that takes an individual RGB tuple and returns a greyscaled tuple. I've looked at tf.map_fn but can't seem to make it work for this.
Any suggestions or improvements?
How about this?
img = tf.ones([100, 100, 3])
r, g, b = tf.unstack(img, axis=2)
grey = r * 0.2126 + g * 0.7152 + b * 0.0722
out = tf.stack([grey, grey, grey], axis=2)
out = tf.cast(out, tf.uint8)
Here is a sample of map_fn. The shape of x is (2, 4), so the shape of elms_fn is (4,); if the shape of x were (100, 100, 3), the shape of elms_fn would be (100, 3).
x = tf.constant([[1, 2, 3, 4],
                 [5, 6, 7, 8]], dtype=tf.float32)

def avg_fc(elms_fn):
    # shape of elms_fn is (4,)
    # compute average for each row and return it
    avg = tf.reduce_mean(elms_fn)
    return avg

# map_fn will stack avg at axis 0
res = tf.map_fn(avg_fc, x)

with tf.Session() as sess:
    a = sess.run(res)  # [2.5, 6.5]
So, having really looked into this topic: in the current release of TensorFlow (r0.12) there doesn't appear to be a simple way to apply custom functions to tuples of values, especially if the result is not a reduction. As in my initial effort and in the answer from @xxi, you pretty much have to disaggregate the tuples before applying a function to them collectively.
I figured out another way to get the result I wanted, without slicing or unstacking, by reshaping and using matrix multiplication:
import tensorflow as tf
import numpy as np
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
filename = "MarshOrchid.jpg"
raw_image_data = mpimg.imread(filename)
image = tf.placeholder("float", [None, None, 3])
out = tf.reshape(image, [-1,3])
out = tf.matmul(out,[[0.2126, 0, 0], [0, 0.7152, 0], [0, 0, 0.0722]])
out = tf.reduce_sum(out, 1, keep_dims=True)
out = tf.concat(1, [out, out, out])
out = tf.reshape(out, tf.shape(image))
out = tf.cast(out, tf.uint8)
with tf.Session() as session:
    result = session.run(out, feed_dict={image: raw_image_data})

plt.imshow(result)
plt.show()
This worked for the narrow purpose of greyscaling an image but doesn't really give a design pattern to apply for dealing with more generic calculations.
Out of curiosity I profiled these three methods in terms of execution time and memory usage. So which was better?
Method 1 - Slicing: 1.6 seconds & 1.0 GiB memory usage
Method 2 - Unstacking: 1.6 seconds & 1.1 GiB memory usage
Method 3 - Reshape: 1.4 seconds & 1.2 GiB memory usage
So no major differences in performance but interesting nonetheless.
In case you were wondering why the process is so slow, the image used is 5528 x 3685 pixels. But yeah still pretty slow compared to Gimp and others.
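As a side note on the reshape method, the diagonal matrix followed by reduce_sum can be collapsed into a single matmul with a (3, 1) weight vector; a minimal sketch of that variant, keeping the r0.12-era tf.concat argument order used above:
weights = tf.constant([[0.2126], [0.7152], [0.0722]])
out = tf.reshape(image, [-1, 3])
grey = tf.matmul(out, weights)            # shape (num_pixels, 1)
out = tf.concat(1, [grey, grey, grey])
out = tf.reshape(out, tf.shape(image))
out = tf.cast(out, tf.uint8)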
