I'm trying to translate an image using the following code.
from skimage import io
import numpy as np
import scipy.ndimage

im = io.imread("path/to/my/image.jpg")
shift_image = scipy.ndimage.shift(im, np.array([1, 2]))
I'm using skimage to read the image.
I get the following error
RuntimeError: sequence argument must have length equal to input rank
The name ndimage (with "n-dimensional" in it) suggests that the package is not going to assume that images are two dimensional, or that any extra dimension means something special. After all, 3D images (MRI) are a thing. So in effect it operates on an abstract n-dimensional array. For a two-dimensional RGB image, the shape is (height, width, 3) because of the three color channels, so the shift needs one entry per axis, e.g. [1, 2, 0].
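For example, here is a minimal sketch of the fix, assuming the RGB image from the question (the trailing zero leaves the channel axis unshifted):

from skimage import io
import numpy as np
import scipy.ndimage

im = io.imread("path/to/my/image.jpg")                  # shape (H, W, 3)
shifted = scipy.ndimage.shift(im, np.array([1, 2, 0]))  # shift rows by 1, columns by 2, channels by 0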
I have some images and would like to look at their eigenvalues (as an image is a matrix). My issue is that the image has shape TensorShape([577, 700, 3]).
How can I do some preprocessing so that I can compute its eigendecomposition?
My try:
import tensorflow as tf
import numpy as np
from numpy import linalg as LA
import matplotlib.pyplot as plt
image_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
image_raw = tf.io.read_file(image_path)
image = tf.image.decode_image(image_raw)
image = tf.cast(image, tf.float32)
image = tf.image.resize(image, (224, 224))
LA.eig(image)
If you have n images, and if images are of the same size, and if images are somehow centered, then you may consider that images are samples from a distribution, and you can use eigenvalue decomposition to study how different pixels in the image vary across the collection.
In this situation: say you have a collection of n [H,W] images. You can flatten images and form a [H*W, n] matrix. If the images are RGB, it can be a [H*W*3, n] array -- i.e. each pixel location and each color channel is treated as an independent dimension.
Eigenvalue decomposition will give you a collection of H*W*3-dimensional vectors, which can be reshaped back into RGB images. Getting all eigenvectors is going to be impossible (the covariance matrix is (H*W*3) x (H*W*3), which is usually huge); however, calculating the top 3-5 eigenvalues and eigenvectors shouldn't be a problem even if H*W*3 is large.
You can find a more detailed description searching for "Eigenfaces"; e.g. opencv-eigenfaces-for-face-recognition, wikipedia, classic CVPR91 paper, etc.
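As a minimal sketch of this idea (random arrays stand in for a real, aligned image collection), the top eigenpairs of the huge covariance matrix can be obtained from an SVD of the centered data matrix instead of forming the covariance explicitly:

import numpy as np

n, H, W = 100, 64, 64
images = [np.random.rand(H, W, 3) for _ in range(n)]    # placeholder data

# flatten each image into a column -> data matrix of shape (H*W*3, n)
X = np.stack([img.ravel() for img in images], axis=1)

# center the data: subtract the mean image from every column
X_centered = X - X.mean(axis=1, keepdims=True)

# the left singular vectors of X are the eigenvectors of X @ X.T,
# and the squared singular values give the eigenvalues
U, S, _ = np.linalg.svd(X_centered, full_matrices=False)

k = 5
top_eigenvalues = S[:k] ** 2 / (n - 1)
eigenimages = U[:, :k].T.reshape(k, H, W, 3)            # each eigenvector reshaped back into an RGB image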
A grayscale image can be (and usually is) represented as a matrix. A colored image cannot; it is represented using three matrices, one for each color channel.
This is the problem with your code snippet. LA.eig() expects a square array, or an array whose final two axes contain square matrices, but got an array of shape (224, 224, 3).
To fix this, you can shift the two 224 axes to the end using the np.rollaxis() function; the eigenvalues and -vectors will then be calculated separately for each color channel.
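A minimal sketch of that fix, with a random array standing in for the decoded image:

import numpy as np
from numpy import linalg as LA

image = np.random.rand(224, 224, 3).astype(np.float32)   # stand-in for the resized image

# move the channel axis to the front -> shape (3, 224, 224),
# so LA.eig sees a stack of three square 224x224 matrices
channels_first = np.rollaxis(image, 2)                    # or np.moveaxis(image, -1, 0)
eigenvalues, eigenvectors = LA.eig(channels_first)        # eigenvalues may be complex

print(eigenvalues.shape)    # (3, 224) -- one set of eigenvalues per channel
print(eigenvectors.shape)   # (3, 224, 224)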
I was working on the classification of images. I came across this one line and I'm not able to figure out its meaning.
plt.imshow(np.squeeze(x_train[3]), cmap="gray");
Could you explain this line to me?
This really depends on what your variable x_train contains. You give no context to your code. I can infer that x_train is a numpy array that is indexed at the fourth element or row.
plt.imshow is a function from the matplotlib library which accepts an array that represents an image and draws that to the screen. The array is usually either a 2D-array representing rows and columns of pixels or a 3D-array, where every pixel is characterized by either 3 values for RGB or 4 values for RGBA (A stands for alpha and indicates the transparency).
The cmap="gray" is a keyword argument passed to plt.imshow, which is responsible for mapping a specific colormap to the values found in the array that you passed as the first argument. You can look up the colormap if you google matplotlib colormaps.
Since the gray colormap is used in your code, it is very likely that your array is a 2D-array that represents a grayscale image. In that case, every pixel is only described by one value (usually between 0 and 255) that indicates its color on a scale from black (0) to white (255).
If you pass a 3D-array (so a color image) to imshow, matplotlib will automatically interpret the values in the third dimension as RGB values and correctly show the image.
If you however pass a 2D-array, which is probably the case, matplotlib will map the values to a colormap, which is "viridis" by default. This will result in a green / yellow / blue image. Therefore, it is necessary to tell matplotlib to map it to a grayscale colormap.
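For illustration, a minimal sketch (with made-up data) of the difference the colormap makes:

import numpy as np
import matplotlib.pyplot as plt

gray_image = np.random.randint(0, 256, size=(28, 28))   # made-up 2D array of pixel intensities

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(gray_image)                # default colormap (viridis): green/yellow/blue
ax2.imshow(gray_image, cmap="gray")   # grayscale colormap: black to white
plt.show()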
I assume that x_train is therefore a numpy array with more than two dimensions that probably contains multiple images. When you index it at the index 3, you obtain a part of the array that holds the values for the image you want to display. This array seems to have more dimensions than are really in use, which is why np.squeeze is used to reduce the unnecessary dimensions. As an example:
import numpy as np
test_array = np.array([[1, 2, 3]])   # shape (1, 3)
np.squeeze(test_array)               # returns array([1, 2, 3]), shape (3,)
This is all I can tell you from the little information you've given. Next time consider providing more context to your question.
Recently I followed a few tutorials on machine learning, and now I want to test if I can make some image recognition program by myself. For this I want to use the CIFAR 10 dataset, but I think I have a small problem in the conversion of the dataset.
For those not familiar with this dataset: it comes as arrays of n rows and 3072 columns, in which the first 1024 columns hold the red values, the next 1024 the green values, and the last 1024 the blue values. Each row is a single 32x32 image, with the pixel rows stacked one after another (the first 32 values are the red values for the top-most row of pixels, etc.).
What I want to do with this dataset is transform it into a 4D tensor (with NumPy) so I can view the images with matplotlib's .imshow(). The tensor I made has shape (n, 32, 32, 3): the first dimension indexes the images, the second the rows of pixels, the third the individual pixels, and the last the RGB values of those pixels. Here is the function I made that should do this:
import numpy as np
import matplotlib.pyplot as plt

def rawToRgb(data):
    length = data.shape[0]
    # convert to flat img array with rgb pixels
    newAr = np.zeros([length, 1024, 3])
    for img in range(length):
        for pixel in range(1024):
            newAr[img, pixel, 0] = data[img, pixel]
            newAr[img, pixel, 1] = data[img, pixel+1024]
            newAr[img, pixel, 2] = data[img, pixel+2048]
    # convert to 2D img array
    newAr2D = newAr.reshape([length, 32, 32, 3])
    # plt.imshow(newAr2D[5998])
    # plt.show()
    return newAr2D
The function takes a single parameter (a tensor of shape (n, 3072)). I have commented out the pyplot code, as it is only for testing, but when testing I noticed that everything seems mostly OK: I can recognise the shapes of the objects in the images, but I am not sure whether the colours are right, as I get some oddly-coloured images as well as some perfectly normal ones. Here are a few examples: purple plane, blue cat, normal horse, blue frog.
Can anyone tell me whether I am making a mistake or not?
The images that appear oddly-coloured are the negative of the actual image, so you need to subtract each pixel value from 255 to get the true value. If you simply want to see what the original images look like, use:
# scipy.misc.imread has been removed from recent SciPy releases;
# imageio.imread is a drop-in replacement
from imageio import imread
import matplotlib.pyplot as plt

img = imread(file_path)   # file_path: path to one of the saved example images
plt.imshow(255 - img)
plt.show()
The original cause of the problem is that the CIFAR-10 data stores the pixel values on a scale of 0-255, but matplotlib's imshow() method (which I assume you are using) expects inputs between 0 and 1. Given an input that is not scaled between 0 and 1, imshow() does some normalization internally, which causes some images to become negatives.
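As a sketch of that fix (random values stand in here for the raw (n, 3072) CIFAR-10 batch), the whole conversion can also be done without Python loops, keeping uint8 so that imshow treats the values as 0-255:

import numpy as np

data = np.random.randint(0, 256, size=(10, 3072))   # stand-in for a raw CIFAR-10 batch

# reshape to (n, 3, 32, 32), move channels last, keep the integer type
images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1).astype(np.uint8)
print(images.shape)   # (10, 32, 32, 3)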
When I load an image with PIL and convert it into a NumPy array:
from PIL import Image
import numpy as np

image = Image.open("myimage.png")
pixels = np.asarray(image)
The data is stored as [x][y][channel], i.e. the value of pixels[3, 5, 0] will be the red component of the pixel at (3, 5).
However, I am using a library which requires the image to be in the format [channel][x][y]. Therefore, I am wondering how I can do this conversion?
I know that NumPy has a reshape function, but this doesn't actually allow you to "swap" over the dimensions as I want.
Any help? Thanks!
In order to get the dimensions in the order that you want, you could use the transpose method as follows:
image = Image.open("myimage.png")
pixels = np.asarray(image).transpose(2,0,1)
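As a quick sanity check (with a made-up array standing in for the decoded PNG), the shape goes from (H, W, 3) to (3, H, W); np.moveaxis is an equivalent, arguably more readable, alternative:

import numpy as np

pixels = np.zeros((480, 640, 3), dtype=np.uint8)     # stand-in for np.asarray(image)
channels_first = pixels.transpose(2, 0, 1)
print(channels_first.shape)                          # (3, 480, 640)

# equivalent, and explicit about which axis moves where
same_thing = np.moveaxis(pixels, -1, 0)
print(np.array_equal(channels_first, same_thing))    # True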
I have a 2D array that I want to create an image from. I want to transform the image array of dimensions 140x120 to an array of 140x120x3 by stacking the same array 3 times (to get a grayscale image to use with skimage).
I tried the following:
image = np.uint8([image, image, image])
which results in a 3x120x140 image. How can I reorder the array to get 120x140x3 instead?
np.dstack([image, image, image]) (docs) will return an array of the desired shape, but whether this has the right semantics for your application depends on your image generation library.
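A quick sketch of that, with a random 120x140 array standing in for the real image:

import numpy as np

image = np.random.randint(0, 256, size=(120, 140), dtype=np.uint8)   # stand-in image

stacked = np.dstack([image, image, image])
print(stacked.shape)                                   # (120, 140, 3)

# np.stack with axis=-1 does the same and is explicit about where the new axis goes
also_stacked = np.stack([image, image, image], axis=-1)
print(np.array_equal(stacked, also_stacked))           # True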