I have some images and would like to look at their eigenvalues (since an image is a matrix). My issue is that the image has shape TensorShape([577, 700, 3]).
How can I do some preprocessing to be able to compute its eigendecomposition?
My try:
import tensorflow as tf
import numpy as np
from numpy import linalg as LA
import matplotlib.pyplot as plt
image_path = tf.keras.utils.get_file('YellowLabradorLooking_new.jpg', 'https://storage.googleapis.com/download.tensorflow.org/example_images/YellowLabradorLooking_new.jpg')
image_raw = tf.io.read_file(image_path)
image = tf.image.decode_image(image_raw)
image = tf.cast(image, tf.float32)
image = tf.image.resize(image, (224, 224))
LA.eig(image)
If you have n images, and if images are of the same size, and if images are somehow centered, then you may consider that images are samples from a distribution, and you can use eigenvalue decomposition to study how different pixels in the image vary across the collection.
In this situation: say you have a collection of n [H,W] images. You can flatten images and form a [H*W, n] matrix. If the images are RGB, it can be a [H*W*3, n] array -- i.e. each pixel location and each color channel is treated as an independent dimension.
Eigenvalue decomposition will give you a collection of H*W*3-dimensional vectors, which can be reshaped back into RGB images. Computing all eigenvectors is usually infeasible (the covariance matrix is (H*W*3) x (H*W*3), which is huge), but computing the top 3-5 eigenvalues and eigenvectors shouldn't be a problem even if H*W*3 is large.
You can find a more detailed description searching for "Eigenfaces"; e.g. opencv-eigenfaces-for-face-recognition, wikipedia, classic CVPR91 paper, etc.
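A minimal sketch of this idea, assuming a hypothetical NumPy array images of shape (n, H, W, 3) holding the image collection:

import numpy as np

# images: hypothetical (n, H, W, 3) array of same-sized RGB images
n, H, W, C = images.shape
X = images.reshape(n, H * W * C).astype(np.float64)  # one flattened image per row (transpose of the [H*W*3, n] matrix above)
X -= X.mean(axis=0)                                   # center by subtracting the mean image

# Top-k eigenvectors without forming the huge (H*W*3) x (H*W*3) covariance matrix:
# work with the small (n, n) Gram matrix instead (the classic eigenfaces trick).
k = 5
gram = X @ X.T / n
eigvals, eigvecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
top = eigvecs[:, -k:][:, ::-1]               # k leading eigenvectors of the Gram matrix
eigenimages = X.T @ top                      # map back to pixel space, shape (H*W*3, k)
eigenimages /= np.linalg.norm(eigenimages, axis=0)

# Each column can be reshaped back into an (H, W, 3) "eigenimage".
first_eigenimage = eigenimages[:, 0].reshape(H, W, C)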
A grayscale image can be (and usually is) represented as a matrix. A colored image cannot: it is represented using three matrices, one for each color channel.
This is the problem with your code snippet. LA.eig() expects a square array, or an array containing square arrays in its final two axes, but it got an array of shape (224, 224, 3).
To fix this, you can shift the two 224 axes to the end using the np.rollaxis() function. The eigenvalues and -vectors will be calculated separately for each color channel.
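For example, a minimal sketch continuing from the question's snippet (assuming eager execution, so the tensor converts to a NumPy array):

import numpy as np
from numpy import linalg as LA

# (224, 224, 3) -> (3, 224, 224): LA.eig now sees three square 224x224 matrices.
channels_first = np.rollaxis(np.array(image), 2)  # np.moveaxis(image, 2, 0) also works

eigenvalues, eigenvectors = LA.eig(channels_first)
print(eigenvalues.shape)   # (3, 224): one set of eigenvalues per color channel (possibly complex)
print(eigenvectors.shape)  # (3, 224, 224)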
Related
I am trying to perform pixel classification to segment images using machine learning methods such as SVM, RandomForest, etc.
I managed to get an acceptable result by using the grayscale and RGB values of the image and associating each pixel with its ground truth. To avoid posting the full code, here is how I built the feature and label arrays when using the full image:
img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
feature_img = np.zeros((img.shape[0], img.shape[1], 4))  # container array: the first three channels hold the RGB values, the last holds the grayscale
feature_img[:, :, :3] = img
feature_img[:, :, 3] = img_gray
features = feature_img.reshape(feature_img.shape[0] * feature_img.shape[1], feature_img.shape[2])
gt_features = gt_img.reshape(gt_img.shape[0] * gt_img.shape[1], 1)
For an image of size 512*512 the above will give a features of shape [262144, 4] and an accompanying gt_feature of shape [262144, 1].
This gives me the X and y for sklearn.svm.SVC and, as mentioned above, this works well, but the result is very noisy. Since SVM works well with higher-dimensional data, I intend to explore that by splitting the image into windows.
Based on the above code, I wanted to split my image of size [512, 1024] into blocks of size [64, 64] and use these for training the SVM.
Following the above format, I wrote the code below to split my image into blocks and then .reshape() it into the required format for the classifier, but it's not working as expected:
win_size = 64
feature_img = blockshaped(img_gray, win_size, win_size)
feature_label = blockshaped(gt_img, win_size, win_size)
# above returns arrays of shape [128, 64, 64]
features = feature_img.reshape(feature_img.shape[1] * feature_img.shape[2], feature_img.shape[0])
# features is of shape [4096, 128]
label_ = feature_label.reshape(feature_label.shape[0] * feature_label.shape[1] * feature_label.shape[2], 1)
# this, as expected returns ``[524288, 1]``
The function blockshaped is from the answer provided here: Slice 2d array into smaller 2d arrays
The reason I want to increase the dimensionality of my feature data is that SVM is known to work well with higher-dimensional data, and I also want to see if a block- or patch-based approach helps the segmentation result.
How would I go about arranging my data, that I have broken down into windows, in a form that can be used to train a classifier?
I've been thinking about your question for five hours and read some books to find the answer!
Your approach is completely wrong if you are doing segmentation.
When we use machine learning methods for segmentation, we do not change any pixel's position at all.
This holds not only for SVMs but also for neural networks: when approaching segmentation we avoid pooling, and even in CNNs we use 'same' padding so that pixels do not move.
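For instance, a tiny illustrative sketch in TensorFlow/Keras showing that a convolution with 'same' padding keeps the spatial grid intact:

import tensorflow as tf

x = tf.random.normal([1, 64, 64, 3])                  # one 64x64 RGB image
conv = tf.keras.layers.Conv2D(8, 3, padding='same')   # 'same' padding preserves height and width
print(conv(x).shape)                                  # (1, 64, 64, 8): the pixel grid is unchanged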
I was watching a tutorial on a facial recognition project using OpenCV, NumPy, and PIL.
During training, the image was converted into a numpy array, what is the need of converting it into a numpy array?
THE CODE:
from PIL import Image
import numpy as np
PIL_IMAGE = Image.open(path).convert("L")  # "L" converts the image to grayscale
image_array = np.array(PIL_IMAGE, "uint8")
TLDR; OpenCV images are stored as three-dimensional Numpy arrays.
When you read in digital images using the library, they are represented as Numpy arrays. The rectangular shape of the array corresponds to the shape of the image. Consider this image of a chair
Here's a visualization of how this image is stored as a Numpy array in OpenCV
If we read in the image of the chair, we can inspect how it is structured with image.shape, which returns a tuple of (height, width, channels) for a color image. For a grayscale image, image.shape only returns the number of rows and columns.
import cv2
image = cv2.imread("chair.jpg")
print(image.shape)
(222, 300, 3)
When working with OpenCV images, we specify the y coordinate first, then the x coordinate. Colors are stored as BGR values with blue in layer 0, green in layer 1, and red in layer 2. So for this chair image, it has a height of 222, a width of 300, and has 3 channels (meaning it is a color image). Essentially, when the library reads in any image, it stores it as a Numpy array in this format.
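For instance (a small sketch reusing the chair image from above):

import cv2

image = cv2.imread("chair.jpg")

# y (row) coordinate first, then x (column); each pixel is a [blue, green, red] triple.
blue, green, red = image[50, 100]

# Whole channels can be sliced out as 2D layers.
blue_layer = image[:, :, 0]
green_layer = image[:, :, 1]
red_layer = image[:, :, 2]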
The answer is rather simple:
With Numpy you can perform blazingly fast operations on numerical arrays, regardless of their dimensionality or shape.
Image processing libraries (OpenCV, PIL, scikit-image) sometimes wrap images in special formats that already use Numpy behind the scenes. If they are not already using Numpy in the background, the images can be converted to Numpy arrays explicitly. Then you can do speedy numerical calculations on them (convolution, FFT, blurring, filters, ...).
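For example (a small sketch with a hypothetical file name):

import numpy as np
from PIL import Image

img = np.array(Image.open("photo.jpg").convert("L"), "uint8")  # grayscale array

# Vectorized operations run over the whole array at once:
brighter = np.clip(img.astype(np.int16) + 40, 0, 255).astype(np.uint8)  # brighten by 40
spectrum = np.abs(np.fft.fft2(img))                                     # 2D FFT magnitude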
I'm trying to translate an image using the following code.
im = io.imread("path/to/my/image.jpg")
shift_image = scipy.ndimage.shift(im, np.array([1, 2]))
I'm using skimage to read the image.
I get the following error
RuntimeError: sequence argument must have length equal to input rank
The name ndimage (with "n-dimensional" in it) suggests that the package does not assume images are two-dimensional, with any other dimension meaning something else. After all, 3D images (MRI) are a thing. So in effect, it operates on an abstract n-dimensional array. For a two-dimensional RGB image, the shape is (height, width, 3) because of the three color channels, so the shift should be [1, 2, 0].
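A minimal sketch of the fix, reusing the snippet from the question:

import numpy as np
from scipy import ndimage
from skimage import io

im = io.imread("path/to/my/image.jpg")   # (height, width, 3) for an RGB image

# One shift value per axis: 1 pixel along the height, 2 along the width, 0 across the channels.
shift_image = ndimage.shift(im, np.array([1, 2, 0]))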
Recently I followed a few tutorials on machine learning, and now I want to test if I can make some image recognition program by myself. For this I want to use the CIFAR 10 dataset, but I think I have a small problem in the conversion of the dataset.
For those who are not familiar with this set: the dataset comes as lists of n rows and 3072 columns, in which the first 1024 columns represent the red values, the second 1024 the green values, and the last 1024 the blue values. Each row is a single image (size 32x32), and the pixel rows are stacked one after another (the first 32 values are the red values for the top-most row of pixels, etc.).
What I wanted to do with this dataset is to transform it into a 4D tensor (with numpy) so I can view the images with matplotlib's .imshow(). The tensor I made has shape (n, 32, 32, 3): the first dimension indexes the images, the second stores rows of pixels, the third stores individual pixels within a row, and the last holds the RGB values of those pixels. Here is the function I made that should do this:
def rawToRgb(data):
    length = data.shape[0]
    # convert to flat img array with rgb pixels
    newAr = np.zeros([length, 1024, 3])
    for img in range(length):
        for pixel in range(1024):
            newAr[img, pixel, 0] = data[img, pixel]
            newAr[img, pixel, 1] = data[img, pixel + 1024]
            newAr[img, pixel, 2] = data[img, pixel + 2048]
    # convert to 2D img array
    newAr2D = newAr.reshape([length, 32, 32, 3])
    # plt.imshow(newAr2D[5998])
    # plt.show()
    return newAr2D
It takes a single parameter (a tensor of shape (n, 3072)). I have commented out the pyplot code, as it is only for testing, but when testing I noticed that everything seems to be ok: I can recognise the shapes of the objects in the images, but I am not sure whether the colours are right, as I get some oddly-coloured images as well as some pretty normal ones. Here are a few examples: purple plane, blue cat, normal horse, blue frog.
Can anyone tell me whether I am making a mistake or not?
The images that appear oddly-coloured are the negative of the actual image, so you need to subtract each pixel value from 255 to get the true value. If you simply want to see what the original images look like, use:
from scipy.misc import imread  # note: removed in newer SciPy; imageio.imread is the recommended replacement
import matplotlib.pyplot as plt

img = imread(file_path)
plt.imshow(255 - img)
plt.show()
The original cause of the problem is that the CIFAR-10 data stores the pixel values on a scale of 0-255, but matplotlib's imshow() method (which I assume you are using) expects inputs between 0 and 1. Given an input that is not scaled between 0 and 1, imshow() does some normalization internally, which causes some images to become negatives.
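For example, to display an image from the converted array with its true colors (a sketch, assuming newAr2D is the (n, 32, 32, 3) array returned by rawToRgb):

import numpy as np
import matplotlib.pyplot as plt

# Cast to uint8 (0-255) so imshow does not renormalize the values.
plt.imshow(newAr2D[5998].astype(np.uint8))
plt.show()

# Equivalently, scale the floats into the 0-1 range imshow expects:
# plt.imshow(newAr2D[5998] / 255.0)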
I want to apply a Discrete Cosine Transform (as well as the inverse) to an image in Python and I'm wondering what is the best way to do it and how. I've looked at PIL and OpenCV but I still don't understand how to use it.
Example with scipy.fftpack:
from scipy.fftpack import dct, idct
# implement 2D DCT
def dct2(a):
    return dct(dct(a.T, norm='ortho').T, norm='ortho')

# implement 2D IDCT
def idct2(a):
    return idct(idct(a.T, norm='ortho').T, norm='ortho')
from skimage.io import imread
from skimage.color import rgb2gray
import numpy as np
import matplotlib.pylab as plt
# read lena RGB image and convert to grayscale
im = rgb2gray(imread('images/lena.jpg'))
imF = dct2(im)
im1 = idct2(imF)
# check if the reconstructed image is nearly equal to the original image
np.allclose(im, im1)
# True
# plot original and reconstructed images with matplotlib.pylab
plt.gray()
plt.subplot(121), plt.imshow(im), plt.axis('off'), plt.title('original image', size=20)
plt.subplot(122), plt.imshow(im1), plt.axis('off'), plt.title('reconstructed image (DCT+IDCT)', size=20)
plt.show()
Also, if you plot a small slice of the 2D DCT coefficients array imF (in the log domain), you will see a checkerboard-like pattern.
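For instance, a sketch of such a plot, using the imF computed above (the slice size is arbitrary):

import numpy as np
import matplotlib.pylab as plt

# Show a small top-left slice of the DCT coefficients on a log scale.
plt.figure(figsize=(6, 6))
plt.imshow(np.log(np.abs(imF[:24, :24]) + 1e-9), cmap='gray')
plt.title('log of the top-left 24x24 DCT coefficients')
plt.axis('off')
plt.show()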
From OpenCV:
DCT(src, dst, flags) → None
Performs a forward or inverse Discrete Cosine transform of a 1D or 2D
floating-point array.
Parameters:
- src (CvArr) – Source array, real 1D or 2D array
- dst (CvArr) – Destination array of the same size and same type as the source
- flags (int) – Transformation flags, a combination of the following values:
  - CV_DXT_FORWARD – do a forward 1D or 2D transform.
  - CV_DXT_INVERSE – do an inverse 1D or 2D transform.
  - CV_DXT_ROWS – do a forward or inverse transform of every individual row of the input matrix. This flag allows the user to transform multiple vectors simultaneously and can be used to decrease the overhead (which is sometimes several times larger than the processing itself), to do 3D and higher-dimensional transforms, and so forth.
Here is an example of it being used.
The DCT is also available in scipy.fftpack.
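In the modern cv2 bindings the same functionality is exposed as cv2.dct and cv2.idct. A minimal sketch (hypothetical file name; note that OpenCV's DCT only supports even-sized arrays, so crop or pad odd-sized images):

import cv2
import numpy as np

img = cv2.imread('lena.jpg', cv2.IMREAD_GRAYSCALE)
img = np.float32(img) / 255.0
img = img[:img.shape[0] // 2 * 2, :img.shape[1] // 2 * 2]  # trim to even dimensions

imF = cv2.dct(img)      # forward 2D DCT
im1 = cv2.idct(imF)     # inverse 2D DCT

print(np.allclose(img, im1, atol=1e-6))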