Recently I followed a few tutorials on machine learning, and now I want to test if I can make some image recognition program by myself. For this I want to use the CIFAR 10 dataset, but I think I have a small problem in the conversion of the dataset.
For those who are not familiar with this set: the dataset comes as arrays of n rows and 3072 columns, in which the first 1024 columns hold the red values, the next 1024 the green values and the last 1024 the blue values. Each row is a single 32x32 image, with the pixel rows stacked one after another (the first 32 values are the red values for the top-most row of pixels, etc.).
What I want to do with this dataset is transform it into a 4D tensor (with numpy) so I can view the images with matplotlib's .imshow(). The tensor I made has shape (n, 32, 32, 3): the first dimension indexes the images, the second the rows of pixels, the third the individual pixels, and the last holds the RGB values of those pixels. Here is the function I made that should do this:
def rawToRgb(data):
    length = data.shape[0]
    # convert to flat img array with rgb pixels
    newAr = np.zeros([length, 1024, 3])
    for img in range(length):
        for pixel in range(1024):
            newAr[img, pixel, 0] = data[img, pixel]
            newAr[img, pixel, 1] = data[img, pixel+1024]
            newAr[img, pixel, 2] = data[img, pixel+2048]
    # convert to 2D img array
    newAr2D = newAr.reshape([length, 32, 32, 3])
    # plt.imshow(newAr2D[5998])
    # plt.show()
    return newAr2D
It takes a single parameter (a tensor of shape (n, 3072)). I have commented out the pyplot code, as it is only for testing, but while testing I noticed that everything seems mostly okay: I can recognise the shapes of the objects in the images, but I am not sure whether the colours are right, as I get some oddly-coloured images alongside some perfectly normal ones. Here are a few examples: purple plane, blue cat, normal horse, blue frog.
Can anyone tell me whether I am making a mistake or not?
The images that appear oddly-coloured are the negative of the actual image, so you need to subtract each pixel value from 255 to get the true value. If you simply want to see what the original images look like, use:
from scipy.misc import imread  # note: removed in newer SciPy releases; imageio.imread is the usual replacement
import matplotlib.pyplot as plt

img = imread(file_path)
plt.imshow(255 - img)
plt.show()
The original cause of the problem is that the CIFAR-10 data stores the pixel values on a scale of 0-255, but matplotlib's imshow() method (which I assume you are using) expects float inputs between 0 and 1. Given float input that is not scaled to the 0-1 range, imshow() does some normalization internally, which causes some images to come out as negatives.
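As a side note, the loop in your rawToRgb can be avoided entirely, and casting to uint8 sidesteps the scaling issue, since imshow() displays uint8 values 0-255 without rescaling. A minimal sketch, assuming data has shape (n, 3072) laid out as described in the question (random values are used here only as a placeholder):
import numpy as np
import matplotlib.pyplot as plt

def raw_to_rgb(data):
    # each row is [R(1024), G(1024), B(1024)], each channel a 32x32 image in row-major order
    return data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1).astype(np.uint8)

data = np.random.randint(0, 256, size=(10, 3072))  # placeholder for one CIFAR-10 batch
images = raw_to_rgb(data)
plt.imshow(images[5])  # uint8 values in 0-255 are shown without rescaling
plt.show()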
Related
I have a numpy array where each element has 3 values (RGB) from 0 to 255, and it spans from [0, 0, 0] to [255, 255, 255] with 256 elements evenly spaced. I want to plot it as a 16 by 16 grid but have no idea how to map the colors (as the numpy array) to the data to create the grid.
import numpy as np
# create an evenly spaced RGB representation as integers
all_colors_int = np.linspace(0, (255 << 16) + (255 << 8) + 255, dtype=int)
# convert the evenly spaced integers to RGB representation
rgb_colors = np.array(tuple(((((255<<16)&k)>>16), ((255<<8)&k)>>8, (255)&k) for k in all_colors_int))
# data to fit the rgb_colors as colors into a plot as a 16 by 16 numpy array
data = np.array(tuple((k,p) for k in range(16) for p in range(16)))
So, how do I map the rgb_colors as colours onto data to create the grid plot?
There's quite a bit going on here, and I think it's valuable to talk about it.
linspace
I suggest you read the linspace documentation.
https://numpy.org/doc/stable/reference/generated/numpy.linspace.html
If you want a 16x16 grid, then you should start by generating 16x16 = 256 values. However, if you inspect the shape of the all_colors_int array, you'll notice it only contains 50 values, which is the default for linspace's num argument.
all_colors_int = np.linspace(0, (255 << 16) + (255 << 8) + 255, dtype=int)
print(all_colors_int.shape) # (50,)
Make sure you specify this third 'num' argument to generate the correct quantity of RGB pixels.
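For instance, a small sketch (the endpoint is taken from your own code):
import numpy as np

# 256 evenly spaced integers covering the full 24-bit RGB range
all_colors_int = np.linspace(0, (255 << 16) + (255 << 8) + 255, num=256, dtype=int)
print(all_colors_int.shape)  # (256,)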
As a further side note, (255 << 16) + (255 << 8) + 255 is equivalent to (2^24)-1. The 2^N-1 formula is usually what's used to fill the first N bits of an integer with 1's.
numpy is faster
On your next line, your for loop manually iterates over all of the elements in Python.
rgb_colors = np.array(tuple(((((255<<16)&k)>>16), ((255<<8)&k)>>8, (255)&k) for k in all_colors_int))
While this might work, it isn't the idiomatic way to use numpy arrays.
You can directly perform bitwise operations to the entire numpy array without the python for loop. For example, to extract bits [16, 24) (which is usually the red channel in an RGB integer):
# Shift over so the 16th bit is now bit 0, then select only the first 8 bits.
RedChannel = (all_colors_int >> 16) & 255
Building the grid
There are many ways to do this in numpy, however I would suggest this approach.
Images are usually represented with a 3-dimensional numpy array, usually of the form
(HEIGHT, WIDTH, CHANNELS)
First, reshape your numpy int array into the 16x16 grid that you want.
reshaped = all_colors_int.reshape((16, 16))
Again, the numpy documentation is really great, give it a read:
https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
Now, extract the red, green and blue channels, as described above, from this reshaped array. If you operate directly on the numpy array, you won't need a nested for-loop to iterate over the 16x16 grid, numpy will handle this for you.
RedChannel = (reshaped >> 16) & 255
GreenChannel = ... # TODO
BlueChannel = ... # TODO
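(Filled in here purely for illustration; the pattern mirrors the red channel shown above.)
GreenChannel = (reshaped >> 8) & 255  # bits [8, 16)
BlueChannel = reshaped & 255          # bits [0, 8)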
And then finally, we can convert our three 16x16 grids into a single 16x16x3 grid, using the numpy stack function
https://numpy.org/doc/stable/reference/generated/numpy.stack.html
grid_rgb = np.stack((
    RedChannel,
    GreenChannel,
    BlueChannel
), axis=2).astype(np.uint8)
Notice two things here
When we 'stack' arrays, we create a new dimension. The axis=2 argument tells numpy to add this new dimension at index 2 (i.e. as the third axis). Without it, the shape of our grid would be (3, 16, 16) instead of (16, 16, 3).
The .astype(np.uint8) casts all of the values in this numpy array to the uint8 data type. This makes the grid compatible with other image-manipulation libraries, such as OpenCV and PIL.
Show the image
We can use PIL for this.
If you want to use OpenCV, then remember that OpenCV interprets images as BGR not RGB and so your channels will be inverted.
# Show Image
from PIL import Image
Image.fromarray(grid_rgb).show()
If you've done everything right, you'll see an image... And it's all gray.
Why is it gray?
There are over 16 million possible colours. Selecting only 256 of them, evenly spaced, happens to pick only pixels with equal R, G and B values: the step between samples is (2^24 - 1) / 255 = 65793 = 0x010101, so every sample repeats the same byte in all three channels, which results in an image without any colour.
If you want to see some colours, you'll need to either show a bigger image (e.g. 256x256), or alternatively, you can use a dimension that's not a power of two. For example, try a prime number, as this will add a small amount of pseudo-randomness to the RGB selection, e.g. try 17.
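Putting the pieces together, here's a minimal end-to-end sketch (16x16 as in your question; swap side for 17 to break the equal-R-G-B pattern and see some colour):
import numpy as np
from PIL import Image

side = 16  # try 17 to see colour
all_colors_int = np.linspace(0, 2**24 - 1, num=side * side, dtype=int)

reshaped = all_colors_int.reshape((side, side))
grid_rgb = np.stack((
    (reshaped >> 16) & 255,  # red
    (reshaped >> 8) & 255,   # green
    reshaped & 255,          # blue
), axis=2).astype(np.uint8)

Image.fromarray(grid_rgb).show()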
Best of luck.
Based solely on the title 'How to plot a normalized RGB map' rather than the approach you've provided, it appears that you'd like to plot a colour spectrum in RGB.
The following approach can be taken to manually construct this.
import cv2
import matplotlib.pyplot as plt
import numpy as np
h = np.repeat(np.arange(0, 180), 180).reshape(180, 180)
s = np.ones((180, 180))*255
v = np.ones((180, 180))*255
hsv = np.stack((h, s, v), axis=2).astype('uint8')
rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
plt.imshow(rgb)
Explanation:
It's generally easier to construct (and decompose) a colour palette using the HSV (hue, saturation, value) colour scale; where hue is the colour itself, saturation can be thought of as the intensity and value as the distance from black. Therefore, there's really only one value to worry about, hue. Saturation and value can be set to 255, for 'full intensity'.
cv2 is used here to simply convert the constructed HSV colourscale to RGB and matplotlib is used to plot the image. (I didn't use cv2 for plotting as it doesn't play nicely with Jupyter.)
The actual spectrum values are constructed in numpy.
Breakdown:
Create the colour spectrum of hue and plug 255 in for the saturation and value. Why is 180 used? OpenCV stores the hue of 8-bit images in the range 0-179 (the 0-360 degree hue circle divided by two, so it fits in a uint8).
h = np.repeat(np.arange(0, 180), 180).reshape(180, 180)
s = np.ones((180, 180))*255
v = np.ones((180, 180))*255
Stack the three channels H+S+V into a 3-dimensional array, convert the array values to unsigned 8-bit integers, and have cv2 convert from HSV to RGB for us, to be lazy and save us working out the math.
hsv = np.stack((h, s, v), axis=2).astype('uint8')
rgb = cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)
Plot the RGB image.
plt.imshow(rgb)
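If you'd rather avoid cv2 entirely, matplotlib ships its own converter, matplotlib.colors.hsv_to_rgb, which expects floats in [0, 1]. A roughly equivalent sketch:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import hsv_to_rgb

# hue varies down the rows from 0 to 1; saturation and value stay at full intensity
h = np.repeat(np.linspace(0, 1, 180), 180).reshape(180, 180)
s = np.ones((180, 180))
v = np.ones((180, 180))

rgb = hsv_to_rgb(np.stack((h, s, v), axis=2))
plt.imshow(rgb)
plt.show()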
I was working on the classification of images. I came across this one line and I'm not able to figure the meaning.
plt.imshow(np.squeeze(x_train[3]), cmap="gray");
Could you explain this line to me?
This really depends on what your variable x_train contains. You give no context to your code. I can infer that x_train is a numpy array that is indexed at the fourth element or row.
plt.imshow is a function from the matplotlib library which accepts an array that represents an image and draws that to the screen. The array is usually either a 2D-array representing rows and columns of pixels or a 3D-array, where every pixel is characterized by either 3 values for RGB or 4 values for RGBA (A stands for alpha and indicates the transparency).
The cmap="gray" is a keyword argument passed to plt.imshow, which is responsible for mapping a specific colormap to the values found in the array that you passed as the first argument. You can look up the colormap if you google matplotlib colormaps.
Since the gray colormap is used in your code, it is very likely that your array is a 2D-array that represents a grayscale image. In that case, every pixel is only described by one value (usually between 0 and 255) that indicates its color on a scale from black (0) to white (255).
If you pass a 3D-array (so a color image) to imshow, matplotlib will automatically interpret the values in the third dimension as RGB values and correctly show the image.
If you however pass a 2D-array, which is probably the case, matplotlib will map the values to a colormap, which is "viridis" by default. This will result in a green / yellow / blue image. Therefore, it is necessary to tell matplotlib to map it to a grayscale colormap.
I assume that x_train is therefore a numpy array with more than two dimensions that probably contains multiple images. When you index it at the index 3, you obtain a part of the array that holds the values for the image you want to display. This array seems to have more dimensions than are really in use, which is why np.squeeze is used to reduce the unnecessary dimensions. As an example:
>>> import numpy as np
>>> test_array = np.array([[1, 2, 3]])
>>> test_array.shape
(1, 3)
>>> np.squeeze(test_array)
array([1, 2, 3])
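In the image setting, the same idea looks like this; the shapes here are only an assumption (a hypothetical x_train of grayscale images with a trailing channel dimension), your data may differ:
import numpy as np
import matplotlib.pyplot as plt

x_train = np.random.rand(10, 28, 28, 1)   # placeholder batch of grayscale images
print(x_train[3].shape)                   # (28, 28, 1) - one trailing dimension too many for imshow
print(np.squeeze(x_train[3]).shape)       # (28, 28) - a plain 2D grayscale image
plt.imshow(np.squeeze(x_train[3]), cmap="gray")
plt.show()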
This is all I can tell you from the little information you've given. Next time consider providing more context to your question.
I am training a Convolutional Neural Network (in tensorflow-gpu) to segment histology slides.
My problem is that the prediction method is extremely slow. The architecture of the neural network is set up to receive a 75x75 RGB pixel array as input and classify the central pixel. In other words, for each 75x75 window of pixels the neural net receives, it classifies only one pixel (at the window's centre).
I've set up the neural network in this way so that it can be scaled up and applied to any size of image. Each 'window' exists purely to contextualise its central pixel, which the neural network classifies. The prediction method loops through every pixel in the input image and uses its corresponding 75x75 RGB window to classify it.
My current method of generating the 75x75 windows is python-written, slow and unnecessarily serialised (uses for-loops).
Does a parallelised method, that can convert an image into a set of RGB windows, exist?
For example, it would convert a 400 x 700 x 3 image into a matrix of size 280,000 x 75 x 75 x 3. This is because there are 280,000 pixels in the input image (400 x 700 = 280,000), and therefore there should be 280,000 windows, each with one of the input's pixels at its centre. As each window has the dimensions 75 x 75 x 3 and there are 280,000 windows, the method's output would have the size 280,000 x 75 x 75 x 3.
Ideally, I imagine such a method would utilise any available GPUs, due to their advantages in image-processing and parallelised jobs.
Thank you for reading, all suggestions are welcome. :)
I managed to find the perfect function, skimage.util.view_as_windows().
Here is how I used this function, in an example:
First open the image and set your preferred 'window' size:
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from skimage.util import view_as_windows as windows
#lets use a window size of 75 pixels
window_size=75
#load image
data = np.asarray(Image.open("N:\\Insigneo Research Project\\MyScripts\\PoC_CNN\\MATLABLabelling\\split image\\12-1550A-001_01_01.png"))
print(data.shape)
img = plt.imshow(data, interpolation='nearest')
plt.show()
Output: (742, 486, 3), (Image in link below)
Output of above cell: A loaded example image
Then, you must use the window size to pad each of the RGB channels, to allow all pixels to have their own centralised window (padding allows the edge pixels to be centralised):
# assertion ensures window dimensions are odd, ensuring a 'central' pixel exists
assert window_size % 2 == 1
buffer = int((window_size - 1) / 2)
padded_image_data = [0] * 3
for i in range(3):
    padded_image_data[i] = np.pad(data[:, :, i], (buffer, buffer), 'symmetric')
padded_image_data = np.dstack((padded_image_data[0], padded_image_data[1], padded_image_data[2]))
img = plt.imshow(padded_image_data, interpolation='nearest')
plt.show()
Output: (Image in link below)
Output image of above cell: Symmetrically padded version of example image.
And then finally, apply the view_as_windows (shortened to 'windows') function to the padded RGB image:
large_window_matrix = windows(padded_image_data, (window_size, window_size, 3)).reshape(
    data.shape[0] * data.shape[1], window_size, window_size, 3)
print(large_window_matrix.shape)
img = plt.imshow(large_window_matrix[1], interpolation='nearest')
plt.show()
Output: (360612, 75, 75, 3), (Image in link below)
Output of above cell: The first 75x75 RGB window from the padded example image. This window is centred around the most upper-left pixel.
There you have it! The 'large_window_matrix' is the large matrix holding all the RGB windows. :)
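Two small follow-up notes, offered as suggestions rather than corrections: np.pad can pad all three channels in a single call (so the per-channel loop isn't needed), and while view_as_windows returns a memory-friendly strided view, the .reshape afterwards forces a full copy, which for a large image and 75x75 windows can run to several gigabytes. A compact sketch with a small placeholder image:
import numpy as np
from skimage.util import view_as_windows as windows

window_size = 75
buffer = (window_size - 1) // 2

data = np.random.randint(0, 256, size=(100, 120, 3), dtype=np.uint8)  # placeholder image

# pad height and width symmetrically, leave the channel axis untouched
padded = np.pad(data, ((buffer, buffer), (buffer, buffer), (0, 0)), 'symmetric')

# strided view of shape (100, 120, 1, 75, 75, 3); reshaping it copies the data
large_window_matrix = windows(padded, (window_size, window_size, 3)).reshape(
    data.shape[0] * data.shape[1], window_size, window_size, 3)
print(large_window_matrix.shape)  # (12000, 75, 75, 3)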
I have made myself a numpy array from a picture using
from PIL import Image
import numpy as np
image = Image.open(file)
np.array(image)
Its shape is (6000, 6000, 4), and in that array I would like to replace certain pixel values with a single number; let's say this green pixel [99, 214, 104, 255] becomes 1.
I have only 4 such pixel colours I want to replace with a number, and all other pixels will be 0. Is there a fast and efficient way to do this, and what is the best way to minimise the size of the data? Is it better to save it as a dict(), where the keys are x, y coordinates and the values are integers? Or is it better to save the whole array as it is, with the shape it has? I only need the colour values; the rest is not important to me.
I need to process each picture as fast as possible, because a new picture arrives every 5 minutes and, let's say, I would like to store 1 year of data. That is why I'd like to make it as efficient as possible, both time- and space-wise.
If I understand the question correctly, you can use np.where for this:
>>> arr = np.array(image)
>>> COLOR = [99,214,104,255]
>>> np.where(np.all(arr == COLOR, axis=-1), 1, 0)
This will produce a 6000*6000 array with 1 if the pixel is the selected colour, or 0 if not.
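Extending this to the four colours you mention, one possible sketch (the colours and labels below are made up; swap in your real values). Since only a handful of pixels are ever non-zero, storing just their coordinates and labels is also far smaller than keeping the full array:
import numpy as np

# hypothetical colour -> label mapping; replace with your four real colours
color_to_label = {
    (99, 214, 104, 255): 1,
    (10, 20, 30, 255): 2,
    (200, 50, 50, 255): 3,
    (255, 255, 0, 255): 4,
}

arr = np.random.randint(0, 256, size=(6000, 6000, 4), dtype=np.uint8)  # placeholder image array

labels = np.zeros(arr.shape[:2], dtype=np.uint8)
for color, label in color_to_label.items():
    labels[np.all(arr == color, axis=-1)] = label

coords = np.argwhere(labels)   # (row, col) of each labelled pixel
values = labels[labels != 0]   # the corresponding labels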
How about just storing in a database the positions and values of the pixels you want to modify, plus the shape of the image, the dtype of the array and the file extension (jpg, etc.)? You can use that information to rebuild the image from an array filled with zeros.
I am trying to store an image dataset into a 4D ndarray then plot each image as follows:
i = 0
for j in imagelist:
    imageall[i] = misc.imread(j)  # (36, 570, 760, 3)
    plt.imshow(imageall[i])
    plt.show()
    i = i + 1
However, showing the image from the 4D ndarray gives a bluish image, whereas simply reading the image and plotting it shows it in its normal colouring.
I have compared the channels (visually and by computing their means) in the two cases, and they are exactly the same.
Can anyone explain the reason for change in displayed image coloration when reading single image and when reading to a 4D ndarray?
Your images have the same channel values as you noted in the question, so the difference in the result suggests that your values are being interpreted differently by plt.imshow. There's some magic to how plt.imshow interprets images based on type, so the most likely reason is that your original array is initialized with the wrong dtype.
Assuming that your pre-allocation is just something like
import numpy as np
imageall = np.empty((n_img,width,height,3))
# or imageall = np.zeros((n_img,width,height,3))
the resulting array will automatically have double type, i.e. dtype=np.float64. When you fill this array with each image, the input dtype=np.uint8 (as returned by misc.imread) is converted to double, effectively doing
imageall[i] = misc.imread(j).astype(np.float64)
So your channel values ranging from 0 to 255 are stored as floats, which are then misinterpreted by plt.imshow.
You just need to pre-allocate with the right dtype:
imageall = np.empty((n_img,width,height,3),dtype=np.uint8)
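Alternatively, you can skip the pre-allocation and let numpy infer the dtype from the images themselves, for example (a sketch, assuming imagelist holds your file paths and all images share the same dimensions):
import numpy as np
from scipy import misc

imageall = np.stack([misc.imread(j) for j in imagelist])
print(imageall.dtype)  # uint8, inherited from the individual images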