I would like to downsample a 3d array by taking the most frequent value (mode) of the original values. After some research, I found the block_reduce function in the skimage library. For example, if I wanted to take the average of each block, I could do it easily:
import numpy as np
from skimage.measure import block_reduce
image = np.arange(4*4*4).reshape(4, 4, 4)
new_image = block_reduce(image, block_size=(2,2,2), func=np.mean, cval=np.mean(image))
In my case, I want to pass the func argument a mode function. However, numpy doesn't have a mode function. According to the documentation, the passed function should accept 'axis' as an argument. I tried some workarounds such as writing my own function and combining np.unique and np.argmax, as well as passing scipy.stats.mode as the function. All of them failed.
I wrote some nested for loops to do this, but it's way too slow with large arrays. Is there an easy way to do this? I don't necessarily need to use the scikit-image library.
Thank you in advance.
Let's start with the assumption that the input image shape is divisible by block_size, i.e. corresponding shape dimensions are divisible by each size parameter of block_size.
So, as pre-processing, we need to make blocks out of the input image, like so -
def blockify(image, block_size):
    shp = image.shape
    out_shp = [s//b for s,b in zip(shp, block_size)]
    reshape_shp = np.c_[out_shp,block_size].ravel()
    nC = np.prod(block_size)
    return image.reshape(reshape_shp).transpose(0,2,4,1,3,5).reshape(-1,nC)
Next up, for our specific case of mode finding, we will use bincount2D_vectorized along with argmax -
# https://stackoverflow.com/a/46256361/ #Divakar
def bincount2D_vectorized(a):
    N = a.max()+1
    a_offs = a + np.arange(a.shape[0])[:,None]*N
    return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)
out = bincount2D_vectorized(blockify(image, block_size=(2,2,2))).argmax(1)
Alternatively, we can use the n-dim mode from scipy.stats -
from scipy.stats import mode
out = mode(blockify(image, block_size=(2,2,2)), axis=1)[0]
Finally, if the initial assumption of divisibility doesn't hold true, we need to pad with the appropriate pad value. For the same, we can use np.pad, as part of the blockify method.
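The padding step mentioned above could be sketched as follows. This is only an illustration under stated assumptions: blockify_padded and pad_value are hypothetical names that are not part of the answer, the padding is added at the end of each axis, and the values are non-negative integers so that np.bincount applies. Note that padded cells take part in the vote for the mode of the edge blocks.

```python
import numpy as np

def blockify_padded(image, block_size, pad_value=0):
    # Hypothetical helper: pad the end of each axis up to the next
    # multiple of the block size, then split into flattened blocks.
    pad = [(0, -s % b) for s, b in zip(image.shape, block_size)]
    padded = np.pad(image, pad, mode='constant', constant_values=pad_value)
    out_shp = [s // b for s, b in zip(padded.shape, block_size)]
    # Shape becomes (n0, b0, n1, b1, ...); move the block axes to the end.
    reshape_shp = np.c_[out_shp, block_size].ravel()
    n = image.ndim
    axes = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    blocks = padded.reshape(reshape_shp).transpose(axes)
    return blocks.reshape(-1, int(np.prod(block_size))), out_shp

# A 5x4x4 array is not divisible by 2 along the first axis.
image = np.arange(5 * 4 * 4).reshape(5, 4, 4) % 3
blocks, out_shp = blockify_padded(image, (2, 2, 2))

# Row-wise mode via offset bincount (non-negative integers only).
N = blocks.max() + 1
offs = blocks + np.arange(blocks.shape[0])[:, None] * N
counts = np.bincount(offs.ravel(), minlength=blocks.shape[0] * N).reshape(-1, N)
out = counts.argmax(1).reshape(out_shp)
```

Here out has one entry per block, including the partially padded blocks along the first axis.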
I am trying to get a better understanding of numpy reshaping and transpose operations so that I can perform tasks on each local area of a color image (as opposed to the image as a whole). I can do these by creating slices and looping over slices, but I would prefer not having to create python loops. I have come up with some examples that should help me understand the parts that I have been having trouble with. I ordered them from easiest to most difficult. The last one is ultimately the one that I want to solve.
img = np.random.randint(low=0, high=256, size=(6,6,3), dtype=np.uint8)
img_mean = np.mean(img) #mean of the whole image, one value.
channel_means = np.mean(img, axis=(0,1)) #mean of each channel, three values.
binarized_img = np.where(img > img_mean, np.uint8(255), np.uint8(0)) #all values changed to either 0 or 255. Shape of image remains 6,6,3.
binarized_channels = #I would like to be able to do the same as above, but by using a different mean for each channel and without using python loops.
three_by_three_block_means = #I want to reshape the array into four 3x3x3 blocks and get each block's mean (should be 4 different means).
three_by_three_block_channel_means = #Same as above, but this time I want the mean of each channel of each block (should be 12 different means).
#I also want to be able to change the block's size arbitrarily, i.e. from 3x3x3 blocks to 2x2x3 blocks when needed.
binarized_blocks = #same as binarized_img, but done separately for each block based on their means instead of the mean of the whole image.
binarized_block_channels = #same as binarized_blocks, but done separately for each channel in each block.
If someone could show me how to complete these examples using only numpy (no python loops), I could learn from them and use them to accomplish the (similar) tasks that I frequently have trouble with.
The solution to your problem is strided convolution: use scipy.signal.convolve to compute the block means.
import numpy as np
from scipy import signal
img = np.random.randint(low=0, high=256, size=(6,6,3), dtype=np.uint8)
img_mean = np.mean(img) #mean of the whole image, one value.
channel_means = np.mean(img, axis=(0,1)) #mean of each channel, three values.
binarized_img = np.where(img > img_mean, np.uint8(255), np.uint8(0)) #all values changed to either 0 or 255. Shape of image remains 6,6,3.
I would like to be able to do the same as above, but by using a
different mean for each channel and without using python loops.
binarized_channels = np.where(img > channel_means, np.uint8(255), np.uint8(0)) # channel_means broadcasts over the last axis
I want to reshape the array into four 3x3x3 blocks and get each
block's mean (should be 4 different means).
Define a mean kernel (all ones divided by the sum of the kernel) of arbitrary shape, and perform a valid convolution of the image. Since scipy does not offer a stride argument we have to do this manually with [::s,::s].
s = 3
kernel = np.ones((s,s,s))/s**3
three_by_three_block_means = signal.convolve(img, kernel, 'valid')[::s,::s] # shape: (2, 2, 1)
Same as above, but this time I want the mean of each channel of each
block (should be 12 different means).
kernel = np.ones((s,s,1))/s**2
three_by_three_block_channel_means = signal.convolve(img, kernel, 'valid')[::s,::s] # shape: (2, 2, 3)
I also want to be able to change the block's size arbitrarily, i.e.
from 3x3x3 blocks to 2x2x3 blocks when needed.
Simply change the size of the kernel.
Same as binarized_img, but done separately for each block based on
their means instead of the mean of the whole image.
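# Expand each block mean back over its block's pixels before comparing.
binarized_blocks = np.where(img > np.repeat(np.repeat(three_by_three_block_means, s, axis=0), s, axis=1), np.uint8(255), np.uint8(0))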
Same as binarized_blocks, but done separately for each channel in each
block.
# Same expansion, this time with one mean per channel per block.
binarized_block_channels = np.where(img > np.repeat(np.repeat(three_by_three_block_channel_means, s, axis=0), s, axis=1), np.uint8(255), np.uint8(0))
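Since the question asks about reshaping specifically, here is a reshape-based sketch of the same block statistics. It is an alternative to the convolution approach, not part of the answer above, and it assumes the block size divides the image height and width exactly. The names bh, bw and blocks are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(6, 6, 3), dtype=np.uint8)

bh, bw = 3, 3                      # block height and width (assumed to divide H and W)
H, W, C = img.shape
# Axes: (block row, row in block, block col, col in block, channel)
blocks = img.reshape(H // bh, bh, W // bw, bw, C)

block_means = blocks.mean(axis=(1, 3, 4))       # shape (2, 2): one mean per block
block_channel_means = blocks.mean(axis=(1, 3))  # shape (2, 2, 3): per block and channel

# Compare every pixel with the mean of its own block, then restore the image shape.
binarized_blocks = np.where(
    blocks > block_means[:, None, :, None, None], np.uint8(255), np.uint8(0)
).reshape(img.shape)

# Same, but with a separate mean for each channel of each block.
binarized_block_channels = np.where(
    blocks > block_channel_means[:, None, :, None, :], np.uint8(255), np.uint8(0)
).reshape(img.shape)
```

Changing bh and bw (for example to 2 and 2) changes the block size without touching the rest of the code.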
Hope that solves your problem. Let me know if something is unclear.
I'm trying to convert an RGB image to a gray-value image of the same size (with values between 0 and 1). The mapping is done by a dictionary called MASK_LUT_IDX, which takes in an (R, G, B) tuple and returns the corresponding value. The current code is 2x faster than before, but still takes 1.5s (according to timeit), which is proving to be an issue.
import numpy as np
def quickConv(numpy_triple):
    return MASK_LUT_IDX[tuple(numpy_triple)]
def ImageSegmenter(masked_img):
    rgb_tuples = np.array(masked_img.getdata(), dtype=tuple)
    class_idxs = np.apply_along_axis(quickConv, 1,rgb_tuples)
    return np.array(class_idxs).reshape(masked_img.size[1],masked_img.size[0])
class_img = ImageSegmenter(masked_img)
Is there a better way of converting this? I've looked into the palette functionalities, but it doesn't seem to quite fit my needs.
Thanks to Cris Luengo's help, here is a sped up version of the function.
def ImageSegmenter(masked_img):
    masked_img = np.array(masked_img)
    class_idxs = FASTER_MASK_LUT_IDX[masked_img[:,:,0],masked_img[:,:,1],masked_img[:,:,2]]
    return class_idxs
Where FASTER_MASK_LUT_IDX is a 3D lookup array given by
FASTER_MASK_LUT_IDX = np.zeros((256,256,256))
for idx,label in zip(CLASS_IDX,CLASS_LABELS):
    red_idx = RGB_CLASS_MAPPING[label]['R']
    green_idx = RGB_CLASS_MAPPING[label]['G']
    blue_idx = RGB_CLASS_MAPPING[label]['B']
    FASTER_MASK_LUT_IDX[red_idx,green_idx,blue_idx] = idx/NUM_CLASSES
RGB_CLASS_MAPPING maps each class label to its RGB value; it was unrolled with enumerate in a list comprehension to create CLASS_IDX and CLASS_LABELS.
CLASS_IDX,CLASS_LABELS = zip(*[(idx,label) for idx,label in enumerate(RGB_CLASS_MAPPING)])
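To make the lookup idea concrete, here is a small self-contained sketch. The class labels and colours in RGB_CLASS_MAPPING below are made up for illustration and are not taken from the original post; the float32 dtype is also an assumption, chosen to keep the 256x256x256 table at about 64 MB.

```python
import numpy as np

# Hypothetical mapping from class label to RGB colour (illustrative values only).
RGB_CLASS_MAPPING = {
    'background': {'R': 0, 'G': 0, 'B': 0},
    'road': {'R': 128, 'G': 64, 'B': 128},
    'sky': {'R': 70, 'G': 130, 'B': 180},
}
NUM_CLASSES = len(RGB_CLASS_MAPPING)

# One table entry per possible RGB triple; unlisted colours stay at 0.
lut = np.zeros((256, 256, 256), dtype=np.float32)
for idx, label in enumerate(RGB_CLASS_MAPPING):
    c = RGB_CLASS_MAPPING[label]
    lut[c['R'], c['G'], c['B']] = idx / NUM_CLASSES

# A tiny 2x2 RGB image standing in for np.array(masked_img).
rgb = np.array([[[0, 0, 0], [128, 64, 128]],
                [[70, 130, 180], [128, 64, 128]]], dtype=np.uint8)

# Fancy indexing with the three channel planes looks up every pixel at once.
gray = lut[rgb[..., 0], rgb[..., 1], rgb[..., 2]]
```

The result has the same height and width as the input, with one value per pixel and no Python-level loop over pixels.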
I've created a class to which I pass an image (2D array, 1280x720). It's supposed to iterate through the image, looking for the highest value:
import numpy as np
class myCv:
    def maxIntLoc(self,image):
        intensity = image[0,0] #columns, rows
        coordinates = (0,0)
        for y in xrange(0,len(image)):
            for x in xrange(0,len(image[0])):
                if np.all(image[x,y] > intensity):
                    intensity = image[x,y]
                    coordinates = (x,y)
        return (intensity,coordinates)
Yet when I run it I get the error:
if np.all(image[x,y] > intensity):
IndexError: index 720 is out of bounds for axis 0 with size 720
Any help would be great as I'm new to Python.
Thanks,
Shaun
Regardless of the index error that you are experiencing, which has been addressed by others, iterating through pixels/voxels one at a time is not an efficient way to manipulate images. The cost becomes particularly evident in multi-dimensional images, where the number of elements grows rapidly with each added dimension.
The better way to do this is to use vectorisation (also known as array programming) in programming languages that support it (e.g. Python, Julia, MATLAB). Through this method, you will achieve the results you're looking for much more efficiently, often orders of magnitude faster. In Python, this can be achieved either using generators, which are not suitable for images as they don't really produce the results until called; or using NumPy arrays.
Here is an example:
Masking image matrices by vectorisation
from numpy.random import randint
from matplotlib.pyplot import figure, imshow, title, grid, show
def mask_img(img, thresh, replacement):
    # Work on a copy of the image. Use of |.copy()| is essential so that
    # the original array is not modified in place.
    masked = img.copy()
    # Replacement is the value to replace anything that
    # (in this case) is below the threshold.
    masked[img < thresh] = replacement  # Mask using vectorisation methods.
    return masked
# Initial image to be masked (arbitrary example here).
# In this example, we assign a 100 x 100 matrix of random integers
# between 0 and 255 as our sample image.
initial_image = randint(0, 256, [100, 100])
threshold = 150 # Threshold
# Masking process.
masked_image = mask_img(initial_image, threshold, 0)
# Plots.
fig = figure(figsize=[16,9])
fig.add_subplot(121)
imshow(initial_image, interpolation='none', cmap='gray')
title('Initial image')
grid(False)
fig.add_subplot(122)
imshow(masked_image, interpolation='none', cmap='gray')
title('Masked image')
grid(False)
show()
Which displays the initial image and the masked image side by side.
Of course you can put the masking process (function) in a loop to do this on a batch of images. You can modify the indices and do it on 3D, 4D (e.g. MRI), or 5D (e.g. CAT scan) images too, without the need to iterate over each individual pixel or voxel.
Hope this helps.
In Python, as in most programming languages, indexes start at 0.
So you can access only pixels from 0 to 719.
Check with a debug print that len(image) and len(image[0]) are indeed returning 1280 and 720.
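For the original task of finding the brightest pixel, a vectorised sketch could look like the following. It assumes a single-channel NumPy array in which axis 0 indexes rows (720) and axis 1 indexes columns (1280); the synthetic image is only there to make the example self-contained.

```python
import numpy as np

# Synthetic 1280x720 image: 720 rows (axis 0) by 1280 columns (axis 1).
image = np.zeros((720, 1280), dtype=np.uint8)
image[300, 1000] = 200  # plant a single bright pixel

# argmax works on the flattened array; unravel_index converts the flat
# position back into (row, column) coordinates.
flat_index = np.argmax(image)
row, col = np.unravel_index(flat_index, image.shape)
intensity = image[row, col]
```

Indexing as image[row, col] keeps the first index within 0..719 and the second within 0..1279, which avoids the out-of-bounds access from the question.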
I am working with some image processing routines, using binary images. In Matlab I can create a lookup table which provides the output for each of the 2^9=512 possible configurations of a 3 x 3 neighbourhood. That is, I can write a function func which produces a 0 or 1 for such a neighbourhood, and then create a lookup table with
lut = makelut(func,3)
(the "3" indicating the size of neighbourhood). Then that lookup table can be applied to my binary image im with
applylut(im, lut)
But how can I do the same thing in Python? There is an example given here:
http://pydoc.net/Python/scikits-image/0.4.2/skimage.morphology.skeletonize/
which certainly works, but seems very complicated, at least compared to Matlab's commands.
The filters defined in scipy.ndimage may be of use to you. If none of the pre-defined filters match your intent, you can apply a custom filter using
scipy.ndimage.generic_filter.
For example, you can reproduce the result shown on the Mathworks applylut doc page with:
import numpy as np
import scipy.ndimage as ndimage
from PIL import Image
filename = '/tmp/PerformErosionUsingA2by2NeighborhoodExample_01.png'
img = Image.open(filename).convert('L')
arr = np.array(img)
def func(x):
    return (x==255).all()*255
arr2 = ndimage.generic_filter(arr, func, size=(2,2))
new_img = Image.fromarray(arr2.astype('uint8'), 'L')
new_img.save('/tmp/out.png')
(Images: the input PerformErosionUsingA2by2NeighborhoodExample_01.png and the resulting out.png.)
Note that in this case, ndimage.grey_erosion can produce the same result, and
since it is not calling a Python function once for every pixel, it's also a lot
faster:
arr3 = ndimage.grey_erosion(arr, size=(2,2))
print(np.allclose(arr2,arr3))
# True
Depending on the kind of computation you wish to perform in func, another faster alternative may be to express the result as a NumPy computation on slices. For example, the above grey_erosion could also be expressed as
arr4 = np.pad(arr.astype(bool), ((1,0),(1,0)), 'reflect')
arr4 = arr4[:-1,:-1] & arr4[1:,:-1] & arr4[:-1,1:] & arr4[1:,1:]
arr4 = arr4.astype('uint8')*255
assert np.allclose(arr3, arr4)
Again this is much faster than using generic_filter since here the computation is being performed on whole arrays rather than pixel-by-pixel.
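If a literal lookup table is wanted, one possible sketch is to encode every 3 x 3 neighbourhood as a 9-bit integer with ndimage.correlate and then index a 512-entry table. The names make_lut and apply_lut below are hypothetical stand-ins, not SciPy functions, and the bit ordering (row-major within the neighbourhood) is an assumption of this sketch that need not match Matlab's table layout. Pixels outside the image are treated as 0.

```python
import numpy as np
from scipy import ndimage

def make_lut(func, size=3):
    # Evaluate func once for every possible binary size x size neighbourhood.
    n = size * size
    lut = np.zeros(2 ** n, dtype=np.uint8)
    for code in range(2 ** n):
        bits = (code >> np.arange(n)) & 1   # bit k of code, row-major order
        lut[code] = func(bits.reshape(size, size))
    return lut

def apply_lut(im, lut, size=3):
    # Weight neighbour k by 2**k so each neighbourhood maps to a unique code.
    weights = (2 ** np.arange(size * size)).reshape(size, size)
    codes = ndimage.correlate(im.astype(np.int64), weights,
                              mode='constant', cval=0)
    return lut[codes]

# Example: binary erosion with a full 3 x 3 structuring element.
erode_lut = make_lut(lambda nb: nb.all())
rng = np.random.default_rng(0)
im = (rng.random((16, 16)) > 0.2).astype(np.uint8)
eroded = apply_lut(im, erode_lut)
```

Building the table calls func only 512 times regardless of image size, and applying it is a single correlation followed by an array lookup.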
I'm trying to improve the speed of a function that calculates the normalized cross-correlation between a search image and a template image by using the anfft module, which provides Python bindings for the FFTW C library and seems to be ~2-3x quicker than scipy.fftpack for my purposes.
When I take the FFT of my template, I need the result to be padded to the same size as my search image so that I can convolve them. Using scipy.fftpack.fftn I would just use the shape parameter to do padding/truncation, but anfft.fftn is more minimalistic and doesn't do any zero-padding itself.
When I try and do the zero padding myself, I get a very different result to what I get using shape. This example uses just scipy.fftpack, but I have the same problem with anfft:
import numpy as np
from scipy.fftpack import fftn
from scipy.misc import lena
img = lena()
temp = img[240:281,240:281]
def procrustes(a,target,padval=0):
    # Forces an array to a target size by either padding it with a constant or
    # truncating it
    b = np.ones(target,a.dtype)*padval
    aind = [slice(None,None)]*a.ndim
    bind = [slice(None,None)]*a.ndim
    for dd in xrange(a.ndim):
        if a.shape[dd] > target[dd]:
            diff = (a.shape[dd]-b.shape[dd])/2.
            aind[dd] = slice(np.floor(diff),a.shape[dd]-np.ceil(diff))
        elif a.shape[dd] < target[dd]:
            diff = (b.shape[dd]-a.shape[dd])/2.
            bind[dd] = slice(np.floor(diff),b.shape[dd]-np.ceil(diff))
    b[bind] = a[aind]
    return b
# using scipy.fftpack.fftn's shape parameter
F1 = fftn(temp,shape=img.shape)
# doing my own zero-padding
temp_padded = procrustes(temp,img.shape)
F2 = fftn(temp_padded)
# these results are quite different
np.allclose(F1,F2)
I suspect I'm probably making a very basic mistake, since I'm not overly familiar with the discrete Fourier transform.
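Just do the inverse transform and you'll see that scipy pads differently: it appends zeros only at the end of each axis, so the template stays in one corner instead of being centred as in procrustes:
import matplotlib.pyplot as plt
from scipy.fftpack import ifftn
plt.imshow(ifftn(fftn(procrustes(temp,img.shape))).real)
plt.imshow(ifftn(fftn(temp,shape=img.shape)).real)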
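The difference can also be checked numerically. The sketch below uses numpy.fft.fftn, whose s argument plays the role of the shape parameter in the question; that substitution and the random stand-in data are assumptions made to keep the example self-contained. Padding by hand at the end of each axis reproduces the built-in padding exactly, while a centred template differs only by a phase factor, so the magnitudes still agree.

```python
import numpy as np

rng = np.random.default_rng(0)
img_shape = (64, 64)
temp = rng.random((9, 9))          # stand-in for the template

# Manual zero-padding at the end of each axis (template in the top-left corner).
corner_padded = np.zeros(img_shape)
corner_padded[:temp.shape[0], :temp.shape[1]] = temp

# Manual zero-padding with the template centred, as procrustes does.
r0 = (img_shape[0] - temp.shape[0]) // 2
c0 = (img_shape[1] - temp.shape[1]) // 2
centred = np.zeros(img_shape)
centred[r0:r0 + temp.shape[0], c0:c0 + temp.shape[1]] = temp

F_builtin = np.fft.fftn(temp, s=img_shape)
F_corner = np.fft.fftn(corner_padded)
F_centred = np.fft.fftn(centred)

same_as_builtin = np.allclose(F_builtin, F_corner)
same_values = np.allclose(F_builtin, F_centred)
same_magnitude = np.allclose(np.abs(F_builtin), np.abs(F_centred))
```

So to reproduce the shape-style padding with a library that does not pad for you, place the template at index 0 along every axis rather than in the centre.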