I have an image as a numpy array of shape (224, 224, 4). Each pixel has 4 values: r, g, b, alpha. I need to extract the (r, g, b) values for each pixel whose alpha channel is 255.
My first thought was to delete all elements of the array where the alpha value is < 255 and then extract only the first 3 values (r, g, b) of the remaining elements, but doing this with plain Python loops is very slow. Is there a fast way to do it using numpy operations?
Something similar to this? https://stackoverflow.com/a/21017621/4747268
This should work: arr[arr[:,:,3]==255][:,:3]. The boolean mask selects the opaque pixels and stacks them into an (N, 4) array, so only one more index is needed to keep the first three channels.
Something like this?
import numpy as np
x = np.random.random((255, 255, 4))   # dummy RGBA-like array with float values
y = np.where(x[:, :, 3] > 0.5)        # indices of the pixels whose alpha passes the test
res = x[y][:, 0:3]                    # keep only the first three channels (r, g, b)
Adapt the > 0.5 condition to your needs (e.g. == 255). The result is a matrix with all matching pixels stacked vertically.
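Putting the two answers together for the question's uint8 case, here is a minimal self-contained sketch (the array is random made-up data standing in for the real image):
import numpy as np
img = np.random.randint(0, 256, size=(224, 224, 4), dtype=np.uint8)  # stand-in RGBA image
opaque = img[:, :, 3] == 255   # boolean mask: pixels whose alpha is exactly 255
rgb = img[opaque][:, :3]       # (N, 3) array with the r, g, b values of those pixels
print(rgb.shape)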
I'm having some trouble trying to check whether a Python tuple is in a one-dimensional numpy array. I'm working on a loop that records all the colors present in an image and stores them in an array. It worked well using normal lists, but the image is very large and the loop took several minutes to complete, so I thought NumPy arrays would speed it up.
Here's what the code looks like:
from PIL import Image
import numpy as np
img = Image.open("bg.jpg").convert("RGB")
pixels = img.load()
colors = np.array([])
for h in range(img.size[1]):
    for w in range(img.size[0]):
        if pixels[w, h] not in colors:
            colors = np.append(colors, pixels[w, h])
        else:
            continue
When I run this, I get the following error:
DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
if pixels[w,h] in colors:
Thanks in advance, and if you know a faster way to do this please let me know.
I'm not sure exactly what you need, but I hope the following piece of code helps.
import numpy as np
image = np.arange(75).reshape(5, 5, 3) % 8
# Get the set of unique pixels
pixel_list = image.reshape(-1, 3)
unique_pixels = np.unique(pixel_list, axis=0)
# Test whether a single pixel is in the list of pixels:
i = 0
pixel_in_list = (unique_pixels[i] == pixel_list).all(1).any(0)
# all(1) - all the dimensions (rgb) of the pixels need to match
# any(0) - test if any of the pixels match
# Test whether any of the pixels in the set is in the list of pixels:
compare_all = unique_pixels.reshape(-1, 1, 3) == pixel_list.reshape(1, -1, 3)
pixels_in_list = compare_all.all(2).any()
# all(2) - all the dimensions (rgb) of the pixels need to match
# any() - test if any of the pixels in the set matches any of the pixels in the list
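To apply the same idea directly to the image from the question, a short sketch (assuming the image loads as an (h, w, 3) array; np.unique with an axis argument needs NumPy 1.13+):
from PIL import Image
import numpy as np
arr = np.asarray(Image.open("bg.jpg").convert("RGB"))   # shape (h, w, 3)
unique_colors = np.unique(arr.reshape(-1, 3), axis=0)   # every distinct color, no Python loop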
I found a way to make my loop run faster without NumPy, and that is by using a set, which is much faster than using lists or NumPy arrays here. This is what the code looks like now:
from PIL import Image
img = Image.open("bg.jpg").convert("RGB")
pixels = img.load()
colors = set()
for h in range(img.size[1]):
    for w in range(img.size[0]):
        if pixels[w, h] in colors:
            continue
        else:
            colors.add(pixels[w, h])
This solves my initial problem of the lists being too slow to loop through, and it sidesteps the second problem of NumPy being unable to compare the tuples. Thanks for all the replies, have a good day.
Assuming pixels is of shape (3, w, h) or (3, h, w) (i.e., the color channels are along the first axis), and assuming all you're after is the unique colors in the image:
channels = (channel.flatten() for channel in pixels)
colors = set(zip(*channels))
If you want a list instead of a set, colors = list(set(zip(*channels))).
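For example, on a small made-up channels-first array (a sketch; the shape and values here are invented for illustration):
import numpy as np
pixels = np.random.randint(0, 4, size=(3, 5, 5))   # toy (channels, h, w) array
channels = (channel.flatten() for channel in pixels)
colors = set(zip(*channels))                        # set of unique (r, g, b) tuples
print(len(colors))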
You seem to be misunderstanding where numpy comes in handy. A numpy array of tuples is not going to be any faster than a Python list of tuples. The speed of numpy comes into play in numerical computation on matrices and vectors, and a numpy array of tuples cannot take advantage of any of the things that make numpy so fast.
What you're trying to do is simply not appropriate for numpy and won't speed up your code at all.
I have a 3D image which is a numpy array of shape (1314, 489, 3).
Now I want to calculate the mean RGB color value of the masked region (the cob without the black background). Calculating the mean RGB value for the whole image is easy:
print(np.mean(colormaskcutted, axis=(0, 1)))
>>[186.18434633 88.89164511 46.32022921]
But now I want this mean RGB color value only for the cob. I have a boolean mask array for the mask with shape (1314, 489), where one value corresponds to all 3 color channel values of a pixel.
I tried slicing the image array for the mask, as follows:
print(np.mean(colormaskcutted[boolean[:,:,0]], axis=(0, 1)))
>>124.57794089613752
But this returned only one value instead of 3 values for the RGB color.
How can I filter the 3D numpy image with this 2D boolean mask so that the mean RGB color calculation can be performed?
If your question is limited to computing the mean, you don't necessarily need to subset the image. You can simply do, e.g.
np.sum(colormaskcutted*boolean[:,:,None], axis = (0,1))/np.sum(boolean)
P.S. I've played around with indexing; you can amend your original approach as follows:
np.mean(colormaskcutted[boolean, :], axis=0)
P.P.S. Can't resist some benchmarking. The summation approach takes 15.9 s (1000 iterations, dimensions as in the example, old computer); the advanced-indexing approach is slightly slower at 17.7 s. However, the summation can be optimized further. Using count_nonzero, as per Mad Physicist's suggestion, marginally improves the time to 15.3 s. We can also use tensordot to skip creating a temporary array:
np.tensordot(colormaskcutted, boolean, axes=[[0, 1], [0, 1]]) / np.count_nonzero(boolean)
This cuts the time to 4.5s.
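As a quick sanity check (a sketch with made-up data and the variable names from the question), all three approaches give the same per-channel means:
import numpy as np
colormaskcutted = np.random.rand(100, 80, 3)      # stand-in for the real image
boolean = np.random.rand(100, 80) > 0.5           # stand-in for the real 2D mask
m1 = np.sum(colormaskcutted * boolean[:, :, None], axis=(0, 1)) / np.count_nonzero(boolean)
m2 = np.mean(colormaskcutted[boolean, :], axis=0)
m3 = np.tensordot(colormaskcutted, boolean, axes=[[0, 1], [0, 1]]) / np.count_nonzero(boolean)
print(np.allclose(m1, m2), np.allclose(m2, m3))   # expect: True True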
I want to count all pixels of an RGB image where R=0 and B=0 and whose x,y coordinates lie on the border of the image.
First I get the coordinates of the pixels with R=0 and B=0:
import cv2
import numpy as np
i = cv2.imread("test2.png")
indices = np.where((i[:, :, 0] == 0) & (i[:, :, 2] == 0))
This gives me a list of the coordinates. Now I want to count the pixels among those whose x position is 0 or the image width (in this case 21).
I could sort the list, but I would like to stick to numpy arrays if possible. Is there a fancy way to do it?
Approach #1
With X along the second axis, here's one fancy way -
(i[...,[0,2]]==0).all(-1)[:,[0,-1]].sum()
Approach #2
With multi-dim indexing -
(i[:, [[0], [-1]], [0, 2]] == 0).all(-1).sum()
Approach #3
For performance, use more of slicing -
mask = (i[..., 0] == 0) & (i[..., 2] == 0)
out_x = mask[:, 0].sum() + mask[:, -1].sum()
For counting the True entries, np.count_nonzero is typically faster than .sum().
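A quick way to check the approaches against each other, as a sketch on a small made-up test image:
import numpy as np
i = np.random.randint(0, 3, size=(10, 21, 3))      # toy image with plenty of zero channels
a1 = (i[..., [0, 2]] == 0).all(-1)[:, [0, -1]].sum()
a2 = (i[:, [[0], [-1]], [0, 2]] == 0).all(-1).sum()
mask = (i[..., 0] == 0) & (i[..., 2] == 0)
a3 = mask[:, 0].sum() + mask[:, -1].sum()
print(a1 == a2, a2 == a3)                          # expect: True True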
My numpy array (name: data) has the following shape: (10, 3, 256, 256).
It holds 10 images, each with 3 color channels (RGB) and an image size of 256x256 pixels.
I want to compute the mean pixel value for each color channel of all 10 images. If I use the numpy function np.mean(data), I get the mean over all pixel values. Using np.mean(data, axis=1) returns a numpy array of shape (10, 256, 256).
If I understand your question correctly, you want an array containing the mean value of each channel for each of the ten images (i.e. an array of shape (10, 3)). (Let me know in the comments if this is incorrect and I can edit this answer.)
If you are using numpy 1.7 or newer, you can pass multiple axes to np.mean as a tuple:
mean_values = data.mean(axis=(2,3))
Otherwise you will have to flatten the array first to get it into the correct shape.
mean_values = data.reshape((data.shape[0], data.shape[1], data.shape[2]*data.shape[3])).mean(axis=2)
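For instance, a quick sketch on random stand-in data shows that both give the same (10, 3) result:
import numpy as np
data = np.random.rand(10, 3, 256, 256)   # stand-in for the real image stack
m1 = data.mean(axis=(2, 3))
m2 = data.reshape((data.shape[0], data.shape[1], data.shape[2] * data.shape[3])).mean(axis=2)
print(m1.shape, np.allclose(m1, m2))     # expect: (10, 3) True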
I have a 2-D array of values and need to mask certain elements of that array (with indices taken from a list of ~ 100k tuple-pairs) before drawing random samples from the remaining elements without replacement.
I need something that is both quite fast/efficient (hopefully avoiding for loops) and has a small memory footprint because in practice the master array is ~ 20000 x 20000.
For now I'd be content with something like (for illustration):
xys=[(1,2),(3,4),(6,9),(7,3)]
gxx,gyy=numpy.mgrid[0:100,0:100]
mask = numpy.where((gxx,gyy) not in set(xys)) # The bit I can't get right
# Now sample the masked array
draws=numpy.random.choice(master_array[mask].flatten(),size=40,replace=False)
Fortunately for now I don't need the x,y coordinates of the drawn fluxes - but bonus points if you know an efficient way to do this all in one step (i.e. it would be acceptable for me to identify those coordinates first and then use them to fetch the corresponding master_array values; the illustration above is a shortcut).
Thanks!
You can do it efficiently using a sparse COO matrix:
import numpy
from scipy import sparse
xys = [(1, 2), (3, 4), (6, 9), (7, 3)]
coords = tuple(zip(*xys))   # (row indices, column indices)
mask = sparse.coo_matrix((numpy.ones(len(coords[0])), coords), shape=master_array.shape, dtype=bool)
draws = numpy.random.choice(master_array[~mask.toarray()].flatten(), size=10, replace=False)
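If you'd rather avoid scipy, here is a minimal self-contained sketch of the same idea with a plain NumPy boolean mask (the small master_array is made-up toy data standing in for the real 20000x20000 array):
import numpy as np
master_array = np.arange(100.0).reshape(10, 10)   # toy stand-in for the real array
xys = [(1, 2), (3, 4), (6, 9), (7, 3)]
mask = np.zeros(master_array.shape, dtype=bool)
rows, cols = zip(*xys)
mask[list(rows), list(cols)] = True               # mark the coordinates to exclude
# Boolean indexing with ~mask already returns a 1D array of the remaining values.
draws = np.random.choice(master_array[~mask], size=5, replace=False)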