numpy.all strange behaviour [closed] - python

In the following code, I try to generate a mask on an image. The mask should only be true where the image (originalFormat, ndarray with shape [720, 1280, 3]) has a specific value (segmentId, ndarray with shape [3,]).
Here's a small part of the code as minimal example:
import numpy as np
from PIL import Image
originalFormat = np.array(Image.open(path_to_image))[..., :3]
segment_ids = np.unique(originalFormat.reshape(-1, originalFormat.shape[2]), axis=0)
segment_ids.sort(axis=0)
segmentId = segment_ids[0]
mask = originalFormat == segmentId
test = [True, True, True] in mask
mask = mask.all(axis=2)
test = True in mask
In the second line of the second block, I create the mask and get an output of shape [720, 1280, 3].
In the third line, I test if the image contains the segmentId.
In the fourth line, I apply np.all on the second axis of the mask to get a mask of shape [720, 1280].
In the fifth line, I test again, but now there are no true values in the mask.
So np.all returns false for all values, but should be true for at least one value.

The test [True, True, True] in mask does not do what you think it does. Here [True, True, True] is a plain Python list, and in is a plain Python operator, nothing NumPy-specific, whereas mask is a NumPy array.
Thus, it is actually not quite clear what the test checks. See the linked question. In any case, there do not need to be three True values in the same row for this test to evaluate as True. Here's a small example reproducing the behavior you observed:
originalFormat = np.arange(12).reshape((2, 2, 3))
segmentId = np.array([6, 4, 5])
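To make the point concrete, here is a minimal sketch that continues with the same toy data. It relies on the fact that, for a NumPy array, `x in arr` behaves like `(arr == x).any()`. The names loose_test, pixel_mask and strict_test are made up for this illustration.

```python
import numpy as np

# Same toy data as above: a 2x2 "image" with 3 channels.
originalFormat = np.arange(12).reshape((2, 2, 3))
segmentId = np.array([6, 4, 5])  # not a pixel of originalFormat

# Element-wise comparison, broadcast over the last axis: shape (2, 2, 3).
mask = originalFormat == segmentId

# For an ndarray, `x in arr` behaves like (arr == x).any(), so this is
# True as soon as a single element of mask is True.
loose_test = [True, True, True] in mask

# Reducing over the channel axis requires all three channels to match.
pixel_mask = mask.all(axis=2)  # shape (2, 2)
strict_test = bool(pixel_mask.any())

print(loose_test, strict_test)  # True False
```

So the membership test succeeds because pixels such as [3, 4, 5] and [6, 7, 8] each match segmentId in some channel, while no pixel matches in all three.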

Ok, I found the mistake.
It seems like the sort function also sorts the values in every single segmentId, so the resulting segment_ids do not occur in the image.
But this does not explain why
test = segmentId in originalFormat
is True, although segmentId is not in the image.
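A small sketch of the sorting effect, using a made-up three-colour table (colors, by_column and is_row_of are hypothetical names): with axis=0, each column is sorted independently, so the rows that come out are generally not rows that went in. As far as I know, np.unique(..., axis=0) already returns the rows in lexicographic order, so the extra sort should not be needed at all.

```python
import numpy as np

# Hypothetical colour table, one RGB triple per row.
colors = np.array([[9, 0, 5],
                   [1, 7, 2],
                   [4, 3, 8]])

# axis=0 sorts every column on its own, which scrambles the rows.
by_column = np.sort(colors, axis=0)
# rows are now [1, 0, 2], [4, 3, 5], [9, 7, 8]


def is_row_of(row, table):
    """Return True if `row` equals some complete row of `table`."""
    return bool((table == row).all(axis=1).any())


survivors = [is_row_of(r, colors) for r in by_column]
print(survivors)  # [False, False, False]

# np.unique with axis=0 keeps rows intact and returns them sorted.
unique_rows = np.unique(colors, axis=0)
print(unique_rows.tolist())  # [[1, 7, 2], [4, 3, 8], [9, 0, 5]]
```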

Related

How to filter a numpy.ndarray by a specific number?

roi_pixel_img = crop_img[indices_list]
print (roi_pixel_img)
When I add the following (I use it only to print the entire array rather than just a part of it):
np.set_printoptions(threshold=sys.maxsize)
the output is:
The whole thing happens in a while loop because I'm extracting pixels in this section, which is irrelevant to the question.
My goal is to exclude the rows equal to [0 255 255] from this array. How can I do that?
The type of roi_pixel_img is numpy.ndarray.
Is it even possible to answer this question without example code?
You can do this by creating an indexing array:
r = (roi_pixel_img == [0,255,255]).all(axis = -1)
roi_pixel_img[~r]
The roi_pixel_img == [0,255,255] expression results in a boolean array with the same shape as roi_pixel_img (say (N, 3)) and compares element-wise, e.g. a row [0,255,0] results in [True, True, False]. Using .all(axis = -1) reduces along the last axis (in this case axis = 1 would produce the same result) and gives True only if all the elements of a row match. So r has shape (N, ).
Indexing with ~r excludes the matching pixels. Because r has shape (N, ), it selects along the first axis, so whole rows are kept or dropped.
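As a quick check of the above, here is a sketch on a small made-up array (the four pixel rows are invented for illustration; filtered is a hypothetical name):

```python
import numpy as np

# Hypothetical (N, 3) pixel array; rows 0 and 3 are the unwanted colour.
roi_pixel_img = np.array([[0, 255, 255],
                          [10, 20, 30],
                          [0, 255, 0],
                          [0, 255, 255]])

# True for rows where every channel matches [0, 255, 255].
r = (roi_pixel_img == [0, 255, 255]).all(axis=-1)

# Keep only the rows that do NOT match.
filtered = roi_pixel_img[~r]

print(r.tolist())         # [True, False, False, True]
print(filtered.tolist())  # [[10, 20, 30], [0, 255, 0]]
```

Note that the row [0, 255, 0] survives, because only two of its three channels match.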

A boolean list from ranges of True values (start and end), without using a for loop

For example I have this list containing ranges.
x=[[1,4],
[6,7],
[9,9]]
where the first value of each item (e.g. [1,4]) is the start position (1) and, the second value is the end (4) position.
I want to convert this list of ranges into a boolean list, wherein the value is True if the position is between (any of) the ranges (i.e. the start and end positions) indicated in the list above, otherwise the value should be False.
[False, True, True, True, True, False, True, True, False, True]
This is obviously possible using a for loop. However, I am looking for other options that are one-liners. Ideally, I am looking for a way that could also be applied to a pandas Series.
Note: This is essentially an opposite problem of this question: Get ranges of True values (start and end) in a boolean list (without using a for loop)
A hopefully efficient way using numpy:
low, high = np.array(x).T[:,:, None] # rearrange the limits into a 3d array in a convenient shape
a = np.arange(high.max() + 1) # make a range from 0 to 9
print(((a >= low) & (a <= high)).any(axis=0))
An alternative that edits the array in a python loop:
result = np.zeros(np.array(x).max() + 1, dtype=bool)
for start, end in x:
    result[start:end+1] = True
This could be faster depending on the speed of editing a slice of an array relative to numpy 2d matrix comparisons.
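For reference, a self-contained sketch that runs both variants on the list from the question and checks that they agree. The third variant (markers plus a cumulative sum) is not from the answers above; it is an alternative loop-free technique added here for comparison, and the names broadcast_result, loop_result, delta and cumsum_result are made up.

```python
import numpy as np

x = [[1, 4], [6, 7], [9, 9]]
bounds = np.array(x)

# Variant 1: broadcasting. low and high have shape (3, 1), a has shape (10,).
low, high = bounds.T[:, :, None]
a = np.arange(high.max() + 1)
broadcast_result = ((a >= low) & (a <= high)).any(axis=0)

# Variant 2: Python loop over the ranges, slice assignment per range.
loop_result = np.zeros(bounds.max() + 1, dtype=bool)
for start, end in x:
    loop_result[start:end + 1] = True

# Variant 3 (swapped-in technique): +1 at each start, -1 just past each end,
# then a running sum that is positive exactly inside a range.
delta = np.zeros(bounds.max() + 2, dtype=int)
np.add.at(delta, bounds[:, 0], 1)
np.add.at(delta, bounds[:, 1] + 1, -1)
cumsum_result = np.cumsum(delta)[:-1] > 0

print(broadcast_result.tolist())
# [False, True, True, True, True, False, True, True, False, True]
```

Any of the three boolean arrays can be passed to a pandas Series constructor if a Series is needed.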

Build a new array from an existing using a boolean mask

I have created a boolean mask, say mask, which I want to apply to an existing array, say old, to create an entirely new one, say new, which retains only the non-zero elements. The new array should then be smaller than old.
Can someone suggest the fastest and most concise way, if possible without using the numpy.append function?
Say you have:
old = np.array([2,4,3,5,6])
mask = [True, False, True, False, False]
Simply do:
new = old[mask]
print(new)
[2 3]
I suggest you read about Boolean or “mask” index arrays
Just use logical indexing
x = x[x!=0]
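Putting the two answers together, a minimal sketch (the sample values are invented): the mask is built from the condition itself, and indexing with it returns a new, smaller array rather than a view of the old one.

```python
import numpy as np

old = np.array([2, 0, 3, 0, 6])

# Boolean mask, True where the element should be kept.
mask = old != 0

# Boolean indexing copies the selected elements into a new array.
new = old[mask]

print(new)        # [2 3 6]
print(new.shape)  # (3,)

# Changing the result does not touch the original.
new[0] = 99
print(old[0])     # 2
```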

Counting occurrences of specific True/False ordering in Numpy Array

I have a Numpy Array of True and False values like:
test = np.array([False, False, False, True, False, True, False, True, False,False, False, False, True, True, False, True])
I would like to know the number of times the following pattern (False, True, False) happens in the array. In the test above it will be 4. This is not the only pattern, but I assume that when I understand this code I can probably also make the others.
Of course, I can loop over the array. If the first value is equal, compare the next and otherwise go to the next value in the loop. Like this:
totalTimes=0
def swapToBegin(x):
    if(x>=len(test)):
        x-=len(test)
    return(x)
for i in range(len(test)):
    if(test[i]==False):
        if(test[swapToBegin(i+1)]==True):
            if test[swapToBegin(i+2)]==False:
                totalTimes += 1
However, since I need to do this many times, this code will be very slow. Little improvements can be made, since this was made very quickly to show what I need. But there must be a better solution.
Is there a better way to search for a pattern in an array? It does not need to combine the end and beginning of the array, since I would be able to do this afterwards. But if it can be included it would be nice.
You haven't given any details on how large test is, so for benchmarks of the methods I've used it has 1000 elements. The next important part is to actually profile the code. You can't say it's slow (or fast) until there are hard numbers to back it up. Your code runs in around 1.49ms on my computer.
You can often get improvements with numpy by removing python loops and replacing them with numpy functions.
So, rather than testing each element individually (lots of if conditions could slow things down) I've put it all into one array comparison, then used all to check that every element matches.
check = np.array([False, True, False])
sum([(test[i:i+3]==check).all() for i in range(len(test) - 2)])
Profiling this shows it running in 1.91ms.
That's actually a step backwards. So, what could be causing the slowdown? Well, array access using [] creates a new array object which could be part of it. A better approach may be to create one large array with the offsets, then use broadcasting to do the comparison.
sum((np.c_[test[:-2], test[1:-1], test[2:]] == check).all(1))
This time check is compared with each row of the array np.c_[test[:-2], test[1:-1], test[2:]]. The axis argument (1) of all is used to count only the rows in which every element matches. This runs in 40.1us. That's a huge improvement.
Of course, creating the array to broadcast is going to have a large cost in terms of copying elements over. Why not do the comparisons directly?
sum(np.all([test[i:len(test)-2+i]==v for i, v in enumerate(check)], 0))
This runs in 18.7us.
The last idea to speed things up is using as_strided. This is an advanced trick to alter the strides of an array to get the offset array without copying any data. It's usually not worth the effort, but I'm including it here just for fun.
sum((np.lib.stride_tricks.as_strided(test, (len(test) - len(check) + 1, len(check)), test.strides * 2) == check).all(1))
This also runs in around 40us. So, the extra effort doesn't add anything in this case.
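If hand-written strides feel too risky, a safer route is sliding_window_view from numpy.lib.stride_tricks, which as far as I know is available from NumPy 1.20 on. The sketch below is not part of the timings above and has not been benchmarked. It also shows one way to include the wrap-around the question asks about, by appending the first len(check) - 1 elements to the end. The names count_no_wrap, wrapped and count_wrap are made up.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20

test = np.array([False, False, False, True, False, True, False, True,
                 False, False, False, False, True, True, False, True])
check = np.array([False, True, False])

# Read-only view of all length-3 windows, shape (14, 3), no data copied.
windows = sliding_window_view(test, len(check))
count_no_wrap = int((windows == check).all(axis=1).sum())

# Wrap-around: let the last windows continue into the start of the array.
wrapped = np.concatenate([test, test[:len(check) - 1]])
count_wrap = int((sliding_window_view(wrapped, len(check)) == check).all(axis=1).sum())

print(count_no_wrap, count_wrap)  # 3 4
```

The wrapped count of 4 matches the number given in the question. Without wrap-around the last match (positions 14, 15, 0) is lost.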
You can use an array containing [False, True, False] and search for this instead.
searchfor = np.array([False, True, False])

Numpy/Python: Array iteration without for-loop

So it's another n-dimensional array question:
I want to be able to compare each value in an n-dimensional array with its neighbours. For example, if a is a 2-dimensional array, I want to be able to check:
a[y][x]==a[y+1][x]
for all elements. So basically check all neighbours in all dimensions. Right now I'm doing it via:
for x in range(1,a.shape[0]-1):
    do.something(a[x])
The shape of the array is used so that I don't run into an index out of range at the edges. So if I want to do something like this in n-D for all elements in the array, I need n for-loops, which seems untidy. Is there a way to do so via slicing? Something like a==a[:,-1,:], or am I misunderstanding this completely? And is there a way to tell a slice to stop at the end? Or would there be another idea for getting this to work in a totally different way? Masked arrays?
Greets Joni
Something like:
a = np.array([1,2,3,4,4,5])
a == np.roll(a,1)
which returns
array([False, False, False, False, True, False], dtype=bool)
You can specify an axis too for higher dimensions, though as others have said you'll need to handle the edges somehow as the values wrap around (as you can guess from the name).
For a fuller example in 2D:
# generate 2d data
a = np.array((np.random.rand(5,5)) * 10, dtype=np.uint8)
# check all neighbours
for ax in range(len(a.shape)):
    for i in [-1,1]:
        print(a == np.roll(a, i, axis=ax))
This might also be useful: it compares each element to the following element along axis=1. You can obviously adjust the axis or the distance. The trick is to make sure that both sides of the == operator have the same shape.
a[:, :-1, :] == a[:, 1:, :]
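The same slicing idea can be written once for any number of dimensions by building the slice tuples programmatically. This is only a sketch: equal_to_next is a hypothetical helper, not a NumPy function, and the sample array is invented.

```python
import numpy as np


def equal_to_next(a, axis):
    """Compare each element with its successor along `axis`, no wrap-around.

    The result is one element shorter than `a` along `axis`.
    """
    front = [slice(None)] * a.ndim
    back = [slice(None)] * a.ndim
    front[axis] = slice(None, -1)  # everything except the last entry
    back[axis] = slice(1, None)    # everything except the first entry
    return a[tuple(front)] == a[tuple(back)]


a = np.array([[1, 1, 2],
              [1, 3, 2]])

down = equal_to_next(a, axis=0)   # shape (1, 3)
right = equal_to_next(a, axis=1)  # shape (2, 2)

print(down.tolist())   # [[True, False, True]]
print(right.tolist())  # [[True, False], [False, False]]
```

Looping over range(a.ndim) and calling the helper per axis then covers all neighbours without nesting one for-loop per dimension.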
How about just:
np.diff(a) != 0
?
If you need the neighbours along the other axis, maybe diff the result of np.swapaxes(a, 0, 1) (or pass axis= to np.diff directly) and merge the results together somehow?
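A short sketch of the diff variant with an explicit axis argument, on an invented array. Note that np.diff(...) != 0 is True where neighbours differ, i.e. the negation of the == comparisons above. The names differs_right and differs_down are made up.

```python
import numpy as np

a = np.array([[1, 1, 2],
              [1, 3, 2]])

# True where an element differs from its right-hand neighbour.
differs_right = np.diff(a, axis=1) != 0  # shape (2, 2)

# True where an element differs from the one below it.
differs_down = np.diff(a, axis=0) != 0   # shape (1, 3)

print(differs_right.tolist())  # [[False, True], [True, True]]
print(differs_down.tolist())   # [[False, True, False]]
```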
