Using np.where for nested lists - python

I am trying to use the np.where() function with nested lists.
I would like to find the index, in the first layer of the nested list, of the element that satisfies a given condition.
For example, if I have the following code
arr = [[1,1], [2,2],[3,3]]
a = np.where(arr == [2,2])
then ideally I would like the code to return 'a' as 1,
since [2,2] is at index 1 of the nested list.
However, I am just getting an empty array back as a result.
Of course, I can make it work easily with an explicit for loop such as
for n in range(len(arr)):
    if arr[n] == [2,2]:
        a = n
but I would like to implement this entirely within the function call np.where(write the entire code here).
Is there a way to do this?

Well, you can write your own function to do so.
You'll need to:
Find every row equal to what you are looking for
Get the indices of the rows found (you can use where)
numpy comparison
You can use a comparison operator to see whether each element satisfies the condition, such as:
np_arr = np.array(
[1, 2, 3, 4, 5]
)
print(np_arr < 3)
This will return a boolean array in which each element is True where the condition is satisfied and False otherwise:
[ True True False False False]
For a 2D array you'll get a 2D boolean array:
to_find = np.array([2, 2])
np_arr = np.array(
[
[1, 1],
[2, 2],
[3, 3],
[2, 2]
]
)
print(np_arr == to_find)
The result is:
[[False False]
[ True True]
[False False]
[ True True]]
Now we are looking for rows whose values are all True, so we can use the all method of ndarray and pass the axis along which to reduce. Since we want one result per row, we reduce across the columns with axis=1:
to_find = np.array([2, 2])
np_arr = np.array(
[
[1, 1],
[2, 2],
[3, 3],
[2, 2]
]
)
print((np_arr == to_find).all(axis=1))
The result is:
[False True False True]
Get indices of Trues
Finally, you want the indices where the values are True:
np.where((np_arr == to_find).all(axis=1))
The result would be:
(array([1, 3]),)
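The steps above can be wrapped into a small helper. This is only a sketch: the name find_row_indices is made up for illustration, and it assumes the nested list is rectangular so that it converts cleanly to a 2D array. It uses np.flatnonzero instead of np.where so that a plain 1D index array comes back rather than a tuple.

```python
import numpy as np

def find_row_indices(nested, target):
    """Return the indices of the rows of `nested` that equal `target`.

    Hypothetical helper: assumes `nested` is a rectangular 2D structure.
    """
    arr = np.asarray(nested)
    # Compare element-wise, then require every element in a row to match
    row_matches = (arr == np.asarray(target)).all(axis=1)
    # flatnonzero returns the indices of the True entries as a 1-D array
    return np.flatnonzero(row_matches)

arr = [[1, 1], [2, 2], [3, 3], [2, 2]]
indices = find_row_indices(arr, [2, 2])
print(indices)  # [1 3]
```

If only the first match is needed, indices[0] gives it, provided at least one row matched.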

The best solution is the one mentioned by @Michael Szczesny, but you can also do this using np.where:
a = np.where(np.array(arr) == [2, 2])[0]
resulted_ind = np.where(np.bincount(a) == 2)[0] # --> [1]
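To see why the bincount approach works, it may help to look at the intermediate values. The sketch below spells them out. The minlength argument and the use of np_arr.shape[1] in place of the literal 2 are additions of mine, made so that the count array always has one entry per row and the code does not depend on the rows having exactly two columns.

```python
import numpy as np

arr = [[1, 1], [2, 2], [3, 3]]
np_arr = np.array(arr)

# Row index of every individual element that equals the target element
rows = np.where(np_arr == [2, 2])[0]
print(rows)  # [1 1]

# How many elements matched in each row (one entry per row)
per_row = np.bincount(rows, minlength=len(np_arr))
print(per_row)  # [0 2 0]

# A row is a full match when every one of its columns matched
full_matches = np.where(per_row == np_arr.shape[1])[0]
print(full_matches)  # [1]
```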

numpy runs in Python, so you can use both the basic Python lists and numpy arrays (which are more like MATLAB matrices)
A list of lists:
In [43]: alist = [[1,1], [2,2],[3,3]]
A list has an index method, which tests against each element of the list (elements here are 2 element lists):
In [44]: alist.index([2,2])
Out[44]: 1
In [45]: alist.index([2,3])
Traceback (most recent call last):
Input In [45] in <cell line: 1>
alist.index([2,3])
ValueError: [2, 3] is not in list
alist==[2,2] returns False, because the list is not the same as the [2,2] list.
If we make an array from that list:
In [46]: arr = np.array(alist)
In [47]: arr
Out[47]:
array([[1, 1],
[2, 2],
[3, 3]])
we can do an == test - but it compares numeric elements.
In [48]: arr == np.array([2,2])
Out[48]:
array([[False, False],
[ True, True],
[False, False]])
Underlying this comparison is the concept of broadcasting, which allows it to compare a (3,2) array with a (2,) array (a 2d with a 1d). Here it's trivial, but it can be much more complicated.
To find rows where all values are True, use:
In [50]: (arr == np.array([2,2])).all(axis=1)
Out[50]: array([False, True, False])
and where finds the True in that array (the result is a tuple with 1 array):
In [51]: np.where(_)
Out[51]: (array([1]),)
In Octave the equivalent is:
>> arr = [[1,1];[2,2];[3,3]]
arr =
1 1
2 2
3 3
>> all(arr == [2,2],2)
ans =
0
1
0
>> find(all(arr == [2,2],2))
ans = 2
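As an illustration of a slightly less trivial use of broadcasting, the same idea can compare every row against several candidate rows at once. This is a sketch only: the targets array is a made-up example and is not part of the original question.

```python
import numpy as np

arr = np.array([[1, 1], [2, 2], [3, 3]])
targets = np.array([[2, 2], [3, 3]])  # hypothetical extra targets

# A (3, 1, 2) array compared with a (2, 2) array broadcasts to (3, 2, 2)
elementwise = arr[:, None, :] == targets
print(elementwise.shape)  # (3, 2, 2)

# Collapse the last axis: entry [i, j] is True when row i equals target j
row_vs_target = elementwise.all(axis=2)
row_idx, target_idx = np.nonzero(row_vs_target)
print(row_idx, target_idx)  # [1 2] [0 1]
```

Here row 1 of arr matches target 0 and row 2 matches target 1.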

Related

arranging a array without loops list comp or recursion

Definition: Arranged array
An arranged array is a 2-dimensional array with a square shape (NxN) such that for every cell in the matrix: A[I,J] > A[I,J+1] AND A[I,J] > A[I+1,J]
I have an assignment to write a function that
gets a numpy array and returns
True if the given array is an arranged array
False otherwise
Note: we CANNOT use loops, list comprehensions, or recursion. The point of the task is to use numpy features.
Assumptions: we can assume that the array isn't empty and has no NAs, and that all of the cells are numeric.
My code isn't very numpy-oriented:
def is_square_ordered_matrix(A):
    # Checking if the dimension is 2
    if A.ndim != 2:
        return False
    # Checking if it is a squared matrix
    if A.shape[0] != A.shape[1]:
        return False
    # Saving the original shape to reshape later
    originalDim = A.shape
    # Making it a dim of 1 to use it as a list
    arrayAsList = list((A.reshape((1,originalDim[0]**2)))[0])
    # Keeping original order before sorting
    originalArray = arrayAsList[:]
    # Using the values of the list as keys to see if there are doubles
    valuesDictionary = dict.fromkeys(arrayAsList, 1)
    # If len is different, means there are doubles and i should return False
    if len(arrayAsList) != len(valuesDictionary):
        return False
    # If sorted list is equal to original list it means the original is already ordered and i should return True
    arrayAsList.sort(reverse=True)
    if originalArray == arrayAsList:
        return True
    else:
        return False
True example:
is_square_ordered_matrix(np.arange(8,-1,-1).reshape((3,3)))
False example:
is_square_ordered_matrix(np.arange(9).reshape((3,3)))
is_square_ordered_matrix(np.arange(5,-1,-1).reshape((3,2)))
Simple comparison:
>>> def is_square_ordered_matrix(a):
... return a.shape[0] == a.shape[1] and np.all(a[:-1] > a[1:]) and np.all(a[:, :-1] > a[:, 1:])
...
>>> is_square_ordered_matrix(np.arange(8,-1,-1).reshape((3,3)))
True
>>> is_square_ordered_matrix(np.arange(9).reshape((3,3)))
False
>>> is_square_ordered_matrix(np.arange(5,-1,-1).reshape((3,2)))
False
First, compare a[:-1] with a[1:], which will compare the elements of each row with the elements of the next row, and then use np.all to judge:
>>> a = np.arange(8,-1,-1).reshape((3,3))
>>> a[:-1] # First and second lines
array([[8, 7, 6],
[5, 4, 3]])
>>> a[1:] # Second and third lines
array([[5, 4, 3],
[2, 1, 0]])
>>> a[:-1] > a[1:]
array([[ True, True, True],
[ True, True, True]])
>>> np.all(a[:-1] > a[1:])
True
Then compare a[:,:-1] with a[:, 1:], which will compare the columns:
>>> a[:, :-1] # First and second columns
array([[8, 7],
[5, 4],
[2, 1]])
>>> a[:, 1:] # Second and third columns
array([[7, 6],
[4, 3],
[1, 0]])
>>> a[:, :-1] > a[:, 1:]
array([[ True, True],
[ True, True],
[ True, True]])
Combining the row comparison and the column comparison gives the result you want.
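The same check can also be phrased with np.diff, which subtracts each element from its successor along an axis. The sketch below is an alternative formulation, not the answer's method. The name is_arranged is made up, the ndim check is taken from the question's own code, and it assumes a signed integer or float dtype (differences of unsigned integers would wrap around).

```python
import numpy as np

def is_arranged(a):
    """Hypothetical variant of the check above, written with np.diff."""
    a = np.asarray(a)
    # Must be a square 2-D array
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        return False
    # Strictly decreasing down each column and along each row
    down = np.diff(a, axis=0)
    right = np.diff(a, axis=1)
    return bool((down < 0).all() and (right < 0).all())

print(is_arranged(np.arange(8, -1, -1).reshape((3, 3))))  # True
print(is_arranged(np.arange(9).reshape((3, 3))))          # False
print(is_arranged(np.arange(5, -1, -1).reshape((3, 2))))  # False
```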

Python boolean arrays confusion in jupyter IDE

I am new to boolean arrays and find these statements confusing
import numpy as np
a = np.arange(5)
The output of array a is: array([0, 1, 2, 3, 4])
But when I write this
b = a[True, True, False, False, False]
and print the array b using
print(b)
the output is :
[]
As far as I understand I want to transfer some elements from array a to array b, but why is b empty?
What is happening in this code?
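Pass the booleans as a single list (nested brackets), so that they act as one boolean mask rather than as five separate indices: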
a = np.arange(5)
b = a[[True, True, False, False, False]]
print(b)
Output:
[0 1]
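A short sketch of boolean mask indexing, for context. Writing a[True, True, False, False, False] passes five separate indices to a 1D array, and what that returns can differ between NumPy versions, so it is not shown here. A single boolean list or array of the same length as a selects the elements at the True positions. The names mask, b and c are only illustrative.

```python
import numpy as np

a = np.arange(5)

# One boolean mask with the same length as `a`
mask = np.array([True, True, False, False, False])
b = a[mask]
print(b)  # [0 1]

# Masks are usually produced by a comparison rather than typed by hand
c = a[a < 2]
print(c)  # [0 1]
```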

Numpy getting row indices of last two elements of each column in mask

I have a boolean mask shaped (M, N). Each column in the mask may have a different number of True elements, but is guaranteed to have at least two. I want to find the row index of the last two such elements as efficiently as possible.
If I only wanted one element, I could do something like (M - 1) - np.argmax(mask[::-1, :], axis=0). However, that won't help me get the second-to-last index.
I've come up with an iterative solution using np.where or np.nonzero:
M = 4
N = 3
mask = np.array([
    [False, True, True],
    [True, False, True],
    [True, False, True],
    [False, True, False]
])
result = np.zeros((2, N), dtype=np.intp)
for col in range(N):
    result[:, col] = np.flatnonzero(mask[:, col])[-2:]
This creates the expected result:
array([[1, 0, 1],
[2, 3, 2]], dtype=int64)
I would like to avoid the final loop. Is there a reasonably vectorized form of the above? I am looking for specifically two rows, which are always guaranteed to exist. A general solution for arbitrary element counts is not required.
An argsort does it -
In [9]: np.argsort(mask,axis=0,kind='stable')[-2:]
Out[9]:
array([[1, 0, 1],
[2, 3, 2]])
Another with cumsum -
c = mask.cumsum(0)
out = np.where((mask & (c>=c[-1]-1)).T)[1].reshape(-1,2).T
Specifically for exactly two rows, one way with argmax -
c = mask.copy()
idx = len(c)-c[::-1].argmax(0)-1
c[idx,np.arange(len(idx))] = 0
idx2 = len(c)-c[::-1].argmax(0)-1
out = np.vstack((idx2,idx))
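As a sanity check, the argsort and cumsum variants can be compared against the loop from the question on the example mask. This is only a sketch on that one input. The variable names expected, by_argsort and by_cumsum are mine. The stable argsort works because False sorts before True and ties keep their original row order, so the last two entries of each column are the row indices of its last two True values.

```python
import numpy as np

mask = np.array([
    [False, True, True],
    [True, False, True],
    [True, False, True],
    [False, True, False],
])

# Reference result: per-column loop, as in the question
expected = np.stack([np.flatnonzero(col)[-2:] for col in mask.T], axis=1)

# Stable argsort: the last two sorted positions are the last two True rows
by_argsort = np.argsort(mask, axis=0, kind='stable')[-2:]

# cumsum: keep only the True entries whose running count is among the last two
c = mask.cumsum(0)
by_cumsum = np.where((mask & (c >= c[-1] - 1)).T)[1].reshape(-1, 2).T

print(expected)
print(np.array_equal(by_argsort, expected))  # True
print(np.array_equal(by_cumsum, expected))   # True
```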

classify np.arrays as duplicates

My goal is to take a list of np.arrays and create an associated list or array that classifies each as having a duplicate or not. Here's what I thought would work:
www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
uniques, counts = np.unique(www, axis = 0, return_counts = True)
counts = [1 if x > 1 else 0 for x in counts]
count_dict = dict(zip(uniques, counts))
[count_dict[i] for i in www]
The desired output for this case would be :
[1, 1, 0]
because the first and second element have another copy within the original list. It seems that the problem is that I cannot use a np.array as a key for a dictionary.
Suggestions?
First convert www to a 2D NumPy array (for example with www = np.array(www)), then do the following:
In [18]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
Out[18]: array([1, 1, 0])
Here we use broadcasting to check every row of www for equality against every row of uniques, and then use all() on the last axis to find which rows of www are completely equal to which rows of uniques.
Here's the elaborated results:
In [20]: (www[:,None] == uniques).all(2)
Out[20]:
array([[ True, False],
[ True, False],
[False, True]])
# Respective indices in `counts` array
In [21]: np.where((www[:,None] == uniques).all(2))[1]
Out[21]: array([0, 0, 1])
In [22]: counts[np.where((www[:,None] == uniques).all(2))[1]] > 1
Out[22]: array([ True, True, False])
In [23]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
Out[23]: array([1, 1, 0])
In Python, lists (and numpy arrays) cannot be hashed, so they can't be used as dictionary keys. But tuples can! So one option would be to convert each array in your original list to a tuple, and to convert each row of uniques to a tuple. The following works for me:
www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
www_tuples = [tuple(l) for l in www] # list of tuples
uniques, counts = np.unique(www, axis = 0, return_counts = True)
counts = [1 if x > 1 else 0 for x in counts]
# convert uniques to tuples
uniques_tuples = [tuple(l) for l in uniques]
count_dict = dict(zip(uniques_tuples, counts))
[count_dict[i] for i in www_tuples]
Just a heads-up: this will double your memory consumption, so it may not be the best solution if www is large.
You can mitigate the extra memory consumption by ingesting your data as tuples instead of numpy arrays if possible.
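Another possibility, sketched here under the assumption that all arrays in www have the same length, is to let np.unique report which unique row each input row maps to via return_inverse. That avoids both the dictionary and the broadcasted comparison. The name has_duplicate is made up for illustration.

```python
import numpy as np

www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]

# inverse[i] is the index in `uniques` of the i-th row of the input
uniques, inverse, counts = np.unique(
    np.array(www), axis=0, return_inverse=True, return_counts=True
)

# ravel() guards against NumPy versions that return a non-1-D inverse
has_duplicate = (counts[inverse.ravel()] > 1).astype(int)
print(has_duplicate)  # [1 1 0]
```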

Finding median of masked ndarrays representing images

I have 5 grayscale images in the form of 288x288 ndarrays. The values in each ndarray are just numpy.float32 numbers ranging from 0.0 to 255.0. For each ndarray, I've created a numpy.ma.MaskedArray object as follows:
def bool_row(row):
    return [value == 183. for value in row]
mask = [bool_row(row) for row in nd_array_1]
masked_array_1 = ma.masked_array(nd_array_1, mask=mask)
The value 183. represents "garbage" in the image. All 5 images have a bit of "garbage" in them. I want to take the median of the masked images, where taking the median for each point should ignore any masked values. The result would be the correct image with no garbage.
When I try:
ma.median([masked_array_1, masked_array_2, masked_array_3, masked_array_4, masked_array_5], axis=0)
I get what seems to be the median except instead of ignoring masked values, it treats them as 183., so the result just has the superimposed garbage from all the pictures. When I just take the median of two masked images:
ma.median([masked_array_1, masked_array_2], axis=0)
It looks like it started to do the right thing, but then placed the value of 183. even where both masked arrays contain a MaskedConstant.
I could do something like the following, but I feel there's probably a way to make ma.median just behave as expected:
unmasked_array_12 = ma.median([masked_array_1, masked_array_2], axis=0)
mask = [bool_row(row) for row in unmasked_array_12]
masked_array_12 = ma.masked_array(unmasked_array_12, mask=mask)
unmasked_array_123 = ma.median([masked_array_12, masked_array_3], axis=0)
mask = [bool_row(row) for row in unmasked_array_123]
masked_array_123 = ma.masked_array(unmasked_array_123, mask=mask)
...
How do I make ma.median work as expected without resorting to the above unpleasantness?
I suspect the problem is in how ma.median handles a non-array argument. It might be converting a list to a plain numpy array, without checking the types of the elements of the list.
Consider the following example with 1-D arrays:
In [64]: a = ma.array([1, 2, -10, 3, -10, -10], mask=[0,0,1,0,1,1])
In [65]: b = ma.array([1, 2, -10, -10, 4, -10], mask=[0,0,1,1,0,1])
In [66]: a
Out[66]:
masked_array(data = [1 2 -- 3 -- --],
mask = [False False True False True True],
fill_value = 999999)
In [67]: b
Out[67]:
masked_array(data = [1 2 -- -- 4 --],
mask = [False False True True False True],
fill_value = 999999)
The following are not correct--it appears to ignore the masks:
In [68]: ma.median([a, b])
Out[68]: -4.5
In [69]: ma.median([a, b], axis=0)
Out[69]:
masked_array(data = [ 1. 2. -10. -3.5 -3. -10. ],
mask = False,
fill_value = 1e+20)
However, if I first create a new masked array using ma.array, ma.median handles it correctly:
In [70]: c = ma.array([a, b])
In [71]: c
Out[71]:
masked_array(data =
[[1 2 -- 3 -- --]
[1 2 -- -- 4 --]],
mask =
[[False False True False True True]
[False False True True False True]],
fill_value = 999999)
In [72]: ma.median(c)
Out[72]: 2.0
In [73]: ma.median(c, axis=0)
Out[73]:
masked_array(data = [1.0 2.0 -- 3.0 4.0 --],
mask = [False False True False False True],
fill_value = 1e+20)
So to fix your problem, it might be as simple as replacing this:
ma.median([masked_array_1, masked_array_2, masked_array_3, masked_array_4, masked_array_5], axis=0)
with this:
stacked = ma.array([masked_array_1, masked_array_2, masked_array_3, masked_array_4, masked_array_5])
ma.median(stacked, axis=0)
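A small end-to-end sketch of this idea follows. It uses tiny made-up 2x2 "images" in place of the 288x288 ones, builds the masks with ma.masked_equal rather than the list comprehension from the question, and combines the masked arrays with ma.stack as an alternative to ma.array. The names GARBAGE, img1, img2 and img3 are illustrative only.

```python
import numpy as np
import numpy.ma as ma

GARBAGE = 183.0  # value that marks garbage pixels, as in the question

# Hypothetical 2x2 images, each with one garbage pixel
img1 = np.array([[1.0, 183.0], [3.0, 4.0]])
img2 = np.array([[1.0, 2.0], [183.0, 4.0]])
img3 = np.array([[1.0, 2.0], [3.0, 183.0]])

# Mask every pixel equal to the garbage value
masked = [ma.masked_equal(img, GARBAGE) for img in (img1, img2, img3)]

# Combine into one masked array of shape (3, 2, 2), keeping the masks
stacked = ma.stack(masked)

# Per-pixel median over the image axis, ignoring masked entries
result = ma.median(stacked, axis=0)
print(result)
```

On this input every pixel has at least two unmasked values, so no entry of the result stays masked.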
You can use the following to get rid of all of the 183 values just while calculating the median:
masked_arrays=[masked_array_1, masked_array_2, masked_array_3]
no_junk_arrays=[[x for x in masked_array if x != 183] for masked_array in masked_arrays]
ma.median(no_junk_arrays)
For example
>>> masked_array_1 = [1,183,4]
>>> masked_array_2 = [1,183,2]
>>> masked_array_3 = [2,183,2]
>>> masked_arrays=[masked_array_1,masked_array_2,masked_array_3]
>>> no_junk_arrays=[[x for x in masked_array if x != 183] for masked_array in masked_arrays]
>>> no_junk_arrays
[[1, 4], [1, 2], [2, 2]]
I'm sure it can be done if you find the clever sequence of numpy functions to invoke. But it can also be done naively:
def merge(a1, a2):
    result = []
    for x, y in zip(a1, a2):
        # Where the first array holds garbage, take the value from the second
        if x == 183:
            x = y
        result.append(x)
    return result
array_1 = [1, 183, 2]
array_2 = [1, 183, 183]
array_3 = [183, 4, 2]
print(merge(merge(array_1, array_2), array_3))
If the result runs really too slowly, you can try the same code on PyPy instead of CPython.
If what you are after is fetching a non-183 value for every pixel, you could do something along the lines of:
stacked_imgs = np.dstack((img1, img2, img3))
mask = stacked_imgs == 183
# Find the first False, i.e. non-183 entry, along stack axis
index = np.argmin(mask, axis=-1)
# For each pixel, pick the entry at that index along the stack axis
correct_image = np.take_along_axis(stacked_imgs, index[..., None], axis=-1)[..., 0]
If all non-183 entries for a given pixel are always the same, this will give you the result you are after.
