How to reduce multiple arrays to one? - python

Ignoring my data structure and just given a list of multiple numpy arrays (all arrays have the exact same size/shape):
list[0] = [[0, 2, 0],
           [1, 3, 0]]
list[1] = [[0, 0, 0],
           [0, 0, 3]]
list[2] = [[5, 0, 0],
           [0, 0, 0]]
I want to reduce this list of arrays to only one array. Zero means there is no value. For every position, the single given (non-zero) value should be taken. There are no overlapping values: for any given position, exactly one of the arrays has a value assigned.
result: [[5, 2, 0],
         [1, 3, 3]]
In my case I have a dictionary with tuples as keys and arrays as values. The arrays are Boolean arrays. Every dict entry represents a separate channel, and at any specific position only one dict entry has the value True.
I now want to replace all True values by the dictionary keys and reduce this dictionary down to only one array.
For example (near my real data):
dict { (9, 2, 6): [[False, True],
                   [True, False]],
       (1, 5, 8): [[True, False],
                   [False, True]] }
result: [[(1, 5, 8), (9, 2, 6)],
         [(9, 2, 6), (1, 5, 8)]]
How could this be done with list comprehension, a numpy function and/or map & reduce?
First try:
At first I thought I could just turn my numpy arrays into 0 & 1 (.astype(np.float32)) and then just multiply these arrays with my key:
values_filled = [key_tuple * value_array for key_tuple, value_array in dict.items()]
And then to just sum over all arrays:
final = reduce(lambda right, left: right + left, values_filled)
But this obviously doesn't work since my keys are tuples of values and not just numbers.
What I am trying to achieve is the opposite of the following operation:
{color: np.all(mask, axis=-1) for (color, mask) in
((color, segmentation == color) for color in colors) if mask.max()}
With this operation I take a segmented image and create a dictionary with predefined colors. The numpy arrays have True at every position where the color of the image at that position is equal to the key of the dictionary.
Now I want to reduce this dictionary back to an image (there were changes made to the dictionary in the meantime).

The first part of your question just requires an array sum:
In [167]: alist = [[[0, 2, 0],
...: [1, 3, 0]],[[0, 0, 0],
...: [0, 0, 3]],[[5, 0, 0],
...: [0, 0, 0]]]
...:
In [168]: alist
Out[168]: [[[0, 2, 0], [1, 3, 0]], [[0, 0, 0], [0, 0, 3]], [[5, 0, 0], [0, 0, 0]]]
In [169]: np.array(alist).shape
Out[169]: (3, 2, 3)
In [170]: np.array(alist).sum(axis=0)
Out[170]:
array([[5, 2, 0],
[1, 3, 3]])
That takes advantage of the fact that 0 doesn't affect the sum, and there aren't any overlapping values.
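Since the question also asks how this could be done with map & reduce: the same sum can be written with functools.reduce. The axis-0 sum above is simpler, but here is a minimal sketch for comparison:
import numpy as np
from functools import reduce

alist = [np.array([[0, 2, 0], [1, 3, 0]]),
         np.array([[0, 0, 0], [0, 0, 3]]),
         np.array([[5, 0, 0], [0, 0, 0]])]
result = reduce(np.add, alist)   # equivalent to np.sum(alist, axis=0)
print(result)
# [[5 2 0]
#  [1 3 3]]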
You apparently have a second question involving a dictionary of boolean arrays or masks. Assuming that's related to the first question, then you just need a way of translating those masks into the list of arrays (or lists) as given in the first.
Starting with the dictionary, we'll need to iterate over the keys (or items). We can use the same summing. After a little experimentation I decided I,J = np.where(v) was the easiest way of mapping the boolean mask onto the target array:
In [200]: dd={ (9, 2, 6): [[False, True],
...: [True, False]],
...: (1, 5, 8): [[True, False],
...: [False, True]] }
...:
In [201]: arr = np.zeros((2,2,3),int)
In [202]: for k,v in dd.items():
...:     I,J = np.where(v)
...:     arr[I,J,:] += k
...:
In [203]: arr
Out[203]:
array([[[1, 5, 8],
[9, 2, 6]],
[[9, 2, 6],
[1, 5, 8]]])
For the last iteration:
In [204]: k
Out[204]: (1, 5, 8)
In [205]: v
Out[205]: [[True, False], [False, True]]
In [206]: I,J=np.where(v)
In [207]: I,J
Out[207]: (array([0, 1]), array([0, 1]))
In [208]: arr[I,J,:]
Out[208]:
array([[1, 5, 8],
[1, 5, 8]])
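If this is needed repeatedly, the loop above could be wrapped in a small helper. A minimal sketch; masks_to_image and key_len are hypothetical names, and it assumes (as the question states) that every position is True in exactly one mask:
import numpy as np

def masks_to_image(mask_dict, key_len=3):
    # combine a dict of boolean masks into one (H, W, key_len) array;
    # assumes each position is True in exactly one mask
    first = np.asarray(next(iter(mask_dict.values())))
    out = np.zeros(first.shape + (key_len,), dtype=int)
    for key, mask in mask_dict.items():
        I, J = np.where(np.asarray(mask))
        out[I, J, :] = key      # write the key tuple at every True position
    return out

dd = {(9, 2, 6): [[False, True], [True, False]],
      (1, 5, 8): [[True, False], [False, True]]}
print(masks_to_image(dd))
# [[[1 5 8]
#   [9 2 6]]
#
#  [[9 2 6]
#   [1 5 8]]]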

This can be done without anything fancy, just good old for loops, list comprehension, and enumerate. I'm sure there's a better one-liner out there or a library that covers it, but here's a vanilla Python solution:
d = { (9, 2, 6): [[False, True],[True, False]],(1, 5, 8): [[True, False],[False, True]] }
new_list = []
for k,v in d.items():
    if new_list:
        for i, each in enumerate(v):
            x = [k if z else new_list[i][j] for j,z in enumerate(each)]
            new_list[i] = x
    else:
        for each in v:
            new_list.append([k if x else x for x in each])
print(new_list) # [[(1, 5, 8), (9, 2, 6)], [(9, 2, 6), (1, 5, 8)]]
P.S. also thank you for showing your effort.

Another numpy approach, in case you want it to be a normal numpy array afterwards:
import numpy as np
d = {(9, 2, 6): [[False, True],
                 [True, False]],
     (1, 5, 8): [[True, False],
                 [False, True]]}
x = np.sum([np.reshape(k, (1, 1, -1)) * np.array(v)[..., None] for k, v in d.items()], axis=0)
# x = np.sum([np.array(k)[None, None, :] * np.array(v)[..., None] for k, v in d.items()], axis=0) # Alternative way
print(x)
# array([[[1, 5, 8],
# [9, 2, 6]],
# [[9, 2, 6],
# [1, 5, 8]]])
np.all(x == np.array([[(1, 5, 8),(9, 2, 6)], [(9, 2, 6),(1, 5, 8)]]))
# True
This basically uses the approach you outlined in your question, of multiplying the truth mask with the value. I just added the fact that the content of that value (the key tuple) forms another (third) dimension, and used numpy's broadcasting features to achieve this.
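To see why the broadcasting works, the shapes involved can be checked directly; a small illustrative sketch:
import numpy as np

k = (9, 2, 6)
v = [[False, True], [True, False]]
print(np.reshape(k, (1, 1, -1)).shape)                              # (1, 1, 3)
print(np.array(v)[..., None].shape)                                 # (2, 2, 1)
print((np.reshape(k, (1, 1, -1)) * np.array(v)[..., None]).shape)   # (2, 2, 3)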

Here is a Numpythonic approach:
arr = np.dstack((lst1, lst2, lst3)) # stack the arrays along a new last axis
mask = arr.astype(bool) # create a mask from places that are nonzero
mask2 = (~mask).all(axis=-1) # another mask that gives rows that are all zero
mask[mask2] = [True, False, False] # set one arbitrary item to True in all-zero rows
result = arr[mask].reshape(lst1.shape) # get the desired items and reshape
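For reference, a sketch of the setup the demo below assumes, with lst1, lst2 and lst3 taken to be numpy versions of the three arrays from the question:
import numpy as np

lst1 = np.array([[0, 2, 0], [1, 3, 0]])
lst2 = np.array([[0, 0, 0], [0, 0, 3]])
lst3 = np.array([[5, 0, 0], [0, 0, 0]])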
Demo:
In [135]: arr = np.dstack((lst1, lst2, lst3))
In [136]: arr
Out[136]:
array([[[0, 0, 5],
[2, 0, 0],
[0, 0, 0]],
[[1, 0, 0],
[3, 0, 0],
[0, 3, 0]]])
In [137]: mask = arr.astype(bool)
In [138]: mask2 = (~mask).all(axis=-1)
In [139]: mask[mask2] = [True, False, False]
In [140]: arr[mask].reshape(lst1.shape)
Out[140]:
array([[5, 2, 0],
[1, 3, 3]])

Related

Compare two 3d Numpy array and return unmatched values with index and later recreate them without loop

I am currently working on a problem where, in one requirement, I need to compare two 3d NumPy arrays and return the unmatched values with their index positions and later recreate the same array. Currently, the only approach I can think of is to loop across the arrays to get the values during comparison and later recreate them. The problem is with scale, as there will be hundreds of arrays, and looping affects the latency of the overall application. I would be thankful if anyone can help me with better utilization of NumPy comparison while using minimal or no loops. Dummy code is below:
def compare_array(final_array_list):
    base_array = None
    i = 0
    for array in final_array_list:
        if i == 0:
            base_array = array[0]
        else:
            index = np.where(base_array != array)
            # getting index like (array([0, 1]), array([1, 1]), array([2, 2]))
            # to access all unmatched values I need to loop. Need to avoid loop here
        i = i + 1
    return [base_array, [unmatched value (8, 10) and its index (array([0, 1]), array([1, 1]), array([2, 2]))], ...]

# similarly recreate array_1 back
def recreate_array(array_list):
    # need to avoid looping while recreating array back
    return list of array  # i.e. [base_array, array_1]

# creating dummy arrays
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])
final_array_list = [base_array, array_1, ......]
# compare base_array with other arrays and get unmatched values (like 8, 10 in array_1) and their index
difff_array = compare_array(final_array_list)
# recreate array_1 from the base array after receiving unmatched values and their index values
recreate_array(difff_array)
I think this may be what you're looking for:
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2,3], [3, 4,8]], [[5, 6,7], [7, 8,10]]])
match_mask = (base_array == array_1)
idx_unmatched = np.argwhere(~match_mask)
# idx_unmatched:
# array([[0, 1, 2],
# [1, 1, 2]])
# values with associated with idx_unmatched:
values_unmatched = base_array[tuple(idx_unmatched.T)]
# values_unmatched:
# array([5, 9])
I'm not sure I understand what you mean by "recreate them" (completely recreate them? why not use the arrays themselves?).
I can help you though by noting that there are plenty of functions which vectorize with numpy, and as a general rule of thumb, do not use for loops unless G-d himself tells you to :)
For example:
If a, b are any np.arrays (regardless of dimensions), the simple a == b will return a numpy array of the same size, with boolean values: True where they are equal at that coordinate, and False otherwise.
The function np.where(c) will convert c to a boolean np.array and return the indexes at which c is True.
To clarify:
Here I instantiate two arrays, with b differing from a with -1 values:
Note what a==b is, at the end.
>>> a = np.random.randint(low=0, high=10, size=(4, 4))
>>> b = np.copy(a)
>>> b[2, 3] = -1
>>> b[0, 1] = -1
>>> b[1, 1] = -1
>>> a
array([[9, 9, 3, 4],
[8, 4, 6, 7],
[8, 4, 5, 5],
[1, 7, 2, 5]])
>>> b
array([[ 9, -1, 3, 4],
[ 8, -1, 6, 7],
[ 8, 4, 5, -1],
[ 1, 7, 2, 5]])
>>> a == b
array([[ True, False, True, True],
[ True, False, True, True],
[ True, True, True, False],
[ True, True, True, True]])
Now for the function np.where, whose output is a bit tricky but can be used easily. It returns two arrays of the same size: the first holds the rows and the second holds the columns of the places at which the given array is True.
>>> np.where(a == b)
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3], dtype=int64), array([0, 2, 3, 0, 2, 3, 0, 1, 2, 0, 1, 2, 3], dtype=int64))
Now you can "fix" the b array to match a, by setting the values of b at the indexes where it differs from a to a's values:
>>> b[np.where(a != b)]
array([-1, -1, -1])
>>> b[np.where(a != b)] = a[np.where(a != b)]
>>> np.all(a == b)
True

Convert a list to a numpy mask array

Given a list like indice = [1, 0, 2] and dimension m = 3, I want to get a mask array like this:
>>> import numpy as np
>>> mask_array = np.array([ [1, 1, 0], [1, 0, 0], [1, 1, 1] ])
>>> mask_array
[[1, 1, 0],
[1, 0, 0],
[1, 1, 1]]
Given m = 3, the size of mask_array along axis=1 is 3, and the number of rows of mask_array equals the length of indice.
For converting indice to mask_array, the rule is to mark with value 1 the items whose column index is less than or equal to the corresponding entry of indice. For example, indice[0] = 1, so the first row is [1, 1, 0], given the dimension is 3.
In NumPy, are there any APIs which can be used to do this?
Sure, just use broadcasting with arange(m), make sure to use an np.array for the indices, not a list...
>>> indice = [1, 0, 2]
>>> m = 3
>>> np.arange(m) <= np.array(indice)[..., None]
array([[ True, True, False],
[ True, False, False],
[ True, True, True]])
Note, the [..., None] just reshapes the indices array so that the broadcasting works like we want, like this:
>>> indices = np.array(indice)
>>> indices
array([1, 0, 2])
>>> indices[...,None]
array([[1],
[0],
[2]])
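If a 0/1 integer mask is wanted, as in the question's example, rather than booleans, casting the broadcast comparison works; a small sketch:
import numpy as np

indice = [1, 0, 2]
m = 3
mask_array = (np.arange(m) <= np.array(indice)[:, None]).astype(int)
print(mask_array)
# [[1 1 0]
#  [1 0 0]
#  [1 1 1]]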

2d numpy mask not working as expected

I'm trying to turn a 2x3 numpy array into a 2x2 array by removing select indexes.
I think I can do this with a mask array with true/false values.
Given
[ 1, 2, 3],
[ 4, 1, 6]
I want to remove one element from each row to give me:
[ 2, 3],
[ 4, 6]
However this method isn't working quite like I would expect:
import numpy as np
in_array = np.array([
[ 1, 2, 3],
[ 4, 1, 6]
])
mask = np.array([
[False, True, True],
[True, False, True]
])
print(in_array[mask])
Gives me:
[2 3 4 6]
Which is not what I want. Any ideas?
The only thing 'wrong' with that is the shape - 1d rather than 2d. But what if your mask was
mask = np.array([
[False, True, False],
[True, False, True]
])
1 value in the first row, 2 in second. It couldn't return that as a 2d array, could it?
So the default behavior when masking like this is to return a 1d, or raveled result.
Boolean indexing like this is effectively a where indexing:
In [19]: np.where(mask)
Out[19]: (array([0, 0, 1, 1], dtype=int32), array([1, 2, 0, 2], dtype=int32))
In [20]: in_array[_]
Out[20]: array([2, 3, 4, 6])
It finds the elements of the mask which are true, and then selects the corresponding elements of the in_array.
Maybe the transpose of where is easier to visualize:
In [21]: np.argwhere(mask)
Out[21]:
array([[0, 1],
[0, 2],
[1, 0],
[1, 2]], dtype=int32)
and indexing iteratively:
In [23]: for ij in np.argwhere(mask):
...: print(in_array[tuple(ij)])
...:
2
3
4
6
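A follow-up sketch, not part of the original answer: if every row keeps the same number of elements (as in this question, where one element is removed per row), the raveled result can be reshaped back to 2d:
import numpy as np

in_array = np.array([[1, 2, 3],
                     [4, 1, 6]])
mask = np.array([[False, True, True],
                 [True, False, True]])

# only valid because every row keeps the same number of elements (two here);
# an uneven mask could not be reshaped into a rectangular array
result = in_array[mask].reshape(in_array.shape[0], -1)
print(result)
# [[2 3]
#  [4 6]]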

Numpy: Find column index for element on each row

Suppose I have a vector with elements to find:
a = np.array([1, 5, 9, 7])
Now I have a matrix where those elements should be searched:
M = np.array([
[0, 1, 9],
[5, 3, 8],
[3, 9, 0],
[0, 1, 7]
])
Now I'd like to get an index array telling in which column of row j of M the element j of a occurs.
The result would be:
[1, 0, 1, 2]
Does Numpy offer such a function?
(Thanks for the answers with list comprehensions, but that's not an option performance-wise. I also apologize for mentioning Numpy just in the final question.)
Note the result of:
M == a[:, None]
>>> array([[False, True, False],
[ True, False, False],
[False, True, False],
[False, False, True]], dtype=bool)
The indices can be retrieved with:
yind, xind = numpy.where(M == a[:, None])
>>> (array([0, 1, 2, 3], dtype=int64), array([1, 0, 1, 2], dtype=int64))
For the first match in each row, an efficient way might be to use argmax after extending a to 2D, as done in @Benjamin's post -
(M == a[:,None]).argmax(1)
Sample run -
In [16]: M
Out[16]:
array([[0, 1, 9],
[5, 3, 8],
[3, 9, 0],
[0, 1, 7]])
In [17]: a
Out[17]: array([1, 5, 9, 7])
In [18]: a[:,None]
Out[18]:
array([[1],
[5],
[9],
[7]])
In [19]: (M == a[:,None]).argmax(1)
Out[19]: array([1, 0, 1, 2])
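A self-contained sketch of the same argmax idea, with one caveat worth adding: argmax returns 0 for a row with no match at all, so an explicit check may be needed:
import numpy as np

a = np.array([1, 5, 9, 7])
M = np.array([[0, 1, 9],
              [5, 3, 8],
              [3, 9, 0],
              [0, 1, 7]])
idx = (M == a[:, None]).argmax(1)
print(idx)                        # [1 0 1 2]
# caveat: argmax gives 0 for a row without any match, so verify separately:
has_match = (M == a[:, None]).any(1)
print(has_match)                  # [ True  True  True  True]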
Lazy solution without any import:
a = [1, 5, 9, 7]
M = [
[0, 1, 9],
[5, 3, 8],
[3, 9, 0],
[0, 1, 7],
]
for n, i in enumerate(M):
    for j in a:
        if j in i:
            print("{} found at row {} column: {}".format(j, n, i.index(j)))
Returns:
1 found at row 0 column: 1
9 found at row 0 column: 2
5 found at row 1 column: 0
9 found at row 2 column: 1
1 found at row 3 column: 1
7 found at row 3 column: 2
Maybe something like this?
>>> [list(M[i,:]).index(a[i]) for i in range(len(a))]
[1, 0, 1, 2]
[sub.index(val) if val in sub else -1 for sub, val in zip(M, a)]
# [1, 0, 1, 2]

Comparing elements in a numpy array, finding pairs, how to deal with edges/corners

I'm trying to create a function below, but I'm not sure exactly how to execute this.
Let's say I have a 2D numpy array like
import numpy as np
arr = np.array([[ 1, 2, 3, 4], [ 1, 6, 7, 8], [ 1, 1, 1, 12], [13, 3, 15, 16]])
This is a 4x4 matrix, which looks like this when printed:
array([[ 1, 2, 3, 4],
[ 1, 6, 7, 8],
[ 1, 1, 1, 12],
[13, 3, 15, 16]])
I want to access the elements of arr and compare them to each other. For each element, I would like to see whether all surrounding eight elements (top, bottom, left, right, top-left, top-right, bottom-left, bottom-right) are greater than, less than, or equal to this element I'm at.
I thought about using an if statement in a function like this:
if arr[i][j] == arr[i][j+1]:
    print("Found a pair! %d is equal to %d, it's in location (%d, %d)" % (arr[i][j], arr[i][j+1], i, j+1))
elif arr[i][j] > arr[i][j+1]:
    print("%d is greater than %d, it's in location (%d, %d)" % (arr[i][j], arr[i][j+1], i, j+1))
else:
    print("%d is less than %d, it's in location (%d, %d)" % (arr[i][j], arr[i][j+1], i, j+1))
However, (1) I have to do this for all eight surrounding element positions and (2) I'm not sure how to write the function such that it moves from position to position correctly. Somehow one must use recursion for this to work, I think. One could possibly use a while loop as well.
I'm planning on saving all the "pairs" which are equal, and creating a dictionary with these.
EDIT1:
There is still a problem: I'm having trouble understanding what the dimensions are.
Our original matrix is shaped (4,4).
When we compare adjacent rows for vertically adjacent pairs, we get an array shaped (3,4):
arr[:-1] == arr[1:]
#output
array([[ True, False, False, False],
[ True, False, False, False],
[False, False, False, False]], dtype=bool)
When we compare adjacent columns for horizontally adjacent pairs, we get an array shaped (4,3):
arr[:, :-1] == arr[:, 1:]
# output
array([[False, False, False],
[False, False, False],
[ True, True, False],
[False, False, False]], dtype=bool)
When I combine these two to see whether there are pairs both vertically and horizontally, how do I know I am not mixing up positions?
Although I don't find it entirely clear what you want to do, adjacent array slices might be a convenient method. For example, arr[:-1] == arr[1:] will tell you where there are pairs in adjacent rows. Then, arr[arr[:-1] == arr[1:]] can give you an array of those values and argwhere can give you the indexes.
>>> import numpy as np
>>> arr
array([[3, 1, 0, 2, 3, 3],
[2, 1, 2, 2, 3, 3],
[2, 3, 0, 1, 1, 0],
[2, 1, 3, 3, 1, 2]])
>>> hpairs = (arr[:, :-1] == arr[:, 1:])
>>> hpairs
array([[False, False, False, False, True],
[False, False, True, False, True],
[False, False, False, True, False],
[False, False, True, False, False]], dtype=bool)
>>> arr[hpairs]
array([3, 2, 3, 1, 3])
>>> np.argwhere(hpairs)
array([[0, 4],
[1, 2],
[1, 4],
[2, 3],
[3, 2]], dtype=int64)
Change the == operator and directions of slices as needed.
That we get a smaller array as a result of the comparison makes sense. After all, the number of possible horizontal pairs is one less than the array width. If either slice used for the comparison arr[:, :-1] == arr[:, 1:] is indexed with the boolean array, we get the left or the right numbers of the pairs. Analogously for the other directions.
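To make the left/right indexing concrete, a small sketch on the same sample data (note: applying the smaller hpairs mask directly to arr, as in the session above, relied on older NumPy behavior and may raise an error in current versions; slicing first avoids that):
import numpy as np

arr = np.array([[3, 1, 0, 2, 3, 3],
                [2, 1, 2, 2, 3, 3],
                [2, 3, 0, 1, 1, 0],
                [2, 1, 3, 3, 1, 2]])
hpairs = (arr[:, :-1] == arr[:, 1:])
left_values = arr[:, :-1][hpairs]    # left member of each horizontal pair
right_values = arr[:, 1:][hpairs]    # right member; equal to left_values by construction
print(left_values)                   # [3 2 3 1 3]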
What if there are pairs in multiple directions? That depends on what you want to do with them, I suppose. Let's say you want to find clusters of at least three equal numbers in the shape of an L turned 180 degrees. In other words, any position that is the upper of a vertical, and the right of a horizontal pair. (Same sample data as before.)
>>> vpairs = (arr[:-1] == arr[1:])
>>> hpairs[:-1] & vpairs[:, 1:]
array([[False, False, False, False, True],
[False, False, False, False, False],
[False, False, False, True, False]], dtype=bool)
If you want to count the number of equal neighbors at each position, here is one way to do it.
>>> backslashpairs = (arr[:-1, :-1] == arr[1:, 1:])
>>> slashpairs = (arr[1:, :-1] == arr[:-1, 1:])
>>>
>>> equal_neighbors = np.zeros_like(arr, dtype=int)
>>> equal_neighbors[:-1] += vpairs
>>> equal_neighbors[1:] += vpairs
>>> equal_neighbors[:, :-1] += hpairs
>>> equal_neighbors[:, 1:] += hpairs
>>> equal_neighbors[1:, :-1] += slashpairs
>>> equal_neighbors[:-1, 1:] += slashpairs
>>> equal_neighbors[:-1, :-1] += backslashpairs
>>> equal_neighbors[1:, 1:] += backslashpairs
>>> equal_neighbors
array([[0, 1, 0, 2, 3, 3],
[1, 1, 2, 2, 3, 3],
[2, 1, 0, 2, 2, 0],
[1, 0, 2, 1, 2, 0]])
There may be some nice numpy or scipy function out there that does this, but not one I know of.
Below is one solution to solve this.
To add some confusion to it, I have indexed the rows with x, and the columns with y. That simply means that the element at (2, 1) is 7.
The trick with edges and corners is simply to expand the matrix with a border, that is later ignored.
import numpy as np

arr = np.array([[1, 2, 3, 4], [1, 6, 7, 8], [1, 2, 3, 12], [13, 3, 15, 16]])
arr2 = np.zeros((arr.shape[0]+2, arr.shape[1]+2), dtype=arr.dtype)
arr2[1:-1, 1:-1] = arr
results = np.zeros(arr2.shape + (9,), dtype=int)
print(arr)

transform = {'y': [-1, 0, 1, -1, 1, -1, 0, 1],
             'x': [-1, -1, -1, 0, 0, 1, 1, 1]}

for x in range(1, arr2.shape[0]-1):
    for y in range(1, arr2.shape[1]-1):
        subarr = arr2[x-1:x+2, y-1:y+2].flatten()
        mid = len(subarr)//2
        value = subarr[mid]
        greater = (subarr > value).astype(int)
        smaller = (subarr < value).astype(int)
        results[x, y, :] += greater
        results[x, y, :] -= smaller

results = np.dstack((results[1:-1, 1:-1, :4], results[1:-1, 1:-1, 5:]))
xpos, ypos, zpos = np.where(results == 0)

matches = []
for x, y, z in zip(xpos, ypos, zpos):
    matches.append(((x, y), x+transform['x'][z], y+transform['y'][z]))
print(matches)
which results in
[[ 1 2 3 4]
[ 1 1 7 8]
[ 1 2 3 12]
[13 3 15 16]]
[((0, 0), 1, 0), ((0, 0), 1, 1), ((1, 0), 0, 0), ((1, 0), 1, 1), ((1, 0), 2, 0), ((1, 1), 0, 0), ((1, 1), 1, 0), ((1, 1), 2, 0), ((2, 0), 1, 0), ((2, 0), 1, 1), ((2, 2), 3, 1), ((3, 1), 2, 2)]
In the above code, I store the neighbouring comparisons (equal, greater than, smaller than) as 0, 1 or -1 in a z-dimension. By using a simple transformation, the index of the z-dimension translates to an offset from the point under consideration.
The dstack step is not really necessary, but it gets rid of both the added border and the self-matches (there's no simple way to "slice out" an element in the middle of an array).
Pairs for greater-than or smaller-than matches can be found by simply changing the where condition, since those matches are stored as 1 or -1 in the results array.
I am not using a dict to store the result, since this is essentially not possible: a single point can have multiple matches, and a dict could only store one match for a single point (using an (x, y) coordinate tuple as key). Hence the matches are stored in a list, with each element a tuple of the form ((x, y), xmatch, ymatch).
Since each pair is matched both ways, all matching pairs are contained twice in matches.
