Finding the Unique Arrays in a List of Arrays - python

I have a list of arrays, say
List = [A,B,C,D,E,...]
where each A,B,C etc. is an nxn array.
I want the most efficient algorithm to find the unique nxn arrays in the list. That is, if all entries of A and B are equal, we discard one of them and generate the list
UniqueList = [A,C,D,E,...]

I'm not sure whether there is a faster way, but I think this should be pretty fast. It uses NumPy's built-in unique function with axis=0 to look for unique nxn arrays (more detail in the NumPy docs):
[i for i in np.unique(np.array(List),axis=0)]
Example:
A = np.array([[1,1],[1,1]])
B = np.array([[1,1],[1,2]])
List = [A,B,A]
[array([[1, 1],
[1, 1]]),
array([[1, 1],
[1, 2]]),
array([[1, 1],
[1, 1]])]
Output:
[array([[1, 1],
[1, 1]]),
array([[1, 1],
[1, 2]])]
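A minimal sketch of the same idea wrapped in a function, assuming every array in the list has the same shape and dtype. The helper name unique_arrays is hypothetical and not part of NumPy. Note that np.unique returns the unique slices in sorted order, so the original order of the list is not preserved.

```python
import numpy as np

def unique_arrays(arrays):
    """Return the distinct arrays from a list of equally shaped arrays.

    Hypothetical helper: stacks the inputs into one (k, n, n) array and
    lets np.unique compare whole n x n slices along axis 0.
    """
    stacked = np.stack(arrays)           # shape (k, n, n)
    uniq = np.unique(stacked, axis=0)    # unique slices, in sorted order
    return list(uniq)

A = np.array([[1, 1], [1, 1]])
B = np.array([[1, 1], [1, 2]])
result = unique_arrays([A, B, A])
print(len(result))  # 2
```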


Applying torch.combinations on multidimensional tensor or tuple of tensors in PyTorch?

Using PyTorch, torch.combinations will only take a 1D tensor as input but I would like to apply it to each 1D tensor in a multidimensional tensor.
inp = torch.tensor([[1, 2, 3],
[2, 3, 4]])
torch.combinations((inp), r=2)
The result is an error saying I can't apply it to that shape but I want to apply it to [1, 2, 3] and [2, 3, 4] individually. I can't do it one by one because the idea is to apply this to large sets of data.
inp = torch.tensor([[1,2,3],[2,3,4]])
inp_tuple = torch.unbind(inp)
print(inp_tuple)
(tensor([1, 2, 3]), tensor([2, 3, 4]))
torch.combinations((inp_tuple), r=2)
I also tried unbinding the tensor and applying it to the tuple of tensors but it gives an error saying it can't be applied to a tuple.
Is there any way that I can get torch.combinations to automatically apply to each individual 1D tensor in a multidimensional tensor or each tensor in a tuple of tensors? If not are there any alternatives to achieve all combinations of each individual part of a multidimensional tensor?
The function torch.combinations returns all possible combinations of size r of the elements contained in the 1D input vector. The reason why multi-dimensional inputs are not supported is probably that you have no guarantee that the different vectors in your input have the exact same number of unique elements. If one of the vectors had a duplicate element, you would end up with one set of combinations bigger than another, which is simply not possible to represent with a homogeneous PyTorch tensor.
So from there on, I will assume that the input tensor inp is a 2D tensor shaped (N, C) where each of its N vectors contains C unique elements. The example you gave would fit to this requirement since both vectors have three unique elements each: {1, 2, 3} and {2, 3, 4}.
>>> inp = torch.tensor([[1,2,3],[2,3,4]])
The idea is to apply torch.combinations to a torch.arange tensor whose length equals that of our vectors. We can then use the result as indices to gather values from the different vectors in our input tensor.
We can retrieve all combinations of the column positions with the following:
>>> c = torch.combinations(torch.arange(inp.size(1)), r=2)
tensor([[0, 1],
[0, 2],
[1, 2]])
Then we need to reshape and expand both inp and c such that they match in number of dimensions:
>>> x = inp[:,None].expand(-1,len(c),-1)
tensor([[[1, 2, 3],
[1, 2, 3],
[1, 2, 3]],
[[2, 3, 4],
[2, 3, 4],
[2, 3, 4]]])
>>> idx = c[None].expand(len(x), -1, -1)
tensor([[[0, 1],
[0, 2],
[1, 2]],
[[0, 1],
[0, 2],
[1, 2]]])
Finally we can apply torch.gather on x and idx on dim=2. This will return a 3D tensor out such that:
out[i][j][k] = x[i][j][idx[i][j][k]]
Let's make our call on torch.gather:
>>> x.gather(dim=2, index=idx)
tensor([[[1, 2],
[1, 3],
[2, 3]],
[[2, 3],
[2, 4],
[3, 4]]])
Which is the desired result.
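The same "combine the positions, then index" idea can be sketched in NumPy. This is an analog for illustration, not the PyTorch code above: it swaps torch.combinations for itertools.combinations and torch.gather for NumPy fancy indexing. The helper name rowwise_combinations is hypothetical.

```python
from itertools import combinations

import numpy as np

def rowwise_combinations(inp, r=2):
    """All r-element combinations of each row of a 2D array (hypothetical helper)."""
    n_cols = inp.shape[1]
    # Combinations of column positions, shape (K, r); shared by every row.
    c = np.array(list(combinations(range(n_cols), r)), dtype=np.intp).reshape(-1, r)
    # Fancy indexing applies the (K, r) index to all N rows -> shape (N, K, r).
    return inp[:, c]

inp = np.array([[1, 2, 3], [2, 3, 4]])
out = rowwise_combinations(inp, r=2)
print(out.shape)  # (2, 3, 2)
```

Because the index array is built from positions rather than values, every row yields the same number of combinations regardless of its contents.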

Taking rows of a matrix given a batch of indices - Python

How can we extract the rows of a matrix given a batch of indices (in Python)?
import jax.numpy as jnp

i = [[0,1],[1,2],[2,3]]
a = jnp.array([[1,2,3,4],[2,3,4,5]])
def extract(A, idx):
    A = A[:, idx]
    return A
B = extract(a, i)
I expect to get this result (where the matrices are stacked):
B = [[[1,2],
[2,3]],
[[2,3],
[3,4]],
[[3,4],
[4,5]]]
And NOT:
B_ = [[[1, 2],
[2, 3],
[3, 4]],
[[2, 3],
[3, 4],
[4, 5]]]
In this case, the rows are stacked, but I want to stack the different matrices.
I tried using
jax.vmap(extract)(a,i),
but this gives me an error since a and i don't have the same dimension.... Is there an alternative, without using loops?
You can do this with vmap if you specify in_axes in the right way, and convert your index list into an index array:
vmap(extract, in_axes=(None, 0))(a, jnp.array(i))
# DeviceArray([[[1, 2],
# [2, 3]],
#
# [[2, 3],
# [3, 4]],
#
# [[3, 4],
# [4, 5]]], dtype=int32)
When you say in_axes=(None, 0), it specifies that you want the first argument to be unmapped, and you want the second argument to be mapped along its leading axis.
You need to convert i from a list to an array because JAX will only map over array arguments: if vmap encounters a collection like a list, tuple, dict, or a general pytree, it attempts to map over each array-like value within the collection.
You can use indexing right away on the matrix a transposed:
a.T[i,:]
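One caveat is worth checking before relying on the transposed indexing. In the question's data each 2x2 block happens to be symmetric, so both approaches print the same result. With other data, indexing the transpose gives each block transposed relative to stacking a[:, idx]. The sketch below uses plain NumPy with made-up, deliberately non-symmetric values, and assumes jax.numpy follows the same indexing semantics for this case.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [10, 20, 30, 40]])   # deliberately non-symmetric data
i = np.array([[0, 1], [1, 2], [2, 3]])

# Reference: pick the columns for each index pair and stack the results.
stacked = np.stack([a[:, idx] for idx in i])   # shape (3, 2, 2)

# Indexing the transpose yields the same blocks, each one transposed.
via_transpose = a.T[i, :]                      # shape (3, 2, 2)

print(np.array_equal(stacked, np.swapaxes(via_transpose, 1, 2)))  # True
```

If the stacked layout is the one you need, swapping the last two axes of the transposed result recovers it.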

Indexing with lists and arrays in numpy appears inconsistent

Inspired by this other question, I'm trying to wrap my mind around advanced indexing in NumPy and build up more intuitive understanding of how it works.
I've found an interesting case. Here's an array:
>>> y = np.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
If I index it with a scalar, I get a scalar of course:
>>> y[4]
4
with a 1D array of integers, I get another 1D array:
>>> idx = [4, 3, 2, 1]
>>> y[idx]
array([4, 3, 2, 1])
so if I index it with a 2D array of integers, I get... what do I get?
>>> idx = [[4, 3], [2, 1]]
>>> y[idx]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: too many indices for array
Oh no! The symmetry is broken. I have to index with a 3D array to get a 2D array!
>>> idx = [[[4, 3], [2, 1]]]
>>> y[idx]
array([[4, 3],
[2, 1]])
What makes numpy behave this way?
To make this more interesting, I noticed that indexing with numpy arrays (instead of lists) behaves how I'd intuitively expect, and 2D gives me 2D:
>>> idx = np.array([[4, 3], [2, 1]])
>>> y[idx]
array([[4, 3],
[2, 1]])
This looks inconsistent from where I'm at. What's the rule here?
The reason is how NumPy interprets lists used as an index: lists are interpreted like tuples, and indexing with a tuple is interpreted by NumPy as multidimensional indexing.
Just as arr[1, 2] returns the element arr[1][2], arr[[[4, 3], [2, 1]]] is identical to arr[[4, 3], [2, 1]] and will, according to the rules of multidimensional indexing, return the elements arr[4, 2] and arr[3, 1].
By adding one more list you tell NumPy that you want to index along the first dimension only, because the outermost list is effectively interpreted as if you had passed in just one "list of indices for the first dimension": arr[[[[4, 3], [2, 1]]]].
From the documentation:
Example
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
and:
Warning
The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.
In such cases it's probably better to use np.take:
>>> y.take([[4, 3], [2, 1]]) # 2D array
array([[4, 3],
[2, 1]])
This function [np.take] does the same thing as “fancy” indexing (indexing arrays using arrays); however, it can be easier to use if you need elements along a given axis.
Or convert the indices to an array. That way NumPy interprets it (array is special cased!) as fancy indexing instead of as "multidimensional indexing":
>>> y[np.asarray([[4, 3], [2, 1]])]
array([[4, 3],
[2, 1]])
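The distinction can be made explicit so that the code does not depend on how nested lists are interpreted. As far as I know, NumPy 1.23 and later treat a nested list like an index array rather than like a tuple, so spelling out the intent is the safer choice. A small sketch using only a tuple index, an array index, and np.take:

```python
import numpy as np

y = np.arange(10)
x = np.array([[1, 2], [3, 4], [5, 6]])

# A tuple of index lists is multidimensional indexing: one list per axis.
pairs = x[([0, 1, 2], [0, 1, 0])]        # picks x[0, 0], x[1, 1], x[2, 0]

# An index *array* is a single fancy index applied to the first axis.
idx = np.asarray([[4, 3], [2, 1]])
by_array = y[idx]                         # result has the shape of idx
by_take = np.take(y, [[4, 3], [2, 1]])    # np.take accepts nested lists directly

print(pairs)           # [1 4 5]
print(by_array.shape)  # (2, 2)
```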

Alternative to arange with numpy arrays as boundaries

I have two numpy arrays acting as lower and upper boundaries of a range of vectors that I want to generate.
In a similar way to how arange() works, I would like to generate the intermediate members as in the example:
lower_boundary = np.array([1,1])
upper_boundary = np.array([3,3])
expected_result = [[1,1], [1,2], [1,3], [2,1], [2,2], [2,3], [3,1], [3,2], [3,3]]
The result can be a list or another numpy array. So far I have managed to work around this scenario with nested loops, but the dimensions of 'lower_boundary' and 'upper_boundary' may vary, so my approach is not generally applicable.
In a typical scenario, both boundaries have at least 4 dimensions.
You can use np.indices to get the index values for your desired range (upper_boundary - lower_boundary + 1), reshape them to your needs (reshape(len(upper_boundary),-1)), and add your lower_boundary to the values, resulting in:
>>> np.indices(upper_boundary - lower_boundary + 1).reshape(len(upper_boundary),-1).T + lower_boundary
array([[1, 1],
[1, 2],
[1, 3],
[2, 1],
[2, 2],
[2, 3],
[3, 1],
[3, 2],
[3, 3]])
Edit: I forgot to correct the code before posting; it should be like this.
Thanks @Divakar for the fix.
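Since the question mentions boundaries with four or more dimensions, here is a sketch of the same one-liner as a function, cross-checked against itertools.product. The helper name grid_between and the 4-dimensional boundary values are made up for illustration, and integer boundaries are assumed.

```python
from itertools import product

import numpy as np

def grid_between(lower, upper):
    """All integer vectors v with lower <= v <= upper, element-wise (hypothetical helper)."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    extent = upper - lower + 1                       # number of values per dimension
    offsets = np.indices(extent).reshape(len(extent), -1).T
    return offsets + lower

lower_boundary = np.array([1, 1, 0, 2])
upper_boundary = np.array([2, 3, 1, 2])
grid = grid_between(lower_boundary, upper_boundary)

# Cross-check against itertools.product over the per-dimension ranges.
expected = list(product(*(range(lo, hi + 1)
                          for lo, hi in zip(lower_boundary, upper_boundary))))
print(grid.shape)  # (12, 4)
```

The rows come out with the last dimension varying fastest, which is the same order that nested loops or itertools.product would produce.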

NumPy min/max in-place assignment

Is it possible to perform min/max in-place assignment with NumPy multi-dimensional arrays without an extra copy?
Say, a and b are two 2D numpy arrays and I would like to have a[i,j] = min(a[i,j], b[i,j]) for all i and j.
One way to do this is:
a = numpy.minimum(a, b)
But according to the documentation, numpy.minimum creates and returns a new array:
numpy.minimum(x1, x2[, out])
Element-wise minimum of array elements.
Compare two arrays and returns a new array containing the element-wise minima.
So in the code above, it will create a new temporary array (min of a and b), then assign it to a and dispose of the old one, right?
Is there any way to do something like a.min_with(b) so that the min-result is assigned back to a in-place?
numpy.minimum() takes an optional third argument, which is the output array. You can specify a there to have it modified in place:
In [9]: a = np.array([[1, 2, 3], [2, 2, 2], [3, 2, 1]])
In [10]: b = np.array([[3, 2, 1], [1, 2, 1], [1, 2, 1]])
In [11]: np.minimum(a, b, a)
Out[11]:
array([[1, 2, 1],
[1, 2, 1],
[1, 2, 1]])
In [12]: a
Out[12]:
array([[1, 2, 1],
[1, 2, 1],
[1, 2, 1]])
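A small sketch to confirm that the result really is written in place, using the out keyword instead of the positional third argument. Comparing the data buffer address before and after is just one way to check this and is an addition for illustration. np.maximum accepts out in the same way.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 2, 2], [3, 2, 1]])
b = np.array([[3, 2, 1], [1, 2, 1], [1, 2, 1]])

buffer_before = a.__array_interface__["data"][0]   # address of a's data buffer
result = np.minimum(a, b, out=a)                   # write the minima into a

print(result is a)                                         # True
print(a.__array_interface__["data"][0] == buffer_before)   # True
```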
