I've got a numpy array:
np.array([1, 2, 3])
I would like to turn it into a numpy array of tuples covering each 1:1 permutation (i.e. every ordered pair of elements), like this:
np.array([
[(1,1),(1,2),(1,3)],
[(2,1),(2,2),(2,3)],
[(3,1),(3,2),(3,3)],
])
Any thoughts on how to do this efficiently? I need to do this operation a few million times.
You can do something like this:
>>> a = np.array([1, 2, 3])
>>> n = a.size
>>> np.vstack((np.repeat(a, n), np.tile(a, n))).T.reshape(n, n, 2)
array([[[1, 1],
[1, 2],
[1, 3]],
[[2, 1],
[2, 2],
[2, 3]],
[[3, 1],
[3, 2],
[3, 3]]])
Or, as suggested by @Jaime, you can get around a 10x speedup by taking advantage of broadcasting here:
>>> a = np.array([1, 2, 3])
>>> n = a.size
>>> perm = np.empty((n, n, 2), dtype=a.dtype)
>>> perm[..., 0] = a[:, None]
>>> perm[..., 1] = a
>>> perm
array([[[1, 1],
[1, 2],
[1, 3]],
[[2, 1],
[2, 2],
[2, 3]],
[[3, 1],
[3, 2],
[3, 3]]])
Timing comparisons:
>>> a = np.array([1, 2, 3]*100)
>>> n = a.size
>>> %%timeit
np.vstack((np.repeat(a, n), np.tile(a, n))).T.reshape(n, n, 2)
...
1000 loops, best of 3: 934 µs per loop
>>> %%timeit
perm = np.empty((n, n, 2), dtype=a.dtype)
perm[..., 0] = a[:, None]
perm[..., 1] = a
...
10000 loops, best of 3: 111 µs per loop
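Since the question mentions running this a few million times, here is a rough sketch (the name pair_grid is mine, not from the answer above) of wrapping the broadcasting version so the output buffer can be reused between calls when the array size stays fixed:
def pair_grid(a, out=None):
    # All ordered pairs of `a` as an (n, n, 2) array; pass `out` to reuse a preallocated buffer.
    n = a.size
    if out is None:
        out = np.empty((n, n, 2), dtype=a.dtype)
    out[..., 0] = a[:, None]  # first element of each pair varies along rows
    out[..., 1] = a           # second element varies along columns
    return out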
If you're working with numpy, don't work with tuples. Use its power and add another dimension of size two.
My recommendation is:
x = np.array([1,2,3])
np.vstack(([np.vstack((x, x, x))], [np.vstack((x, x, x)).T])).T
or:
im = np.vstack((x, x, x))
np.vstack(([im], [im.T])).T
And for a general array:
def pairs(x):  # illustrative wrapper name
    ix = np.vstack([x for _ in range(x.shape[0])])
    return np.vstack(([ix], [ix.T])).T
This will produce what you want:
array([[[1, 1],
[1, 2],
[1, 3]],
[[2, 1],
[2, 2],
[2, 3]],
[[3, 1],
[3, 2],
[3, 3]]])
But as a 3D array, as you can see when looking at its shape:
Out[25]: (3L, 3L, 2L)
This is more efficient than the solution with permutations as the array size gets bigger. Timing my solution against @Kasra's yields 1 ms for mine vs. 46 ms for the one with permutations for an array of size 100. @Ashwini Chaudhary's solution is more efficient, though.
Yet another way using numpy.meshgrid.
>>> x = np.array([1, 2, 3])
>>> perms = np.stack(np.meshgrid(x, x))
>>> perms
array([[[1, 2, 3],
[1, 2, 3],
[1, 2, 3]],
[[1, 1, 1],
[2, 2, 2],
[3, 3, 3]]])
>>> perms.transpose().reshape(9, 2)
array([[1, 1],
[1, 2],
[1, 3],
[2, 1],
[2, 2],
[2, 3],
[3, 1],
[3, 2],
[3, 3]])
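If you want the (n, n, 2) layout from the question rather than the flat (n*n, 2) list, a small variation should also work (a sketch; indexing='ij' makes the first element vary along rows):
>>> perms_3d = np.stack(np.meshgrid(x, x, indexing='ij'), axis=-1)
>>> perms_3d.shape
(3, 3, 2)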
You can use itertools.product to get the permutations, then convert the result to a numpy array.
>>> from itertools import product
>>> p = list(product(a, repeat=2))
>>> np.array([p[i:i+3] for i in range(0, len(p), 3)])
array([[[1, 1],
[1, 2],
[1, 3]],
[[2, 1],
[2, 2],
[2, 3]],
[[3, 1],
[3, 2],
[3, 3]]])
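A slightly more direct variant (a sketch, assuming a and product from above) avoids the manual slicing by reshaping in one go:
>>> n = a.size
>>> np.array(list(product(a, repeat=2))).reshape(n, n, 2)  # same (n, n, 2) result as above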
I was looking into how to do this better in general, not just for 2-tuples. It can be done pretty elegantly with np.indices, which produces a set of indices that can be used to index the original array:
>>> x = np.array([1, 2, 3])
>>> i = np.indices((3, 3)).reshape(2, -1)
>>> x[i].T
array([[1, 1],
[1, 2],
[1, 3],
[2, 1],
[2, 2],
[2, 3],
[3, 1],
[3, 2],
[3, 3]])
The general case is done as follows: let n be the number of items in each permutation.
n = 5
x = np.arange(10)
i = np.indices([x.size for _ in range(n)]).reshape(n, -1)
a = x[i].T
Then you can reshape the result back to the n-dimensional array form if needed (see the short sketch below), but often having the list of permutations is enough. I didn't test the performance of this method, but native numpy calls and indexing ought to be pretty quick. At least this is more elegant than the other solutions, in my opinion, and it is pretty similar to the meshgrid solution provided by @Bill.
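For the 2-tuple example above, that reshape back to the (n, n, 2) layout from the question would look like this (a sketch, reusing x and i from earlier):
>>> x[i].T.reshape(x.size, x.size, 2).shape
(3, 3, 2)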
I have the following numpy array
a= np.array([1,1])
I have the two elements
b= [2, 2]
c= [3, 3]
I would like to add those elements b and c, so that my output looks like this:
a = [[1, 1],
     [2, 2],
     [3, 3]]  # shape = (3, 2)
Which numpy function should I use?
Thanks!
Create a new numpy array with the three elements
>>> np.array([a,b,c])
array([[1, 1],
[2, 2],
[3, 3]])
# shape : (3, 2)
If a has more than one dimension, np.append can be used:
>>> a= np.array([[1,1], [4,4]])
>>> a
array([[1, 1],
[4, 4]])
>>> np.append(a,[b],axis=0)
array([[1, 1],
[4, 4],
[2, 2]])
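If you'd rather avoid repeated np.append calls (each call builds a whole new array), np.vstack also stacks the pieces in one go; a quick sketch with the original 1-D a:
>>> a = np.array([1, 1])
>>> np.vstack([a, b, c])
array([[1, 1],
       [2, 2],
       [3, 3]])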
I have 0s and 1s stored in a 3-dimensional numpy array:
g = np.array([[[0, 1], [0, 1], [1, 0]], [[0, 0], [1, 0], [1, 1]]])
# array([
# [[0, 1], [0, 1], [1, 0]],
# [[0, 0], [1, 0], [1, 1]]])
and I'd like to replace these values with those from another array using a row-wise replacement strategy. For example, replacing the values of g by x:
x = np.array([[2, 3], [4, 5]])
array([[2, 3],
[4, 5]])
to obtain:
array([
[[2, 3], [2, 3], [3, 2]],
[[4, 4], [5, 4], [5, 5]]])
The idea here would be to have the first row of g replaced by the first elements of x (0 becomes 2 and 1 becomes 3) and the same for the other row (the first dimension - number of "rows" - will always be the same for g and x)
I can't seem to be able to use np.where because there's a ValueError: operands could not be broadcast together with shapes (2,3,2) (2,2) (2,2).
IIUC,
np.stack([x[i, g[i]] for i in range(x.shape[0])])
Output:
array([[[2, 3],
[2, 3],
[3, 2]],
[[4, 4],
[5, 4],
[5, 5]]])
Vectorized approach with np.take_along_axis to index into the last axis of x with g using axis=-1 -
In [20]: np.take_along_axis(x[:,None],g,axis=-1)
Out[20]:
array([[[2, 3],
[2, 3],
[3, 2]],
[[4, 4],
[5, 4],
[5, 5]]])
Or with manual integer-based indexing -
In [27]: x[np.arange(len(g))[:,None,None],g]
Out[27]:
array([[[2, 3],
[2, 3],
[3, 2]],
[[4, 4],
[5, 4],
[5, 5]]])
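For reference, a quick sketch of the shapes involved in the manual-indexing version (assuming g and x from the question):
>>> rows = np.arange(len(g))[:, None, None]   # shape (2, 1, 1): which row of x to read from
>>> rows.shape, g.shape
((2, 1, 1), (2, 3, 2))
# rows broadcasts against g, so x[rows, g] has shape (2, 3, 2)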
One solution is to simply use a comprehension directly here:
>>> np.array([[x[i][c] for c in r] for i, r in enumerate(g)])
array([[[2, 3],
[2, 3],
[3, 2]],
[[4, 4],
[5, 4],
[5, 5]]])
From what I understand, g is an array of indices (each index being 0 or 1) and x is the array whose values you want to use.
Something like this should work (tested quickly)
import numpy as np
def swap_indexes(index_array, array):
    out_array = []
    for i, row in enumerate(index_array):
        # For each pair of indices in this row, pick the corresponding values from array[i].
        out_array.append([array[i, indexes] for indexes in row])
    return np.array(out_array)
index_array = np.array([[[0, 1], [0, 1], [1, 0]], [[0, 0], [1, 0], [1, 1]]])
x = np.array([[2, 3], [4, 5]])
print(swap_indexes(index_array, x))
[EDIT: fixed typo that created duplicates]
I want to write a function which takes in a numpy array (whichever is more convenient) and a number.
The function should return a matrix of its powers from 0 to n.
E.g., if I input [1, 2] and 3, the function should return
np.matrix([[1, 1], [1, 2], [1, 4], [1, 8]])
I know I can write a loop to do this, but is there a more succinct / faster method? Is there a way of writing this using generators?
You could use broadcasting -
In [60]: [1,2]**np.arange(4)[:,None]
Out[60]:
array([[1, 1],
[1, 2],
[1, 4],
[1, 8]])
A more compact one with np.vander, as this is basically a Vandermonde matrix -
In [78]: np.vander([1,2],4,1).T
Out[78]:
array([[1, 1],
[1, 2],
[1, 4],
[1, 8]])
For matrix type -
In [61]: np.asmatrix([1,2]**np.arange(4)[:,None])
Out[61]:
matrix([[1, 1],
[1, 2],
[1, 4],
[1, 8]])
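Since the question asks for a function taking an array and a number, a minimal sketch wrapping the broadcasting idea (the name power_matrix is mine, not from the answer):
def power_matrix(x, n):
    # Rows are x**0, x**1, ..., x**n, computed via a broadcast column of exponents.
    return np.asarray(x) ** np.arange(n + 1)[:, None]
power_matrix([1, 2], 3) gives the same 4 x 2 result as In [60]; wrap it in np.asmatrix if the matrix type is really needed.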
I have a large n x 2 numpy array that is formatted as (x, y) coordinates. I would like to filter this array so as to:
Identify coordinate pairs with duplicated x-values.
Keep only the coordinate pair of those duplicates with the highest y-value.
For example, in the following array:
arr = [[1, 4]
[1, 8]
[2, 3]
[4, 6]
[4, 2]
[5, 1]
[5, 2]
[5, 6]]
I would like the result to be:
arr = [[1, 8]
[2, 3]
[4, 6]
[5, 6]]
I've explored np.unique and np.where but cannot figure out how to leverage them to solve this problem. Thanks so much!
Here's one way based on np.maximum.reduceat -
def groupby_maxY(a):
b = a[a[:,0].argsort()] # if first col is already sorted, skip this
grp_idx = np.flatnonzero(np.r_[True,(b[:-1,0] != b[1:,0])])
grp_maxY = np.maximum.reduceat(b[:,1], grp_idx)
return np.c_[b[grp_idx,0], grp_maxY]
Alternatively, if you want to bring in np.unique, we can use it to find grp_idx with np.unique(b[:,0], return_index=True)[1]; see the sketch after the sample run.
Sample run -
In [453]: np.random.seed(0)
In [454]: arr = np.random.randint(0,5,(10,2))
In [455]: arr
Out[455]:
array([[4, 0],
[3, 3],
[3, 1],
[3, 2],
[4, 0],
[0, 4],
[2, 1],
[0, 1],
[1, 0],
[1, 4]])
In [456]: groupby_maxY(arr)
Out[456]:
array([[0, 4],
[1, 4],
[2, 1],
[3, 3],
[4, 0]])
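A sketch of the np.unique-based variant mentioned above (groupby_maxY_unique is just an illustrative name; it returns the same result as the sample run):
def groupby_maxY_unique(a):
    b = a[a[:,0].argsort()]                             # sort by the x column
    _, grp_idx = np.unique(b[:,0], return_index=True)   # first index of each x group
    grp_maxY = np.maximum.reduceat(b[:,1], grp_idx)
    return np.c_[b[grp_idx,0], grp_maxY]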
I have a 3D array b, shown below, which I want to treat as an array of 2-D arrays. I want to remove the duplicate 2-D arrays and get only the unique ones.
>>> a = [[[1, 2], [1, 2]], [[1, 2], [4, 5]], [[1, 2], [1, 2]]]
>>> b = numpy.array(a)
>>> b
array([[[1, 2],
[1, 2]],
[[1, 2],
[4, 5]],
[[1, 2],
[1, 2]]])
In the above example, I want to return the following, because there is one duplicate which I want to remove:
unique = array([[[1, 2],
[1, 2]],
[[1, 2],
[4, 5]]])
How should I do this with the numpy package? Thanks.
See previous answer: Remove duplicate rows of a numpy array
Convert to an array of tuples and then apply np.unique().
Converting to tuples and back again is probably going to be quite expensive; instead, you can do a generalized view:
def unique_by_first(a):
tmp = a.reshape(a.shape[0], -1)
b = np.ascontiguousarray(tmp).view(np.dtype((np.void, tmp.dtype.itemsize * tmp.shape[1])))
_, idx = np.unique(b, return_index=True)
return a[idx].reshape(-1, *a.shape[1:])
Usage:
print(unique_by_first(b))
[[[1 2]
[1 2]]
[[1 2]
[4 5]]]
Effectively, a generalization of previous answers.
You can convert each 2D slice along the last two axes into a single scalar by treating its elements as indices on a multi-dimensional grid. The intention is to map each such slice to a scalar based on its uniqueness. Then, using those scalars, we could use np.unique to keep one instance only.
Thus, an implementation would be -
idx = np.ravel_multi_index(a.reshape(a.shape[0],-1).T,a.max(0).ravel()+1)
out = a[np.sort(np.unique(idx, return_index=1)[1])]
Sample run -
In [43]: a
Out[43]:
array([[[8, 1],
[2, 8]],
[[3, 8],
[3, 4]],
[[2, 4],
[1, 0]],
[[3, 0],
[4, 8]],
[[2, 4],
[1, 0]],
[[8, 1],
[2, 8]]])
In [44]: idx = np.ravel_multi_index(a.reshape(a.shape[0],-1).T,a.max(0).ravel()+1)
In [45]: a[np.sort(np.unique(idx, return_index=1)[1])]
Out[45]:
array([[[8, 1],
[2, 8]],
[[3, 8],
[3, 4]],
[[2, 4],
[1, 0]],
[[3, 0],
[4, 8]]])
If you don't mind the order of such slices being maintained, skip the np.sort() at the last step.
Reshape, find the unique rows, then reshape again.
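A minimal sketch of that idea (np.unique with axis=0 needs numpy >= 1.13; note that it also sorts the rows):
>>> flat = b.reshape(b.shape[0], -1)      # each 2x2 block becomes one row of length 4
>>> uniq = np.unique(flat, axis=0)        # drop duplicate rows
>>> uniq.reshape(-1, *b.shape[1:])
array([[[1, 2],
        [1, 2]],

       [[1, 2],
        [4, 5]]])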
Finding unique tuples by converting to a set (note that a set does not preserve the original order).
import numpy as np
a = [[[1, 2], [1, 2]], [[1, 2], [4, 5]], [[1, 2], [1, 2]]]
b = np.array(a)
new_array = [tuple(row) for row in b.reshape(3,4)]  # flatten each 2x2 block into a 4-tuple
uniques = list(set(new_array))
output = np.array(uniques).reshape(len(uniques), 2, 2)
output
Out[131]:
array([[[1, 2],
[1, 2]],
[[1, 2],
[4, 5]]])