Fast nonzero indices per row/column for (sparse) 2D numpy array - python

I am looking for the fastest way to obtain a list of the nonzero indices of a 2D array per row and per column. The following is a working piece of code:
preds = [matrix[:,v].nonzero()[0] for v in range(matrix.shape[1])]
descs = [matrix[v].nonzero()[0] for v in range(matrix.shape[0])]
Example input:
matrix = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])
Example output
preds = [array([1, 2, 3]), array([2, 3]), array([3]), array([], dtype=int64)]
descs = [array([], dtype=int64), array([0]), array([0, 1]), array([0, 1, 2])]
(The lists are called preds and descs because they refer to the predecessors and descendants in a DAG when the matrix is interpreted as an adjacency matrix, but this is not essential to the question.)
Timing example:
For timing purposes, the following matrix is a good representative:
test_matrix = np.zeros(shape=(4096,4096), dtype=np.float32)
for k in range(16):
    test_matrix[256*(k+1):256*(k+2), 256*k:256*(k+1)] = 1
Background: In my code, these two lines take 75% of the time for a 4000x4000 matrix, whereas the ensuing topological sort and DP algorithm take only the remaining quarter. Roughly 5% of the values in the matrix are nonzero, so a sparse-matrix solution may be applicable.
Thank you.
(On suggestion, this is also posted at https://scicomp.stackexchange.com/questions/35242/fast-nonzero-indices-per-row-column-for-sparse-2d-numpy-array. There are answers there as well, to which I will provide timings in the comments. That thread contains an accepted answer that is twice as fast.)

If you have enough motivation, Numba can do amazing things.
Here is a quick implementation of the logic you need.
Briefly, it computes the equivalent of np.nonzero(), but along the way it also collects the information needed to later dispatch the indices into the format you require.
That bookkeeping is inspired by sparse.csr.indptr and sparse.csc.indptr.
import numpy as np
import numba as nb
@nb.jit
def cumsum(arr):
    result = np.empty_like(arr)
    cumsum = result[0] = arr[0]
    for i in range(1, len(arr)):
        cumsum += arr[i]
        result[i] = cumsum
    return result
@nb.jit
def count_nonzero(arr):
    arr = arr.ravel()
    n = 0
    for x in arr:
        if x != 0:
            n += 1
    return n
@nb.jit
def row_col_nonzero_nb(arr):
    n, m = arr.shape
    max_k = count_nonzero(arr)
    indices = np.empty((2, max_k), dtype=np.uint32)
    i_offset = np.zeros(n + 1, dtype=np.uint32)
    j_offset = np.zeros(m + 1, dtype=np.uint32)
    k = 0
    for i in range(n):
        for j in range(m):
            if arr[i, j] != 0:
                indices[:, k] = i, j
                i_offset[i + 1] += 1
                j_offset[j + 1] += 1
                k += 1
    return indices, cumsum(i_offset), cumsum(j_offset)
def row_col_idx_nonzero_nb(arr):
    (ii, jj), jj_split, ii_split = row_col_nonzero_nb(arr)
    ii_ = np.argsort(jj)
    ii = ii[ii_]
    return np.split(ii, ii_split[1:-1]), np.split(jj, jj_split[1:-1])
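A quick sanity check against the example from the question (the first call also pays the Numba compilation cost; the arrays come out as uint32 rather than int64):
matrix = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])
preds, descs = row_col_idx_nonzero_nb(matrix)
# preds -> [array([1, 2, 3]), array([2, 3]), array([3]), array([])]
# descs -> [array([]), array([0]), array([0, 1]), array([0, 1, 2])]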
Compared to your approach (row_col_idx_sep() below), and a bunch of others, as per @hpaulj's answer (row_col_idx_sparse_lil()) and @knl's answer from scicomp.stackexchange.com (row_col_idx_sparse_coo()):
def row_col_idx_sep(arr):
    return (
        [arr[:, j].nonzero()[0] for j in range(arr.shape[1])],
        [arr[i, :].nonzero()[0] for i in range(arr.shape[0])],)

def row_col_idx_zip(arr):
    n, m = arr.shape
    ii = [[] for _ in range(n)]
    jj = [[] for _ in range(m)]
    x, y = np.nonzero(arr)
    for i, j in zip(x, y):
        ii[i].append(j)
        jj[j].append(i)
    return jj, ii
import scipy as sp
import scipy.sparse

def row_col_idx_sparse_coo(arr):
    coo_mat = sp.sparse.coo_matrix(arr)
    csr_mat = coo_mat.tocsr()
    csc_mat = coo_mat.tocsc()
    return (
        np.split(csc_mat.indices, csc_mat.indptr)[1:-1],
        np.split(csr_mat.indices, csr_mat.indptr)[1:-1],)

def row_col_idx_sparse_lil(arr):
    lil_mat = sp.sparse.lil_matrix(arr)
    return lil_mat.T.rows, lil_mat.rows
For inputs generated using:
def gen_input(n, density=0.1, dtype=np.float32):
    arr = np.zeros(shape=(n, n), dtype=dtype)
    indices = tuple(np.random.randint(0, n, (2, int(n * n * density))).tolist())
    arr[indices] = 1.0
    return arr
One would get (your test_matrix had approximately 0.06 non-zero density):
m = gen_input(4096, density=0.06)
%timeit row_col_idx_sep(m)
# 1 loop, best of 3: 767 ms per loop
%timeit row_col_idx_zip(m)
# 1 loop, best of 3: 660 ms per loop
%timeit row_col_idx_sparse_coo(m)
# 1 loop, best of 3: 205 ms per loop
%timeit row_col_idx_sparse_lil(m)
# 1 loop, best of 3: 498 ms per loop
%timeit row_col_idx_nonzero_nb(m)
# 10 loops, best of 3: 130 ms per loop
This indicates the Numba approach is close to twice as fast as the fastest scipy.sparse-based approach.

In [182]: arr = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])
The data is present in the whole-array nonzero, just not broken up into per row/column arrays:
In [183]: np.nonzero(arr)
Out[183]: (array([1, 2, 2, 3, 3, 3]), array([0, 0, 1, 0, 1, 2]))
In [184]: np.argwhere(arr)
Out[184]:
array([[1, 0],
[2, 0],
[2, 1],
[3, 0],
[3, 1],
[3, 2]])
It might be possible to break the array([1, 2, 2, 3, 3, 3]) into sublists, [1,2,3],[2,3],[3],[] based on the other array. But it may take some time to work out the logic for that, and there's no guarantee that it will be faster than your row/column iterations.
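For instance, a minimal sketch of that splitting, assuming np.nonzero's row-major ordering and using np.searchsorted to find the group boundaries (untimed):
rows, cols = np.nonzero(arr)   # rows come out already sorted (row-major order)
# descs: column indices grouped by row
descs = np.split(cols, np.searchsorted(rows, np.arange(1, arr.shape[0])))
# preds: row indices grouped by column; a stable sort keeps rows ordered within a column
order = np.argsort(cols, kind='mergesort')
preds = np.split(rows[order], np.searchsorted(cols[order], np.arange(1, arr.shape[1])))
On the example above this reproduces the preds and descs from the question.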
Logical operations can reduce the boolean array along either axis, giving the rows or columns where a nonzero occurs, but again not as ragged arrays:
In [185]: arr!=0
Out[185]:
array([[False, False, False, False],
[ True, False, False, False],
[ True, True, False, False],
[ True, True, True, False]])
In [186]: (arr!=0).any(axis=0)
Out[186]: array([ True, True, True, False])
In [187]: np.nonzero((arr!=0).any(axis=0))
Out[187]: (array([0, 1, 2]),)
In [188]: np.nonzero((arr!=0).any(axis=1))
Out[188]: (array([1, 2, 3]),)
In [189]: arr
Out[189]:
array([[0, 0, 0, 0],
[1, 0, 0, 0],
[1, 1, 0, 0],
[1, 1, 1, 0]])
The scipy.sparse lil format does generate the data you want:
In [190]: sparse
Out[190]: <module 'scipy.sparse' from '/usr/local/lib/python3.6/dist-packages/scipy/sparse/__init__.py'>
In [191]: M = sparse.lil_matrix(arr)
In [192]: M
Out[192]:
<4x4 sparse matrix of type '<class 'numpy.longlong'>'
with 6 stored elements in List of Lists format>
In [193]: M.rows
Out[193]: array([list([]), list([0]), list([0, 1]), list([0, 1, 2])], dtype=object)
In [194]: M.T
Out[194]:
<4x4 sparse matrix of type '<class 'numpy.longlong'>'
with 6 stored elements in List of Lists format>
In [195]: M.T.rows
Out[195]: array([list([1, 2, 3]), list([2, 3]), list([3]), list([])], dtype=object)
But timing probably isn't any better than your row or column iteration.

Related

Applying a function to rows of an ndarray && converting an itertools object to a numpy array

I am trying to create permutations of size 4 from a group of real numbers. After that, I'd like to know the position of the first element in a permutation after I sort it. Here is what I have tried so far. What's the best way to do this?
import numpy as np
from itertools import chain, permutations
N_PLAYERS = 4
N_STATES = 60
np.random.seed(0)
state_space = np.linspace(0.0, 1.0, num=N_STATES, retstep=True)[0].tolist()
perms = permutations(state_space, N_PLAYERS)
perms_arr = np.fromiter(chain(*perms),dtype=np.float16)
def loc(row):
    return np.where(np.argsort(row) == 0)[0].tolist()[0]

locs = np.apply_along_axis(loc, 0, perms)
In [153]: N_PLAYERS = 4
...: N_STATES = 60
...: np.random.seed(0)
...: state_space = np.linspace(0.0, 1.0, num=N_STATES, retstep=True)[0].tolist()
...: perms = itertools.permutations(state_space, N_PLAYERS)
In [154]: alist = list(perms)
In [155]: len(alist)
Out[155]: 11703240
Simply making a list from the permutations produces a list of tuples, each of length N_PLAYERS.
Making an array from that with chain flattens it:
In [156]: perms = itertools.permutations(state_space, N_PLAYERS)
In [158]: perms_arr = np.fromiter(itertools.chain(*perms),dtype=np.float16)
In [159]: perms_arr.shape
Out[159]: (46812960,)
Which could be reshaped to (11703240,4).
Using apply on that 1d array doesn't work (or make sense):
In [170]: perms_arr.shape
Out[170]: (46812960,)
In [171]: locs = np.apply_along_axis(loc, 0, perms_arr)
In [172]: locs.shape
Out[172]: ()
Reshape to 4 columns:
In [173]: locs = np.apply_along_axis(loc, 0, perms_arr.reshape(-1,4))
In [174]: locs.shape
Out[174]: (4,)
In [175]: locs
Out[175]: array([ 0, 195054, 578037, 769366])
This applies loc to each column, returning one value for each. But loc has a row variable. Is that supposed to be significant?
I could switch the axis; this takes much longer, and returns one value per row:
In [176]: locs = np.apply_along_axis(loc, 1, perms_arr.reshape(-1,4))
In [177]: locs.shape
Out[177]: (11703240,)
list comprehension
This iteration does the same thing as your apply_along_axis, and I expect it is faster (though I haven't timed it - it's too slow).
In [188]: locs1 = np.array([loc(row) for row in perms_arr.reshape(-1,4)])
In [189]: np.allclose(locs, locs1)
Out[189]: True
whole array sort
But argsort takes an axis, so I can sort all rows at once (instead of iterating):
In [185]: np.nonzero(np.argsort(perms_arr.reshape(-1,4), axis=1)==0)
Out[185]:
(array([ 0, 1, 2, ..., 11703237, 11703238, 11703239]),
array([0, 0, 0, ..., 3, 3, 3]))
In [186]: np.allclose(_[1],locs)
Out[186]: True
Or going the other direction (cf. Out[175]):
In [187]: np.nonzero(np.argsort(perms_arr.reshape(-1,4), axis=0)==0)
Out[187]: (array([ 0, 195054, 578037, 769366]), array([0, 1, 2, 3]))

Removing max and min elements of array from mean calculation

I am hoping to delete the highest number and the lowest number from a 3x4 array. Let's say the data looks like this:
a=np.array([[1,4,5,10],[2,6,5,0],[3,9,9,0]])
so I expected to see the result like this:
deleted_data=[4,5],[2,5],[3]
Could you advise me how to delete the max and min from each array?
To do so, I did this (UPDATE):
#to find out the max / min values:
b = np.max(a,1) #max
c = np.min(a,1) #min
#creating dataset after deleting max & min
d=(a!=b[:,None]) & (a!=c[:,None])
f=[i[j] for i,j in zip(a, d)]
output: [array([8, 7, 7, 9, 9, 8]), array([8, 7, 8, 6, 8, 8]), array([9, 8, 9, 9, 8]), array([6, 7, 7, 6, 6, 7]), array([7, 7, 7, 7, 6])]
Now I am not sure how to calculate the mean of the list objects.
I would like to calculate the mean of each array, so I have tried this:
mean1=f.mean(axis=0)
but it did not work.
Another method is to use a Masked Array
import numpy.ma as ma
mask = np.logical_or(a == a.max(1, keepdims = 1), a == a.min(1, keepdims = 1))
a_masked = ma.masked_array(a, mask = mask)
from there if you want an average of the unmasked elements you can just do
a_masked.mean()
Or you could even do the mean of the rows
a_masked.mean(1).data
or columns (strange, but seems to be what you're asking for)
a_masked.mean(0).data
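A quick check on the question's a (note that both 9s in the last row get masked, since each equals the row max):
import numpy as np
import numpy.ma as ma

a = np.array([[1, 4, 5, 10], [2, 6, 5, 0], [3, 9, 9, 0]])
mask = np.logical_or(a == a.max(1, keepdims = 1), a == a.min(1, keepdims = 1))
a_masked = ma.masked_array(a, mask = mask)
print(a_masked.mean())        # 3.8, the mean of [4, 5, 2, 5, 3]
print(a_masked.mean(1).data)  # [4.5  3.5  3. ]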
A python list has a remove method.
With a utility function we could remove the min and max elements from a row:
def foo(i, j, k):
    il = i.tolist()
    il.remove(j)
    il.remove(k)
    return il
In [230]: [foo(i,j,k) for i,j,k in zip(a,b,c)]
Out[230]: [[4, 5], [2, 5], [3, 9]]
This could be turned back into an array with np.array(...). Note that this removed just one of the 9s in the last row. If it had removed both, the last list would have just 1 value, and the result could not be turned back into a 2d array.
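For example:
np.array([foo(i, j, k) for i, j, k in zip(a, b, c)])
# array([[4, 5],
#        [2, 5],
#        [3, 9]])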
I'm sure we could come up with a pure-array method, possibly using argmax and argmin instead of max and min. But I think the list approach is a better starting point for a Python beginner.
An array masking approach
In [232]: bi = np.argmax(a,1)
In [233]: ci = np.argmin(a,1)
In [234]: bi
Out[234]: array([3, 1, 1], dtype=int32)
In [235]: ci
Out[235]: array([0, 3, 3], dtype=int32)
In [243]: mask = np.ones_like(a, bool)
In [244]: mask[np.arange(3),bi]=False
In [245]: mask[np.arange(3),ci]=False
In [246]: mask
Out[246]:
array([[False, True, True, False],
[ True, False, True, False],
[ True, False, True, False]], dtype=bool)
In [247]: a[mask]
Out[247]: array([4, 5, 2, 5, 3, 9])
In [248]: _.reshape(3,-1)
Out[248]:
array([[4, 5],
[2, 5],
[3, 9]])
Again this is better if we just delete one max and one min from each row.
Another masking approach:
In [257]: (a!=b[:,None]) & (a!=c[:,None])
Out[257]:
array([[False, True, True, False],
[ True, False, True, False],
[ True, False, False, False]], dtype=bool)
In [258]: a[(a!=b[:,None]) & (a!=c[:,None])]
Out[258]: array([4, 5, 2, 5, 3])
This does remove all '9's in the last row. But it does not preserve the row split.
This preserves the row structure, and allows variable lengths:
In [259]: mask=(a!=b[:,None]) & (a!=c[:,None])
In [260]: [i[j] for i,j in zip(a, mask)]
Out[260]: [array([4, 5]), array([2, 5]), array([3])]
As @hpaulj predicted, there is an array-only method. And it's a doozy. As a one-liner:
a[np.arange(a.shape[0])[:, None], np.sort(np.argpartition(a, (0,-1), axis = 1)[:, 1:-1], axis = 1)]
Let's break that down:
y_ = np.argpartition(a, (0,-1), axis = 1)[:, 1:-1]
argpartition takes the indices of the 0th (smallest) and -1th (largest) elements of each row and moves them to the first and last positions respectively. [:, 1:-1] indexes everything else. Now argpartition can sometimes reorder the rest of the elements, so
y = np.sort(y_ , axis = 1)
We sort the rest of the indices back to their original positions. Now we have a y.shape -> (m, n-2) array of indices with the max and min removed, for your original (m, n) = a.shape array.
Now to use this, we need the row indices as well.
x = np.arange(a.shape[0])[:, None]
arange just gives the m row indices. To broadcast this x.shape -> (a.shape[0],) -> (m,) array to your index array, you need the [:, None] to make x.shape -> (m, 1). Now the m lines up for broadcasting and you have your two sets of indices.
a[x, y]
array([[4, 5],
[2, 5],
[3, 9]])
You could get to the final destination, the average of the elements that are not the max or min of each row, in two steps with masking -
In [140]: a # input array
Out[140]:
array([[ 1, 4, 5, 10],
[ 2, 6, 5, 0],
[ 3, 9, 9, 0]])
In [141]: m = (a!=a.min(1,keepdims=1)) & (a!=a.max(1,keepdims=1))
In [142]: (a*m).sum(1)/m.sum(1).astype(float)
Out[142]: array([ 4.5, 3.5, 3. ])
This avoids the mess of creating the intermediate ragged arrays, which aren't the most convenient data formats to operate on with NumPy funcs.
Alternatively, for performance boost, use np.einsum to get the equivalent of (a*m).sum(1) with np.einsum('ij,ij->i',a,m).
Runtime test on bigger array -
In [181]: np.random.seed(0)
In [182]: a = np.random.randint(0,10,(5000,5000))
# @Daniel F's soln from https://stackoverflow.com/a/47325431/
In [183]: %%timeit
...: mask = np.logical_or(a == a.max(1, keepdims = 1), a == a.min(1, keepdims = 1))
...: a_masked = ma.masked_array(a, mask = mask)
...: out = a_masked.mean(1).data
1 loop, best of 3: 251 ms per loop
# Posted in here
In [184]: %%timeit
...: m = (a!=a.min(1,keepdims=1)) & (a!=a.max(1,keepdims=1))
...: out = (a*m).sum(1)/m.sum(1).astype(float)
10 loops, best of 3: 165 ms per loop
# Posted in here with additional einsum
In [185]: %%timeit
...: m = (a!=a.min(1,keepdims=1)) & (a!=a.max(1,keepdims=1))
...: out = np.einsum('ij,ij->i',a,m)/m.sum(1).astype(float)
10 loops, best of 3: 124 ms per loop
If the question is to remove min and/or max elements from a numpy array arr then this is the easiest way in my opinion.
np.delete(arr, np.argmax(arr))
example
tmp = np.random.random(3)
print(tmp)
tmp = np.delete(tmp, np.argmax(tmp))
print(tmp)
returns
[0.7366768 0.65492774 0.93632866]
[0.7366768 0.65492774]

Sum all columns but one in a numpy array

Basically, all columns except a particular one must be summed. I came up with two closely related solutions:
def collapse(arr, i):
    return np.hstack((arr[:,i,None], np.sum(arr[:,[j for j in xrange(arr.shape[1]) if j != i]], axis=1, keepdims=True)))

def collapse_transpose(arr, i):
    return np.vstack((arr[:,i], np.sum(arr[:,[j for j in xrange(arr.shape[1]) if j != i]], axis=1))).T
Example:
In [42]: arr = np.arange(9).reshape(3, 3)
In [43]: arr
Out[43]:
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In [44]: collapse(arr, 0)
Out[44]:
array([[ 0, 3],
[ 3, 9],
[ 6, 15]])
I thought the latter would be faster, but it turned out to be slower. Anyway, I don't like the vstack and hstack calls, since they can be slow on huge inputs. Are there any ways to get rid of them?
You are concatenating just 2 arrays:
In [282]: (arr[:,i], np.sum(arr[:,[j for j in xrange(3) if j != i]], axis=1))
Out[282]: (array([0, 3, 6]), array([ 3,  9, 15]))
In [283]: (arr[:,i,None], np.sum(arr[:,[j for j in xrange(3) if j != i]], axis=1, keepdims=True))
Out[283]:
(array([[0],
[3],
[6]]), array([[ 3],
[ 9],
[15]]))
vstack and hstack both use concatenate. They just work on different axes and massage the inputs in different ways to ensure they have the correct number of dimensions.
It seems to me that the versions are basically equivalent. You could call concatenate directly, which might shave a % or two off. But the concatenation isn't the biggest time consumer in this case.
np.concatenate((arr[:,i,None],
                np.sum(arr[:,[j for j in xrange(3) if j != i]], axis=1, keepdims=True)),
               axis=1)
Beyond that, look at the timing for the individual pieces. Is np.sum(arr[:,[j for j in xrange(3) if j != i]], axis=1) as fast as it could be? How about summing all the columns, and subtracting the ith column?
In [310]: timeit arr.sum(1)-arr[:,i]
10000 loops, best of 3: 22.7 us per loop
In [311]: timeit np.sum(arr[:,[j for j in xrange(3) if j != i]], axis=1)
10000 loops, best of 3: 29.1 us per loop
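Putting that trick into the same shape as collapse() - a minimal sketch (collapse_subtract is a hypothetical name, not from the question):
def collapse_subtract(arr, i):
    # column i next to the sum of all the other columns (total minus column i)
    rest = arr.sum(axis=1) - arr[:, i]
    return np.column_stack((arr[:, i], rest))

collapse_subtract(np.arange(9).reshape(3, 3), 0)
# array([[ 0,  3],
#        [ 3,  9],
#        [ 6, 15]])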

Set duplicate elements as zeros

How can I convert the duplicate elements in an array data into 0? It has to be done row-wise.
data = np.array([[1,8,3,3,4],
                 [1,8,9,9,4]])
The answer should be as follows:
ans = array([[1,8,3,0,4],
             [1,8,9,0,4]])
Approach #1
One approach with np.unique -
# Find out the unique elements and their starting positions
unq_data, idx = np.unique(data,return_index=True)
# Find out the positions for each unique element, their duplicate positions
dup_idx = np.setdiff1d(np.arange(data.size),idx)
# Set those duplicate positioned elements to 0s
data[dup_idx] = 0
Sample run -
In [46]: data
Out[46]: array([1, 8, 3, 3, 4, 1, 3, 3, 9, 4])
In [47]: unq_data, idx = np.unique(data,return_index=True)
...: dup_idx = np.setdiff1d(np.arange(data.size),idx)
...: data[dup_idx] = 0
...:
In [48]: data
Out[48]: array([1, 8, 3, 0, 4, 0, 0, 0, 9, 0])
Approach #2
You can also use sorting and differencing as a faster approach -
# Get indices for sorted data
sort_idx = np.argsort(data)
# Get duplicate indices and set those in data to 0s
dup_idx = sort_idx[1::][np.diff(np.sort(data))==0]
data[dup_idx] = 0
Runtime tests -
In [110]: data = np.random.randint(0,100,(10000))
...: data1 = data.copy()
...: data2 = data.copy()
...:
In [111]: def func1(data):
...:     unq_data, idx = np.unique(data,return_index=True)
...:     dup_idx = np.setdiff1d(np.arange(data.size),idx)
...:     data[dup_idx] = 0
...:
...: def func2(data):
...:     sort_idx = np.argsort(data)
...:     dup_idx = sort_idx[1::][np.diff(np.sort(data))==0]
...:     data[dup_idx] = 0
...:
...:
In [112]: %timeit func1(data1)
1000 loops, best of 3: 1.36 ms per loop
In [113]: %timeit func2(data2)
1000 loops, best of 3: 467 µs per loop
Extending to a 2D case:
Approach #2 could be extended to work for a 2D array case, avoiding any loop like so -
# Get indices for sorted data
sort_idx = np.argsort(data,axis=1)
# Get sorted linear indices
row_offset = data.shape[1]*np.arange(data.shape[0])[:,None]
sort_lin_idx = sort_idx[:,1::] + row_offset
# Get duplicate linear indices and set those in data as 0s
dup_lin_idx = sort_lin_idx[np.diff(np.sort(data,axis=1),axis=1)==0]
data.ravel()[dup_lin_idx] = 0
Sample run -
In [6]: data
Out[6]:
array([[1, 8, 3, 3, 4, 0, 3, 3],
[1, 8, 9, 9, 4, 8, 7, 9],
[1, 8, 9, 9, 4, 8, 7, 3]])
In [7]: sort_idx = np.argsort(data,axis=1)
...: row_offset = data.shape[1]*np.arange(data.shape[0])[:,None]
...: sort_lin_idx = sort_idx[:,1::] + row_offset
...: dup_lin_idx = sort_lin_idx[np.diff(np.sort(data,axis=1),axis=1)==0]
...: data.ravel()[dup_lin_idx] = 0
...:
In [8]: data
Out[8]:
array([[1, 8, 3, 0, 4, 0, 0, 0],
[1, 8, 9, 0, 4, 0, 7, 0],
[1, 8, 9, 0, 4, 0, 7, 3]])
Here's a simple pure-Python way to do it:
seen = set()
for i, x in enumerate(data):
    if x in seen:
        data[i] = 0
    else:
        seen.add(x)
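The question's data is 2D and row-wise, though; the same idea applies per row (iterating a 2D array yields row views, so the assignment modifies data in place):
for row in data:
    seen = set()
    for i, x in enumerate(row):
        if x in seen:
            row[i] = 0
        else:
            seen.add(x)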
You could use a nested for loop, where you compare each element of the array to every other element to check for duplicate records. Syntax might be a bit off as I am not really familiar with numpy.
for x in range(0, len(data)):
    for y in range(x+1, len(data)):
        if data[x] == data[y]:
            data[y] = 0
@Divakar has it almost right, but there are a few things that can be further optimized which don't really fit in a comment. To begin:
rows, cols = data.shape
The first operation is to sort the array to identify the duplicates. Since we will want to undo the sorting, we need to use np.argsort, but if you want to make sure that it is the first occurrence of each repeated value that is kept, you need to use a stable sorting algorithm:
sort_idx = data.argsort(axis=1, kind='mergesort')
Once we have the indices to sort data, to get a sorted copy of the array it is faster to use the indices than to re-sort the array:
sorted_data = data[np.arange(rows)[:, None], sort_idx]
While the principle is similar to that in using np.diff, it is typically faster to use boolean operations. We want an array full of False where the first occurrences of each value happen, and True where the duplicates are:
sorted_mask = np.concatenate((np.zeros((rows, 1), dtype=bool),
                              sorted_data[:, :-1] == sorted_data[:, 1:]),
                             axis=1)
We now use that mask to set all the duplicates to zero:
sorted_data[sorted_mask] = 0
And we finally undo the sorting. To revert a permutation you can sort the indices that define it, i.e. you could do:
invert_idx = sort_idx.argsort(axis=1, kind='mergesort')
ans = sorted_data[np.arange(rows)[:, None], invert_idx]
But it is more efficient to use assignment, i.e.:
ans = np.empty_like(data)
ans[np.arange(rows)[:, None], sort_idx] = sorted_data
Putting it all together:
def zero_dups(data):
    rows, cols = data.shape
    sort_idx = data.argsort(axis=1, kind='mergesort')
    sorted_data = data[np.arange(rows)[:, None], sort_idx]
    sorted_mask = np.concatenate((np.zeros((rows, 1), dtype=bool),
                                  sorted_data[:, :-1] == sorted_data[:, 1:]),
                                 axis=1)
    sorted_data[sorted_mask] = 0
    ans = np.empty_like(data)
    ans[np.arange(rows)[:, None], sort_idx] = sorted_data
    return ans
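A quick check on the question's sample input:
data = np.array([[1, 8, 3, 3, 4],
                 [1, 8, 9, 9, 4]])
zero_dups(data)
# array([[1, 8, 3, 0, 4],
#        [1, 8, 9, 0, 4]])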

Numpy quirk: Apply function to all pairs of two 1D arrays, to get one 2D array

Let's say I have 2 one-dimensional (1D) numpy arrays, a and b, with lengths n1 and n2 respectively. I also have a function, F(x,y), that takes two values. Now I want to apply that function to each pair of values from my two 1D arrays, so the result would be a 2D numpy array with shape n1, n2. The i, j element of the two-dimensional array would be F(a[i], b[j]).
I haven't been able to find a way of doing this without a horrible amount of for-loops, and I'm sure there's a much simpler (and faster!) way of doing this in numpy.
Thanks in advance!
You can use numpy broadcasting to do the calculation on the two arrays, turning a into a vertical 2D array with newaxis:
In [11]: a = np.array([1, 2, 3]) # n1 = 3
...: b = np.array([4, 5]) # n2 = 2
...: #if function is c(i, j) = a(i) + b(j)*2:
...: c = a[:, None] + b*2
In [12]: c
Out[12]:
array([[ 9, 11],
[10, 12],
[11, 13]])
To benchmark (with f(i, j) = i + j*2, the same calculation as c above, written as a plain Python function):
In [28]: a = np.arange(100)
In [29]: b = np.arange(222)
In [30]: timeit r = np.array([[f(i, j) for j in b] for i in a])
10 loops, best of 3: 29.9 ms per loop
In [31]: timeit c = a[:, None] + b*2
10000 loops, best of 3: 71.6 us per loop
If F is beyond your control, you can wrap it automatically to be "vector-aware" by using numpy.vectorize. I present a working example below where I define my own F just for completeness. This approach has the advantage of simplicity, but if you have control over F, rewriting it with a bit of care to vectorize correctly can have huge speed benefits.
import numpy
n1 = 100
n2 = 200
a = numpy.arange(n1)
b = numpy.arange(n2)
def F(x, y):
    return x + y
# Everything above this is setup, the answer to your question lies here:
fv = numpy.vectorize(F)
r = fv(a[:, numpy.newaxis], b)
On my computer, the following timings are found, showing the price you pay for "automatic" vectorisation:
%timeit fv(a[:, numpy.newaxis], b)
100 loops, best of 3: 3.58 ms per loop
%timeit F(a[:, numpy.newaxis], b)
10000 loops, best of 3: 38.3 µs per loop
If F() works with broadcast arguments, definitely use that, as others describe.
An alternative is to use np.fromfunction (function_on_an_int_grid would be a better name).
The following just maps the int grid to your a-b grid, then into F():
import numpy as np
def func_allpairs( F, a, b ):
    """ -> array len(a) x len(b):
        [[ F( a0 b0 ) F( a0 b1 ) ... ]
         [ F( a1 b0 ) F( a1 b1 ) ... ]
         ...
        ]
    """
    def fab( i, j ):
        return F( a[i], b[j] )  # F scalar or vec, e.g. gradient
    return np.fromfunction( fab, (len(a), len(b)), dtype=int )  # -> fab( all pairs )
#...............................................................................
def F( x, y ):
    return x + 10*y
a = np.arange( 100 )
b = np.arange( 222 )
A = func_allpairs( F, a, b )
# %timeit: 1000 loops, best of 3: 241 µs per loop -- imac i5, np 1.9.3
As another alternative that's a bit more extensible than the dot-product, in less than 1/5th - 1/9th the time of nested list comprehensions, use numpy.newaxis (took a bit more digging to find):
>>> import numpy
>>> a = numpy.array([0,1,2])
>>> b = numpy.array([0,1,2,3])
This time, using the power function:
>>> pow(a[:,numpy.newaxis], b)
array([[1, 0, 0, 0],
[1, 1, 1, 1],
[1, 2, 4, 8]])
Compared with an alternative:
>>> numpy.array([[pow(i,j) for j in b] for i in a])
array([[1, 0, 0, 0],
[1, 1, 1, 1],
[1, 2, 4, 8]])
And comparing the timing:
>>> import timeit
>>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
31.943181037902832
>>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
5.985810041427612
>>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
109.74687385559082
>>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
11.989138126373291
You could use list comprehensions to create an array of arrays:
import numpy as np
# Arrays
a = np.array([1, 2, 3]) # n1 = 3
b = np.array([4, 5]) # n2 = 2
# Your function (just an example)
def f(i, j):
    return i + j

result = np.array([[f(i, j) for j in b] for i in a])
print result
Output:
[[5 6]
[6 7]
[7 8]]
May I suggest, if your use-case is more limited to products, that you use the outer-product?
e.g.:
import numpy
a = numpy.array([0, 1, 2])
b = numpy.array([0, 1, 2, 3])
numpy.outer(a,b)
returns
array([[0, 0, 0, 0],
[0, 1, 2, 3],
[0, 2, 4, 6]])
You can then apply other transformations:
numpy.outer(a,b) + 1
returns
array([[1, 1, 1, 1],
[1, 2, 3, 4],
[1, 3, 5, 7]])
This is much faster:
>>> import timeit
>>> timeit.timeit('numpy.array([[i*j for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
31.79583477973938
>>> timeit.timeit('numpy.outer(a,b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
9.351550102233887
>>> timeit.timeit('numpy.outer(a,b)+1', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
12.308301210403442
