Sum all columns but one in a numpy array - python

Basically, all columns except a particular one must be summed. I came up with two closely related solutions:
def collapse(arr, i):
    return np.hstack((arr[:,i,None], np.sum(arr[:,[j for j in xrange(arr.shape[1]) if j != i]], axis=1, keepdims=True)))

def collapse_transpose(arr, i):
    return np.vstack((arr[:,i], np.sum(arr[:,[j for j in xrange(arr.shape[1]) if j != i]], axis=1))).T
Example:
In [42]: arr = np.arange(9).reshape(3, 3)
In [43]: arr
Out[43]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [44]: collapse(arr, 0)
Out[44]:
array([[ 0,  3],
       [ 3,  9],
       [ 6, 15]])
I thought the latter would be faster, but it turned out to be slower. Anyway, I don't like the vstack and hstack calls, since they can be slow on huge inputs. Are there any ways to get rid of them?

You are concatenating just 2 arrays:
In [282]: (arr[:,i], np.sum(arr[:,[j for j in xrange(3) if j != i]], axis=1))
Out[282]: (array([0, 3, 6]), array([ 3,  9, 15]))
In [283]: (arr[:,i,None], np.sum(arr[:,[j for j in xrange(3) if j != i]], axis=1, keepdims=True))
Out[283]:
(array([[0],
        [3],
        [6]]),
 array([[ 3],
        [ 9],
        [15]]))
vstack and hstack both use concatenate. They just work on different axes and massage the inputs in different ways to ensure they have the correct number of dimensions.
It seems to me that the versions are basically equivalent. You could call concatenate directly, which might shave a % or two off. But the concatenation isn't the biggest time consumer in this case.
np.concatenate((arr[:,i,None],
                np.sum(arr[:,[j for j in xrange(3) if j != i]], axis=1, keepdims=True)),
               axis=1)
Beyond that, look at the timing for the individual pieces. Is arr[:,[j for j in xrange(3) if j != i]] as fast as it could be? How about summing it all and subtracting the ith column?
In [310]: timeit arr.sum(1)-arr[:,i]
10000 loops, best of 3: 22.7 us per loop
In [311]: timeit np.sum(arr[:,[j for j in xrange(3) if j != i]], axis=1)
10000 loops, best of 3: 29.1 us per loop
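Putting the sum-and-subtract idea together, here is a loop-free sketch (the name collapse_fast is mine, and it assumes numeric input so the subtraction is exact):
import numpy as np

def collapse_fast(arr, i):
    # sum of all columns except i == sum of all columns minus column i
    rest = arr.sum(axis=1) - arr[:, i]
    return np.column_stack((arr[:, i], rest))

arr = np.arange(9).reshape(3, 3)
print(collapse_fast(arr, 0))
# [[ 0  3]
#  [ 3  9]
#  [ 6 15]]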

Related

Fast nonzero indices per row/column for (sparse) 2D numpy array

I am looking for the fastest way to obtain a list of the nonzero indices of a 2D array per row and per column. The following is a working piece of code:
preds = [matrix[:,v].nonzero()[0] for v in range(matrix.shape[1])]
descs = [matrix[v].nonzero()[0] for v in range(matrix.shape[0])]
Example input:
matrix = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])
Example output:
preds = [array([1, 2, 3]), array([2, 3]), array([3]), array([], dtype=int64)]
descs = [array([], dtype=int64), array([0]), array([0, 1]), array([0, 1, 2])]
(The lists are called preds and descs because they refer to the predecessors and descendants in a DAG when the matrix is interpreted as an adjacency matrix but this is not essential to the question.)
Timing example:
For timing purposes, the following matrix is a good representative:
test_matrix = np.zeros(shape=(4096,4096),dtype=np.float32)
for k in range(16):
    test_matrix[256*(k+1):256*(k+2), 256*k:256*(k+1)] = 1
Background: In my code, these two lines take 75% of the time for a 4000x4000 matrix, whereas the ensuing topological sort and DP algorithm take only the remaining quarter. Roughly 5% of the values in the matrix are nonzero, so a sparse-matrix solution may be applicable.
Thank you.
(On suggestion, this is posted here as well: https://scicomp.stackexchange.com/questions/35242/fast-nonzero-indices-per-row-column-for-sparse-2d-numpy-array
There are also answers there, to which I will provide timings in the comments. That link contains an accepted answer that is twice as fast.)
If you have enough motivation, Numba can do amazing things.
Here is a quick implementation of the logic you need.
Briefly, it computes the equivalent of np.nonzero() but it includes along the way the information to later dispatch the indices into the format you require.
The information is inspired by sparse.csr.indptr and sparse.csc.indptr.
import numpy as np
import numba as nb

@nb.jit
def cumsum(arr):
    result = np.empty_like(arr)
    cumsum = result[0] = arr[0]
    for i in range(1, len(arr)):
        cumsum += arr[i]
        result[i] = cumsum
    return result

@nb.jit
def count_nonzero(arr):
    arr = arr.ravel()
    n = 0
    for x in arr:
        if x != 0:
            n += 1
    return n

@nb.jit
def row_col_nonzero_nb(arr):
    n, m = arr.shape
    max_k = count_nonzero(arr)
    indices = np.empty((2, max_k), dtype=np.uint32)
    i_offset = np.zeros(n + 1, dtype=np.uint32)
    j_offset = np.zeros(m + 1, dtype=np.uint32)
    k = 0
    for i in range(n):
        for j in range(m):
            if arr[i, j] != 0:
                indices[:, k] = i, j
                i_offset[i + 1] += 1
                j_offset[j + 1] += 1
                k += 1
    return indices, cumsum(i_offset), cumsum(j_offset)

def row_col_idx_nonzero_nb(arr):
    (ii, jj), jj_split, ii_split = row_col_nonzero_nb(arr)
    ii_ = np.argsort(jj)
    ii = ii[ii_]
    return np.split(ii, ii_split[1:-1]), np.split(jj, jj_split[1:-1])
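A quick sanity check on the example matrix from the question (assuming numba is installed; the first call also pays the JIT compile cost):
matrix = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])
preds, descs = row_col_idx_nonzero_nb(matrix)
print(preds)  # [array([1, 2, 3]), array([2, 3]), array([3]), array([])]  (uint32)
print(descs)  # [array([]), array([0]), array([0, 1]), array([0, 1, 2])]  (uint32)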
Compared to your approach (row_col_idx_sep() below), and a bunch of others, as per @hpaulj's answer (row_col_idx_sparse_lil()) and @knl's answer from scicomp.stackexchange.com (row_col_idx_sparse_coo()):
def row_col_idx_sep(arr):
    return (
        [arr[:, j].nonzero()[0] for j in range(arr.shape[1])],
        [arr[i, :].nonzero()[0] for i in range(arr.shape[0])],)

def row_col_idx_zip(arr):
    n, m = arr.shape
    ii = [[] for _ in range(n)]
    jj = [[] for _ in range(m)]
    x, y = np.nonzero(arr)
    for i, j in zip(x, y):
        ii[i].append(j)
        jj[j].append(i)
    return jj, ii

import scipy as sp
import scipy.sparse

def row_col_idx_sparse_coo(arr):
    coo_mat = sp.sparse.coo_matrix(arr)
    csr_mat = coo_mat.tocsr()
    csc_mat = coo_mat.tocsc()
    return (
        np.split(csc_mat.indices, csc_mat.indptr)[1:-1],
        np.split(csr_mat.indices, csr_mat.indptr)[1:-1],)

def row_col_idx_sparse_lil(arr):
    lil_mat = sp.sparse.lil_matrix(arr)
    return lil_mat.T.rows, lil_mat.rows
For inputs generated using:
def gen_input(n, density=0.1, dtype=np.float32):
    arr = np.zeros(shape=(n, n), dtype=dtype)
    indices = tuple(np.random.randint(0, n, (2, int(n * n * density))).tolist())
    arr[indices] = 1.0
    return arr
One would get (your test_matrix had approximately 0.06 non-zero density):
m = gen_input(4096, density=0.06)
%timeit row_col_idx_sep(m)
# 1 loop, best of 3: 767 ms per loop
%timeit row_col_idx_zip(m)
# 1 loop, best of 3: 660 ms per loop
%timeit row_col_idx_sparse_coo(m)
# 1 loop, best of 3: 205 ms per loop
%timeit row_col_idx_sparse_lil(m)
# 1 loop, best of 3: 498 ms per loop
%timeit row_col_idx_nonzero_nb(m)
# 10 loops, best of 3: 130 ms per loop
Indicating this to be close to twice as fast as the fastest scipy.sparse-based approach.
In [182]: arr = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])
The data is present in the whole-array nonzero, just not broken up into per row/column arrays:
In [183]: np.nonzero(arr)
Out[183]: (array([1, 2, 2, 3, 3, 3]), array([0, 0, 1, 0, 1, 2]))
In [184]: np.argwhere(arr)
Out[184]:
array([[1, 0],
       [2, 0],
       [2, 1],
       [3, 0],
       [3, 1],
       [3, 2]])
It might be possible to break the array([1, 2, 2, 3, 3, 3]) into sublists, [1,2,3],[2,3],[3],[] based on the other array. But it may take some time to work out the logic for that, and there's no guarantee that it will be faster than your row/column iterations.
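Here is one hedged sketch of that logic, grouping the np.nonzero output with np.bincount and np.split (the stable argsort kind needs NumPy 1.15+):
import numpy as np

arr = np.array([[0,0,0,0],[1,0,0,0],[1,1,0,0],[1,1,1,0]])
rows, cols = np.nonzero(arr)  # rows come out already sorted

# descs: split the column indices at each row boundary
row_counts = np.bincount(rows, minlength=arr.shape[0])
descs = np.split(cols, np.cumsum(row_counts)[:-1])

# preds: reorder by column (stable, so rows stay sorted), then split
order = np.argsort(cols, kind='stable')
col_counts = np.bincount(cols, minlength=arr.shape[1])
preds = np.split(rows[order], np.cumsum(col_counts)[:-1])

print(preds)  # [array([1, 2, 3]), array([2, 3]), array([3]), array([])]
print(descs)  # [array([]), array([0]), array([0, 1]), array([0, 1, 2])]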
Logical operations can reduce the boolean array to column or row, giving the rows or columns where nonzero occurs, but again not ragged:
In [185]: arr!=0
Out[185]:
array([[False, False, False, False],
       [ True, False, False, False],
       [ True,  True, False, False],
       [ True,  True,  True, False]])
In [186]: (arr!=0).any(axis=0)
Out[186]: array([ True, True, True, False])
In [187]: np.nonzero((arr!=0).any(axis=0))
Out[187]: (array([0, 1, 2]),)
In [188]: np.nonzero((arr!=0).any(axis=1))
Out[188]: (array([1, 2, 3]),)
In [189]: arr
Out[189]:
array([[0, 0, 0, 0],
       [1, 0, 0, 0],
       [1, 1, 0, 0],
       [1, 1, 1, 0]])
The scipy.sparse lil format does generate the data you want:
In [190]: sparse
Out[190]: <module 'scipy.sparse' from '/usr/local/lib/python3.6/dist-packages/scipy/sparse/__init__.py'>
In [191]: M = sparse.lil_matrix(arr)
In [192]: M
Out[192]:
<4x4 sparse matrix of type '<class 'numpy.longlong'>'
with 6 stored elements in List of Lists format>
In [193]: M.rows
Out[193]: array([list([]), list([0]), list([0, 1]), list([0, 1, 2])], dtype=object)
In [194]: M.T
Out[194]:
<4x4 sparse matrix of type '<class 'numpy.longlong'>'
with 6 stored elements in List of Lists format>
In [195]: M.T.rows
Out[195]: array([list([1, 2, 3]), list([2, 3]), list([3]), list([])], dtype=object)
But timing probably isn't any better than your row or column iteration.

Removing max and min elements of array from mean calculation

I am hoping to delete the highest number and the lowest number from each row of a 3×4 array. Let's say the data looks like this:
a=np.array([[1,4,5,10],[2,6,5,0],[3,9,9,0]])
so I expected to see the result like this:
deleted_data=[4,5],[2,5],[3]
Could you advise me how to delete the max and min from each row?
To do so, I did this (UPDATE):
#to find out the max / min values:
b = np.max(a,1) #max
c = np.min(a,1) #min
#creating dataset after deleting max & min
d=(a!=b[:,None]) & (a!=c[:,None])
f=[i[j] for i,j in zip(a, d)]
output: [array([8, 7, 7, 9, 9, 8]), array([8, 7, 8, 6, 8, 8]), array([9, 8, 9, 9, 8]), array([6, 7, 7, 6, 6, 7]), array([7, 7, 7, 7, 6])]
Now I am not sure how to calculate the mean of the list objects?
I would like to calculate the mean of each array, so I have tried this:
mean1=f.mean(axis=0)
but it did not work.
Another method is to use a Masked Array
import numpy.ma as ma
mask = np.logical_or(a == a.max(1, keepdims = 1), a == a.min(1, keepdims = 1))
a_masked = ma.masked_array(a, mask = mask)
from there if you want an average of the unmasked elements you can just do
a_masked.mean()
Or you could even do the mean of the rows
a_masked.mean(1).data
or columns (strange, but seems to be what you're asking for)
a_masked.mean(0).data
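For the 3×4 a from the question, a quick check of the row means (note both 9s get masked in the last row, so only the 3 survives):
import numpy as np
import numpy.ma as ma

a = np.array([[1, 4, 5, 10], [2, 6, 5, 0], [3, 9, 9, 0]])
mask = np.logical_or(a == a.max(1, keepdims=1), a == a.min(1, keepdims=1))
a_masked = ma.masked_array(a, mask=mask)
print(a_masked.mean(1).data)  # [4.5 3.5 3. ]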
A python list has a remove method.
With a utility function we could remove the min and max elements from a row:
def foo(i,j,k):
    il = i.tolist()
    il.remove(j)
    il.remove(k)
    return il
In [230]: [foo(i,j,k) for i,j,k in zip(a,b,c)]
Out[230]: [[4, 5], [2, 5], [3, 9]]
This could be turned back into an array with np.array(...). Note that this removed just one of the 9s in the last row. If it had removed both, the last list would have just 1 value, and the result could not be turned back into a 2d array.
I'm sure we could come up with a pure-array method, possibly using argmax and argmin instead of max and min. But I think the list approach is a better starting point for a Python beginner.
An array masking approach
In [232]: bi = np.argmax(a,1)
In [233]: ci = np.argmin(a,1)
In [234]: bi
Out[234]: array([3, 1, 1], dtype=int32)
In [235]: ci
Out[235]: array([0, 3, 3], dtype=int32)
In [243]: mask = np.ones_like(a, bool)
In [244]: mask[np.arange(3),bi]=False
In [245]: mask[np.arange(3),ci]=False
In [246]: mask
Out[246]:
array([[False,  True,  True, False],
       [ True, False,  True, False],
       [ True, False,  True, False]], dtype=bool)
In [247]: a[mask]
Out[247]: array([4, 5, 2, 5, 3, 9])
In [248]: _.reshape(3,-1)
Out[248]:
array([[4, 5],
       [2, 5],
       [3, 9]])
Again, this works because exactly one max and one min are deleted from each row, so every row keeps the same length and the reshape is valid.
Another masking approach:
In [257]: (a!=b[:,None]) & (a!=c[:,None])
Out[257]:
array([[False,  True,  True, False],
       [ True, False,  True, False],
       [ True, False, False, False]], dtype=bool)
In [258]: a[(a!=b[:,None]) & (a!=c[:,None])]
Out[258]: array([4, 5, 2, 5, 3])
This does remove all '9's in the last row. But it does not preserve the row split.
This preserves the row structure, and allows variable lengths:
In [259]: mask=(a!=b[:,None]) & (a!=c[:,None])
In [260]: [i[j] for i,j in zip(a, mask)]
Out[260]: [array([4, 5]), array([2, 5]), array([3])]
As @hpaulj predicted, there is an array-only method. And it's a doozy. As a one-liner:
a[np.arange(a.shape[0])[:, None], np.sort(np.argpartition(a, (0,-1), axis = 1)[:, 1:-1], axis = 1)]
Let's break that down:
y_ = np.argpartition(a, (0,-1), axis = 1)[:, 1:-1]
argpartition takes the indices of the 0th (smallest) and -1th (largest) elements of each row and moves them to the first and last position respectively. [:, 1:-1] indexes everything else. Now argpartition can sometimes reorder the rest of the elements, so
y = np.sort(y_ , axis = 1)
We sort the rest of the indices back to their original positions. Now we have a y.shape -> (m, n-2) array of indices with the max and min removed, for your original (m, n) = a.shape array.
Now to use this, we need the row indices as well.
x = np.arange(a.shape[0])[:, None]
arange just gives the m row indices. To broadcast this x.shape -> (a.shape[0],) -> (m,) array to your index array, you need the [:, None] to make x.shape -> (m, 1). Now the m lines up for broadcasting and you have your two sets of indices.
a[x, y]
array([[4, 5],
       [2, 5],
       [3, 9]])
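Assembled into one reusable function (a sketch; the name remove_min_max is mine):
import numpy as np

def remove_min_max(a):
    # indices of everything but each row's min and max, restored to row order
    y = np.sort(np.argpartition(a, (0, -1), axis=1)[:, 1:-1], axis=1)
    x = np.arange(a.shape[0])[:, None]
    return a[x, y]

a = np.array([[1, 4, 5, 10], [2, 6, 5, 0], [3, 9, 9, 0]])
print(remove_min_max(a))
# [[4 5]
#  [2 5]
#  [3 9]]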
You could get to the final destination of average of elements that are not the max or min per row in two steps with masking -
In [140]: a # input array
Out[140]:
array([[ 1,  4,  5, 10],
       [ 2,  6,  5,  0],
       [ 3,  9,  9,  0]])
In [141]: m = (a!=a.min(1,keepdims=1)) & (a!=a.max(1,keepdims=1))
In [142]: (a*m).sum(1)/m.sum(1).astype(float)
Out[142]: array([ 4.5, 3.5, 3. ])
This avoids the mess of creating intermediate ragged arrays, which aren't the most convenient data format to operate on with NumPy funcs.
Alternatively, for performance boost, use np.einsum to get the equivalent of (a*m).sum(1) with np.einsum('ij,ij->i',a,m).
Runtime test on bigger array -
In [181]: np.random.seed(0)
In [182]: a = np.random.randint(0,10,(5000,5000))
# @Daniel F's soln from https://stackoverflow.com/a/47325431/
In [183]: %%timeit
...: mask = np.logical_or(a == a.max(1, keepdims = 1), a == a.min(1, keepdims = 1))
...: a_masked = ma.masked_array(a, mask = mask)
...: out = a_masked.mean(1).data
1 loop, best of 3: 251 ms per loop
# Posted in here
In [184]: %%timeit
...: m = (a!=a.min(1,keepdims=1)) & (a!=a.max(1,keepdims=1))
...: out = (a*m).sum(1)/m.sum(1).astype(float)
10 loops, best of 3: 165 ms per loop
# Posted in here with additional einsum
In [185]: %%timeit
...: m = (a!=a.min(1,keepdims=1)) & (a!=a.max(1,keepdims=1))
...: out = np.einsum('ij,ij->i',a,m)/m.sum(1).astype(float)
10 loops, best of 3: 124 ms per loop
If the question is to remove min and/or max elements from a numpy array arr then this is the easiest way in my opinion.
np.delete(arr, np.argmax(arr))
example
tmp = np.random.random(3)
print(tmp)
tmp = np.delete(tmp, np.argmax(tmp))
print(tmp)
returns
[0.7366768 0.65492774 0.93632866]
[0.7366768 0.65492774]
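Note that np.delete with np.argmax removes only the first occurrence of the maximum. Removing one min and one max at once is a similar one-liner (a sketch):
import numpy as np

arr = np.array([3.0, 9.0, 9.0, 0.0])
trimmed = np.delete(arr, [np.argmax(arr), np.argmin(arr)])
print(trimmed)  # [3. 9.]  -- only the first 9 and the single 0 are removed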

Create a new array with timesteps and multiple features, e.g. for LSTM

Hi, I am using numpy to create a new array with timesteps and multiple features, for an LSTM.
I have looked at a number of approaches using strides and reshaping, but haven't managed to find an efficient solution.
Here is a function that solves a toy problem; however, I have 30,000 samples, each with 100 features.
def make_timesteps(a, timesteps):
    array = []
    for j in np.arange(len(a)):
        unit = []
        for i in range(timesteps):
            unit.append(np.roll(a, i, axis=0)[j])
        array.append(unit)
    return np.array(array)
inArr = np.array([[1, 2], [3,4], [5,6]])
inArr.shape => (3, 2)
outArr = make_timesteps(inArr, 2)
outArr.shape => (3, 2, 2)
assert(np.array_equal(outArr,
       np.array([[[1, 2], [3, 4]], [[3, 4], [5, 6]], [[5, 6], [1, 2]]])))
=> True
Is there a more efficient way of doing this (there must be!)? Can someone please help?
One trick would be to take the last L-1 rows of the array and append them to the start. Then, it's a simple case of using the very efficient NumPy strides. For people wondering about the cost of this trick, as we will see later on through the timing tests, it's as good as nothing.
The trick leading upto the final goal that would support both forward and backward striding in codes would look something like this -
Backward striding :
def strided_axis0_backward(inArr, L = 2):
    # INPUTS :
    # inArr : Input array
    # L : Length along rows to be cut to create per subarray

    # Append the last L-1 rows to the start. It just helps in keeping a view output.
    a = np.vstack(( inArr[-L+1:], inArr ))

    # Store shape and strides info
    m,n = a.shape
    s0,s1 = a.strides

    # Length of 3D output array along its axis=0
    nd0 = m - L + 1

    strided = np.lib.stride_tricks.as_strided
    return strided(a[L-1:], shape=(nd0,L,n), strides=(s0,-s0,s1))
Forward striding :
def strided_axis0_forward(inArr, L = 2):
    # INPUTS :
    # inArr : Input array
    # L : Length along rows to be cut to create per subarray

    # Append the first L-1 rows to the end. It just helps in keeping a view output.
    a = np.vstack(( inArr , inArr[:L-1] ))

    # Store shape and strides info
    m,n = a.shape
    s0,s1 = a.strides

    # Length of 3D output array along its axis=0
    nd0 = m - L + 1

    strided = np.lib.stride_tricks.as_strided
    return strided(a[:L-1], shape=(nd0,L,n), strides=(s0,s0,s1))
Sample run -
In [42]: inArr
Out[42]:
array([[1, 2],
       [3, 4],
       [5, 6]])
In [43]: strided_axis0_backward(inArr, 2)
Out[43]:
array([[[1, 2],
        [5, 6]],

       [[3, 4],
        [1, 2]],

       [[5, 6],
        [3, 4]]])
In [44]: strided_axis0_forward(inArr, 2)
Out[44]:
array([[[1, 2],
        [3, 4]],

       [[3, 4],
        [5, 6]],

       [[5, 6],
        [1, 2]]])
Runtime test -
In [53]: inArr = np.random.randint(0,9,(1000,10))
In [54]: %timeit make_timesteps(inArr, 2)
...: %timeit strided_axis0_forward(inArr, 2)
...: %timeit strided_axis0_backward(inArr, 2)
...:
10 loops, best of 3: 33.9 ms per loop
100000 loops, best of 3: 12.1 µs per loop
100000 loops, best of 3: 12.2 µs per loop
In [55]: %timeit make_timesteps(inArr, 10)
...: %timeit strided_axis0_forward(inArr, 10)
...: %timeit strided_axis0_backward(inArr, 10)
...:
1 loops, best of 3: 152 ms per loop
100000 loops, best of 3: 12 µs per loop
100000 loops, best of 3: 12.1 µs per loop
In [56]: 152000/12.1 # Speedup figure
Out[56]: 12561.98347107438
The timings of strided_axis0 stay the same even as we increase the length of the subarrays in the output. That just goes to show the massive benefit of strides, and of course the crazy speedup over the original loopy version.
As promised at the start, here are the timings on the stacking cost with np.vstack -
In [417]: inArr = np.random.randint(0,9,(1000,10))
In [418]: L = 10
In [419]: %timeit np.vstack(( inArr[-L+1:], inArr ))
100000 loops, best of 3: 5.41 µs per loop
The timings support the idea of stacking to be a pretty efficient one.
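As a side note, on newer NumPy (1.20+) the same view can be had without writing strides by hand, via np.lib.stride_tricks.sliding_window_view; a sketch of the forward case under that version assumption:
import numpy as np

inArr = np.array([[1, 2], [3, 4], [5, 6]])
L = 2

# same wrap-around padding trick as above, then a library-provided strided view
a = np.vstack((inArr, inArr[:L-1]))
out = np.lib.stride_tricks.sliding_window_view(a, (L, a.shape[1]))[:, 0]
print(out.shape)  # (3, 2, 2), same result as strided_axis0_forward(inArr, 2)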

How to append a tuple to a numpy array without it being performed element-wise?

If I try
x = np.append(x, (2,3))
the tuple (2,3) does not get appended to the end of the array, rather 2 and 3 get appended individually, even if I originally declared x as
x = np.array([], dtype = tuple)
or
x = np.array([], dtype = (int,2))
What is the proper way to do this?
I agree with @user2357112's comment:
appending to NumPy arrays is catastrophically slower than appending to ordinary lists. It's an operation that they are not at all designed for
Here's a little benchmark:
# measure execution time
import timeit
import numpy as np

def f1(num_iterations):
    x = np.dtype((np.int32, (2, 1)))
    for i in range(num_iterations):
        x = np.append(x, (i, i))

def f2(num_iterations):
    x = np.array([(0, 0)])
    for i in range(num_iterations):
        x = np.vstack((x, (i, i)))

def f3(num_iterations):
    x = []
    for i in range(num_iterations):
        x.append((i, i))
    x = np.array(x)

N = 50000
print(timeit.timeit('f1(N)', setup='from __main__ import f1, N', number=1))
print(timeit.timeit('f2(N)', setup='from __main__ import f2, N', number=1))
print(timeit.timeit('f3(N)', setup='from __main__ import f3, N', number=1))
I wouldn't use either np.append or vstack; I'd just create my Python list properly and then use it to construct the np.array.
EDIT
Here's the benchmark output on my laptop:
append: 12.4983000173
vstack: 1.60663705793
list: 0.0252208517006
[Finished in 14.3s]
You need to supply the shape to numpy dtype, like so:
x = np.dtype((np.int32, (1,2)))
x = np.append(x,(2,3))
Outputs
array([dtype(('<i4', (1, 2))), 2, 3], dtype=object)
Reference: http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
If I understand what you mean, you can use vstack:
>>> a = np.array([(1,2),(3,4)])
>>> a = np.vstack((a, (4,5)))
>>> a
array([[1, 2],
       [3, 4],
       [4, 5]])
I do not have any special insight as to why this works, but:
x = np.array([1, 3, 2, (5,7), 4])
mytuple = [(2, 3)]
mytuplearray = np.empty(len(mytuple), dtype=object)
mytuplearray[:] = mytuple
y = np.append(x, mytuplearray)
print(y) # [1 3 2 (5, 7) 4 (2, 3)]
As others have correctly pointed out, this is a slow operation with numpy arrays. If you're just building some code from scratch, try to use some other data type. But if you know your array will always remain small or you're not going to append much or if you have existing code that you need to tweak quickly, then go ahead.
simplest way:
x=np.append(x,None)
x[-1]=(2,3)
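This works because concatenating with None promotes the result to object dtype, and an object slot can hold the tuple intact; a minimal check:
import numpy as np

x = np.array([1, 3, 2])
x = np.append(x, None)  # appending None yields an object-dtype array
x[-1] = (2, 3)          # the tuple is stored whole in the object slot
print(x)                # [1 3 2 (2, 3)]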
np.append is easy to use with a case like:
In [94]: np.append([1,2,3],4)
Out[94]: array([1, 2, 3, 4])
but its first example is harder to understand. It shows the same sort of flat concatenate that bothers you:
>>> np.append([1, 2, 3], [[4, 5, 6], [7, 8, 9]])
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
Stripped of dimensional tests, np.append does
In [166]: np.append(np.array([1,2],int),(2,3))
Out[166]: array([1, 2, 2, 3])
In [167]: np.concatenate([np.array([1,2],int),np.array((2,3))])
Out[167]: array([1, 2, 2, 3])
So except for the simplest cases you need to understand what np.array((2,3)) does, and how concatenate handles dimensions.
So apart from the speed issues, np.append can be trickier to use than the interface suggests. The parallels to list append are only superficial.
As for append (or concatenate) with dtype=object (not dtype=tuple) or a compound dtype ('i,i'), I couldn't tell you what happens without testing. At a minimum the inputs should already be arrays, and should have a matching dtype. Otherwise the results can be unpredictable.
edit
Don't trust the timings in https://stackoverflow.com/a/38985245/901925. The functions don't produce the same things.
Corrected functions:
In [233]: def g1(num_iterations):
     ...:     x = np.ones((0,2),int)
     ...:     for i in range(num_iterations):
     ...:         x = np.append(x, [(i, i)], axis=0)
     ...:     return x
     ...:
     ...: def g2(num_iterations):
     ...:     x = np.ones((0, 2),int)
     ...:     for i in range(num_iterations):
     ...:         x = np.vstack((x, (i, i)))
     ...:     return x
     ...:
     ...: def g3(num_iterations):
     ...:     x = []
     ...:     for i in range(num_iterations):
     ...:         x.append((i, i))
     ...:     x = np.array(x)
     ...:     return x
     ...:
In [234]: g1(3)
Out[234]:
array([[0, 0],
       [1, 1],
       [2, 2]])
In [235]: g2(3)
Out[235]:
array([[0, 0],
       [1, 1],
       [2, 2]])
In [236]: g3(3)
Out[236]:
array([[0, 0],
       [1, 1],
       [2, 2]])
np.append and np.vstack timings are much closer. Both use np.concatenate to do the actual joining. They differ in how the inputs are processed prior to sending them to concatenate.
In [237]: timeit g1(1000)
9.69 ms ± 6.25 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [238]: timeit g2(1000)
12.8 ms ± 7.53 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [239]: timeit g3(1000)
537 µs ± 2.22 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
The wrong results: note that f1 produces a 1d object dtype array, because its starting value is an object dtype array and there's no axis parameter. f2 duplicates the starting array.
In [240]: f1(3)
Out[240]: array([dtype(('<i4', (2, 1))), 0, 0, 1, 1, 2, 2], dtype=object)
In [241]: f2(3)
Out[241]:
array([[0, 0],
       [0, 0],
       [1, 1],
       [2, 2]])
Not only is it slower to use np.append or np.vstack in a loop, it is also hard to do it right.

Numpy quirk: Apply function to all pairs of two 1D arrays, to get one 2D array

Let's say I have 2 one-dimensional (1D) numpy arrays, a and b, with lengths n1 and n2 respectively. I also have a function, F(x,y), that takes two values. Now I want to apply that function to each pair of values from my two 1D arrays, so the result would be a 2D numpy array with shape n1, n2. The i, j element of the two-dimensional array would be F(a[i], b[j]).
I haven't been able to find a way of doing this without a horrible amount of for-loops, and I'm sure there's a much simpler (and faster!) way of doing this in numpy.
Thanks in advance!
You can use numpy broadcasting to do calculations on the two arrays, turning a into a vertical 2D array using newaxis:
In [11]: a = np.array([1, 2, 3]) # n1 = 3
...: b = np.array([4, 5]) # n2 = 2
...: #if function is c(i, j) = a(i) + b(j)*2:
...: c = a[:, None] + b*2
In [12]: c
Out[12]:
array([[ 9, 11],
       [10, 12],
       [11, 13]])
To benchmark:
In [28]: a = np.arange(100)
In [29]: b = np.arange(222)
In [30]: timeit r = np.array([[f(i, j) for j in b] for i in a])
10 loops, best of 3: 29.9 ms per loop
In [31]: timeit c = a[:, None] + b*2
10000 loops, best of 3: 71.6 us per loop
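If your real F is composed of NumPy ufuncs, the same broadcasting pattern applies directly with no wrapper; a hypothetical example (this F is made up for illustration):
import numpy as np

a = np.arange(100)
b = np.arange(222)

def F(x, y):
    # any composition of ufuncs broadcasts element-wise
    return np.hypot(x, y) + x * y

result = F(a[:, None], b)  # shape (100, 222); result[i, j] == F(a[i], b[j])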
If F is beyond your control, you can wrap it automatically to be "vector-aware" by using numpy.vectorize. I present a working example below where I define my own F just for completeness. This approach has the advantage of simplicity, but if you have control over F, rewriting it with a bit of care to vectorize correctly can have huge speed benefits.
import numpy

n1 = 100
n2 = 200
a = numpy.arange(n1)
b = numpy.arange(n2)

def F(x, y):
    return x + y

# Everything above this is setup, the answer to your question lies here:
fv = numpy.vectorize(F)
r = fv(a[:, numpy.newaxis], b)
On my computer, the following timings are found, showing the price you pay for "automatic" vectorisation:
%timeit fv(a[:, numpy.newaxis], b)
100 loops, best of 3: 3.58 ms per loop
%timeit F(a[:, numpy.newaxis], b)
10000 loops, best of 3: 38.3 µs per loop
If F() works with broadcast arguments, definitely use that, as others describe.
An alternative is to use np.fromfunction (function_on_an_int_grid would be a better name).
The following just maps the int grid to your a-b grid, then into F():
import numpy as np

def func_allpairs( F, a, b ):
    """ -> array len(a) x len(b):
        [[ F( a0 b0 ) F( a0 b1 ) ... ]
         [ F( a1 b0 ) F( a1 b1 ) ... ]
         ...
        ]
    """
    def fab( i, j ):
        return F( a[i], b[j] )  # F scalar or vec, e.g. gradient
    return np.fromfunction( fab, (len(a), len(b)), dtype=int )  # -> fab( all pairs )

#...............................................................................
def F( x, y ):
    return x + 10*y

a = np.arange( 100 )
b = np.arange( 222 )
A = func_allpairs( F, a, b )
# %timeit: 1000 loops, best of 3: 241 µs per loop -- imac i5, np 1.9.3
As another alternative that's a bit more extensible than the dot product, and runs in roughly 1/5th to 1/9th the time of nested list comprehensions, use numpy.newaxis (it took a bit more digging to find):
>>> import numpy
>>> a = numpy.array([0,1,2])
>>> b = numpy.array([0,1,2,3])
This time, using the power function:
>>> pow(a[:,numpy.newaxis], b)
array([[1, 0, 0, 0],
       [1, 1, 1, 1],
       [1, 2, 4, 8]])
Compared with an alternative:
>>> numpy.array([[pow(i,j) for j in b] for i in a])
array([[1, 0, 0, 0],
       [1, 1, 1, 1],
       [1, 2, 4, 8]])
And comparing the timing:
>>> import timeit
>>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
31.943181037902832
>>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
5.985810041427612
>>> timeit.timeit('numpy.array([[pow(i,j) for i in a] for j in b])', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
109.74687385559082
>>> timeit.timeit('pow(a[:, numpy.newaxis], b)', 'import numpy; a=numpy.arange(10); b=numpy.arange(10)')
11.989138126373291
You could use list comprehensions to create an array of arrays:
import numpy as np

# Arrays
a = np.array([1, 2, 3])  # n1 = 3
b = np.array([4, 5])     # n2 = 2

# Your function (just an example)
def f(i, j):
    return i + j

result = np.array([[f(i, j) for j in b] for i in a])
print(result)
Output:
[[5 6]
 [6 7]
 [7 8]]
May I suggest, if your use-case is more limited to products, that you use the outer-product?
e.g.:
import numpy
a = numpy.array([0, 1, 2])
b = numpy.array([0, 1, 2, 3])
numpy.outer(a,b)
returns
array([[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 4, 6]])
You can then apply other transformations:
numpy.outer(a,b) + 1
returns
array([[1, 1, 1, 1],
       [1, 2, 3, 4],
       [1, 3, 5, 7]])
This is much faster:
>>> import timeit
>>> timeit.timeit('numpy.array([[i*j for i in a] for j in b])', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
31.79583477973938
>>> timeit.timeit('numpy.outer(a,b)', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
9.351550102233887
>>> timeit.timeit('numpy.outer(a,b)+1', 'import numpy; a=numpy.arange(3); b=numpy.arange(4)')
12.308301210403442
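More generally, every NumPy ufunc has an .outer method, so this all-pairs pattern isn't limited to products:
import numpy
a = numpy.array([0, 1, 2])
b = numpy.array([0, 1, 2, 3])

numpy.add.outer(a, b)    # pairwise sums, shape (3, 4)
numpy.power.outer(a, b)  # pairwise powers, same values as pow(a[:, numpy.newaxis], b)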
