I have an n-by-3 index array (think of triangles indexing points) and a list of float values associated with the triangles. I now want to get, for each index ("point"), the minimum value, i.e., check all rows which contain a given index, say 0, and take the minimum of vals across those rows:
import numpy
a = numpy.array([
    [0, 1, 2],
    [2, 3, 0],
    [1, 4, 2],
    [2, 5, 3],
])
vals = numpy.array([0.1, 0.5, 0.3, 0.6])
out = [
    numpy.min(vals[numpy.any(a == i, axis=1)])
    for i in range(6)
]
# out = [0.1, 0.1, 0.1, 0.5, 0.3, 0.6]
This solution is inefficient because it does a full array comparison for every i.
This problem is quite similar to numpy's ufuncs, but numpy.min.at doesn't exist.
Any hints?
Approach #1
One approach based on array assignment: set up a 2D array filled with NaNs, use the values of a as column indices (so this assumes those values are integers), map vals into it, and take the NaN-skipping minimum of each column for the final output -
nr,nc = len(a),a.max()+1
m = np.full((nr,nc),np.nan)
m[np.arange(nr)[:,None],a] = vals[:,None]
out = np.nanmin(m,axis=0)
Approach #2
Another one, again based on array assignment, but using masking and np.minimum.reduceat to avoid dealing with NaNs -
nr,nc = len(a),a.max()+1
m = np.zeros((nc,nr),dtype=bool)
m[a.T,np.arange(nr)] = 1
c = m.sum(1)
shift_idx = np.r_[0,c[:-1].cumsum()]
out = np.minimum.reduceat(np.broadcast_to(vals,m.shape)[m],shift_idx)
Approach #3
Another based on argsort (assuming you have all integers from 0 to a.max() in a) -
sidx = a.ravel().argsort()
c = np.bincount(a.ravel())
out = np.minimum.reduceat(vals[sidx//a.shape[1]],np.r_[0,c[:-1].cumsum()])
Approach #4
For memory efficiency and hence performance, and also to complete the set -
from numba import njit

@njit
def numba1(a, vals, out):
    m, n = a.shape
    for j in range(m):
        for i in range(n):
            e = a[j, i]
            if vals[j] < out[e]:
                out[e] = vals[j]
    return out

def func1(a, vals, outlen=None):  # feed in output length as outlen if known
    if outlen is not None:
        N = outlen
    else:
        N = a.max() + 1
    out = np.full(N, np.inf)
    return numba1(a, vals, out)
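A quick check on the question's sample data (a sketch; it assumes numpy is imported as np and uses the 4x3 a and vals from the question):
func1(a, vals)
# array([0.1, 0.1, 0.1, 0.5, 0.3, 0.6])  -- matches the expected output above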
You may switch to pd.GroupBy or itertools.groupby if your for loop goes way beyond 6.
For instance (with import pandas as pd):
r = a.ravel()
pd.Series(np.arange(len(r))//3).groupby(r).apply(lambda s: vals[s].min())
This solution would be faster for long loops, and probably slower for small loops (< 50)
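For completeness, the itertools.groupby variant mentioned above might look roughly like this (a sketch, not benchmarked; groupby only merges consecutive equal keys, so the flattened indices are sorted first):
from itertools import groupby
import numpy as np

r = a.ravel()
rows = np.arange(r.size) // a.shape[1]   # triangle (row) index of each flattened entry
order = np.argsort(r, kind='stable')     # sort so equal point indices become consecutive
out = [min(vals[rows[j]] for j in grp)
       for _, grp in groupby(order, key=lambda j: r[j])]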
Here is one based on this Q&A:
If you have pythran, compile
file <stb_pthr.py>
import numpy as np

#pythran export sort_to_bins(int[:], int)

def sort_to_bins(idx, mx):
    if mx==-1:
        mx = idx.max() + 1
    cnts = np.zeros(mx + 2, int)
    for i in range(idx.size):
        cnts[idx[i]+2] += 1
    for i in range(2, cnts.size):
        cnts[i] += cnts[i-1]
    res = np.empty_like(idx)
    for i in range(idx.size):
        res[cnts[idx[i]+1]] = i
        cnts[idx[i]+1] += 1
    return res, cnts[:-1]
Otherwise the script will fall back to a sparse matrix based approach which is only slightly slower:
import numpy as np

try:
    from stb_pthr import sort_to_bins
    HAVE_PYTHRAN = True
except:
    HAVE_PYTHRAN = False

from scipy.sparse import csr_matrix

def sort_to_bins_sparse(idx, mx):
    if mx==-1:
        mx = idx.max() + 1
    aux = csr_matrix((np.ones_like(idx), idx, np.arange(idx.size+1)),
                     (idx.size, mx)).tocsc()
    return aux.indices, aux.indptr

if not HAVE_PYTHRAN:
    sort_to_bins = sort_to_bins_sparse
def f_op():
    mx = a.max() + 1
    return np.fromiter((np.min(vals[np.any(a == i, axis=1)])
                        for i in range(mx)), vals.dtype, mx)

def f_pp():
    idx, bb = sort_to_bins(a.reshape(-1), -1)
    res = np.minimum.reduceat(vals[idx//3], bb[:-1])
    res[bb[:-1]==bb[1:]] = np.inf
    return res

def f_div_3():
    sidx = a.ravel().argsort()
    c = np.bincount(a.ravel())
    bb = np.r_[0, c.cumsum()]
    res = np.minimum.reduceat(vals[sidx//a.shape[1]], bb[:-1])
    res[bb[:-1]==bb[1:]] = np.inf
    return res
a = np.array([
    [0, 1, 2],
    [2, 3, 0],
    [1, 4, 2],
    [2, 5, 3],
])
vals = np.array([0.1, 0.5, 0.3, 0.6])
assert np.all(f_op()==f_pp())
from timeit import timeit
a = np.random.randint(0,1000,(10000,3))
vals = np.random.random(10000)
assert len(np.unique(a))==1000
assert np.all(f_op()==f_pp())
print("1000/1000 labels, 10000 rows")
print("op ", timeit(f_op, number=10)*100, 'ms')
print("pp ", timeit(f_pp, number=100)*10, 'ms')
print("div", timeit(f_div_3, number=100)*10, 'ms')
a = 1 + 2 * np.random.randint(0,5000,(1000000,3))
vals = np.random.random(1000000)
nl = len(np.unique(a))
assert np.all(f_div_3()==f_pp())
print(f"{nl}/{a.max()+1} labels, 1000000 rows")
print("pp ", timeit(f_pp, number=10)*100, 'ms')
print("div", timeit(f_div_3, number=10)*100, 'ms')
a = 1 + 2 * np.random.randint(0,100000,(1000000,3))
vals = np.random.random(1000000)
nl = len(np.unique(a))
assert np.all(f_div_3()==f_pp())
print(f"{nl}/{a.max()+1} labels, 1000000 rows")
print("pp ", timeit(f_pp, number=10)*100, 'ms')
print("div", timeit(f_div_3, number=10)*100, 'ms')
Sample run (timings include @Divakar's approach 3 for reference):
1000/1000 labels, 10000 rows
op 145.1122640981339 ms
pp 0.7944229000713676 ms
div 2.2905819199513644 ms
5000/10000 labels, 1000000 rows
pp 113.86540920939296 ms
div 417.2476712032221 ms
100000/200000 labels, 1000000 rows
pp 158.23634970001876 ms
div 486.13436080049723 ms
UPDATE: @Divakar's latest (approach 4) is hard to beat, being essentially a C implementation. Nothing wrong with that, except that jitting is not an option but a requirement here (the unjitted code is no fun to run). If one accepts that, the same can, of course, be done with pythran:
pythran -O3 labeled_min.py
file <labeled_min.py>
import numpy as np

#pythran export labeled_min(int[:,:], float[:])

def labeled_min(A, vals):
    mn = np.empty(A.max()+1)
    mn[:] = np.inf
    M, N = A.shape
    for i in range(M):
        v = vals[i]
        for j in range(N):
            c = A[i, j]
            if v < mn[c]:
                mn[c] = v
    return mn
Both give another massive speedup:
from labeled_min import labeled_min

func1(a, vals)  # warm up first; do not measure jitting time
print("nmb ", timeit(lambda: func1(a, vals), number=100)*10, 'ms')
print("pthr", timeit(lambda: labeled_min(a, vals), number=100)*10, 'ms')
Sample run:
nmb 8.41792532010004 ms
pthr 8.104007659712806 ms
pythran comes out a few percent faster, but this is only because I moved the vals lookup out of the inner loop; without that they are all but equal.
For comparison, here is the previous best with and without non-Python helpers on the same problem:
pp 114.04887529788539 ms
pp (py only) 147.0821460010484 ms
Apparently, numpy.minimum.at exists:
import numpy
a = numpy.array([
    [0, 1, 2],
    [2, 3, 0],
    [1, 4, 2],
    [2, 5, 3],
])
vals = numpy.array([0.1, 0.5, 0.3, 0.6])
out = numpy.full(6, numpy.inf)
numpy.minimum.at(out, a.reshape(-1), numpy.repeat(vals, 3))
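On the sample data this reproduces the expected result from the question; any point index that never appears in a simply keeps its inf fill value:
print(out)
# [0.1 0.1 0.1 0.5 0.3 0.6]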
Related
The code calculates the minimum value in each row and picks the next minimum by scanning the nearby elements on the same and the next row. Instead, I want the code to start with the minimum value of the first row and then progress by scanning the nearby elements; I don't want it to calculate the minimum value for each row. The outputs are attached.
import numpy as np
from scipy.ndimage import minimum_filter as mf
Pe = np.random.rand(5,5)
b = np.zeros((Pe.shape[0], 2))
#the footprint of the window, i.e., we do not consider the value itself or value in the row above
ft = np.asarray([[0, 0, 0],
                 [1, 0, 1],
                 [1, 1, 1]])
#applying scipy's minimum filter
#mode defines what should be considered as values at the edges
#setting the edges to INF
Pe_min = mf(Pe, footprint=ft, mode="constant", cval=np.inf)
#finding rowwise index of minimum value
idx = Pe.argmin(axis=1)
#retrieving minimum values and filtered values
b[:, 0] = np.take_along_axis(Pe, idx[None].T, 1).T[0]
b[:, 1] = np.take_along_axis(Pe_min, idx[None].T, 1).T[0]
print(b)
Present Output / Desired Output: (shown as images in the original post; not reproduced here)
You can solve this using a simple while loop: for a given current location, each step of the loop iterates over the neighborhood to find the smallest value amongst all the valid next locations and then moves there.
Since this can be pretty inefficient in pure Numpy, you can use Numba so the code can be executed efficiently. Here is the implementation:
import numpy as np
import numba as nb
Pe = np.random.rand(5,5)
# array([[0.58268917, 0.99645225, 0.06229945, 0.5741654 , 0.41407074],
# [0.4933553 , 0.93253261, 0.1485588 , 0.00133828, 0.09301049],
# [0.49055436, 0.53794993, 0.81358814, 0.25031136, 0.76174586],
# [0.69885908, 0.90878292, 0.25387689, 0.25735301, 0.63913838],
# [0.33781117, 0.99406778, 0.49133067, 0.95026241, 0.14237322]])
@nb.njit('int_[:,:](float64[:,::1])', boundscheck=True)
def minValues(arr):
    n, m = arr.shape
    assert n >= 1 and m >= 2
    res = []
    i, j = 0, np.argmin(arr[0,:])
    res.append((i, j))
    iPrev = jPrev = -1
    while iPrev < n-1:
        cases = [(i, j-1), (i, j+1), (i+1, j-1), (i+1, j), (i+1, j+1)]
        minVal = np.inf
        iMin = jMin = -1
        # Find the best candidate (smallest value)
        for (i2, j2) in cases:
            if i2 == iPrev and j2 == jPrev:  # No cycles
                continue
            if i2 < 0 or i2 >= n or j2 < 0 or j2 >= m:  # No out-of-bounds
                continue
            if arr[i2, j2] < minVal:
                iMin, jMin = i2, j2
                minVal = arr[i2, j2]
        assert not np.isinf(minVal)
        # Store it and update the values
        res.append((iMin, jMin))
        iPrev, jPrev = i, j
        i, j = iMin, jMin
    return np.array(res)
minValues(Pe)
# array([[0, 2],
# [1, 3],
# [1, 4],
# [2, 3],
# [3, 2],
# [3, 3],
# [4, 4],
# [4, 3]], dtype=int32)
The algorithm is relatively fast: it finds the result for a path of length 141_855 in a Pe array of shape (100_000, 1_000) in only 15 ms on my machine (although it can be optimized further). The same code using only CPython (i.e. without the Numba JIT) takes 591 ms.
I would like to generate a numpy array by performing a sum of indexed values from another array.
For example, given the following arrays:
row_indices = np.array([[1, 1, 1], [0, 1, 1]])
col_indices = np.array([[0, 0, 1], [1, 1, 1]])
values = np.array([[2, 2, 3], [2, 4, 4]])
I would like to set a new array indexed_sum in the following way:
for i in range(row_indices.size):
    indexed_sum[row_indices.flat[i], col_indices.flat[i]] += values.flat[i]
Such that:
indexed_sum = np.array([[0, 2], [4, 11]])
However, since this is a python loop and these arrays can be very large, this takes an unacceptable amount of time. Is there an efficient numpy method that I can use to accomplish this?
You might find success with numba, another Python package. I timed the following two functions in a Jupyter notebook with %timeit. Results are below:
import numba
import numpy as np
# Your loop, but in a function.
def run_sum(row_indicies, col_indicies, values):
    indexed_sum = np.zeros((row_indicies.max() + 1, col_indicies.max() + 1))
    for i in range(row_indicies.size):
        indexed_sum[row_indicies.flat[i], col_indicies.flat[i]] += values.flat[i]
    return indexed_sum

# Your loop with a numba decorator.
@numba.jit(nopython=True)  # note you may be able to parallelize too
def run_sum_numba(row_indicies, col_indicies, values):
    indexed_sum = np.zeros((row_indicies.max() + 1, col_indicies.max() + 1))
    for i in range(row_indicies.size):
        indexed_sum[row_indicies.flat[i], col_indicies.flat[i]] += values.flat[i]
    return indexed_sum
My example data to have something bigger to chew on:
row_id_big = np.random.randint(0, 100, size=(1000,))
col_id_big = np.random.randint(0, 100, size=(1000,))
values_big = np.random.randint(0, 10, size=(1000,))
Results:
%timeit run_sum(row_id_big, col_id_big, values_big)
# 1.04 ms ± 12.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit run_sum_numba(row_id_big, col_id_big, values_big)
# 3.85 µs ± 44.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The loop with the numba decorator is a couple hundred times faster in this example. I'm not positive about the memory usage compared to your example. I had to initialize a numpy array to have somewhere to put the data, but if you have a better way of doing that step you might be able to improve performance further.
A note with numba: you need to run your loop once to start seeing the major speed improvements. You might be able to initialize the jit with just a toy example like yours here and see the same speedup.
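For instance, a throwaway call on tiny arrays of the same dtypes (a hedged sketch; any small inputs will do) triggers the compilation up front so later timings measure only the compiled loop:
# warm up the JIT once; subsequent %timeit runs then exclude compilation time
_ = run_sum_numba(np.array([0]), np.array([0]), np.array([1]))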
Given the tradeoff between speed and memory usage, I think your method is well suited. But you can still make it faster:
avoid flattening the arrays inside the loop; this will save you some time
instead of using .flat[:] or .flatten(), use .ravel() (I'm not sure why, but it seems to be faster)
also avoid for i in range(...); just zip the values of interest (see method3)
Here's a solution that will speed things up:
r_i_f = row_indices.ravel()
c_i_f = col_indices.ravel()
v_f = values.ravel()

indexed_sum = np.zeros((row_indices.max()+1, col_indices.max()+1))
for i, j, v in zip(r_i_f, c_i_f, v_f):
    indexed_sum[i, j] += v
To see a comparison, here's some toy code (adjust any detail that's not proportionate and let me know if it works well for you):
def method1(values, row_indices, col_indices):
    """OP method"""
    indexed_sum = np.zeros((row_indices.max()+1, col_indices.max()+1))
    for i in range(row_indices.size):
        indexed_sum[row_indices.flat[i], col_indices.flat[i]] += values.flat[i]
    return indexed_sum

def method2(values, row_indices, col_indices):
    """just raveling before the loop; the time saved here is considerable"""
    r_i_f = row_indices.ravel()
    c_i_f = col_indices.ravel()
    v_f = values.ravel()
    indexed_sum = np.zeros((row_indices.max()+1, col_indices.max()+1))
    for i in range(row_indices.size):
        indexed_sum[r_i_f[i], c_i_f[i]] += v_f[i]
    return indexed_sum

def method3(values, row_indices, col_indices):
    """raveling, then avoiding range(...) and just zipping;
    the time saved here is small but comes at no cost"""
    r_i_f = row_indices.ravel()
    c_i_f = col_indices.ravel()
    v_f = values.ravel()
    indexed_sum = np.zeros((row_indices.max()+1, col_indices.max()+1))
    for i, j, v in zip(r_i_f, c_i_f, v_f):
        indexed_sum[i, j] += v
    return indexed_sum
from time import perf_counter
import numpy as np
out_size = 50
in_shape = (5000,5000)
values = np.random.randint(10,size=in_shape)
row_indices = np.random.randint(out_size,size=in_shape)
col_indices = np.random.randint(out_size,size=in_shape)
t1 = perf_counter()
v1 = method1(values,row_indices,col_indices)
t2 = perf_counter()
v2 = method2(values,row_indices,col_indices)
t3 = perf_counter()
v3 = method3(values,row_indices,col_indices)
t4 = perf_counter()
print(f"method1: {t2-t1}")
print(f"method2: {t3-t2}")
print(f"method3: {t4-t3}")
Outputs for values of shape 5000x5000 and output shaped as 50x50:
method1: 23.66934896100429
method2: 14.241692076990148
method3: 11.415708078013267
Additionally, a comparison between flattening methods (on my computer):
q = np.random.randn(5000,5000)
t1 = perf_counter()
q1 = q.flatten()
t2 = perf_counter()
q2 = q.ravel()
t3 = perf_counter()
q3 = q.reshape(-1)
t4 = perf_counter()
q4 = q.flat[:]
t5 = perf_counter()
#print times:
print(f"q.flatten: {t2-t1}")
print(f"q.ravel: {t3-t2}")
print(f"q.reshape(-1): {t4-t3}")
print(f"q.flat[:]: {t5-t4}")
Outputs:
q.flatten: 0.043878231997950934
q.ravel: 5.550700007006526e-05
q.reshape(-1): 0.0006349250033963472
q.flat[:]: 0.08832104799512308
There are a lot of options for this, but they're all reinventing a wheel that kinda already exists.
import numpy as np
from scipy import sparse
row_indices = np.array([[1, 1, 1], [0, 1, 1]])
col_indices = np.array([[0, 0, 1], [1, 1, 1]])
values = np.array([[2, 2, 3], [2, 4, 4]])
What you want is the built-in behavior for the scipy sparse matrices:
arr = sparse.coo_matrix((values.flat, (row_indices.flat, col_indices.flat)))
Which yields a sparse data structure:
>>> arr
<2x2 sparse matrix of type '<class 'numpy.int64'>'
with 6 stored elements in COOrdinate format>
But you can convert it back to a numpy array easily:
>>> arr.A
array([[ 0, 2],
[ 4, 11]])
There are some good answers here, but in the end I cheated and wrote an extension module method using the numpy C API, which runs in the trivial time that I wanted.
The code is precisely as boring as one would expect, but since an answer would seem incomplete without some, here is the core of it. It does make some unfortunate assumptions about typing that I mean to fill in with time.
int* row_data = PyArray_DATA(row_indices);
int* col_data = PyArray_DATA(col_indices);
double* value_data = PyArray_DATA(values);
double* output_data = PyArray_DATA(sum_obj);

for(int i = 0; i < input_rows; ++i)
{
    for(int j = 0; j < input_cols; ++j)
    {
        long output_row = row_data[i*input_cols+j];
        long output_col = col_data[i*input_cols+j];
        output_data[output_row*out_col_count+output_col] += value_data[i*input_cols+j];
    }
}
Assume we have a numpy array A with shape (N,), a matrix D with shape (M, 3) which holds data, and another matrix I with shape (M, 3) which holds the corresponding index of each data element in D. How can we construct A given D and I such that data elements with repeated indices are added together?
Example:
############# A[I] := D ###################################
A = [0.5, 0.6] # Final Reduced Data Vector
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]] # Data
I = [[0, 1, 0], [0, 1, 1]] # Indices
For example:
A[0] = D[0][0] + D[0][2] + D[1][0] # 0.5 = 0.1 + 0.2 + 0.2
Since in index matrix we have:
I[0][0] = I[0][2] = I[1][0] = 0
Target is to avoid looping over all elements to be efficient for large N, M (10^6-10^9).
I doubt you can get much faster than np.bincount - and notice how the official documentation provides this exact use case.
# Your example
A = [0.5, 0.6]
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]
# Solution
import numpy as np
D, I = np.array(D).flatten(), np.array(I).flatten()
print(np.bincount(I, D)) #[0.5 0.6]
The shape of I and D doesn't matter: you can clearly ravel the arrays without changing the outcome:
index = np.ravel(I)
data = np.ravel(D)
Now you can sort both arrays according to I:
sorter = np.argsort(index)
index = index[sorter]
data = data[sorter]
This is helpful because now index looks like this:
0, 0, 0, 1, 1, 1
And data is this:
0.1, 0.2, 0.2, 0.1, 0.4, 0.1
Adding together runs of consecutive numbers should be easier than processing random locations. Let's start by finding the indices where the runs start:
runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]
Now you can use the fact that ufuncs like np.add have a partial reduce operation called reduceat. This allows you to sum regions of an array:
a = np.add.reduceat(data, runs)
If I is guaranteed to contain all indices in [0, A.size) at least once, you're done: just assign to A instead of a. If not, you can make the mapping using the fact that the start of each run in index is the target index:
A = np.zeros(n)
A[index[runs]] = a
Algorithmic complexity analysis:
ravel is O(1) in time and space if the data is in an array. If it's a list, this is O(MN) in time and space
argsort is O(MN log MN) in time and O(MN) in space
Indexing by sorter is O(MN) in time and space
Computing runs is O(MN) in time and O(MN + M) = O(MN) in space
reduceat is a single pass: O(MN) in time, O(M) in space
Reassigning A is O(M) in time and space
Total: O(MN log MN) time, O(MN) space
TL;DR
def make_A(D, I, M):
    index = np.ravel(I)
    data = np.ravel(D)
    sorter = np.argsort(index)
    index = index[sorter]
    if index[0] < 0 or index[-1] >= M:
        raise ValueError('Bad indices')
    data = data[sorter]
    runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]
    a = np.add.reduceat(data, runs)
    if a.size == M:
        return a
    A = np.zeros(M)
    A[index[runs]] = a
    return A
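Used on the example from the question, this gives the expected result (a quick check, not part of the original derivation):
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]
print(make_A(D, I, 2))   # ~[0.5 0.6]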
If you know the size of A beforehand, as it seems you do, you can simply use add.at:
import numpy as np
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]
arr_D = np.array(D)
arr_I = np.array(I)
A = np.zeros(2)
np.add.at(A, arr_I, arr_D)
print(A)
Output
[0.5 0.6]
If you don't know the size of A, you can use max to compute it:
A = np.zeros(arr_I.max() + 1)
np.add.at(A, arr_I, arr_D)
print(A)
Output
[0.5 0.6]
The time complexity of this algorithm is O(N), with space complexity also O(N).
The:
arr_I.max() + 1
is what bincount does under the hood, from the documentation:
The result of binning the input array. The length of out is equal to
np.amax(x)+1.
That being said, bincount is at least one order of magnitude faster:
I = np.random.choice(1000, size=(1000, 3), replace=True)
D = np.random.random((1000, 3))
%timeit make_A_with_at(I, D, 1000)
213 µs ± 25 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit make_A_with_bincount(I, D)
11 µs ± 15.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
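The two wrapper functions timed above are not shown in the post; presumably they are thin wrappers around the snippets above, roughly like this (a sketch, with names assumed from the timing calls):
def make_A_with_at(I, D, n):
    A = np.zeros(n)
    np.add.at(A, np.ravel(I), np.ravel(D))   # unbuffered, handles repeated indices
    return A

def make_A_with_bincount(I, D):
    return np.bincount(np.ravel(I), weights=np.ravel(D))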
I have an array input_data of shape (A, B, C) and an array ind of shape (B,). I want to loop along the B axis and, for each position i, take the sum of the two elements at indices ind[i] and ind[i]+1 along the last axis. The desired output is of shape (A, B). I have the following code which works, but I feel it is inefficient due to index-based looping through the B axis. Is there a more efficient method?
import numpy as np
input_data = np.random.rand(2, 6, 10)
ind = [ 2, 3, 5, 6, 5, 4 ]
out = np.zeros((input_data.shape[0], input_data.shape[1]))
for i in range(len(ind)):
    d = input_data[:, i, ind[i]:ind[i]+2]
    out[:, i] = np.sum(d, axis=1)
Edited based on Divakar's answer:
import timeit
import numpy as np

N = 1000
input_data = np.random.rand(10, N, 5000)
ind = (4999 * np.random.rand(N)).astype(int)

def test_1():  # Old loop-based method
    out = np.zeros((input_data.shape[0], input_data.shape[1]))
    for i in range(len(ind)):
        d = input_data[:, i, ind[i]:ind[i]+2]
        out[:, i] = np.sum(d, axis=1)
    return out

def test_2():
    extent = 2  # Comes from 2 in "ind[i]:ind[i]+2"
    m, n, r = input_data.shape
    idx = (np.arange(n)*r + ind)[:, None] + np.arange(extent)
    out1 = input_data.reshape(m, -1)[:, idx].reshape(m, n, -1).sum(2)
    return out1

print(timeit.timeit(stmt=test_1, number=1000))
print(timeit.timeit(stmt=test_2, number=1000))
print(np.all(test_1() == test_2(), keepdims=True))
>> 7.70429363482
>> 0.392034666757
>> [[ True]]
Here's a vectorized approach using linear indexing with some help from broadcasting. We merge the last two axes of the input array, calculate the linear indices corresponding to the last two axes, perform slicing and reshape back to a 3D shape. Finally, we do summation along the last axis to get the desired output. The implementation would look something like this -
extent = 2 # Comes from 2 in "ind[i]:ind[i]+2"
m,n,r = input_data.shape
idx = (np.arange(n)*r + ind)[:,None] + np.arange(extent)
out1 = input_data.reshape(m,-1)[:,idx].reshape(m,n,-1).sum(2)
If the extent is always going to be 2 as stated in the question - "... sum of elements C[B[i]] and C[B[i]+1]", then you could simply do -
m,n,r = input_data.shape
ind_arr = np.array(ind)
axis1_r = np.arange(n)
out2 = input_data[:,axis1_r,ind_arr] + input_data[:,axis1_r,ind_arr+1]
You could also use integer array indexing combined with basic slicing:
import numpy as np
m,n,r = 2, 6, 10
input_data = np.arange(2*6*10).reshape(m, n, r)
ind = np.array([ 2, 3, 5, 6, 5, 4 ])
out = np.zeros((input_data.shape[0], input_data.shape[1]))
for i in range(len(ind)):
    d = input_data[:, i, ind[i]:ind[i]+2]
    out[:, i] = np.sum(d, axis=1)
out2 = input_data[:, np.arange(n)[:,None], np.add.outer(ind,range(2))].sum(axis=-1)
print(out2)
# array([[ 5, 27, 51, 73, 91, 109],
# [125, 147, 171, 193, 211, 229]])
assert np.allclose(out, out2)
In some areas of physics, phase factors such as (-1)^n, where n is some integer formed from summing or subtracting other integers, often appear. Is there, in general, any performance improvement to the following:
sgn = lambda k: -1 if k % 2 else 1
over simply
sgn = (-1)**k
And if so, what would be the best way to vectorize the former?
Edit: Mr E has provided a fast solution for k bounded in some integer range, but I'm a bit concerned that my k might fall outside this range. Initially I thought of:
In [1]: sg = np.array([1,-1])
In [2]: k = np.array([201, 0, 2, -37])
In [3]: sg[k % 2]
Out[3]: array([-1, 1, 1, -1])
But the modulus operation seems to slow it down compared with the power approach:
ph1 = lambda k: (-1)**k
sgn = np.array([1,-1])
ph2 = lambda k: sgn[k % 2]
x = np.random.randint(-200, 200, 100)
%timeit ph1(x)
1000000 loops, best of 3: 264 ns per loop
%timeit ph2(x)
1000000 loops, best of 3: 284 ns per loop
Mr E has answered how to vectorise it with numpy; as for which of those is more efficient time-wise, the following code uses timeit for various values of k and plots the results using matplotlib.
The results suggest that using -1 if k % 2 else 1 is consistently faster than using (-1)**k.
import matplotlib.pyplot as plt
from timeit import Timer
def f1(k): return -1 if k % 2 else 1
def f2(k): return (-1)**k
result1 = []
result2 = []
x = range(0,1000, 100)
for n in x:
    print(n)
    timer1 = Timer('f1({})'.format(n), setup='from __main__ import f1')
    timer2 = Timer('f2({})'.format(n), setup='from __main__ import f2')
    result1.append(timer1.timeit(100000))
    result2.append(timer2.timeit(100000))
plt.plot(x, result1, label='-1 if k % 2 else 1')
plt.plot(x, result2, label='(-1)**k')
plt.xlabel('k')
plt.ylabel('Time')
plt.legend(loc='best')
plt.show()
You can use the tile function to do this efficiently. Example:
>>> np.tile([1, -1], 10)
array([ 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1,
-1, 1, -1])
Alternatively, if you are summing a series (-1)^k a_k and you have already computed a_k, as a numpy array, you can sum a_k[::2] and a_k[1::2] and take the difference. This avoids computing the alternating sign and multiplying.
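A minimal sketch of that trick, assuming a_k is a 1-D numpy array of the precomputed terms starting at k = 0:
import numpy as np

a_k = np.random.random(101)                   # hypothetical precomputed terms a_0 ... a_100
alt_sum = a_k[::2].sum() - a_k[1::2].sum()    # equals sum((-1)**k * a_k) without forming the signs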
To answer your edited question:
>>> k = np.array([1, 3, 2, 5, 10, 7])
>>> foo = np.array([1, -1])
>>> foo[k % 2]
array([-1, -1, 1, -1, 1, -1])
>>>