fill in numpy array without looping through all indices - python

I want to use a high-dimensional numpy array to store the norms of weighted sums of matrices.
For example:
mat1, mat2, mat3, mat4 = np.random.rand(3, 3), np.random.rand(3, 3), np.random.rand(3, 3), np.random.rand(3, 3)
res = np.empty((8, 7, 6, 5))
for i in range(8):
    for j in range(7):
        for p in range(6):
            for q in range(5):
                res[i, j, p, q] = np.linalg.norm(i * mat1 + j * mat2 + p * mat3 + q * mat4)
Is there any way to avoid this nested loop?

Solution
Here's one way you can do it, by adding axes with None (equivalent to np.newaxis):
def weighted_norms(mat1, mat2, mat3, mat4):
    P = mat1 * np.arange(8)[:, None, None]
    Q = mat2 * np.arange(7)[:, None, None]
    R = mat3 * np.arange(6)[:, None, None]
    S = mat4 * np.arange(5)[:, None, None]
    summation = S + R[:, None] + Q[:, None, None] + P[:, None, None, None]
    return np.linalg.norm(summation, axis=(4, 5))
Veracity and a simple benchmark
In [6]: output = weighted_norms(mat1, mat2, mat3, mat4)
In [7]: np.allclose(output, res)
Out[7]: True
In [8]: %timeit weighted_norms(mat1, mat2, mat3, mat4)
71.3 µs ± 446 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Explanation
By adding two new axes to the np.arange objects, you can force the broadcasting you want, producing 0 * mat1, 1 * mat1, 2 * mat1 ....
The real tricky bit is then constructing the (8, 7, 6, 5, 3, 3) array (which is the shape before evaluating the norm which collapses the last two dimensions).
Notice that the summation of all the weighted 3D arrays starts with the last array, S, and progressively adds more weighted 3D arrays. The way it does this is by adding a new axis to broadcast over at each step.
For example, the shape of S is (5, 3, 3) and in order to correctly add R you need to insert a new axis. So the shape of R goes from (6, 3, 3) to (6, 1, 3, 3). This second dimension of size 1 is what allows us to broadcast the sum of S over R such that each array in the 3D S is added to each array in R (that's one level of nested loop).
Then we need to add Q (for every array in Q, for every array in R, for every array in S), so we need to insert two new axes turning Q from (7, 3, 3) to (7, 1, 1, 3, 3).
Finally, P goes from (8, 3, 3) to (8, 1, 1, 1, 3, 3).
It may help to "visualize" this by overlaying the shapes:
           (5, 3, 3)  <- S
            :
+       (6, 1, 3, 3)  <- R[:, None]
--------------------
        (6, 5, 3, 3)
         :  :
+    (7, 1, 1, 3, 3)  <- Q[:, None, None]
--------------------
     (7, 6, 5, 3, 3)
      :  :  :
+ (8, 1, 1, 1, 3, 3)  <- P[:, None, None, None]
--------------------
  (8, 7, 6, 5, 3, 3)
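If you want to check the overlay programmatically, np.broadcast_shapes (available in NumPy 1.20+) reproduces each step:
print(np.broadcast_shapes((5, 3, 3), (6, 1, 3, 3)))              # (6, 5, 3, 3)
print(np.broadcast_shapes((6, 5, 3, 3), (7, 1, 1, 3, 3)))        # (7, 6, 5, 3, 3)
print(np.broadcast_shapes((7, 6, 5, 3, 3), (8, 1, 1, 1, 3, 3)))  # (8, 7, 6, 5, 3, 3)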
Generalizing
Here's a generalized version using a helper function for adding axes just to clean up the code a little:
from typing import Tuple

import numpy as np

def add_axes(x: np.ndarray, n: int) -> np.ndarray:
    """
    Insert `n` new axes into `x` from axis 1 onward,
    e.g. for `x.shape == (3, 3)`, `add_axes(x, 2).shape == (3, 1, 1, 3)`.
    """
    return np.expand_dims(x, axis=tuple(range(1, n + 1)))

def weighted_norms(arrs: Tuple[np.ndarray, ...], weights: Tuple[int, ...]) -> np.ndarray:
    if len(arrs) != len(weights):
        raise ValueError("Number of arrays must match number of weights")
    # Accumulate into zeros (np.empty would start from uninitialized memory).
    summation = np.zeros((weights[-1], *arrs[-1].shape))
    for i, (x, w) in enumerate(zip(arrs[::-1], weights[::-1])):
        summation = summation + add_axes(x * add_axes(np.arange(w), 2), i)
    return np.linalg.norm(summation, axis=(-1, -2))
Usage:
In [10]: arrs = (mat1, mat2, mat3, mat4)
In [11]: weights = (8, 7, 6, 5)
In [12]: output = weighted_norms(arrs, weights)
In [13]: np.allclose(output, res)
Out[13]: True
In [14]: %timeit weighted_norms(arrs, weights)
109 µs ± 3.07 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Related

Python: Element-wise multiplication of 3d with 3d arrays

I am having some problems implementing the following equation in a performant way using Python:
beta and gamma are cartesian coordinates {x, y}, and b, m are index values which can be quite big (n = 10000). I have a working version of the code, shown below for the simple case of l = 2 and m, b = 4 (l and m always have the same length). I checked the code using timeit, and the bottleneck is the element-wise multiplication with an array of size (3,3) and the reshaping of the resulting array into shape (3m, 3m).
Does anybody have an idea how to increase the performance? (I also noticed that my current version suffers quite a big overhead for large values of l...)
import numpy as np
g_l3 = np.array([[1, 4, 5],[2, 6, 7]])
g_l33 = g_l3.reshape(-1, 3, 1) * g_l3.reshape(-1, 1, 3)
A_lm = np.arange(1, 9, 1).reshape(2, 4)
B_lb = np.arange(7, 15, 1).reshape(2, 4)
AB_lmb = A_lm.reshape(-1, 4, 1) * B_lb.reshape(-1, 1, 4)
D_lmb33 = np.sum(g_l33.reshape(-1, 1, 1, 3, 3) * AB_lmb.reshape(-1, 4, 4, 1, 1), axis=0)
D = np.concatenate(np.concatenate(D_lmb33, axis=2), axis=0)
In [387]: %%timeit
...: g_l3 = np.array([[1, 4, 5],[2, 6, 7]])
...
...: D_lmb33 = np.sum(g_l33.reshape(-1, 1, 1, 3, 3) * AB_lmb.reshape(-1, 4,
...: 4, 1, 1), axis=0)
...: D = np.concatenate(np.concatenate(D_lmb33, axis=2), axis=0)
...:
...:
70.7 µs ± 226 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Examining the pieces, and rewriting the reshape with newaxis, which is visually clearer to me - though basically the same speed:
In [388]: g_l3.shape
Out[388]: (2, 3)
In [389]: g_l33.shape
Out[389]: (2, 3, 3)
In [390]: np.allclose(g_l33, g_l3[:,:,None]*g_l3[:,None,:])
Out[390]: True
In [391]: AB_lmb.shape
Out[391]: (2, 4, 4)
In [392]: np.allclose(AB_lmb, A_lm[:,:,None]*B_lb[:,None,:])
Out[392]: True
So these are the common outer products on the last dimensions of 2d arrays.
And another outer product:
In [393]: temp=g_l33.reshape(-1, 1, 1, 3, 3) * AB_lmb.reshape(-1, 4, 4, 1, 1)
In [394]: temp.shape
Out[394]: (2, 4, 4, 3, 3)
In [396]: np.allclose(temp, g_l33[:,None,None,:,:] * AB_lmb[:, :,:, None,None])
Out[396]: True
These probably could be combined into one expression, but that's not necessary.
D_lmb33 sums on the leading dimension:
In [405]: D_lmb33.shape
Out[405]: (4, 4, 3, 3)
the double concatenate can also be done with a transpose and reshape:
In [406]: np.allclose(D_lmb33.transpose(1,2,0,3).reshape(12,12),D)
Out[406]: True
Overall your code appears to make efficient use of numpy. For a large leading dimension, that (N, 4, 4, 3, 3) intermediate array could be large and take time. But within numpy itself there isn't an alternative; I don't think the algebra allows us to do the sum earlier. Whether numba or numexpr would help is another question.
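That said, the pieces probably could be combined into one expression, as noted above. A sketch of my own (not from the original exchange), using the arrays defined in the question: contract over the shared leading l axis with einsum, which reproduces D_lmb33 directly:
# D_lmb33[m, b, i, j] == sum over l of A_lm[l, m] * B_lb[l, b] * g_l3[l, i] * g_l3[l, j]
D_es = np.einsum('lm,lb,li,lj->mbij', A_lm, B_lb, g_l3, g_l3)
print(np.allclose(D_es, D_lmb33))  # True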

numpy.dot as part of a vectorized operation

Say I have three numpy arrays and I want to perform a calculation over them:
a = np.array([[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],[1,2,3,4,5,6,7],
              [1,2,3,4,5,6,7]])  # shape is (5,7)
b = np.array([[11],[12],[11],[12],[11]])  # shape is (5,1)
c = np.array([[10],[20],[30],[40],[50],[60],[70]])  # shape is (7,1)
The calculation is: 10 + (b(rows) * (c . a(rows)))
Where c . a is the dot product of c and the corresponding row of a.
By rows, I mean doing it as a vector, where I need my result to be (7,1) (one row for each column I have in a).
I'm trying to do something like:
result = 10 + (b[:][:] * (np.dot(c.T, a[:]) + b))
But this fails the np.dot operation with shapes being misaligned for that numpy.dot operation. I'm trying to figure out how to perform the calculation above as a one-liner (no for loops) in a way that Python will interpret the vectorized operation, especially for that np.dot part.
Any hints?
Thanks for your time
EDIT: this is a for loop that solves my problem. I'd like to replace that for loop with one Python line.
iBatchSize = a.shape[0]
iFeatureCount = a.shape[1]

result = np.zeros((iBatchSize, 1))

for i in range(iBatchSize):
    for j in range(iFeatureCount):
        result[i] = 10 + (b[i][0] * (np.dot(c.T, a[i]) + b))
EDIT 2: Corrected array a with the correct array
EDIT 3: Corrected expected shape for result
In [31]: a = np.array([[1,2],[2,3],[3,4],[4,5],[5,6],[6,7],[7,8]]) #shape is (5,7)
    ...: b = np.array([[11],[12],[11],[12],[11]]) #shape is (5,1)
    ...: c = np.array([[10],[20],[30],[40],[50],[60],[70]]) #shape is (7,1)
In [32]: a.shape, b.shape, c.shape
Out[32]: ((7, 2), (5, 1), (7, 1))
a.shape does not match the comment.
In [33]: iBatchSize = a.shape[0]
    ...: iFeatureCount = a.shape[1]
    ...:
    ...: result = np.zeros((iBatchSize, 1))
    ...:
    ...: for i in range(iBatchSize):
    ...:     for j in range(iFeatureCount):
    ...:         result[i] = 10 + (b[i][0] * (np.dot(c.T, a[i]) + b))
    ...:
Traceback (most recent call last):
  File "<ipython-input-33-717691add3dd>", line 8, in <module>
    result[i] = 10 + (b[i][0] * (np.dot(c.T, a[i]) + b))
  File "<__array_function__ internals>", line 6, in dot
ValueError: shapes (1,7) and (2,) not aligned: 7 (dim 1) != 2 (dim 0)
np.dot is raising that error. It expects the last dimension of the first argument to match the second-to-last (or only) dimension of the second argument:
In [34]: i
Out[34]: 0
In [35]: c.T.shape
Out[35]: (1, 7)
In [37]: a[i].shape
Out[37]: (2,)
This dot works:
In [38]: np.dot(c.T,a).shape # (1,7) with (7,2) => (1,2)
Out[38]: (1, 2)
====
With the correct a,
10 + (b[i][0] * (np.dot(c.T, a[i]) + b))
is a (5,1) array (because of the + b), which can't be assigned to result[i].
===
A simple dot of a and c produces a (5,1) array, which can be combined with b (with either + or *, or both), resulting in a (5,1) array:
In [68]: np.dot(a,c).shape
Out[68]: (5, 1)
In [69]: b*(np.dot(a,c)+b)
Out[69]:
array([[15521],
       [16944],
       [15521],
       [16944],
       [15521]])
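Putting this together, one plausible one-liner for the stated formula (my reading of the question, since the expected shape was edited) would be:
result = 10 + b * (np.dot(a, c) + b)
which yields a (5,1) result with the original (5,7) a.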

merge intervals without using loops and classic python code

My problem is to merge overlapping intervals.
example:
input:
[(4,8),(6,10),(11,12),(15,20),(20,25)]
output:
[(4, 10),(11,12), (15, 25)]
input:
[(4,8),(6,10),(11,12),(15,20)]
output:
[(4, 10),(11,12), (15, 20)]
I did it with classic Python code (using loops and if conditions),
BUT I want to do it with Python libraries (pandas, numpy, ...) in a few lines.
Are there any suggestions?
Thanks in advance
Assuming that your input tuples are sorted like in the examples, something like this does the job:
p = [(4, 8), (6, 10), (11, 12), (15, 20), (20, 25)]
ind = np.where(np.diff(np.array(p).flatten()) <= 0)[0]
np.delete(p, [ind, ind+1]).reshape(-1, 2)
output:
array([[ 4, 10],
[11, 12],
[15, 25]])
Then you can convert it to [(4, 10), (11, 12), (15, 25)] using e.g. list(map(tuple, ...)).
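To see why the diff trick works, it may help to trace the intermediate values (my illustration):
flat = np.array(p).flatten()  # [ 4  8  6 10 11 12 15 20 20 25]
np.diff(flat)                 # [ 4 -2  4  1  1  3  5  0  5]
# diff <= 0 at flat positions 1 and 7, i.e. where an interval's end (8, 20)
# does not lie strictly below the next start (6, 20); deleting each such
# end together with the start that follows it fuses the overlapping pair.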
Edit: the above works only if each tuple (x_i, y_i) is such that x_i <= x_{i+1} and y_i <= y_{i+1} for all i's, as in the original examples.
To make it work with the only condition x_i <= y_i for all i, you have to preprocess the list:
# Example from comments (modified by subtracting the min value and removing duplicates)
p = [(0, 90), (72, 81), (87, 108), (459, 606)]
p = list(zip(sorted([tup[0] for tup in p]), sorted([tup[1] for tup in p])))
ind = np.where(np.diff(np.array(p).flatten()) <= 0)[0]
ind = ind[ind % 2 == 1] # this is needed for cases when x_i = y_i
np.delete(p, [ind, ind+1]).reshape(-1, 2)
Output:
array([[ 0, 108],
[459, 606]])
I'm not sure this is necessarily the best option, since it takes O(max_interval_value * num_intervals) time and memory, but this is a straightforward implementation with NumPy:
import numpy as np

def simplify_intervals(intervals):
    # Make intervals into an array
    intervals = np.asarray(intervals)
    # Make an array from zero to the greatest interval end (plus bounds values)
    r = np.arange(intervals[:, 1].max() + 3)[:, np.newaxis]
    # Check which elements of the array are within each interval
    m = (r >= intervals[:, 0] + 1) & (r <= intervals[:, 1] + 1)
    # Collapse the membership test across intervals
    ind = m.any(1).astype(np.int8)
    # Find where the membership test changes
    d = np.diff(ind)
    # Find interval bounds
    start = np.where(d > 0)[0]
    end = np.where(d < 0)[0] - 1
    # Make the final intervals array
    return np.stack((start, end), axis=1)
print(simplify_intervals([(4, 8), (6, 10), (11, 12), (15, 20), (20, 25)]))
# [[ 4 12]
# [15 25]]
print(simplify_intervals(([(4,8),(6,10),(11,12),(15,20)])))
# [[ 4 12]
# [15 20]]
Note: This assumes positive interval values. It could be adapted to support negative ranges, and actually optimized a bit to only consider values from the smallest one to the largest one.
EDIT:
If you want to use this method for large number of intervals or bounds, you may benefit from using Numba instead:
import numpy as np
import numba as nb

@nb.njit
def simplify_intervals_nb(intervals):
    n = 0
    for _, end in intervals:
        n = max(n, end)
    r = np.arange(n + 3)
    m = np.zeros(n + 3, dtype=np.bool_)
    for start, end in intervals:
        m |= (r >= start + 1) & (r <= end + 1)
    ind = m.astype(np.int8)
    # Find where the membership test changes
    d = np.diff(ind)
    # Find interval bounds
    start = np.where(d > 0)[0]
    end = np.where(d < 0)[0] - 1
    # Make the final intervals array
    return np.stack((start, end), axis=1)
Quick test in IPython:
import random
random.seed(100)
start = [random.randint(0, 10000) for _ in range(300)]
end = [s + random.randint(0, 3000) for s in start]
intervals = list(zip(start, end))
print(np.all(simplify_intervals(intervals) == simplify_intervals_nb(intervals)))
# True
%timeit simplify_intervals(intervals)
# 15.2 ms ± 179 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit simplify_intervals_nb(intervals)
# 9.54 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Numpy select matrix specified by a matrix of indices, from multidimensional array

I have a numpy array a of size 5x5x4x5x5. I have another matrix b of size 5x5. I want to get a[i,j,b[i,j]] for i from 0 to 4 and for j from 0 to 4. This will give me a 5x5x1x5x5 matrix. Is there any way to do this without just using 2 for loops?
Let's think of the matrix a as 100 (= 5 x 5 x 4) matrices of size (5, 5). So, if you could get a linear index for each triplet - (i, j, b[i, j]) - you are done. That's where np.ravel_multi_index comes in. Following is the code.
import numpy as np
import itertools

# create some matrices
a = np.random.randint(0, 10, (5, 5, 4, 5, 5))
b = np.random.randint(0, 4, (5, 5))

# creating all possible triplets - (ind1, ind2, ind3)
inds = list(itertools.product(range(5), range(5)))
(ind1, ind2), ind3 = zip(*inds), b.flatten()
allInds = np.array([ind1, ind2, ind3])
linearInds = np.ravel_multi_index(allInds, (5, 5, 4))

# reshaping the input array
a_reshaped = np.reshape(a, (100, 5, 5))

# selecting the appropriate indices
res1 = a_reshaped[linearInds, :, :]

# reshaping back into the desired shape
res1 = np.reshape(res1, (5, 5, 1, 5, 5))

# verifying with the brute force method
res2 = np.empty((5, 5, 1, 5, 5))
for i in range(5):
    for j in range(5):
        res2[i, j, 0] = a[i, j, b[i, j], :, :]

print(np.all(res1 == res2))  # should print True
There's np.take_along_axis exactly for this purpose -
np.take_along_axis(a,b[:,:,None,None,None],axis=2)
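A quick check against the brute-force res2 from above (my addition; note it relies on b being an integer index array, as fixed in the code):
out = np.take_along_axis(a, b[:, :, None, None, None], axis=2)
print(out.shape)            # (5, 5, 1, 5, 5)
print(np.all(out == res2))  # True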

Sums of subarrays

I have a 2d array of integers and I want to sum up 2d sub arrays of it. Both arrays can have arbitrary dimensions, although we can assume that the subarray will be orders of magnitudes smaller than the total array.
The reference implementation in python is trivial:
def sub_sums(arr, l, m):
    result = np.zeros((len(arr) // l, len(arr[0]) // m))
    rows = len(arr) // l * l
    cols = len(arr[0]) // m * m
    for i in range(rows):
        for j in range(cols):
            result[i // l, j // m] += arr[i, j]
    return result
The question is how I do this best using numpy, hopefully without any looping in python at all. For 1d arrays cumsum and r_ would work and I could use that with a bit of looping to implement a solution for 2d, but I'm still learning numpy and I'm almost certain there's some cleverer way.
Example output:
arr = np.asarray([range(0, 5),
                  range(4, 9),
                  range(8, 13),
                  range(12, 17)])
result = sub_sums(arr, 2, 2)
gives:
[[ 0  1  2  3  4]
 [ 4  5  6  7  8]
 [ 8  9 10 11 12]
 [12 13 14 15 16]]

[[ 10.  18.]
 [ 42.  50.]]
There is a blockshaped function which does something rather close to what you want:
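For reference, here is the usual form of that function (reproduced so the session below is self-contained; this follows the widely-shared recipe, lightly commented):
def blockshaped(arr, nrows, ncols):
    # Split a (h, w) array into (h*w)//(nrows*ncols) blocks of shape
    # (nrows, ncols), scanning the blocks in row-major order.
    h, w = arr.shape
    return (arr.reshape(h // nrows, nrows, -1, ncols)
               .swapaxes(1, 2)
               .reshape(-1, nrows, ncols))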
In [81]: arr
Out[81]:
array([[ 0,  1,  2,  3,  4],
       [ 4,  5,  6,  7,  8],
       [ 8,  9, 10, 11, 12],
       [12, 13, 14, 15, 16]])

In [82]: blockshaped(arr[:,:4], 2, 2)
Out[82]:
array([[[ 0,  1],
        [ 4,  5]],

       [[ 2,  3],
        [ 6,  7]],

       [[ 8,  9],
        [12, 13]],

       [[10, 11],
        [14, 15]]])

In [83]: blockshaped(arr[:,:4], 2, 2).shape
Out[83]: (4, 2, 2)
Once you have the "blockshaped" array, you can obtain the desired result by reshaping (so the numbers in one block are strung out along a single axis) and then calling the sum method on that axis.
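For instance, a minimal sketch using the blockshaped output from above:
blocks = blockshaped(arr[:, :4], 2, 2)                  # shape (4, 2, 2)
print(blocks.reshape(blocks.shape[0], -1).sum(axis=1))  # [10 18 42 50]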
So, with a slight modification of the blockshaped function, you can define sub_sums like this:
import numpy as np

def sub_sums(arr, nrows, ncols):
    h, w = arr.shape
    h = (h // nrows) * nrows
    w = (w // ncols) * ncols
    arr = arr[:h, :w]
    return (arr.reshape(h // nrows, nrows, -1, ncols)
               .swapaxes(1, 2)
               .reshape(h // nrows, w // ncols, -1)
               .sum(axis=-1))

arr = np.asarray([range(0, 5),
                  range(4, 9),
                  range(8, 13),
                  range(12, 17)])
print(sub_sums(arr, 2, 2))
yields
[[10 18]
[42 50]]
Edit: Ophion provides a nice improvement -- use np.einsum instead of reshaping before summing:
def sub_sums_ophion(arr, nrows, ncols):
    h, w = arr.shape
    h = (h // nrows) * nrows
    w = (w // ncols) * ncols
    arr = arr[:h, :w]
    return np.einsum('ijkl->ik', arr.reshape(h // nrows, nrows, -1, ncols))
In [105]: %timeit sub_sums(arr, 2, 2)
10000 loops, best of 3: 112 µs per loop
In [106]: %timeit sub_sums_ophion(arr, 2, 2)
10000 loops, best of 3: 76.2 µs per loop
Here is the simpler way:
In [160]: import numpy as np

In [161]: arr = np.asarray([range(0, 5),
     ...:                   range(4, 9),
     ...:                   range(8, 13),
     ...:                   range(12, 17)])

In [162]: np.add.reduceat(arr, [0], axis=1)
Out[162]:
array([[10],
       [30],
       [50],
       [70]])

In [163]: arr
Out[163]:
array([[ 0,  1,  2,  3,  4],
       [ 4,  5,  6,  7,  8],
       [ 8,  9, 10, 11, 12],
       [12, 13, 14, 15, 16]])
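To get the 2x2 block sums of the example, one plausible extension of this idea (my sketch, not part of the original answer) applies reduceat along both axes at the block boundaries:
l, m = 2, 2
trimmed = arr[:arr.shape[0] // l * l, :arr.shape[1] // m * m]  # drop partial blocks
row_sums = np.add.reduceat(trimmed, np.arange(0, trimmed.shape[0], l), axis=0)
block_sums = np.add.reduceat(row_sums, np.arange(0, trimmed.shape[1], m), axis=1)
print(block_sums)
# [[10 18]
#  [42 50]]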
A very small change in your code is to use slicing and perform the sums of the sub-arrays using the sum() method:
def sub_sums(arr, l, m):
    result = np.zeros((len(arr) // l, len(arr[0]) // m))
    for i in range(len(arr) // l):
        for j in range(len(arr[0]) // m):
            # block rows have height l, block columns have width m
            result[i, j] = arr[i*l:(i+1)*l, j*m:(j+1)*m].sum()
    return result
Doing some very simple benchmarks shows that this is slower in the 2x2 case, about equal to your approach in the 3x3 case and faster for bigger sub-arrays (sub_sums2 is your version of the code):
In [19]: arr = np.asarray([range(100)] * 100)
In [20]: %timeit sub_sums(arr, 2, 2)
10 loops, best of 3: 21.8 ms per loop
In [21]: %timeit sub_sums2(arr, 2, 2)
100 loops, best of 3: 9.56 ms per loop
In [22]: %timeit sub_sums(arr, 3, 3)
100 loops, best of 3: 9.58 ms per loop
In [23]: %timeit sub_sums2(arr, 3, 3)
100 loops, best of 3: 9.36 ms per loop
In [24]: %timeit sub_sums(arr, 4, 4)
100 loops, best of 3: 5.58 ms per loop
In [25]: %timeit sub_sums2(arr, 4, 4)
100 loops, best of 3: 9.56 ms per loop
In [26]: %timeit sub_sums(arr, 10, 10)
1000 loops, best of 3: 939 us per loop
In [27]: %timeit sub_sums2(arr, 10, 10)
100 loops, best of 3: 9.48 ms per loop
Notice that with 10x10 sub-arrays it's about 10 times faster, while in the 2x2 case it's about twice as slow. Your method takes essentially the same time regardless of the sub-array size, while my implementation gets faster with bigger sub-arrays.
I'm pretty sure we can avoid the explicit for loops (maybe by reshaping the array so that it has the sub-arrays as rows?), but I'm not an expert in numpy and it may take some time before I'm able to find the final solution. However, I believe an order of magnitude is already a nice improvement.
