Increasing performance of highly repeated numpy array index operations

Increasing performance of highly repeated numpy array index operations - python

In my program code I've got numpy value arrays and numpy index arrays. Both are preallocated and predefined during program initialization.
Each part of the program has one array values on which calculations are performed, and three index arrays idx_from_exch, idx_values and idx_to_exch. There is on global value array to exchange the values of several parts: exch_arr.
The index arrays have between 2 and 5 indices most of the times, seldomly (most probably never) more indices are needed. dtype=np.int32, shape and values are constant during the whole program run. Thus I set ndarray.flags.writeable=False after initialization, but this is optional. The index values of the index arrays idx_values and idx_to_exch are sorted in numerical order, idx_source may be sorted, but there is no way to define that. All index arrays corresponding to one value array/part have the same shape.
The values arrays and also the exch_arr usually have between 50 and 1000 elements. shape and dtype=np.float64 are constant during the whole program run, the values of the arrays change in each iteration.
Here are the example arrays:
import numpy as np
import numba as nb
values = np.random.rand(100) * 100 # just some random numbers
exch_arr = np.random.rand(60) * 3 # just some random numbers
idx_values = np.array((0, 4, 55, -1), dtype=np.int32) # sorted but varying steps
idx_to_exch = np.array((7, 8, 9, 10), dtype=np.int32) # sorted and constant steps!
idx_from_exch = np.array((19, 4, 7, 43), dtype=np.int32) # not sorted and varying steps
The example indexing operations look like this:
values[idx_values] = exch_arr[idx_from_exch] # get values from exchange array
values *= 1.1 # some inplace array operations, this is just a dummy for more complex things
exch_arr[idx_to_exch] = values[idx_values] # pass some values back to exchange array
Since these operations are being applied once per iteration for several million iterations, speed is crucial. I've been looking into many different ways of increasing indexing speed in my previous question, but forgot to be specific enough considering my application (especially getting values by indexing with constant index arrays and passing them to another indexed array).
The best way to do it seems to be fancy indexing so far. I'm currently also experimenting with numba guvectorize, but it seems that it is not worth the effort since my arrays are quite small.
memoryviews would be nice, but since the index arrays do not necessarily have consistent steps, I know of no way to use memoryviews.
So is there any faster way to do repeated indexing? Some way of predefining memory address arrays for each indexing operation, as dtype and shape are always constant? ndarray.__array_interface__ gave me a memory address, but I wasn't able to use it for indexing. I thought about something like:
stride_exch = exch_arr.strides[0]
mem_address = exch_arr.__array_interface__['data'][0]
idx_to_exch = idx_to_exch * stride_exch + mem_address
Is that feasible?
I've also been looking into using strides directly with as_strided, but as far as I know only consistent strides are allowed and my problem would require inconsistent strides.
Any help is appreciated!
Thanks in advance!
edit:
I just corrected a massive error in my example calculation!
The operation values = values * 1.1 changes the memory address of the array. All my operations in the program code are layed out to not change the memory address of the arrays, because alot of other operations rely on using memoryviews. Thus I replaced the dummy operation with the correct in-place operation: values *= 1.1

One solution to getting round expensive fancy indexing using numpy boolean arrays is using numba and skipping over the False values in your numpy boolean array.
Example implementation:
#numba.guvectorize(['float64[:], float64[:,:], float64[:]'], '(n),(m,n)->(m)', nopython=True, target="cpu")
def test_func(arr1, arr2, inds, res):
for i in range(arr1.shape[0]):
if not inds[i]:
continue
for j in range(arr2.shape[0]):
res[j, i] = arr1[i] + arr2[j, i]
Of course, play around with the numpy data types (smaller byte sizes will run faster) and target being "cpu" or "parallel".

Related

Python - sum the intersection of rows and columns in a matrix

Let's suppose we have a matrix and a list of indexes:
adj_mat = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
indexes = [0,2]
What I want is to sum the rows and columns corresponding to the sub matrix we get by the intersection of the rows and columns of the indexes list. In this case it would be:
sub_matrix = ([[1,3]
[7,9]])
result_rows = [4,16]
result_columns = [8,12]
However, I do this calculation rather a lot of times with the same original matrix and different indexes lists, so I am looking for an efficent solution without creating the sub matrix each iteration. My solution so far is (and for columns respectively):
def sum_rows(matrix, indexes):
sum_r = [0]*len(indexes)
for i in range(len(indexes)):
for j in indexes:
sum_r[i] += matrix.item(indexes[i], j)
return sum_r
I'm looking for a more efficient algorithm as I remember there is a method which looks like this that sums all rows (or columns?) in the indexes:
matrix.sum(:, indexes)
matrix.sum(indexes, indexes)
I assume what I need is the second line, if it exists. I tried to google it, with or without numpy, but couldn't find the right syntax.
Is there a solution as I described here but I'm just using the wrong syntax? Or any other suggestions for improvement?

IIUC:
import numpy as np
adj_mat = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
indexes = np.array([1, 3]) - 1
sub_matrix = adj_mat[np.ix_(indexes, indexes)]
result_rows, result_columns = sub_matrix.sum(axis=1), sub_matrix.sum(axis=0)
Result:
array([ 4, 16]) # result_rows
array([ 8, 12]) # result_columns

So assuming you made a mistake and you meant indexes = [0,2] and sub_matrix = [[1,3], [7,9]], then this should do what you want
def sum_sub(matrix, indices):
"""
Returns the sum of each row and column (as a tuple)
for each index in indices (as an array)
"""
# note that this sub matrix does not copy any data from matrix,
# it is a "view" which simply holds a reference to matrix
sub_mat = matrix[np.ix_(indices, indices)]
return sub_mat.sum(axis=1), sub_mat.sum(axis=0)
sum_row, sum_col = sum_sub(np.arange(1,10).reshape((3,3)), [0,2])
The results of this are
sum_col # --> [ 8 12]
sum_row # --> [ 4 16]

Since the point of efficiency was brought up in the question, a little further analysis should probably be done.
First and foremost, the code looks like code to find a matrix inverse using the adjoint matrix. Unless that particular method is important to the project, the standard np.linalg.inv() is almost certainly going to be faster than anything we cook up here. Moreover, in many applications you can get away with solving a system of linear equations rather than finding an inverse and multiplying by it, cutting run times in half or more again.
Second, any discussion of efficient numpy code needs to address views as opposed to copies. Memory allocation, writing to memory, and memory deallocation are all extremely expensive operations when compared with standard floating point arithmetic. That's not to say that they're slow, but you can notice an order of magnitude or two of difference in the speed of code memory efficient code vs nearly anything else. That's the entire premise behind the fastest implementation of persistent homology calculations I know of, among other things.
All of the other answers (at the time of writing) create a copy of the data they're working with, explicitly storing that information in a new variable sub_matrix. It isn't possible to create every fancy-indexed matrix with a copy, but oftentimes equivalent operations can be performed.
For example, if this really is a set of computations on adjoint matrices so that your indexes variable consists of all but one of the available indices (in your example, all but the middle index), then instead of explicitly summing over all the intended indices, we can sum over all indices and subtract the one we don't care about. The effect is that all the intermediate matrices are views rather than copies, preventing the expensive memory allocations. On my machine, this is twice as fast for the tiny 3x3 example given and 10x as fast for 500x500 matrices.
bad_row = 1
bad_col = 1
result_rows = (np.sum(adj_mat, axis=1)-adj_mat[:,bad_col])[np.arange(adj_mat.shape[0])!=bad_row]
result_cols = (np.sum(adj_mat, axis=0)-adj_mat[bad_row,:])[np.arange(adj_mat.shape[1])!=bad_col]
Of course, it's even faster if you can use slices to represent whatever you're doing and you don't have to work around the problem with extra operations as I did, but the example you gave doesn't easily permit slices.

Speeding up fancy indexing with numpy

I have two numpy arrays and each has shape of (10000,10000).
One is value array and the other one is index array.
Value=np.random.rand(10000,10000)
Index=np.random.randint(0,1000,(10000,10000))
I want to make a list (or 1D numpy array) by summing all the "Value array" referring the "Index array". For example, for each index i, finding matching array index and giving it to value array as argument
for i in range(1000):
NewArray[i] = np.sum(Value[np.where(Index==i)])
However, This is too slow since I have to do this loop through 300,000 arrays.
I tried to come up with some logical indexing method like
NewArray[Index] += Value[Index]
But it didn't work.
The next thing I tried is using dictionary
for k, v in list(zip(Index.flatten(),Value.flatten())):
NewDict[k].append(v)
and
for i in NewDict:
NewDict[i] = np.sum(NewDict[i])
But it was slow too
Is there any smart way to speed up?

I had two thoughts. First, try masking, it speeds this up by about 4x:
for i in range(1000):
NewArray[i] = np.sum(Value[Index==i])
Alternately, you can sort your arrays to put the values you're adding together in contiguous memory space. Masking or using where() has to gather all your values together each time you call sum on the slice. By front-loading this gathering, you might be able to speed things up considerably:
# flatten your arrays
vals = Value.ravel()
inds = Index.ravel()
s = np.argsort(inds) # these are the indices that will sort your Index array
v_sorted = vals[s].copy() # the copy here orders the values in memory instead of just providing a view
i_sorted = inds[s].copy()
searches = np.searchsorted(i_sorted, np.arange(0, i_sorted[-1] + 2)) # 1 greater than your max, this gives you your array end...
for i in range(len(searches) -1):
st = searches[i]
nd = searches[i+1]
NewArray[i] = v_sorted[st:nd].sum()
This method takes 26 sec on my computer vs 400 using the old way. Good luck. If you want to read more about contiguous memory and performance check this discussion out.

Python: return the row index of the minimum in a matrix

I wanna print the index of the row containing the minimum element of the matrix
my matrix is matrix = [[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]]
and the code
matrix = [[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]]
a = np.array(matrix)
buff_min = matrix.argmin(axis = 0)
print(buff_min) #index of the row containing the minimum element
min = np.array(matrix[buff_min])
print(str(min.min(axis=0))) #print the minium of that row
print(min.argmin(axis = 0)) #index of the minimum
print(matrix[buff_min]) # print all row containing the minimum
after running, my result is
1
3
1
[22, 3, 4, 12]
the first number should be 2, because the minimum is 2 in the third list ([34,6,4,5,8,2]), but it returns 1. It returns 3 as minimum of the matrix.
What's the error?

I am not sure which version of Python you are using, i tested it for Python 2.7 and 3.2 as mentioned your syntax for argmin is not correct, its should be in the format
import numpy as np
np.argmin(array_name,axis)
Next, Numpy knows about arrays of arbitrary objects, it's optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, better use a nested list. But depending on the intended use of your data, different data structures might be even better, e.g. a masked array if you have some invalid data points.
If you really want flexible Numpy arrays, use something like this:
np.array([[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]], dtype=object)
However this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of Numpy (vector processing, locality, slicing, etc.).
Also, to mention if you can resize your numpy array thing might work, i haven't tested it, but by the concept that should be an easy solution. But i will prefer use a nested list in this case of input matrix

Does this work?
np.where(a == a.min())[0][0]
Note that all rows of the matrix need to contain the same number of elements.

Replace loop with broadcasting in numpy -> memory error

I have an 2D-array (array1), which has an arbitrary number of rows and in the first column I have strictly monotonic increasing numbers (but not linearly), which represent a position in my system, while the second one gives me a value, which represents the state of my system for and around the position in the first column.
Now I have a second array (array2); its range should usually be the same as for the first column of the first array, but does not matter to much, as you will see below.
I am now interested for every element in array2:
1. What is the argument in array1[:,0], which has the closest value to the current element in array2?
2. What is the value (array1[:,1]) of those elements.
As usually array2 will be longer than the number of rows in array1 it is perfectly fine, if I get one argument from array1 more than one time. In fact this is what I expect.
The value from 2. is written in the second and third column, as you will see below.
My striped code looks like this:
from numpy import arange, zeros, absolute, argmin, mod, newaxis, ones
ysize1 = 50
array1 = zeros((ysize1+1,2))
array1[:,0] = arange(ysize1+1)**2
# can be any strictly monotonic increasing array
array1[:,1] = mod(arange(ysize1+1),2)
# in my current case, but could also be something else
ysize2 = (ysize1)**2
array2 = zeros((ysize2+1,3))
array2[:,0] = arange(0,ysize2+1)
# is currently uniformly distributed over the whole range, but does not necessarily have to be
a = 0
for i, array2element in enumerate(array2[:,0]):
a = argmin(absolute(array1[:,0]-array2element))
array2[i,1] = array1[a,1]
It works, but takes quite a lot time to process large arrays. I then tried to implement broadcasting, which seems to work with the following code:
indexarray = argmin(absolute(ones(array2[:,0].shape[0])[:,newaxis]*array1[:,0]-array2[:,0][:,newaxis]),1)
array2[:,2]=array1[indexarray,1] # just to compare the results
Unfortunately now I seem to run into a different problem: I get a memory error on the sizes of arrays I am using in the line of code with the broadcasting.
For small sizes it works, but for larger ones where len(array2[:,0]) is something like 2**17 (and could be even larger) and len(array1[:,0]) is about 2**14. I get, that the size of the array is bigger than the available memory. Is there an elegant way around that or to speed up the loop?
I do not need to store the intermediate array(s), I am just interested in the result.
Thanks!

First lets simplify this line:
argmin(absolute(ones(array2[:,0].shape[0])[:,newaxis]*array1[:,0]-array2[:,0][:,newaxis]),1)
it should be:
a = array1[:, 0]
b = array2[:, 0]
argmin(abs(a - b[:, newaxis]), 1)
But even when simplified, you're creating two large temporary arrays. If a and b have sizes M and N, b - a and abs(...) each create a temporary array of size (M, N). Because you've said that a is monotonically increasing, you can avoid the issue all together by using a binary search (sorted search) which is much faster anyways. Take a look at the answer I wrote to this question a while back. Using the function from this answer, try this:
closest = find_closest(array1[:, 0], array2[:, 0])
array2[:, 2] = array1[closest, 1]

Efficient way for appending numpy array

I will keep it simple.I have a loop that appends new row to a numpy array...what is the efficient way to do this.
n=np.zeros([1,2])
for x in [[2,3],[4,5],[7,6]]
n=np.append(n,x,axis=1)
Now the thing is there is a [0,0] sticking to it so I have to remove it by
del n[0]
Which seems dumb...So please tell me an efficient way to do this.
n=np.empty([1,2])
is even worse it creates an uninitialised value.

A bit of technical explanation for the "why lists" part.
Internally, the problem for a list of unknown length is that it needs to fit in memory somehow regardless of its length. There are essentially two different possibilities:
Use a data structure (linked list, some tree structure, etc.) which makes it possible to allocate memory separately for each new element in a list.
Store the data in a contiguous memory area. This area has to be allocated when the list is created, and it has to be larger than what we initially need. If we get more stuff into the list, we need to try to allocate more memory, preferably at the same location. If we cannot do it at the same location, we need to allocate a bigger block and move all data.
The first approach enables all sorts of fancy insertion and deletion options, sorting, etc. However, it is slower in sequential reading and allocates more memory. Python actually uses the method #2, the lists are stored as "dynamic arrays". For more information on this, please see:
Size of list in memory
What this means is that lists are designed to be very efficient with the use of append. There is very little you can do to speed things up if you do not know the size of the list beforehand.
If you know even the maximum size of the list beforehand, you are probably best off allocating a numpy.array using numpy.empty (not numpy.zeros) with the maximum size and then use ndarray.resize to shrink the array once you have filled in all data.
For some reason numpy.array(l) where l is a list is often slow with large lists, whereas copying even large arrays is quite fast (I just tried to create a copy of a 100 000 000 element array; it took less than 0.5 seconds).
This discussion has more benchmarking on different options:
Fastest way to grow a numpy numeric array
I have not benchmarked the numpy.empty + ndarray.resize combo, but both should be rather microsecond than millisecond operations.

There are three ways to do this, if you already have everything in a list:
data = [[2, 3], [4, 5], [7, 6]]
n = np.array(data)
If you know how big the final array will be:
exp = np.array([2, 3])
n = np.empty((3, 2))
for i in range(3):
n[i, :] = i ** exp
If you don't know how big the final array will be:
exp = np.array([2, 3])
n = []
i = np.random.random()
while i < .9:
n.append(i ** exp)
i = np.random.random()
n = np.array(n)
Just or the record you can start with n = np.empty((0, 2)) but I would not suggest appending to that array in a loop.

You might want to try:
import numpy as np
n = np.reshape([], (0, 2))
for x in [[2,3],[4,5],[7,6]]:
n = np.append(n, [x], axis=0)
Instead of np.append you can also use n = np.vstack([n,x]). I also agree with #Bi Rico that I also would use a list, if n does not need to accessed within the loop.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.