I have a problem with some numpy stuff. I need a numpy array to behave in an unusual manner by returning a slice as a view of the data I have sliced, not a copy. So heres an example of what I want to do:
Say we have a simple array like this:
a = array([1, 0, 0, 0])
I would like to update consecutive entries in the array (moving left to right) with the previous entry from the array, using syntax like this:
a[1:] = a[0:3]
This would get the following result:
a = array([1, 1, 1, 1])
Or something like this:
a[1:] = 2*a[:3]
# a = [1,2,4,8]
To illustrate further I want the following kind of behaviour:
for i in range(len(a)):
if i == 0 or i+1 == len(a): continue
a[i+1] = a[i]
Except I want the speed of numpy.
The default behavior of numpy is to take a copy of the slice, so what I actually get is this:
a = array([1, 1, 0, 0])
I already have this array as a subclass of the ndarray, so I can make further changes to it if need be, I just need the slice on the right hand side to be continually updated as it updates the slice on the left hand side.
Am I dreaming or is this magic possible?
Update: This is all because I am trying to use Gauss-Seidel iteration to solve a linear algebra problem, more or less. It is a special case involving harmonic functions, I was trying to avoid going into this because its really not necessary and likely to confuse things further, but here goes.
The algorithm is this:
while not converged:
for i in range(len(u[:,0])):
for j in range(len(u[0,:])):
# skip over boundary entries, i,j == 0 or len(u)
u[i,j] = 0.25*(u[i-1,j] + u[i+1,j] + u[i, j-1] + u[i,j+1])
Right? But you can do this two ways, Jacobi involves updating each element with its neighbours without considering updates you have already made until the while loop cycles, to do it in loops you would copy the array then update one array from the copied array. However Gauss-Seidel uses information you have already updated for each of the i-1 and j-1 entries, thus no need for a copy, the loop should essentially 'know' since the array has been re-evaluated after each single element update. That is to say, every time we call up an entry like u[i-1,j] or u[i,j-1] the information calculated in the previous loop will be there.
I want to replace this slow and ugly nested loop situation with one nice clean line of code using numpy slicing:
u[1:-1,1:-1] = 0.25(u[:-2,1:-1] + u[2:,1:-1] + u[1:-1,:-2] + u[1:-1,2:])
But the result is Jacobi iteration because when you take a slice: u[:,-2,1:-1] you copy the data, thus the slice is not aware of any updates made. Now numpy still loops right? Its not parallel its just a faster way to loop that looks like a parallel operation in python. I want to exploit this behaviour by sort of hacking numpy to return a pointer instead of a copy when I take a slice. Right? Then every time numpy loops, that slice will 'update' or really just replicate whatever happened in the update. To do this I need slices on both sides of the array to be pointers.
Anyway if there is some really really clever person out there that awesome, but I've pretty much resigned myself to believing the only answer is to loop in C.
Late answer, but this turned up on Google so I probably point to the doc the OP wanted. Your problem is clear: when using NumPy slices, temporaries are created. Wrap your code in a quick call to weave.blitz to get rid of the temporaries and have the behaviour your want.
Read the weave.blitz section of PerformancePython tutorial for full details.
accumulate is designed to do what you seem to want; that is, to proprigate an operation along an array. Here's an example:
from numpy import *
a = array([1,0,0,0])
a[1:] = add.accumulate(a[0:3])
# a = [1, 1, 1, 1]
b = array([1,1,1,1])
b[1:] = multiply.accumulate(2*b[0:3])
# b = [1 2 4 8]
Another way to do this is to explicitly specify the result array as the input array. Here's an example:
c = array([2,0,0,0])
multiply(c[:3], c[:3], c[1:])
# c = [ 2 4 16 256]
Just use a loop. I can't immediately think of any way to make the slice operator behave the way you're saying you want it to, except maybe by subclassing numpy's array and overriding the appropriate method with some sort of Python voodoo... but more importantly, the idea that a[1:] = a[0:3] should copy the first value of a into the next three slots seems completely nonsensical to me. I imagine that it could easily confuse anyone else who looks at your code (at least the first few times).
It is not the correct logic.
I'll try to use letters to explain it.
Image array = abcd with a,b,c,d as elements.
Now, array[1:] means from the element in position 1 (starting from 0) on.
In this case:bcd and array[0:3] means from the character in position 0 up to the third character (the one in position 3-1) in this case: 'abc'.
Writing something like:
array[1:] = array[0:3]
means: replace bcd with abc
To obtain the output you want, now in python, you should use something like:
a[1:] = a[0]
It must have something to do with assigning a slice. Operators, however, as you may already know, do follow your expected behavior:
>>> a = numpy.array([1,0,0,0])
>>> a[1:]+=a[:3]
>>> a
array([1, 1, 1, 1])
If you already have zeros in your real-world problem where your example does, then this solves it. Otherwise, at added cost, set them to zero either by multiplying by zero or assigning to zero, (whichever is faster)
edit:
I had another thought. You may prefer this:
numpy.put(a,[1,2,3],a[:3])
Numpy must be checking if the target array is the same as the input array when doing the setkey call. Luckily, there are ways around it. First, I tried using numpy.put instead
In [46]: a = numpy.array([1,0,0,0])
In [47]: numpy.put(a,[1,2,3],a[0:3])
In [48]: a
Out[48]: array([1, 1, 1, 1])
And then from the documentation of that, I gave using flatiters a try (a.flat)
In [49]: a = numpy.array([1,0,0,0])
In [50]: a.flat[1:] = a[0:3]
In [51]: a
Out[51]: array([1, 1, 1, 1])
But this doesn't solve the problem you had in mind
In [55]: a = np.array([1,0,0,0])
In [56]: a.flat[1:] = 2*a[0:3]
In [57]: a
Out[57]: array([1, 2, 0, 0])
This fails because the multiplication is done before the assignment, not in parallel as you would like.
Numpy is designed for repeated application of the exact same operation in parallel across an array. To do something more complicated, unless you can find decompose it in terms of functions like numpy.cumsum and numpy.cumprod, you'll have to resort to something like scipy.weave or writing the function in C. (See the PerfomancePython page for more details.) (Also, I've never used weave, so I can't guarantee it will do what you want.)
You could have a look at np.lib.stride_tricks.
There is some information in these excellent slides:
http://mentat.za.net/numpy/numpy_advanced_slides/
with stride_tricks starting at slide 29.
I'm not completely clear on the question though so can't suggest anything more concrete - although I would probably do it in cython or fortran with f2py or with weave. I'm liking fortran more at the moment because by the time you add all the required type annotations in cython I think it ends up looking less clear than the fortran.
There is a comparison of these approaches here:
www. scipy. org/ PerformancePython
(can't post more links as I'm a new user)
with an example that looks similar to your case.
In the end I came up with the same problem as you. I had to resort to use Jacobi iteration and weaver:
while (iter_n < max_time_steps):
expr = "field[1:-1, 1:-1] = (field[2:, 1:-1] "\
"+ field[:-2, 1:-1]+"\
"field[1:-1, 2:] +"\
"field[1:-1, :-2] )/4."
weave.blitz(expr, check_size=0)
#Toroidal conditions
field[:,0] = field[:,self.flow.n_x - 2]
field[:,self.flow.n_x -1] = field[:,1]
iter_n = iter_n + 1
It works and is fast, but is not Gauss-Seidel, so convergence can be a bit tricky. The only option of doing Gauss-Seidel as a traditional loop with indexes.
i would suggest cython instead of looping in c. there might be some fancy numpy way of getting your example to work using a lot of intermediate steps... but since you know how to write it in c already, just write that quick little bit as a cython function and let cython's magic make the rest of the work easy for you.
Related
Let's suppose we have a matrix and a list of indexes:
adj_mat = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
indexes = [0,2]
What I want is to sum the rows and columns corresponding to the sub matrix we get by the intersection of the rows and columns of the indexes list. In this case it would be:
sub_matrix = ([[1,3]
[7,9]])
result_rows = [4,16]
result_columns = [8,12]
However, I do this calculation rather a lot of times with the same original matrix and different indexes lists, so I am looking for an efficent solution without creating the sub matrix each iteration. My solution so far is (and for columns respectively):
def sum_rows(matrix, indexes):
sum_r = [0]*len(indexes)
for i in range(len(indexes)):
for j in indexes:
sum_r[i] += matrix.item(indexes[i], j)
return sum_r
I'm looking for a more efficient algorithm as I remember there is a method which looks like this that sums all rows (or columns?) in the indexes:
matrix.sum(:, indexes)
matrix.sum(indexes, indexes)
I assume what I need is the second line, if it exists. I tried to google it, with or without numpy, but couldn't find the right syntax.
Is there a solution as I described here but I'm just using the wrong syntax? Or any other suggestions for improvement?
IIUC:
import numpy as np
adj_mat = np.array([[1,2,3],
[4,5,6],
[7,8,9]])
indexes = np.array([1, 3]) - 1
sub_matrix = adj_mat[np.ix_(indexes, indexes)]
result_rows, result_columns = sub_matrix.sum(axis=1), sub_matrix.sum(axis=0)
Result:
array([ 4, 16]) # result_rows
array([ 8, 12]) # result_columns
So assuming you made a mistake and you meant indexes = [0,2] and sub_matrix = [[1,3], [7,9]], then this should do what you want
def sum_sub(matrix, indices):
"""
Returns the sum of each row and column (as a tuple)
for each index in indices (as an array)
"""
# note that this sub matrix does not copy any data from matrix,
# it is a "view" which simply holds a reference to matrix
sub_mat = matrix[np.ix_(indices, indices)]
return sub_mat.sum(axis=1), sub_mat.sum(axis=0)
sum_row, sum_col = sum_sub(np.arange(1,10).reshape((3,3)), [0,2])
The results of this are
sum_col # --> [ 8 12]
sum_row # --> [ 4 16]
Since the point of efficiency was brought up in the question, a little further analysis should probably be done.
First and foremost, the code looks like code to find a matrix inverse using the adjoint matrix. Unless that particular method is important to the project, the standard np.linalg.inv() is almost certainly going to be faster than anything we cook up here. Moreover, in many applications you can get away with solving a system of linear equations rather than finding an inverse and multiplying by it, cutting run times in half or more again.
Second, any discussion of efficient numpy code needs to address views as opposed to copies. Memory allocation, writing to memory, and memory deallocation are all extremely expensive operations when compared with standard floating point arithmetic. That's not to say that they're slow, but you can notice an order of magnitude or two of difference in the speed of code memory efficient code vs nearly anything else. That's the entire premise behind the fastest implementation of persistent homology calculations I know of, among other things.
All of the other answers (at the time of writing) create a copy of the data they're working with, explicitly storing that information in a new variable sub_matrix. It isn't possible to create every fancy-indexed matrix with a copy, but oftentimes equivalent operations can be performed.
For example, if this really is a set of computations on adjoint matrices so that your indexes variable consists of all but one of the available indices (in your example, all but the middle index), then instead of explicitly summing over all the intended indices, we can sum over all indices and subtract the one we don't care about. The effect is that all the intermediate matrices are views rather than copies, preventing the expensive memory allocations. On my machine, this is twice as fast for the tiny 3x3 example given and 10x as fast for 500x500 matrices.
bad_row = 1
bad_col = 1
result_rows = (np.sum(adj_mat, axis=1)-adj_mat[:,bad_col])[np.arange(adj_mat.shape[0])!=bad_row]
result_cols = (np.sum(adj_mat, axis=0)-adj_mat[bad_row,:])[np.arange(adj_mat.shape[1])!=bad_col]
Of course, it's even faster if you can use slices to represent whatever you're doing and you don't have to work around the problem with extra operations as I did, but the example you gave doesn't easily permit slices.
Suppose you have:
data = np.array([
[ 0, 1 ],
[ 0, 1 ]
], dtype='int64')
calling
data[:, 1]
yields
[1. 1.]
however
data[(slice(None,None,None), slice(1,2,None))]
yields
[[1.]
[1.]]
how come?
How would I explicitly write the slice object to get the equivalent of [:, 1]?
Indexing with a slice object has different semantics from indexing with an integer. Indexing with a single integer collapses / removes the corresponding dimension, whereas indexing with a slice never does so.
If you want to have the behavior of indexing with a single integer along a certain axis, you could emulate it with a slice and some reshaping logic after that; but the least convoluted solution would be to just replace that slice with the corresponding integer.
Alternatively, there is np.squeeze. Absolutely never use it without the axis keyword since that is an guaranteed recipe for code that produces unintentional behavior. But squeezing only those axes that you sliced will have the effect that you seem to be after.
If you will permit me a mini-rant: On a higher level, I suspect that what you are after is very much a numpy-antipattern though. If I had to formulate the first rule of numpy, it is not squeeze axes just because they happen to be singleton. Embrace ndarrays and realize that more axes does not make your code more complicated, but it allows you to express meaningfull semantics of your data. The fact that you are slicing this axis in the first place suggests that the size of this axis isnt fundamentally 1; it just happens to be so in a particular case. Squeezing out that singleton dimension is going to make it near impossible for any downstream code to be written in a numpythonic and bugfree manner. If you absolutely must squeeze an array, like say before passing it to a plot function, treat it like you would a stack allocated variable in C++ and don't let any reference to it leak out of the current scope, because you will have garbled the semantics of that array, and downstream code no longer knows 'what is it looking at'.
Seems like you need to remove singleton dimensions yourself after using a slice object. Or there is some underlying implementation detail that I don't understand.
import numpy as np
data = np.array([
[ 0, 1 ],
[ 0, 1 ]
], dtype='int64')
print("desired result")
print(data[:, 1])
print("first try")
print(data[(slice(None,None,None), slice(1,2,None))])
print("solution")
print(data[[slice(None,None,None), slice(1,2,None)]].squeeze())
output:
desired result
[1 1]
first try
[[1]
[1]]
solution
[1 1]
Python has a built in functionality for checking the validity of entire slices: slice.indices. Is there something similar that is built-in for individual indices?
Specifically, I have an index, say a = -2 that I wish to normalize with respect to a 4-element list. Is there a method that is equivalent to the following already built in?
def check_index(index, length):
if index < 0:
index += length
if index < 0 or index >= length:
raise IndexError(...)
My end result is to be able to construct a tuple with a single non-None element. I am currently using list.__getitem__ to do the check for me, but it seems a little awkward/overkill:
items = [None] * 4
items[a] = 'item'
items = tuple(items)
I would like to be able to do
a = check_index(a, 4)
items = tuple('item' if i == a else None for i in range(4))
Everything in this example is pretty negotiable. The only things that are fixed is that I am getting a in a way that can have all of the problems that an arbitrary index can have and that the final result has to be a tuple.
I would be more than happy if the solution used numpy and only really applied to numpy arrays instead of Python sequences. Either one would be perfect for the application I have in mind.
If I understand correctly, you can use range(length)[index], in your example range(4)[-2]. This properly handles negative and out-of-bounds indices. At least in recent versions of Python, range() doesn't literally create a full list so this will have decent performance even for large arguments.
If you have a large number of indices to do this with in parallel, you might get better performance doing the calculation with Numpy vectorized arithmetic, but I don't think the technique with range will work in that case. You'd have to manually do the calculation using the implementation in your question.
There is a function called numpy.core.multiarray.normalize_axis_index which does exactly what I need. It is particularly useful to be because the implementation I had in mind was for numpy array indexing:
from numpy.core.multiarray import normalize_axis_index
>>> normalize_axis_index(3, 4)
3
>>> normalize_axis_index(-3, 4)
1
>>> normalize_axis_index(-5, 4)
...
numpy.core._internal.AxisError: axis -5 is out of bounds for array of dimension 4
The function was added in version 1.13.0. The source for this function is available here, and the documentation source is here.
I have an operation that I'm doing commonly which I'm calling a "jagged-slice" because I don't know the real name for it. It's best explained by example:
a = np.random.randn(50, 10)
entries_of_interest = np.random.randint(10, size = 50) # Vector of 50 indices between 0 and 9
# Now I want the values contained in each row of a at the corresponding index in "entries of interest"
jagged_slice_of_a = a[np.arange(a.shape[0]), entries_of_interest]
# jagged_slice_of_a is now a vector with 50 elements. Good.
Only problem is it's a bit cumbersome to do this a[np.arange(a.shape[0]), entries_of_interest] indexing (it seems silly to have to construct the "np.arange(a.shape[0])" just for the sake of this). I'd like something like the : operator for this, but the : does something else. Is there any more succinct way to do this operation?
Best answer:
No, there is no better way with native numpy. You can create a helper function for this if you want.
This is combersome only in the sense that it requires more typing for a task that seems so simple to you.
a[np.arange(a.shape[0]), entries_of_interest]
But as you note, the syntactically simpler a[:, entries_of_interest] has another interpretation in numpy. Choosing a subset of the columns of an array is a more common task that choosing one (random) item from each row.
Your case is just a specialized instance of
a[I, J]
where I and J are 2 arrays of the same shape. In the general case entries_of_interest could be smaller than a.shape[0] (not all the rows), or larger (several items from some rows), or even be 2d. It could even select certain elements repeatedly.
I have found in other SO questions that performing this kind of element selection is faster when applied to a.flat. But that requires some math to construct the I*n+J kind of flat index.
With your special knowledge of J, constructing I seems extra work, but numpy can't make that kind of assumption. If this selection was more common someone could write a function that wraps your expression
def peter_selection(a,I):
# check the a.shape[0]==I.shape[0]
return a[np.arange(a.shape[0]), I]
I think that your current method is probably the best way.
You can also use choose for this kind of selection. This is syntactically clearer, but is trickier to get right and potentially more limited. The equivalent with this method would be:
entries_of_interest.choose(a.T)
The elements in jagged_slice_of_a are the diagonal elements of a[:,entries_of_interest]
A slightly less cumbersome way of doing this would therefore be to use np.diagonal to extract them.
jagged_slice_of_a = a[:, entries_of_interest].diagonal()
I am using the matlab python engine to access data from my matlab projects in python. This works quite well, but I do have a problem to efficiently use the matlab arrays in python. For example I want an array from matlab, I use (eng stands for the matlab engine):
x = eng.eval(arg)
What I get is a matlab.double array which looks like this:
matlab.double([[1.0,2.0],[4.0,3.0],[2.0,5.0]])
Doesn't look too bad. Let's try to catch an entry:
>>> x[2][1]
5.0
Yay! How about a full row?
>>> x[0]
matlab.double([1.0,2.0])
.. well, at least it's a row, but I'm not found of the "matlab.double" prefix.. how about a column?
>>> x[:][0]
matlab.double([1.0,2.0])
Wait, what? I try to select all rows, and then the first element from each, but instead I get just the row. And in fact:
x[i] == x[:][i] == x[i][:]
So, basically two problems arise: Selecting a row brings me the unwanted "matlab.double" prefix, and selecting a column (personally more important) doesn't work at all. Any suggestions here?
What i did now, was to reread every value for itself and safe it into a new python array:
c = [[] for _ in range(len(x[0]))]
for i in range(len(x[0])):
for j in range(len(x)):
c[i].append(x[j][i])
This works, but there is one catch: It extremely slows the code down with growing data. And of course, it doesn't feel beautiful to reread every single entry, if they are in fact already stored in x.
Thanks for reading through this long text, I just assume I explain a little bit more since probably only a few people work with the python matlab engine.
A more generic approach, which I'm using productively now, also allowing me to round the values if needed (careful though, rounding takes more calculation time):
from math import log10, floor
def convert(self, x, ROUND=0):
conv = []
for _ in range(x.size[0]):
lst = x._data[_::x.size[0]].tolist()
if ROUND is not 0:
lst = [self.round_sig(elem, ROUND) if elem != 0 and
elem == elem else elem for elem in lst]
conv.append(lst)
return conv
def round_sig(self, x, sig=2):
return round(x, sig-int(floor(log10(abs(x))))-1)
With the input from the discussion below I managed to find a reasonable way:
c = []
for _ in range(x.size[1]):
c.append(x._data[_*x.size[0]:_*x.size[0]+x.size[0]].tolist())
return c
This way the command takes around 0.009s instead of 0.045s from before. Using the zip function was around 0.022s. Thanks alot, the code runs 5 times faster now!
For clarification: x.size[i] gives me the size of the matlab.double array. x._data gives an one dimensional array of type:
array('d', [1.0,2.0,4.0 ... ])
Therefore it includes a tolist() method to get an actual list, which I needed.
There's a much more concise (and Pythonic) way to extract a column using list comprehension, and it also gives you the output immediately as a list of Python floats:
>>> x = matlab.double([[1.0,2.0],[4.0,3.0],[2.0,5.0]])
>>> [d[0] for d in x]
[1.0, 4.0, 2.0]