Iterate over last axis of a numpy array - python

Let's say we have a (20, 5) array. We can iterate over each row very pythonically:
import numpy as np
xs = np.array(range(100)).reshape(20, 5)
for x in xs:
    print(x)
If we want to iterate over another axis (in this example the columns, but I'm looking for a solution that works for any axis of an ndarray), it's less direct. We can use the method from "Iterating over arbitrary dimension of numpy.array":
for i in range(xs.shape[-1]):
    x = xs[..., i]
    print(x)
Is there a more direct way to iterate over another axis, like (pseudo-code):
for x in xs.iterator(axis=-1):
    print(x)
?

I think that as_strided from the stride tricks module should do the work here.
It creates a view into the array and not a copy (as stated by the docs).
Here is a simple demonstration of as_strided's capabilities:
from numpy.lib.stride_tricks import as_strided
import numpy as np
xs = np.array(range(3 *3 * 4)).reshape(3,3, 4)
for x in xs:
    print(x)
output:
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]
[[24 25 26 27]
[28 29 30 31]
[32 33 34 35]]
A function to iterate over a specific axis of an array:
def iterate_over_axis(arr, axis=0):
    axis = axis % arr.ndim  # normalize negative axes such as -1
    strides = arr.strides
    strides_ = [strides[axis], *strides[0:axis], *strides[(axis+1):]]
    shape = arr.shape
    shape_ = [shape[axis], *shape[0:axis], *shape[(axis+1):]]
    return as_strided(arr, strides=strides_, shape=shape_)
for x in iterate_over_axis(xs, axis=1):
    print(x)
output:
[[ 0 1 2 3]
[12 13 14 15]
[24 25 26 27]]
[[ 4 5 6 7]
[16 17 18 19]
[28 29 30 31]]
[[ 8 9 10 11]
[20 21 22 23]
[32 33 34 35]]
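As a possible alternative to building the strides by hand, the same iteration can probably be written with np.moveaxis, which returns a view with the chosen axis moved to the front. This is only a sketch: the helper name iter_axis is made up for illustration and is not part of NumPy.

```python
import numpy as np

xs = np.arange(3 * 3 * 4).reshape(3, 3, 4)

def iter_axis(arr, axis=0):
    """Hypothetical helper: iterate over the sub-arrays along `axis`."""
    # np.moveaxis returns a view; iterating an array walks its first axis.
    return iter(np.moveaxis(arr, axis, 0))

for x in iter_axis(xs, axis=-1):
    print(x.shape)  # each slice is xs[..., i], shape (3, 3)
```

Negative axes such as -1 are handled by np.moveaxis itself, so no manual normalization is needed here.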

Related

Slicing array with numpy?

import numpy as np
r = np.arange(36)
r.resize((6, 6))
print(r)
# prints:
# [[ 0 1 2 3 4 5]
# [ 6 7 8 9 10 11]
# [12 13 14 15 16 17]
# [18 19 20 21 22 23]
# [24 25 26 27 28 29]
# [30 31 32 33 34 35]]
print(r[:,::7])
# prints:
# [[ 0]
# [ 6]
# [12]
# [18]
# [24]
# [30]]
print(r[:,0])
# prints:
# [ 0 6 12 18 24 30]
The r[:,::7] gives me a column, the r[:,0] gives me a row, they both have the same numbers. Would be glad if someone could explain to me why?
Because the step (7) is larger than the length of that axis (6), the slice ::7 selects only index 0, so you get just the first column. However, the two results are not identical, even though they contain the same numbers: the scalar index in [:, 0] drops the indexed dimension, so you get a 1D array of shape (6,), whereas [:, ::7] keeps the number of dimensions and only changes the length of the step-sliced dimension, giving shape (6, 1).
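A short check of the shapes makes the difference visible. The variable names col_2d and col_1d are only illustrative:

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

col_2d = r[:, ::7]  # step 7 on an axis of length 6 keeps only index 0
col_1d = r[:, 0]    # an integer index removes that axis entirely

print(col_2d.shape)  # (6, 1)
print(col_1d.shape)  # (6,)
print(np.array_equal(col_2d[:, 0], col_1d))  # True: same values, different shape
```

Writing r[:, 0:1] should give the same (6, 1) result as r[:, ::7] while stating the intent more directly.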

Shifting the location of tensor3 elements based on an offset vector

I have a Theano tensor3 (i.e., a 3-dimensional array) x:
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
as well as a Theano vector (i.e., a 1-dimensional array) y, which we will refer as an "offset" vector, since it specifies the desired offset:
[2, 1]
I want to shift the location of elements of x based on vector y, so that the output be as follows (the shift is performed on the second dimension):
[[[ a b c d]
[ e f g h]
[ 0 1 2 3]]
[[ i j k l]
[12 13 14 15]
[16 17 18 19]]]
where the a, b, …, l could be any number.
For example, a valid output could be:
[[[ 0 0 0 0]
[ 0 0 0 0]
[ 0 1 2 3]]
[[ 0 0 0 0]
[12 13 14 15]
[16 17 18 19]]]
Another valid output could be:
[[[ 4 5 6 7]
[ 8 9 10 11]
[ 0 1 2 3]]
[[20 21 22 23]
[12 13 14 15]
[16 17 18 19]]]
I am aware of the function theano.tensor.roll(x, shift, axis=None), however the shift can only take a scalar as input, i.e. it shifts all elements with the same offset.
E.g., the code:
import theano.tensor
from theano import shared
import numpy as np
x = shared(np.arange(24).reshape((2,3,4)))
print('theano.tensor.roll(x, 2, axis=1).eval(): \n{0}'.
format(theano.tensor.roll(x, 2, axis=1).eval()))
outputs:
theano.tensor.roll(x, 2, axis=1).eval():
[[[ 4 5 6 7]
[ 8 9 10 11]
[ 0 1 2 3]]
[[16 17 18 19]
[20 21 22 23]
[12 13 14 15]]]
which is not what I want.
How can I shift the location of tensor3 elements based on an offset vector? (note that in the code provided in this example, the tensor3 is a shared variable for convenience, but in my actual code it will be a symbolic variable)
I couldn't find any dedicated function for that purpose, so I simply ended up using theano.scan:
import theano
import theano.tensor
from theano import shared
import numpy as np
y = shared(np.array([2,1]))
x = shared(np.arange(24).reshape((2,3,4)))
print('x.eval():\n{0}\n'.format(x.eval()))
def shift_and_reverse_row(matrix, y):
    '''
    Shift and reverse the matrix in the direction of the first dimension (i.e., rows)
    matrix: matrix
    y: scalar
    '''
    new_matrix = theano.tensor.zeros_like(matrix)
    new_matrix = theano.tensor.set_subtensor(new_matrix[:y,:], matrix[y-1::-1,:])
    return new_matrix
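# Keep (n_rows - offset) rows of each matrix; y[::-1] only happens to equal that for y = [2, 1].
new_x, updates = theano.scan(shift_and_reverse_row, outputs_info=None,
                             sequences=[x, x.shape[1] - y])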
new_x = new_x[:, ::-1, :]
print('new_x.eval(): \n{0}'.format(new_x.eval()))
output:
x.eval():
[[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]]
[[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]]
new_x.eval():
[[[ 0 0 0 0]
[ 0 0 0 0]
[ 0 1 2 3]]
[[ 0 0 0 0]
[12 13 14 15]
[16 17 18 19]]]
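To make the target semantics concrete, here is a plain NumPy sketch (not Theano, and not the scan-based code above) that reproduces both valid outputs from the question on the same data. It assumes the offsets are small non-negative integers; the names rolled, wrapped and shifted are illustrative only.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
y = np.array([2, 1])  # one offset per (3, 4) matrix

# Wrap-around variant: roll each matrix by its own offset along its rows.
rolled = np.stack([np.roll(m, s, axis=0) for m, s in zip(x, y)])

# Zero-filled variant: blank out the rows that wrapped around.
rows = np.arange(x.shape[1])
wrapped = rows[None, :] < y[:, None]              # shape (2, 3)
shifted = np.where(wrapped[:, :, None], 0, rolled)

print(rolled)
print(shifted)
```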

Removing rows from a multi dimensional numpy array

I have a rather big 3-dimensional numpy array of shape (2000, 2500, 32) that I need to manipulate. Some rows are bad, so I need to delete several of them.
To detect which rows are "bad" I am using the following function:
def badDetect(x):
    for i in range(10, 19):
        ptp = np.ptp(x[i*100:(i+1)*100])
        if ptp < 0.01:
            return True
    return False
which marks as bad any sequence of 2000 that has a range of 100 values with peak to peak value less than 0.01.
When this is the case I want to remove that sequence of 2000 values (which can be selected from numpy with a[:,x,y])
Numpy delete seems to be accepting indexes but only for 2 dimensional arrays.
You will definitely have to reshape your input array, because cutting out "rows" from a 3D cube leaves a structure that cannot be properly addressed.
As we don't have your data, I'll use a different example first to explain how this possible solution works:
>>> import numpy as np
>>> from numpy.lib.stride_tricks import as_strided
>>>
>>> threshold = 18
>>> a = np.arange(5*3*2).reshape(5,3,2) # your dataset of 2000x2500x32
>>> # Taint the data:
... a[0,0,0] = 5
>>> a[a==22]=20
>>> print(a)
[[[ 5 1]
[ 2 3]
[ 4 5]]
[[ 6 7]
[ 8 9]
[10 11]]
[[12 13]
[14 15]
[16 17]]
[[18 19]
[20 21]
[20 23]]
[[24 25]
[26 27]
[28 29]]]
>>> a2 = a.reshape(-1, np.prod(a.shape[1:]))
>>> print(a2) # Will prove to be much easier to work with!
[[ 5 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 20 23]
[24 25 26 27 28 29]]
As you can see, from the representation above, it already becomes much clearer now over which windows you want to compute the peak to peak value. And you'll need this form if you're going to remove "rows" (now they have been transformed to columns) from this datastructure, something you couldn't do in 3 dimensions!
>>> isize = a.itemsize # More generic, in case you have another dtype
>>> slice_size = 4 # How big each continuous slice is over which the Peak2Peak value is calculated
>>> slices = as_strided(a2,
... shape=(a2.shape[0] + 1 - slice_size, slice_size, a2.shape[1]),
... strides=(isize*a2.shape[1], isize*a2.shape[1], isize))
>>> print(slices)
[[[ 5 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 20 23]]
[[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 20 23]
[24 25 26 27 28 29]]]
So I took, as an example, a window size of 4 elements: If the peak to peak value within any of these 4 element slices (per dataset, so per column) is less than a certain threshold, I want to exclude it. That can be done like this:
>>> mask = np.all(np.ptp(slices, axis=1) >= threshold, axis=0) # These are the ones that are of interest
>>> print(a2[:,mask])
[[ 1 2 3 5]
[ 7 8 9 11]
[13 14 15 17]
[19 20 21 23]
[25 26 27 29]]
You can now clearly see that the tainted data has been removed. But remember, you could not have simply removed that data from a 3D array (but you could've masked it then).
Obviously, you'll have to set the threshold to .01 in your use-case, and the slice_size to 100.
Beware: while the as_strided view itself is extremely memory-efficient, computing the peak-to-peak values of this array and storing the result does require a good amount of memory in your case: 1901x(2500x32) values in the full case, i.e. when you do not ignore the first 1000 slices. In your case, where you're only interested in the slices from 1000:1900, you would have to add that to the code like so:
mask = np.all(np.ptp(slices[1000:1900,:,:], axis=1) >= threshold, axis=0)
And that would reduce the memory required for the intermediate peak-to-peak result to "only" 900x(2500x32) values (of whatever data type you were using).
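If hand-written strides feel too fragile, the same windows can likely be built with numpy.lib.stride_tricks.sliding_window_view. This sketch assumes NumPy 1.20 or newer and reuses the small tainted example from above. Note that sliding_window_view puts the window dimension last, so the peak-to-peak is taken over axis=-1 rather than axis=1.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

threshold = 18
slice_size = 4

a = np.arange(5 * 3 * 2).reshape(5, 3, 2)
a[0, 0, 0] = 5       # taint the data as in the example above
a[a == 22] = 20
a2 = a.reshape(a.shape[0], -1)                 # shape (5, 6)

# Read-only view of every run of `slice_size` consecutive rows, per column.
windows = sliding_window_view(a2, window_shape=slice_size, axis=0)  # (2, 6, 4)

ptp = np.ptp(windows, axis=-1)                 # shape (2, 6)
mask = np.all(ptp >= threshold, axis=0)        # columns to keep
kept = a2[:, mask]
print(kept)
```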

Numpy fancy indexing in multiple dimensions

Let's say I have a numpy array A of size n x m x k and another array B of size n x m that has indices from 1 to k.
I want to access each n x m slice of A using the index given at this place in B,
giving me an array of size n x m.
Edit: that is apparently not what I want!
[[ I can achieve this using take like this:
A.take(B)
]] end edit
Can this be achieved using fancy indexing?
I would have thought A[B] would give the same result, but that results
in an array of size n x m x m x k (which I don't really understand).
The reason I don't want to use take is that I want to be able to assign this portion something, like
A[B] = 1
The only working solution that I have so far is
A.reshape(-1, k)[np.arange(n * m), B.ravel()].reshape(n, m)
but surely there has to be an easier way?
Suppose
import numpy as np
np.random.seed(0)
n,m,k = 2,3,5
A = np.arange(n*m*k,0,-1).reshape((n,m,k))
print(A)
# [[[30 29 28 27 26]
# [25 24 23 22 21]
# [20 19 18 17 16]]
# [[15 14 13 12 11]
# [10 9 8 7 6]
# [ 5 4 3 2 1]]]
B = np.random.randint(k, size=(n,m))
print(B)
# [[4 0 3]
# [3 3 1]]
To get this array,
print(A.reshape(-1, k)[np.arange(n * m), B.ravel()])
# [26 25 17 12 7 4]
as an n x m array using fancy indexing:
i,j = np.ogrid[0:n, 0:m]
print(A[i, j, B])
# [[26 25 17]
# [12 7 4]]
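Since the question also asks about assignment, here is a sketch showing that the same index expression can be used on the left-hand side, plus np.take_along_axis as a possible alternative for reading. B is hard-coded to the values printed above so the example does not depend on the random seed; the names picked_before and picked_after are illustrative.

```python
import numpy as np

n, m, k = 2, 3, 5
A = np.arange(n * m * k, 0, -1).reshape(n, m, k)
B = np.array([[4, 0, 3],
              [3, 3, 1]])  # hard-coded copy of the B printed above

# Reading: take_along_axis needs B to have a trailing length-1 axis.
picked_before = np.take_along_axis(A, B[..., None], axis=-1)[..., 0]

# Writing: the open-grid index works as an assignment target.
i, j = np.ogrid[0:n, 0:m]
A[i, j, B] = 1

picked_after = A[i, j, B]
print(picked_before)
print(picked_after)
```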

Iteration through all 1 dimensional subarrays of a multi-dimensional array

What is the fastest way to iterate through all one-dimensional sub-arrays of an n-dimensional array in Python?
For example consider the 3-D array:
import numpy as np
a = np.arange(24)
a = a.reshape(2,3,4)
The desired sequence of yields from the iterator is :
a[:,0,0]
a[:,0,1]
..
a[:,2,3]
a[0,:,0]
..
a[1,:,3]
a[0,0,:]
..
a[1,2,:]
Here is a compact implementation of such an iterator:
import itertools
import numpy
def iter1d(a):
    return itertools.chain.from_iterable(
        numpy.rollaxis(a, axis, a.ndim).reshape(-1, dim)
        for axis, dim in enumerate(a.shape))
This will yield the subarrays in the order you gave in your post:
for x in iter1d(a):
    print(x)
prints
[ 0 12]
[ 1 13]
[ 2 14]
[ 3 15]
[ 4 16]
[ 5 17]
[ 6 18]
[ 7 19]
[ 8 20]
[ 9 21]
[10 22]
[11 23]
[0 4 8]
[1 5 9]
[ 2 6 10]
[ 3 7 11]
[12 16 20]
[13 17 21]
[14 18 22]
[15 19 23]
[0 1 2 3]
[4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]
[20 21 22 23]
The trick here is to iterate over all axes, and for each axis reshape the array to a two-dimensional array the rows of which are the desired one-dimensional subarrays.
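To see the reshape trick for a single axis in isolation, the sketch below moves axis 1 to the end and flattens everything else, so each row of the result is one a[i, :, k] sub-array. It uses np.moveaxis instead of rollaxis purely for readability; the variable name rows is illustrative.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Move axis 1 (length 3) to the end, then collapse the remaining axes.
# The reshape may copy, because the moved view is not contiguous.
rows = np.moveaxis(a, 1, -1).reshape(-1, a.shape[1])

print(rows.shape)  # (8, 3)
print(rows[0])     # same values as a[0, :, 0]
```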
There may be a more efficient way, but this should work...
import itertools
import numpy as np
a = np.arange(24)
a = a.reshape(2,3,4)
colon = slice(None)
dimensions = [list(range(dim)) + [colon] for dim in a.shape]
for dim in itertools.product(*dimensions):
    if dim.count(colon) == 1:
        print(a[dim])
This yields (I'm leaving out a trivial bit of code to print the left hand side of this...):
a[0,0,:] --> [0 1 2 3]
a[0,1,:] --> [4 5 6 7]
a[0,2,:] --> [ 8 9 10 11]
a[0,:,0] --> [0 4 8]
a[0,:,1] --> [1 5 9]
a[0,:,2] --> [ 2 6 10]
a[0,:,3] --> [ 3 7 11]
a[1,0,:] --> [12 13 14 15]
a[1,1,:] --> [16 17 18 19]
a[1,2,:] --> [20 21 22 23]
a[1,:,0] --> [12 16 20]
a[1,:,1] --> [13 17 21]
a[1,:,2] --> [14 18 22]
a[1,:,3] --> [15 19 23]
a[:,0,0] --> [ 0 12]
a[:,0,1] --> [ 1 13]
a[:,0,2] --> [ 2 14]
a[:,0,3] --> [ 3 15]
a[:,1,0] --> [ 4 16]
a[:,1,1] --> [ 5 17]
a[:,1,2] --> [ 6 18]
a[:,1,3] --> [ 7 19]
a[:,2,0] --> [ 8 20]
a[:,2,1] --> [ 9 21]
a[:,2,2] --> [10 22]
a[:,2,3] --> [11 23]
The key here is that indexing a with (for example) a[0,0,:] is equivalent to indexing a with a[(0,0,slice(None))]. (This is just generic python slicing, nothing numpy-specific. To prove it to yourself, you can write a dummy class with just a __getitem__ and print what's passed in when you index an instance of your dummy class.).
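The dummy class mentioned above could look like the following sketch. The class name ShowIndex is made up for illustration; it simply returns whatever key the square brackets pass in.

```python
class ShowIndex:
    """Hypothetical helper: hands back the key passed to []."""

    def __getitem__(self, key):
        return key

probe = ShowIndex()
key = probe[0, 0, :]
print(key)  # (0, 0, slice(None, None, None))
```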
So, what we want is every possible combination of 0 to nx, 0 to ny, 0 to nz, etc., plus a slice(None) (the colon) for each axis.
However, we want 1D arrays, so we need to filter out anything with more or fewer than one slice(None) (i.e. we don't want a[:,:,:], a[0,:,:], a[0,0,0], etc.).
Hopefully that makes some sense, anyway...
Edit: I'm assuming that the exact order doesn't matter... If you need the exact ordering you list in your question, you'll need to modify this...
Your friends are the slice() objects, numpy's ndarray.__getitem__() method, and possibly the itertools.chain.from_iterable.
