Removing rows from a multi-dimensional numpy array - python

I have a rather big 3-dimensional numpy array of shape (2000, 2500, 32) that I need to manipulate. Some rows are bad, so I need to delete several of them.
To detect which rows are "bad" I use the following function:
def badDetect(x):
    for i in xrange(10, 19):
        ptp = np.ptp(x[i*100:(i+1)*100])
        if ptp < 0.01:
            return True
    return False
which marks as bad any sequence of 2000 values that contains a window of 100 values with a peak-to-peak value of less than 0.01.
When this is the case, I want to remove that sequence of 2000 values (which can be selected from numpy with a[:,x,y]).
np.delete seems to accept indexes, but only for 2-dimensional arrays.

You will definitely have to reshape your input array, because cutting out "rows" from a 3D cube leaves a structure that cannot be properly addressed.
As we don't have your data, I'll use a different example first to explain how this possible solution works:
>>> import numpy as np
>>> from numpy.lib.stride_tricks import as_strided
>>>
>>> threshold = 18
>>> a = np.arange(5*3*2).reshape(5,3,2) # your dataset of 2000x2500x32
>>> # Taint the data:
... a[0,0,0] = 5
>>> a[a==22]=20
>>> print(a)
[[[ 5  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]

 [[12 13]
  [14 15]
  [16 17]]

 [[18 19]
  [20 21]
  [20 23]]

 [[24 25]
  [26 27]
  [28 29]]]
>>> a2 = a.reshape(-1, np.prod(a.shape[1:]))
>>> print(a2) # Will prove to be much easier to work with!
[[ 5  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 20 23]
 [24 25 26 27 28 29]]
As you can see from the representation above, it already becomes much clearer over which windows you want to compute the peak-to-peak value. And you'll need this form if you're going to remove "rows" (they have now been transformed into columns) from this data structure, something you couldn't do in 3 dimensions!
>>> isize = a.itemsize # More generic, in case you have another dtype
>>> slice_size = 4 # How big each continuous slice is over which the Peak2Peak value is calculated
>>> slices = as_strided(a2,
... shape=(a2.shape[0] + 1 - slice_size, slice_size, a2.shape[1]),
... strides=(isize*a2.shape[1], isize*a2.shape[1], isize))
>>> print(slices)
[[[ 5  1  2  3  4  5]
  [ 6  7  8  9 10 11]
  [12 13 14 15 16 17]
  [18 19 20 21 20 23]]

 [[ 6  7  8  9 10 11]
  [12 13 14 15 16 17]
  [18 19 20 21 20 23]
  [24 25 26 27 28 29]]]
So I took, as an example, a window size of 4 elements: if the peak-to-peak value within any of these 4-element slices (per dataset, so per column) is less than a certain threshold, I want to exclude it. That can be done like this:
>>> mask = np.all(slices.ptp(axis=1) >= threshold, axis=0) # These are the ones that are of interest
>>> print(a2[:,mask])
[[ 1  2  3  5]
 [ 7  8  9 11]
 [13 14 15 17]
 [19 20 21 23]
 [25 26 27 29]]
You can now clearly see that the tainted data has been removed. But remember, you could not have simply removed that data from a 3D array (but you could've masked it then).
Obviously, you'll have to set the threshold to .01 in your use-case, and the slice_size to 100.
Beware: while the as_strided form is extremely memory-efficient, computing the peak-to-peak values of this array and storing that result does require a good amount of memory in your case: 1901x(2500x32) values in the full scenario, i.e. when you do not ignore the first 1000 slices. In your case, where you're only interested in the slices from 1000:1900, you would have to add that to the code like so:
mask = np.all(slices[1000:1900,:,:].ptp(axis=1) >= threshold, axis=0)
And that would reduce the memory required for the intermediate peak-to-peak result to "only" 900x(2500x32) values (of whatever data type you were using).
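Putting the pieces together with the question's numbers plugged in, a minimal end-to-end sketch could look like the following. The random stand-in data and the variable names are illustrative only; substitute your real (2000, 2500, 32) array for a.
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Stand-in for the real dataset; float32 keeps the demo around 600 MB
a = np.random.rand(2000, 2500, 32).astype(np.float32)

# Collapse the last two axes so every "sequence of 2000" becomes a column
a2 = a.reshape(-1, np.prod(a.shape[1:]))      # shape (2000, 80000)

isize = a2.itemsize
slice_size = 100                              # window for the peak-to-peak test
slices = as_strided(
    a2,
    shape=(a2.shape[0] + 1 - slice_size, slice_size, a2.shape[1]),
    strides=(isize * a2.shape[1], isize * a2.shape[1], isize))

# Keep only the columns whose windows starting in 1000..1899 all have a
# peak-to-peak value of at least 0.01 (cf. xrange(10, 19) in badDetect)
mask = np.all(slices[1000:1900].ptp(axis=1) >= 0.01, axis=0)
good = a2[:, mask]                            # the bad sequences are gone
print(good.shape)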

Related

Iterate over last axis of a numpy array

Let's say we have a (20, 5) array. We can iterate over each row very pythonically:
import numpy as np
xs = np.array(range(100)).reshape(20, 5)
for x in xs:
    print(x)
If we want to iterate over another axis (in this example, over the columns, but I'm looking for a solution for every possible axis of an ndarray), it's less direct. We can use the method from Iterating over arbitrary dimension of numpy.array:
for i in range(xs.shape[-1]):
    x = xs[..., i]
    print(x)
Is there a more direct way to iterate over another axis, like (pseudo-code):
for x in xs.iterator(axis=-1):
    print(x)
?
I think that as_strided from the stride tricks module should do the job here.
It creates a view into the array, not a copy (as stated by the docs).
Here is a simple demonstration of as_strided's capabilities:
from numpy.lib.stride_tricks import as_strided
import numpy as np
xs = np.array(range(3 * 3 * 4)).reshape(3, 3, 4)
for x in xs:
    print(x)
output:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
[[24 25 26 27]
 [28 29 30 31]
 [32 33 34 35]]
A function to iterate over a specific axis of an array:
def iterate_over_axis(arr, axis=0):
    strides = arr.strides
    strides_ = [strides[axis], *strides[0:axis], *strides[(axis+1):]]
    shape = arr.shape
    shape_ = [shape[axis], *shape[0:axis], *shape[(axis+1):]]
    return as_strided(arr, strides=strides_, shape=shape_)
for x in iterate_over_axis(xs, axis=1):
    print(x)
output:
[[ 0  1  2  3]
 [12 13 14 15]
 [24 25 26 27]]
[[ 4  5  6  7]
 [16 17 18 19]
 [28 29 30 31]]
[[ 8  9 10 11]
 [20 21 22 23]
 [32 33 34 35]]
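Since the answer leans on as_strided returning a view rather than a copy, that's easy to verify: writes through the result show up in the original array. A quick sketch, reusing xs and iterate_over_axis from above:
# iterate_over_axis returns a strided view, so writing through it
# mutates the array it was derived from
view = iterate_over_axis(xs, axis=1)
view[0, 0, 0] = -1
print(xs[0, 0, 0])  # -1: the original array was modified
For what it's worth, in modern NumPy np.moveaxis(xs, 1, 0) builds an equivalent view without the manual stride arithmetic.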

How to manually select values from an array

For example, I have the matrix
a = np.arange(25).reshape(5, 5)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
How do I make a 1D array of the elements that I would like to choose manually, for example [2,3], [4,1], [1,0] and [2,2], so that I get the following:
b = [13, 21, 5, 12]
The array should be a reference rather than a copy.
You can make a function for this.
import numpy as np

# defining the function
def get_value(matrix, row_list, col_list):
    # fancy indexing picks out matrix[i, j] for every (i, j) pair at once
    return matrix[row_list, col_list]

# initializing the array
a = np.arange(0, 25, 1).reshape(5, 5)

# getting the required values and printing
b = get_value(a, [2, 4, 1, 2], [3, 1, 0, 2])

# output
print(b)  # [13 21  5 12]
Edit
I'll let the previous answer be as is, just in case if anyone else stumbles upon that and needs it.
What the question wants is to take a value from b (e.g. b[0], which is 13), find it in the original matrix a, and change the value at that position.
def change_the_value(old_mat, val_to_change, new_val):
    mat_coor = np.array(np.matrix(np.where(old_mat == val_to_change)).T)[0]
    old_mat[mat_coor[0], mat_coor[1]] = new_val

a = np.arange(0, 25, 1).reshape(5, 5)
b = [13, 16, 5, 12]
change_the_value(a, b[0], 0)
a = np.arange(25).reshape(5, 5)
search = [[2, 3], [4, 1], [1, 0], [2, 2]]
for row, col in search:
    print(row, col, a[row][col])
output:
r c result
2 3 13
4 1 21
1 0 5
2 2 12
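If you want the picked values collected into the 1D array b from the question rather than printed, the same pairs can drive a single fancy-indexing call. A short sketch, reusing the a and search from above:
import numpy as np

a = np.arange(25).reshape(5, 5)
search = [[2, 3], [4, 1], [1, 0], [2, 2]]

# Turn the (row, col) pairs into two parallel index arrays,
# then pick all values in one fancy-indexing call
rows, cols = np.transpose(search)
b = a[rows, cols]
print(b)  # [13 21  5 12]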
First of all, I've found that constructing a non-contiguous view into a Numpy array is not natively possible, because Numpy efficiently utilises the contiguous memory layout of an array, which enables dramatic speed increases.
Here's a solution I found that works the best so far:
Instead of having a view into the array, I construct a collection of the indices I would like to process: [2,3], [4,1], [1,0] and [2,2].
The collection type I have chosen is a set, due to its exclusion of duplicates and its set.add and set.discard methods, which do not require a search. Keeping order was not necessary.
To use them for indexing an array, they have to be converted from a set of tuples, {(2,3),(4,1),(1,0),(2,2)}, to a tuple of arrays, (array([2,4,1,2]), array([3,1,0,2])).
Which can be achieved by unzipping a set and constructing a tuple of arrays:
import numpy as np
a = np.arange(25).reshape(5, 5)
>>> [[ 0  1  2  3  4]
     [ 5  6  7  8  9]
     [10 11 12 13 14]
     [15 16 17 18 19]
     [20 21 22 23 24]]
my_set = {(2,3),(4,1),(1,0),(2,2)}
uzip_set = list(zip(*my_set))
seq_from_set = (np.asarray(uzip_set[0]),np.asarray(uzip_set[1]))
print(seq_from_set)
>>> (array([2, 4, 1, 2]), array([3, 1, 0, 2]))
And array a can be manipulated by providing such a sequence of indices:
b = a[seq_from_set]
print(b)
>>> [13 21  5 12]
a[seq_from_set] = 0
print(a)
>>> [[ 0  1  2  3  4]
     [ 0  6  7  8  9]
     [10 11  0  0 14]
     [15 16 17 18 19]
     [20  0 22 23 24]]
The solution is a bit involved compared to something native, but it works surprisingly fast. It allows easy management of the collection of indices and quick conversion to a tuple of index arrays on demand.
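As a small illustration of why the set is convenient here, this sketch (continuing the variables above) updates the collection with add/discard and then re-derives the index arrays; the (0, 4) pair is a made-up addition for the demo:
my_set.add((0, 4))       # start tracking another cell
my_set.discard((4, 1))   # stop tracking one; no error if it's absent

# Rebuild the tuple-of-arrays form after the update
uzip_set = list(zip(*my_set))
seq_from_set = (np.asarray(uzip_set[0]), np.asarray(uzip_set[1]))
print(a[seq_from_set])   # values at the updated set of indices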

Slicing array with numpy?

import numpy as np
r = np.arange(36)
r.resize((6, 6))
print(r)
# prints:
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]
#  [12 13 14 15 16 17]
#  [18 19 20 21 22 23]
#  [24 25 26 27 28 29]
#  [30 31 32 33 34 35]]
print(r[:,::7])
# prints:
# [[ 0]
#  [ 6]
#  [12]
#  [18]
#  [24]
#  [30]]
print(r[:,0])
# prints:
# [ 0 6 12 18 24 30]
r[:,::7] gives me a column and r[:,0] gives me a row, and they both contain the same numbers. I'd be glad if someone could explain to me why.
Because the step argument (7) is at least as large as the size of the corresponding dimension (6), so the slice only contains the first column. However, the two results are not identical (even though they contain the same numbers): the scalar index in [:, 0] collapses the corresponding dimension, so you get a 1D array, while [:, ::7] keeps the number of dimensions intact and only alters the length of the stepped dimension.
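A quick shape check makes the difference concrete (a sketch reusing r from the question):
import numpy as np

r = np.arange(36).reshape(6, 6)
print(r[:, ::7].shape)  # (6, 1) -- slicing preserves the dimension
print(r[:, 0].shape)    # (6,)  -- a scalar index collapses it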

Resampling a numpy array representing an image

I am looking for how to resample a numpy array representing image data at a new size, preferably having a choice of the interpolation method (nearest, bilinear, etc.). I know there is
scipy.misc.imresize
which does exactly this by wrapping PIL's resize function. The only problem is that since it uses PIL, the numpy array has to conform to image formats, giving me a maximum of 4 "color" channels.
I want to be able to resize arbitrary images, with any number of "color" channels. I was wondering if there is a simple way to do this in scipy/numpy, or if I need to roll my own.
I have two ideas for how to concoct one myself:
a function that runs scipy.misc.imresize on every channel separately
create my own using scipy.ndimage.interpolation.affine_transform
The first one would probably be slow for large data, and the second one does not seem to offer any other interpolation method except splines.
Based on your description, you want scipy.ndimage.zoom.
Bilinear interpolation would be order=1, nearest is order=0, and cubic is the default (order=3).
zoom is specifically for regularly-gridded data that you want to resample to a new resolution.
As a quick example:
import numpy as np
import scipy.ndimage
x = np.arange(9).reshape(3,3)
print 'Original array:'
print x
print 'Resampled by a factor of 2 with nearest interpolation:'
print scipy.ndimage.zoom(x, 2, order=0)
print 'Resampled by a factor of 2 with bilinear interpolation:'
print scipy.ndimage.zoom(x, 2, order=1)
print 'Resampled by a factor of 2 with cubic interpolation:'
print scipy.ndimage.zoom(x, 2, order=3)
And the result:
Original array:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
Resampled by a factor of 2 with nearest interpolation:
[[0 0 1 1 2 2]
 [0 0 1 1 2 2]
 [3 3 4 4 5 5]
 [3 3 4 4 5 5]
 [6 6 7 7 8 8]
 [6 6 7 7 8 8]]
Resampled by a factor of 2 with bilinear interpolation:
[[0 0 1 1 2 2]
 [1 2 2 2 3 3]
 [2 3 3 4 4 4]
 [4 4 4 5 5 6]
 [5 5 6 6 6 7]
 [6 6 7 7 8 8]]
Resampled by a factor of 2 with cubic interpolation:
[[0 0 1 1 2 2]
 [1 1 1 2 2 3]
 [2 2 3 3 4 4]
 [4 4 5 5 6 6]
 [5 6 6 7 7 7]
 [6 6 7 7 8 8]]
Edit: As Matt S. pointed out, there are a couple of caveats for zooming multi-band images. I'm copying the portion below almost verbatim from one of my earlier answers:
Zooming also works for 3D (and nD) arrays. However, be aware that if you zoom by 2x, for example, you'll zoom along all axes.
from scipy import ndimage

data = np.arange(27).reshape(3, 3, 3)
print 'Original:\n', data
print 'Zoomed by 2x gives an array of shape:', ndimage.zoom(data, 2).shape
This yields:
Original:
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]
Zoomed by 2x gives an array of shape: (6, 6, 6)
In the case of multi-band images, you usually don't want to interpolate along the "z" axis, creating new bands.
If you have something like a 3-band, RGB image that you'd like to zoom, you can do this by specifying a sequence as the zoom factor:
print 'Zoomed by 2x along the last two axes:'
print ndimage.zoom(data, (1, 2, 2))
This yields:
Zoomed by 2x along the last two axes:
[[[ 0  0  1  1  2  2]
  [ 1  1  1  2  2  3]
  [ 2  2  3  3  4  4]
  [ 4  4  5  5  6  6]
  [ 5  6  6  7  7  7]
  [ 6  6  7  7  8  8]]

 [[ 9  9 10 10 11 11]
  [10 10 10 11 11 12]
  [11 11 12 12 13 13]
  [13 13 14 14 15 15]
  [14 15 15 16 16 16]
  [15 15 16 16 17 17]]

 [[18 18 19 19 20 20]
  [19 19 19 20 20 21]
  [20 20 21 21 22 22]
  [22 22 23 23 24 24]
  [23 24 24 25 25 25]
  [24 24 25 25 26 26]]]
If you want to resample, then you should look at Scipy's cookbook for rebinning. In particular, the congrid function defined at the end will support rebinning or interpolation (equivalent to the function in IDL with the same name). This should be the fastest option if you don't want interpolation.
You can also use directly scipy.ndimage.map_coordinates, which will do a spline interpolation for any kind of resampling (including unstructured grids). I find map_coordinates to be slow for large arrays (nx, ny > 200).
For interpolation on structured grids, I tend to use scipy.interpolate.RectBivariateSpline. You can choose the order of the spline (linear, quadratic, cubic, etc) and even independently for each axis. An example:
import scipy.interpolate as interp

# x, y: 1D coordinate arrays of the source grid; im: the 2D image
f = interp.RectBivariateSpline(x, y, im, kx=1, ky=1)
new_im = f(new_x, new_y)
In this case you're doing a bi-linear interpolation (kx = ky = 1). The 'nearest' kind of interpolation is not supported, as all this does is a spline interpolation over a rectangular mesh. It's also not the fastest method.
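To make this concrete, here is a small self-contained sketch of the RectBivariateSpline route; the toy 4x4 image and the target 8x8 size are made up for the demo:
import numpy as np
import scipy.interpolate as interp

im = np.arange(16.0).reshape(4, 4)   # toy 4x4 "image"
x = np.arange(4)                     # row coordinates
y = np.arange(4)                     # column coordinates

f = interp.RectBivariateSpline(x, y, im, kx=1, ky=1)  # bi-linear

new_x = np.linspace(0, 3, 8)         # 8 sample points along the rows
new_y = np.linspace(0, 3, 8)         # 8 sample points along the columns
new_im = f(new_x, new_y)             # evaluated on the 8x8 grid
print(new_im.shape)                  # (8, 8)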
If you're after bi-linear or bi-cubic interpolation, it is generally much faster to do two 1D interpolations:
f = interp.interp1d(y, im, kind='linear')
temp = f(new_y)
f = interp.interp1d(x, temp.T, kind='linear')
new_im = f(new_x).T
You can also use kind='nearest', but in that case get rid of the transposed arrays (the .T calls).
Have you looked at Scikit-image? Its transform.pyramid_* functions might be useful for you.
I've recently just found an issue with scipy.ndimage.interpolation.zoom, which I've submitted as a bug report: https://github.com/scipy/scipy/issues/3203
As an alternative (or at least for me), I've found that scikit-image's skimage.transform.resize works correctly: http://scikit-image.org/docs/dev/api/skimage.transform.html#skimage.transform.resize
However, it works differently to scipy's interpolation.zoom: rather than specifying a multiplier, you specify the output shape that you want. This works for 2D and 3D images.
For just 2D images, you can use transform.rescale and specify a multiplier or scale, as you would with interpolation.zoom.
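For instance, a short sketch of the skimage route; the array sizes and channel count here are made up, and the point is that resize takes the target shape while rescale takes a factor:
import numpy as np
from skimage.transform import resize, rescale

im = np.random.rand(64, 64, 10)       # an image with 10 "color" channels
up = resize(im, (128, 128, 10))       # give the output shape explicitly
print(up.shape)                       # (128, 128, 10)

im2d = np.random.rand(64, 64)
up2d = rescale(im2d, 2.0)             # 2D: give a scale factor instead
print(up2d.shape)                     # (128, 128)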
You can use scipy.interpolate.interp2d.
For example, considering an image represented by a numpy array arr, you can resize it to an arbitrary height and width as follows:
import numpy as np
from scipy.interpolate import interp2d

H, W = arr.shape[:2]
new_W, new_H = (600, 300)
xrange = lambda x: np.linspace(0, 1, x)
f = interp2d(xrange(W), xrange(H), arr, kind="linear")
new_arr = f(xrange(new_W), xrange(new_H))
Of course, if your image has multiple channels, you have to perform the interpolation for each one.
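A per-channel loop along the lines suggested above might look like this (a sketch; the array sizes are invented, and each 2D channel goes through interp2d separately before being restacked):
import numpy as np
from scipy.interpolate import interp2d

arr = np.random.rand(50, 40, 3)            # H x W x channels
H, W, C = arr.shape
new_W, new_H = (600, 300)
xrange = lambda x: np.linspace(0, 1, x)

channels = []
for c in range(C):
    f = interp2d(xrange(W), xrange(H), arr[..., c], kind="linear")
    channels.append(f(xrange(new_W), xrange(new_H)))

new_arr = np.dstack(channels)              # back to H x W x channels
print(new_arr.shape)                       # (300, 600, 3)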
This solution scales X and Y of the fed image without affecting RGB channels:
import scipy.ndimage
import matplotlib.pyplot

matplotlib.pyplot.imshow(scipy.ndimage.zoom(image_np_array, zoom=(7, 7, 1), order=1))
Hope this is useful.

Numpy fancy indexing in multiple dimensions

Let's say I have a numpy array A of size n x m x k and another array B of size n x m that holds indices into the last axis of A.
I want to access each n x m slice of A using the index given at this place in B,
giving me an array of size n x m.
Edit: that is apparently not what I want!
[[ I can achieve this using take like this:
A.take(B)
]] end edit
Can this be achieved using fancy indexing?
I would have thought A[B] would give the same result, but that results
in an array of size n x m x m x k (which I don't really understand).
The reason I don't want to use take is that I want to be able to assign this portion something, like
A[B] = 1
The only working solution that I have so far is
A.reshape(-1, k)[np.arange(n * m), B.ravel()].reshape(n, m)
but surely there has to be an easier way?
Suppose
import numpy as np
np.random.seed(0)
n,m,k = 2,3,5
A = np.arange(n*m*k,0,-1).reshape((n,m,k))
print(A)
# [[[30 29 28 27 26]
#   [25 24 23 22 21]
#   [20 19 18 17 16]]
#
#  [[15 14 13 12 11]
#   [10  9  8  7  6]
#   [ 5  4  3  2  1]]]
B = np.random.randint(k, size=(n,m))
print(B)
# [[4 0 3]
# [3 3 1]]
To create this array,
print(A.reshape(-1, k)[np.arange(n * m), B.ravel()])
# [26 25 17 12 7 4]
as an n x m array using fancy indexing:
i,j = np.ogrid[0:n, 0:m]
print(A[i, j, B])
# [[26 25 17]
#  [12  7  4]]
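And since the original motivation was assignment, the same broadcasting index triple works on the left-hand side, which A.take(B) cannot do (continuing the example above):
# Assign through the fancy index; the (i, j, B) triple selects exactly
# one element along the last axis per (n, m) position
A[i, j, B] = 1
print(A)
# [[[30 29 28 27  1]
#   [ 1 24 23 22 21]
#   [20 19 18  1 16]]
#
#  [[15 14 13  1 11]
#   [10  9  8  1  6]
#   [ 5  1  3  2  1]]]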
