numpy slice strange behavior - python

I have a 5-dimensional array like this:
a=np.random.randint(10,size=[2,3,4,5,600])
a.shape #(2,3,4,5,600)
I want to get the first element of the 2nd dimension and several elements of the last dimension:
b=a[:,0,:,:,[1,3,5,30,17,24,30,100,120]]
b.shape #(9,2,4,5)
As you can see, the last dimension was automatically moved to become the first dimension.
Why does this happen, and how can I avoid it?

This behavior is described in the numpy documentation. In the expression
a[:,0,:,:,[1,3,5,30,17,24,30,100,120]]
both 0 and [1,3,5,30,17,24,30,100,120] are advanced indexes, and they are separated by slices. As the documentation explains, in such a case the dimensions coming from the advanced indexes are placed first in the resulting array.
If we replace 0 with the slice 0:1, only one advanced index remains, and then the order of dimensions is preserved. Thus one way to fix this issue is to use the 0:1 slice and then squeeze the appropriate axis:
a[:,0:1,:,:,[1,3,5,30,17,24,30,100,120]].squeeze(axis=1)
Alternatively, one can keep both advanced indexes, and then rearrange axes:
np.moveaxis(a[:,0,:,:,[1,3,5,30,17,24,30,100,120]], 0, -1)
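For concreteness, a minimal sketch of both fixes, assuming the array and index list from the question; either way the result keeps the original axis order:
import numpy as np

a = np.random.randint(10, size=[2, 3, 4, 5, 600])
idx = [1, 3, 5, 30, 17, 24, 30, 100, 120]

# Option 1: keep only one advanced index (the slice 0:1), then drop the length-1 axis.
b1 = a[:, 0:1, :, :, idx].squeeze(axis=1)

# Option 2: keep both advanced indexes and move the leading axis back to the end.
b2 = np.moveaxis(a[:, 0, :, :, idx], 0, -1)

print(b1.shape)  # (2, 4, 5, 9)
print(b2.shape)  # (2, 4, 5, 9)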

Related

Removing indices from rows in 3D array

I have a 3D array with the shape (9, 100, 7200). I want to remove the 2nd half of the 7200 values in every row so the new shape will be (9, 100, 3600).
What can I do to slice the array or delete the 2nd half of the indices? I was thinking np.delete(arr, [3601:7200], axis=2), but I get an invalid syntax error when using the colon.
Why not just slicing?
arr = arr[:,:,:3600]
The syntax error occurs because [3601:7200] is not valid Python. I assume you are trying to create a new array of numbers to pass as the obj parameter for the delete function. You could do it this way using something like the range function:
np.delete(arr, range(3600,7200), axis=2)
Keep in mind that this will not modify arr; it will return a new array with the elements deleted. Also, notice I have used 3600, not 3601: indexing is zero-based, so keeping the first 3600 values means deleting indices 3600 through 7199.
However, it's often better practice to use slicing in a problem like this:
arr[:,:,:3600]
This gives your required shape. Let me break this down a little. We are slicing a numpy array with 3 dimensions. Just putting a colon in means we are taking everything in that dimension. :3600 means we are taking the first 3600 elements in that dimension. A better way to think about deleting the last half is to think of it as keeping the first half.
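For what it's worth, a quick sketch (with a random stand-in array) confirming that both approaches give the same (9, 100, 3600) result:
import numpy as np

arr = np.random.random((9, 100, 7200))

trimmed_delete = np.delete(arr, range(3600, 7200), axis=2)  # new array with the 2nd half removed
trimmed_slice = arr[:, :, :3600]                            # a view keeping the first half

print(trimmed_delete.shape)                           # (9, 100, 3600)
print(trimmed_slice.shape)                            # (9, 100, 3600)
print(np.array_equal(trimmed_delete, trimmed_slice))  # True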

Fit one array into another regardless of sizes - Python

Consider numpy arrays arr1 and arr2. They can have any number of dimensions. For example:
arr1=np.zeros([5,8])
arr2=np.ones([4,10])
I would like to put arr2 into arr1 either by cutting off excess lengths in some dimensions, or filling missing length with zeros.
I have tried:
arr1[exec(str(",:"*len([arr1.shape]))[1:])]=arr2[exec(str(",:"*len([arr2.shape]))[1:])]
which is basically the same as
arr1[:,:]=arr2[:,:]
I would like to do this preferably in one line and without "for" loops.
You could use this:
arr1[:min(arr1.shape[0], arr2.shape[0]), :min(arr1.shape[1], arr2.shape[1])]=arr2[:min(arr1.shape[0], arr2.shape[0]), :min(arr1.shape[1], arr2.shape[1])]
without any for loop.
It's the same concept you applied in your second try, but taking the minimum length along each dimension.
I solved this by coming up with the following. I used slice() as @hpaulj suggested. Considering I want to assign ph10 (an array) to ph14 (an array of zeros of size bound1):
ph14 = np.zeros(bound1)
ph10 = np.array(list1)
ind_min = np.min([ph14.shape, ph10.shape], 0)  # element-wise minimum of the two shapes
ph24 = []
for n2 in range(len(ind_min)):                 # one slice per dimension
    ph24 = ph24 + [slice(0, ind_min[n2])]
ph14[tuple(ph24)] = ph10[tuple(ph24)]          # NumPy expects a tuple of slices here, not a list
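For reference, the same slice-per-dimension idea can be written compactly for any number of dimensions. A minimal sketch, assuming both arrays have the same number of dimensions (fit_into is just a hypothetical helper name):
import numpy as np

def fit_into(arr1, arr2):
    # Build one slice per dimension covering the region where the two arrays overlap.
    idx = tuple(slice(0, min(d1, d2)) for d1, d2 in zip(arr1.shape, arr2.shape))
    arr1[idx] = arr2[idx]
    return arr1

arr1 = np.zeros([5, 8])
arr2 = np.ones([4, 10])
fit_into(arr1, arr2)   # arr1[:4, :8] is now 1; the rest stays 0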

Filtered Numpy Array Changes Number of Dimensions

I'm having trouble getting used to Numpy arrays (I'm a Matlab user). When I try to select just a range of values from an array, I see the resulting array has an extra dimension:
ioi = np.nonzero((self.data_array[0,:] >= range_start) & (self.data_array[0,:] <= range_end))
print("self.data_array.shape = {0}".format(self.data_array.shape))
print("self.data_array.shape[:,ioi] = {0}".format(self.data_array[:,ioi].shape))
The result is:
self.data_array.shape = (5, 50000)
self.data_array.shape[:,ioi] = (5, 1, 408)
I also see that ioi is a tuple. I don't know if that has anything to do with it.
What is happening here to create that extra dimension and what should I do, in the most direct way, to get an array shape of (5,408) in this case?
The simplest and most efficient thing would be to get rid of the np.nonzero call, and use logical indexing just as one would in Matlab. Here's an example. (I'm using random data of a similar shape, FYI.)
>>> data = np.random.randn(5, 5000)
>>> start, end = -0.5, 0.5
>>> ioi = (data[0] > start) & (data[0] < end)
>>> print(ioi.shape)
(5000,)
>>> print(ioi.sum())
1900
>>> print(data[:, ioi].shape)
(5, 1900)
The np.nonzero call is not usually needed. Just like Matlab's find function, it's slow compared with logical indexing, and usually one's goal can be more efficiently accomplished with logical indexing. np.nonzero, just like find, should mostly be used only when you need the actual index values themselves.
As you suspected, the reason for the extra dimension is that tuples are handled differently from other types of indexing arrays in NumPy. This is to allow more flexible indexing, such as with slices, ellipses, etc. See this useful page for an in-depth explanation, especially the last section.
There are at least two other options to solve the problem. One is to use the ioi tuple, as returned from np.nonzero, directly as your only index to the data array. As in: self.data_array[ioi]. Part of why you have an extra dimension is that you actually have two sets of indices in your call: the slice (:) and the tuple ioi. np.nonzero is guaranteed to return a tuple exactly for this reason, so that its output can always be used to directly index the source array.
The last option is to call np.squeeze on the returned array, but I'd opt for one of the above first.
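If you do keep np.nonzero, one further sketch (assuming random data as in the example above): pull the index array out of the returned tuple before combining it with other indices, and the extra length-1 axis never appears.
import numpy as np

data = np.random.randn(5, 5000)
start, end = -0.5, 0.5

ioi = np.nonzero((data[0] > start) & (data[0] < end))  # a 1-tuple containing one index array
cols = ioi[0]                                          # the actual 1-D array of column indices
print(data[:, cols].shape)                             # (5, n_matches) -- no extra length-1 axis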

transpose manipulation with indexing over multiple dimensions

I have trouble with numpy ndarrays when I'm indexing multiple dimensions at the same time:
> a = np.random.random((25,50,30))
> b = a[0,:,np.arange(30)]
> print(b.shape)
Here I expected the result to be (50,30), but the actual result is (30,50)!
Can someone explain this to me, please? I don't get it, and this behavior introduces tons of bugs in my code. Thank you :)
Additional information:
Indexing in one dimension works perfectly:
> b = a[0,:,:]
> print(b.shape)
(50,30)
And we can see the transposition:
> a[0,:,0] == b[0,:]
True
From numpy docs
The easiest way to understand the situation may be to think in terms of the result shape. There are two parts to the indexing operation, the subspace defined by the basic indexing (excluding integers) and the subspace from the advanced indexing part. Two cases of index combination need to be distinguished:
The advanced indexes are separated by a slice, ellipsis or newaxis. For example x[arr1, :, arr2].
The advanced indexes are all next to each other. For example x[..., arr1, arr2, :] but not x[arr1, :, 1] since 1 is an advanced index in this regard.
In the first case, the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that. In the second case, the dimensions from the advanced indexing operations are inserted into the result array at the same spot as they were in the initial array (the latter logic is what makes simple advanced indexing behave just like slicing).
The first case above is exactly what applies to your
b = a[0,:,np.arange(30)]
When you use a list or array of integers to index a numpy array, you're using something that is known as Fancy Indexing. The rules for Fancy Indexing are not as straightforward as one might think, which is why your array ends up with its dimensions in an unexpected order. To avoid surprises, I'd recommend sticking with slicing. So, you should change your code to:
a = np.random.random((25,50,30))
b = a[0,:,:]
print(b.shape)
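If the fancy index is actually needed (say, only a subset of the 30 columns rather than all of them), here is a sketch of two ways to keep the expected axis order, assuming a as defined above; cols is a hypothetical subset:
import numpy as np

a = np.random.random((25, 50, 30))
cols = np.arange(0, 30, 2)          # hypothetical subset of columns (15 of them)

# Two-stage indexing: apply the fancy index to the 2-D view a[0], so no slice
# separates advanced indexes and the original axis order is kept.
b1 = a[0][:, cols]
print(b1.shape)                     # (50, 15)

# Or keep the one-step expression and move the leading axis back afterwards.
b2 = np.moveaxis(a[0, :, cols], 0, -1)
print(b2.shape)                     # (50, 15)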

Shapes of numpy arrays

Been working with numpy for a while now. Just when I think I have arrays figured out, though, it throws me another curve. For instance, I construct the 3D array pltz, and then
>>> gridset2 = range(0, pltx.shape[2], grdspc)
>>> pltz[10,:,gridset2].shape
(17, 160)
>>> pltz[10][:,gridset2].shape
(160, 17)
Why on Earth are the two shapes different?
Since your indexing expression has both a : and a list in it, NumPy needs to apply both the basic and advanced indexing rules, and the way they interact is kind of weird. The relevant documentation is here, and you should consult it if you want to know the full details. I'll focus on the part that causes this shape mismatch.
When all components of the indexing expression that use advanced indexing are next to each other, dimensions of the result coming from advanced indexing are placed into the result in the position of the dimensions they replace. Advanced indexing components are array-likes, such as arrays, lists, and scalars; scalars can also be used in basic indexing, but for this purpose, they're considered advanced. Thus, if arr.shape == (10, 20, 30) and ind.shape == (2, 3, 4), then
arr[:, ind, :].shape == (10, 2, 3, 4, 30)
Your first expression falls into this case.
On the other hand, if components of the indexing expression that use advanced indexing are separated by components that use basic indexing, there is no unambiguous place to insert the advanced indexing dimensions. For example, with
arr[ind, :, ind]
the result needs to have dimensions of length 2, 3, 4, and 20, and there's no good place to stick the 20.
When advanced indexing components are separated by basic indexing components, NumPy sticks all dimensions resulting from advanced indexing at the start of the result array. Basic indexing components are :, ..., and np.newaxis (None). Your second expression falls into this case.
Since your second expression has advanced indexing components separated by basic indexing components and your first expression doesn't, your two expressions use different indexing rules. To avoid this, you could separate the basic indexing and advanced indexing into two stages, or you could replace the basic indexing with equivalent advanced indexing. Whatever you do, I recommend putting an explanatory comment above such code.
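For concreteness, a quick runnable check of both cases, using the shapes from this answer (zero arrays as stand-ins):
import numpy as np

arr = np.zeros((10, 20, 30))
ind = np.zeros((2, 3, 4), dtype=int)

# All advanced indexing in one spot: its dimensions replace axis 1 in place.
print(arr[:, ind, :].shape)    # (10, 2, 3, 4, 30)

# Advanced indexes separated by a slice: their (broadcast) dimensions move to the front.
print(arr[ind, :, ind].shape)  # (2, 3, 4, 20)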
You should tell us the length of gridset2 and the shape of pltz.
But I've deduced from the documentation that user2357112 gave us that
len(gridset2) == 17
pltz.shape[1] == 160
http://docs.scipy.org/doc/numpy-1.10.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
The advanced indexes are separated by a slice, ellipsis or newaxis. For example x[arr1, :, arr2].
The advanced indexes are all next to each other. For example x[..., arr1, arr2, :] but not x[arr1, :, 1] since 1 is an advanced index in this regard.
In the first case, the dimensions resulting from the advanced indexing operation come first in the result array, and the subspace dimensions after that. In the second case, the dimensions from the advanced indexing operations are inserted into the result array at the same spot as they were in the initial array.
>>> pltz[10,:,gridset2].shape
(17, 160)
This is the first case in the quote: advanced indexes separated by a slice. gridset2 is an advanced index (e.g. [1,2,3,...]), and so is the scalar 10 in this context. The size-17 dimension they produce is put first; the sliced subspace of size 160 is placed after.
>>> pltz[10][:,gridset2].shape
(160, 17)
With pltz[10], the new array (a view) is 2d, with shape (160, N). Indexing it with [:, gridset2] leaves only one advanced index, so the size-17 dimension is now inserted in place, at the end, which is the 2nd case in the documentation.
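To make the two shapes concrete, here is a small reproduction with a hypothetical pltz of shape (20, 160, 85) and grdspc = 5, so that gridset2 has 17 elements:
import numpy as np

pltz = np.random.random((20, 160, 85))        # hypothetical stand-in for pltz
gridset2 = list(range(0, pltz.shape[2], 5))   # 17 indices: 0, 5, ..., 80

# Advanced indexes (the scalar 10 and gridset2) separated by a slice: the 17 goes first.
print(pltz[10, :, gridset2].shape)            # (17, 160)

# Index in two steps: only one advanced index remains, so it stays in place.
print(pltz[10][:, gridset2].shape)            # (160, 17)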
