Indexing a 2d array with a 3d array in numpy - python

I have two arrays.
"a", a 2d numpy array.
import numpy.random as npr
a = array([[5,6,7,8,9],[10,11,12,14,15]])
array([[ 5, 6, 7, 8, 9],
[10, 11, 12, 14, 15]])
"idx", a 3d numpy array constituting three index variants I want to use to index "a".
idx = npr.randint(5, size=(nsamp,shape(a)[0], shape(a)[1]))
array([[[1, 2, 1, 3, 4],
[2, 0, 2, 0, 1]],
[[0, 0, 3, 2, 0],
[1, 3, 2, 0, 3]],
[[2, 1, 0, 1, 4],
[1, 1, 0, 1, 0]]])
Now I want to index "a" three times with the indices in "idx" to obtain an object as follows:
array([[[6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
The naive "a[idx]" does not work. Any ideas as to how to do this? (I use Python 3.4 and numpy 1.9)

You can use choose to make the selection from a:
>>> np.choose(idx, a.T[:,:,np.newaxis])
array([[[ 6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[ 5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[ 7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
As you can see, a has to be reshaped from an array with shape (2, 5) to an array with shape (5, 2, 1) first. This is essentially so that it is broadcastable with idx, which has shape (3, 2, 5).
(I learned this method from #immerrr's answer here: https://stackoverflow.com/a/26225395/3923281)

You can use take array method:
import numpy
a = numpy.array([[5,6,7,8,9],[10,11,12,14,15]])
idx = numpy.random.randint(5, size=(3, a.shape[0], a.shape[1]))
print a.take(idx)

Related

How to make a 2d numpy array from an empty numpy array by adding 1d numpy arrays?

So I'm trying to start an empty numpy array with a = np.array([]), but when i append other numpy arrays (like [1, 2, 3, 4, 5, 6, 7, 8] and [9, 10, 11, 12, 13, 14, 15, 16] to this array, then the result im basically getting is
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16].
But what i want as result is: [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]]
IIUC you want to keep adding lists to your np.array. In that case, you can use something like np.vstack to "append" the new lists to the array.
a = np.array([[1, 2, 3],[4, 5, 6]])
np.vstack([a, [7, 8, 9]])
>>> array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
You can also use np.c_[], especially if a and b are already 1D arrays (but it also works with lists):
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [9, 10, 11, 12, 13, 14, 15, 16]
>>> np.c_[a, b]
array([[ 1, 9],
[ 2, 10],
[ 3, 11],
[ 4, 12],
[ 5, 13],
[ 6, 14],
[ 7, 15],
[ 8, 16]])
It also works "multiple times":
>>> np.c_[np.c_[a, b], a, b]
array([[ 1, 9, 1, 9],
[ 2, 10, 2, 10],
[ 3, 11, 3, 11],
[ 4, 12, 4, 12],
[ 5, 13, 5, 13],
[ 6, 14, 6, 14],
[ 7, 15, 7, 15],
[ 8, 16, 8, 16]])

Slice array, but overlapped interval in Python

There are 3D-array in my data. I just want to slice 3D-array 2 by 2 by 2 with overlapped interval in Python.
Here is an example for 2D.
a = [1, 2, 3, 4;
5, 6, 7, 8]
Also, this is what I expected after slicing the array in 2 by 2.
[1, 2; [2, 3; [3, 4;
5, 6] 6, 7] 7, 8]
In 3D,
[[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]]
Like this,(maybe not exactly..)
[1, 2 [2, 3
4, 5] 5, 6] ...
[1, 2 [2, 3
4, 5] 5, 6]
I think, by using np.split, I could slice the array, but without overlapped. Please give me some helpful tips.
You should have a look at numpy.ndarray.strides and numpy.lib.stride_tricks
Tuple of bytes to step in each dimension when traversing an array.
The byte offset of element (i[0], i[1], ..., i[n]) in an array a is:
offset = sum(np.array(i) * a.strides)
See also the numpy documentation
Following a 2D example using strides:
x = np.arange(20).reshape([4, 5])
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
>>> from numpy.lib import stride_tricks
>>> stride_tricks.as_strided(x, shape=(3, 2, 5),
strides=(20, 20, 4))
...
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9]],
[[ 5, 6, 7, 8, 9],
[ 10, 11, 12, 13, 14]],
[[ 10, 11, 12, 13, 14],
[ 15, 16, 17, 18, 19]]])
Also see this question on Stackoverflow, where this example is from, to increase your understanding.

Zero pad array based on other array's shape

I've got K feature vectors that all share dimension n but have a variable dimension m (n x m). They all live in a list together.
to_be_padded = []
to_be_padded.append(np.reshape(np.arange(9),(3,3)))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
to_be_padded.append(np.reshape(np.arange(18),(3,6)))
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]])
to_be_padded.append(np.reshape(np.arange(15),(3,5)))
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
What I am looking for is a smart way to zero pad the rows of these np.arrays such that they all share the same dimension m. I've tried solving it with np.pad but I have not been able to come up with a pretty solution. Any help or nudges in the right direction would be greatly appreciated!
The result should leave the arrays looking like this:
array([[0, 1, 2, 0, 0, 0],
[3, 4, 5, 0, 0, 0],
[6, 7, 8, 0, 0, 0]])
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]])
array([[ 0, 1, 2, 3, 4, 0],
[ 5, 6, 7, 8, 9, 0],
[10, 11, 12, 13, 14, 0]])
You could use np.pad for that, which can also pad 2-D arrays using a tuple of values specifying the padding width, ((top, bottom), (left, right)). For that you could define:
def pad_to_length(x, m):
return np.pad(x,((0, 0), (0, m - x.shape[1])), mode = 'constant')
Usage
You could start by finding the ndarray with the highest amount of columns. Say you have two of them, a and b:
a = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
b = np.array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
m = max(i.shape[1] for i in [a,b])
# 5
And then use this parameter to pad the ndarrays:
pad_to_length(a, m)
array([[0, 1, 2, 0, 0],
[3, 4, 5, 0, 0],
[6, 7, 8, 0, 0]])
I believe there is no very efficient solution for this. I think you will need to loop over the list with a for loop and treat every array individually:
for i in range(len(to_be_padded)):
padded = np.zeros((n, maxM))
padded[:,:to_be_padded[i].shape[1]] = to_be_padded[i]
to_be_padded[i] = padded
where maxM is the longest m of the matrices in your list.

How does numpy.reshape() with order = 'F' work?

I thought I understood the reshape function in Numpy until I was messing around with it and came across this example:
a = np.arange(16).reshape((4,4))
which returns:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
This makes sense to me, but then when I do:
a.reshape((2,8), order = 'F')
it returns:
array([[0, 8, 1, 9, 2, 10, 3, 11],
[4, 12, 5, 13, 6, 14, 7, 15]])
I would expect it to return:
array([[0, 4, 8, 12, 1, 5, 9, 13],
[2, 6, 10, 14, 3, 7, 11, 15]])
Can someone please explain what is happening here?
The elements of a in order 'F'
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
are [0,4,8,12,1,5,9 ...]
Now rearrange them in a (2,8) array.
I think the reshape docs talks about raveling the elements, and then reshaping them. Evidently the ravel is done first.
Experiment with a.ravel(order='F').reshape(2,8).
Oops, I get what you expected:
In [208]: a = np.arange(16).reshape(4,4)
In [209]: a
Out[209]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [210]: a.ravel(order='F')
Out[210]: array([ 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15])
In [211]: _.reshape(2,8)
Out[211]:
array([[ 0, 4, 8, 12, 1, 5, 9, 13],
[ 2, 6, 10, 14, 3, 7, 11, 15]])
OK, I have to keep the 'F' order during the reshape
In [214]: a.ravel(order='F').reshape(2,8, order='F')
Out[214]:
array([[ 0, 8, 1, 9, 2, 10, 3, 11],
[ 4, 12, 5, 13, 6, 14, 7, 15]])
In [215]: a.ravel(order='F').reshape(2,8).flags
Out[215]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
...
In [216]: a.ravel(order='F').reshape(2,8, order='F').flags
Out[216]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
From np.reshape docs
You can think of reshaping as first raveling the array (using the given
index order), then inserting the elements from the raveled array into the
new array using the same kind of index ordering as was used for the
raveling.
The notes on order are fairly long, so it's not surprising that the topic is confusing.

Concatenation of every row combination of two numpy arrays

Given two arrays (A and B) of different shapes, I've like to produce an array containing the concatenation of every row from A with every row from B.
E.g. given:
A = np.array([[1, 2],
[3, 4],
[5, 6]])
B = np.array([[7, 8, 9],
[10, 11, 12]])
would like to produce the array:
[[1, 2, 7, 8, 9],
[1, 2, 10, 11, 12],
[3, 4, 7, 8, 9],
[3, 4, 10, 11, 12],
[5, 6, 7, 8, 9],
[5, 6, 10, 11, 12]]
I can do this with iteration, but it's very slow, so looking for some combination of numpy functions that can recreate the above as efficiently as possible (the input arrays A and B will be up to 10,000 rows in size, hence looking to avoid nested loops).
Perfect problem to learn about slicing and broadcasted-indexing.
Here's a vectorized solution using those tools -
def concatenate_per_row(A, B):
m1,n1 = A.shape
m2,n2 = B.shape
out = np.zeros((m1,m2,n1+n2),dtype=A.dtype)
out[:,:,:n1] = A[:,None,:]
out[:,:,n1:] = B
return out.reshape(m1*m2,-1)
Sample run -
In [441]: A
Out[441]:
array([[1, 2],
[3, 4],
[5, 6]])
In [442]: B
Out[442]:
array([[ 7, 8, 9],
[10, 11, 12]])
In [443]: concatenate_per_row(A, B)
Out[443]:
array([[ 1, 2, 7, 8, 9],
[ 1, 2, 10, 11, 12],
[ 3, 4, 7, 8, 9],
[ 3, 4, 10, 11, 12],
[ 5, 6, 7, 8, 9],
[ 5, 6, 10, 11, 12]])
Reference: numpy.concatenate on record arrays fails when array has different length strings
import numpy as np
from numpy.lib.recfunctions import stack_arrays
from pprint import pprint
A = np.array([[1, 2],
[3, 4],
[5, 6]])
B = np.array([[7, 8, 9],
[10, 11, 12]])
cartesian = [stack_arrays((a, b), usemask=False) for a in A
for b in B]
pprint(cartesian)
Output:
[array([1, 2, 7, 8, 9]),
array([ 1, 2, 10, 11, 12]),
array([3, 4, 7, 8, 9]),
array([ 3, 4, 10, 11, 12]),
array([5, 6, 7, 8, 9]),
array([ 5, 6, 10, 11, 12])]

Categories