Concatenation of every row combination of two numpy arrays - python

Given two arrays (A and B) of different shapes, I'd like to produce an array containing the concatenation of every row from A with every row from B.
E.g. given:
A = np.array([[1, 2],
[3, 4],
[5, 6]])
B = np.array([[7, 8, 9],
[10, 11, 12]])
I would like to produce the array:
[[1, 2, 7, 8, 9],
[1, 2, 10, 11, 12],
[3, 4, 7, 8, 9],
[3, 4, 10, 11, 12],
[5, 6, 7, 8, 9],
[5, 6, 10, 11, 12]]
I can do this with iteration, but it's very slow, so I'm looking for some combination of numpy functions that can recreate the above as efficiently as possible. The input arrays A and B will have up to 10,000 rows each, hence the wish to avoid nested loops.

Perfect problem to learn about slicing and broadcasted-indexing.
Here's a vectorized solution using those tools -
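def concatenate_per_row(A, B):
    m1, n1 = A.shape
    m2, n2 = B.shape
    # One (m2, n1 + n2) block per row of A.
    out = np.zeros((m1, m2, n1 + n2), dtype=A.dtype)
    out[:, :, :n1] = A[:, None, :]  # broadcast each row of A across the m2 slots
    out[:, :, n1:] = B              # broadcast all rows of B across the m1 blocks
    return out.reshape(m1 * m2, -1)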
Sample run -
In [441]: A
Out[441]:
array([[1, 2],
[3, 4],
[5, 6]])
In [442]: B
Out[442]:
array([[ 7, 8, 9],
[10, 11, 12]])
In [443]: concatenate_per_row(A, B)
Out[443]:
array([[ 1, 2, 7, 8, 9],
[ 1, 2, 10, 11, 12],
[ 3, 4, 7, 8, 9],
[ 3, 4, 10, 11, 12],
[ 5, 6, 7, 8, 9],
[ 5, 6, 10, 11, 12]])
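For comparison, the same pairing can probably also be built from np.repeat and np.tile. This is only a sketch, assuming both inputs are 2D arrays with a compatible dtype; the function name concatenate_per_row_tiled is made up for this illustration and is not part of the answer above.

```python
import numpy as np

def concatenate_per_row_tiled(A, B):
    # Hypothetical helper, named here for illustration only.
    # Repeat each row of A once per row of B: a0, a0, a1, a1, ...
    left = np.repeat(A, len(B), axis=0)
    # Cycle through all rows of B once per row of A: b0, b1, b0, b1, ...
    right = np.tile(B, (len(A), 1))
    return np.hstack([left, right])

A = np.array([[1, 2], [3, 4], [5, 6]])
B = np.array([[7, 8, 9], [10, 11, 12]])
result = concatenate_per_row_tiled(A, B)
print(result)
```

This variant builds two intermediate arrays before joining them, so the preallocated version is likely lighter on memory for large inputs.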

Reference: numpy.concatenate on record arrays fails when array has different length strings
import numpy as np
from numpy.lib.recfunctions import stack_arrays
from pprint import pprint
A = np.array([[1, 2],
[3, 4],
[5, 6]])
B = np.array([[7, 8, 9],
[10, 11, 12]])
cartesian = [stack_arrays((a, b), usemask=False) for a in A
for b in B]
pprint(cartesian)
Output:
[array([1, 2, 7, 8, 9]),
array([ 1, 2, 10, 11, 12]),
array([3, 4, 7, 8, 9]),
array([ 3, 4, 10, 11, 12]),
array([5, 6, 7, 8, 9]),
array([ 5, 6, 10, 11, 12])]

Related

How to make a 2d numpy array from an empty numpy array by adding 1d numpy arrays?

So I'm trying to start with an empty numpy array, a = np.array([]), but when I append other numpy arrays (like [1, 2, 3, 4, 5, 6, 7, 8] and [9, 10, 11, 12, 13, 14, 15, 16]) to this array, the result I get is
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16].
But the result I want is: [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13, 14, 15, 16]]
IIUC you want to keep adding lists to your np.array. In that case, you can use something like np.vstack to "append" the new lists to the array.
>>> a = np.array([[1, 2, 3],[4, 5, 6]])
>>> np.vstack([a, [7, 8, 9]])
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
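If the rows arrive one at a time, one common pattern is to collect them in a plain Python list and stack once at the end, since np.vstack copies the whole array on every call. A minimal sketch using the two rows from the question; the variable name rows is just for illustration.

```python
import numpy as np

rows = []  # plain Python list, used instead of an empty np.array([])
rows.append(np.array([1, 2, 3, 4, 5, 6, 7, 8]))
rows.append(np.array([9, 10, 11, 12, 13, 14, 15, 16]))

# Stack all collected 1D arrays into one 2D array in a single call.
result = np.vstack(rows)
print(result.shape)  # (2, 8)
```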
You can also use np.c_[], especially if a and b are already 1D arrays (but it also works with lists):
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [9, 10, 11, 12, 13, 14, 15, 16]
>>> np.c_[a, b]
array([[ 1, 9],
[ 2, 10],
[ 3, 11],
[ 4, 12],
[ 5, 13],
[ 6, 14],
[ 7, 15],
[ 8, 16]])
It also works "multiple times":
>>> np.c_[np.c_[a, b], a, b]
array([[ 1, 9, 1, 9],
[ 2, 10, 2, 10],
[ 3, 11, 3, 11],
[ 4, 12, 4, 12],
[ 5, 13, 5, 13],
[ 6, 14, 6, 14],
[ 7, 15, 7, 15],
[ 8, 16, 8, 16]])
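Note that np.c_ puts each 1D input in its own column, so the result has shape (8, 2) rather than the (2, 8) layout asked for in the question. A small sketch comparing the two layouts, with variable names chosen here for illustration:

```python
import numpy as np

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [9, 10, 11, 12, 13, 14, 15, 16]

columns = np.c_[a, b]      # each input becomes a column, shape (8, 2)
rows = np.vstack([a, b])   # each input becomes a row, shape (2, 8)

# The two layouts are transposes of each other.
print(np.array_equal(columns.T, rows))  # True
```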

Slice array, but overlapped interval in Python

My data contains a 3D array. I want to slice it into 2 by 2 by 2 blocks with overlapping intervals in Python.
Here is an example for 2D.
a = [1, 2, 3, 4;
5, 6, 7, 8]
This is what I expect after slicing the array into 2 by 2 blocks.
[1, 2; [2, 3; [3, 4;
5, 6] 6, 7] 7, 8]
In 3D,
[[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]]
Like this (maybe not exactly):
[1, 2 [2, 3
4, 5] 5, 6] ...
[1, 2 [2, 3
4, 5] 5, 6]
I think I could slice the array with np.split, but that would not give overlapping blocks. Please give me some helpful tips.
You should have a look at numpy.ndarray.strides and numpy.lib.stride_tricks. The numpy documentation describes strides as follows:
Tuple of bytes to step in each dimension when traversing an array.
The byte offset of element (i[0], i[1], ..., i[n]) in an array a is:
offset = sum(np.array(i) * a.strides)
Here is a 2D example using strides:
x = np.arange(20).reshape([4, 5])
>>> x
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
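>>> from numpy.lib import stride_tricks
>>> # Take the byte steps from x.strides so the example does not depend on the integer size.
>>> row_step, col_step = x.strides
>>> stride_tricks.as_strided(x, shape=(3, 2, 5),
...                          strides=(row_step, row_step, col_step))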
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9]],
[[ 5, 6, 7, 8, 9],
[ 10, 11, 12, 13, 14]],
[[ 10, 11, 12, 13, 14],
[ 15, 16, 17, 18, 19]]])
Also see the Stack Overflow question this example is taken from for more background.
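For the overlapping 2 by 2 case from the question, newer NumPy releases (1.20 and later) provide numpy.lib.stride_tricks.sliding_window_view, which works out the strides itself. A minimal sketch, assuming such a version is installed; the variable names are chosen here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

# All overlapping 2x2 windows; the result is a read-only view of a.
windows = sliding_window_view(a, (2, 2))
print(windows.shape)   # (1, 3, 2, 2)
print(windows[0, 1])   # the window [[2, 3], [6, 7]]

# The same call handles the 3D case with 2x2x2 windows.
cube = np.arange(27).reshape(3, 3, 3)
blocks = sliding_window_view(cube, (2, 2, 2))
print(blocks.shape)    # (2, 2, 2, 2, 2, 2)
```

The first axes of the result index the window position and the last axes index the position inside the window.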

Indexing a 2d array with a 3d array in numpy

I have two arrays.
"a", a 2d numpy array.
import numpy as np
import numpy.random as npr
a = np.array([[5,6,7,8,9],[10,11,12,14,15]])
array([[ 5, 6, 7, 8, 9],
[10, 11, 12, 14, 15]])
"idx", a 3d numpy array constituting three index variants I want to use to index "a".
nsamp = 3
idx = npr.randint(5, size=(nsamp, a.shape[0], a.shape[1]))
array([[[1, 2, 1, 3, 4],
[2, 0, 2, 0, 1]],
[[0, 0, 3, 2, 0],
[1, 3, 2, 0, 3]],
[[2, 1, 0, 1, 4],
[1, 1, 0, 1, 0]]])
Now I want to index "a" three times with the indices in "idx" to obtain an object as follows:
array([[[6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
The naive "a[idx]" does not work. Any ideas as to how to do this? (I use Python 3.4 and numpy 1.9)
You can use choose to make the selection from a:
>>> np.choose(idx, a.T[:,:,np.newaxis])
array([[[ 6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[ 5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[ 7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
As you can see, a has to be reshaped from an array with shape (2, 5) to an array with shape (5, 2, 1) first. This is essentially so that it is broadcastable with idx, which has shape (3, 2, 5).
(I learned this method from @immerrr's answer here: https://stackoverflow.com/a/26225395/3923281)
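The same selection can also be written with broadcast integer indexing, which may be easier to read than reshaping a for np.choose. A minimal sketch using the a and idx values from the question; the helper variable rows is an illustrative name.

```python
import numpy as np

a = np.array([[5, 6, 7, 8, 9],
              [10, 11, 12, 14, 15]])
idx = np.array([[[1, 2, 1, 3, 4], [2, 0, 2, 0, 1]],
                [[0, 0, 3, 2, 0], [1, 3, 2, 0, 3]],
                [[2, 1, 0, 1, 4], [1, 1, 0, 1, 0]]])

# Row numbers as a column vector, shape (2, 1), so they broadcast
# against idx (shape (3, 2, 5)): result[s, i, j] = a[i, idx[s, i, j]].
rows = np.arange(a.shape[0])[:, None]
result = a[rows, idx]
print(result)
```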
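You can use the take array method. Note that take indexes the flattened array by default, so each row's indices need an offset of row number times number of columns:
import numpy
a = numpy.array([[5,6,7,8,9],[10,11,12,14,15]])
idx = numpy.random.randint(5, size=(3, a.shape[0], a.shape[1]))
# Shift each row's column indices to that row's position in the flattened array.
offsets = numpy.arange(a.shape[0])[:, None] * a.shape[1]
print(a.take(idx + offsets))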

How to use numpy to add any two elements in an array and produce a matrix?

The native Python code looks like this:
>>> a=[1,2,3,4,5,6]
>>> [[i+j for i in a] for j in a]
[[2, 3, 4, 5, 6, 7],
[3, 4, 5, 6, 7, 8],
[4, 5, 6, 7, 8, 9],
[5, 6, 7, 8, 9, 10],
[6, 7, 8, 9, 10, 11],
[7, 8, 9, 10, 11, 12]]
However, I have to use numpy to do this job as the array is very large.
Does anyone have ideas about how to do the same work in numpy?
Many NumPy binary operators have an outer method which can be used to form the equivalent of a multiplication (or in this case, addition) table:
In [260]: import numpy as np
In [255]: a = np.arange(1,7)
In [256]: a
Out[256]: array([1, 2, 3, 4, 5, 6])
In [259]: np.add.outer(a,a)
Out[259]:
array([[ 2, 3, 4, 5, 6, 7],
[ 3, 4, 5, 6, 7, 8],
[ 4, 5, 6, 7, 8, 9],
[ 5, 6, 7, 8, 9, 10],
[ 6, 7, 8, 9, 10, 11],
[ 7, 8, 9, 10, 11, 12]])
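The same table can also be produced with plain broadcasting, which may help show what outer is doing. A minimal sketch; the name table is chosen here for illustration.

```python
import numpy as np

a = np.arange(1, 7)

# a[:, None] has shape (6, 1) and a[None, :] has shape (1, 6);
# adding them broadcasts to a (6, 6) table of pairwise sums.
table = a[:, None] + a[None, :]
print(np.array_equal(table, np.add.outer(a, a)))  # True
```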

Resizing and stretching a NumPy array

I am working in Python and I have a NumPy array like this:
[1,5,9]
[2,7,3]
[8,4,6]
How do I stretch it to something like the following?
[1,1,5,5,9,9]
[1,1,5,5,9,9]
[2,2,7,7,3,3]
[2,2,7,7,3,3]
[8,8,4,4,6,6]
[8,8,4,4,6,6]
These are just some example arrays, I will actually be resizing several sizes of arrays, not just these.
I'm new at this, and I just can't seem to wrap my head around what I need to do.
@KennyTM's answer is very slick and works well for your case, but as an alternative that might offer a bit more flexibility for expanding arrays, try np.repeat:
>>> a = np.array([[1, 5, 9],
[2, 7, 3],
[8, 4, 6]])
>>> np.repeat(a,2, axis=1)
array([[1, 1, 5, 5, 9, 9],
[2, 2, 7, 7, 3, 3],
[8, 8, 4, 4, 6, 6]])
This repeats along one axis. To repeat along multiple axes (as you might want), simply nest the np.repeat calls:
>>> np.repeat(np.repeat(a,2, axis=0), 2, axis=1)
array([[1, 1, 5, 5, 9, 9],
[1, 1, 5, 5, 9, 9],
[2, 2, 7, 7, 3, 3],
[2, 2, 7, 7, 3, 3],
[8, 8, 4, 4, 6, 6],
[8, 8, 4, 4, 6, 6]])
You can also vary the number of repeats for any initial row or column. For example, if you wanted two repeats of each row aside from the last row:
>>> np.repeat(a, [2,2,1], axis=0)
array([[1, 5, 9],
[1, 5, 9],
[2, 7, 3],
[2, 7, 3],
[8, 4, 6]])
Here, when the second argument is a list, it specifies the number of repeats for each row (rows in this case because axis=0).
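Since the question mentions resizing arrays of several sizes, the nested calls could be wrapped in a small function. This is only a sketch; the name stretch and its parameters row_factor and col_factor are hypothetical and not part of numpy.

```python
import numpy as np

def stretch(a, row_factor, col_factor):
    # Hypothetical helper: repeat every row row_factor times,
    # then every column col_factor times.
    return np.repeat(np.repeat(a, row_factor, axis=0), col_factor, axis=1)

a = np.array([[1, 5, 9],
              [2, 7, 3],
              [8, 4, 6]])
b = stretch(a, 2, 3)
print(b.shape)  # (6, 9)
```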
>>> a = numpy.array([[1,5,9],[2,7,3],[8,4,6]])
>>> numpy.kron(a, [[1,1],[1,1]])
array([[1, 1, 5, 5, 9, 9],
[1, 1, 5, 5, 9, 9],
[2, 2, 7, 7, 3, 3],
[2, 2, 7, 7, 3, 3],
[8, 8, 4, 4, 6, 6],
[8, 8, 4, 4, 6, 6]])
Unfortunately numpy does not allow fractional steps (as far as I am aware). Here is a workaround. It's not as clever as Kenny's solution, but it makes use of traditional indexing:
>>> a = numpy.array([[1,5,9],[2,7,3],[8,4,6]])
>>> step = .5
>>> xstop, ystop = a.shape
>>> x = numpy.arange(0,xstop,step).astype(int)
>>> y = numpy.arange(0,ystop,step).astype(int)
>>> mg = numpy.meshgrid(x,y)
>>> # Convert the meshgrid result to a tuple so it is used as a multidimensional index.
>>> b = a[tuple(mg)].T
>>> b
array([[1, 1, 5, 5, 9, 9],
[1, 1, 5, 5, 9, 9],
[2, 2, 7, 7, 3, 3],
[2, 2, 7, 7, 3, 3],
[8, 8, 4, 4, 6, 6],
[8, 8, 4, 4, 6, 6]])
(dtlussier's solution is better)
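The fractional-step idea can also be expressed with integer division, which avoids the float arange and the transpose. A minimal sketch, assuming an integer stretch factor; the names factor, rows and cols are chosen here for illustration.

```python
import numpy as np

a = np.array([[1, 5, 9],
              [2, 7, 3],
              [8, 4, 6]])
factor = 2

# Integer division maps output positions 0..5 to source indices 0, 0, 1, 1, 2, 2.
rows = np.arange(a.shape[0] * factor) // factor
cols = np.arange(a.shape[1] * factor) // factor

# np.ix_ builds an open mesh so every (row, col) pair is selected.
b = a[np.ix_(rows, cols)]
print(b)
```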
