Using NumPy to build an array of all combinations of two arrays - python

I'm trying to sweep over the parameter space of a six-parameter function to study its numerical behavior before trying to do anything complex with it, so I'm searching for an efficient way to do this.
My function takes float values given in a 6-dimensional NumPy array as input. What I tried to do initially was this:
First, I created a function that takes two arrays and generates an array with all combinations of values from the two arrays:
from numpy import *

def comb(a, b):
    c = []
    for i in a:
        for j in b:
            c.append(r_[i, j])
    return c
Then, I used reduce() to apply that to m copies of the same array:
def combs(a, m):
    return reduce(comb, [a]*m)
Finally, I evaluate my function like this:
values = combs(np.arange(0, 1, 0.1), 6)
for val in values:
    print F(val)
This works, but it's way too slow. I know the space of parameters is huge, but this shouldn't be so slow. I have only sampled 10⁶ (a million) points in this example, and it took more than 15 seconds just to create the array values.
Is there a more efficient way of doing this with NumPy?
I can modify the way the function F takes its arguments if it's necessary.

In newer versions of NumPy (>1.8.x), numpy.meshgrid() provides a much faster implementation:
For pv's solution:
In [113]: %timeit cartesian(([1, 2, 3], [4, 5], [6, 7]))
10000 loops, best of 3: 135 µs per loop

In [114]: cartesian(([1, 2, 3], [4, 5], [6, 7]))
Out[114]:
array([[1, 4, 6],
       [1, 4, 7],
       [1, 5, 6],
       [1, 5, 7],
       [2, 4, 6],
       [2, 4, 7],
       [2, 5, 6],
       [2, 5, 7],
       [3, 4, 6],
       [3, 4, 7],
       [3, 5, 6],
       [3, 5, 7]])
numpy.meshgrid() used to be two-dimensional only, but now it can handle any number of dimensions. In this case, three:
In [115]: %timeit np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1, 3)
10000 loops, best of 3: 74.1 µs per loop

In [116]: np.array(np.meshgrid([1, 2, 3], [4, 5], [6, 7])).T.reshape(-1, 3)
Out[116]:
array([[1, 4, 6],
       [1, 5, 6],
       [2, 4, 6],
       [2, 5, 6],
       [3, 4, 6],
       [3, 5, 6],
       [1, 4, 7],
       [1, 5, 7],
       [2, 4, 7],
       [2, 5, 7],
       [3, 4, 7],
       [3, 5, 7]])
Note that the ordering of the rows in the final result is slightly different.
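If you need the same row ordering as cartesian() (last input varying fastest), a small sketch, assuming your NumPy supports indexing='ij' in meshgrid, is:
import numpy as np

# Sketch: 'ij' (matrix) indexing plus stack/reshape reproduces cartesian()'s row order.
grid = np.meshgrid([1, 2, 3], [4, 5], [6, 7], indexing='ij')
combos = np.stack(grid, axis=-1).reshape(-1, 3)
# combos rows: [1, 4, 6], [1, 4, 7], [1, 5, 6], ... as in Out[114]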

Here's a pure-NumPy implementation. It's about 5 times faster than using itertools.
Python 3:
import numpy as np

def cartesian(arrays, out=None):
    """
    Generate a Cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the Cartesian product of.
    out : ndarray
        Array to place the Cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing Cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])
    """
    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = n // arrays[0].size
    out[:, 0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out
Python 2:
import numpy as np

def cartesian(arrays, out=None):
    arrays = [np.asarray(x) for x in arrays]
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    m = n / arrays[0].size
    out[:, 0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in xrange(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out
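Applied to the question, a usage sketch (F stands in for the six-parameter function being studied) would be:
# Sketch: build all 10**6 parameter combinations at once, then evaluate F row by row.
values = cartesian([np.arange(0, 1, 0.1)] * 6)
for val in values:
    print(F(val))  # F is the user's function, assumed to accept a length-6 array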

itertools.combinations is in general the fastest way to get combinations from a Python container (if you do in fact want combinations, i.e., arrangements without repetitions and independent of order; that's not what your code appears to be doing, but I can't tell whether that's because your code is buggy or because you're using the wrong terminology).
If you want something other than combinations, then other iterators in itertools, such as product or permutations, might serve you better. For example, it looks like your code is roughly the same as:
for val in itertools.product(np.arange(0, 1, 0.1), repeat=6):
    print F(val)
All of these iterators yield tuples, not lists or NumPy arrays, so if your F is picky about getting specifically a NumPy array, you'll have to accept the extra overhead of constructing or clearing and refilling one at each step.
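If you want to avoid the per-step tuple handling entirely, one hedged alternative is to drain the iterator into a single array up front with np.fromiter:
import itertools
import numpy as np

# Sketch: flatten the product into one 1-D buffer, then reshape to (n_points, 6).
a = np.arange(0, 1, 0.1)
flat = np.fromiter(itertools.chain.from_iterable(itertools.product(a, repeat=6)),
                   dtype=float)
values = flat.reshape(-1, 6)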

You can use np.array(list(itertools.product(a, b))).

You can do something like this:
import numpy as np

def cartesian_coord(*arrays):
    grid = np.meshgrid(*arrays)
    coord_list = [entry.ravel() for entry in grid]
    points = np.vstack(coord_list).T
    return points

a = np.arange(4)  # Fake data
print(cartesian_coord(*6*[a]))
which gives
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 2],
       ...,
       [3, 3, 3, 3, 3, 1],
       [3, 3, 3, 3, 3, 2],
       [3, 3, 3, 3, 3, 3]])

The following NumPy implementation should be roughly twice as fast as the previous answers:
import numpy as np

def cartesian2(arrays):
    arrays = [np.asarray(a) for a in arrays]
    shape = tuple(len(a) for a in arrays)
    # Build every combination of indices, one row per point.
    ix = np.indices(shape).reshape(len(arrays), -1).T
    # Replace the indices with the actual values, preserving the input dtype.
    out = np.empty(ix.shape, dtype=np.result_type(*arrays))
    for n, arr in enumerate(arrays):
        out[:, n] = arr[ix[:, n]]
    return out
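A quick usage sketch (not part of the original answer); the call signature matches the earlier cartesian():
# Sketch: same inputs as the earlier examples, same row ordering.
cartesian2(([1, 2, 3], [4, 5], [6, 7]))
# rows: [1, 4, 6], [1, 4, 7], [1, 5, 6], ..., [3, 5, 7]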

It looks like you want a grid to evaluate your function, in which case you can use numpy.ogrid (open) or numpy.mgrid (fleshed out):
import numpy
my_grid = numpy.mgrid[[slice(0, 1, 0.1)]*6]
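my_grid then has shape (6, 10, 10, 10, 10, 10, 10), one leading entry per parameter; if a flat list of points is needed for evaluation, a short sketch is:
# Sketch: flatten the dense grid into one row per parameter combination.
points = my_grid.reshape(6, -1).T   # shape (1000000, 6)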

Here's yet another way, using pure NumPy, no recursion, no list comprehension, and no explicit for loops. It's about 20% slower than the original answer, and it's based on np.meshgrid.
import numpy as np

def cartesian(*arrays):
    mesh = np.meshgrid(*arrays)  # Standard NumPy meshgrid
    dim = len(mesh)              # Number of dimensions
    elements = mesh[0].size      # Number of elements, any index will do
    flat = np.concatenate(mesh).ravel()            # Flatten the whole meshgrid
    reshape = np.reshape(flat, (dim, elements)).T  # Reshape and transpose
    return reshape
For example,
x = np.arange(3)
a = cartesian(x, x, x, x, x)
print(a)
gives
[[0 0 0 0 0]
 [0 0 0 0 1]
 [0 0 0 0 2]
 ...,
 [2 2 2 2 0]
 [2 2 2 2 1]
 [2 2 2 2 2]]

For a pure NumPy implementation of the Cartesian product of one-dimensional arrays (or flat Python lists), just use meshgrid(), roll the axes with transpose(), and reshape to the desired output:
import numpy as np

def cartprod(*arrays):
    N = len(arrays)
    return np.transpose(np.meshgrid(*arrays, indexing='ij'),
                        np.roll(np.arange(N + 1), -1)).reshape(-1, N)
Note this has the convention of the last axis changing the fastest ("C style" or "row-major").
In [88]: cartprod([1, 2, 3], [4, 8], [100, 200, 300, 400], [-5, -4])
Out[88]:
array([[  1,   4, 100,  -5],
       [  1,   4, 100,  -4],
       [  1,   4, 200,  -5],
       [  1,   4, 200,  -4],
       [  1,   4, 300,  -5],
       [  1,   4, 300,  -4],
       [  1,   4, 400,  -5],
       [  1,   4, 400,  -4],
       [  1,   8, 100,  -5],
       [  1,   8, 100,  -4],
       [  1,   8, 200,  -5],
       [  1,   8, 200,  -4],
       [  1,   8, 300,  -5],
       [  1,   8, 300,  -4],
       [  1,   8, 400,  -5],
       [  1,   8, 400,  -4],
       [  2,   4, 100,  -5],
       [  2,   4, 100,  -4],
       [  2,   4, 200,  -5],
       [  2,   4, 200,  -4],
       [  2,   4, 300,  -5],
       [  2,   4, 300,  -4],
       [  2,   4, 400,  -5],
       [  2,   4, 400,  -4],
       [  2,   8, 100,  -5],
       [  2,   8, 100,  -4],
       [  2,   8, 200,  -5],
       [  2,   8, 200,  -4],
       [  2,   8, 300,  -5],
       [  2,   8, 300,  -4],
       [  2,   8, 400,  -5],
       [  2,   8, 400,  -4],
       [  3,   4, 100,  -5],
       [  3,   4, 100,  -4],
       [  3,   4, 200,  -5],
       [  3,   4, 200,  -4],
       [  3,   4, 300,  -5],
       [  3,   4, 300,  -4],
       [  3,   4, 400,  -5],
       [  3,   4, 400,  -4],
       [  3,   8, 100,  -5],
       [  3,   8, 100,  -4],
       [  3,   8, 200,  -5],
       [  3,   8, 200,  -4],
       [  3,   8, 300,  -5],
       [  3,   8, 300,  -4],
       [  3,   8, 400,  -5],
       [  3,   8, 400,  -4]])
If you want to change the first axis fastest ("Fortran style" or "column-major"), just change the order parameter of reshape() like this: reshape((-1, N), order='F')

Pandas' merge() offers a naive, fast solution to the problem:
import numpy as np
import pandas as pd

# Given the lists
x, y, z = [1, 2, 3], [4, 5], [6, 7]

# Get dataframes with the same, constant index
x = pd.DataFrame({'x': x}, index=np.repeat(0, len(x)))
y = pd.DataFrame({'y': y}, index=np.repeat(0, len(y)))
z = pd.DataFrame({'z': z}, index=np.repeat(0, len(z)))

# Get the full Cartesian product stored in a new dataframe
df = pd.merge(x, pd.merge(y, z, left_index=True, right_index=True),
              left_index=True, right_index=True)
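If a plain NumPy array is wanted at the end, a short sketch (to_numpy() needs pandas >= 0.24; older versions use df.values):
# Sketch: convert the merged dataframe back into an (n, 3) NumPy array.
arr = df.to_numpy()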

Related

How to combine two matrices given a rule using numpy?

I have two matrices:
a = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
b = [
    [0, 0, 100],
    [100, 0, 0],
    [0, 0, 100]
]
I would like to create a third matrix that contains the elements from matrix a, overridden by the non-zero elements from matrix b:
c = [
    [1, 2, 100],
    [100, 5, 6],
    [7, 8, 100]
]
How can I do this using numpy? Thanks!
You could index both arrays where b==0:
# this assumes a and b are NumPy arrays
m = b == 0
b[m] = a[m]
print(b)
array([[  1,   2, 100],
       [100,   5,   6],
       [  7,   8, 100]])
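If b should not be modified in place, a non-destructive sketch using np.where (not part of the original answer):
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[0, 0, 100], [100, 0, 0], [0, 0, 100]])

# Sketch: take a where b is zero, otherwise keep the non-zero value from b.
c = np.where(b == 0, a, b)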

Columns of each row in 2D Numpy array do not shuffle [duplicate]

Suppose I have a matrix A with some arbitrary values:
array([[2, 4, 5, 3],
       [1, 6, 8, 9],
       [8, 7, 0, 2]])
And a matrix B which contains the indices of elements in A:
array([[0, 0, 1, 2],
       [0, 3, 2, 1],
       [3, 2, 1, 0]])
How do I select the values from A pointed to by B, i.e.:
A[B] = [[2, 2, 4, 5],
        [1, 9, 8, 6],
        [2, 0, 7, 8]]
EDIT: np.take_along_axis is a built-in function for this use case, implemented since numpy 1.15. See hpaulj's answer below for how to use it.
You can use NumPy's advanced indexing -
A[np.arange(A.shape[0])[:,None],B]
One can also use linear indexing -
m,n = A.shape
out = np.take(A,B + n*np.arange(m)[:,None])
Sample run -
In [40]: A
Out[40]:
array([[2, 4, 5, 3],
       [1, 6, 8, 9],
       [8, 7, 0, 2]])

In [41]: B
Out[41]:
array([[0, 0, 1, 2],
       [0, 3, 2, 1],
       [3, 2, 1, 0]])

In [42]: A[np.arange(A.shape[0])[:,None],B]
Out[42]:
array([[2, 2, 4, 5],
       [1, 9, 8, 6],
       [2, 0, 7, 8]])

In [43]: m,n = A.shape

In [44]: np.take(A,B + n*np.arange(m)[:,None])
Out[44]:
array([[2, 2, 4, 5],
       [1, 9, 8, 6],
       [2, 0, 7, 8]])
More recent versions have added a take_along_axis function that does the job:
A = np.array([[2, 4, 5, 3],
              [1, 6, 8, 9],
              [8, 7, 0, 2]])
B = np.array([[0, 0, 1, 2],
              [0, 3, 2, 1],
              [3, 2, 1, 0]])
np.take_along_axis(A, B, 1)
Out[]:
array([[2, 2, 4, 5],
       [1, 9, 8, 6],
       [2, 0, 7, 8]])
There's also a put_along_axis.
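A brief hedged sketch of put_along_axis (also NumPy >= 1.15), which writes into the same positions that B selects:
# Sketch: zero out the elements of A addressed by B, working on a copy.
A2 = A.copy()
np.put_along_axis(A2, B, 0, axis=1)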
I know this is an old question, but another way of doing it using indices is:
A[np.indices(B.shape)[0], B]
output:
[[2 2 4 5]
 [1 9 8 6]
 [2 0 7 8]]
The following is a solution using for loops:
outlist = []
for i in range(len(B)):
    lst = []
    for j in range(len(B[i])):
        lst.append(A[i][B[i][j]])
    outlist.append(lst)
outarray = np.asarray(outlist)
print(outarray)
The above can also be written in a more succinct list-comprehension form:
outlist = [[A[i][B[i][j]] for j in range(len(B[i]))]
           for i in range(len(B))]
outarray = np.asarray(outlist)
print(outarray)
Output:
[[2 2 4 5]
 [1 9 8 6]
 [2 0 7 8]]

Element-wise multiplication of 'slices' of 2D matrix to form 3D matrix

A matrix multiplication like this is easy to implement in Python using NumPy:
import numpy as np
np.array([[1, 2, 3]]) * np.array([[1], [2], [3]])
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])
But in my situation, I have two 2D matrices that I want to multiply to form a 3D matrix. Effectively, the first 'slice' of the first 2D matrix is an array that I want to multiply by the first 'slice' of the second matrix to form a 2D matrix. This is continued for all the 'slices' of the 2D matrices. Think of the first as having dimensions [x, z] and the second [y, z]. I want to multiply them to get [x, y, z]. Is there an elegant way to do this in numpy?
Because you can already describe your multiplication as
[x, z] * [y, z] -> [x, y, z]
the most straightforward solution will most likely be using Einsum:
import numpy as np

A = np.arange(12).reshape(4, 3)
# array([[ 0,  1,  2],
#        [ 3,  4,  5],
#        [ 6,  7,  8],
#        [ 9, 10, 11]])

B = np.arange(9).reshape(3, 3)
# array([[0, 1, 2],
#        [3, 4, 5],
#        [6, 7, 8]])

C = np.einsum('xz,yz->xyz', A, B)
# array([[[ 0,  1,  4],
#         [ 0,  4, 10],
#         [ 0,  7, 16]],
#
#        [[ 0,  4, 10],
#         [ 9, 16, 25],
#         [18, 28, 40]],
#
#        [[ 0,  7, 16],
#         [18, 28, 40],
#         [36, 49, 64]],
#
#        [[ 0, 10, 22],
#         [27, 40, 55],
#         [54, 70, 88]]])
An alternative is to simply use broadcasting
D = A[:, None, :] * B[None, :, :]
np.allclose(D, C)
# True
I managed to figure it out with the help of the response to this StackOverflow question.
arr = np.array([[1, 2, 3]])
arr * arr.T
array([[1, 2, 3],
       [2, 4, 6],
       [3, 6, 9]])

mat = np.repeat(arr, 3, axis=0)
mat
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])

mat[:, :, None] * np.transpose(mat[:, None, :], axes=(1, 0, 2))
array([[[1, 2, 3],
        [2, 4, 6],
        [3, 6, 9]],

       [[1, 2, 3],
        [2, 4, 6],
        [3, 6, 9]],

       [[1, 2, 3],
        [2, 4, 6],
        [3, 6, 9]]])

Efficient way of making a list of pairs from an array in Numpy

I have a numpy array x (with (n,4) shape) of integers like:
[[0 1 2 3],
 [1 2 7 9],
 [2 1 5 2],
 ...]
I want to transform the array into an array of pairs:
[0,1]
[0,2]
[0,3]
[1,2]
...
so the first element makes a pair with each of the other elements in the same sub-array. I already have a for-loop solution:
y = np.array([[x[j, 0], x[j, i]] for i in range(1, 4) for j in range(0, n)], dtype=int)
but since looping over a numpy array is not efficient, I tried slicing instead. I can do the slicing for every column as:
y[1]=np.array([x[:,0],x[:,1]]).T
# [[0,1],[1,2],[2,1],...]
I can repeat this for all columns. My questions are:
How can I append y[2] to y[1],... such that the shape is (N,2)?
If number of columns is not small (in this example 4), how can I find y[i] elegantly?
What are the alternative ways to achieve the final array?
The cleanest way of doing this I can think of would be:
>>> x = np.arange(12).reshape(3, 4)
>>> x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> n = x.shape[1] - 1
>>> y = np.repeat(x, (n,)+(1,)*n, axis=1)
>>> y
array([[ 0,  0,  0,  1,  2,  3],
       [ 4,  4,  4,  5,  6,  7],
       [ 8,  8,  8,  9, 10, 11]])
>>> y.reshape(-1, 2, n).transpose(0, 2, 1).reshape(-1, 2)
array([[ 0,  1],
       [ 0,  2],
       [ 0,  3],
       [ 4,  5],
       [ 4,  6],
       [ 4,  7],
       [ 8,  9],
       [ 8, 10],
       [ 8, 11]])
This will make two copies of the data, so it will not be the most efficient method. That would probably be something like:
>>> y = np.empty((x.shape[0], n, 2), dtype=x.dtype)
>>> y[..., 0] = x[:, 0, None]
>>> y[..., 1] = x[:, 1:]
>>> y.shape = (-1, 2)
>>> y
array([[ 0,  1],
       [ 0,  2],
       [ 0,  3],
       [ 4,  5],
       [ 4,  6],
       [ 4,  7],
       [ 8,  9],
       [ 8, 10],
       [ 8, 11]])
Like Jaimie, I first tried a repeat of the 1st column followed by reshaping, but then decided it was simpler to make 2 intermediary arrays, and hstack them:
x = np.array([[0, 1, 2, 3], [1, 2, 7, 9], [2, 1, 5, 2]])
m, n = x.shape
x1 = x[:, 0].repeat(n-1)[:, None]
x2 = x[:, 1:].reshape(-1, 1)
np.hstack([x1, x2])
producing
array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 2],
       [1, 7],
       [1, 9],
       [2, 1],
       [2, 5],
       [2, 2]])
There probably are other ways of doing this sort of rearrangement. The result will copy the original data in one way or other. My guess is that as long as you are using compiled functions like reshape and repeat, the time differences won't be significant.
Suppose the numpy array is
arr = np.array([[0, 1, 2, 3],
                [1, 2, 7, 9],
                [2, 1, 5, 2]])
You can get the array of pairs as
import itertools
m, n = arr.shape
new_arr = np.array([x for i in range(m)
                    for x in itertools.product(arr[i, 0:1], arr[i, 1:n])])
The output would be
array([[0, 1],
       [0, 2],
       [0, 3],
       [1, 2],
       [1, 7],
       [1, 9],
       [2, 1],
       [2, 5],
       [2, 2]])

Sub matrix of a list of lists (without numpy)

Suppose I have a matrix composed of a list of lists like so:
>>> LoL=[list(range(10)) for i in range(10)]
>>> LoL
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
Assume, also, that I have a numpy matrix of the same structure called LoLa:
>>> LoLa=np.array(LoL)
Using numpy, I could get a submatrix of this matrix like this:
>>> LoLa[1:4,2:5]
array([[2, 3, 4],
[2, 3, 4],
[2, 3, 4]])
I can replicate the numpy matrix slice in pure Python like so:
>>> r=(1,4)
>>> s=(2,5)
>>> [LoL[i][s[0]:s[1]] for i in range(len(LoL))][r[0]:r[1]]
[[2, 3, 4], [2, 3, 4], [2, 3, 4]]
Which is not the easiest thing in the world to read nor the most efficient :-)
Question: Is there an easier way (in pure Python) to slice an arbitrary matrix as a sub matrix?
In [74]: [row[2:5] for row in LoL[1:4]]
Out[74]: [[2, 3, 4], [2, 3, 4], [2, 3, 4]]
You could also mimic NumPy's syntax by defining a subclass of list:
class LoL(list):
    def __init__(self, *args):
        list.__init__(self, *args)

    def __getitem__(self, item):
        try:
            return list.__getitem__(self, item)
        except TypeError:
            rows, cols = item
            return [row[cols] for row in self[rows]]

lol = LoL([list(range(10)) for i in range(10)])
print(lol[1:4, 2:5])
also yields
[[2, 3, 4], [2, 3, 4], [2, 3, 4]]
Using the LoL subclass won't win any speed tests:
In [85]: %timeit [row[2:5] for row in x[1:4]]
1000000 loops, best of 3: 538 ns per loop
In [82]: %timeit lol[1:4, 2:5]
100000 loops, best of 3: 3.07 us per loop
but speed isn't everything -- sometimes readability is more important.
For one, you can use slice objects directly, which helps a bit with both the readability and performance:
r = slice(1,4)
s = slice(2,5)
[LoL[i][s] for i in range(len(LoL))[r]]
And if you just iterate over the list-of-lists directly, you can write that as:
[row[s] for row in LoL[r]]
Do this:
submat = [[mat[i][j] for j in range(index1, index2)] for i in range(index3, index4)]
The submat will be the rectangular chunk of your original big matrix (square if the two index ranges have the same length).
I don't know if it's easier, but let me throw an idea on the table:
from itertools import product

r = (1, 4)
s = (2, 5)
array = [LoL[i][j] for i, j in product(range(*r), range(*s))]
This is a flattened version of the submatrix you want.
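To regroup that flat list into rows, a small sketch reusing the same s and array from above:
# Sketch: cut the flat list back into rows of the submatrix.
ncols = s[1] - s[0]
submat = [array[k:k + ncols] for k in range(0, len(array), ncols)]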
