Vectorized operations on 2 columns of different dataframes [duplicate] - python
I have two numpy arrays that define the x and y axes of a grid. For example:
x = numpy.array([1,2,3])
y = numpy.array([4,5])
I'd like to generate the Cartesian product of these arrays to generate:
array([[1,4],[2,4],[3,4],[1,5],[2,5],[3,5]])
In a way that's not terribly inefficient since I need to do this many times in a loop. I'm assuming that converting them to a Python list and using itertools.product and back to a numpy array is not the most efficient form.
A canonical cartesian_product (almost)
There are many approaches to this problem with different properties. Some are faster than others, and some are more general-purpose. After a lot of testing and tweaking, I've found that the following function, which calculates an n-dimensional cartesian_product, is faster than most others for many inputs. For a pair of approaches that are slightly more complex, but are even a bit faster in many cases, see the answer by Paul Panzer.
Given that answer, this is no longer the fastest implementation of the cartesian product in numpy that I'm aware of. However, I think its simplicity will continue to make it a useful benchmark for future improvement:
def cartesian_product(*arrays):
    la = len(arrays)
    dtype = numpy.result_type(*arrays)
    arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(numpy.ix_(*arrays)):
        arr[...,i] = a
    return arr.reshape(-1, la)
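As a quick sanity check on the arrays from the question: the pairs come out with x varying slowest and y varying fastest, which is a slightly different order than the one shown in the question, but it contains the same pairs:
>>> x = numpy.array([1, 2, 3])
>>> y = numpy.array([4, 5])
>>> cartesian_product(x, y)
array([[1, 4],
       [1, 5],
       [2, 4],
       [2, 5],
       [3, 4],
       [3, 5]])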
It's worth mentioning that this function uses ix_ in an unusual way; whereas the documented use of ix_ is to generate indices into an array, it just so happens that arrays with the same shape can be used for broadcasted assignment. Many thanks to mgilson, who inspired me to try using ix_ this way, and to unutbu, who provided some extremely helpful feedback on this answer, including the suggestion to use numpy.result_type.
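To make that concrete, here is what ix_ produces for the question's two arrays: a pair of arrays with shapes (3, 1) and (1, 2), which broadcast against each other during the assignment into arr[..., i] (a small illustrative check):
>>> ix = numpy.ix_(numpy.array([1, 2, 3]), numpy.array([4, 5]))
>>> [a.shape for a in ix]
[(3, 1), (1, 2)]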
Notable alternatives
It's sometimes faster to write contiguous blocks of memory in Fortran order. That's the basis of this alternative, cartesian_product_transpose, which has proven faster on some hardware than cartesian_product (see below). However, Paul Panzer's answer, which uses the same principle, is even faster. Still, I include this here for interested readers:
def cartesian_product_transpose(*arrays):
    broadcastable = numpy.ix_(*arrays)
    broadcasted = numpy.broadcast_arrays(*broadcastable)
    rows, cols = numpy.prod(broadcasted[0].shape), len(broadcasted)
    dtype = numpy.result_type(*arrays)
    out = numpy.empty(rows * cols, dtype=dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    return out.reshape(cols, rows).T
After coming to understand Panzer's approach, I wrote a new version that's almost as fast as his, and is almost as simple as cartesian_product:
def cartesian_product_simple_transpose(arrays):
    la = len(arrays)
    dtype = numpy.result_type(*arrays)
    arr = numpy.empty([la] + [len(a) for a in arrays], dtype=dtype)
    for i, a in enumerate(numpy.ix_(*arrays)):
        arr[i, ...] = a
    return arr.reshape(la, -1).T
This appears to have some constant-time overhead that makes it run slower than Panzer's for small inputs. But for larger inputs, in all the tests I ran, it performs just as well as his fastest implementation (cartesian_product_transpose_pp).
In the following sections, I include some tests of other alternatives. These are now somewhat out of date, but rather than duplicate effort, I've decided to leave them here out of historical interest. For up-to-date tests, see Panzer's answer, as well as Nico Schlömer's.
Tests against alternatives
Here is a battery of tests that show the performance boost that some of these functions provide relative to a number of alternatives. All the tests shown here were performed on a quad-core machine, running Mac OS 10.12.5, Python 3.6.1, and numpy 1.12.1. Variations on hardware and software are known to produce different results, so YMMV. Run these tests for yourself to be sure!
Definitions:
import numpy
import itertools
from functools import reduce
### Two-dimensional products ###
def repeat_product(x, y):
    return numpy.transpose([numpy.tile(x, len(y)),
                            numpy.repeat(y, len(x))])

def dstack_product(x, y):
    return numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2)
### Generalized N-dimensional products ###
def cartesian_product(*arrays):
    la = len(arrays)
    dtype = numpy.result_type(*arrays)
    arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(numpy.ix_(*arrays)):
        arr[...,i] = a
    return arr.reshape(-1, la)
def cartesian_product_transpose(*arrays):
    broadcastable = numpy.ix_(*arrays)
    broadcasted = numpy.broadcast_arrays(*broadcastable)
    rows, cols = numpy.prod(broadcasted[0].shape), len(broadcasted)
    dtype = numpy.result_type(*arrays)
    out = numpy.empty(rows * cols, dtype=dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    return out.reshape(cols, rows).T
# from https://stackoverflow.com/a/1235363/577088
def cartesian_product_recursive(*arrays, out=None):
    arrays = [numpy.asarray(x) for x in arrays]
    dtype = arrays[0].dtype
    n = numpy.prod([x.size for x in arrays])
    if out is None:
        out = numpy.zeros([n, len(arrays)], dtype=dtype)
    m = n // arrays[0].size
    out[:,0] = numpy.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian_product_recursive(arrays[1:], out=out[0:m,1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m,1:] = out[0:m,1:]
    return out
def cartesian_product_itertools(*arrays):
    return numpy.array(list(itertools.product(*arrays)))
### Test code ###
name_func = [('repeat_product', repeat_product),
             ('dstack_product', dstack_product),
             ('cartesian_product', cartesian_product),
             ('cartesian_product_transpose', cartesian_product_transpose),
             ('cartesian_product_recursive', cartesian_product_recursive),
             ('cartesian_product_itertools', cartesian_product_itertools)]
def test(in_arrays, test_funcs):
    global func
    global arrays
    arrays = in_arrays
    for name, func in test_funcs:
        print('{}:'.format(name))
        %timeit func(*arrays)
def test_all(*in_arrays):
    test(in_arrays, name_func)
# `cartesian_product_recursive` throws an
# unexpected error when used on more than
# two input arrays, so for now I've removed
# it from these tests.
def test_cartesian(*in_arrays):
    test(in_arrays, name_func[2:4] + name_func[-1:])
x10 = [numpy.arange(10)]
x50 = [numpy.arange(50)]
x100 = [numpy.arange(100)]
x500 = [numpy.arange(500)]
x1000 = [numpy.arange(1000)]
Test results:
In [2]: test_all(*(x100 * 2))
repeat_product:
67.5 µs ± 633 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
dstack_product:
67.7 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
cartesian_product:
33.4 µs ± 558 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
cartesian_product_transpose:
67.7 µs ± 932 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
cartesian_product_recursive:
215 µs ± 6.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
cartesian_product_itertools:
3.65 ms ± 38.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [3]: test_all(*(x500 * 2))
repeat_product:
1.31 ms ± 9.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
dstack_product:
1.27 ms ± 7.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
cartesian_product:
375 µs ± 4.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
cartesian_product_transpose:
488 µs ± 8.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
cartesian_product_recursive:
2.21 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_itertools:
105 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [4]: test_all(*(x1000 * 2))
repeat_product:
10.2 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
dstack_product:
12 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product:
4.75 ms ± 57.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_transpose:
7.76 ms ± 52.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_recursive:
13 ms ± 209 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_itertools:
422 ms ± 7.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In all cases, cartesian_product as defined at the beginning of this answer is fastest.
For those functions that accept an arbitrary number of input arrays, it's worth checking performance when len(arrays) > 2 as well. (Until I can determine why cartesian_product_recursive throws an error in this case, I've removed it from these tests.)
In [5]: test_cartesian(*(x100 * 3))
cartesian_product:
8.8 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_transpose:
7.87 ms ± 91.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_itertools:
518 ms ± 5.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: test_cartesian(*(x50 * 4))
cartesian_product:
169 ms ± 5.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
cartesian_product_transpose:
184 ms ± 4.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
cartesian_product_itertools:
3.69 s ± 73.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: test_cartesian(*(x10 * 6))
cartesian_product:
26.5 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
cartesian_product_transpose:
16 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
cartesian_product_itertools:
728 ms ± 16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [8]: test_cartesian(*(x10 * 7))
cartesian_product:
650 ms ± 8.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
cartesian_product_transpose:
518 ms ± 7.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
cartesian_product_itertools:
8.13 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
As these tests show, cartesian_product remains competitive until the number of input arrays rises above (roughly) four. After that, cartesian_product_transpose does have a slight edge.
It's worth reiterating that users with other hardware and operating systems may see different results. For example, unutbu reports seeing the following results for these tests using Ubuntu 14.04, Python 3.4.3, and numpy 1.14.0.dev0+b7050a9:
>>> %timeit cartesian_product_transpose(x500, y500)
1000 loops, best of 3: 682 µs per loop
>>> %timeit cartesian_product(x500, y500)
1000 loops, best of 3: 1.55 ms per loop
Below, I go into a few details about earlier tests I've run along these lines. The relative performance of these approaches has changed over time, for different hardware and different versions of Python and numpy. While it's not immediately useful for people using up-to-date versions of numpy, it illustrates how things have changed since the first version of this answer.
A simple alternative: meshgrid + dstack
The currently accepted answer uses tile and repeat to broadcast two arrays together. But the meshgrid function does practically the same thing. Here's the output of tile and repeat before being passed to transpose:
In [1]: import numpy
In [2]: x = numpy.array([1,2,3])
...: y = numpy.array([4,5])
...:
In [3]: [numpy.tile(x, len(y)), numpy.repeat(y, len(x))]
Out[3]: [array([1, 2, 3, 1, 2, 3]), array([4, 4, 4, 5, 5, 5])]
And here's the output of meshgrid:
In [4]: numpy.meshgrid(x, y)
Out[4]:
[array([[1, 2, 3],
[1, 2, 3]]), array([[4, 4, 4],
[5, 5, 5]])]
As you can see, it's almost identical. We need only reshape the result to get exactly the same output.
In [5]: xt, xr = numpy.meshgrid(x, y)
...: [xt.ravel(), xr.ravel()]
Out[5]: [array([1, 2, 3, 1, 2, 3]), array([4, 4, 4, 5, 5, 5])]
Rather than reshaping at this point, though, we could pass the output of meshgrid to dstack and reshape afterwards, which saves some work:
In [6]: numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2)
Out[6]:
array([[1, 4],
[2, 4],
[3, 4],
[1, 5],
[2, 5],
[3, 5]])
Contrary to the claim in this comment, I've seen no evidence that different inputs will produce differently shaped outputs, and as the above demonstrates, they do very similar things, so it would be quite strange if they did. Please let me know if you find a counterexample.
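As a quick check of that claim, here the question's x and y (which have different lengths) are run through both expressions; each produces an array of the same (6, 2) shape:
>>> numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2).shape
(6, 2)
>>> numpy.transpose([numpy.tile(x, len(y)), numpy.repeat(y, len(x))]).shape
(6, 2)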
Testing meshgrid + dstack vs. repeat + transpose
The relative performance of these two approaches has changed over time. In an earlier version of Python (2.7), the result using meshgrid + dstack was noticeably faster for small inputs. (Note that these tests are from an old version of this answer.) Definitions:
>>> def repeat_product(x, y):
...     return numpy.transpose([numpy.tile(x, len(y)),
...                             numpy.repeat(y, len(x))])
...
>>> def dstack_product(x, y):
...     return numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2)
...
For moderately-sized input, I saw a significant speedup. But I retried these tests with more recent versions of Python (3.6.1) and numpy (1.12.1), on a newer machine. The two approaches are almost identical now.
Old Test
>>> x, y = numpy.arange(500), numpy.arange(500)
>>> %timeit repeat_product(x, y)
10 loops, best of 3: 62 ms per loop
>>> %timeit dstack_product(x, y)
100 loops, best of 3: 12.2 ms per loop
New Test
In [7]: x, y = numpy.arange(500), numpy.arange(500)
In [8]: %timeit repeat_product(x, y)
1.32 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [9]: %timeit dstack_product(x, y)
1.26 ms ± 8.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
As always, YMMV, but this suggests that in recent versions of Python and numpy, these are interchangeable.
Generalized product functions
In general, we might expect that using built-in functions will be faster for small inputs, while for large inputs, a purpose-built function might be faster. Furthermore for a generalized n-dimensional product, tile and repeat won't help, because they don't have clear higher-dimensional analogues. So it's worth investigating the behavior of purpose-built functions as well.
Most of the relevant tests appear at the beginning of this answer, but here are a few of the tests performed on earlier versions of Python and numpy for comparison.
The cartesian function defined in another answer used to perform pretty well for larger inputs. (It's the same as the function called cartesian_product_recursive above.) In order to compare cartesian to dstack_product, we use just two dimensions.
Here again, the old test showed a significant difference, while the new test shows almost none.
Old Test
>>> x, y = numpy.arange(1000), numpy.arange(1000)
>>> %timeit cartesian([x, y])
10 loops, best of 3: 25.4 ms per loop
>>> %timeit dstack_product(x, y)
10 loops, best of 3: 66.6 ms per loop
New Test
In [10]: x, y = numpy.arange(1000), numpy.arange(1000)
In [11]: %timeit cartesian([x, y])
12.1 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [12]: %timeit dstack_product(x, y)
12.7 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
As before, dstack_product still beats cartesian at smaller scales.
New Test (redundant old test not shown)
In [13]: x, y = numpy.arange(100), numpy.arange(100)
In [14]: %timeit cartesian([x, y])
215 µs ± 4.75 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [15]: %timeit dstack_product(x, y)
65.7 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
These distinctions are, I think, interesting and worth recording; but they are academic in the end. As the tests at the beginning of this answer showed, all of these versions are almost always slower than cartesian_product, defined at the very beginning of this answer -- which is itself a bit slower than the fastest implementations among the answers to this question.
>>> numpy.transpose([numpy.tile(x, len(y)), numpy.repeat(y, len(x))])
array([[1, 4],
[2, 4],
[3, 4],
[1, 5],
[2, 5],
[3, 5]])
See Using numpy to build an array of all combinations of two arrays for a general solution for computing the Cartesian product of N arrays.
You can just use a normal list comprehension in Python
x = numpy.array([1,2,3])
y = numpy.array([4,5])
[[x0, y0] for x0 in x for y0 in y]
which should give you
[[1, 4], [1, 5], [2, 4], [2, 5], [3, 4], [3, 5]]
I was interested in this as well and did a little performance comparison, perhaps somewhat clearer than in @senderle's answer.
For two arrays (the classical case):
For four arrays:
(Note that the length of the arrays is only a few dozen entries here.)
Code to reproduce the plots:
from functools import reduce
import itertools
import numpy
import perfplot
def dstack_product(arrays):
    return numpy.dstack(numpy.meshgrid(*arrays, indexing="ij")).reshape(-1, len(arrays))
# Generalized N-dimensional products
def cartesian_product(arrays):
    la = len(arrays)
    dtype = numpy.find_common_type([a.dtype for a in arrays], [])
    arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(numpy.ix_(*arrays)):
        arr[..., i] = a
    return arr.reshape(-1, la)
def cartesian_product_transpose(arrays):
    broadcastable = numpy.ix_(*arrays)
    broadcasted = numpy.broadcast_arrays(*broadcastable)
    rows, cols = reduce(numpy.multiply, broadcasted[0].shape), len(broadcasted)
    dtype = numpy.find_common_type([a.dtype for a in arrays], [])
    out = numpy.empty(rows * cols, dtype=dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    return out.reshape(cols, rows).T
# from https://stackoverflow.com/a/1235363/577088
def cartesian_product_recursive(arrays, out=None):
    arrays = [numpy.asarray(x) for x in arrays]
    dtype = arrays[0].dtype
    n = numpy.prod([x.size for x in arrays])
    if out is None:
        out = numpy.zeros([n, len(arrays)], dtype=dtype)
    m = n // arrays[0].size
    out[:, 0] = numpy.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian_product_recursive(arrays[1:], out=out[0:m, 1:])
        for j in range(1, arrays[0].size):
            out[j * m : (j + 1) * m, 1:] = out[0:m, 1:]
    return out
def cartesian_product_itertools(arrays):
    return numpy.array(list(itertools.product(*arrays)))
perfplot.show(
    setup=lambda n: 2 * (numpy.arange(n, dtype=float),),
    n_range=[2 ** k for k in range(13)],
    # setup=lambda n: 4 * (numpy.arange(n, dtype=float),),
    # n_range=[2 ** k for k in range(6)],
    kernels=[
        dstack_product,
        cartesian_product,
        cartesian_product_transpose,
        cartesian_product_recursive,
        cartesian_product_itertools,
    ],
    logx=True,
    logy=True,
    xlabel="len(a), len(b)",
    equality_check=None,
)
Building on @senderle's exemplary ground work, I've come up with two versions - one for C and one for Fortran layouts - that are often a bit faster.
cartesian_product_transpose_pp is - unlike @senderle's cartesian_product_transpose, which uses a different strategy altogether - a version of cartesian_product that uses the more favorable transposed memory layout plus some very minor optimizations.
cartesian_product_pp sticks with the original memory layout. What makes it fast is its use of contiguous copying. Contiguous copies turn out to be so much faster that copying a full block of memory, even though only part of it contains valid data, is preferable to copying only the valid bits.
Some perfplots. I made separate ones for C and Fortran layouts, because these are different tasks IMO.
Names ending in 'pp' are my approaches.
1) many tiny factors (2 elements each)
2) many small factors (4 elements each)
3) three factors of equal length
4) two factors of equal length
Code (need to do separate runs for each plot b/c I couldn't figure out how to reset; also need to edit / comment in / out appropriately):
import numpy
import numpy as np
from functools import reduce
import itertools
import timeit
import perfplot
def dstack_product(arrays):
    return numpy.dstack(
        numpy.meshgrid(*arrays, indexing='ij')
        ).reshape(-1, len(arrays))
def cartesian_product_transpose_pp(arrays):
    la = len(arrays)
    dtype = numpy.result_type(*arrays)
    arr = numpy.empty((la, *map(len, arrays)), dtype=dtype)
    idx = slice(None), *itertools.repeat(None, la)
    for i, a in enumerate(arrays):
        arr[i, ...] = a[idx[:la-i]]
    return arr.reshape(la, -1).T
def cartesian_product(arrays):
    la = len(arrays)
    dtype = numpy.result_type(*arrays)
    arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype)
    for i, a in enumerate(numpy.ix_(*arrays)):
        arr[...,i] = a
    return arr.reshape(-1, la)
def cartesian_product_transpose(arrays):
    broadcastable = numpy.ix_(*arrays)
    broadcasted = numpy.broadcast_arrays(*broadcastable)
    rows, cols = numpy.prod(broadcasted[0].shape), len(broadcasted)
    dtype = numpy.result_type(*arrays)
    out = numpy.empty(rows * cols, dtype=dtype)
    start, end = 0, rows
    for a in broadcasted:
        out[start:end] = a.reshape(-1)
        start, end = end, end + rows
    return out.reshape(cols, rows).T
from itertools import accumulate, repeat, chain
def cartesian_product_pp(arrays, out=None):
    la = len(arrays)
    L = *map(len, arrays), la
    dtype = numpy.result_type(*arrays)
    arr = numpy.empty(L, dtype=dtype)
    arrs = *accumulate(chain((arr,), repeat(0, la-1)), np.ndarray.__getitem__),
    idx = slice(None), *itertools.repeat(None, la-1)
    for i in range(la-1, 0, -1):
        arrs[i][..., i] = arrays[i][idx[:la-i]]
        arrs[i-1][1:] = arrs[i]
    arr[..., 0] = arrays[0][idx]
    return arr.reshape(-1, la)
def cartesian_product_itertools(arrays):
    return numpy.array(list(itertools.product(*arrays)))
# from https://stackoverflow.com/a/1235363/577088
def cartesian_product_recursive(arrays, out=None):
    arrays = [numpy.asarray(x) for x in arrays]
    dtype = arrays[0].dtype
    n = numpy.prod([x.size for x in arrays])
    if out is None:
        out = numpy.zeros([n, len(arrays)], dtype=dtype)
    m = n // arrays[0].size
    out[:, 0] = numpy.repeat(arrays[0], m)
    if arrays[1:]:
        cartesian_product_recursive(arrays[1:], out=out[0:m, 1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out
### Test code ###
if False:
    perfplot.save('cp_4el_high.png',
        setup=lambda n: n*(numpy.arange(4, dtype=float),),
        n_range=list(range(6, 11)),
        kernels=[
            dstack_product,
            cartesian_product_recursive,
            cartesian_product,
            # cartesian_product_transpose,
            cartesian_product_pp,
            # cartesian_product_transpose_pp,
            ],
        logx=False,
        logy=True,
        xlabel='#factors',
        equality_check=None
        )
else:
    perfplot.save('cp_2f_T.png',
        setup=lambda n: 2*(numpy.arange(n, dtype=float),),
        n_range=[2**k for k in range(5, 11)],
        kernels=[
            # dstack_product,
            # cartesian_product_recursive,
            # cartesian_product,
            cartesian_product_transpose,
            # cartesian_product_pp,
            cartesian_product_transpose_pp,
            ],
        logx=True,
        logy=True,
        xlabel='length of each factor',
        equality_check=None
        )
As of Oct. 2017, numpy now has a generic np.stack function that takes an axis parameter. Using it, we can have a "generalized cartesian product" using the "dstack and meshgrid" technique:
import numpy as np
def cartesian_product(*arrays):
    ndim = len(arrays)
    return (np.stack(np.meshgrid(*arrays), axis=-1)
              .reshape(-1, ndim))
a = np.array([1,2])
b = np.array([10,20])
cartesian_product(a,b)
# output:
# array([[ 1, 10],
# [ 2, 10],
# [ 1, 20],
# [ 2, 20]])
Note on the axis=-1 parameter. This is the last (inner-most) axis in the result. It is equivalent to using axis=ndim.
One other comment: since Cartesian products blow up very quickly, unless we need to realize the array in memory for some reason, we may want to fall back on itertools when the product is very large and consume the values on the fly.
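For instance, a minimal sketch of that streaming idea (the arrays here are just illustrative):
import itertools
import numpy as np

a = np.array([1, 2])
b = np.array([10, 20])

# iterate over the product lazily, one pair at a time,
# without ever materializing the full array in memory
for pair in itertools.product(a, b):
    print(pair)  # -> pairs in the order (1, 10), (1, 20), (2, 10), (2, 20)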
The Scikit-learn package has a fast implementation of exactly this:
from sklearn.utils.extmath import cartesian
product = cartesian((x,y))
Note that the convention of this implementation is different from what you want, if you care about the order of the output. For your exact ordering, you can do
product = cartesian((y,x))[:, ::-1]
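To make the ordering difference concrete (a small illustrative check, assuming the x and y from the question):
>>> cartesian((x, y))
array([[1, 4],
       [1, 5],
       [2, 4],
       [2, 5],
       [3, 4],
       [3, 5]])
>>> cartesian((y, x))[:, ::-1]
array([[1, 4],
       [2, 4],
       [3, 4],
       [1, 5],
       [2, 5],
       [3, 5]])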
I used @kennytm's answer for a while, but when trying to do the same in TensorFlow, I found that TensorFlow has no equivalent of numpy.repeat(). After a little experimentation, I think I found a more general solution for arbitrary vectors of points.
For numpy:
import numpy as np
def cartesian_product(*args: np.ndarray) -> np.ndarray:
    """
    Produce the cartesian product of arbitrary length vectors.

    Parameters
    ----------
    np.ndarray args
        vector of points of interest in each dimension

    Returns
    -------
    np.ndarray
        the cartesian product of size [m x n] wherein:
            m = prod([len(a) for a in args])
            n = len(args)
    """
    for i, a in enumerate(args):
        assert a.ndim == 1, "arg {:d} is not rank 1".format(i)
    return np.concatenate([np.reshape(xi, [-1, 1]) for xi in np.meshgrid(*args)], axis=1)
and for TensorFlow:
import tensorflow as tf
def cartesian_product(*args: tf.Tensor) -> tf.Tensor:
    """
    Produce the cartesian product of arbitrary length vectors.

    Parameters
    ----------
    tf.Tensor args
        vector of points of interest in each dimension

    Returns
    -------
    tf.Tensor
        the cartesian product of size [m x n] wherein:
            m = prod([len(a) for a in args])
            n = len(args)
    """
    for i, a in enumerate(args):
        tf.assert_rank(a, 1, message="arg {:d} is not rank 1".format(i))
    return tf.concat([tf.reshape(xi, [-1, 1]) for xi in tf.meshgrid(*args)], axis=1)
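A quick check of the NumPy version on the arrays from the question (meshgrid's default 'xy' indexing makes x vary fastest here, so this happens to reproduce the exact ordering asked for):
>>> cartesian_product(np.array([1, 2, 3]), np.array([4, 5]))
array([[1, 4],
       [2, 4],
       [3, 4],
       [1, 5],
       [2, 5],
       [3, 5]])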
More generally, if you have two 2d numpy arrays a and b, and you want to concatenate every row of a to every row of b (A cartesian product of rows, kind of like a join in a database), you can use this method:
import numpy
def join_2d(a, b):
    assert a.dtype == b.dtype
    a_part = numpy.tile(a, (len(b), 1))
    b_part = numpy.repeat(b, len(a), axis=0)
    return numpy.hstack((a_part, b_part))
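A tiny illustrative example (2D integer inputs are assumed):
>>> a = numpy.array([[1, 2], [3, 4]])
>>> b = numpy.array([[5, 6]])
>>> join_2d(a, b)
array([[1, 2, 5, 6],
       [3, 4, 5, 6]])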
The fastest you can get is either by combining a generator expression with the map function:
import numpy as np
import datetime
a = np.arange(1000)
b = np.arange(200)
start = datetime.datetime.now()
foo = (item for sublist in [list(map(lambda x: (x,i),a)) for i in b] for item in sublist)
print (list(foo))
print ('execution time: {} s'.format((datetime.datetime.now() - start).total_seconds()))
Outputs (actually the whole resulting list is printed):
[(0, 0), (1, 0), ...,(998, 199), (999, 199)]
execution time: 1.253567 s
or by using a double generator expression:
a = np.arange(1000)
b = np.arange(200)
start = datetime.datetime.now()
foo = ((x,y) for x in a for y in b)
print (list(foo))
print ('execution time: {} s'.format((datetime.datetime.now() - start).total_seconds()))
Outputs (whole list printed):
[(0, 0), (1, 0), ...,(998, 199), (999, 199)]
execution time: 1.187415 s
Take into account that most of the computation time goes into the printing command. The generator calculations are otherwise decently efficient. Without printing the calculation times are:
execution time: 0.079208 s
for generator expression + map function and:
execution time: 0.007093 s
for the double generator expression.
If what you actually want is to calculate the product of each coordinate pair, the fastest way is to solve it as a numpy matrix product:
a = np.arange(1000)
b = np.arange(200)
start = datetime.datetime.now()
foo = np.dot(np.asmatrix([[i,0] for i in a]), np.asmatrix([[i,0] for i in b]).T)
print (foo)
print ('execution time: {} s'.format((datetime.datetime.now() - start).total_seconds()))
Outputs:
[[ 0 0 0 ..., 0 0 0]
[ 0 1 2 ..., 197 198 199]
[ 0 2 4 ..., 394 396 398]
...,
[ 0 997 1994 ..., 196409 197406 198403]
[ 0 998 1996 ..., 196606 197604 198602]
[ 0 999 1998 ..., 196803 197802 198801]]
execution time: 0.003869 s
and without printing (in this case it doesn't save much since only a tiny piece of the matrix is actually printed out):
execution time: 0.003083 s
This can also be done easily by using the itertools.product method
from itertools import product
import numpy as np
x = np.array([1, 2, 3])
y = np.array([4, 5])
cart_prod = np.array(list(product(*[x, y])),dtype='int32')
Result:
array([[1, 4],
[1, 5],
[2, 4],
[2, 5],
[3, 4],
[3, 5]], dtype=int32)
Execution time: 0.000155 s
In the specific case that you need to perform simple operations such as addition on each pair, you can introduce an extra dimension and let broadcasting do the job:
>>> a, b = np.array([1,2,3]), np.array([10,20,30])
>>> a[None,:] + b[:,None]
array([[11, 12, 13],
[21, 22, 23],
[31, 32, 33]])
I'm not sure if there is any similar way to actually get the pairs themselves.
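One possible sketch (not necessarily the most efficient way) is to broadcast the two expanded arrays and stack them, which does recover the pairs themselves:
>>> np.stack(np.broadcast_arrays(a[None, :], b[:, None]), axis=-1).reshape(-1, 2)
array([[ 1, 10],
       [ 2, 10],
       [ 3, 10],
       [ 1, 20],
       [ 2, 20],
       [ 3, 20],
       [ 1, 30],
       [ 2, 30],
       [ 3, 30]])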
I'm a bit late to the party, but I encountered a tricky variant of this problem.
Let's say I want the cartesian product of several arrays, but the cartesian product ends up being much larger than the computer's memory (however, the computations done with that product are fast, or at least parallelizable).
The obvious solution is to divide this cartesian product in chunks, and treat these chunks one after the other (in sort of a "streaming" manner). You can do that easily with itertools.product, but it's horrendously slow. Also, none of the proposed solutions here (as fast as they are) give us this possibility. The solution I propose uses Numba, and is slightly faster than the "canonical" cartesian_product mentioned here. It's pretty long because I tried to optimize it everywhere I could.
import numba as nb
import numpy as np
from typing import List
@nb.njit(nb.types.Tuple((nb.int32[:, :],
                         nb.int32[:]))(nb.int32[:],
                                       nb.int32[:],
                                       nb.int64, nb.int64))
def cproduct(sizes: np.ndarray, current_tuple: np.ndarray, start_idx: int, end_idx: int):
    """Generates ids tuples from start_id to end_id"""
    assert len(sizes) >= 2
    assert start_idx < end_idx

    tuples = np.zeros((end_idx - start_idx, len(sizes)), dtype=np.int32)
    tuple_idx = 0
    # stores the current combination
    current_tuple = current_tuple.copy()
    while tuple_idx < end_idx - start_idx:
        tuples[tuple_idx] = current_tuple
        current_tuple[0] += 1
        # using a condition here instead of including this in the inner loop
        # to gain a bit of speed: this is going to be tested each iteration,
        # and starting a loop to have it end right away is a bit silly
        if current_tuple[0] == sizes[0]:
            # the reset to 0 and subsequent increment amount to carrying
            # the number to the higher "power"
            current_tuple[0] = 0
            current_tuple[1] += 1
            for i in range(1, len(sizes) - 1):
                if current_tuple[i] == sizes[i]:
                    # same as before, but in a loop, since this is going
                    # to get called less often
                    current_tuple[i + 1] += 1
                    current_tuple[i] = 0
                else:
                    break
        tuple_idx += 1
    return tuples, current_tuple
def chunked_cartesian_product_ids(sizes: List[int], chunk_size: int):
    """Just generates chunks of the cartesian product of the ids of each
    input arrays (thus, we just need their sizes here, not the actual arrays)"""
    prod = np.prod(sizes)

    # putting the largest number at the front to more efficiently make use
    # of the cproduct numba function
    sizes = np.array(sizes, dtype=np.int32)
    sorted_idx = np.argsort(sizes)[::-1]
    sizes = sizes[sorted_idx]
    if chunk_size > prod:
        chunk_bounds = (np.array([0, prod])).astype(np.int64)
    else:
        num_chunks = np.maximum(np.ceil(prod / chunk_size), 2).astype(np.int32)
        chunk_bounds = (np.arange(num_chunks + 1) * chunk_size).astype(np.int64)
        chunk_bounds[-1] = prod
    current_tuple = np.zeros(len(sizes), dtype=np.int32)
    for start_idx, end_idx in zip(chunk_bounds[:-1], chunk_bounds[1:]):
        tuples, current_tuple = cproduct(sizes, current_tuple, start_idx, end_idx)
        # re-arrange columns to match the original order of the sizes list
        # before yielding
        yield tuples[:, np.argsort(sorted_idx)]
def chunked_cartesian_product(*arrays, chunk_size=2 ** 25):
    """Returns chunks of the full cartesian product, with arrays of shape
    (chunk_size, n_arrays). The last chunk will obviously have the size of the
    remainder"""
    array_lengths = [len(array) for array in arrays]
    for array_ids_chunk in chunked_cartesian_product_ids(array_lengths, chunk_size):
        slices_lists = [arrays[i][array_ids_chunk[:, i]] for i in range(len(arrays))]
        yield np.vstack(slices_lists).swapaxes(0, 1)
def cartesian_product(*arrays):
    """Actual cartesian product, not chunked, still fast"""
    total_prod = np.prod([len(array) for array in arrays])
    return next(chunked_cartesian_product(*arrays, chunk_size=total_prod))
a = np.arange(0, 3)
b = np.arange(8, 10)
c = np.arange(13, 16)
for cartesian_tuples in chunked_cartesian_product(*[a, b, c], chunk_size=5):
    print(cartesian_tuples)
This would output our cartesian product in chunks of 5 3-uples:
[[ 0 8 13]
[ 0 8 14]
[ 0 8 15]
[ 1 8 13]
[ 1 8 14]]
[[ 1 8 15]
[ 2 8 13]
[ 2 8 14]
[ 2 8 15]
[ 0 9 13]]
[[ 0 9 14]
[ 0 9 15]
[ 1 9 13]
[ 1 9 14]
[ 1 9 15]]
[[ 2 9 13]
[ 2 9 14]
[ 2 9 15]]
If you're willing to understand what is being done here, the intuition behind the njitted function is that it enumerates each index tuple as a "number" written in a mixed numerical base whose digits are bounded by the sizes of the input arrays (instead of the same number in regular binary, decimal or hexadecimal bases).
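In other words, each flat position in the product is decoded into a tuple of "digits" in a mixed-radix system. NumPy's unravel_index can illustrate the same idea (using order='F' so that the first digit varies fastest, matching the carry logic in cproduct above; this is just an illustration, not part of the solution):
>>> np.array(np.unravel_index(np.arange(5), (3, 2, 3), order='F')).T
array([[0, 0, 0],
       [1, 0, 0],
       [2, 0, 0],
       [0, 1, 0],
       [1, 1, 0]])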
Obviously, this solution is interesting for large products. For small ones, the overhead might be a bit costly.
NOTE: since numba is still under heavy development, I'm using numba 0.50 to run this, with Python 3.6.
Yet another one:
>>> x1, y1 = np.meshgrid(x, y)
>>> np.c_[x1.ravel(), y1.ravel()]
array([[1, 4],
[2, 4],
[3, 4],
[1, 5],
[2, 5],
[3, 5]])
Inspired by Ashkan's answer, you can also try the following.
>>> x, y = np.meshgrid(x, y)
>>> np.concatenate([x.flatten().reshape(-1,1), y.flatten().reshape(-1,1)], axis=1)
This will give you the required cartesian product!
This is a generalized version of the accepted answer (Cartesian product of multiple arrays using numpy.tile and numpy.repeat functions).
from functools import reduce
from operator import mul
import numpy as np

def cartesian_product(arrays):
    return np.vstack(
        np.tile(
            np.repeat(arrays[j], reduce(mul, map(len, arrays[j+1:]), 1)),
            reduce(mul, map(len, arrays[:j]), 1),
        )
        for j in range(len(arrays))
    ).T
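Assuming np.vstack accepts the generator expression (newer NumPy versions may require wrapping it in a list first), calling this on the question's arrays should give:
>>> x = np.array([1, 2, 3])
>>> y = np.array([4, 5])
>>> cartesian_product([x, y])
array([[1, 4],
       [1, 5],
       [2, 4],
       [2, 5],
       [3, 4],
       [3, 5]])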
If you are willing to use PyTorch, I should think it is highly efficient:
>>> import torch
>>> torch.cartesian_prod(torch.as_tensor(x), torch.as_tensor(y))
tensor([[1, 4],
[1, 5],
[2, 4],
[2, 5],
[3, 4],
[3, 5]])
and you can easily get a numpy array:
>>> torch.cartesian_prod(torch.as_tensor(x), torch.as_tensor(y)).numpy()
array([[1, 4],
[1, 5],
[2, 4],
[2, 5],
[3, 4],
[3, 5]])
Related
Efficient numpy row-wise matrix multiplication using 3d arrays
I have two 3d arrays of shape (N, M, D) and I want to perform an efficient row wise (over N) matrix multiplication such that the resulting array is of shape (N, D, D). An inefficient code sample showing what I try to achieve is given by: N = 100 M = 10 D = 50 arr1 = np.random.normal(size=(N, M, D)) arr2 = np.random.normal(size=(N, M, D)) result = [] for i in range(N): result.append(arr1[i].T # arr2[i]) result = np.array(result) However, this application is quite slow for large N due to the loop. Is there a more efficient way to achieve this computation without using loops? I already tried to find a solution via tensordot and einsum to no avail.
The vectorization solution is to swap the last two axes of arr1: >>> N, M, D = 2, 3, 4 >>> np.random.seed(0) >>> arr1 = np.random.normal(size=(N, M, D)) >>> arr2 = np.random.normal(size=(N, M, D)) >>> arr1.transpose(0, 2, 1) # arr2 array([[[ 6.95815626, 0.38299107, 0.40600482, 0.35990016], [-0.95421604, -2.83125879, -0.2759683 , -0.38027618], [ 3.54989101, -0.31274318, 0.14188485, 0.19860495], [ 3.56319723, -6.36209602, -0.42687188, -0.24932248]], [[ 0.67081341, -0.08816343, 0.35430089, 0.69962394], [ 0.0316968 , 0.15129449, -0.51592291, 0.07118177], [-0.22274906, -0.28955683, -1.78905988, 1.1486345 ], [ 1.68432706, 1.93915798, 2.25785798, -2.34404577]]]) A simple benchmark for the super N: In [225]: arr1.shape Out[225]: (100000, 10, 50) In [226]: %%timeit ...: result = [] ...: for i in range(N): ...: result.append(arr1[i].T # arr2[i]) ...: result = np.array(result) ...: ...: 12.4 s ± 224 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [227]: %timeit arr1.transpose(0, 2, 1) # arr2 843 ms ± 7.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) Use pre allocated lists and do not perform data conversion after the loop ends. The performance here is not much worse than vectorization, which means that the most overhead comes from the final data conversion: In [375]: %%timeit ...: result = [None] * N ...: for i in range(N): ...: result[i] = arr1[i].T # arr2[i] ...: # result = np.array(result) ...: ...: 1.22 s ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) Performance of loop solution with data conversion: In [376]: %%timeit ...: result = [None] * N ...: for i in range(N): ...: result[i] = arr1[i].T # arr2[i] ...: result = np.array(result) ...: ...: 11.3 s ± 179 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) Another refers to the answer of #9769953 and makes additional optimization test. To my surprise, its performance is almost the same as the vectorization solution: In [378]: %%timeit ...: result = np.empty_like(arr1, shape=(N, D, D)) ...: for res, ar1, ar2 in zip(result, arr1.transpose(0, 2, 1), arr2): ...: np.matmul(ar1, ar2, out=res) ...: 843 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
For interest, I wondered about the loop overhead, which I guess is minimal compared to the matrix multiplication; but in particular, the loop overhead is minimal to the potential reallocation of the list memory, which with N = 10000 could be significant. Using a pre-allocated array instead of a list, I compared the loop result and the solution provided by Mechanic Pig, and achieved the following results on my machine: In [10]: %timeit result1 = arr1.transpose(0, 2, 1) # arr2 33.7 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) versus In [14]: %%timeit ...: result = np.empty_like(arr1, shape=(N, D, D)) ...: for i in range(N): ...: result[i, ...] = arr1[i].T # arr2[i] ...: ...: 48.5 ms ± 184 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) The pure NumPy solution is still faster, so that's good, but only by a factor of about 1.5. Not too bad. Depending on the needs, the loop may be clearer as to what it intents (and easier to modify, in case there's a need for an if-statement or other shenigans). And naturally, a simple comment above the faster solution can easily point out what it actually replaces. Following the comments to this answer by Mechanic Pig, I've added below the timing results of a loop without preallocating an array (but with a preallocated list) and without conversion to a NumPy array. Mainly so the results are compared for the same machine: In [11]: %%timeit ...: result = [None] * N ...: for i in range(N): ...: result.append(arr1[i].T # arr2[i]) ...: 49.5 ms ± 672 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) So interestingly, this results, without conversion, is (a tiny bit) slower than the one with a pre-allocated array and directly assigning into the array.
Tensor dot product with rank one tensor
I'm trying to compute an inner product between tensors in numpy. I have a vector x of shape (n,) and a tensor y of shape d*(n,) with d > 1 and would like to compute $\langle y, x^{\otimes d} \rangle$. That is, I want to compute the sum $$\langle y, x^{\otimes d} \rangle= \sum_{i_1,\dots,i_d\in{1,\dots,n}}y[i_1, \dots, i_d]x[i_1]\dots x[i_d].$$ A working implementation I have uses a function to first compute $x^{\otimes d}$ and then uses np.tensordot: def d_fold_tensor_product(x, d) -> np.ndarray: """ Compute d-fold tensor product of a vector. """ assert d > 1, "Tensor order must be bigger than 1." xd = np.tensordot(x, x, axes=0) while d > 2: xd = np.tensordot(xd, x, axes=0) d -= 1 return xd n = 10 d = 4 x = np.random.random(n) y = np.random.random(d * (n,)) result = np.tensordot(y, d_fold_tensor_product(x, d), axes=d) Is there a more efficient and pythonic way? Perhaps without having to compute $x^{\otimes d}$.
The math is hard to read, so I'm going to skip that. Instead let's look at the sample calculation In [168]: n = 10 ...: d = 4 ...: x = np.random.random(n) ...: y = np.random.random(d * (n,)) In [169]: x.shape Out[169]: (10,) In [171]: d_fold_tensor_product(x,d).shape Out[171]: (10, 10, 10, 10) In [172]: result = np.tensordot(y, d_fold_tensor_product(x, d), axes=d) In [174]: result Out[174]: array(384.20478955) In [175]: y.shape Out[175]: (10, 10, 10, 10) tensordot can a be a complex call, though it all reduces to a call to dot. I once dug through the action with a single axis value. But without revisiting it that, or even looking at the docs (shame on me, I know :), this flattened dot does the same thing: In [176]: np.dot(y.ravel(), d_fold_tensor_product(x, d).ravel()) Out[176]: 384.20478955316673 So the d_fold... has somehow expanded or replicated x to a 4d array. Guess I'll have to digest that action :( That function is doing repeated outer products: In [177]: np.tensordot(x,x,axes=0).shape Out[177]: (10, 10) In [178]: np.allclose(np.tensordot(x,x,axes=0), x[:,None]*x) Out[178]: True In [181]: temp = d_fold_tensor_product(x,d) In [182]: np.allclose(temp, x[:,None,None,None]*x[:,None,None]*x[:,None]*x) Out[182]: True or put all together: In [184]: np.dot((x[:,None,None,None]*x[:,None,None]*x[:,None]*x).ravel(),y.ravel()) Out[184]: 384.20478955316673 So that eliminates the repeated tensordot, but isn't easily generalizable to other d. Another way - still not generalizable, but may help visualize the task: In [186]: np.einsum('ijkl,i,j,k,l',y,x,x,x,x) Out[186]: 384.2047895531675 Some timings - your use of tensordot is slower than the most direct outer product: In [193]: timeit temp = d_fold_tensor_product(x,d) 151 µs ± 127 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) In [194]: timeit x[:,None,None,None]*x[:,None,None]*x[:,None]*x 61.3 µs ± 1.36 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each) A generalization of the outer product is in between: In [195]: timeit np.multiply.reduce(np.array(np.ix_(x,x,x,x),object)) 85.1 µs ± 57.2 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) A more general way of doing the repeated outer product: def foo(x,d): x1 = np.expand_dims(x, tuple(range(1,d))) # make (10,1,1,1) res = x1 for _ in range(1,d): x1 = x1[...,0] res = res*x1 return res In [219]: foo(x,d).shape Out[219]: (10, 10, 10, 10) times almost as good as the explicit version: In [220]: timeit foo(x,d) 72.7 µs ± 47.3 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each) In [221]: np.dot(foo(x,d).ravel(),y.ravel()) Out[221]: 384.20478955316673
Python Numpy get difference between 2 two-dimensional array
Well, I have a simple problem that is giving me a headache, basically I have two two-dimensional arrays, full of [x,y] coordinates, and I want to compare the first with the second and generate a third array that contains all the elements of the first array that doesn't appear in the second. It's simple but I couldn't make it work at all. The size varies a lot, the first array can have between a thousand and 2 million coordinates, while the first array has between 1 and a thousand. This operation will occur many times and the larger the first array, the more times it will occur sample: arr1 = np.array([[0, 3], [0, 4], [1, 3], [1, 7], ]) arr2 = np.array([[0, 3], [1, 7]]) result = np.array([[0, 4], [1, 3]]) In Depth: Basically I have a binary image with variable resolution, it is composed of 0 and 1 (255) and I analyze each pixel individually (with an algorithm that is already optimized), but (on purpose) every time this function is executed it analyzes only a fraction of the pixels, and when it is finished it gives me back all the coordinates of these pixels. The problem is that when it executes it runs the following code: ones = np.argwhere(img == 255) # ones = pixels array It takes about 0.02 seconds and is by far the slowest part of the code. My idea is to create this variable once and, each time the function ends, it removes the parsed pixels and passes the new array as parameter to continue until the array is empty
Not sure what you intend to do with the extra dimensions, as the set difference, like any filtering, is inherently losing the shape information. Anyway, NumPy does provide np.setdiff1d() to solve this problem elegantly. EDIT With the clarifications provided, you seems to be looking for a way compute the set difference on a given axis, i.e. the elements of the sets are actually arrays. There is no built-in specifically for this in NumPy, but it is not too difficult to craft one. For simplicity, we assume that the operating axis is the first one (so that the element of the set are arr[i]), that only unique elements appear in the first array, and that the arrays are 2D. They are all based on the idea that the asymptotically best approach is to build a set() of the second array and then using that to filter out the entries from the first array. The idiomatic way to build such set in Python / NumPy is to use: set(map(tuple, arr)) where the mapping to tuple freezes arr[i], allowing them to be hashable and hence making them available to use with set(). Unfortunately, since the filtering would produce results of unpredictable size, NumPy arrays are not the ideal container for the result. To solve this issue, one can use: an intermediate list import numpy as np def setdiff2d_list(arr1, arr2): delta = set(map(tuple, arr2)) return np.array([x for x in arr1 if tuple(x) not in delta]) np.fromiter() followed by np.reshape() import numpy as np def setdiff2d_iter(arr1, arr2): delta = set(map(tuple, arr2)) return np.fromiter((x for xs in arr1 if tuple(xs) not in delta for x in xs), dtype=arr1.dtype).reshape(-1, arr1.shape[-1]) NumPy's advanced indexing def setdiff2d_idx(arr1, arr2): delta = set(map(tuple, arr2)) idx = [tuple(x) not in delta for x in arr1] return arr1[idx] Convert both inputs to set() (will force uniqueness of the output elements and will lose ordering): import numpy as np def setdiff2d_set(arr1, arr2): set1 = set(map(tuple, arr1)) set2 = set(map(tuple, arr2)) return np.array(list(set1 - set2)) Alternatively, the advanced indexing can be built using broadcasting, np.any() and np.all(): def setdiff2d_bc(arr1, arr2): idx = (arr1[:, None] != arr2).any(-1).all(1) return arr1[idx] Some form of the above methods were originally suggested in #QuangHoang's answer. 
A similar approach could also be implemented in Numba, following the same idea as above but using a hash instead of the actual array view arr[i] (because of the limitations in what is supported inside a set() by Numba) and pre-computing the output size (for speed): import numpy as np import numba as nb #nb.njit def mul_xor_hash(arr, init=65537, k=37): result = init for x in arr.view(np.uint64): result = (result * k) ^ x return result #nb.njit def setdiff2d_nb(arr1, arr2): # : build `delta` set using hashes delta = {mul_xor_hash(arr2[0])} for i in range(1, arr2.shape[0]): delta.add(mul_xor_hash(arr2[i])) # : compute the size of the result n = 0 for i in range(arr1.shape[0]): if mul_xor_hash(arr1[i]) not in delta: n += 1 # : build the result result = np.empty((n, arr1.shape[-1]), dtype=arr1.dtype) j = 0 for i in range(arr1.shape[0]): if mul_xor_hash(arr1[i]) not in delta: result[j] = arr1[i] j += 1 return result While they all give the same result: funcs = setdiff2d_iter, setdiff2d_list, setdiff2d_idx, setdiff2d_set, setdiff2d_bc, setdiff2d_nb arr1 = np.array([[0, 3], [0, 4], [1, 3], [1, 7]]) print(arr1) # [[0 3] # [0 4] # [1 3] # [1 7]] arr2 = np.array([[0, 3], [1, 7], [4, 0]]) print(arr2) # [[0 3] # [1 7] # [4 0]] result = funcs[0](arr1, arr2) print(result) # [[0 4] # [1 3]] for func in funcs: print(f'{func.__name__:>24s}', np.all(result == func(arr1, arr2))) # setdiff2d_iter True # setdiff2d_list True # setdiff2d_idx True # setdiff2d_set False # because of ordering # setdiff2d_bc True # setdiff2d_nb True their performance seems to be varying: for func in funcs: print(f'{func.__name__:>24s}', end=' ') %timeit func(arr1, arr2) # setdiff2d_iter 16.3 µs ± 719 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) # setdiff2d_list 14.9 µs ± 528 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) # setdiff2d_idx 17.8 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each) # setdiff2d_set 17.5 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each) # setdiff2d_bc 9.45 µs ± 405 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) # setdiff2d_nb 1.58 µs ± 51.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each) The Numba-based approach proposed seems to outperform the other approaches by a fair margin (some 10x using the given input). Similar timings are observed with larger inputs: np.random.seed(42) arr1 = np.random.randint(0, 100, (1000, 2)) arr2 = np.random.randint(0, 100, (1000, 2)) print(setdiff2d_nb(arr1, arr2).shape) # (736, 2) for func in funcs: print(f'{func.__name__:>24s}', end=' ') %timeit func(arr1, arr2) # setdiff2d_iter 3.51 ms ± 75.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # setdiff2d_list 2.92 ms ± 32.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # setdiff2d_idx 2.61 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # setdiff2d_set 3.52 ms ± 67.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) # setdiff2d_bc 25.6 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) # setdiff2d_nb 192 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) (As a side note, setdiff2d_bc() is the most negatively affected by the size of the second input).
Depending on how large your arrays are. If they are not too large (few thousands), you can use broadcasting to compare each point in x to each point in y use any to check for inequality at the last dimension use all to check for matching Code: idx = (arr1[:,None]!=arr2).any(-1).all(1) arr1[idx] Output: array([[0, 4], [1, 3]]) update: for longer data, you can try set and a for loop: set_y = set(map(tuple, y)) idx = [tuple(point) not in set_y for point in x] x[idx]
Check if a large matrix is diagonal matrix in python
I have compute a very large matrix M with lots of degenerate eigenvectors(different eigenvectors with same eigenvalues). I use QR decomposition to make sure these eigenvectors are orthonormal, so the Q is the orthonormal eigenvectors of M, and Q^{-1}MQ = D, where D is diagonal matrix. Now I want to check if D is truly diagonal matrix, but when I print D, the matrix is too large to show all of them, so how can I know if it is truly diagonal matrix?
Remove the diagonal and count the non zero elements: np.count_nonzero(x - np.diag(np.diagonal(x)))
Not sure how fast this is compared to the others, but: def isDiag(M): i, j = np.nonzero(M) return np.all(i == j) EDIT Let's time things: M = np.random.randint(0, 10, 1000) * np.eye(1000) def a(M): #donkopotamus solution return np.count_nonzero(M - np.diag(np.diagonal(M))) %timeit a(M) 100 loops, best of 3: 11.5 ms per loop %timeit is_diagonal(M) 100 loops, best of 3: 10.4 ms per loop %timeit isDiag(M) 100 loops, best of 3: 12.5 ms per loop Hmm, that's slower, probably from constructing i and j Let's try to improve the #donkopotamus solution by removing the subtraction step: def b(M): return np.all(M == np.diag(np.diagonal(M))) %timeit b(M) 100 loops, best of 3: 4.48 ms per loop That's a bit better. EDIT2 I came up with an even faster method: def isDiag2(M): i, j = M.shape assert i == j test = M.reshape(-1)[:-1].reshape(i-1, j+1) return ~np.any(test[:, 1:]) This isn't doing any calculations, just reshaping. Turns out reshaping to +1 rows on a diagonal matrix puts all the data in the first column. You can then check a contiguous block for any nonzeros which is much fatser for numpy Let's check times: def Make42(m): b = np.zeros(m.shape) np.fill_diagonal(b, m.diagonal()) return np.all(m == b) %timeit b(M) %timeit Make42(M) %timeit isDiag2(M) 100 loops, best of 3: 4.88 ms per loop 100 loops, best of 3: 5.73 ms per loop 1000 loops, best of 3: 1.84 ms per loop Seems my original is faster than #Make42 for smaller sets M = np.diag(np.random.randint(0,10,10000)) %timeit b(M) %timeit Make42(M) %timeit isDiag2(M) The slowest run took 35.58 times longer than the fastest. This could mean that an intermediate result is being cached. 1 loop, best of 3: 335 ms per loop <MemoryError trace removed> 10 loops, best of 3: 76.5 ms per loop And #Make42 gives memory error on the larger set. But then I don't seem to have as much RAM as they do.
Appproach #1 : Using NumPy strides/np.lib.stride_tricks.as_strided We can leverage NumPy strides to give us the off-diag elements as a view. So, no memory overhead there and virtually free runtime! This idea has been explored before in this post. Thus, we have - # https://stackoverflow.com/a/43761941/ #Divakar def nodiag_view(a): m = a.shape[0] p,q = a.strides return np.lib.stride_tricks.as_strided(a[:,1:], (m-1,m), (p+q,q)) Sample run to showcase its usage - In [175]: a Out[175]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15]]) In [176]: nodiag_view(a) Out[176]: array([[ 1, 2, 3, 4], [ 6, 7, 8, 9], [11, 12, 13, 14]]) Let's verify the free runtime and no memory overhead claims, by using it on a large array - In [182]: a = np.zeros((10000,10000), dtype=int) ...: np.fill_diagonal(a,np.arange(len(a))) In [183]: %timeit nodiag_view(a) 6.42 µs ± 48.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) In [184]: np.shares_memory(a, nodiag_view(a)) Out[184]: True Now, how do we use it here? Simply check if all nodiag_view elements are 0s, signalling a diagonal matrix! Hence, to solve our case here, for an input array a, it would be - isdiag = (nodiag_view(a)==0).all() Appproach #2 : Hacky way For completeness, one hacky way would be to temporarily save diag elements, assign 0s there, check for all elements to be 0s. If so, signalling a diagonal matrix. Finally assign back the diag elements. The implementation would be - def hacky_way(a): diag_elem = np.diag(a).copy() np.fill_diagonal(a,0) out = (a==0).all() np.fill_diagonal(a,diag_elem) return out Benchmarking Let's time on a large array and see how these compare on performance - In [3]: a = np.zeros((10000,10000), dtype=int) ...: np.fill_diagonal(a,np.arange(len(a))) In [4]: %timeit (nodiag_view(a)==0).all() 52.3 ms ± 393 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) In [5]: %timeit hacky_way(a) 51.8 ms ± 250 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) Other approaches from #Daniel F's post that captured other approaches - # #donkopotamus solution improved by #Daniel F def b(M): return np.all(M == np.diag(np.diagonal(M))) # #Daniel F's soln without assert check def isDiag2(M): i, j = M.shape test = M.reshape(-1)[:-1].reshape(i-1, j+1) return ~np.any(test[:, 1:]) # #Make42's soln def Make42(m): b = np.zeros(m.shape) np.fill_diagonal(b, m.diagonal()) return np.all(m == b) Timings with same setup as earlier - In [6]: %timeit b(a) ...: %timeit Make42(a) ...: %timeit isDiag2(a) 218 ms ± 1.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 302 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) 67.1 ms ± 1.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
We can actually do quite a bit better than what Daniel F suggested: import numpy as np import time a = np.diag(np.random.random(19999)) t1 = time.time() np.all(a == np.diag(np.diagonal(a))) print(time.time()-t1) t1 = time.time() b = np.zeros(a.shape) np.fill_diagonal(b, a.diagonal()) np.all(a == b) print(time.time()-t1) results in 2.5737204551696777 0.6501829624176025 One tricks is that np.diagonal(a) actually uses a.diagonal(), so we use that one directly. But what takes the cake the the fast build of b, combined with the in-place operation on b.
I afraid if this is the most efficient way of doing this, But the idea is to mask the diagonal elements and check if all the other elements are zero. I guess this is sufficient check to label a matrix as diagonal matrix. So we create a dummy array with same size as input matrix, initialized with ones. and then replace the diagonal elements with zeros. Now we perform element wise multiplication of input matrix and dummy matrix. So here we replace the diagonal elements of input matrix with zero and leave the other elements as it is. Now finally we check if there are any non zero elements. def is_diagonal(matrix): #create a dummy matrix dummy_matrix = np.ones(matrix.shape, dtype=np.uint8) # Fill the diagonal of dummy matrix with 0. np.fill_diagonal(dummy_matrix, 0) return np.count_nonzero(np.multiply(dummy_matrix, matrix)) == 0 diagonal_matrix = np.array([[3, 0, 0], [0, 7, 0], [0, 0, 4]]) print is_diagonal(diagonal_matrix) >>> True random_matrix = np.array([[3, 8, 0], [1, 7, 8], [5, 0, 4]]) print is_diagonal(random_matrix) >>> False
I believe this is the most succinct way: np.allclose(np.diag(np.diag(a)), a)
quick and dirty way to get the truth. works in a reasonable amount of time for i in range(0, len(matrix[0])): for j in range(0, len(matrix[0])): if ((i != j) and (matrix[i][j] != 0)) : return False return True
The solutions nodiag_view and hacky_way by #Divakar and the solution isDiag2 by #Daniel F are efficient. The other solutions are pretty slow. The accepted solution that is by #donkopotamus, which is easy to implement, is the second slowest and 7.55 times slower than the fastest answer. from timeit import timeit setup = ''' import numpy as np d = np.zeros((10000,10000), dtype=int) np.fill_diagonal(d,np.arange(len(d))) def nodiag_view(a): m = a.shape[0] p,q = a.strides return (np.lib.stride_tricks.as_strided(a[:,1:], (m-1,m), (p+q,q)) == 0).all() def hacky_way(a): diag_elem = np.diag(a).copy() np.fill_diagonal(a,0) out = (a==0).all() np.fill_diagonal(a,diag_elem) return out def isDiag(M): i, j = np.nonzero(M) return np.all(i == j) def isDiag2(M): i, j = M.shape assert i == j test = M.reshape(-1)[:-1].reshape(i-1, j+1) return ~np.any(test[:, 1:]) def Make42(m): b = np.zeros(m.shape) np.fill_diagonal(b, m.diagonal()) return np.all(m == b) def by_donkopotamus(a): return np.count_nonzero(a - np.diag(np.diag(a))) == 0 def by_liwt31(a): np.allclose(np.diag(np.diag(a)), a) ''' test0 = '''nodiag_view(d)''' test1 = '''hacky_way(d)''' test2 = '''isDiag(d)''' test3 = '''isDiag2(d)''' test4 = '''Make42(d)''' test5 = '''by_donkopotamus(d)''' test6 = '''by_liwt31(d)''' print('test0:', timeit(test0, setup, number=100)) print('test1:', timeit(test1, setup, number=100)) print('test2:', timeit(test2, setup, number=100)) print('test3:', timeit(test3, setup, number=100)) print('test4:', timeit(test4, setup, number=100)) print('test5:', timeit(test5, setup, number=100)) print('test6:', timeit(test6, setup, number=100)) Results: test0: 4.194842008990236 test1: 4.11843847198179 test2: 28.11888137299684 test3: 5.095675196003867 test4: 22.56097131301067 test5: 31.05823188900831 test6: 106.19386338599725
import numpy as np

is_diagonal = (np.trace(mat) == np.sum(mat))

Note that this comparison only tests that the off-diagonal elements sum to zero, so it is a reliable diagonality check only when the off-diagonal entries cannot cancel each other out (for example, when all entries are non-negative).
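A minimal illustration of that caveat (the example matrices are made up for demonstration):

import numpy as np

mat_ok = np.diag([1, 2, 3])
mat_bad = np.array([[ 1, 5, 0],
                    [-5, 2, 0],
                    [ 0, 0, 3]])   # not diagonal, but the off-diagonal entries cancel

print(np.trace(mat_ok) == np.sum(mat_ok))    # True (correct)
print(np.trace(mat_bad) == np.sum(mat_bad))  # True (false positive)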
Cartesian product of x and y array points into single array of 2D points
I have two numpy arrays that define the x and y axes of a grid. For example: x = numpy.array([1,2,3]) y = numpy.array([4,5]) I'd like to generate the Cartesian product of these arrays to generate: array([[1,4],[2,4],[3,4],[1,5],[2,5],[3,5]]) In a way that's not terribly inefficient since I need to do this many times in a loop. I'm assuming that converting them to a Python list and using itertools.product and back to a numpy array is not the most efficient form.
A canonical cartesian_product (almost) There are many approaches to this problem with different properties. Some are faster than others, and some are more general-purpose. After a lot of testing and tweaking, I've found that the following function, which calculates an n-dimensional cartesian_product, is faster than most others for many inputs. For a pair of approaches that are slightly more complex, but are even a bit faster in many cases, see the answer by Paul Panzer. Given that answer, this is no longer the fastest implementation of the cartesian product in numpy that I'm aware of. However, I think its simplicity will continue to make it a useful benchmark for future improvement: def cartesian_product(*arrays): la = len(arrays) dtype = numpy.result_type(*arrays) arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype) for i, a in enumerate(numpy.ix_(*arrays)): arr[...,i] = a return arr.reshape(-1, la) It's worth mentioning that this function uses ix_ in an unusual way; whereas the documented use of ix_ is to generate indices into an array, it just so happens that arrays with the same shape can be used for broadcasted assignment. Many thanks to mgilson, who inspired me to try using ix_ this way, and to unutbu, who provided some extremely helpful feedback on this answer, including the suggestion to use numpy.result_type. Notable alternatives It's sometimes faster to write contiguous blocks of memory in Fortran order. That's the basis of this alternative, cartesian_product_transpose, which has proven faster on some hardware than cartesian_product (see below). However, Paul Panzer's answer, which uses the same principle, is even faster. Still, I include this here for interested readers: def cartesian_product_transpose(*arrays): broadcastable = numpy.ix_(*arrays) broadcasted = numpy.broadcast_arrays(*broadcastable) rows, cols = numpy.prod(broadcasted[0].shape), len(broadcasted) dtype = numpy.result_type(*arrays) out = numpy.empty(rows * cols, dtype=dtype) start, end = 0, rows for a in broadcasted: out[start:end] = a.reshape(-1) start, end = end, end + rows return out.reshape(cols, rows).T After coming to understand Panzer's approach, I wrote a new version that's almost as fast as his, and is almost as simple as cartesian_product: def cartesian_product_simple_transpose(arrays): la = len(arrays) dtype = numpy.result_type(*arrays) arr = numpy.empty([la] + [len(a) for a in arrays], dtype=dtype) for i, a in enumerate(numpy.ix_(*arrays)): arr[i, ...] = a return arr.reshape(la, -1).T This appears to have some constant-time overhead that makes it run slower than Panzer's for small inputs. But for larger inputs, in all the tests I ran, it performs just as well as his fastest implementation (cartesian_product_transpose_pp). In following sections, I include some tests of other alternatives. These are now somewhat out of date, but rather than duplicate effort, I've decided to leave them here out of historical interest. For up-to-date tests, see Panzer's answer, as well as Nico Schlömer's. Tests against alternatives Here is a battery of tests that show the performance boost that some of these functions provide relative to a number of alternatives. All the tests shown here were performed on a quad-core machine, running Mac OS 10.12.5, Python 3.6.1, and numpy 1.12.1. Variations on hardware and software are known to produce different results, so YMMV. Run these tests for yourself to be sure! 
Definitions: import numpy import itertools from functools import reduce ### Two-dimensional products ### def repeat_product(x, y): return numpy.transpose([numpy.tile(x, len(y)), numpy.repeat(y, len(x))]) def dstack_product(x, y): return numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2) ### Generalized N-dimensional products ### def cartesian_product(*arrays): la = len(arrays) dtype = numpy.result_type(*arrays) arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype) for i, a in enumerate(numpy.ix_(*arrays)): arr[...,i] = a return arr.reshape(-1, la) def cartesian_product_transpose(*arrays): broadcastable = numpy.ix_(*arrays) broadcasted = numpy.broadcast_arrays(*broadcastable) rows, cols = numpy.prod(broadcasted[0].shape), len(broadcasted) dtype = numpy.result_type(*arrays) out = numpy.empty(rows * cols, dtype=dtype) start, end = 0, rows for a in broadcasted: out[start:end] = a.reshape(-1) start, end = end, end + rows return out.reshape(cols, rows).T # from https://stackoverflow.com/a/1235363/577088 def cartesian_product_recursive(*arrays, out=None): arrays = [numpy.asarray(x) for x in arrays] dtype = arrays[0].dtype n = numpy.prod([x.size for x in arrays]) if out is None: out = numpy.zeros([n, len(arrays)], dtype=dtype) m = n // arrays[0].size out[:,0] = numpy.repeat(arrays[0], m) if arrays[1:]: cartesian_product_recursive(arrays[1:], out=out[0:m,1:]) for j in range(1, arrays[0].size): out[j*m:(j+1)*m,1:] = out[0:m,1:] return out def cartesian_product_itertools(*arrays): return numpy.array(list(itertools.product(*arrays))) ### Test code ### name_func = [('repeat_product', repeat_product), ('dstack_product', dstack_product), ('cartesian_product', cartesian_product), ('cartesian_product_transpose', cartesian_product_transpose), ('cartesian_product_recursive', cartesian_product_recursive), ('cartesian_product_itertools', cartesian_product_itertools)] def test(in_arrays, test_funcs): global func global arrays arrays = in_arrays for name, func in test_funcs: print('{}:'.format(name)) %timeit func(*arrays) def test_all(*in_arrays): test(in_arrays, name_func) # `cartesian_product_recursive` throws an # unexpected error when used on more than # two input arrays, so for now I've removed # it from these tests. def test_cartesian(*in_arrays): test(in_arrays, name_func[2:4] + name_func[-1:]) x10 = [numpy.arange(10)] x50 = [numpy.arange(50)] x100 = [numpy.arange(100)] x500 = [numpy.arange(500)] x1000 = [numpy.arange(1000)] Test results: In [2]: test_all(*(x100 * 2)) repeat_product: 67.5 µs ± 633 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) dstack_product: 67.7 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) cartesian_product: 33.4 µs ± 558 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cartesian_product_transpose: 67.7 µs ± 932 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each) cartesian_product_recursive: 215 µs ± 6.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cartesian_product_itertools: 3.65 ms ± 38.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [3]: test_all(*(x500 * 2)) repeat_product: 1.31 ms ± 9.28 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) dstack_product: 1.27 ms ± 7.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cartesian_product: 375 µs ± 4.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cartesian_product_transpose: 488 µs ± 8.88 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) cartesian_product_recursive: 2.21 ms ± 38.4 µs per loop (mean ± std. dev. 
of 7 runs, 100 loops each) cartesian_product_itertools: 105 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) In [4]: test_all(*(x1000 * 2)) repeat_product: 10.2 ms ± 132 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) dstack_product: 12 ms ± 120 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product: 4.75 ms ± 57.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_transpose: 7.76 ms ± 52.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_recursive: 13 ms ± 209 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_itertools: 422 ms ± 7.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In all cases, cartesian_product as defined at the beginning of this answer is fastest. For those functions that accept an arbitrary number of input arrays, it's worth checking performance when len(arrays) > 2 as well. (Until I can determine why cartesian_product_recursive throws an error in this case, I've removed it from these tests.) In [5]: test_cartesian(*(x100 * 3)) cartesian_product: 8.8 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_transpose: 7.87 ms ± 91.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_itertools: 518 ms ± 5.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [6]: test_cartesian(*(x50 * 4)) cartesian_product: 169 ms ± 5.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) cartesian_product_transpose: 184 ms ± 4.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) cartesian_product_itertools: 3.69 s ± 73.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [7]: test_cartesian(*(x10 * 6)) cartesian_product: 26.5 ms ± 449 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) cartesian_product_transpose: 16 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) cartesian_product_itertools: 728 ms ± 16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [8]: test_cartesian(*(x10 * 7)) cartesian_product: 650 ms ± 8.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) cartesian_product_transpose: 518 ms ± 7.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) cartesian_product_itertools: 8.13 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) As these tests show, cartesian_product remains competitive until the number of input arrays rises above (roughly) four. After that, cartesian_product_transpose does have a slight edge. It's worth reiterating that users with other hardware and operating systems may see different results. For example, unutbu reports seeing the following results for these tests using Ubuntu 14.04, Python 3.4.3, and numpy 1.14.0.dev0+b7050a9: >>> %timeit cartesian_product_transpose(x500, y500) 1000 loops, best of 3: 682 µs per loop >>> %timeit cartesian_product(x500, y500) 1000 loops, best of 3: 1.55 ms per loop Below, I go into a few details about earlier tests I've run along these lines. The relative performance of these approaches has changed over time, for different hardware and different versions of Python and numpy. While it's not immediately useful for people using up-to-date versions of numpy, it illustrates how things have changed since the first version of this answer. A simple alternative: meshgrid + dstack The currently accepted answer uses tile and repeat to broadcast two arrays together. But the meshgrid function does practically the same thing. 
Here's the output of tile and repeat before being passed to transpose: In [1]: import numpy In [2]: x = numpy.array([1,2,3]) ...: y = numpy.array([4,5]) ...: In [3]: [numpy.tile(x, len(y)), numpy.repeat(y, len(x))] Out[3]: [array([1, 2, 3, 1, 2, 3]), array([4, 4, 4, 5, 5, 5])] And here's the output of meshgrid: In [4]: numpy.meshgrid(x, y) Out[4]: [array([[1, 2, 3], [1, 2, 3]]), array([[4, 4, 4], [5, 5, 5]])] As you can see, it's almost identical. We need only reshape the result to get exactly the same result. In [5]: xt, xr = numpy.meshgrid(x, y) ...: [xt.ravel(), xr.ravel()] Out[5]: [array([1, 2, 3, 1, 2, 3]), array([4, 4, 4, 5, 5, 5])] Rather than reshaping at this point, though, we could pass the output of meshgrid to dstack and reshape afterwards, which saves some work: In [6]: numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2) Out[6]: array([[1, 4], [2, 4], [3, 4], [1, 5], [2, 5], [3, 5]]) Contrary to the claim in this comment, I've seen no evidence that different inputs will produce differently shaped outputs, and as the above demonstrates, they do very similar things, so it would be quite strange if they did. Please let me know if you find a counterexample. Testing meshgrid + dstack vs. repeat + transpose The relative performance of these two approaches has changed over time. In an earlier version of Python (2.7), the result using meshgrid + dstack was noticeably faster for small inputs. (Note that these tests are from an old version of this answer.) Definitions: >>> def repeat_product(x, y): ... return numpy.transpose([numpy.tile(x, len(y)), numpy.repeat(y, len(x))]) ... >>> def dstack_product(x, y): ... return numpy.dstack(numpy.meshgrid(x, y)).reshape(-1, 2) ... For moderately-sized input, I saw a significant speedup. But I retried these tests with more recent versions of Python (3.6.1) and numpy (1.12.1), on a newer machine. The two approaches are almost identical now. Old Test >>> x, y = numpy.arange(500), numpy.arange(500) >>> %timeit repeat_product(x, y) 10 loops, best of 3: 62 ms per loop >>> %timeit dstack_product(x, y) 100 loops, best of 3: 12.2 ms per loop New Test In [7]: x, y = numpy.arange(500), numpy.arange(500) In [8]: %timeit repeat_product(x, y) 1.32 ms ± 24.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [9]: %timeit dstack_product(x, y) 1.26 ms ± 8.47 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) As always, YMMV, but this suggests that in recent versions of Python and numpy, these are interchangeable. Generalized product functions In general, we might expect that using built-in functions will be faster for small inputs, while for large inputs, a purpose-built function might be faster. Furthermore for a generalized n-dimensional product, tile and repeat won't help, because they don't have clear higher-dimensional analogues. So it's worth investigating the behavior of purpose-built functions as well. Most of the relevant tests appear at the beginning of this answer, but here are a few of the tests performed on earlier versions of Python and numpy for comparison. The cartesian function defined in another answer used to perform pretty well for larger inputs. (It's the same as the function called cartesian_product_recursive above.) In order to compare cartesian to dstack_prodct, we use just two dimensions. Here again, the old test showed a significant difference, while the new test shows almost none. 
Old Test >>> x, y = numpy.arange(1000), numpy.arange(1000) >>> %timeit cartesian([x, y]) 10 loops, best of 3: 25.4 ms per loop >>> %timeit dstack_product(x, y) 10 loops, best of 3: 66.6 ms per loop New Test In [10]: x, y = numpy.arange(1000), numpy.arange(1000) In [11]: %timeit cartesian([x, y]) 12.1 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) In [12]: %timeit dstack_product(x, y) 12.7 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) As before, dstack_product still beats cartesian at smaller scales. New Test (redundant old test not shown) In [13]: x, y = numpy.arange(100), numpy.arange(100) In [14]: %timeit cartesian([x, y]) 215 µs ± 4.75 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) In [15]: %timeit dstack_product(x, y) 65.7 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each) These distinctions are, I think, interesting and worth recording; but they are academic in the end. As the tests at the beginning of this answer showed, all of these versions are almost always slower than cartesian_product, defined at the very beginning of this answer -- which is itself a bit slower than the fastest implementations among the answers to this question.
>>> numpy.transpose([numpy.tile(x, len(y)), numpy.repeat(y, len(x))]) array([[1, 4], [2, 4], [3, 4], [1, 5], [2, 5], [3, 5]]) See Using numpy to build an array of all combinations of two arrays for a general solution for computing the Cartesian product of N arrays.
You can just do a normal list comprehension in Python:

x = numpy.array([1,2,3])
y = numpy.array([4,5])
[[x0, y0] for x0 in x for y0 in y]

which should give you

[[1, 4], [1, 5], [2, 4], [2, 5], [3, 4], [3, 5]]

(note that this row order differs from the one requested in the question, which has x varying fastest).
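If you do want the exact row order from the question, swapping the loop order should do it; a small sketch:

x = numpy.array([1, 2, 3])
y = numpy.array([4, 5])
pairs = [[x0, y0] for y0 in y for x0 in x]
# [[1, 4], [2, 4], [3, 4], [1, 5], [2, 5], [3, 5]]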
I was interested in this as well and did a little performance comparison, perhaps somewhat clearer than in #senderle's answer. For two arrays (the classical case): For four arrays: (Note that the length the arrays is only a few dozen entries here.) Code to reproduce the plots: from functools import reduce import itertools import numpy import perfplot def dstack_product(arrays): return numpy.dstack(numpy.meshgrid(*arrays, indexing="ij")).reshape(-1, len(arrays)) # Generalized N-dimensional products def cartesian_product(arrays): la = len(arrays) dtype = numpy.find_common_type([a.dtype for a in arrays], []) arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype) for i, a in enumerate(numpy.ix_(*arrays)): arr[..., i] = a return arr.reshape(-1, la) def cartesian_product_transpose(arrays): broadcastable = numpy.ix_(*arrays) broadcasted = numpy.broadcast_arrays(*broadcastable) rows, cols = reduce(numpy.multiply, broadcasted[0].shape), len(broadcasted) dtype = numpy.find_common_type([a.dtype for a in arrays], []) out = numpy.empty(rows * cols, dtype=dtype) start, end = 0, rows for a in broadcasted: out[start:end] = a.reshape(-1) start, end = end, end + rows return out.reshape(cols, rows).T # from https://stackoverflow.com/a/1235363/577088 def cartesian_product_recursive(arrays, out=None): arrays = [numpy.asarray(x) for x in arrays] dtype = arrays[0].dtype n = numpy.prod([x.size for x in arrays]) if out is None: out = numpy.zeros([n, len(arrays)], dtype=dtype) m = n // arrays[0].size out[:, 0] = numpy.repeat(arrays[0], m) if arrays[1:]: cartesian_product_recursive(arrays[1:], out=out[0:m, 1:]) for j in range(1, arrays[0].size): out[j * m : (j + 1) * m, 1:] = out[0:m, 1:] return out def cartesian_product_itertools(arrays): return numpy.array(list(itertools.product(*arrays))) perfplot.show( setup=lambda n: 2 * (numpy.arange(n, dtype=float),), n_range=[2 ** k for k in range(13)], # setup=lambda n: 4 * (numpy.arange(n, dtype=float),), # n_range=[2 ** k for k in range(6)], kernels=[ dstack_product, cartesian_product, cartesian_product_transpose, cartesian_product_recursive, cartesian_product_itertools, ], logx=True, logy=True, xlabel="len(a), len(b)", equality_check=None, )
Building on #senderle's exemplary ground work I've come up with two versions - one for C and one for Fortran layouts - that are often a bit faster. cartesian_product_transpose_pp is - unlike #senderle's cartesian_product_transpose which uses a different strategy altogether - a version of cartesion_product that uses the more favorable transpose memory layout + some very minor optimizations. cartesian_product_pp sticks with the original memory layout. What makes it fast is its using contiguous copying. Contiguous copies turn out to be so much faster that copying a full block of memory even though only part of it contains valid data is preferable to only copying the valid bits. Some perfplots. I made separate ones for C and Fortran layouts, because these are different tasks IMO. Names ending in 'pp' are my approaches. 1) many tiny factors (2 elements each) 2) many small factors (4 elements each) 3) three factors of equal length 4) two factors of equal length Code (need to do separate runs for each plot b/c I couldn't figure out how to reset; also need to edit / comment in / out appropriately): import numpy import numpy as np from functools import reduce import itertools import timeit import perfplot def dstack_product(arrays): return numpy.dstack( numpy.meshgrid(*arrays, indexing='ij') ).reshape(-1, len(arrays)) def cartesian_product_transpose_pp(arrays): la = len(arrays) dtype = numpy.result_type(*arrays) arr = numpy.empty((la, *map(len, arrays)), dtype=dtype) idx = slice(None), *itertools.repeat(None, la) for i, a in enumerate(arrays): arr[i, ...] = a[idx[:la-i]] return arr.reshape(la, -1).T def cartesian_product(arrays): la = len(arrays) dtype = numpy.result_type(*arrays) arr = numpy.empty([len(a) for a in arrays] + [la], dtype=dtype) for i, a in enumerate(numpy.ix_(*arrays)): arr[...,i] = a return arr.reshape(-1, la) def cartesian_product_transpose(arrays): broadcastable = numpy.ix_(*arrays) broadcasted = numpy.broadcast_arrays(*broadcastable) rows, cols = numpy.prod(broadcasted[0].shape), len(broadcasted) dtype = numpy.result_type(*arrays) out = numpy.empty(rows * cols, dtype=dtype) start, end = 0, rows for a in broadcasted: out[start:end] = a.reshape(-1) start, end = end, end + rows return out.reshape(cols, rows).T from itertools import accumulate, repeat, chain def cartesian_product_pp(arrays, out=None): la = len(arrays) L = *map(len, arrays), la dtype = numpy.result_type(*arrays) arr = numpy.empty(L, dtype=dtype) arrs = *accumulate(chain((arr,), repeat(0, la-1)), np.ndarray.__getitem__), idx = slice(None), *itertools.repeat(None, la-1) for i in range(la-1, 0, -1): arrs[i][..., i] = arrays[i][idx[:la-i]] arrs[i-1][1:] = arrs[i] arr[..., 0] = arrays[0][idx] return arr.reshape(-1, la) def cartesian_product_itertools(arrays): return numpy.array(list(itertools.product(*arrays))) # from https://stackoverflow.com/a/1235363/577088 def cartesian_product_recursive(arrays, out=None): arrays = [numpy.asarray(x) for x in arrays] dtype = arrays[0].dtype n = numpy.prod([x.size for x in arrays]) if out is None: out = numpy.zeros([n, len(arrays)], dtype=dtype) m = n // arrays[0].size out[:, 0] = numpy.repeat(arrays[0], m) if arrays[1:]: cartesian_product_recursive(arrays[1:], out=out[0:m, 1:]) for j in range(1, arrays[0].size): out[j*m:(j+1)*m, 1:] = out[0:m, 1:] return out ### Test code ### if False: perfplot.save('cp_4el_high.png', setup=lambda n: n*(numpy.arange(4, dtype=float),), n_range=list(range(6, 11)), kernels=[ dstack_product, cartesian_product_recursive, cartesian_product, # 
cartesian_product_transpose, cartesian_product_pp, # cartesian_product_transpose_pp, ], logx=False, logy=True, xlabel='#factors', equality_check=None ) else: perfplot.save('cp_2f_T.png', setup=lambda n: 2*(numpy.arange(n, dtype=float),), n_range=[2**k for k in range(5, 11)], kernels=[ # dstack_product, # cartesian_product_recursive, # cartesian_product, cartesian_product_transpose, # cartesian_product_pp, cartesian_product_transpose_pp, ], logx=True, logy=True, xlabel='length of each factor', equality_check=None )
As of Oct. 2017, numpy now has a generic np.stack function that takes an axis parameter. Using it, we can have a "generalized cartesian product" using the "dstack and meshgrid" technique: import numpy as np def cartesian_product(*arrays): ndim = len(arrays) return (np.stack(np.meshgrid(*arrays), axis=-1) .reshape(-1, ndim)) a = np.array([1,2]) b = np.array([10,20]) cartesian_product(a,b) # output: # array([[ 1, 10], # [ 2, 10], # [ 1, 20], # [ 2, 20]]) Note on the axis=-1 parameter. This is the last (inner-most) axis in the result. It is equivalent to using axis=ndim. One other comment, since Cartesian products blow up very quickly, unless we need to realize the array in memory for some reason, if the product is very large, we may want to make use of itertools and use the values on-the-fly.
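As a sketch of that last point, itertools.product lets you consume the pairs lazily without ever materializing the full array (process is a hypothetical placeholder for whatever per-pair work you need):

import itertools
import numpy as np

a = np.array([1, 2])
b = np.array([10, 20])

for pair in itertools.product(a, b):
    process(pair)   # hypothetical per-pair work; nothing is stored

Each pair is generated on demand, so memory use stays constant no matter how large the product is.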
The Scikit-learn package has a fast implementation of exactly this: from sklearn.utils.extmath import cartesian product = cartesian((x,y)) Note that the convention of this implementation is different from what you want, if you care about the order of the output. For your exact ordering, you can do product = cartesian((y,x))[:, ::-1]
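A small sketch of that ordering point with the arrays from the question (the outputs shown assume sklearn's convention of varying the first array slowest):

from sklearn.utils.extmath import cartesian
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5])

cartesian((x, y))
# array([[1, 4], [1, 5], [2, 4], [2, 5], [3, 4], [3, 5]])

cartesian((y, x))[:, ::-1]
# array([[1, 4], [2, 4], [3, 4], [1, 5], [2, 5], [3, 5]])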
I used @kennytm's answer for a while, but when trying to do the same in TensorFlow, I found that TensorFlow has no equivalent of numpy.repeat(). After a little experimentation, I think I found a more general solution for arbitrary vectors of points.

For numpy:

import numpy as np

def cartesian_product(*args: np.ndarray) -> np.ndarray:
    """
    Produce the cartesian product of arbitrary length vectors.

    Parameters
    ----------
    np.ndarray args
        vector of points of interest in each dimension

    Returns
    -------
    np.ndarray
        the cartesian product of size [m x n] wherein:
            m = prod([len(a) for a in args])
            n = len(args)
    """
    for i, a in enumerate(args):
        assert a.ndim == 1, "arg {:d} is not rank 1".format(i)
    return np.concatenate([np.reshape(xi, [-1, 1]) for xi in np.meshgrid(*args)], axis=1)

and for TensorFlow:

import tensorflow as tf

def cartesian_product(*args: tf.Tensor) -> tf.Tensor:
    """
    Produce the cartesian product of arbitrary length vectors.

    Parameters
    ----------
    tf.Tensor args
        vector of points of interest in each dimension

    Returns
    -------
    tf.Tensor
        the cartesian product of size [m x n] wherein:
            m = prod([len(a) for a in args])
            n = len(args)
    """
    for i, a in enumerate(args):
        tf.assert_rank(a, 1, message="arg {:d} is not rank 1".format(i))
    return tf.concat([tf.reshape(xi, [-1, 1]) for xi in tf.meshgrid(*args)], axis=1)
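A quick sanity check of the numpy variant above (assuming that definition is the one in scope), using the arrays from the question:

import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5])
print(cartesian_product(x, y))
# [[1 4]
#  [2 4]
#  [3 4]
#  [1 5]
#  [2 5]
#  [3 5]]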
More generally, if you have two 2d numpy arrays a and b, and you want to concatenate every row of a to every row of b (A cartesian product of rows, kind of like a join in a database), you can use this method: import numpy def join_2d(a, b): assert a.dtype == b.dtype a_part = numpy.tile(a, (len(b), 1)) b_part = numpy.repeat(b, len(a), axis=0) return numpy.hstack((a_part, b_part))
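A small usage sketch of join_2d with made-up 2x2 inputs:

a = numpy.array([[1, 2],
                 [3, 4]])
b = numpy.array([[10, 20],
                 [30, 40]])

print(join_2d(a, b))
# [[ 1  2 10 20]
#  [ 3  4 10 20]
#  [ 1  2 30 40]
#  [ 3  4 30 40]]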
The fastest you can get is either by combining a generator expression with the map function:

import numpy as np
import datetime

a = np.arange(1000)
b = np.arange(200)

start = datetime.datetime.now()
foo = (item for sublist in [list(map(lambda x: (x, i), a)) for i in b] for item in sublist)
print(list(foo))
print('execution time: {} s'.format((datetime.datetime.now() - start).total_seconds()))

Outputs (actually the whole resulting list is printed):

[(0, 0), (1, 0), ...,(998, 199), (999, 199)]
execution time: 1.253567 s

or by using a double generator expression:

a = np.arange(1000)
b = np.arange(200)

start = datetime.datetime.now()
foo = ((x, y) for x in a for y in b)
print(list(foo))
print('execution time: {} s'.format((datetime.datetime.now() - start).total_seconds()))

Outputs (whole list printed):

[(0, 0), (1, 0), ...,(998, 199), (999, 199)]
execution time: 1.187415 s

Take into account that most of the computation time goes into the printing command. The generator calculations are otherwise decently efficient. Without printing, the calculation times are:

execution time: 0.079208 s

for the generator expression + map function, and:

execution time: 0.007093 s

for the double generator expression.

If what you actually want is to calculate the actual product of each of the coordinate pairs, the fastest is to solve it as a numpy matrix product:

a = np.arange(1000)
b = np.arange(200)

start = datetime.datetime.now()
foo = np.dot(np.asmatrix([[i, 0] for i in a]), np.asmatrix([[i, 0] for i in b]).T)
print(foo)
print('execution time: {} s'.format((datetime.datetime.now() - start).total_seconds()))

Outputs:

[[ 0 0 0 ..., 0 0 0]
 [ 0 1 2 ..., 197 198 199]
 [ 0 2 4 ..., 394 396 398]
 ...,
 [ 0 997 1994 ..., 196409 197406 198403]
 [ 0 998 1996 ..., 196606 197604 198602]
 [ 0 999 1998 ..., 196803 197802 198801]]
execution time: 0.003869 s

and without printing (in this case it doesn't save much since only a tiny piece of the matrix is actually printed out):

execution time: 0.003083 s
This can also be easily done by using itertools.product method from itertools import product import numpy as np x = np.array([1, 2, 3]) y = np.array([4, 5]) cart_prod = np.array(list(product(*[x, y])),dtype='int32') Result: array([[1, 4], [1, 5], [2, 4], [2, 5], [3, 4], [3, 5]], dtype=int32) Execution time: 0.000155 s
In the specific case that you need to perform simple operations such as addition on each pair, you can introduce an extra dimension and let broadcasting do the job: >>> a, b = np.array([1,2,3]), np.array([10,20,30]) >>> a[None,:] + b[:,None] array([[11, 12, 13], [21, 22, 23], [31, 32, 33]]) I'm not sure if there is any similar way to actually get the pairs themselves.
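One way to get the pairs themselves with the same broadcasting idea (a sketch, not necessarily the fastest option among the answers here):

import numpy as np

a, b = np.array([1, 2, 3]), np.array([10, 20, 30])
pairs = np.stack(np.broadcast_arrays(a[None, :], b[:, None]), axis=-1).reshape(-1, 2)
# array([[ 1, 10], [ 2, 10], [ 3, 10],
#        [ 1, 20], [ 2, 20], [ 3, 20],
#        [ 1, 30], [ 2, 30], [ 3, 30]])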
I'm a bit late to the party, but I encoutered a tricky variant of that problem. Let's say I want the cartesian product of several arrays, but that cartesian product ends up being much larger than the computers' memory (however, the computation done with that product are fast, or at least parallelizable). The obvious solution is to divide this cartesian product in chunks, and treat these chunks one after the other (in sort of a "streaming" manner). You can do that easily with itertools.product, but it's horrendously slow. Also, none of the proposed solutions here (as fast as they are) give us this possibility. The solution I propose uses Numba, and is slightly faster than the "canonical" cartesian_product mentioned here. It's pretty long because I tried to optimize it everywhere I could. import numba as nb import numpy as np from typing import List #nb.njit(nb.types.Tuple((nb.int32[:, :], nb.int32[:]))(nb.int32[:], nb.int32[:], nb.int64, nb.int64)) def cproduct(sizes: np.ndarray, current_tuple: np.ndarray, start_idx: int, end_idx: int): """Generates ids tuples from start_id to end_id""" assert len(sizes) >= 2 assert start_idx < end_idx tuples = np.zeros((end_idx - start_idx, len(sizes)), dtype=np.int32) tuple_idx = 0 # stores the current combination current_tuple = current_tuple.copy() while tuple_idx < end_idx - start_idx: tuples[tuple_idx] = current_tuple current_tuple[0] += 1 # using a condition here instead of including this in the inner loop # to gain a bit of speed: this is going to be tested each iteration, # and starting a loop to have it end right away is a bit silly if current_tuple[0] == sizes[0]: # the reset to 0 and subsequent increment amount to carrying # the number to the higher "power" current_tuple[0] = 0 current_tuple[1] += 1 for i in range(1, len(sizes) - 1): if current_tuple[i] == sizes[i]: # same as before, but in a loop, since this is going # to get called less often current_tuple[i + 1] += 1 current_tuple[i] = 0 else: break tuple_idx += 1 return tuples, current_tuple def chunked_cartesian_product_ids(sizes: List[int], chunk_size: int): """Just generates chunks of the cartesian product of the ids of each input arrays (thus, we just need their sizes here, not the actual arrays)""" prod = np.prod(sizes) # putting the largest number at the front to more efficiently make use # of the cproduct numba function sizes = np.array(sizes, dtype=np.int32) sorted_idx = np.argsort(sizes)[::-1] sizes = sizes[sorted_idx] if chunk_size > prod: chunk_bounds = (np.array([0, prod])).astype(np.int64) else: num_chunks = np.maximum(np.ceil(prod / chunk_size), 2).astype(np.int32) chunk_bounds = (np.arange(num_chunks + 1) * chunk_size).astype(np.int64) chunk_bounds[-1] = prod current_tuple = np.zeros(len(sizes), dtype=np.int32) for start_idx, end_idx in zip(chunk_bounds[:-1], chunk_bounds[1:]): tuples, current_tuple = cproduct(sizes, current_tuple, start_idx, end_idx) # re-arrange columns to match the original order of the sizes list # before yielding yield tuples[:, np.argsort(sorted_idx)] def chunked_cartesian_product(*arrays, chunk_size=2 ** 25): """Returns chunks of the full cartesian product, with arrays of shape (chunk_size, n_arrays). 
The last chunk will obviously have the size of the remainder""" array_lengths = [len(array) for array in arrays] for array_ids_chunk in chunked_cartesian_product_ids(array_lengths, chunk_size): slices_lists = [arrays[i][array_ids_chunk[:, i]] for i in range(len(arrays))] yield np.vstack(slices_lists).swapaxes(0,1) def cartesian_product(*arrays): """Actual cartesian product, not chunked, still fast""" total_prod = np.prod([len(array) for array in arrays]) return next(chunked_cartesian_product(*arrays, total_prod)) a = np.arange(0, 3) b = np.arange(8, 10) c = np.arange(13, 16) for cartesian_tuples in chunked_cartesian_product(*[a, b, c], chunk_size=5): print(cartesian_tuples) This would output our cartesian product in chunks of 5 3-uples: [[ 0 8 13] [ 0 8 14] [ 0 8 15] [ 1 8 13] [ 1 8 14]] [[ 1 8 15] [ 2 8 13] [ 2 8 14] [ 2 8 15] [ 0 9 13]] [[ 0 9 14] [ 0 9 15] [ 1 9 13] [ 1 9 14] [ 1 9 15]] [[ 2 9 13] [ 2 9 14] [ 2 9 15]] If you're willing to understand what is being done here, the intuition behind the njitted function is to enumerate each "number" in a weird numerical base whose elements would be composed of the sizes of the input arrays (instead of the same number in regular binary, decimal or hexadecimal bases). Obviously, this solution is interesting for large products. For small ones, the overhead might be a bit costly. NOTE: since numba is still under heavy development, i'm using numba 0.50 to run this, with python 3.6.
Yet another one:

>>> x1, y1 = np.meshgrid(x, y)
>>> np.c_[x1.ravel(), y1.ravel()]
array([[1, 4],
       [2, 4],
       [3, 4],
       [1, 5],
       [2, 5],
       [3, 5]])
Inspired by Ashkan's answer, you can also try the following. >>> x, y = np.meshgrid(x, y) >>> np.concatenate([x.flatten().reshape(-1,1), y.flatten().reshape(-1,1)], axis=1) This will give you the required cartesian product!
This is a generalized version of the accepted answer (Cartesian product of multiple arrays using the numpy.tile and numpy.repeat functions).

from functools import reduce
from operator import mul

import numpy as np

def cartesian_product(arrays):
    return np.vstack([
        np.tile(
            np.repeat(arrays[j], reduce(mul, map(len, arrays[j+1:]), 1)),
            reduce(mul, map(len, arrays[:j]), 1),
        )
        for j in range(len(arrays))
    ]).T
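A quick usage sketch with the arrays from the question (note the itertools-style ordering, with the last array varying fastest):

x = np.array([1, 2, 3])
y = np.array([4, 5])
print(cartesian_product([x, y]))
# [[1 4]
#  [1 5]
#  [2 4]
#  [2 5]
#  [3 4]
#  [3 5]]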
If you are willing to use PyTorch, I should think it is highly efficient: >>> import torch >>> torch.cartesian_prod(torch.as_tensor(x), torch.as_tensor(y)) tensor([[1, 4], [1, 5], [2, 4], [2, 5], [3, 4], [3, 5]]) and you can easily get a numpy array: >>> torch.cartesian_prod(torch.as_tensor(x), torch.as_tensor(y)).numpy() array([[1, 4], [1, 5], [2, 4], [2, 5], [3, 4], [3, 5]])