Dimensionality agnostic (generic) cartesian product [duplicate] - python

I'm looking to generate the cartesian product of a relatively large number of arrays to span a high-dimensional grid. Because of the high dimensionality, it won't be possible to store the result of the cartesian product computation in memory; rather it will be written to hard disk. Because of this constraint, I need access to the intermediate results as they are generated. What I've been doing so far is this:
for x in xrange(0, 10):
    for y in xrange(0, 10):
        for z in xrange(0, 10):
            writeToHdd(x,y,z)
which, apart from being very nasty, is not scalable (i.e. it would require me writing as many loops as dimensions). I have tried to use the solution proposed here, but that is a recursive solution, which therefore makes it quite hard to obtain the results on the fly as they are being generated. Is there any 'neat' way to do this other than having a hardcoded loop per dimension?

In plain Python, you can generate the Cartesian product of a collection of iterables using itertools.product.
>>> import itertools
>>> arrays = range(0, 2), range(4, 6), range(8, 10)
>>> list(itertools.product(*arrays))
[(0, 4, 8), (0, 4, 9), (0, 5, 8), (0, 5, 9), (1, 4, 8), (1, 4, 9), (1, 5, 8), (1, 5, 9)]
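Since the question asks for the results as they are generated, the list() call above can be dropped: itertools.product is lazy and yields one tuple at a time (it keeps the input sequences in memory, not the product itself). A minimal sketch of streaming the points to a file-like object follows; the helper name stream_product and the comma-separated line format are assumptions made for illustration, not part of the original answer.

```python
import io
import itertools

def stream_product(arrays, out):
    """Write each point of the Cartesian product to `out` as it is generated.

    `out` is any object with a write() method (an open file, for example).
    Returns the number of points written.
    """
    count = 0
    for point in itertools.product(*arrays):
        # One comma-separated line per point; only this point is held in memory.
        out.write(",".join(map(str, point)) + "\n")
        count += 1
    return count

# Demo with an in-memory buffer; a real run would pass an open file instead.
buf = io.StringIO()
written = stream_product([range(0, 2), range(4, 6), range(8, 10)], buf)
print(written)
print(buf.getvalue())
```

The same loop works for any number of dimensions, because the number of input sequences is decided at run time rather than by the number of nested for statements.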
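In NumPy, you can combine numpy.meshgrid (passing sparse=True to avoid expanding the product in memory) with numpy.broadcast_arrays and numpy.ndindex. The broadcast step is needed because each sparse grid has length 1 along every axis but its own, so it cannot be indexed with full-shape indices directly; broadcasting creates views, not copies:
>>> import numpy as np
>>> arrays = np.arange(0, 2), np.arange(4, 6), np.arange(8, 10)
>>> grid = np.broadcast_arrays(*np.meshgrid(*arrays, sparse=True))
>>> [tuple(int(g[i]) for g in grid) for i in np.ndindex(grid[0].shape)]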
[(0, 4, 8), (0, 4, 9), (1, 4, 8), (1, 4, 9), (0, 5, 8), (0, 5, 9), (1, 5, 8), (1, 5, 9)]

I think I figured out a nice way using a memory mapped file:
import numpy as np
def carthesian_product_mmap(vectors, filename, mode='w+'):
    '''
    Vectors should be a tuple of `numpy.ndarray` vectors. You could
    also make it more flexible, and include some error checking
    '''
    # Make a meshgrid with `copy=False` to create views
    grids = np.meshgrid(*vectors, copy=False, indexing='ij')
    # The shape for concatenating the grids from meshgrid
    shape = grids[0].shape + (len(vectors),)
    # Find the "highest" dtype necessary
    dtype = np.result_type(*vectors)
    # Instantiate the memory mapped file
    M = np.memmap(filename, dtype, mode, shape=shape)
    # Fill the memmap with the grids
    for i, grid in enumerate(grids):
        M[...,i] = grid
    # Make sure the data is written to disk (optional?)
    M.flush()
    # Reshape to put it in the right format for the Cartesian product
    return M.reshape((-1, len(vectors)))
But I wonder if you really need to store the whole Cartesian product (there's a lot of data duplication). Is it not an option to generate the rows of the product at the moment they're needed?
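One way to generate a row only when it is needed is to convert the row number into one index per dimension. The sketch below uses numpy.unravel_index for that; the function name product_row is hypothetical, and the row order is assumed to be the C-order (last dimension fastest) layout that the reshaped 'ij' meshgrid above produces.

```python
import numpy as np

def product_row(vectors, k):
    """Return row k of the Cartesian product of `vectors` without building it.

    Rows are numbered in C order: the last vector varies fastest.
    """
    shape = tuple(len(v) for v in vectors)
    # Split the flat row number into one index per dimension.
    idx = np.unravel_index(k, shape)
    return tuple(v[i] for v, i in zip(vectors, idx))

vectors = [[0, 1], [4, 5], [8, 9]]
print(product_row(vectors, 0))
print(product_row(vectors, 3))
```

Nothing but the input vectors is stored, so the cost per row is proportional to the number of dimensions.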

It seems you just want to loop over an arbitrary number of dimensions. My generic solution for this is to keep a list of indices, increment the first one, and carry any overflow into the next dimension.
Example:
n = 3 # number of dimensions
N = 1 # highest index value per dimension
idx = [0]*n
while True:
    print(idx)
    # increase first dimension
    idx[0] += 1
    # handle overflows
    for i in range(0, n-1):
        if idx[i] > N:
            # reset this dimension and increase next higher dimension
            idx[i] = 0
            idx[i+1] += 1
    if idx[-1] > N:
        # overflow in the last dimension, we are finished
        break
Gives:
[0, 0, 0]
[1, 0, 0]
[0, 1, 0]
[1, 1, 0]
[0, 0, 1]
[1, 0, 1]
[0, 1, 1]
[1, 1, 1]
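NumPy has something similar built in: numpy.ndindex iterates over all index tuples of a given shape, and numpy.ndenumerate yields (index, value) pairs for an existing array.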
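The counter above can be wrapped in a generator so that each dimension may have its own size and the caller receives the indices one at a time. This is a sketch under stated assumptions: the name iter_indices and the shape argument are introduced here for illustration, and the first dimension varies fastest, as in the printed output above.

```python
def iter_indices(shape):
    """Yield every index tuple of a grid with the given per-dimension sizes.

    The first dimension varies fastest. Yields nothing if `shape` is empty
    or any dimension has size zero.
    """
    n = len(shape)
    if n == 0 or any(s <= 0 for s in shape):
        return
    idx = [0] * n
    while True:
        yield tuple(idx)
        # Increment with carry, starting at the first dimension.
        i = 0
        while i < n:
            idx[i] += 1
            if idx[i] < shape[i]:
                break          # no overflow, done incrementing
            idx[i] = 0         # overflow: reset and carry into the next dimension
            i += 1
        else:
            return             # carried out of the last dimension: finished

for index in iter_indices((2, 2, 2)):
    print(index)
```

Because it is a generator, only the current index list is kept in memory regardless of how many points the grid has.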

Related

What does array[...,list([something]) mean?

I am going through the following lines of code but I didn't understand image[...,list()]. What do the three dots mean?
self.probability = 0.5
self.indices = list(permutations(range(3), 3))
if random.random() < self.probability:
    image = np.asarray(image)
    image = Image.fromarray(image[...,list(self.indices[random.randint(0, len(self.indices) - 1)])])
What exactly is happening in the above lines?
I have understood that the list() part is taking random channels from image? Am I correct?
The three dots are a Python object called Ellipsis. In NumPy indexing it acts as a placeholder for as many full slices (:) as are needed to cover the axes you did not spell out.
x = np.random.rand(3,3,3,3,3)
elem = x[:, :, :, :, 0]
elem = x[..., 0] # same as above
This is helpful when you want to index along one axis (typically the last) of a multi-dimensional NumPy array without writing a slice for every other axis.
list(permutations(range(3), 3)) generates all permutations of the integers 0, 1, 2.
from itertools import permutations
list(permutations(range(3), 3))
# [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
So the following chooses among these tuples of permutations:
list(self.indices[random.randint(0, len(self.indices) - 1)])
In any case you'll have a permutation over the last axis of image, which is usually the RGB channel axis. With the ellipsis (...) in image[...,ixs] we are taking full slices over all axes except for the last, so this performs a shuffling of the image channels.
An example run:
indices = list(permutations(range(3), 3))
indices[np.random.randint(0, len(indices))]  # unlike random.randint, the upper bound is exclusive here
# e.g. (2, 0, 1)
Here's an example. Note that this does not change the shape; we are using integer array indexing to index on the last axis only:
a = np.random.randint(0,5,(5,5,3))
a[...,(0,2,1)].shape
# (5, 5, 3)
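To make the effect on the channels visible, here is a small sketch with a made-up 2x2 "image" whose three channels hold the constants 10, 20 and 30 (values chosen purely for illustration). It also checks that the ellipsis form is equivalent to writing out the full slices.

```python
import numpy as np

# A tiny 2x2 image with 3 channels; each channel holds one constant value.
img = np.zeros((2, 2, 3), dtype=int)
img[..., 0] = 10   # stands in for R
img[..., 1] = 20   # stands in for G
img[..., 2] = 30   # stands in for B

perm = [2, 0, 1]             # one of the six permutations of (0, 1, 2)
shuffled = img[..., perm]    # reorder the last axis only

# The ellipsis is shorthand for a full slice on every leading axis.
same = np.array_equal(shuffled, img[:, :, perm])
first_pixel = shuffled[0, 0].tolist()

print(shuffled.shape, same, first_pixel)
```

Every pixel keeps its position; only the order of its three channel values changes.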

How to find the index of a tuple in a 2D array in python?

I have an array of the following form (with many more elements):
coords = np.array(
    [[(2, 1), 1613, 655],
     [(2, 5), 906, 245],
     [(5, 2), 0, 0]])
And I would like to find the index of a specific tuple. For example, I might be looking for the position of the tuple (2, 5), which should be in position 1 in this case.
I have tried with np.where and np.argwhere, with no luck:
pos = np.argwhere(coords == (2,5))
print(pos)
>> DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
pos = np.where(coords == (2,5))
print(pos)
>> DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
How can I get the index of a tuple?
If you intend to use a numpy array containing objects, all comparison will be done using python itself. At that point, you have given up almost all the advantages of numpy and may as well use a list:
coords = coords.tolist()
index = next((i for i, n in enumerate(coords) if n[0] == (2, 5)), -1)
If you really want to use numpy, I suggest you transform your data appropriately. Two simple options come to mind. You can either expand your tuple and create an array of shape (N, 4), or you can create a structured array that preserves the arrangement of the data as a unit, and has shape (N,). The former is much simpler, while the latter is, in my opinion, more elegant.
If you flatten the coordinates:
coords = np.array([[x[0][0], x[0][1], x[1], x[2]] for x in coords])
index = np.flatnonzero(np.all(coords[:, :2] == [2, 5], axis=1))
The structured solution:
coordt = np.dtype([('x', np.int_), ('y', np.int_)])
dt = np.dtype([('coord', coordt), ('a', np.int_), ('b', np.int_)])
coords = np.array([((2, 1), 1613, 655), ((2, 5), 906, 245), ((5, 2), 0, 0)], dtype=dt)
index = np.flatnonzero(coords['coord'] == np.array((2, 5), dtype=coordt))
You can also just transform the first part of your data to a real numpy array, and operate on that:
coords = np.array(coords[:, 0].tolist())
index = np.flatnonzero((coords == [2, 5]).all(axis=1))
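For reference, a self-contained sketch of this last variant. One assumption is labeled in the code: recent NumPy versions refuse to build an array from ragged nested sequences unless dtype=object is given explicitly, so the constructor call differs slightly from the one in the question.

```python
import numpy as np

coords = np.array(
    [[(2, 1), 1613, 655],
     [(2, 5), 906, 245],
     [(5, 2), 0, 0]],
    dtype=object)  # assumption: explicit object dtype, needed on recent NumPy

# Turn the column of tuples into a regular (N, 2) integer array.
first_col = np.array(coords[:, 0].tolist())

# Rows where both components match the target tuple.
index = np.flatnonzero((first_col == [2, 5]).all(axis=1))
print(index)
```

np.flatnonzero returns every matching row, so index is an array; take index[0] if only the first match is wanted.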
You should not compare (2, 5) and coords, but compare (2, 5) and coords[:, 0].
Try this code.
np.where([np.array_equal(coords[:, 0][i], (2, 5)) for i in range(len(coords))])[0]
Try this one
import numpy as np
# dtype=object is required on recent NumPy versions because the rows mix tuples and ints
coords = np.array([[(2, 1), 1613, 655], [(2, 5), 906, 245], [(5, 2), 0, 0]], dtype=object)
tpl = (2, 5)
i = 0  # index of the column in which the tuple you are looking for is listed
pos = [t[i] for t in coords].index(tpl)
print(pos)
Assuming your target tuple (e.g. (2,5) ) is always in the first column of the numpy array coords i.e. coords[:,0] you can simply do the following without any loops!
[*coords[:,0]].index((2,5))
If the tuples aren't necessarily in the first column always, then you can use,
[*coords.flatten()].index((2,5))//3
Hope that helps.
First of all, the tuple (2, 5) is in position 0 as it is the first element of the list [(2, 5), 906, 245].
And second of all, you can use basic python functions to check the index of a tuple in that array. Here's how you do it:
>>> coords = np.array([[(2, 1), 1613, 655], [(2, 5), 906, 245], [(5, 2), 0, 0]], dtype=object)
>>>
>>> coords_list = cl = coords.tolist()
>>> cl
[[(2, 1), 1613, 655], [(2, 5), 906, 245], [(5, 2), 0, 0]]
>>>
>>> tuple_to_be_checked = tuple_ = (2, 5)
>>> tuple_
(2, 5)
>>>
>>> for i in range(0, len(cl), 1):  # Dynamically works for any array `cl`
...     for j in range(0, len(cl[i]), 1):  # Dynamic; works for any list `cl[i]`
...         if cl[i][j] == tuple_:  # Found the tuple
...             # Print tuple index and containing list index
...             print(f'Tuple at index {j} of list at index {i}')
...             break  # Break to avoid unwanted loops
...
Tuple at index 0 of list at index 1
>>>

How to make generated matrix non-repeating in Python?

We have created a matrix that is a random size, and has random digits, but we aren't sure how to make sure that none of the generated strings are the same. For reference, we need the matrix to be non-repeating because we're trying to calculate the recursive teaching dimension -- basically a complexity measurement of data sets -- of the set of all the strings and it can't be computed if any of the strings are the same.
An example of a matrix generated is:
[[0 0 0]
[0 1 0]
[0 1 0]
[0 0 1]
[1 0 0]
[1 0 1]
[0 0 0]]
As you can see, the second and third strings and the first and last strings are identical.
This is our current code. How should we go about ensuring that nothing repeats?
def matrix():
    import numpy as np
    import random
    a = random.randrange(2, 10)
    b = random.randrange(2, a)
    A = np.random.randint(2, size=(a,b))
    print(A)
matrix()
If you are not fixated on numpy, here is an option using itertools' product and random's sample:
import itertools
import random
b = random.randrange (2, 10)
a = random.randrange (2, 2**b)
words = list(itertools.product([0, 1], repeat=b))
matrix = random.sample(words, a)
print(matrix)
Running with fixed values of b=3 and a=7 gives:
[(1, 0, 1),
(0, 1, 0),
(0, 1, 1),
(1, 0, 0),
(1, 1, 1),
(1, 1, 0),
(0, 0, 1)]
Note that product returns tuples, so if that matters, a simple conversion is needed:
words = [list(tup) for tup in itertools.product([0, 1], repeat=b)]
One possibility would be to choose a distinct numbers between zero and 2^b-1
and use their binary representations as the different strings in A. Note that this requires a <= 2^b, so b is drawn first here.
import random
import numpy as np
b = random.randrange(2, 10)
a = random.randrange(2, 2**b)
# Choose numbers between 0 and the largest representable number with b digits
nums = np.random.choice(np.arange(2**b), a, replace=False)
print(nums)
A = np.zeros((a, b))
# Loop over digits and generate the binary representation for the chosen numbers
# Fill in ones if necessary to represent the numbers, reduce, repeat
for i in range(b):
    temp_value_base2 = 2**(b-1-i)
    bool_digit = nums >= temp_value_base2
    A[:, i] = bool_digit
    nums[bool_digit] -= temp_value_base2
print(A)
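The digit loop can also be written with bit shifts. The following is a sketch of the same idea, not a replacement for the answer above: the function name random_unique_binary_rows is hypothetical, and the distinct numbers are drawn with random.sample so the full range never has to be materialized.

```python
import random
import numpy as np

def random_unique_binary_rows(a, b):
    """Return an (a, b) array of 0/1 values whose rows are pairwise distinct."""
    if not 0 <= a <= 2**b:
        raise ValueError("at most 2**b distinct rows of length b exist")
    # Draw a distinct integers in [0, 2**b); distinct integers give distinct bit patterns.
    nums = np.array(random.sample(range(2**b), a), dtype=np.int64)
    # Shift amounts for the most significant bit down to the least significant bit.
    shifts = np.arange(b - 1, -1, -1)
    # Broadcasting (a, 1) against (b,) extracts every bit of every number at once.
    return (nums[:, None] >> shifts) & 1

A = random_unique_binary_rows(7, 3)
print(A)
```

Uniqueness follows from the sampling step alone: two rows can only be equal if the two integers they encode are equal.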

How to write a loop-free code to find the max in each region?

I have a vector which specifies a number of regions over 1 to N. For example, if
A = [1,2,3,6,7,9,10]
Then the regions are [1,3], [6,7], [9,10] defined over interval [1,10] with N=10. I have another vector with length N that contains a set of positive and negative numbers:
x = [0.8,0.1,1,-1,-2,-0.76,0.1,0.2,0.9,0.6]
I want to find the maximum value of x in each region. In this example, the result is:
y = [1,0.1,0.9]
y_locs = [3,7,9]
It is possible to compute the max in each region by first obtaining regions from A and then using a for loop to find the max in each region. Is there a loop-free way to do that?
You could slice your list and use the built-in max() function. Something like:
x = [0.8, 0.1, 1, -1, -2, -0.76, 0.1, 0.2, 0.9, 0.6]
# each tuple contains (start_index, length, maximum_value)
max_list = [(0, 3, max(x[0:3])), (5, 2, max(x[5:7])), (8, 2, max(x[8:]))]
locations_list = [max_list[i][0] + x[max_list[i][0]:max_list[i][0] + max_list[i][1]].index(max_list[i][2]) + 1 for i in range(len(max_list))]
print(max_list)
print(locations_list)
Yields:
[(0, 3, 1), (5, 2, 0.1), (8, 2, 0.9)]
[3, 7, 9]
Notes:
I did use a for loop to iterate each section, but you could expand this by hand into three separate lines that do not have a for loop (this would become very tedious for large data though)
I do not know the internals of max() and it may use a for loop that is hidden.
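If NumPy is an option, the per-region maximum can be computed without an explicit Python loop and without hardcoding the regions. This is a sketch under the assumption that a region is a run of consecutive integers in A; it uses np.maximum.reduceat for the maxima and np.unique to pick the first maximum of each region. The variable names other than A, x, y and y_locs are introduced here for illustration.

```python
import numpy as np

A = np.array([1, 2, 3, 6, 7, 9, 10])   # 1-based positions, as in the question
x = np.array([0.8, 0.1, 1, -1, -2, -0.76, 0.1, 0.2, 0.9, 0.6])

# A new region starts at the first entry and wherever A jumps by more than 1.
new_region = np.r_[True, np.diff(A) > 1]
starts = np.flatnonzero(new_region)      # offsets into A where each region begins
region_id = np.cumsum(new_region) - 1    # region number of every entry of A

vals = x[A - 1]                          # values of x at the positions listed in A
y = np.maximum.reduceat(vals, starts)    # maximum of each region

# Entries that equal their region's maximum; keep the first one per region.
hits = np.flatnonzero(vals == y[region_id])
_, first = np.unique(region_id[hits], return_index=True)
y_locs = A[hits[first]]

print(y)
print(y_locs)
```

The loops still exist inside NumPy, of course, but they run in compiled code rather than in Python.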

Sorting 3d arrays in python as per grid

I have grid points in 3D and I would like to sort them based on (x, y, z) using Python (if possible, avoiding loops).
For example if the input is,
(1,2,1), (0,8,1), (1,0,0) ..
then output should be
(0,8,1), (1,0,0), (1,2,1)..
Sorry for the sidetrack, but what I am actually doing is reading from a file which has data in the following form:
x y z f(x) f(y) f(z)..
What I was doing was the following:
import numpy as np
def fill_array(output_array,source_array,nx,ny,nz,position):
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                output_array[i][j][k] = source_array[i][j][k][position]
nx = 8
ny = 8
nz = 8
ndim = 6
x = np.zeros((nx,ny,nz))
y = np.zeros((nx,ny,nz))
z = np.zeros((nx,ny,nz))
fx = np.zeros((nx,ny,nz))
fy = np.zeros((nx,ny,nz))
fz = np.zeros((nx,ny,nz))
data_file = np.loadtxt('datafile')
f = np.reshape(data_file, (nx,ny,nz,ndim))
fill_array(x,f,nx,ny,nz,0)
fill_array(y,f,nx,ny,nz,1)
fill_array(z,f,nx,ny,nz,2)
fill_array(fx,f,nx,ny,nz,3)
fill_array(fy,f,nx,ny,nz,4)
fill_array(fz,f,nx,ny,nz,5)
This worked fine when the data was arranged in order (as explained previously), but when the file is not written in order it causes problems with the plot later on. Is there a better way to do this? Of course, I only want to arrange x, y, z and then associate the function values f(x), f(y), f(z) with the right position (x, y, z).
Two updates:
1) I get the following error when I use sorted with either x, y, z, fx, fy, fz or f:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
2) I need the data in that specific layout because I am then using mayavi for contour3d.
The built-in function sorted does what you want:
>>> a = [(1, 2, 1), (0, 8, 1), (1, 0, 0)]
>>> sorted(a)
[(0, 8, 1), (1, 0, 0), (1, 2, 1)]
Use sorted.
In [71]: sorted(a)
Out[71]: [(0, 8, 1), (1, 0, 0), (1, 2, 1)]
More precisely:
In [70]: sorted(a, key=lambda x: (x[0], x[1], x[2]))
Out[70]: [(0, 8, 1), (1, 0, 0), (1, 2, 1)]
Here key=lambda x: (x[0], x[1], x[2]) sorts the list by the 0th element of each tuple, then by the 1st, then by the 2nd.
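sorted works on a list of tuples, but the ValueError in the question arises when it is handed NumPy arrays. For data loaded with np.loadtxt, one option is np.lexsort, which returns a row order that can be applied to the whole table so the function values move together with their coordinates. The sketch below assumes the column layout x, y, z, f(x), f(y), f(z) from the question; the function values are made up for illustration.

```python
import numpy as np

# Columns: x, y, z, f(x), f(y), f(z). The last three columns are invented values.
data = np.array([
    [1, 2, 1, 10.0, 20.0, 30.0],
    [0, 8, 1, 11.0, 21.0, 31.0],
    [1, 0, 0, 12.0, 22.0, 32.0],
])

# np.lexsort treats the LAST key as the primary one, so pass z, y, x
# to sort by x first, then y, then z.
order = np.lexsort((data[:, 2], data[:, 1], data[:, 0]))
sorted_data = data[order]

print(order)
print(sorted_data)
```

Once the rows are in grid order, the reshape to (nx, ny, nz, ndim) from the question should place each value at its (x, y, z) position, and a column can be taken with f[..., position] instead of the triple loop in fill_array.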
