Related
I have a tensor in Tensor Flow that is of the size (2, 16384, 11). I am trying to use the tf.slice function to pull 1D tensors out of that array. I can name two point in the array as the start and stop.
The first column is [0, 0, 1] ---> [0, 16383, 1].
The second column is [1, 0, 1] ---> [1, 16383, 1].
But the problem is this returns a tensor with the dimensions of (0, 16383) and (1, 16383). Accessing an array with a length 0 is a problem; I understand that you can get there by only using the [:] accessor as opposed to the [0] accessor, otherwise you get the error
'0 is out of bounds for axis 0 of length 0'.
How else can I get TF to output a single column of numbers? Here is the code.
Xdata = tf.slice(x, [0,0,1], [0,16383,1])
Ydata = tf.slice(x, [1,0,1], [1,16383,1])
Xarry = Xdata.numpy()
Yarry = Ydata.numpy()
# Outputs
print(Xarry.shape) # (0, 16383, 1)
print(Yarry.shape) # (1, 16383, 1)
print(Xarry[:,:,0]) # []
print(Yarry[0,:,0]) # [22.05 20.92 22.11 ... 22.53 22.03 22.47]
plt.plot(Xarry[:,:,0],Yarry[0,:,0]) # <--- Error is here
Which produces:
(0, 16383, 1)
(1, 16383, 1)
[]
[22.05 20.92 22.11 ... 22.53 22.03 22.47]
ValueError: x and y must have same first dimension, but have shapes
(0, 16383) and (16383,)
I have tried using .flatten() but this does not get around the problem. I have also looked at using tf.gather().
Xarry shape is (0, 16383, 1), which means an empty list, since the first dimension is zero. So, if you print it you will get []. As it has nothing to present, then slicing like Xarry[:,:,0] will also give you another [].
I think you have a misunderstanding of tf.slice arguments. The second argument is the size of tensor, beginning from the first argument index. The second argument in tf.slice() is not the stop index, but the size.
So, consider to change the size to something like this:
Xdata = tf.slice(x, [0,0,1], [1,16383,1]) # start from [0,0,1] index and slice an array with size (1,16383,1)
Ydata = tf.slice(x, [1,0,1], [1,16383,1])
Xarry = Xdata.numpy()
Yarry = Ydata.numpy()
# Outputs
print(Xarry.shape) # (0, 16383, 1)
print(Yarry.shape) # (1, 16383, 1)
print(Xarry[0,:,0]) # [22.05 20.92 22.11 ... 22.53 22.03 22.47]
print(Yarry[0,:,0]) # [22.05 20.92 22.11 ... 22.53 22.03 22.47]
import matplotlib.pyplot as plt
plt.plot(Xarry[0,:,0],Yarry[0,:,0])
plt.show()
This is part of the code I'm working on: (Using Python)
import random
pairs = [
(0, 1),
(1, 2),
(2, 3),
(3, 0), # I want to treat 0,1,2,3 as some 'coordinate' (or positional infomation)
]
alphas = [(random.choice([1, -1]) * random.uniform(5, 15), pairs[n]) for n in range(4)]
alphas.sort(reverse=True, key=lambda n: abs(n[0]))
A sample output looks like this:
[(13.747649802587832, (2, 3)),
(13.668274782626717, (1, 2)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (3, 0))]
Now I'm wondering is there a way I can give each element in 0,1,2,3 a random binary number, so if [0,1,2,3] = [0,1,1,0], (By that I mean if the 'coordinates' on the left list have the corresponding random binary information on the right list. In this case, coordinate 0 has the random binary number '0' and etc.) then the desired output using the information above looks like:
[(13.747649802587832, (1, 0)),
(13.668274782626717, (1, 1)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (0, 0))]
Thanks!!
One way using dict:
d = dict(zip([0,1,2,3], [0,1,1,0]))
[(i, tuple(d[j] for j in c)) for i, c in alphas]
Output:
[(13.747649802587832, (1, 0)),
(13.668274782626717, (1, 1)),
(-9.105374057105703, (0, 1)),
(-8.267840318934667, (0, 0))]
You can create a function to convert your number to the random binary assigned. Using a dictionary within this function would make sense. Something like this should work where output1 is that first sample output you provide and binary_code would be [0, 1, 1, 0] in your example:
def convert2bin(original, binary_code):
binary_dict = {n: x for n, x in enumerate(binary_code)}
return tuple([binary_code[x] for x in original])
binary_code = np.random.randint(2, size=4)
[convert2bin(x[1], binary_code) for x in output1]
Suppose I have a matrix like this:
m = [0, 1, 1, 0,
1, 1, 0, 0,
0, 0, 0, 1]
And I need to get the coordinates of the same neighbouring values (but not diagonally):
So the result would be a list of lists of coordinates in the "matrix" list, starting with [0,0], like this:
r = [[[0,0]],
[[0,1], [0,2], [1,0], [1,1]],
[[0,3], [1,2], [1,3], [2,0], [2,1], [2,2]]
[[2,3]]]
There must be a way to do that, but I'm really stuck.
tl;dr: We take an array of zeros and ones and use scipy.ndimage.label to convert it to an array of zeros and [1,2,3,...]. We then use np.where to find the coordinates of each element with value > 0. Elements that have the same value end up in the same list.
scipy.ndimage.label interprets non-zero elements of a matrix as features and labels them. Each unique feature in the input gets assigned a unique label. Features are e.g. groups of adjacent elements (or pixels) with the same value.
import numpy as np
from scipy.ndimage import label
# make dummy data
arr = np.array([[0,1,1,0], [1,1,0,0], [0,0,0,1]])
#initialise list of features
r = []
Since OP wanted all features, that is groups of zero and non-zero pixels, we use label twice: First on the original array, and second on 1 - original array. (For an array of zeros and ones, 1 - array just flips the values).
Now, label returns a tuple, containing the labelled array (which we are interested in) and the number of features that it found in that array (which we could use, but when I coded this, I chose to ignore it. So, we are interested in the first element of the tuple returned by label, which we access with [0]:
a = label(arr)[0]
b = label(1-arr)[0]
Now we check which unique pixel values label has assigned. So we want the set of a and b, repectively. In order for set() to work, we need to linearise both arrays, which we do with .ravel(). We have to subtract {0} in both cases, because for both a and b we are interested in only the non-zero values.
So, having found the unique labels, we loop through these values, and use np.where to find where on the array a given value is located. np.where returns a tuple of arrays. The first element of this tuple are all the row-coordinates for which the condition was met, and the second element are the column-coordinates.
So, we can use zip(* to unpack the two containers of length n to n containers of length 2. This means that we go from list of all row-coords + list of all column-coords to list of all row-column-coordinate pairs for which the condition is met. Finally in python 3, zip is a generator, which we can evaluate by calling list() on it. The resulting list is then appended to our list of coordinates, r.
for x in set(a.ravel())-{0}:
r.append(list(zip(*np.where(a==x))))
for x in set(b.ravel())-{0}:
r.append(list(zip(*np.where(b==x))))
print(r)
[[(0, 1), (0, 2), (1, 0), (1, 1)],
[(2, 3)],
[(0, 0)],
[(0, 3), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2)]]
That said, we can speed up this code slightly by making use of the fact that label returns the number of features it assigned. This allows us to avoid the set command, which can take time on large arrays:
a, num_a = label(arr)
for x in range(1, num_a+1): # range from 1 to the highest label
r.append(list(zip(*np.where(a==x))))
A solution with only standard libraries:
from pprint import pprint
m = [0, 1, 1, 0,
1, 1, 0, 0,
0, 0, 0, 1]
def is_neighbour(x1, y1, x2, y2):
return (x1 in (x2-1, x2+1) and y1 == y2) or \
(x1 == x2 and y1 in (y2+1, y2-1))
def is_value_touching_group(val, groups, x, y):
for d in groups:
if d['color'] == val and any(is_neighbour(x, y, *cell) for cell in d['cells']):
return d
def check(m, w, h):
groups = []
for i in range(h):
for j in range(w):
val = m[i*w + j]
touching_group = is_value_touching_group(val, groups, i, j)
if touching_group:
touching_group['cells'].append( (i, j) )
else:
groups.append({'color':val, 'cells':[(i, j)]})
final_groups = []
while groups:
current_group = groups.pop()
for c in current_group['cells']:
touching_group = is_value_touching_group(current_group['color'], groups, *c)
if touching_group:
touching_group['cells'].extend(current_group['cells'])
break
else:
final_groups.append(current_group['cells'])
return final_groups
pprint( check(m, 4, 3) )
Prints:
[[(2, 3)],
[(0, 3), (1, 3), (1, 2), (2, 2), (2, 0), (2, 1)],
[(0, 1), (0, 2), (1, 1), (1, 0)],
[(0, 0)]]
Returns as a list of groups under value key.
import numpy as np
import math
def get_keys(old_dict):
new_dict = {}
for key, value in old_dict.items():
if value not in new_dict.keys():
new_dict[value] = []
new_dict[value].append(key)
else:
new_dict[value].append(key)
return new_dict
def is_neighbor(a,b):
if a==b:
return True
else:
distance = abs(a[0]-b[0]), abs(a[1]-b[1])
return distance == (0,1) or distance == (1,0)
def collate(arr):
arr2 = arr.copy()
ret = []
for a in arr:
for i, b in enumerate(arr2):
if set(a).intersection(set(b)):
a = list(set(a+b))
ret.append(a)
for clist in ret:
clist.sort()
return [list(y) for y in set([tuple(x) for x in ret])]
def get_groups(d):
for k,v in d.items():
ret = []
for point in v:
matches = [a for a in v if is_neighbor(point, a)]
ret.append(matches)
d[k] = collate(ret)
return d
a = np.array([[0,1,1,0],
[1,1,0,0],
[0,0,1,1]])
d = dict(np.ndenumerate(a))
d = get_keys(d)
d = get_groups(d)
print(d)
Result:
{
0: [[(0, 3), (1, 2), (1, 3)], [(0, 0)], [(2, 0), (2, 1)]],
1: [[(2, 2), (2, 3)], [(0, 1), (0, 2), (1, 0), (1, 1)]]
}
This question already has answers here:
How to get the cartesian product of multiple lists
(17 answers)
Closed 8 months ago.
I'm looking to generate the cartesian product of a relatively large number of arrays to span a high-dimensional grid. Because of the high dimensionality, it won't be possible to store the result of the cartesian product computation in memory; rather it will be written to hard disk. Because of this constraint, I need access to the intermediate results as they are generated. What I've been doing so far is this:
for x in xrange(0, 10):
for y in xrange(0, 10):
for z in xrange(0, 10):
writeToHdd(x,y,z)
which, apart from being very nasty, is not scalable (i.e. it would require me writing as many loops as dimensions). I have tried to use the solution proposed here, but that is a recursive solution, which therefore makes it quite hard to obtain the results on the fly as they are being generated. Is there any 'neat' way to do this other than having a hardcoded loop per dimension?
In plain Python, you can generate the Cartesian product of a collection of iterables using itertools.product.
>>> arrays = range(0, 2), range(4, 6), range(8, 10)
>>> list(itertools.product(*arrays))
[(0, 4, 8), (0, 4, 9), (0, 5, 8), (0, 5, 9), (1, 4, 8), (1, 4, 9), (1, 5, 8), (1, 5, 9)]
In Numpy, you can combine numpy.meshgrid (passing sparse=True to avoid expanding the product in memory) with numpy.ndindex:
>>> arrays = np.arange(0, 2), np.arange(4, 6), np.arange(8, 10)
>>> grid = np.meshgrid(*arrays, sparse=True)
>>> [tuple(g[i] for g in grid) for i in np.ndindex(grid[0].shape)]
[(0, 4, 8), (0, 4, 9), (1, 4, 8), (1, 4, 9), (0, 5, 8), (0, 5, 9), (1, 5, 8), (1, 5, 9)]
I think I figured out a nice way using a memory mapped file:
def carthesian_product_mmap(vectors, filename, mode='w+'):
'''
Vectors should be a tuple of `numpy.ndarray` vectors. You could
also make it more flexible, and include some error checking
'''
# Make a meshgrid with `copy=False` to create views
grids = np.meshgrid(*vectors, copy=False, indexing='ij')
# The shape for concatenating the grids from meshgrid
shape = grid[0].shape + (len(vectors),)
# Find the "highest" dtype neccesary
dtype = np.result_type(*vectors)
# Instantiate the memory mapped file
M = np.memmap(filename, dtype, mode, shape=shape)
# Fill the memmap with the grids
for i, grid in enumerate(grids):
M[...,i] = grid
# Make sure the data is written to disk (optional?)
M.flush()
# Reshape to put it in the right format for Carthesian product
return M.reshape((-1, len(vectors)))
But I wonder if you really need to store the whole Carthesian product (there's a lot of data duplication). Is it not an option to generate the rows in the product at the moment they're needed?
It seems you just want to loop over an arbitrary number of dimensions. My generic solution for this is using an index field and increment indices plus handling overflows.
Example:
n = 3 # number of dimensions
N = 1 # highest index value per dimension
idx = [0]*n
while True:
print(idx)
# increase first dimension
idx[0] += 1
# handle overflows
for i in range(0, n-1):
if idx[i] > N:
# reset this dimension and increase next higher dimension
idx[i] = 0
idx[i+1] += 1
if idx[-1] > N:
# overflow in the last dimension, we are finished
break
Gives:
[0, 0, 0]
[1, 0, 0]
[0, 1, 0]
[1, 1, 0]
[0, 0, 1]
[1, 0, 1]
[0, 1, 1]
[1, 1, 1]
Numpy has something similar inbuilt: ndenumerate.
Say I have a list of tuples containing the RGB information of each pixels in an image from left to right, top to bottom.
[(1,2,3),(2,3,4),(1,2,4),(9,2,1),(1,1,1),(3,4,5)]
Assuming the width and height of the image is already know, is there a way I can represent the image using list of list?
For example, let's say the above list of tuples represent a 2x3 image, image[1][2] should give me the RGB tuple (3,4,5).
Use the step argument in range (or xrange):
>>> width = 2
>>> pixels = [(1,2,3),(2,3,4),(1,2,4),(9,2,1),(1,1,1),(3,4,5)]
>>> image = [pixels[x:x+width] for x in range(0,len(pixels),width)]
>>> image
[[(1, 2, 3), (2, 3, 4)], [(1, 2, 4), (9, 2, 1)], [(1, 1, 1), (3, 4, 5)]]
It will make x increment by the value of the step, instead of the default, which is 1. If you are familiar with Java, it's similar to:
for (int x=0; x<length; x = x+step)