Tensorflow: tensor binarization - python

I want to transform this dataset in such a way that each tensor has a given size n and that a feature at index i of this new tensor is set to 1 if and only if there is a i in the original feature (modulo n).
I hope the following example will make things clearer
Let's suppose I have a dataset like:
t = tf.constant([
[0, 3, 4],
[12, 2 ,4]])
ds = tf.data.Dataset.from_tensors(t)
I want to get (if n = 9)
t = tf.constant([
[1, 0, 0, 1, 1, 0, 0, 0, 0], # index set to 1 are 0, 3 and 4
[0, 0, 1, 1, 1, 0, 0, 0, 0]]) # index set to 1 are 2, 4, and 12%9 = 3
I know how to apply the modulo to a tensor, but I don't find how to do the rest of the transformation
thanks

That is similar to tf.one_hot, only for multiple values at the same time. Here is a way to do that:
import tensorflow as tf
def binarization(t, n):
# One-hot encoding of each value
t_1h = tf.one_hot(t % n, n, dtype=tf.bool, on_value=True, off_value=False)
# Reduce across last dimension of the original tensor
return tf.cast(tf.reduce_any(t_1h, axis=-2), t.dtype)
# Test
with tf.Graph().as_default(), tf.Session() as sess:
t = tf.constant([
[ 0, 3, 4],
[12, 2, 4]
])
t_m1h = binarization(t, 9)
print(sess.run(t_m1h))
Output:
[[1 0 0 1 1 0 0 0 0]
[0 0 1 1 1 0 0 0 0]]

Related

Scanning for groups of the same value in numpy array

I have a numpy array where 0 denotes empty space and 1 denotes that a location is filled. I am trying to find a quick method of scanning the numpy array for where there are multiple values of zero adjacent to each other and return the location of the central zero.
For Example if I had the following array
[0 1 0 1]
[0 0 0 1]
[0 1 0 1]
[1 1 1 1]
I want to return the locations for which there is an adjacent zero on either side of a central zero
e.g
[1,1]
as this is the central of 3 zeros, i.e there is a zero either side of the zero at this location
Im aware that this can be calculated using if statements, but wondered if there was a more pythonic way of doing this.
Any help is greatly appreciated
The desired output here for arbitrary inputs is not exhaustively specified in the question, but here is a possible approach that might be useful for this kind of problem, and adapted to the details of the desired output. It uses np.cumsum, np.bincount, np.where, and np.median to find the middle index for groups of consecutive zeros along rows of a 2D array:
import numpy as np
def find_groups(x, min_size=3, value=0):
# Compute a sequential label for groups in each row.
xc = (x != value).cumsum(1)
# Count the number of occurances per group in each row.
counts = np.apply_along_axis(
lambda x: np.bincount(x, minlength=1 + xc.max()),
axis=1, arr=xc)
# Filter by minimum number of occurances.
i, j = np.where(counts >= min_size)
# Compute the median index of each group.
return [
(ii, int(np.ceil(np.median(np.where(xc[ii] == jj)[0]))))
for ii, jj in zip(i, j)
]
x = np.array([[0, 1, 0, 1],
[0, 0, 0, 1],
[0, 1, 0, 1],
[1, 1, 1, 1]])
print(find_groups(x))
# [(1, 1)]
It should work properly even for multiple rows with groups of varying sizes, and even multiple groups per row:
x2 = np.array([[0, 1, 0, 1, 1, 1, 1],
[0, 0, 0, 1, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0, 0]])
print(find_groups(x2))
# [(1, 1), (1, 5), (2, 3), (3, 3)]

Find Distance to Nearest Zero in NumPy Array

Let's say I have a NumPy array:
x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
At each index, I want to find the distance to nearest zero value. If the position is a zero itself then return zero as a distance. Afterward, we are only interested in distances to the nearest zero that is to the right of the current position. The super naive approach would be something like:
out = np.full(x.shape[0], x.shape[0]-1)
for i in range(x.shape[0]):
j = 0
while i + j < x.shape[0]:
if x[i+j] == 0:
break
j += 1
out[i] = j
And the output would be:
array([0, 2, 1, 0, 4, 3, 2, 1, 0, 0])
I'm noticing a countdown/decrement pattern in the output in between the zeros. So, I might be able to do use the locations of the zeros (i.e., zero_indices = np.argwhere(x == 0).flatten())
What is the fastest way to get the desired output in linear time?
Approach #1 : Searchsorted to the rescue for linear-time in a vectorized manner (before numba guys come in)!
mask_z = x==0
idx_z = np.flatnonzero(mask_z)
idx_nz = np.flatnonzero(~mask_z)
# Cover for the case when there's no 0 left to the right
# (for same results as with posted loop-based solution)
if x[-1]!=0:
idx_z = np.r_[idx_z,len(x)]
out = np.zeros(len(x), dtype=int)
idx = np.searchsorted(idx_z, idx_nz)
out[~mask_z] = idx_z[idx] - idx_nz
Approach #2 : Another with some cumsum -
mask_z = x==0
idx_z = np.flatnonzero(mask_z)
# Cover for the case when there's no 0 left to the right
if x[-1]!=0:
idx_z = np.r_[idx_z,len(x)]
out = idx_z[np.r_[False,mask_z[:-1]].cumsum()] - np.arange(len(x))
Alternatively, last step of cumsum could be replaced by repeat functionality -
r = np.r_[idx_z[0]+1,np.diff(idx_z)]
out = np.repeat(idx_z,r)[:len(x)] - np.arange(len(x))
Approach #3 : Another with mostly just cumsum -
mask_z = x==0
idx_z = np.flatnonzero(mask_z)
pp = np.full(len(x), -1)
pp[idx_z[:-1]] = np.diff(idx_z) - 1
if idx_z[0]==0:
pp[0] = idx_z[1]
else:
pp[0] = idx_z[0]
out = pp.cumsum()
# Handle boundary case and assigns 0s at original 0s places
out[idx_z[-1]:] = np.arange(len(x)-idx_z[-1],0,-1)
out[mask_z] = 0
You could work from the other side. Keep a counter on how many non zero digits have passed and assign it to the element in the array. If you see 0, reset the counter to 0
Edit: if there is no zero on the right, then you need another check
x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
out = x
count = 0
hasZero = False
for i in range(x.shape[0]-1,-1,-1):
if out[i] != 0:
if not hasZero:
out[i] = x.shape[0]-1
else:
count += 1
out[i] = count
else:
hasZero = True
count = 0
print(out)
You can use the difference between the indices of each position and the cumulative max of zero positions to determine the distance to the preceding zero. This can be done forward and backward. The minimum between forward and backward distance to the preceding (or next) zero will be the nearest:
import numpy as np
indices = np.arange(x.size)
zeroes = x==0
forward = indices - np.maximum.accumulate(indices*zeroes) # forward distance
forward[np.cumsum(zeroes)==0] = x.size-1 # handle absence of zero from edge
forward = forward * (x!=0) # set zero positions to zero
zeroes = zeroes[::-1]
backward = indices - np.maximum.accumulate(indices*zeroes) # backward distance
backward[np.cumsum(zeroes)==0] = x.size-1 # handle absence of zero from edge
backward = backward[::-1] * (x!=0) # set zero positions to zero
distZero = np.minimum(forward,backward) # closest distance (minimum)
results:
distZero
# [0, 1, 1, 0, 1, 2, 2, 1, 0, 0]
forward
# [0, 1, 2, 0, 1, 2, 3, 4, 0, 0]
backward
# [0, 2, 1, 0, 4, 3, 2, 1, 0, 0]
Special case where no zeroes are present on outer edges:
x = np.array([3, 1, 2, 0, 4, 5, 6, 0,8,8])
forward: [9 9 9 0 1 2 3 0 1 2]
backward: [3 2 1 0 3 2 1 0 9 9]
distZero: [3 2 1 0 1 2 1 0 1 2]
also works with no zeroes at all
[EDIT] non-numpy solutions ...
if you're looking for an O(N) solution that doesn't require numpy, you can apply this strategy using the accumulate function from itertools:
x = [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
from itertools import accumulate
maxDist = len(x) - 1
zeroes = [maxDist*(v!=0) for v in x]
forward = [*accumulate(zeroes,lambda d,v:min(maxDist,(d+1)*(v!=0)))]
backward = accumulate(zeroes[::-1],lambda d,v:min(maxDist,(d+1)*(v!=0)))
backward = [*backward][::-1]
distZero = [min(f,b) for f,b in zip(forward,backward)]
print("x",x)
print("f",forward)
print("b",backward)
print("d",distZero)
output:
x [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
f [0, 1, 2, 0, 1, 2, 3, 4, 0, 0]
b [0, 2, 1, 0, 4, 3, 2, 1, 0, 0]
d [0, 1, 1, 0, 1, 2, 2, 1, 0, 0]
If you don't want to use any library, you can accumulate the distances manually in a loop:
x = [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
forward,backward = [],[]
fDist = bDist = maxDist = len(x)-1
for f,b in zip(x,reversed(x)):
fDist = min(maxDist,(fDist+1)*(f!=0))
forward.append(fDist)
bDist = min(maxDist,(bDist+1)*(b!=0))
backward.append(bDist)
backward = backward[::-1]
distZero = [min(f,b) for f,b in zip(forward,backward)]
print("x",x)
print("f",forward)
print("b",backward)
print("d",distZero)
output:
x [0, 1, 2, 0, 4, 5, 6, 7, 0, 0]
f [0, 1, 2, 0, 1, 2, 3, 4, 0, 0]
b [0, 2, 1, 0, 4, 3, 2, 1, 0, 0]
d [0, 1, 1, 0, 1, 2, 2, 1, 0, 0]
My first intuition would be to use slicing. If x can be a normal list instead of a numpy array, then you could use
out = [x[i:].index(0) for i,_ in enumerate(x)]
if numpy is necessary then you can use
out = [np.where(x[i:]==0)[0][0] for i,_ in enumerate(x)]
but this is less efficient because you are finding all zero locations to the right of the value and then pulling out just the first. Almost definitely a better way to do this in numpy.
Edit: I am sorry, I misunderstood. This will give you the distance to the nearest zeros - may it be at left or right. But you can use d_right as intermediate result. This does not cover the edge case of not having any zero to the right though.
import numpy as np
x = np.array([0, 1, 2, 0, 4, 5, 6, 7, 0, 0])
# Get the distance to the closest zero from the left:
zeros = x == 0
zero_locations = np.argwhere(x == 0).flatten()
zero_distances = np.diff(np.insert(zero_locations, 0, 0))
temp = x.copy()
temp[~zeros] = 1
temp[zeros] = -(zero_distances-1)
d_left = np.cumsum(temp) - 1
# Get the distance to the closest zero from the right:
zeros = x[::-1] == 0
zero_locations = np.argwhere(x[::-1] == 0).flatten()
zero_distances = np.diff(np.insert(zero_locations, 0, 0))
temp = x.copy()
temp[~zeros] = 1
temp[zeros] = -(zero_distances-1)
d_right = np.cumsum(temp) - 1
d_right = d_right[::-1]
# Get the smallest distance from both sides:
smallest_distances = np.min(np.stack([d_left, d_right]), axis=0)
# np.array([0, 1, 1, 0, 1, 2, 2, 1, 0, 0])

python - applying a mask to an array in a for loop

I have this code:
import numpy as np
result = {}
result['depth'] = [1,1,1,2,2,2]
result['generation'] = [1,1,1,2,2,2]
result['dimension'] = [1,2,3,1,2,3]
result['data'] = [np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0])]
for v in np.unique(result['depth']):
temp_v = (result['depth'] == v)
values_v = [result[string][temp_v] for string in result.keys()]
this_v = dict(zip(result.keys(), values_v))
in which I want to create a new dictcalled 'this_v', with the same keys as the original dict result, but fewer values.
The line:
values_v = [result[string][temp_v] for string in result.keys()]
gives an error
TypeError: only integer scalar arrays can be converted to a scalar index
which I don't understand, since I can create ex = result[result.keys()[0]][temp_v] just fine. It just does not let me do this with a for loop so that I can fill the list.
Any idea as to why it does not work?
In order to solve your problem (finding and dropping duplicates) I encourage you to use pandas. It is a Python module that makes your life absurdly simple:
import numpy as np
result = {}
result['depth'] = [1,1,1,2,2,2]
result['generation'] = [1,1,1,2,2,2]
result['dimension'] = [1,2,3,1,2,3]
result['data'] = [np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0]),\
np.array([0,0,0]), np.array([0,0,0]), np.array([0,0,0])]
# Here comes pandas!
import pandas as pd
# Converting your dictionary of lists into a beautiful dataframe
df = pd.DataFrame(result)
#> data depth dimension generation
# 0 [0, 0, 0] 1 1 1
# 1 [0, 0, 0] 1 2 1
# 2 [0, 0, 0] 1 3 1
# 3 [0, 0, 0] 2 1 2
# 4 [0, 0, 0] 2 2 2
# 5 [0, 0, 0] 2 3 2
# Dropping duplicates... in one single command!
df = df.drop_duplicates('depth')
#> data depth dimension generation
# 0 [0, 0, 0] 1 1 1
# 3 [0, 0, 0] 2 1 2
If you want oyur data back in the original format... you need yet again just one line of code!
df.to_dict('list')
#> {'data': [array([0, 0, 0]), array([0, 0, 0])],
# 'depth': [1, 2],
# 'dimension': [1, 1],
# 'generation': [1, 2]}

Editting python 2-dimensional array without for-loop?

So, I have a given 2 dimensional matrix which is randomly generated:
a = np.random.randn(4,4)
which gives output:
array([[-0.11449491, -2.7777728 , -0.19784241, 1.8277976 ],
[-0.68511473, 0.40855461, 0.06003551, -0.8779363 ],
[-0.55650378, -0.16377137, 0.10348714, -0.53449633],
[ 0.48248298, -1.12199767, 0.3541335 , 0.48729845]])
I want to change all the negative values to 0 and all the positive values to 1.
How can I do this without a for loop?
You can use np.where()
import numpy as np
a = np.random.randn(4,4)
a = np.where(a<0, 0, 1)
print(a)
[[1 1 0 1]
[1 0 1 0]
[1 1 0 0]
[0 1 1 0]]
(a<0).astype(int)
This is one possibly solution - converting the array to boolean array according to your condition and then converting it from boolean to integer.
array([[ 0.63694991, -0.02785534, 0.07505496, 1.04719295],
[-0.63054947, -0.26718763, 0.34228736, 0.16134474],
[ 1.02107383, -0.49594998, -0.11044738, 0.64459594],
[ 0.41280766, 0.668819 , -1.0636972 , -0.14684328]])
And the result -
(a<0).astype(int)
>>> array([[0, 1, 0, 0],
[1, 1, 0, 0],
[0, 1, 1, 0],
[0, 0, 1, 1]])

In order to generate all combinations of 1's and 0's we use a simple binary table. How can I easily create this binary table in an array?

For example the binary table for 3 bit:
0 0 0
0 0 1
0 1 0
1 1 1
1 0 0
1 0 1
And I want to store this into an n*n*2 array so it would be:
0 0 0
0 0 1
0 1 0
1 1 1
1 0 0
1 0 1
For generating the combinations automatically, you can use itertools.product standard library, which generates all possible combinations of the different sequences which are supplied, i. e. the cartesian product across the input iterables. The repeat argument comes in handy as all of our sequences here are identical ranges.
from itertools import product
x = [i for i in product(range(2), repeat=3)]
Now if we want an array instead a list of tuples from that, we can just pass this to numpy.array.
import numpy as np
x = np.array(x)
# [[0 0 0]
# [0 0 1]
# [0 1 0]
# [0 1 1]
# [1 0 0]
# [1 0 1]
# [1 1 0]
# [1 1 1]]
If you want all elements in a single list, so you could index them with a single index, you could chain the iterable:
from itertools import chain, product
x = list(chain.from_iterable(product(range(2), repeat=3)))
result: [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1]
Most people would expect 2^n x n as in
np.c_[tuple(i.ravel() for i in np.mgrid[:2,:2,:2])]
# array([[0, 0, 0],
# [0, 0, 1],
# [0, 1, 0],
# [0, 1, 1],
# [1, 0, 0],
# [1, 0, 1],
# [1, 1, 0],
# [1, 1, 1]])
Explanation: np.mgrid as used here creates the coordinates of the corners of a unit cube which happen to be all combinations of 0 and 1. The individual coordinates are then ravelled and joined as columns by np.c_
Here's a recursive, native python (no libraries) version of it:
def allBinaryPossiblities(maxLength, s=""):
if len(s) == maxLength:
return s
else:
temp = allBinaryPossiblities(maxLength, s + "0") + "\n"
temp += allBinaryPossiblities(maxLength, s + "1")
return temp
print (allBinaryPossiblities(3))
It prints all possible:
000
001
010
011
100
101
110
111

Categories