Bucketize real valued Tensor - python

I have a matrix like this:
import tensorflow as tf
tf_ent = tf.Variable([ [9.96, 8.65, 0.8, 0.1 ],
[0.7, 8.33, 0 , 0 ],
[0.9, 0, 6, 7.33],
[6.60, 0, 3, 5.5 ],
[9.49, 0., 0, 0 ],
[0.4, 8.45, 0, 0.2 ],
[0.3, 0, 5.82, 8.28]])
I want an output in which weights that fall in the same range get the same number.
In other words, each number should represent the range its weight falls in.
So my ideal output could be something like this:
[[5, 5, 1, 1],
[1, 5, 0, 0],
[1, 0, 4, 5],
[4, 0, 3, 4],
[5, 0, 0, 0],
[1, 5, 0, 1],
[1, 0, 4, 5]]
In this case I used the following mapping (assume I already have it defined; zeros stay 0):
[0.1, 1) -> 1
[1, 3) -> 2
[3, 5) -> 3
[5, 7) -> 4
[7, 10) -> 5
First I tried tf.histogram_fixed_width_bins, but it did not give me the expected result.
This is my code:
value_ranges = [0, 10.0]
nbins = 5
new_f_w_t = tf.histogram_fixed_width_bins(tf_ent, value_ranges, nbins)
print(new_f_w_t)
and this is the output of this code:
output= [[4 4 0 0]
[0 4 0 0]
[0 0 3 3]
[3 0 1 2]
[4 0 0 0]
[0 4 0 0]
[0 0 2 4]]
The output looks more like each float converted to an integer than a mapping onto the ranges I defined.
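The integer-like output is what equal-width binning produces: with value_range [0, 10] and nbins = 5, every bin is 2 wide, so a value x lands in bin floor(x / 2), clipped to [0, 4]. The NumPy sketch below reproduces that computation without TensorFlow. It assumes the usual equal-width formula floor(nbins * (x - lo) / (hi - lo)), and equal_width_bins is a hypothetical helper name, not a library function.

```python
import numpy as np

def equal_width_bins(values, value_range, nbins):
    """Hypothetical helper: index of the equal-width bin each value falls in."""
    lo, hi = value_range
    # Scale each value over the range and split it into nbins equal slices.
    idx = np.floor(nbins * (values - lo) / (hi - lo)).astype(np.int32)
    # Values below lo go to the first bin, values at or above hi to the last.
    return np.clip(idx, 0, nbins - 1)

tf_ent = np.array([[9.96, 8.65, 0.8, 0.1],
                   [0.7, 8.33, 0, 0],
                   [0.9, 0, 6, 7.33],
                   [6.60, 0, 3, 5.5],
                   [9.49, 0., 0, 0],
                   [0.4, 8.45, 0, 0.2],
                   [0.3, 0, 5.82, 8.28]])

print(equal_width_bins(tf_ent, [0, 10.0], 5))
```

Because every bin has the same width, no choice of nbins can reproduce uneven boundaries such as 0.1, 1, 3, 5, 7, 10.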
How can I map the matrix onto arbitrary ranges like these in an efficient way?
Thanks in advance :)

This is equivalent to a np.digitize operation. You can use the private math_ops._bucketize function:
from tensorflow.python.ops import math_ops
bins = [0.1, 1, 3, 5, 7, 10]
new_f_w_t = math_ops._bucketize(tf_ent, boundaries=bins)
new_f_w_t.numpy()
array([[5, 5, 1, 1],
[1, 5, 0, 0],
[1, 0, 4, 5],
[4, 0, 3, 4],
[5, 0, 0, 0],
[1, 5, 0, 1],
[1, 0, 4, 5]], dtype=int32)
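As a quick cross-check of the np.digitize equivalence, here is the same bucketing in plain NumPy. For increasing bins, np.digitize returns i such that bins[i-1] <= x < bins[i], so the zeros (below 0.1) get 0 and a value sitting exactly on a boundary, such as 3, goes to the bucket above it.

```python
import numpy as np

tf_ent = np.array([[9.96, 8.65, 0.8, 0.1],
                   [0.7, 8.33, 0, 0],
                   [0.9, 0, 6, 7.33],
                   [6.60, 0, 3, 5.5],
                   [9.49, 0., 0, 0],
                   [0.4, 8.45, 0, 0.2],
                   [0.3, 0, 5.82, 8.28]])
bins = [0.1, 1, 3, 5, 7, 10]

# Bucket index i satisfies bins[i-1] <= x < bins[i]; values below 0.1 get 0.
buckets = np.digitize(tf_ent, bins)
print(buckets)
```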

tf.searchsorted() can also be used for this, and it avoids calling the hidden _bucketize function. Because searchsorted expects the sorted boundaries to have the same leading dimensions as the values, you have to tile the bins once per row of the input, though.
For example:
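# Repeat the boundaries once per row so their leading dimension matches tf_ent
bins = tf.tile([[0.1, 1, 3, 5, 7, 10]], (tf_ent.shape[0], 1))
# side="right" sends a value equal to a boundary into the bucket above it, as np.digitize does
new_f_w_t = tf.searchsorted(bins, tf_ent, side="right")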
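The side argument only matters for values that sit exactly on a boundary. A small NumPy sketch with np.searchsorted, which accepts the same side values, shows the difference: with side="right", 0.1 lands in bucket 1 and 3 in bucket 3 (matching np.digitize), while side="left" moves them down to buckets 0 and 2. Zeros are below every boundary and get 0 either way.

```python
import numpy as np

bins = np.array([0.1, 1, 3, 5, 7, 10])
values = np.array([0.0, 0.1, 3.0, 5.5])

# side="right": index = number of boundaries <= value
right = np.searchsorted(bins, values, side="right")
# side="left": index = number of boundaries < value
left = np.searchsorted(bins, values, side="left")
print(right)
print(left)
```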

Related

Numpy: How do I reorder the rows of an array to match the rows of another array?

I have two 2d arrays that contain XYZ points, A and B.
Array A has the shape (796704, 3) and is my original point cloud. Each point is unique except for (0, 0, 0), but those don't matter:
A = [[x_1, y_1, z_1],
[x_2, y_2, z_2],
[x_3, y_3, z_3],
[x_4, y_4, z_4],
[x_5, y_5, z_5],
...]
Array B has the shape (N, 4) and is a cropped version of A (N<796704).
The remaining points did not change and are still equal to their counterparts in A.
The fourth column contains the segmentation value of each point.
The row order of B is completely random and doesn't match A anymore.
B = [[x_4, y_4, z_4, 5],
[x_2, y_2, z_2, 12],
[x_6, y_6, z_6, 5],
[x_7, y_7, z_7, 3],
[x_9, y_9, z_9, 3]]
I need to reorder the rows of B so that each one lines up with the row of A holding the same point, and fill the gaps with zero rows:
B = [[0.0, 0.0, 0.0, 0],
[x_2, y_2, z_2, 12],
[0.0, 0.0, 0.0, 0],
[x_4, y_4, z_4, 5],
[0.0, 0.0, 0.0, 0],
[x_6, y_6, z_6, 5],
[x_7, y_7, z_7, 3],
[0.0, 0.0, 0.0, 0],
[x_9, y_9, z_9, 3],
[0.0, 0.0, 0.0, 0],
[0.0, 0.0, 0.0, 0],
[0.0, 0.0, 0.0, 0]
...]
In the end B should have the shape (796704, 4).
I tried using the numpy_indexed package as proposed in this very similar question, but the issue here is that B doesn't contain all the points of A:
import numpy_indexed as npi
B[npi.indices(B[:, :-1], A)]
I'm not familiar with numpy, and my only solution would be a for-loop, but that would be far too slow for my application. Is there a fast way to solve this problem?
Pandas => reindex:
import pandas as pd
import numpy as np
A = np.array([[8, 7, 4],
[0, 7, 7],
[4, 7, 0],
[5, 5, 8],
[8, 7, 5]])
B = np.array([[8, 7, 4, 2],
[4, 7, 0, 5],
[8, 7, 5, 6]])
df_B = (pd.DataFrame(B, columns=["x", "y", "z", "seg"])
.set_index(["x", "y", "z"])
.reindex(list(map(tuple, A)))
.reset_index())
df_B.loc[df_B.seg.isna()] = 0
B = df_B.values
print(B)
Result:
array([[8., 7., 4., 2.],
[0., 0., 0., 0.],
[4., 7., 0., 5.],
[0., 0., 0., 0.],
[8., 7., 5., 6.]])
Solving your problem just with numpy:
Case 1
You're working just with numbers:
import numpy as np
A = np.array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4],
[5, 5, 5],
[6, 6, 6],
[7, 7, 7],
[8, 8, 8],
[9, 9, 9],
[10,10, 10]
])
B = np.array([[4, 4, 4, 5],
[2, 2, 2, 12],
[6, 6, 6, 5],
[7, 7, 7, 3],
[9, 9, 9, 3]])
c = np.insert(A, 3, 0, axis = 1)
d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]*0))
print(d)
Out:
[[ 4 4 4 5]
[ 2 2 2 12]
[ 6 6 6 5]
[ 7 7 7 3]
[ 9 9 9 3]
[ 0 0 0 0] # previously 1, 1, 1, 0
[ 0 0 0 0] # previously 3, 3, 3, 0
[ 0 0 0 0] # previously 5, 5, 5, 0
[ 0 0 0 0] # previously 8, 8, 8, 0
[ 0 0 0 0]] # previously 10, 10, 10, 0
Explanation:
1º c is a copy of A with an extra column of zeros:
c = np.insert(A, 3, 0, axis = 1)
If I print c right now I will get this:
[[ 1 1 1 0]
[ 2 2 2 0]
[ 3 3 3 0]
[ 4 4 4 0]
[ 5 5 5 0]
[ 6 6 6 0]
[ 7 7 7 0]
[ 8 8 8 0]
[ 9 9 9 0]
[10 10 10 0]]
2º Create a new array from B plus the rows of c that are not in B, multiplied by 0:
d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]*0))
2.1 np.vstack((B, _)): here I replaced the c part with _ so it's easier to see the arguments vstack receives, a tuple with the two arrays you want to concatenate.
2.2 c[np.in1d(c[:,0],B[:,0], invert=True)]*0: instead of passing all of c, I pass only the rows of c selected by np.in1d(c[:,0],B[:,0], invert=True), multiplied by 0.
2.3 np.in1d(c[:,0],B[:,0], invert=True): np.in1d(c[:,0],B[:,0]) gives a boolean array telling me which x_n of c also exist in B; with invert=True I get which x_n of c do NOT exist in B. (Another way to do that inversion is the tilde operator ~, so ~np.in1d(c[:,0],B[:,0]) == np.in1d(c[:,0],B[:,0], invert=True).)
Only the x column is compared here, so this relies on the x values being unique, as they are in this example. With that, c[np.in1d(c[:,0],B[:,0], invert=True)] gives:
array([[ 1, 1, 1, 0],
[ 3, 3, 3, 0],
[ 5, 5, 5, 0],
[ 8, 8, 8, 0],
[10, 10, 10, 0]])
If I multiply by 0, I get:
array([[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])
So np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]*0)) concatenates B with the zeroed rows of c.
Here B is:
array([[ 4, 4, 4, 5],
[ 2, 2, 2, 12],
[ 6, 6, 6, 5],
[ 7, 7, 7, 3],
[ 9, 9, 9, 3]])
and the zeroed part of c is the array of 0's above. The final result is:
array([[ 4, 4, 4, 5],
[ 2, 2, 2, 12],
[ 6, 6, 6, 5],
[ 7, 7, 7, 3],
[ 9, 9, 9, 3],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0],
[ 0, 0, 0, 0]])
Case 2
If you are working with strings and numbers, you can do it this way:
import numpy as np
A = np.array([['x_1', 'y_1', 'z_1'],
['x_2', 'y_2', 'z_2'],
['x_3', 'y_3', 'z_3'],
['x_4', 'y_4', 'z_4'],
['x_5', 'y_5', 'z_5'],
['x_6', 'y_6', 'z_6'],
['x_7', 'y_7', 'z_7'],
['x_8', 'y_8', 'z_8'],
['x_9', 'y_9', 'z_9'],
['x_10', 'y_10', 'z_10']
])
B = np.array([['x_4', 'y_4', 'z_4', 5],
['x_2', 'y_2', 'z_2', 12],
['x_6', 'y_6', 'z_6', 5],
['x_7', 'y_7', 'z_7', 3],
['x_9', 'y_9', 'z_9', 3]])
c = np.insert(A, 3, 0, axis = 1)
c[np.in1d(c[:,0],B[:,0], invert=True)] = 0
d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]))
print(d)
Out:
[['x_4' 'y_4' 'z_4' '5']
['x_2' 'y_2' 'z_2' '12']
['x_6' 'y_6' 'z_6' '5']
['x_7' 'y_7' 'z_7' '3']
['x_9' 'y_9' 'z_9' '3']
['0' '0' '0' '0']
['0' '0' '0' '0']
['0' '0' '0' '0']
['0' '0' '0' '0']
['0' '0' '0' '0']]
Explanation:
1º c is a copy of A with an extra column of zeros:
c = np.insert(A, 3, 0, axis = 1)
If I print c right now I will get this:
[['x_1' 'y_1' 'z_1' '0']
['x_2' 'y_2' 'z_2' '0']
['x_3' 'y_3' 'z_3' '0']
['x_4' 'y_4' 'z_4' '0']
['x_5' 'y_5' 'z_5' '0']
['x_6' 'y_6' 'z_6' '0']
['x_7' 'y_7' 'z_7' '0']
['x_8' 'y_8' 'z_8' '0']
['x_9' 'y_9' 'z_9' '0']
['x_10' 'y_10' 'z_10' '0']]
2º Set the rows of c that don't exist in B to 0:
c[np.in1d(c[:,0],B[:,0], invert=True)] = 0
3º d is B stacked on top of the part of c that was set to 0:
d = np.vstack((B,c[np.in1d(c[:,0],B[:,0], invert=True)]))
Since in this case you're working with strings and numbers in the same array (NumPy stores them all as strings, as the output shows), you can't just multiply everything by 0. So you set those rows of c to 0 first and then select them.
Useful links:
Code that I based my answer on.
Tilde Operator.
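Note that the Case 1 and Case 2 code appends the zero rows after B instead of placing each row at the position of its matching point in A, and it compares only the x coordinate. Below is a pure-NumPy sketch that keeps A's row order and compares all three coordinates. align_rows is a hypothetical helper name, and the sketch assumes the coordinate columns of A and B share a dtype so equal points compare equal.

```python
import numpy as np

def align_rows(A, B):
    """Hypothetical helper: row i of the result is the row of B whose first
    three columns equal A[i], or a row of zeros if B has no such point."""
    keys = B[:, :-1]
    # Give every distinct (x, y, z) triple in A and B one integer label.
    _, inverse = np.unique(np.vstack([A, keys]), axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard in case the NumPy version returns a non-1-D inverse
    labels_A, labels_B = inverse[:len(A)], inverse[len(A):]
    # lookup[label] = row of B carrying that label, or -1 if B has no such point.
    lookup = np.full(inverse.max() + 1, -1)
    lookup[labels_B] = np.arange(len(B))
    rows = lookup[labels_A]
    # Start from zeros and copy in only the rows that were found.
    out = np.zeros((len(A), B.shape[1]), dtype=B.dtype)
    found = rows >= 0
    out[found] = B[rows[found]]
    return out

A = np.array([[8, 7, 4], [0, 7, 7], [4, 7, 0], [5, 5, 8], [8, 7, 5]])
B = np.array([[8, 7, 4, 2], [4, 7, 0, 5], [8, 7, 5, 6]])
print(align_rows(A, B))
```

The work is dominated by the row-wise sort inside np.unique, so no Python-level loop over the points is needed.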
I managed to solve this problem by using the numpy_indexed package, which I mentioned in my question.
The solution:
import numpy as np
A = np.array([[8, 7, 4],
[0, 7, 7],
[4, 3, 0],
[5, 5, 8],
[3, 9, 5]])
B = np.array([[3, 9, 5, 6],
[8, 7, 4, 2],
[4, 3, 0, 5]])
# Create a new, zero-filled, array C with length of A
C = np.zeros((A.shape[0], 4))
# Insert B at the beginning of C
C[:B.shape[0], :B.shape[1]] = B
print(C)
Out:
[[3, 9, 5, 6],
[8, 7, 4, 2],
[4, 3, 0, 5],
[0, 0, 0, 0],
[0, 0, 0, 0]]
# Using the numpy_indexed package, reorder the rows.
# missing=-1 makes every row of A that is not found in C use index -1,
# i.e. the last row of C, which is all zeros as long as A has more rows than B.
import numpy_indexed as npi
D = C[npi.indices(C[:, :-1], A, missing=-1)]
print(D)
Out:
[[8, 7, 4, 2],
[0, 0, 0, 0],
[4, 3, 0, 5],
[0, 0, 0, 0],
[3, 9, 5, 6]]

Swapping columns in a numpy array by given indices

I am trying to reorder the columns of a matrix according to a given array of indices:
import numpy as np
t = np.array([[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4],
[0, 1, 2, 3, 4]])
indexs = np.array([3, 4, 2, 1, 0])
check = [False for i in range(len(indexs))]
for i in range(len(indexs)):
    check[i] = True
    if (i != indexs[i] and check[indexs[i]] == False):
        check[indexs[i]] = True
        t[:, [i, indexs[i]]] = t[:, [indexs[i], i]]
print(t)
The result I want:
[[3 4 2 1 0]
[3 4 2 1 0]
[3 4 2 1 0]
[3 4 2 1 0]]
I want an array whose columns are in the order given by indexs, but I can't get it to work.
How can I achieve that?
Just index the array along the dimension you want:
t[:, indexs]
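Indexing with an integer array (advanced indexing) always returns a new array in which column j is column indexs[j] of t. If you need t itself reordered, assign the result back into it; that is safe because the right-hand side is already a copy:

```python
import numpy as np

t = np.array([[0, 1, 2, 3, 4]] * 4)
indexs = np.array([3, 4, 2, 1, 0])

reordered = t[:, indexs]   # new array: column j is t[:, indexs[j]]; t is unchanged
t[:] = t[:, indexs]        # overwrite t in place with the reordered columns
print(t)
```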
If you transpose the matrix, it's easy:
transposed = t.T
result = np.array([transposed[i] for i in indexs])
result = result.T
array([[3, 4, 2, 1, 0],
[3, 4, 2, 1, 0],
[3, 4, 2, 1, 0],
[3, 4, 2, 1, 0]])

How to apply mask from array to another matrix in numpy

How do I apply a mask in numpy to get this output?
ar2 = np.arange(1,26)[::-1].reshape([5,5]).T
ar3 = np.array([1,1,-1,-1,1])
print ar2, '\n\n', ar3
[[25 20 15 10 5]
[24 19 14 9 4]
[23 18 13 8 3]
[22 17 12 7 2]
[21 16 11 6 1]]
[ 1 1 -1 -1 1]
--apply where ar3 = 1: ar2/ar2[:,0][:, np.newaxis]
--apply where ar3 = -1: ar2/ar2[:,4][:, np.newaxis]
The result I am after is:
[[1 0 0 0 0]
[1 0 0 0 0]
[ 7 6 4 2 1]
[11 8 6 3 1]
[1 0 0 0 0]]
I have tried np.where()
I don't see why np.where shouldn't work here:
>>> np.where((ar3==1)[:, None],
... ar2 // ar2[:, [0]], # where condition is True, divide by first column
... ar2 // ar2[:, [4]]) # where condition is False, divide by last column
array([[ 1, 0, 0, 0, 0],
[ 1, 0, 0, 0, 0],
[ 7, 6, 4, 2, 1],
[11, 8, 6, 3, 1],
[ 1, 0, 0, 0, 0]])
I'm using Python 3, which is why I used // (floor division) instead of true division (/); otherwise the result would contain floats.
This computes both arrays eagerly: it evaluates ar2 // ar2[:, [0]] and ar2 // ar2[:, [4]] for all values, so it effectively holds 3 arrays the size of ar2 in memory (the result and the two temporaries). If you want it to be more memory-efficient, you need to apply the mask before doing the operation:
>>> res = np.empty_like(ar2)
>>> mask = ar3 == 1
>>> res[mask] = ar2[mask] // ar2[mask][:, [0]]
>>> res[~mask] = ar2[~mask] // ar2[~mask][:, [4]]
>>> res
array([[ 1, 0, 0, 0, 0],
[ 1, 0, 0, 0, 0],
[ 7, 6, 4, 2, 1],
[11, 8, 6, 3, 1],
[ 1, 0, 0, 0, 0]])
This computes only the necessary values which uses less memory (and is probably faster too).
Not the most elegant, but here's what I could think of.
m = ar3 == -1
a = (ar2 // ar2[:, [0]])
a[m] = (ar2 // ar2[:, [4]])[m]
print(a)
array([[ 1, 0, 0, 0, 0],
[ 1, 0, 0, 0, 0],
[ 7, 6, 4, 2, 1],
[11, 8, 6, 3, 1],
[ 1, 0, 0, 0, 0]], dtype=int32)
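A further variation on the answers above: since each row uses a single divisor, you can pick that divisor first (column 0 where ar3 == 1, the last column otherwise) and do one broadcasted floor division. Apart from the result, this only builds small per-row vectors.

```python
import numpy as np

ar2 = np.arange(1, 26)[::-1].reshape([5, 5]).T
ar3 = np.array([1, 1, -1, -1, 1])

# Column holding each row's divisor: first column where ar3 == 1, else the last one.
col = np.where(ar3 == 1, 0, ar2.shape[1] - 1)
# One divisor per row, picked with paired row/column index arrays.
divisors = ar2[np.arange(ar2.shape[0]), col]
# Broadcast the (5, 1) column of divisors across each row.
res = ar2 // divisors[:, None]
print(res)
```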

numpy matrix. move all 0's to the end of each row

I have a NumPy matrix in which some rows have leading zeros. I need to shift all the zeros to the end of their row.
E.g.
0 2 3 4
0 0 1 5
2 3 1 1
should be transformed to
2 3 4 0
1 5 0 0
2 3 1 1
Is there any nice way to do this in python numpy?
To fix rows with leading zeros:
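def fix_leading_zeros(a):
    # Assumes each row's zeros are all leading; modifies a in place
    mask = a!=0
    # Reversing the mask turns "nonzeros at the end" into "nonzeros at the start"
    flipped_mask = mask[:,::-1]
    a[flipped_mask] = a[mask]
    a[~flipped_mask] = 0
    return a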
To push all zeros in each row to the back (not only leading ones):
def push_all_zeros_back(a):
    # Based on http://stackoverflow.com/a/42859463/3293881
    valid_mask = a!=0
    flipped_mask = valid_mask.sum(1,keepdims=1) > np.arange(a.shape[1]-1,-1,-1)
    flipped_mask = flipped_mask[:,::-1]
    a[flipped_mask] = a[valid_mask]
    a[~flipped_mask] = 0
    return a
Sample runs:
In [220]: a
Out[220]:
array([[0, 2, 3, 4],
[0, 0, 1, 5],
[2, 3, 1, 1]])
In [221]: fix_leading_zeros(a)
Out[221]:
array([[2, 3, 4, 0],
[1, 5, 0, 0],
[2, 3, 1, 1]])
In [266]: a
Out[266]:
array([[0, 2, 3, 4, 0],
[0, 0, 1, 5, 6],
[2, 3, 0, 1, 0]])
In [267]: push_all_zeros_back(a)
Out[267]:
array([[2, 3, 4, 0, 0],
[1, 5, 6, 0, 0],
[2, 3, 1, 0, 0]])
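Another option sorts each row by the key a == 0 using a stable sort: False sorts before True, so the nonzeros move to the front in their original order and the zeros move to the back. Unlike the functions above, which write into a and return it, this sketch returns a new array, at the cost of a sort per row. push_zeros_back_sorted is a hypothetical name.

```python
import numpy as np

def push_zeros_back_sorted(a):
    # Stable sort on the boolean key: nonzeros (False) keep their relative
    # order and come first, zeros (True) go last.
    order = np.argsort(a == 0, axis=1, kind="stable")
    # Gather each row's elements in that order; a itself is not modified.
    return np.take_along_axis(a, order, axis=1)

a = np.array([[0, 2, 3, 4, 0],
              [0, 0, 1, 5, 6],
              [2, 3, 0, 1, 0]])
print(push_zeros_back_sorted(a))
```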
For leading zeros, a simple loop also works:
ar = np.array([[0, 2, 3, 4],
[0, 0, 1, 5],
[2, 3, 1, 1]])
for i in range(ar.shape[0]):
    for j in range(ar.shape[1]): # prevent infinite loop if row all zero
        if ar[i,0] == 0:
            ar[i]=np.roll(ar[i], -1)
ar
Out[31]:
array([[2, 3, 4, 0],
[1, 5, 0, 0],
[2, 3, 1, 1]])

calculations for different columns in a numpy array

I have a 2D array filled with some values (column 0) and zeros (the rest of the columns). I would like to do what I would do in MS Excel, but with numpy: fill the rest of the columns with values calculated from the first column. Here is an MWE:
import numpy as np
a = np.zeros(20, dtype=np.int8).reshape(4,5)
b = [1, 2, 3, 4]
b = np.array(b)
a[:, 0] = b
# don't change the first column
for column in a[:, 1:]:
    a[:, column] = column[0]+1
The expected output:
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]], dtype=int8)
The resulting output:
array([[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0],
[1, 0, 0, 0, 0]], dtype=int8)
Any help would be appreciated.
Looping is slow and there is no need to loop to produce the array that you want:
>>> a = np.ones(20, dtype=np.int8).reshape(4,5)
>>> a[:, 0] = b
>>> a
array([[1, 1, 1, 1, 1],
[2, 1, 1, 1, 1],
[3, 1, 1, 1, 1],
[4, 1, 1, 1, 1]], dtype=int8)
>>> np.cumsum(a, axis=1)
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
What went wrong
Let's start, as in the question, with this array:
>>> a
array([[1, 0, 0, 0, 0],
[2, 0, 0, 0, 0],
[3, 0, 0, 0, 0],
[4, 0, 0, 0, 0]], dtype=int8)
Now, using the code from the question, let's do the loop and see what column actually is:
>>> for column in a[:, 1:]:
... print(column)
...
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
As you can see, column is not the index of a column but an array of values; in fact, iterating over a 2-D array yields its rows, so each column here is a row of a[:, 1:]. Consequently, the following does not do what you would hope:
a[:, column] = column[0]+1
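With the all-zero array above, rows and columns look identical, so here is a small array where they differ. Iterating a 2-D array walks its first axis; to loop over columns, iterate over the transpose or over column indices.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

rows = [r.tolist() for r in m]     # iteration follows axis 0: rows
cols = [c.tolist() for c in m.T]   # the transpose's rows are m's columns
print(rows)  # [[0, 1, 2], [3, 4, 5]]
print(cols)  # [[0, 3], [1, 4], [2, 5]]
```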
Another method
If we want to loop (so that we can do something more complex), here is another approach to generating the desired array:
>>> b = np.array([1, 2, 3, 4])
>>> np.column_stack([b+i for i in range(5)])
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[4, 5, 6, 7, 8]])
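Broadcasting produces the same array without a loop or cumsum: b[:, None] is a (4, 1) column, np.arange(5) is a (5,) row of offsets, and adding them yields a (4, 5) array.

```python
import numpy as np

b = np.array([1, 2, 3, 4])
# (4, 1) column + (5,) row of offsets -> (4, 5) grid with result[i, j] = b[i] + j
result = b[:, None] + np.arange(5)
print(result)
```

Add .astype(np.int8) if you need the int8 dtype of the original array.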
Your usage of column is a little ambiguous: in for column in a[:, 1:] it holds array values, but in the body it is used as a column index. You can loop over the column indices instead:
for column in range(1, a.shape[1]):
    a[:, column] = a[:, column-1]+1
a
#array([[1, 2, 3, 4, 5],
# [2, 3, 4, 5, 6],
# [3, 4, 5, 6, 7],
# [4, 5, 6, 7, 8]], dtype=int8)
