I have a tensor as follows:
arr = [[1.5,0.2],[2.3,0.1],[1.3,0.21],[2.2,0.09],[4.4,0.8]]
I would like to group the small arrays whose first elements differ by less than 0.3 and whose second elements differ by less than 0.03.
For example, [1.5,0.2] and [1.3,0.21] should belong to the same category: their first elements differ by 0.2 < 0.3 and their second elements by 0.01 < 0.03.
I want a tensor that looks like this:
arr = {[[1.5,0.2],[1.3,0.21]],[[2.3,0.1],[2.2,0.09]]}
How can I do this in TensorFlow? Eager mode is OK.
I found a way, but it is a bit ugly and slow:
samples = np.array([[1.5,0.2],[2.3,0.1],[1.3,0.2],[2.2,0.09],[4.4,0.8],[2.3,0.11]],dtype=np.float32)
ini_samples = samples
samples = tf.split(samples,2,1)
a = samples[0]
b = samples[1]
find_match1 = tf.reduce_sum(tf.abs(tf.expand_dims(a,0) - tf.expand_dims(a,1)),2)
a = tf.logical_and(tf.greater(find_match1, tf.zeros_like(find_match1)),tf.less(find_match1, 0.3*tf.ones_like(find_match1)))
find_match2 = tf.reduce_sum(tf.abs(tf.expand_dims(b,0) - tf.expand_dims(b,1)),2)
b = tf.logical_and(tf.greater(find_match2, tf.zeros_like(find_match2)),tf.less(find_match2, 0.03*tf.ones_like(find_match2)))
x,y = tf.unique(tf.reshape(tf.where(tf.logical_or(a,b)),[1,-1])[0])
r = tf.gather(ini_samples, x)
Does TensorFlow have more elegant functions for this?
A regular tensor cannot hold "groups" of vectors with different sizes. Instead, you can make a "group id" tensor that assigns each vector to a group according to your criteria. What makes this a bit more complicated is that you have to "fuse" groups with common elements, which I think can only be done with a loop. This code does something like that:
import tensorflow as tf

def make_groups(correspondences):
    # Replace each True entry with its column index (False entries become 0)
    m = tf.cast(correspondences, tf.int32) * tf.range(tf.shape(correspondences)[0])
    # Pick the largest index for each row
    r = tf.reduce_max(m, axis=1)
    # While loop accounts for transitive correspondences
    # (e.g. if A and B go together and B and C go together, then A, B and C go together)
    # The loop makes sure every element gets the largest common group id
    r_prev = -tf.ones_like(r)
    r, _ = tf.while_loop(lambda r, r_prev: tf.reduce_any(tf.not_equal(r, r_prev)),
                         lambda r, r_prev: (tf.gather(r, r), tf.identity(r)),
                         [r, r_prev])
    # Use unique indices to make sequential group ids starting from 0
    return tf.unique(r)[1]

# Test
with tf.Graph().as_default(), tf.Session() as sess:
    arr = tf.constant([[1.5 , 0.2 ],
                       [2.3 , 0.1 ],
                       [1.3 , 0.21],
                       [2.2 , 0.09],
                       [4.4 , 0.8 ],
                       [1.1 , 0.23]])
    a = arr[:, 0]
    b = arr[:, 1]  # second column (the original line read column 0 again)
    # "|" links vectors that are close in either column; use "&" to require both
    cond = (tf.abs(a - a[:, tf.newaxis]) < 0.3) | (tf.abs(b - b[:, tf.newaxis]) < 0.03)
    groups = make_groups(cond)
    print(sess.run(groups))
    # [0 1 0 1 2 0]
So in this case, the groups would be:
[1.5, 0.2], [1.3, 0.21] and [1.1, 0.23]
[2.3, 0.1] and [2.2, 0.09]
[4.4, 0.8]
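For readers without a TensorFlow 1.x session at hand, the group-id idea can also be sketched in plain NumPy. This is not the code above but a swapped-in label propagation variant: every row repeatedly takes the largest label among the rows it corresponds to, until nothing changes. The function name group_ids is made up for this illustration, and the two thresholds are combined with & (both must hold), as the question describes; treat it as a sketch under those assumptions.

```python
import numpy as np

def group_ids(correspondences):
    """Assign a group id to each row of a symmetric boolean matrix
    whose diagonal is True (hypothetical helper, not a library function)."""
    n = correspondences.shape[0]
    labels = np.arange(n)
    while True:
        # Each row takes the largest label among the rows it corresponds to
        new_labels = np.where(correspondences, labels[np.newaxis, :], -1).max(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    # Renumber the labels as 0, 1, 2, ... in order of first appearance
    _, first_seen, inverse = np.unique(labels, return_index=True, return_inverse=True)
    rank = np.empty_like(first_seen)
    rank[np.argsort(first_seen)] = np.arange(len(first_seen))
    return rank[inverse]

arr = np.array([[1.5, 0.2],
                [2.3, 0.1],
                [1.3, 0.21],
                [2.2, 0.09],
                [4.4, 0.8],
                [1.1, 0.23]])
a = arr[:, 0]
b = arr[:, 1]
# Assumption: both thresholds must hold for two vectors to correspond
cond = (np.abs(a - a[:, np.newaxis]) < 0.3) & (np.abs(b - b[:, np.newaxis]) < 0.03)
groups = group_ids(cond)
print(groups)
# [0 1 0 1 2 0]
```

The actual groups can then be collected as a Python list of arrays, for example [arr[groups == g] for g in range(groups.max() + 1)].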
Related
I have the following numpy array :
A = np.array([[1,2,3,4,5],
[15,25,35,45,55]])
I would like to create a new array with the same shape by dividing each row by its last element.
The desired output would be:
B = np.array([[0.2,0.4,0.6,0.8,1],
[0.27272727,0.45454545,0.63636364,0.81818182,1]])
Any ideas?
Slice the last element while keeping the dimensions and divide:
B = A/A[:,[-1]] # slice with [] to keep the dimensions
or, better, to avoid an unnecessary copy:
B = A/A[:,-1,None]
output:
array([[0.2 , 0.4 , 0.6 , 0.8 , 1. ],
[0.27272727, 0.45454545, 0.63636364, 0.81818182, 1. ]])
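The difference between the two slicing styles can be checked directly. The short sketch below (the variable names last_fancy and last_view are only for illustration) shows that both produce a (2, 1) column that broadcasts against A, but only the list-index form copies data.

```python
import numpy as np

A = np.array([[1, 2, 3, 4, 5],
              [15, 25, 35, 45, 55]])

last_fancy = A[:, [-1]]     # list index: advanced indexing, returns a copy
last_view = A[:, -1, None]  # basic indexing plus a new axis, returns a view

print(last_fancy.shape, last_view.shape)  # (2, 1) (2, 1)
print(np.shares_memory(A, last_fancy))    # False
print(np.shares_memory(A, last_view))     # True

# Either column broadcasts across the rows of A
B = A / last_view
```

For an array this small the copy is irrelevant; it only starts to matter for large inputs.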
You mean this?
B = np.array([[A[i][j]/A[i][len(A[i])-1] for j in range(0,len(A[i]))] for i in range(0,len(A))])
You can achieve this using:
[list(map(lambda i: i / a[-1], a)) for a in A]
Result:
[[0.2, 0.4, 0.6, 0.8, 1.0], [0.2727272727272727, 0.45454545454545453, 0.6363636363636364, 0.8181818181818182, 1.0]]
Adding to @mozway's answer, it seems to be faster to take the last column and then add an axis with:
B = A/A[:,-1][:,None]
for instance.
See the benchmark: [benchmark plot not included]
I have done the code below:
size = int(len(np.where(features_temp != 0)[0]) * k_value)
idx = np.random.choice(np.where(np.in1d(features_temp, np.array(list_amino_numbers)))[0], size=size)
for i in idx:
    features_temp[i] = np.random.choice(list_amino_numbers, p=probabilities[:, list_amino_numbers.index(features_temp[i])].tolist())
This code works well, but I think it could run faster, mainly in the for loop. Is there some operation that could replace the for loop?
Code explanation: I am trying to change the values of features_temp at the indexes whose values are different from 0. Each index can be changed many times, and the number of changes depends on the number of values different from 0 and a constant (the chosen indexes are saved in idx). Each new value depends on a matrix (probabilities), in which row i and column j define the probability of j being changed to i (so I need to use the column values).
Input Example:
features_temp = np.array([3, 2, 0, 2, 1])
k_value = 1.5
list_amino_numbers = [1, 3, 2]
probabilities = np.array([[0.9, 0.2, 0.3], [0.07, 0.7, 0.5], [0.03, 0.1, 0.2]])
In this case, size = 6.
I'm currently writing some code in which I have to extract points from a numpy array.
For example, given [[1,1], [0.6,0.6], [0,0]], the extracted points [x,y] must satisfy x >= 0.5 and y >= 0.5.
I've tried to use numpy.extract with the condition arr[0]>=0.5 & arr[1]>=0.5, but that does not seem to work.
It applies the condition to all the elements, whereas I just want it applied to the points inside my array.
Thanks in advance!
You can combine multiple conditions into a boolean mask to select rows of an array as follows:
import numpy as np
a = np.array([[1, 1] , [0.6, 0.6], [0, 0]])
new = a[(a[:, 0] >= 0.5) & (a[:, 1] >= 0.5)]
Results:
array([[1. , 1. ],
[0.6, 0.6]])
The first condition filters on column 0 and the second condition filters on column 1. Only rows where both conditions are met will be in the results.
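To make the intermediate steps visible, here is the same filter split into its parts (the names x_ok, y_ok and mask are only for illustration). Two details differ from the attempt in the question: a[:, 0] selects the whole first column, whereas a[0] selects the first point, and each comparison needs its own parentheses (or its own variable) because & binds more tightly than >= in Python.

```python
import numpy as np

a = np.array([[1, 1], [0.6, 0.6], [0, 0]])

x_ok = a[:, 0] >= 0.5   # one boolean per point, from column 0
y_ok = a[:, 1] >= 0.5   # one boolean per point, from column 1
mask = x_ok & y_ok      # element-wise AND of the two conditions

print(mask)
# [ True  True False]

new = a[mask]           # keeps only the rows where mask is True
```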
I would do it the following way: first, look for the rows fulfilling the condition:
import numpy as np
a = np.array([[1,1] , [0.6,0.6], [0,0]])
rows = np.apply_along_axis(lambda x:x[0]>=0.5 and x[1]>=0.5,1,a)
then use it for indexing:
out = a[rows]
print(out)
output:
[[1. 1. ]
[0.6 0.6]]
It can be solved using a Python list comprehension.
import numpy as np
p = [[1,1] , [0.6,0.6], [0,0]]
result = np.array([x for x in p if x[0]>=0.5 and x[1]>=0.5])
You can read more about list comprehensions in the Python documentation.
You can also try this:
p = np.array(p)
result = p[np.all(p>=0.5, axis=1)]
For machine learning, I'm applying the Parzen window algorithm.
I have an array (m,n). I would like to check on each row if any of the values is > 0.5 and if each of them is, then I would return 0, otherwise 1.
I would like to know if there is a way to do this with numpy, without a loop.
You can use np.all with axis=1 on a boolean array.
import numpy as np
arr = np.array([[0.8, 0.9], [0.1, 0.6], [0.2, 0.3]])
print(np.all(arr>0.5, axis=1))
# [ True False False]
import numpy as np
# Value Initialization
a = np.array([0.75, 0.25, 0.50])
y_predict = np.zeros((1, a.shape[0]))
#If the value is greater than 0.5, the value is 1; otherwise 0
y_predict = (a > 0.5).astype(float)
I have an array (m,n). I would like to check on each row if any of the values is > 0.5
That will be stored in b:
import numpy as np
a = np.random.rand(4, 3)  # placeholder: substitute any np.array of shape (m, n)
b = np.any(a > 0.5, axis=1)
and if each of them is, then I would return 0, otherwise 1.
I'm assuming you mean 'and if this is the case for all rows'. In this case:
c = 1 - 1 * np.all(b)
c contains your return value, either 0 or 1.
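Since the question can be read as either "any value in the row" or "every value in the row", the sketch below computes a per-row 0/1 flag for both readings on the small array used earlier. The names flag_any and flag_all are made up for this example; which reading applies depends on what the algorithm actually needs.

```python
import numpy as np

arr = np.array([[0.8, 0.9], [0.1, 0.6], [0.2, 0.3]])

any_above = np.any(arr > 0.5, axis=1)  # True if at least one value in the row is > 0.5
all_above = np.all(arr > 0.5, axis=1)  # True only if every value in the row is > 0.5

# 0 where the condition holds for the row, 1 otherwise
flag_any = 1 - any_above.astype(int)
flag_all = 1 - all_above.astype(int)

print(flag_any)  # [0 0 1]
print(flag_all)  # [0 1 1]
```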
I have two 2D numpy arrays that are used to plot simulation results.
The first column of both arrays a and b contains the time intervals and the second column contains the data to be plotted. The two arrays have different shapes: a is (500,2) and b is (600,2). I want to compare these two arrays by their first column and create a third array that keeps every row of a and adds, as a third column, the value from b whose time matches. If no match is found, add 0 to the third column.
Is there any numpy trick to do this?
For instance:
a=[[0.002,0.998],
[0.004,0.997],
[0.006,0.996],
[0.008,0.995],
[0.010,0.993]]
b= [[0.002,0.666],
[0.004,0.665],
[0.0041,0.664],
[0.0042,0.664],
[0.0043,0.664],
[0.0044,0.663],
[0.0045,0.663],
[0.0005,0.663],
[0.006,0.663],
[0.0061,0.662],
[0.008,0.661]]
expected output
c= [[0.002,0.998,0.666],
[0.004,0.997,0.665],
[0.006,0.996,0.663],
[0.008,0.995,0.661],
[0.010,0.993, 0 ]]
A quick solution I can think of is:
import numpy as np
a = np.array([[0.002, 0.998],
[0.004, 0.997],
[0.006, 0.996],
[0.008, 0.995],
[0.010, 0.993]])
b = np.array([[0.002, 0.666],
[0.004, 0.665],
[0.0041, 0.664],
[0.0042, 0.664],
[0.0043, 0.664],
[0.0044, 0.663],
[0.0045, 0.663],
[0.0005, 0.663],
[0.0006, 0.663],
[0.00061, 0.662],
[0.0008, 0.661]])
c = []
for row in a:
    # Indices of the rows of b whose time equals this row's time
    index = np.where(b[:, 0] == row[0])[0]
    if np.size(index) != 0:
        c.append([row[0], row[1], b[index[0], 1]])
    else:
        c.append([row[0], row[1], 0])
print(c)
As pointed out in the comments above, there seems to be a data entry error
import numpy as np
i = np.intersect1d(a[:,0], b[:,0])
overlap = np.vstack([i, a[np.in1d(a[:,0], i), 1], b[np.in1d(b[:,0], i), 1]]).T
underlap = np.setdiff1d(a[:,0], b[:,0])
underlap = np.vstack([underlap, a[np.in1d(a[:,0], underlap), 1], underlap*0]).T
fast_c = np.vstack([overlap, underlap])
This works by taking the intersection of the first column of a and b using intersect1d, and then using in1d to cross-reference that intersection with the second columns.
vstack stacks the elements of the input vertically, and the transpose is needed to get the right dimensions (very fast operation).
Then find times in a that are not in b using setdiff1d, and complete the result by putting 0s in the third column.
This prints out
array([[ 0.002, 0.998, 0.666],
[ 0.004, 0.997, 0.665],
[ 0.006, 0.996, 0. ],
[ 0.008, 0.995, 0. ],
[ 0.01 , 0.993, 0. ]])
The following works both for numpy arrays and simple python lists.
c = [[*x, y[1]] for x in a for y in b if x[0] == y[0]]
d = [[*x, 0] for x in a if x[0] not in [y[0] for y in b]]
c.extend(d)
Someone braver than I am could try to make this one line.
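The answers above compare the time values with exact equality. As a hedged alternative, the sketch below matches them with np.isclose through broadcasting, which keeps the rows in the order of a and tolerates small floating-point differences. It uses the a and b from the question's example; the default tolerances of np.isclose are an assumption that may need tuning for real data, and the approach builds a len(a) by len(b) boolean matrix, which is fine for 500 by 600 but not for very large arrays.

```python
import numpy as np

a = np.array([[0.002, 0.998],
              [0.004, 0.997],
              [0.006, 0.996],
              [0.008, 0.995],
              [0.010, 0.993]])
b = np.array([[0.002, 0.666],
              [0.004, 0.665],
              [0.0041, 0.664],
              [0.0042, 0.664],
              [0.0043, 0.664],
              [0.0044, 0.663],
              [0.0045, 0.663],
              [0.0005, 0.663],
              [0.006, 0.663],
              [0.0061, 0.662],
              [0.008, 0.661]])

# match[i, j] is True when the time in a[i] is (almost) equal to the time in b[j]
match = np.isclose(a[:, [0]], b[:, 0])
has_match = match.any(axis=1)
first_match = match.argmax(axis=1)  # index of the first match (0 if there is none)

# Take the matching value from b, or 0 where a's time does not occur in b
third = np.where(has_match, b[first_match, 1], 0.0)
c = np.column_stack([a, third])
print(c)
```

With this input, c has the same rows as the expected output in the question, with 0 in the last row because 0.010 does not occur in b.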