I'm using the following code to find the top-k matches using PyTorch:
def find_top(self, x, y, n_neighbors, unit_vectors=False, cuda=False):
    if not unit_vectors:
        x = __to_unit_torch__(x, cuda=cuda)
        y = __to_unit_torch__(y, cuda=cuda)
    with torch.no_grad():
        d = 1. - torch.matmul(x, y.transpose(0, 1))
        values, indices = torch.topk(d, n_neighbors, dim=1, largest=False, sorted=True)
        return indices.cpu().numpy()
Unfortunately, it is throwing the following error:
values, indices = torch.topk(d, n_neighbors, dim=1, largest=False, sorted=True)
RuntimeError: invalid argument 5: k not in range for dimension at /pytorch/aten/src/THC/generic/THCTensorTopK.cu:23
The size of d is (1793, 1). What am I missing?
This error occurs when you call torch.topk with a k larger than the size of the dimension you select. Your d has shape (1793, 1), so along dim=1 there is only a single element, and any n_neighbors greater than 1 is out of range. Reduce n_neighbors to at most d.size(1) (or make sure y contains more than one vector, so that d has more columns) and it should run fine.
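For instance, a minimal sketch of the boundary (with a hypothetical n_neighbors):

import torch

n_neighbors = 5
d = torch.rand(1793, 1)
values, indices = torch.topk(d, 1, dim=1)   # fine: k <= d.size(1) == 1
# torch.topk(d, n_neighbors, dim=1)         # RuntimeError: k not in range for dimension

# One possible guard: clamp k to the size of the dimension
k = min(n_neighbors, d.size(1))
values, indices = torch.topk(d, k, dim=1, largest=False, sorted=True)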
I have a 1D tensor of length 128 with logits in it.
For a custom loss, I'm trying to replace the 3 highest values with 1.0 and the rest with 0.0. This is inside a @tf.function, so I can't convert the tensor to numpy and do the manipulation there.
I've come up with:
top_3 = tf.math.top_k(code, k=3)
indices = top_3.indices
updates = tf.ones_like(indices)
new_code = tf.scatter_nd(indices, updates, tf.constant([128]))
But it gives me the error:
ValueError: Dimensions [3,1) of input[shape=[?]] = [] must match dimensions [0,1) of updates[shape=[3]] = [3]: Shapes must be equal rank, but are 0 and 1 for '{{node ScatterNd}} = ScatterNd[T=DT_INT32, Tindices=DT_INT32](TopKV2:1, ones_like_1, Const_3)' with input shapes: [3], [3], [1].
which I don't understand, because indices should have length 3, and so does updates. What's the problem?
The indices are the problem: scatter operations on a 1-D target expect indices of shape [num_updates, 1] (one coordinate per update), not a flat [3] vector. Adding that inner axis and scattering into a zeros tensor works. Try:
import tensorflow as tf
code = tf.random.normal((128,))
top_3 = tf.math.top_k(code, k=3)
indices = top_3.indices
updates = tf.ones_like(indices, dtype=tf.float32)
new_code = tf.zeros_like(code)
new_code = tf.tensor_scatter_nd_update(new_code, indices[..., None], updates)
print(new_code)
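If you prefer a one-liner, an equivalent sketch using tf.one_hot (this relies on the top-k indices being distinct, which tf.math.top_k guarantees here):

new_code = tf.reduce_sum(tf.one_hot(top_3.indices, depth=128), axis=0)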
I keep getting the error IndexError: tuple index out of range and I am not sure what is happening. My code was working just fine, but when I restarted the Jupyter notebook I started receiving this error.
This is my code:
X = df.Tweet
y = df.target

from sklearn import linear_model
import pyswarms as ps

# Create an instance of the classifier
classifier = linear_model.LogisticRegression()

# Define objective function
def f_per_particle(m, alpha):
    total_features = X.shape[1]
    # Get the subset of the features from the binary mask
    if np.count_nonzero(m) == 0:
        X_subset = X
    else:
        X_subset = X[:, m == 1]
    # Perform classification and store performance in P
    classifier.fit(X_subset, y)
    P = (classifier.predict(X_subset) == y).mean()
    # Compute for the objective function
    j = (alpha * (1.0 - P)
         + (1.0 - alpha) * (1 - (X_subset.shape[1] / total_features)))
    return j

[some more code]

options = {'c1': 0.5, 'c2': 0.5, 'w': 0.9, 'k': 30, 'p': 2}

# Call instance of PSO
dimensions = X.shape[1]  # dimensions should be the number of features
optimizer = ps.discrete.BinaryPSO(n_particles=30, dimensions=dimensions, options=options)

# Perform optimization
cost, pos = optimizer.optimize(f, iters=1000)
I received the following traceback:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-76-bea8cf064cd2> in <module>
2
3 # Call instance of PSO
----> 4 dimensions = X.shape[1] # dimensions should be the number of features
5 optimizer = ps.discrete.BinaryPSO(n_particles=30, dimensions=dimensions, options=options)
6
IndexError: tuple index out of range
It is not absolutely clear, but it seems that your df variable is a Pandas DataFrame and your df.Tweet is a Pandas Series.
In that case, being a Series, your X has only one dimension, so the tuple X.shape has a single element, X.shape[0]. That is the reason for the index-out-of-range exception in your code: X.shape[1] only exists when the variable is two-dimensional, i.e. a DataFrame.
More information: https://www.geeksforgeeks.org/python-pandas-series-shape/
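A quick way to see the difference (a minimal sketch with made-up data):

import pandas as pd

df = pd.DataFrame({"Tweet": ["hello", "world"], "target": [0, 1]})
print(df.shape)        # (2, 2) -- a DataFrame has two dimensions
print(df.Tweet.shape)  # (2,)   -- a Series has one, so df.Tweet.shape[1] raises IndexError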
I have read through the various posts on ValueError, but I have not found a satisfactory solution. Please, can anyone tell me what I am doing wrong?
Code:
assert(type(images) == list)
# assert(type(images[0]) == np.ndarray)
# assert(len(images[0].shape) == 3)
# assert(np.max(images[0]) > 10)
# assert(np.min(images[0]) >= 0.0)
inps = []
for img in images:
    img = img.astype(np.float32)
    inps.append(np.expand_dims(img, 0))
bs = 100
with tf.Session() as sess:
    preds = []
    n_batches = int(math.ceil(float(len(inps)) / float(bs)))
    for i in range(n_batches):
        sys.stdout.write(".")
        sys.stdout.flush()
        inp = inps[(i * bs):min((i + 1) * bs, len(inps))]
        inp = np.concatenate(inp, 0)
        pred = sess.run(softmax, {'ExpandDims:0': inp})
        preds.append(pred)
    preds = np.concatenate(preds, 0)
    scores = []
    for i in range(splits):
        part = preds[(i * preds.shape[0] // splits):((i + 1) * preds.shape[0] // splits), :]
        kl = part * (np.log(part) - np.log(np.expand_dims(np.mean(part, 0), 0)))
        kl = np.mean(np.sum(kl, 1))
        scores.append(np.exp(kl))
    return np.mean(scores), np.std(scores)
Error:
File "/content/Inception-Score/inception_score.py", line 45, in get_inception_score
    preds = np.concatenate(preds, 0)
ValueError: need at least one array to concatenate
The message tells you that the sequence you passed to concatenate is empty. You specified the axis to concatenate on, but the list of arrays contains nothing, hence "need at least one array to concatenate".
np.concatenate() needs at least one array in its first argument, as detailed in the NumPy documentation. It looks like preds is an empty list, i.e. the batch loop never appended a prediction. Check why that loop did not run; most likely inps (and therefore images) is empty.
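A minimal reproduction of the error (a sketch, outside the original script):

import numpy as np

preds = []                # nothing was ever appended
np.concatenate(preds, 0)  # ValueError: need at least one array to concatenate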
The problem seems to be in np.concatenate: it expects a non-empty sequence of arrays, and you are not providing that.
# syntax
numpy.concatenate((a1, a2, ...), axis=0, out=None)

Parameters:
a1, a2, ... : sequence of array_like
    The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
axis : int, optional
    The axis along which the arrays will be joined. If axis is None, arrays are flattened before use. Default is 0.
out : ndarray, optional
    If provided, the destination to place the result. The shape must be correct, matching that of what concatenate would have returned if no out argument were specified.

Returns:
ndarray
    The concatenated array.
Check what preds contains before the concatenate call.
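For instance, a defensive check just before the failing line (a sketch reusing the question's variable names):

if len(preds) == 0:
    raise ValueError("preds is empty -- did the batch loop run? Check that `images` is non-empty.")
preds = np.concatenate(preds, 0)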
My goal is to allow my TensorFlow Dataset pipeline to accept inputs of nearly arbitrary size, which will be converted into uniformly sized samples (known at 'compile' time) that outnumber the originals. Thus I have a py_func (similar in idea to [1], mapping one to many) which aims to return a dataset for use in flat_map:
def split_fn(x, y):
    """ Splits X into a number of subsamples, each labeled y"""
    full_width = x.shape[1]
    full_height = x.shape[0]
    print(full_width)
    print(full_height)
    slice_width = SLICE_WIDTH
    slice_height = SLICE_HEIGHT
    # The splits created by these offsets cover the complete input image
    offsets1 = [[x, 0] for x in range(0, full_width - slice_width, slice_width)]
    if full_width % slice_width != 0:
        offsets1.append([full_width - slice_width, 0])
    # The splits from these offsets are random, intended for data augmentation
    offsets2 = [[x, 0] for x in random.sample(range(0, full_width - slice_width), 5)]
    # Combine the two lists of offsets
    offsets = offsets1 + offsets2
    image = x.reshape(1, full_height, full_width, 1)
    # This creates a list of the slices corresponding to the offsets
    ts = list(map(lambda offset: tf.image.crop_to_bounding_box(image,
                                                               offset[1],
                                                               offset[0],
                                                               slice_height,
                                                               slice_width),
                  offsets))
    # Create and concatenate a dataset for each of the samples
    datasets = map(lambda d: tf.data.Dataset.from_tensors((d, y)), ts)
    ds = reduce((lambda x, y: x.concatenate(y)), datasets)
    return ds
However, at the line where I define offsets1 I get:
TypeError: __index__ returned non-int (type NoneType)
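Presumably this is because the static shape of x is not fully known inside the pipeline, so x.shape[1] is None. A minimal sketch that reproduces the same failure (assuming TF 1.x, as in the rest of the question; 100 stands in for SLICE_WIDTH):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, None])
full_width = x.shape[1]                      # Dimension(None): width unknown at graph time
offsets1 = list(range(0, full_width - 100, 100))
# TypeError: __index__ returned non-int (type NoneType)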
I've tried to fix this by wrapping the function in a py_func which returns a dataset:
dataset = dataset.flat_map(
    lambda image, label: tuple(tf.py_func(
        split_fn, [image, label], [tf.data.Dataset])))
However, I can't seem to get this to work correctly:
TypeError: Expected DataType for argument 'Tout' not <class 'tensorflow.python.data.ops.dataset_ops.Dataset'>.
What can I do to get this to work?
Thank you
I am trying to implement a 1D k-nearest-neighbour density estimate:
# nearest neighbors estimate
def nearest_n(x, k, data):
    # Order dataset
    #data = np.sort(data, kind='mergesort')
    nnb = []
    # iterate over all data and get k nearest neighbours around x
    for n in data:
        if nnb.__len__() < k:
            nnb.append(n)
        else:
            for nb in np.arange(0, k):
                if np.abs(x - n) < np.abs(x - nnb[nb]):
                    nnb[nb] = n
                    break
    nnb = np.array(nnb)
    # get volume(distance) v of k nearest neighbours around x
    v = nnb.max() - nnb.min()
    v = k / (data.__len__() * v)
    return v

interval = np.arange(-4.0, 8.0, 0.1)
plt.figure()
for k in (2, 8, 35):
    plt.plot(interval, nearest_n(interval, k, train_data), label=str(k))
plt.legend()
plt.show()
Which throws:
File "x", line 55, in nearest_n
if np.abs(x-n) < np.abs(x-nnb[nb]):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I know the error comes from passing the whole interval array into the function via plot(), but I am not sure how to avoid it in a function that uses the operators <, ==, and >.
'data' comes from a 1D txt file containing floats.
I tried using vectorize:
nearest_n = np.vectorize(nearest_n)
which results in:
line 50, in nearest_n
for n in data:
TypeError: 'numpy.float64' object is not iterable
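I guess np.vectorize broadcasts over every argument, including data, so inside the function each data value arrives as a single float. Excluding the extra arguments might avoid that (an untested sketch):

nearest_n_vec = np.vectorize(nearest_n, excluded={1, 2})  # leave k and data un-vectorized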
Here is an example, let's say:
data = [0.5,1.7,2.3,1.2,0.2,2.2]
k = 2
nearest_n(1.5, k, data) should then lead to
nnb = [1.2, 1.7]
v = 0.5
and return 2/(6*0.5) = 2/3.
The function does run for scalar inputs; for example, nearest_n(2.0, 4, data) gives 0.0741586011463.
You're passing in np.arange(-4, 8, 0.1) as your x, which is an array of values. Since subtracting a scalar from an array does element-wise subtraction, x - n is an array of the same length as x, in this case 120 elements; the same goes for x - nnb[nb]. So the result of your comparison is a 120-element array of booleans, one per element-wise comparison of np.abs(x - n) against np.abs(x - nnb[nb]). Such an array can't be used directly as a conditional; you would need to collapse it to a single boolean (using all(), any(), or by rethinking your code).
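A minimal sketch of what happens:

import numpy as np

x = np.arange(-4.0, 8.0, 0.1)             # 120 elements
cond = np.abs(x - 1.7) < np.abs(x - 0.5)  # element-wise comparison
print(cond.shape)                         # (120,) -- an array of booleans
if cond:                                  # ValueError: truth value ... is ambiguous
    pass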
plt.figure()
X = np.arange(-4.0, 8.0, 0.1)
for k in [2, 8, 35]:
    Y = []
    for n in X:
        Y.append(nearest_n(n, k, train_data))
    plt.plot(X, Y, label=str(k))
plt.show()
works fine. I thought pyplot.plot would do exactly this for me already, but I guess it does not...