How to calculate mean average precision (mAP) using TensorFlow?


I want to use TensorFlow to calculate the mAP (mean average precision) of hash codes, but I don't know how to express the calculation directly with tensor operations.
The code using NumPy is the following:
import numpy as np
import time
import os
# read train and test binary codes
CURRENT_DIR = os.getcwd()
def getNowTime():
    # timestamp for the progress messages (this helper was not included in the original snippet)
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
def getCode(train_codes, train_groudTruth, test_codes, test_groudTruth):
    line_number = 0
    with open(CURRENT_DIR + '/result.txt', 'r') as f:
        for line in f:
            temp = line.strip().split('\t')
            if line_number < 10000:
                test_codes.append([i if i == 1 else -1 for i in map(int, list(temp[0]))])  # change to -1, 1
                list2 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
                list2[int(temp[1])] = 1
                test_groudTruth.append(list2)  # get test ground truth (0-9)
            else:
                train_codes.append([i if i == 1 else -1 for i in map(int, list(temp[0]))])  # change to -1, 1
                list2 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
                list2[int(temp[1])] = 1
                train_groudTruth.append(list2)  # get train ground truth (0-9)
            line_number += 1
    print('read data finish')
def getHammingDist(code_a, code_b):
    dist = 0
    for i in range(len(code_a)):
        if code_a[i] != code_b[i]:
            dist += 1
    return dist
if __name__ == '__main__':
    print(getNowTime(), 'start!')
    train_codes = []
    train_groudTruth = []
    test_codes = []
    test_groudTruth = []
    # get g.t. and binary code
    getCode(train_codes, train_groudTruth, test_codes, test_groudTruth)
    train_codes = np.array(train_codes)
    train_groudTruth = np.array(train_groudTruth)
    test_codes = np.array(test_codes)
    test_groudTruth = np.array(test_groudTruth)
    numOfTest = 10000
    # generate Hamming matrix and g.t. matrix, 10000*50000
    gt_martix = np.dot(test_groudTruth, np.transpose(train_groudTruth))
    print(getNowTime(), 'gt_martix finish!')
    # for -1/1 codes, dot value = bits - 2 * Hamming distance (larger = closer)
    ham_martix = np.dot(test_codes, np.transpose(train_codes))
    print('ham_martix finish!')
    # sort Hamming matrix: argsort returns the indices that would sort each row (ascending)
    sorted_ham_martix_index = np.argsort(ham_martix, axis=1)
    # calculate mAP
    print('sort ham_matrix finished, start calculate mAP')
    apall = np.zeros((numOfTest, 1), np.float64)
    for i in range(numOfTest):
        x = 0.0
        p = 0
        test_oneLine = sorted_ham_martix_index[i, :]
        length = test_oneLine.shape[0]
        num_return_NN = 5000  # top 5000
        for j in range(num_return_NN):
            if gt_martix[i][test_oneLine[length - j - 1]] == 1:  # reverse: largest dot value first
                x += 1
                p += x / (j + 1)
        if p == 0:
            apall[i] = 0
        else:
            apall[i] = p / x
    mAP = np.mean(apall)
    print('mAP:', mAP)
I want to rewrite the code above using tensor operations (like tf.equal(), tf.reduce_sum(), and so on).
For example, this is how I calculate the validation accuracy of images:
logits = self._model(x_valid)
valid_preds = tf.argmax(logits, axis=1)
valid_preds = tf.to_int32(valid_preds)
self.valid_acc = tf.equal(valid_preds, y_valid)
self.valid_acc = tf.to_int32(self.valid_acc)
self.valid_acc = tf.to_float(tf.reduce_sum(self.valid_acc))/tf.to_float(self.batch_size)
I want to calculate the mAP (mean average precision) of the hash codes with TensorFlow in the same way (with tf.XX operations).
How can I do this? Thanks!

You can just calculate the y_score (or predictions) and then use sklearn.metrics to calculate the average precision:
from sklearn.metrics import average_precision_score
predictions = model.predict(x_test)
average_precision_score(y_test, predictions)

If you just want to calculate average precision based on the validation set predictions, you can use the vector of predicted probabilities and the vector of true labels in this scikit-learn function.
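For the hash-code retrieval in the question, one way to apply average_precision_score is per query: database items of the same class are the positives, the code similarity is the score, and mAP is the mean over queries. A minimal sketch, assuming ±1 codes and one-hot labels as produced by getCode; the name map_with_sklearn is hypothetical. Unlike the loop in the question, it ranks the whole database (no top-5000 cutoff), and tied scores share a single threshold, so the values can differ slightly from the loop.

```python
import numpy as np
from sklearn.metrics import average_precision_score


def map_with_sklearn(test_codes, test_labels, train_codes, train_labels):
    """Mean of per-query average precision (hypothetical helper)."""
    test_codes, train_codes = np.asarray(test_codes), np.asarray(train_codes)
    # higher dot product of +/-1 codes means smaller Hamming distance
    scores = test_codes @ train_codes.T
    # 1 where the query and the database item share a class
    relevant = ((np.asarray(test_labels) @ np.asarray(train_labels).T) > 0).astype(int)
    aps = []
    for y_true, y_score in zip(relevant, scores):
        if y_true.sum() == 0:
            aps.append(0.0)  # no relevant item: AP = 0, as in the question's loop
        else:
            aps.append(average_precision_score(y_true, y_score))
    return float(np.mean(aps))
```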
If you really want to use a TensorFlow function, there is tf.metrics.average_precision_at_k (available as tf.compat.v1.metrics.average_precision_at_k in TensorFlow 2).
For more info about average precision you can see this article.
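If you want the question's own computation expressed with tensor ops, here is a minimal TensorFlow 2.x (eager mode) sketch that mirrors the NumPy loop: matrix products for the ground-truth and similarity matrices, tf.argsort for the ranking, tf.gather with batch_dims=1 to look up relevance in ranked order, and tf.cumsum for precision at each rank. It assumes ±1 codes and one-hot labels as in getCode; hamming_map is a hypothetical name. Tied similarities may be ordered differently than with np.argsort, so results can differ slightly on ties.

```python
import tensorflow as tf


def hamming_map(test_codes, test_labels, train_codes, train_labels, top_k=5000):
    """mAP of Hamming ranking over the top_k retrieved items (TF 2.x eager sketch).

    test_codes, train_codes: +/-1 codes, shapes (n_test, bits) and (n_train, bits)
    test_labels, train_labels: one-hot labels, shapes (n_test, C) and (n_train, C)
    """
    test_codes = tf.cast(test_codes, tf.float32)
    train_codes = tf.cast(train_codes, tf.float32)
    test_labels = tf.cast(test_labels, tf.float32)
    train_labels = tf.cast(train_labels, tf.float32)
    # 1.0 where the query and the database item share a class (the gt_martix)
    relevant = tf.cast(tf.matmul(test_labels, train_labels, transpose_b=True) > 0, tf.float32)
    # for +/-1 codes the dot product equals bits - 2 * Hamming distance,
    # so a descending sort puts the closest codes first (the ham_martix)
    similarity = tf.matmul(test_codes, train_codes, transpose_b=True)
    order = tf.argsort(similarity, axis=1, direction='DESCENDING')[:, :top_k]
    # relevance of each retrieved item in ranked order, shape (n_test, top_k)
    hits = tf.gather(relevant, order, batch_dims=1)
    # precision at rank j (1-based) = relevant items retrieved so far / j
    ranks = tf.cast(tf.range(1, tf.shape(hits)[1] + 1), tf.float32)
    precision_at_j = tf.cumsum(hits, axis=1) / ranks
    # AP = sum of precision at the relevant ranks / number of relevant items retrieved;
    # divide_no_nan yields AP = 0 for queries without any hit, like the NumPy loop
    num_hits = tf.reduce_sum(hits, axis=1)
    ap = tf.math.divide_no_nan(tf.reduce_sum(precision_at_j * hits, axis=1), num_hits)
    return tf.reduce_mean(ap)
```

For the 10000 x 50000 setting in the question, each full float32 matrix is about 2 GB, so in practice you may want to process the queries in batches and combine the batch means weighted by batch size.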

Related

Random weight initialisation influence on a simple neural network

I am following a book which has the following code:
import numpy as np
np.random.seed(1)
streetlights = np.array([[1, 0, 1], [0, 1, 1], [0, 0, 1], [1, 1, 1]])
walk_vs_stop = np.array([[1, 1, 0, 0]]).T
def relu(x):
    return (x > 0) * x
def relu2deriv(output):
    return output > 0
alpha = 0.2
hidden_layer_size = 4
# random weights from the first layer to the second
weights_0_1 = 2*np.random.random((3, hidden_layer_size)) -1
# random weights from the second layer to the output
weights_1_2 = 2*np.random.random((hidden_layer_size, 1)) -1
for iteration in range(60):
    layer_2_error = 0
    for i in range(len(streetlights)):
        layer_0 = streetlights[i : i + 1]
        layer_1 = relu(np.dot(layer_0, weights_0_1))
        layer_2 = relu(np.dot(layer_1, weights_1_2))
        layer_2_error += np.sum((layer_2 - walk_vs_stop[i : i + 1])) ** 2
        layer_2_delta = layer_2 - walk_vs_stop[i : i + 1]
        layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)
        weights_1_2 -= alpha * layer_1.T.dot(layer_2_delta)
        weights_0_1 -= alpha * layer_0.T.dot(layer_1_delta)
    if iteration % 10 == 9:
        print(f"Error: {layer_2_error}")
Which outputs:
# Error: 0.6342311598444467
# Error: 0.35838407676317513
# Error: 0.0830183113303298
# Error: 0.006467054957103705
# Error: 0.0003292669000750734
# Error: 1.5055622665134859e-05
I understand everything but this part is not explained and I am not sure why it is the way it is:
weights_0_1 = 2*np.random.random((3, hidden_layer_size)) -1
weights_1_2 = 2*np.random.random((hidden_layer_size, 1)) -1
I don't understand:
Why is the whole matrix multiplied by 2, and why is 1 then subtracted?
If I change 2 to 3, my error becomes much lower: # Error: 5.616513576418916e-13
I tried changing the 2 to many other numbers, along with changing the -1 to many other numbers, and I get # Error: 2.0 most of the time, or an error much worse than with the combination of 3 and -1.
I can't seem to grasp the relationship and the purpose of multiplying the random weights by a number and subtracting a number afterwards.
P.S. The idea of the network is to learn a streetlight pattern: when people should go and when they should stop, depending on which combination of lights in the streetlight is on or off.

There are many ways to initialize a neural network, and it is an active research subject, as it can have a great impact on performance and training time. Some rules of thumb:
avoid giving all weights the same value, as they would all update the same way
avoid weights that are too large, which can make your gradients too large
avoid weights that are too small, which can make your gradients vanish
In your case, the goal is just to have something in [-1, 1):
np.random.random gives you a float in [0, 1)
multiplying by 2 gives you something in [0, 2)
subtracting 1 gives you a number in [-1, 1)
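The same scale-and-shift reasoning applies to any constants: if u is uniform on [0, 1), then scale * u - shift is uniform on [-shift, scale - shift). So 3 * u - 1 gives weights in [-1, 2), which are no longer centered on zero. A small sketch to check the ranges empirically; scaled_uniform is a hypothetical helper, not part of the book's code, and it does not explain why one particular seed happens to train better.

```python
import numpy as np


def scaled_uniform(shape, scale, shift, seed=0):
    """Hypothetical helper: scale * U[0, 1) - shift, i.e. uniform on [-shift, scale - shift)."""
    rng = np.random.default_rng(seed)
    return scale * rng.random(shape) - shift


w_2 = scaled_uniform((3, 4), 2, 1)   # the book's choice: values in [-1, 1), mean about 0
w_3 = scaled_uniform((3, 4), 3, 1)   # the question's variant: values in [-1, 2), mean about 0.5
print(w_2.min(), w_2.max())
print(w_3.min(), w_3.max())
```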

2*np.random.random((3, 4)) -1 is a way to generate 3*4=12 random numbers from the uniform distribution on the half-open interval [-1, +1), i.e. including -1 but excluding +1.
This is equivalent to the more readable code:
np.random.uniform(-1, 1, (3, 4))
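A quick check of that equivalence, assuming NumPy's legacy global generator seeded with np.random.seed: both expressions consume the same underlying uniform draws, so with the same seed they should agree up to floating-point rounding.

```python
import numpy as np

np.random.seed(1)
a = 2 * np.random.random((3, 4)) - 1     # scale [0, 1) to [0, 2), then shift to [-1, 1)
np.random.seed(1)
b = np.random.uniform(-1, 1, (3, 4))     # low + (high - low) * U[0, 1)
print(np.allclose(a, b))
```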

Keras CNN training: Cannot use nested list as Input

(Edited to include dataset and model code)
I'm training a Keras CNN on a 2D matrix. I'm creating my own training dataset, in which each matrix cell has the shape [[list], int]. The cell's first item is the result of converting a string class to a one-hot list (using tf.keras.utils.to_categorical):
cell[0] = to_categorical(rnd_type-1, num_classes=num_types)
The second is a simple int:
cell[1] = random.randint(0, max_val)
The dataset creation function:
def make_data(num_of_samples, num_types, max_height, grid_x, grid_y):
    grids_list = []
    target_list = []
    target = 0
    for _ in range(num_of_samples):
        # create empty grid
        grid = [[[[],0] for i in range(grid_y)] for j in range(grid_x)]
        for i in range(grid_x):
            for j in range(grid_y):
                rnd_type = random.randint(
                    0, num_types)
                # get random class
                # and convert to cat list
                cat = to_categorical(
                    rnd_type-1, num_classes=num_types)
                # get random height
                rnd_height = random.randint(0, max_height)
                # inject the two values into the cell
                grid[i][j] = [cat, rnd_height]
                # get some target value
                target += rnd_type * 5 + random.random()*5
        target_list.append(target)
        grids_list.append(grid)
    # make np arrs out of the lists
    t = np.array(target_list)
    g = np.array(grids_list)
    return t, g
My model is created with model = models.create_cnn(grid_size, grid_size, 2, regress=True), in which I assumed the input depth is 2.
The model creation code:
num_types = 20
max_height = 50
num_of_samples = 10
grid_size = 10
epochs = 5000
# get n results of X x Y grid with target
targets_list, grids_list = datasets.make_data(
num_of_samples, num_types, max_height, grid_size, grid_size)
split = train_test_split(targets_list, grids_list,
test_size=0.25, random_state=42)
(train_attr_X, test_attr_X, train_grids_X, test_grids_X) = split
# find the largest value in the training set and use it to
# scale values to the range [0, 1]
max_target = train_attr_X.max()
train_attr_Y = train_attr_X / max_target
test_attr_Y = test_attr_X / max_target
model = models.create_cnn(grid_size, grid_size, 2, regress=True)
However, I cannot train it; I get this error: ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).

Answering my own question:
The model only accepts a single integer as the input depth, so the depth dimension of my matrix must be a flat list of numbers of that length, not a nested [list, int] structure. For that reason, the way to merge the class data with the continuous field rnd_height is:
one-hot encode the class: cat = to_categorical(rnd_type-1, num_classes=num_types)
cell = np.append(cat, [rnd_height])
This way, the rnd_height value is appended to the cat list, so every cell is a flat vector of num_types + 1 numbers.
The whole dataset function now looks like this:
def make_data(num_of_samples, num_types, max_height, grid_x, grid_y):
    grids_list = []
    target_list = []
    target = 0
    for _ in range(num_of_samples):
        grid = [[[False, False] for i in range(grid_y)] for j in range(grid_x)]
        for i in range(grid_x):
            for j in range(grid_y):
                rnd_type = random.randint(
                    0, num_types)
                cat = to_categorical(
                    rnd_type-1, num_classes=num_types)
                rnd_height = random.randint(0, max_height)
                cell = np.append(cat, [rnd_height])
                grid[i][j] = cell
                # simulate simple objective function
                if rnd_type < num_types/5:
                    target += rnd_height * 5
        target_list.append(target)
        grids_list.append(grid)
    t = np.array(target_list)
    g = np.array(grids_list)
    # return grids and targets
    return g, t
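A dependency-free sketch of the same cell layout, to check the resulting array: np.eye(num_types)[k] stands in for to_categorical, and the class index is drawn from [0, num_types) here, which differs slightly from randint(0, num_types) - 1 in the code above. The helpers make_cell and make_grid are hypothetical. With flat cells, np.array yields a numeric float array of shape (samples, grid_x, grid_y, num_types + 1), so the CNN's input depth would presumably be num_types + 1 rather than 2. The original nested [list, int] cells cannot form such an array (NumPy builds an object array or, in recent versions, raises a ValueError).

```python
import random

import numpy as np


def make_cell(num_types, max_height):
    rnd_type = random.randint(0, num_types - 1)   # class index in [0, num_types)
    cat = np.eye(num_types)[rnd_type]             # one-hot row, stand-in for to_categorical
    rnd_height = random.randint(0, max_height)
    return np.append(cat, [rnd_height])           # flat vector of num_types + 1 floats


def make_grid(num_types, max_height, grid_x, grid_y):
    return np.array([[make_cell(num_types, max_height) for _ in range(grid_y)]
                     for _ in range(grid_x)])


random.seed(0)
g = np.array([make_grid(20, 50, 10, 10) for _ in range(3)])
print(g.shape, g.dtype)
```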

Simple neural network gives wrong output after training

I've been working on a simple neural network.
It takes in a data set with 3 columns. If the first column's value is 1, then the output should be 1.
I've provided comments so it is easier to follow.
Code is as follows:
import numpy as np
import random
def sigmoid_derivative(x):
    return x * (1 - x)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def think(weights, inputs):
    sum = (weights[0] * inputs[0]) + (weights[1] * inputs[1]) + (weights[2] * inputs[2])
    return sigmoid(sum)
if __name__ == "__main__":
    # Assign random weights
    weights = [-0.165, 0.440, -0.867]
    # Training data for the network.
    training_data = [
        [0, 0, 1],
        [1, 1, 1],
        [1, 0, 1],
        [0, 1, 1]
    ]
    # The answers correspond to the training_data by place,
    # so first element of training_answers is the answer to the first element of training_data
    # NOTE: The pattern is if there's a 1 in the first place, the result should be a one
    training_answers = [0, 1, 1, 0]
    # Train the neural network
    for iteration in range(50000):
        # Pick a random piece of training_data
        selected = random.randint(0, 3)
        training_output = think(weights, training_data[selected])
        # Calculate the error
        error = training_output - training_answers[selected]
        # Calculate the adjustments that need to be applied to the weights
        adjustments = np.dot(training_data[selected], error * sigmoid_derivative(training_output))
        # Apply adjustments, maybe something is going wrong here?
        weights += adjustments
    print("The Neural Network has been trained!")
    # Result of print below should be close to 1
    print(think(weights, [1, 0, 0]))
The result of the last print should be close to 1, but it is not.
I have a feeling that I'm not adjusting the weights correctly.
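Two things in the code above stand out. First, weights is a plain Python list, so weights += adjustments (with a NumPy array on the right) extends the list instead of adding element-wise; the first three weights never change, and the final print is just sigmoid(-0.165), about 0.46. Second, since error is computed as output minus target, the adjustment has to be subtracted, not added. A minimal corrected sketch under those observations, keeping single-sample gradient descent with a learning rate of 1 as in the question; train and predict are hypothetical names:

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(output):
    # derivative of the sigmoid, written in terms of its output
    return output * (1 - output)


def predict(weights, inputs):
    return sigmoid(np.dot(weights, inputs))


def train(training_data, training_answers, weights, iterations=50000, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.array(weights, dtype=float)   # ndarray, so -= updates element-wise
    data = np.array(training_data, dtype=float)
    answers = np.array(training_answers, dtype=float)
    for _ in range(iterations):
        selected = rng.integers(0, len(data))   # pick a random training example
        output = predict(weights, data[selected])
        error = output - answers[selected]      # prediction minus target
        # gradient step: move against the error, hence -=
        weights -= data[selected] * error * sigmoid_derivative(output)
    return weights
```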

How to create a 2D array with N lots of random numbers?

I am trying to obtain a variance for a value I obtained by processing a 2x150 array into a discrete correlation function. In order to do this I need to randomly sample 80% of the original data N times, which will allow me to calculate a variance over these values.
I have so far been able to create one randomly sampled set of data using this:
rand_indices = []
running_var = (len(find_length)*0.8)
x=0
while x<running_var:
    rand_inx = randint(0, (len(find_length)-1))
    rand_indices.append(rand_inx)
    x=x+1
which creates an array 80% of the length of my original with randomly selected indices to be picked out and processed.
My problem is that I am not sure how to iterate this in order to get N sets of these random numbers, ideally, I think, in an N x 120 array. My whole code so far is:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from random import randint
useless, just_to, find_length = np.loadtxt("w2_mjy_final.dat").T
w2_dat = np.loadtxt("w2_mjy_final.dat")
w2_rel = np.delete(w2_dat, 2, axis = 1)
w2_array = np.asarray(w2_rel)
w1_dat = np.loadtxt("w1_mjy_final.dat")
w1_rel = np.delete(w1_dat, 2, axis=1)
w1_array = np.asarray(w1_rel)
peaks = []
y=1
N = 0
x = 0
z = 0
rand_indices = []
rand_indices2d = []
running_var = (len(find_length)*0.8)
while z<N:
    while x<running_var:
        rand_inx = randint(0, (len(find_length)-1))
        rand_indices.append(rand_inx)
        x=x+1
    rand_indices2d.append(rand_indices)
    z=z+1
while y<N:
    w1_sampled = w1_array[rand_indices, :]
    w2_sampled = w2_array[rand_indices, :]
    w1s_t, w1s_dat = zip(*w1_sampled)
    w2s_t, w2s_dat = zip(*w2_sampled)
    w2s_mean = np.mean(w2s_dat)
    w2s_stdev = np.std(w2s_dat)
    w1s_mean = np.mean(w1s_dat)
    w1s_stdev = np.std(w1s_dat)
    taus = []
    dcfs = []
    bins = 40
    for i in w2s_t:
        for j in w1s_t:
            tau_datpoint = i-j
            taus.append(tau_datpoint)
    for k in w2s_dat:
        for l in w1s_dat:
            dcf_datpoint = ((k - w2s_mean)*(l - w1s_mean))/((w2s_stdev*w1s_stdev))
            dcfs.append(dcf_datpoint)
    plotdat = np.vstack((taus, dcfs)).T
    sort_plotdat = sorted(plotdat, key=lambda x:x[0])
    np.savetxt("w1sw2sarray.txt", sort_plotdat)
    taus_sort, dcfs_sort = np.loadtxt("w1w2array.txt").T
    dcfs_means, taubins_edges, taubins_number = stats.binned_statistic(taus_sort, dcfs_sort, statistic='mean', bins=bins)
    taubin_edge = np.delete(taubins_edges, 0)
    import operator
    indexs, values = max(enumerate(dcfs_means), key=operator.itemgetter(1))
    percents = values*0.8
    dcf_lists = dcfs_means.tolist()
    centarr_negs, centarr_poss = np.split(dcfs_means, [indexs])
    centind_negs = np.argmin(np.abs(centarr_negs - percents))
    centind_poss = np.argmin(np.abs(centarr_poss - percents))
    lagcent_negs = taubins_edges[centind_negs]
    lagcent_poss = taubins_edges[int((bins/2)+centind_poss)]
    sampled_peak = (np.abs(lagcent_poss - lagcent_negs)/2)+lagcent_negs
    peaks.append(sampled_peak)
    y=y+1
print(peaks)

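Seeing as you're using numpy already, why not use np.random.randint?
In your case:
np.random.randint(len(find_length), size=(N, int(running_var)))
would give you an N x int(running_var) matrix of random integers from 0 to len(find_length)-1 inclusive, the same range as randint(0, len(find_length)-1) in your loop. Note that, unlike random.randint, the upper bound of np.random.randint is exclusive, and size must contain integers, which is why running_var is converted with int().
Example usage (here the bound is len(find_length)-1 = 2, so the entries are 0 or 1):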
>>> N=4
>>> running_var=6
>>> find_length = [1,2,3]
>>> np.random.randint(len(find_length)-1, size=(N, running_var))
array([[1, 0, 1, 0, 0, 1],
[1, 0, 1, 1, 0, 0],
[1, 1, 0, 0, 1, 0],
[1, 1, 0, 1, 0, 1]])
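Building on that, a sketch that returns all N index sets at once and uses them to index the data rows; random_index_sets is a hypothetical helper. Drawing with replacement reproduces the loop in the question; if "80% of the original data" is meant as distinct rows, replace=False draws each row at most once per set (an assumption about the intent).

```python
import numpy as np


def random_index_sets(n_rows, n_sets, fraction=0.8, replace=True, seed=None):
    """Return an (n_sets, k) array of row indices, with k = int(fraction * n_rows)."""
    rng = np.random.default_rng(seed)
    k = int(fraction * n_rows)
    if replace:
        # like the question's loop: each index drawn independently from 0..n_rows-1
        return rng.integers(0, n_rows, size=(n_sets, k))
    # without replacement: each row holds k distinct indices
    return np.array([rng.choice(n_rows, size=k, replace=False) for _ in range(n_sets)])


data = np.arange(300).reshape(150, 2)        # stand-in for a 150 x 2 data array
idx = random_index_sets(150, 5, seed=0)
samples = data[idx]                          # shape (5, 120, 2): one resampled set per row
```

Each samples[n] can then go through the DCF processing, and the spread of the N resulting peak values gives the variance.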

Rand Index function (clustering performance evaluation)

As far as I know, there is no package available for the Rand Index in Python, while for the Adjusted Rand Index you have the option of using sklearn.metrics.adjusted_rand_score(labels_true, labels_pred).
I wrote the code for Rand Score and I am going to share it with others as the answer to the post.

from itertools import combinations
import numpy as np
from scipy.special import comb  # scipy.misc.comb has been removed from newer SciPy releases
def check_clusterings(labels_true, labels_pred):
    """Check that the two clusterings are matching 1D integer arrays."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    # input checks
    if labels_true.ndim != 1:
        raise ValueError(
            "labels_true must be 1D: shape is %r" % (labels_true.shape,))
    if labels_pred.ndim != 1:
        raise ValueError(
            "labels_pred must be 1D: shape is %r" % (labels_pred.shape,))
    if labels_true.shape != labels_pred.shape:
        raise ValueError(
            "labels_true and labels_pred must have same size, got %d and %d"
            % (labels_true.shape[0], labels_pred.shape[0]))
    return labels_true, labels_pred
def rand_score(labels_true, labels_pred):
    """Given the true and predicted labels, return the Rand Index."""
    check_clusterings(labels_true, labels_pred)
    my_pair = list(combinations(range(len(labels_true)), 2))  # all index pairs (i, j) with i < j
    def is_equal(x):
        return (x[0]==x[1])
    my_a = 0  # pairs in the same cluster in both labelings
    my_b = 0  # pairs in different clusters in both labelings
    for i in range(len(my_pair)):
        if(is_equal((labels_true[my_pair[i][0]],labels_true[my_pair[i][1]])) == is_equal((labels_pred[my_pair[i][0]],labels_pred[my_pair[i][1]]))
                and is_equal((labels_pred[my_pair[i][0]],labels_pred[my_pair[i][1]])) == True):
            my_a += 1
        if(is_equal((labels_true[my_pair[i][0]],labels_true[my_pair[i][1]])) == is_equal((labels_pred[my_pair[i][0]],labels_pred[my_pair[i][1]]))
                and is_equal((labels_pred[my_pair[i][0]],labels_pred[my_pair[i][1]])) == False):
            my_b += 1
    my_denom = comb(len(labels_true),2)
    ri = (my_a + my_b) / my_denom
    return ri
As a simple example:
labels_true = [1, 1, 0, 0, 0, 0]
labels_pred = [0, 0, 0, 1, 0, 1]
rand_score (labels_true, labels_pred)
#0.46666666666666667
There are probably some ways to improve it and make it more Pythonic. If you have any suggestions, feel free to improve it.
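One more compact NumPy alternative, offered as a sketch: compare all pairs at once with broadcasting and count the pairs on which the two labelings agree (same cluster in both, or different clusters in both). It builds two n x n boolean matrices, so memory grows as O(n^2), and it needs at least two labels; rand_index_broadcast is a hypothetical name.

```python
import numpy as np


def rand_index_broadcast(labels_true, labels_pred):
    """Rand index via pairwise comparisons (O(n^2) memory)."""
    t = np.asarray(labels_true)
    p = np.asarray(labels_pred)
    n = len(t)
    same_true = t[:, None] == t[None, :]   # n x n: pair in the same true cluster?
    same_pred = p[:, None] == p[None, :]   # n x n: pair in the same predicted cluster?
    upper = np.triu_indices(n, k=1)        # each unordered pair (i < j) exactly once
    agree = same_true[upper] == same_pred[upper]
    return float(agree.mean())             # agreeing pairs / total pairs


print(rand_index_broadcast([1, 1, 0, 0, 0, 0], [0, 0, 0, 1, 0, 1]))
```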
I found this implementation which seems faster.
import numpy as np
from scipy.special import comb  # scipy.misc.comb has been removed from newer SciPy releases
def rand_index_score(clusters, classes):
    tp_plus_fp = comb(np.bincount(clusters), 2).sum()
    tp_plus_fn = comb(np.bincount(classes), 2).sum()
    A = np.c_[(clusters, classes)]
    tp = sum(comb(np.bincount(A[A[:, 0] == i, 1]), 2).sum()
             for i in set(clusters))
    fp = tp_plus_fp - tp
    fn = tp_plus_fn - tp
    tn = comb(len(A), 2) - tp - fp - fn
    return (tp + tn) / (tp + fp + fn + tn)
As a simple example:
labels_true = [1, 1, 0, 0, 0, 0]
labels_pred = [0, 0, 0, 1, 0, 1]
rand_index_score (labels_true, labels_pred)
#0.46666666666666667

Starting from scikit-learn 0.24.0, the sklearn.metrics.rand_score function has been added, implementing the (unadjusted) Rand index. Please check the changelog.
All you have to do is:
from sklearn.metrics import rand_score
rand_score(labels_true, labels_pred)
labels_true and labels_pred can have values in different domains. For example:
>>> rand_score(['a', 'b', 'c'], [5, 6, 7])
1.0

Here is my code:
import itertools as it
def rand_index_score(y_gold, y_predict):
    index1_index2_pairs = list(it.combinations(range(len(y_gold)), 2))  # all index pairs (i, j) with i < j
    numberOfPairs = len(index1_index2_pairs)
    agreeingPairs = 0  # numerator of the Rand index
    for index1, index2 in index1_index2_pairs:
        theyReallyAreInSameGroup = y_gold[index1] == y_gold[index2]
        itIsPredictedThatTheyAreInSameGroup = y_predict[index1] == y_predict[index2]
        if theyReallyAreInSameGroup == itIsPredictedThatTheyAreInSameGroup:
            agreeingPairs += 1
    return agreeingPairs / numberOfPairs
