How to create a 2D array with N lots of random numbers? - python

I am trying to obtain a variance for a value I obtained by processing a 2x150 array into a discrete correlation function. In order to do this I need to randomly sample 80% of the original data N times, which will allow me to calculate a variance over these values.
I have so far been able to create one randomly sampled set of data using this:
rand_indices = []
running_var = (len(find_length)*0.8)
x = 0
while x < running_var:
    rand_inx = randint(0, (len(find_length)-1))
    rand_indices.append(rand_inx)
    x = x + 1
which creates an array 80% of the length of my original with randomly selected indices to be picked out and processed.
My problem is that I am not sure how to iterate this in order to get N sets of these random numbers, ideally in an Nx120 array. My whole code so far is:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from random import randint
useless, just_to, find_length = np.loadtxt("w2_mjy_final.dat").T
w2_dat = np.loadtxt("w2_mjy_final.dat")
w2_rel = np.delete(w2_dat, 2, axis = 1)
w2_array = np.asarray(w2_rel)
w1_dat = np.loadtxt("w1_mjy_final.dat")
w1_rel = np.delete(w1_dat, 2, axis=1)
w1_array = np.asarray(w1_rel)
peaks = []
y=1
N = 0
x = 0
z = 0
rand_indices = []
rand_indices2d = []
running_var = (len(find_length)*0.8)
while z < N:
    while x < running_var:
        rand_inx = randint(0, (len(find_length)-1))
        rand_indices.append(rand_inx)
        x = x + 1
    rand_indices2d.append(rand_indices)
    z = z + 1
while y < N:
    w1_sampled = w1_array[rand_indices, :]
    w2_sampled = w2_array[rand_indices, :]
    w1s_t, w1s_dat = zip(*w1_sampled)
    w2s_t, w2s_dat = zip(*w2_sampled)
    w2s_mean = np.mean(w2s_dat)
    w2s_stdev = np.std(w2s_dat)
    w1s_mean = np.mean(w1s_dat)
    w1s_stdev = np.std(w1s_dat)
    taus = []
    dcfs = []
    bins = 40
    for i in w2s_t:
        for j in w1s_t:
            tau_datpoint = i - j
            taus.append(tau_datpoint)
    for k in w2s_dat:
        for l in w1s_dat:
            dcf_datpoint = ((k - w2s_mean)*(l - w1s_mean))/(w2s_stdev*w1s_stdev)
            dcfs.append(dcf_datpoint)
    plotdat = np.vstack((taus, dcfs)).T
    sort_plotdat = sorted(plotdat, key=lambda x: x[0])
    np.savetxt("w1sw2sarray.txt", sort_plotdat)
    taus_sort, dcfs_sort = np.loadtxt("w1sw2sarray.txt").T  # load the file saved above (the original loaded "w1w2array.txt", presumably a typo)
    dcfs_means, taubins_edges, taubins_number = stats.binned_statistic(taus_sort, dcfs_sort, statistic='mean', bins=bins)
    taubin_edge = np.delete(taubins_edges, 0)
    import operator
    indexs, values = max(enumerate(dcfs_means), key=operator.itemgetter(1))
    percents = values*0.8
    dcf_lists = dcfs_means.tolist()
    centarr_negs, centarr_poss = np.split(dcfs_means, [indexs])
    centind_negs = np.argmin(np.abs(centarr_negs - percents))
    centind_poss = np.argmin(np.abs(centarr_poss - percents))
    lagcent_negs = taubins_edges[centind_negs]
    lagcent_poss = taubins_edges[int((bins/2)+centind_poss)]
    sampled_peak = (np.abs(lagcent_poss - lagcent_negs)/2)+lagcent_negs
    peaks.append(sampled_peak)
    y = y + 1
print peaks

Seeing as you're using numpy already, why not use np.random.randint?
In your case:
np.random.randint(len(find_length)-1, size=(N, running_var))
This gives you an N*running_var sized matrix with random integer entries from 0 to len(find_length)-2 inclusive (np.random.randint treats a single argument as an exclusive upper bound, so pass len(find_length) if the last index should also be reachable). Note that running_var must be an integer here, e.g. int(len(find_length)*0.8).
Example Usage:
>>> N=4
>>> running_var=6
>>> find_length = [1,2,3]
>>> np.random.randint(len(find_length)-1, size=(N, running_var))
array([[1, 0, 1, 0, 0, 1],
       [1, 0, 1, 1, 0, 0],
       [1, 1, 0, 0, 1, 0],
       [1, 1, 0, 1, 0, 1]])
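Once you have that index matrix, NumPy fancy indexing pulls out all N resamples in one step, so the processing loop can work on one 3D array. A minimal sketch, with made-up data standing in for your real arrays:
import numpy as np

data = np.arange(300).reshape(150, 2)             # stand-in for the 150x2 data
N = 100
sample_len = int(len(data) * 0.8)                 # 120
idx = np.random.randint(len(data), size=(N, sample_len))
samples = data[idx]                               # shape (N, 120, 2): one 80% resample per row
print(samples.shape)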

Related

Python numpy matrix of matrix

I have this code, and it works. It just seems like there may be a better way to do this. Does anyone know a cleaner solution?
def Matrix2toMatrix(Matrix2):
    scaleSize = len(Matrix2[0, 0])
    FinalMatrix = np.empty([len(Matrix2)*scaleSize, len(Matrix2[0])*scaleSize])
    for x in range(0, len(Matrix2)):
        for y in range(0, len(Matrix2[0])):
            for xFinal in range(0, scaleSize):
                for yFinal in range(0, scaleSize):
                    FinalMatrix[(x*scaleSize)+xFinal, (y*scaleSize)+yFinal] = Matrix2[x, y][xFinal, yFinal]
    return FinalMatrix
Here, Matrix2 is a 4x4 matrix in which each cell contains a 2x2 matrix.
Full code in case anyone was wondering:
import matplotlib.pyplot as plt
import numpy as np
def Matrix2toMatrix(Matrix2):
    scaleSize = len(Matrix2[0, 0])
    FinalMatrix = np.empty([len(Matrix2)*scaleSize, len(Matrix2[0])*scaleSize])
    for x in range(0, len(Matrix2)):
        for y in range(0, len(Matrix2[0])):
            for xFinal in range(0, scaleSize):
                for yFinal in range(0, scaleSize):
                    FinalMatrix[(x*scaleSize)+xFinal, (y*scaleSize)+yFinal] = Matrix2[x, y][xFinal, yFinal]
    return FinalMatrix

XSize = 4
Xtest = np.array([[255, 255, 255, 255],
                  [255, 255, 255, 255],
                  [127, 127, 127, 127],
                  [0, 0, 0, 0]])
scaleFactor = 2
XMarixOfMatrix = np.empty([XSize, XSize], dtype=object)
Xexpanded = np.empty([XSize*scaleFactor, XSize*scaleFactor], dtype=int)  # careful, will contain garbage data
for xOrg in range(0, XSize):
    for yOrg in range(0, XSize):
        newMatrix = np.empty([scaleFactor, scaleFactor], dtype=int)  # careful, will contain garbage data
        # grab org point equivalent
        pointValue = Xtest[xOrg, yOrg]
        newMatrix.fill(pointValue)
        # now write the data
        XMarixOfMatrix[xOrg, yOrg] = newMatrix

# need to concat all matrices together to form a larger single matrix
Xexpanded = Matrix2toMatrix(XMarixOfMatrix)

img = plt.imshow(Xexpanded)
img.set_cmap('gray')
plt.axis('off')
plt.show()
Permute axes and reshape -
m,n = Matrix2.shape[0], Matrix2.shape[2]
out = Matrix2.swapaxes(1,2).reshape(m*n,-1)
For permuting axes, we could also use np.transpose or np.rollaxis, as functionally all are the same.
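For example, the same permutation written with np.transpose, on the 4D case used in the sample run below:
out_t = Matrix2.transpose(0, 2, 1, 3).reshape(m*n, -1)  # transpose(0, 2, 1, 3) is the same permutation as swapaxes(1, 2)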
Verify with sample run -
In [17]: Matrix2 = np.random.rand(3,3,3,3)
# With given solution
In [18]: out1 = Matrix2toMatrix(Matrix2)
In [19]: m,n = Matrix2.shape[0], Matrix2.shape[2]
...: out2 = Matrix2.swapaxes(1,2).reshape(m*n,-1)
In [20]: np.allclose(out1, out2)
Out[20]: True

How to calculate mean average precision (mAP) using TensorFlow?

I want to use TensorFlow to calculate the mAP (mean average precision) of hash codes, but I don't know how to do the calculation directly with tensors.
The code using NumPy is the following:
import numpy as np
import time
import os

# read train and test binary codes
CURRENT_DIR = os.getcwd()

def getCode(train_codes, train_groudTruth, test_codes, test_groudTruth):
    line_number = 0
    with open(CURRENT_DIR + '/result.txt', 'r') as f:
        for line in f:
            temp = line.strip().split('\t')
            if line_number < 10000:
                test_codes.append([i if i == 1 else -1 for i in map(int, list(temp[0]))])
                list2 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
                list2[int(temp[1])] = 1
                test_groudTruth.append(list2)  # get test ground truth (0-9)
            else:
                train_codes.append([i if i == 1 else -1 for i in map(int, list(temp[0]))])  # change to -1, 1
                list2 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
                list2[int(temp[1])] = 1
                train_groudTruth.append(list2)  # get train ground truth (0-9)
            line_number += 1
    print 'read data finish'

def getHammingDist(code_a, code_b):
    dist = 0
    for i in range(len(code_a)):
        if code_a[i] != code_b[i]:
            dist += 1
    return dist

if __name__ == '__main__':
    print getNowTime(), 'start!'
    train_codes = []
    train_groudTruth = []
    test_codes = []
    test_groudTruth = []
    # get ground truth and binary codes
    getCode(train_codes, train_groudTruth, test_codes, test_groudTruth)
    train_codes = np.array(train_codes)
    train_groudTruth = np.array(train_groudTruth)
    test_codes = np.array(test_codes)
    test_groudTruth = np.array(test_groudTruth)
    numOfTest = 10000
    # generate hamming matrix and ground-truth matrix, 10000*50000
    gt_martix = np.dot(test_groudTruth, np.transpose(train_groudTruth))
    print getNowTime(), 'gt_martix finish!'
    ham_martix = np.dot(test_codes, np.transpose(train_codes))  # hamming distance maps to dot-product value
    print 'ham_martix finish!'
    # sort hamming matrix; argsort returns the indices that would sort an array
    sorted_ham_martix_index = np.argsort(ham_martix, axis=1)
    # calculate mAP
    print 'sort ham_matrix finished, start calculating mAP'
    apall = np.zeros((numOfTest, 1), np.float64)
    for i in range(numOfTest):
        x = 0.0
        p = 0
        test_oneLine = sorted_ham_martix_index[i, :]
        length = test_oneLine.shape[0]
        num_return_NN = 5000  # top 5000
        for j in range(num_return_NN):
            if gt_martix[i][test_oneLine[length - j - 1]] == 1:  # traverse in reverse: largest dot products first
                x += 1
                p += x / (j + 1)
        if p == 0:
            apall[i] = 0
        else:
            apall[i] = p / x
    mAP = np.mean(apall)
    print 'mAP:', mAP
I want to rewrite the code above using tensor operations (like tf.equal(), tf.reduce_sum(), and so on).
For example, here is how I calculate validation accuracy of images:
logits = self._model(x_valid)
valid_preds = tf.argmax(logits, axis=1)
valid_preds = tf.to_int32(valid_preds)
self.valid_acc = tf.equal(valid_preds, y_valid)
self.valid_acc = tf.to_int32(self.valid_acc)
self.valid_acc = tf.to_float(tf.reduce_sum(self.valid_acc))/tf.to_float(self.batch_size)
I want to use TensorFlow to calculate the hash codes' mAP (mean average precision) in the same way (with tf.XX operations).
How can I do this? Thanks!
You can just calculate the y_score (or predictions) and then use sklearn.metrics to calculate the average precision:
from sklearn.metrics import average_precision_score
predictions = model.predict(x_test)
average_precision_score(y_test, predictions)
If you just want to calculate average precision based on the validation set predictions, you can use the vector of predicted probabilities and the vector of true labels in this scikit-learn function.
If you really want to use a TensorFlow function, there's tf.metrics.average_precision_at_k (TF 1.x); a sketch follows.
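A minimal sketch of that metric, assuming TF 1.x, integer class ids as labels, and one score per class (the shapes and values here are made up for illustration):
import tensorflow as tf

labels = tf.constant([[0], [1]], dtype=tf.int64)  # true class id per example
scores = tf.constant([[0.9, 0.1], [0.3, 0.7]])    # predicted score per class
ap, ap_update = tf.metrics.average_precision_at_k(labels, scores, k=1)
with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())    # the metric accumulates in local variables
    sess.run(ap_update)
    print(sess.run(ap))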
For more info about average precision you can see this article.

Rand Index function (clustering performance evaluation)

As far as I know, there is no package available for Rand Index in python while for Adjusted Rand Index you have the option of using sklearn.metrics.adjusted_rand_score(labels_true, labels_pred).
I wrote the code for Rand Score and I am going to share it with others as the answer to the post.
from scipy.misc import comb  # in newer SciPy, comb lives in scipy.special
from itertools import combinations
import numpy as np

def check_clusterings(labels_true, labels_pred):
    """Check that the two clusterings are matching 1D integer arrays."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    # input checks
    if labels_true.ndim != 1:
        raise ValueError(
            "labels_true must be 1D: shape is %r" % (labels_true.shape,))
    if labels_pred.ndim != 1:
        raise ValueError(
            "labels_pred must be 1D: shape is %r" % (labels_pred.shape,))
    if labels_true.shape != labels_pred.shape:
        raise ValueError(
            "labels_true and labels_pred must have same size, got %d and %d"
            % (labels_true.shape[0], labels_pred.shape[0]))
    return labels_true, labels_pred

def rand_score(labels_true, labels_pred):
    """Given the true and predicted labels, return the Rand index."""
    check_clusterings(labels_true, labels_pred)
    my_pair = list(combinations(range(len(labels_true)), 2))  # all index pairs over the labels
    my_a = 0  # pairs grouped together in both clusterings
    my_b = 0  # pairs separated in both clusterings
    for pair in my_pair:
        same_true = labels_true[pair[0]] == labels_true[pair[1]]
        same_pred = labels_pred[pair[0]] == labels_pred[pair[1]]
        if same_true and same_pred:
            my_a += 1
        if not same_true and not same_pred:
            my_b += 1
    my_denom = comb(len(labels_true), 2)
    ri = (my_a + my_b) / my_denom
    return ri
As a simple example:
labels_true = [1, 1, 0, 0, 0, 0]
labels_pred = [0, 0, 0, 1, 0, 1]
rand_score (labels_true, labels_pred)
#0.46666666666666667
There are probably ways to improve it and make it more Pythonic; if you have any suggestions, feel free to improve it.
I found this implementation which seems faster.
import numpy as np
from scipy.misc import comb  # in newer SciPy, comb lives in scipy.special

def rand_index_score(clusters, classes):
    tp_plus_fp = comb(np.bincount(clusters), 2).sum()
    tp_plus_fn = comb(np.bincount(classes), 2).sum()
    A = np.c_[(clusters, classes)]
    tp = sum(comb(np.bincount(A[A[:, 0] == i, 1]), 2).sum()
             for i in set(clusters))
    fp = tp_plus_fp - tp
    fn = tp_plus_fn - tp
    tn = comb(len(A), 2) - tp - fp - fn
    return (tp + tn) / (tp + fp + fn + tn)
As a simple example:
labels_true = [1, 1, 0, 0, 0, 0]
labels_pred = [0, 0, 0, 1, 0, 1]
rand_index_score(labels_true, labels_pred)
#0.46666666666666667
Starting from scikit-learn 0.24.0, the sklearn.metrics.rand_score function has been added, implementing the (unadjusted) Rand index. Please check the changelog.
All you have to do is:
from sklearn.metrics import rand_score
rand_score(labels_true, labels_pred)
labels_true and labels_pred can have values in different domains. For example:
>>> rand_score(['a', 'b', 'c'], [5, 6, 7])
1.0
Here is my code:
import itertools as it  # needed for it.combinations below

def rand_index_score(y_gold, y_predict):
    index1_index2_pairs = list(it.combinations(range(len(y_gold)), 2))  # all index pairs over the labels
    numberOfPairs = len(index1_index2_pairs)
    fractalUpperPart = 0
    for index1_index2 in index1_index2_pairs:
        theyRealyAreInSameGroup = y_gold[index1_index2[0]] == y_gold[index1_index2[1]]
        itIsPredictedThatTheyAreInSameGroup = y_predict[index1_index2[0]] == y_predict[index1_index2[1]]
        if theyRealyAreInSameGroup == itIsPredictedThatTheyAreInSameGroup:
            fractalUpperPart += 1
    return fractalUpperPart / numberOfPairs
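As a quick check, the same example as above gives the same value (run under Python 3, where / is float division):
labels_true = [1, 1, 0, 0, 0, 0]
labels_pred = [0, 0, 0, 1, 0, 1]
rand_index_score(labels_true, labels_pred)
#0.46666666666666667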

how can I speed up my python code?

It's a program that imports multiple images and extracts features using the DCT and histograms.
1) Import multiple images from a folder
2) Crop each image to 256*256
3) Process the image in 64*64 blocks with stride = 32
4) Apply the DCT (8*8 size)
5) Make a histogram of the DCT coefficients
6) Extract features from the DCT coefficient histogram
The problem is that it's too slow. I think it's because there are so many "for" loops.
This is my full code in Python.
How can I change my code to speed it up?
I am not familiar with Python.
Please help me.
import numpy as np
from scipy.fftpack import dct
from PIL import Image
import glob
import matplotlib.pyplot as plt
def find_index(x, key):
    # return the first index of `key` in x
    for i in range(0, len(x)):
        if x[i] == key:
            return i

def image_open(path):
    image_list = []
    for filename in glob.glob(path + '/*.jpg'):
        im = Image.open(filename)
        image_list.append(im)
    return image_list

def dct_2(img):
    # get 2D cosine transform of image
    return dct(dct(np.asarray(img).T, norm='ortho').T, norm='ortho')

def return_array(array):
    zero = [0.0, 0.0, 0.0, 0.0, 0.0]
    n_bins = int(max(array) - min(array))  # renamed from `range` to avoid shadowing the builtin
    x, bins, patch = plt.hist(array, bins=n_bins)
    x = list(zero) + list(x) + list(zero)
    return x

path = 'C:\\Users\\LG\\PycharmProjects\\photo'  # folder that contains many images
images = image_open(path)
row = 0
array_matrix = []
label_matrix = []
for i in range(0, len(images)):  # access image
    box3 = (0, 0, 256, 256)
    a = images[i].crop(box3)
    (y, cb, cr) = a.split()  # ycbcr
    width, height = y.size
    y.show()
    for q in range(0, height - 32, 32):  # use image in 64*64 block units
        for w in range(0, width - 32, 32):
            box1 = (q, w, q + 64, w + 64)
            block = y.crop(box1)
            array1, array2, array3, array4, array5, array6, array7, array8, array9 = [], [], [], [], [], [], [], [], []
            for j in range(0, 64, 8):  # dct
                for n in range(0, 64, 8):
                    box2 = (j, n, j + 8, n + 8)
                    temp = block.crop(box2)
                    dct_temp = dct_2(temp)
                    array1.append(dct_temp[0, 1])
                    array2.append(dct_temp[1, 0])
                    array3.append(dct_temp[0, 2])
                    array4.append(dct_temp[1, 1])
                    array5.append(dct_temp[2, 0])
                    array6.append(dct_temp[0, 3])
                    array7.append(dct_temp[1, 2])
                    array8.append(dct_temp[2, 1])
                    array9.append(dct_temp[3, 0])
            x1 = return_array(array1)  # extract feature from dct histogram
            index = find_index(x1, max(x1))
            u = [index - 5, index + 5, 1]
            array_matrix.append(x1[u[0]:u[1] + 1:u[2]])
            x2 = return_array(array2)
            index = find_index(x2, max(x2))
            u = [index - 5, index + 5, 1]
            array_matrix[row].extend(x2[u[0]:u[1] + 1:u[2]])
            x3 = return_array(array3)
            index = find_index(x3, max(x3))
            u = [index - 5, index + 5, 1]
            array_matrix[row].extend(x3[u[0]:u[1] + 1:u[2]])
            x4 = return_array(array4)
            index = find_index(x4, max(x4))
            u = [index - 5, index + 5, 1]
            array_matrix[row].extend(x4[u[0]:u[1] + 1:u[2]])
            x5 = return_array(array5)
            index = find_index(x5, max(x5))
            u = [index - 5, index + 5, 1]
            array_matrix[row].extend(x5[u[0]:u[1] + 1:u[2]])
            x6 = return_array(array6)
            index = find_index(x6, max(x6))
            u = [index - 5, index + 5, 1]
            array_matrix[row].extend(x6[u[0]:u[1] + 1:u[2]])
            x7 = return_array(array7)
            index = find_index(x7, max(x7))
            u = [index - 5, index + 5, 1]
            array_matrix[row].extend(x7[u[0]:u[1] + 1:u[2]])
            x8 = return_array(array8)
            index = find_index(x8, max(x8))
            u = [index - 5, index + 5, 1]
            array_matrix[row].extend(x8[u[0]:u[1] + 1:u[2]])
            x9 = return_array(array9)
            index = find_index(x9, max(x9))
            u = [index - 5, index + 5, 1]
            array_matrix[row].extend(x9[u[0]:u[1] + 1:u[2]])
            print(w / 32)
            row = row + 1
print(array_matrix)
Rather than assuming that a specific section is taking longer than the others, I'd recommend profiling your script. A profiler collects metrics on how long parts of your program take, and also lets you see how much any change affects the runtime (makes it better, worse, etc.).
Once you know where your problem lies, you can take a more targeted approach to making it faster.
Have a look at the profile module (a minimal example follows the links below): https://docs.python.org/2/library/profile.html
Also have a look at some tutorials:
https://julien.danjou.info/blog/2015/guide-to-python-profiling-cprofile-concrete-case-carbonara
https://zapier.com/engineering/profiling-python-boss/
https://marcobonzanini.com/2015/01/05/my-python-code-is-slow-tips-for-profiling/
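As a starting point, a minimal cProfile run might look like this (the script entry point main() is a placeholder):
import cProfile
import pstats

cProfile.run('main()', 'profile.out')            # profile a call and save the stats to a file
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(10)   # show the 10 most expensive calls
You can also profile a whole script without touching it via python -m cProfile -s cumulative your_script.py.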

Python - Remove a row from numpy array?

Hi all, what I want should be really simple for somebody here: I want to remove a row from a numpy array in a loop like:
for i in range(len(self.Finalweight)):
    if self.Finalweight[i] >= self.cutoffOutliers:
        # remove line i from self.wData
I'm trying to remove outliers from a dataset. My full code for the method is:
def calculate_Outliers(self):
    def calcWeight(Value):
        pFinal = abs(Value - self.pMed) / self.pDev_abs_Med
        gradFinal = abs(gradient(Value) - self.gradMed) / self.gradDev_abs_Med
        return pFinal * gradFinal
    self.pMed = median(self.wData[:, self.yColum-1])
    self.pDev_abs_Med = median(abs(self.wData[:, self.yColum-1] - self.pMed))
    self.gradMed = median(gradient(self.wData[:, self.yColum-1]))
    self.gradDev_abs_Med = median(abs(gradient(self.wData[:, self.yColum-1]) - self.gradMed))
    self.workingData = self.wData[calcWeight(self.wData) < self.cutoffOutliers]
    self.xData = self.workingData[:, self.xColum-1]
    self.yData = self.workingData[:, self.yColum-1]
I'm getting the following error:
ile "bin/dmtools", line 201, in plot_gride
self.calculate_Outliers()
File "bin/dmtools", line 188, in calculate_Outliers
self.workingData= self.wData[calcWeight(self.wData)>self.cutoffOutliers]
ValueError: too many indices for array
There is actually a tool in NumPy specifically made to mask out outliers and invalid data points: masked arrays. Example:
x = numpy.array([1, 2, 3, -1, 5])
mx = numpy.ma.masked_array(x, mask=[0, 0, 0, 1, 0])
print mx.mean()
prints
2.75
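If you want to actually drop the offending rows rather than mask them, boolean indexing or np.delete also works; a small sketch with made-up data:
import numpy as np

data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
weights = np.array([0.1, 9.9, 0.2])  # one weight per row
cutoff = 1.0
kept = data[weights < cutoff]                                    # keep rows whose weight is below the cutoff
same = np.delete(data, np.where(weights >= cutoff)[0], axis=0)   # equivalent result via np.delete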
