I'm trying to append the values generated by the function calc_class, but it is not working and I don't know why. I've tried numpy.append, numpy.insert, and the built-in list append method, all without success.
This is my piece of code:
def calc_class(test):
    expec = []
    for new in test:
        prob_vector = np.zeros((len(voc)), dtype=bool)  # boolean array to store per-word class membership
        words_in_new = new[0].split()  # split the new email into words
        words_in_new = list(set(words_in_new))  # remove duplicated words
        i = 0
        for voc_word in voc:  # for each element in voc
            if voc_word in words_in_new:
                prob_vector[i] = True  # set the ith element of prob_vector to True if the voc element is in the email
            else:
                prob_vector[i] = False  # set the ith element of prob_vector to False otherwise
            i += 1
        prob_ham = 1
        for i in range(len(prob_vector)):
            if prob_vector[i] == True:
                prob_ham *= ham_class_prob[i]
            else:
                prob_ham *= (1 - ham_class_prob[i])
        # alternative: np.prod(ham_class_prob[np.where(prob_vector==True)]) * np.prod(1 - ham_class_prob[np.where(prob_vector==False)])
        prob_spam = 1
        for i in range(len(prob_vector)):
            if prob_vector[i] == True:
                prob_spam *= spam_class_prob[i]
            else:
                prob_spam *= (1 - spam_class_prob[i])
        p_spam = 0.3
        p_ham = 1 - p_spam
        p_spam_given_new = (prob_spam * p_spam) / (prob_spam * p_spam + prob_ham * p_ham)  # Bayes theorem
        print('p(spam|new_email)=', p_spam_given_new[0])
        expec.append(p_spam_given_new[0])
    print(expec)
The problem is that print(expec) prints an empty list.
You can use pdb to debug (or ipdb for IPython).
from pdb import set_trace
use "set_trace()" instead of "print('p(spam|new_email)=', p_spam_given_new[0])"(third line counted from end-line), than run your code. It will pause at this line, and you can run any python code there, such as "print(p_spam_given_new)" or just "p_spam_given_new", you can also check "prob_spam", "p_spam" or any other variable you want to check.
I have created a function in Python which randomly generates a nucleotide sequence:
import random

selection60 = {"A": 20, "T": 20, "G": 30, "C": 30}
sseq60 = []
for k in selection60:
    sseq60 = sseq60 + [k] * int(selection60[k])
random.shuffle(sseq60)

for i in range(100):
    random.shuffle(sseq60)

def generateSequence(self, length):
    length = int(length)
    sequence = ""
    while len(sequence) < length:
        sequence = "".join(random.sample(self, length))
    return sequence[:length]
Now, while applying this function, I would like to check whether a newly created sequence has a similarity of more than 10% to any of the previous sequences; if it does, the sequence should be discarded and a new one created:
I wrote something like this:
lst60 = []
newSeq = []
for i in range(5):
    while max_identity < 10:
        newSeq = generateSequence(sseq60, 100)
        identity[i] = [newSeq[i] == newSeq[i] for i in range(len(newSeq[i]))]
        max_identity[I] = 100*sum(identity[i]/length(identity[i])
    lst60.append(newSeq)
print(len(lst60))
However, it seems I get an empty list.
You have to use a nested for loop if you want to compare the i-th sequence with the j-th sequence for all 1 <= j < i.
Further, I created a separate getSimilarity function for easier code readability. Pass it an old and new sequence to get the similarity.
def getSimilarity(old_seq, new_seq):
    similarity = [old_seq[i] == new_seq[i] for i in range(len(new_seq))]
    return 100*sum(similarity)/len(similarity)

lst60 = [generateSequence(sseq60, 100)]
for i in range(1, 5):
    newSeq = ""
    max_identity = 0
    while True:
        newSeq = generateSequence(sseq60, 100)
        max_identity = 0  # reset for each candidate so rejected attempts don't carry over
        for j in range(0, i):
            max_identity = max(max_identity, getSimilarity(lst60[j], newSeq))
        if max_identity < 10:
            break
    lst60.append(newSeq)
print(len(lst60))
I'm trying to implement a filter in Python to sort out the points of a point cloud generated by Agisoft PhotoScan. PhotoScan is photogrammetry software designed to be user friendly, but it also allows Python commands to be run through an API.
Below is my code so far, and I'm pretty sure there is a better way to write it, as I'm missing something. The code runs inside PhotoScan.
Objective:
Select and remove 10% of the points at a time whose error lies within the defined range of 50 to 10. When those 10%-at-a-time passes are done, also remove any remaining points in that error range that amount to less than 10% of the total. Immediately after every removal, an optimization procedure should run. The process should stop when no points are selectable, or when the selectable points amount to less than 1% of the current total and are not worth removing.
A drawing for better understanding:
Actual Code Under Construction (3 updates - see below for details):
import PhotoScan as PS
import math

doc = PS.app.document
chunk = doc.chunk

# using float with range; by setting i = 1 it steps 0.1 at a time
def precrange(a, b, i):
    if a < b:
        p = 10**i
        sr = a*p
        er = (b*p) + 1
        p = float(p)
        for n in range(sr, er):
            x = n/p
            yield x
    else:
        p = 10**i
        sr = b*p
        er = (a*p) + 1
        p = float(p)
        for n in range(sr, er):
            x = n/p
            yield x
"""
Determine if x is close to y:
x relates to nselected variable
y to p10 variable
math.isclose() Return True if the values a and b are close to each other and
False otherwise
var is the tolerance here setted as a relative tolerance:
rel_tol is the relative tolerance – it is the maximum allowed difference
between a and b, relative to the larger absolute value of a or b. For example,
to set a tolerance of 5%, pass rel_tol=0.05. The default tolerance is 1e-09,
which assures that the two values are the same within about 9 decimal digits.
rel_tol must be greater than zero.
"""
def test_isclose(x, y, var):
if math.isclose(x, y, rel_tol=var): # if variables are close return True
return True
else:
False
# 1. define filter limits
f_ReconstUncert = precrange(50, 10, 1)

# 2. count initial point number
tiePoints_0 = len(chunk.point_cloud.points)  # storing info for later

# 3. call Filter() and init it
f = PS.PointCloud.Filter()
f.init(chunk, criterion=PS.PointCloud.Filter.ReconstructionUncertainty)

a = 0

"""
Way to restart for loop!
should_restart = True
while should_restart:
    should_restart = False
    for i in xrange(10):
        print i
        if i == 5:
            should_restart = True
            break
"""
restartLoop = True
while restartLoop:
    restartLoop = False
    for count, i in enumerate(f_ReconstUncert):  # for each threshold value
        # count points for every i
        tiePoints = len(chunk.point_cloud.points)
        p10 = int(round((10 / 100) * tiePoints, 0))  # 10% of the total
        f.selectPoints(i)  # selects points
        nselected = len([p for p in chunk.point_cloud.points if p.selected])
        percent = round(nselected * 100 / tiePoints, 2)
        if nselected == 0:
            print("For threshold {} there are no selectable points".format(i))
            break
        elif test_isclose(nselected, p10, 0.1):
            a += 1
            print("Threshold found in iteration: ", count)
            print("----------------------------------------------")
            print("# {} Removing points from cloud ".format(a))
            print("----------------------------------------------")
            print("# {}. Reconstruction Uncertainty:"
                  " {:.2f}".format(a, i))
            print("{} - {}"
                  " ({:.1f} %)\n".format(tiePoints,
                                         nselected, percent))
            f.removePoints(i)  # removes points
            # optimization procedure needed to refine camera positions
            print("--------------Optimizing cameras-------------\n")
            chunk.optimizeCameras(fit_f=True, fit_cx=True,
                                  fit_cy=True, fit_b1=False,
                                  fit_b2=False, fit_k1=True,
                                  fit_k2=True, fit_k3=True,
                                  fit_k4=False, fit_p1=True,
                                  fit_p2=True, fit_p3=False,
                                  fit_p4=False, adaptive_fitting=False)
            # count again the number of points in the point cloud
            tiePoints = len(chunk.point_cloud.points)
            print("= {} remaining points after"
                  " {} removal".format(tiePoints, a))
            # reassigning variable to get new 10% of remaining points
            p10 = int(round((10 / 100) * tiePoints, 0))
            percent = round(nselected * 100 / tiePoints, 2)
            print("----------------------------------------------\n\n")
            # restart loop to investigate from range start
            restartLoop = True
            break
        else:
            f.resetSelection()
            continue  # continue to next i
    else:
        f.resetSelection()
        print("for loop didn't work out")
        print("{} iterations done!".format(count))

tiePoints = len(chunk.point_cloud.points)
print("Tiepoints 0: ", tiePoints_0)
print("Tiepoints 1: ", tiePoints)
Problems:
A. Currently I'm stuck in endless processing because of a loop. I know it's down to my bad coding, but how do I implement my objective and avoid the infinite loops? ANSWER: made the code less confusing and updated it above.
B. How do I start over (or restart) my search for valid threshold values in the range(50, 20) after finding one of them? ANSWER: Stack Exchange: how to restart a for loop
C. How do I turn the code more pythonic?
IMPORTANT UPDATE 1: altered above
Using a better float range solution adapted from Stack Overflow: how-to-use-a-decimal-range-step-value
# using float with range; by setting i = 1 it steps 0.1 at a time
def precrange(a, b, i):
    if a < b:
        p = 10**i
        sr = a*p
        er = (b*p) + 1
        p = float(p)
        return map(lambda x: x/p, range(sr, er))
    else:
        p = 10**i
        sr = b*p
        er = (a*p) + 1
        p = float(p)
        return map(lambda x: x/p, range(sr, er))

# some code
f_ReconstUncert = precrange(50, 20, 1)
I'm also using math.isclose() to determine whether the number of selected points is close to the 10% target, instead of a manual solution that assigns extra variables. This was implemented as follows:
"""
Determine if x is close to y:
x relates to nselected variable
y to p10 variable
math.isclose() Return True if the values a and b are close to each other and
False otherwise
var is the tolerance here setted as a relative tolerance:
rel_tol is the relative tolerance – it is the maximum allowed difference
between a and b, relative to the larger absolute value of a or b. For example,
to set a tolerance of 5%, pass rel_tol=0.05. The default tolerance is 1e-09,
which assures that the two values are the same within about 9 decimal digits.
rel_tol must be greater than zero.
"""
def test_threshold(x, y, var):
if math.isclose(x, y, rel_tol=var): # if variables are close return True
return True
else:
False
# some code
if test_threshold(nselected, p10, 0.1):
# if true then a valid threshold is found
# some code
UPDATE 2: altered in the code under construction
Minor fixes, and I got the for loop to restart from the beginning by following guidance from another Stack Exchange post on the subject. I now have to improve the range or adjust isclose() to catch more values.
restartLoop = True
while restartLoop:
    restartLoop = False
    for i in range(0, 10):
        if condition:
            restartLoop = True
            break
UPDATE 3: Code structure to achieve listed objectives:
threshold = range(0, 11, 1)
listx = []
for i in threshold:
    listx.append(i)

restart = 0
restartLoop = True
while restartLoop:
    restartLoop = False
    for idx, i in enumerate(listx):
        print("do something as printing i:", i)
        if i > 5:  # if this condition holds, restart the loop
            print("found value for condition: ", i)
            del listx[idx]
            restartLoop = True
            print("RESTARTING LOOP\n")
            restart += 1
            break  # break the for loop so the outer while restarts it
        else:
            continue
    else:
        # continue if the inner loop wasn't broken
        continue

print("restart - outer while", restart)
So I have a problem that might be super duper simple.
I have these numpy ndarrays that I allocated and want to assign values to them via indices returned as lists. It might be easier if I showed you some example code. The questionable code I have is at the bottom, and in my testing (before actually taking this to scale) I keep getting syntax errors :'(
EDIT: edited to make it easier to troubleshoot and put some example code at the bottom
import numpy as np

def do_stuff(index, mask):
    # this is where the calculations are made
    magic = sum(mask)
    return index, magic

def foo(full_index, comparison_dims, *xargs):
    # I have this function executed in Parallel since I'm using a machine with
    # 36 nodes per core, and can access up to 16 cores for each script #blessed

    # figure out how many dimensions there are, and how big they are
    parent_dims = []
    parent_diffs = []
    for j in xargs:
        parent_dims += [len(j)]
        parent_diffs += [j[1] - j[0]]  # this is used to find a mask

    index = []  # this is where the individual dimension indices will be stored
    dim_n = 0
    # loop through the dimensions
    while dim_n < len(parent_dims):
        dim_index = full_index % parent_dims[dim_n]
        index += [dim_index]
        if dim_n == 0:
            mask = (comparison_dims[dim_n] > xargs[dim_n][dim_index] - parent_diffs[dim_n]/2) * \
                   (comparison_dims[dim_n] <= xargs[dim_n][dim_index] + parent_diffs[dim_n]/2)
        else:
            mask *= (comparison_dims[dim_n] > xargs[dim_n][dim_index] - parent_diffs[dim_n]/2) * \
                    (comparison_dims[dim_n] <= xargs[dim_n][dim_index] + parent_diffs[dim_n]/2)
        full_index //= parent_dims[dim_n]
        dim_n += 1
    return do_stuff(index, mask)

def bar(comparison_dims, *xargs):
    if len(xargs) == comparison_dims.shape[0]:
        pass
    elif len(comparison_dims.shape) == 2:
        pass
    else:
        raise ValueError("silly person, you failed")

    from joblib import Parallel, delayed
    dims = []
    for j in xargs:
        dims += [len(j)]
    myArray = np.empty(tuple(dims))
    results = Parallel(n_jobs=1)(
        delayed(foo)(index, comparison_dims, *xargs)
        for index in range(np.prod(dims))
    )

    # LOOK HERE, HELP HERE!
    for index_list, result in results:
        # I thought this would work, but oh golly was I wrong; index_list here is a list of ints, and result is a value
        # for example index, result = [0,3,7], 45.4
        # so in execution, that would yield: myArray[0,3,7] = 45.4
        # instead it yields SyntaxError because I don't know what I'm doing XD
        myArray[*index_list] = result
    return myArray
Any ideas how I can make that work? What do I need to do?
I'm not the sharpest tool in the shed, but I think with your help we might be able to figure this out!
A quick example to troubleshoot this problem would be:
compareDims = np.array([np.random.rand(1000), np.random.rand(1000)])
dim0 = np.arange(0,1,1./20)
dim1 = np.arange(0,1,1./30)
myArray = bar(compareDims, dim0, dim1)
To index a numpy array with an arbitrary list of multidimensional indices, you actually need to use a tuple:
for index_list, result in results:
    myArray[tuple(index_list)] = result
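A small self-contained illustration (the array shape here is made up; the [0, 3, 7] / 45.4 example comes from the question's comments):

import numpy as np

myArray = np.zeros((10, 10, 10))     # hypothetical shape for the demo
index_list = [0, 3, 7]

myArray[tuple(index_list)] = 45.4    # equivalent to myArray[0, 3, 7] = 45.4
print(myArray[0, 3, 7])              # -> 45.4

Note that myArray[index_list] with a plain list would instead be interpreted as fancy indexing along the first axis, and myArray[*index_list] is the syntax error the question ran into.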
I am trying to implement Dijkstra's algorithm (on an undirected graph) to find the shortest path, and my code is below.
Note: I am not using a heap/priority queue or anything like that, just an adjacency list, a dictionary to store weights, and a bool list to avoid cycling forever in the loops/recursion. Also, the algorithm works for most test cases but fails for this particular one: https://ideone.com/iBAT0q
Important: the graph can have multiple edges from v1 to v2 (or vice versa); you have to use the minimum weight.
import sys
sys.setrecursionlimit(10000)

def findMin(n):
    for i in x[n]:
        cost[n] = min(cost[n], cost[i] + w[(n, i)])

def dik(s):
    for i in x[s]:
        if done[i]:
            findMin(i)
            done[i] = False
            dik(i)
    return

q = int(input())
for _ in range(q):
    n, e = map(int, input().split())
    x = [[] for _ in range(n)]
    done = [True]*n
    w = {}
    cost = [1000000000000000000]*n
    for k in range(e):
        i, j, c = map(int, input().split())
        x[i-1].append(j-1)
        x[j-1].append(i-1)
        try:  # avoiding multiple edges
            w[(i-1, j-1)] = min(c, w[(i-1, j-1)])
            w[(j-1, i-1)] = w[(i-1, j-1)]
        except:
            try:
                w[(i-1, j-1)] = min(c, w[(j-1, i-1)])
                w[(j-1, i-1)] = w[(i-1, j-1)]
            except:
                w[(j-1, i-1)] = c
                w[(i-1, j-1)] = c
    src = int(input()) - 1
    # for i in sorted(w.keys()):
    #     print(i, w[i])
    done[src] = False
    cost[src] = 0
    dik(src)  # first iteration assigns a possible minimum to all nodes
    done = [True]*n
    dik(src)  # second iteration to ensure they are minimum
    for val in cost:
        if val == 1000000000000000000:
            print(-1, end=' ')
            continue
        if val != 0:
            print(val, end=' ')
    print()
The optimum isn't always found in the second pass: each recursive pass relaxes nodes in the order they happen to be visited, so a shorter path discovered late in one pass only propagates to the rest of the graph on a later pass. If you add a third pass to your example, you get closer to the expected result, and after the fourth iteration you're there.
You could iterate until no more changes are made to the cost array:
done[src] = False
cost[src] = 0
dik(src)
while True:
    ocost = list(cost)  # copy for comparison
    done = [True]*n
    dik(src)
    if cost == ocost:
        break
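For comparison only, and not part of the answer above: the question deliberately avoids a priority queue, but a minimal textbook Dijkstra over the same structures (x as the adjacency list, w as the weight dict, n nodes) would look roughly like this:

import heapq

def dijkstra(src, n, x, w):
    INF = 10**18
    cost = [INF] * n
    cost[src] = 0
    pq = [(0, src)]                     # (distance, node) min-heap
    while pq:
        d, u = heapq.heappop(pq)
        if d > cost[u]:
            continue                    # stale entry; a shorter path was already found
        for v in x[u]:
            nd = d + w[(u, v)]
            if nd < cost[v]:
                cost[v] = nd
                heapq.heappush(pq, (nd, v))
    return cost

Because each node is finalized with its smallest distance when it is popped, a single pass suffices and no convergence loop is needed.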
First time posting here, so here it goes:
I have two sets of data (v and t), each with 46 values. The data is imported with the pandas module and converted to a numpy array in order to do the calculation.
I need to set ml_min1[45], ml_min2[45], and so on to the value 0. The problem is that each time I run the script, the values at position 45 of ml_min1 and ml_min2 are different. This is the piece of code that I have:
t1 = fil_copy.t1.as_matrix()
t2 = fil_copy.t2.as_matrix()
v1 = fil_copy.v1.as_matrix()
v2 = fil_copy.v2.as_matrix()
ml_min1 = np.empty(len(t1))
l_h1 = np.empty(len(t1))
ml_min2 = np.empty(len(t2))
l_h2 = np.empty(len(t2))

for i in range(0, (len(v1) - 1)):
    if (i != (len(v1) - 1)) and (v1[i+1] > v1[i]):
        ml_min1[i] = v1[i+1] - v1[i]
        l_h1[i] = ml_min1[i] * (60/1000)
    elif i == (len(v1) - 1):
        ml_min1[i] = 0
        l_h1[i] = 0
        print(i, ml_min1[i])
    else:
        ml_min1[i] = 0
        l_h1[i] = 0
        print(i, ml_min1[i])

for i in range(0, (len(v2) - 1)):
    if (i != (len(v2) - 1)) and (v2[i+1] > v2[i]):
        ml_min2[i] = v2[i+1] - v2[i]
        l_h2[i] = ml_min2[i] * (60/1000)
    elif i == (len(v2) - 1):
        ml_min2[i] = 0
        l_h2[i] = 0
        print(i, ml_min2[i])
    else:
        ml_min2[i] = 0
        l_h2[i] = 0
        print(i, ml_min2[i])
Your code as it is currently written doesn't work because the elif blocks are never hit, since range(0, x) does not include x (it stops just before getting there). The easiest way to solve this is probably just to initialize your output arrays with numpy.zeros rather than numpy.empty, since then you don't need to do anything in the elif and else blocks (you can just delete them).
That said, it's generally a design error to use loops like yours in numpy code. Instead, you should use numpy's broadcasting features to perform your mathematical operations to a whole array (or a slice of one) at once.
If I understand correctly, the following should be equivalent to what you wanted your code to do (just for one of the arrays, the other should work the same):
ml_min1 = np.zeros(len(t1)) # use zeros rather than empty, so we don't need to assign any 0s
diff = v1[1:] - v1[:-1] # find the differences between all adjacent values (using slices)
mask = diff > 0 # check which ones are positive (creates a Boolean array)
ml_min1[:-1][mask] = diff[mask] # assign with mask to a slice of the ml_min1 array
l_h1 = ml_min1 * (60/1000) # create l_h1 array with a broadcast scalar multiplication
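For what it's worth, a roughly equivalent sketch using np.diff and np.where instead of explicit slices (v1 here is made-up stand-in data, not the real pandas column):

import numpy as np

v1 = np.array([0.0, 1.5, 1.2, 2.0, 2.5])                 # stand-in for fil_copy.v1

diff = np.diff(v1)                                        # same as v1[1:] - v1[:-1]
ml_min1 = np.append(np.where(diff > 0, diff, 0.0), 0.0)   # keep positive increases, force the last entry to 0
l_h1 = ml_min1 * (60 / 1000)
print(ml_min1)                                            # [1.5 0.  0.8 0.5 0. ]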