Local variable might be referenced before assignment in Python

I'm going through some code that implements a decision tree learner.
Here is the code:
from collections import Counter
import math
def calculate_entropy(self, tags):
    tags_counter = Counter()
    if len(tags) > 0:
        for tag in tags:
            tags_counter[tag] += 1
            classes_probs = [float(tags_counter[tag]) / len(tags) for tag in tags_counter]
        entropy = 0
        for prob in classes_probs:
            if prob == 0:
                return 0
            entropy -= prob * math.log(prob, 2)
        return entropy
    else:
        return 0
My questions are:
(1) For classes_probs I get a "local variable might be referenced before assignment" message, and I can't figure out why.
(2) What does the code on the right-hand side of the assignment to classes_probs do? I haven't seen anything like it.

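(1) The warning is because classes_probs may be undefined at that point: it is assigned only inside the first for loop, and if tags is empty, that loop body never executes. The check that produces the message does not reason about the len(tags) > 0 guard. You can "fix" this by assigning an empty list before the first loop, or by moving the assignment so that it runs after the loop.
(2) This is called a list comprehension. Search for that term and pick a tutorial whose level of writing and examples suits you.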
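A minimal sketch of one way to restructure the function, assuming Python 3 and written as a standalone function (no self) so it is easy to try out. It computes classes_probs only after the counting is finished, so the name is always bound when it is read, and the comments show the explicit loop that the list comprehension replaces. Using Counter(tags) instead of the manual counting loop is a choice made for this sketch, not something the question requires.

```python
import math
from collections import Counter


def calculate_entropy(tags):
    """Shannon entropy in bits of a list of class labels (0.0 for an empty list)."""
    if not tags:
        return 0.0
    tags_counter = Counter(tags)  # counts every label in one pass
    # classes_probs is assigned after counting is done, so it is always bound
    # when the sum below reads it. The list comprehension is equivalent to:
    #     classes_probs = []
    #     for count in tags_counter.values():
    #         classes_probs.append(count / len(tags))
    classes_probs = [count / len(tags) for count in tags_counter.values()]
    return sum(-p * math.log2(p) for p in classes_probs)
```

For example, two equally frequent labels give 1.0 bit and a single repeated label gives 0.0.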


Why is my solution unable to solve the 8-puzzle problem for boards that require more than one move?

I am trying to solve the 8-puzzle problem in Python, as given in this assignment: https://www.cs.princeton.edu/courses/archive/fall12/cos226/assignments/8puzzle.html
My goal state is a little different from the one in the assignment:
#GOAL STATE
goal_state = [[0,1,2],[3,4,5],[6,7,8]]
The buggy part seems to be the isSolvable function. It is implemented correctly, but when testing a board it treats as the goal any state in which the relative order of the tiles is maintained and the blank can be anywhere. So a board might be solvable and still not lead to my goal state as currently defined. I can't think of a way to test for all the possible goal states while running the solver function.
Also, my solver function was implemented wrongly. I was only considering the neighbor with the minimum Manhattan value, and when I hit a dead end I did not consider other states. This can be done with a priority queue, but I am not sure how to implement it. I have written part of it (see below), which is also partly wrong because I am not pushing the parent into the heap. Please give me some guidance on this.
Here is my complete code -
https://pastebin.com/q7sAKS6a
Updated code with incomplete solver function -
https://pastebin.com/n4CcQaks
I have used Manhattan values as the heuristic and the Hamming value to break ties.
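To illustrate that tie-break, here is a hypothetical sketch (not the code from the pastebin) that stores boards as flat tuples in row-major order with 0 as the blank, matching the goal state above. Because heapq compares tuples element by element, pushing (manhattan, hamming, board) orders entries by Manhattan value first and uses the Hamming value only when two Manhattan values are equal. The names GOAL, manhattan, hamming and push_board are illustrative.

```python
import heapq

# Assumed layout: the question's goal state flattened row by row, 0 = blank.
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)


def manhattan(board):
    """Sum of row and column distances of each tile (blank excluded) from its goal cell."""
    total = 0
    for idx, tile in enumerate(board):
        if tile != 0:
            row, col = divmod(idx, 3)
            goal_row, goal_col = divmod(tile, 3)  # tile t belongs at index t
            total += abs(row - goal_row) + abs(col - goal_col)
    return total


def hamming(board):
    """Number of tiles (blank excluded) that are not in their goal cell."""
    return sum(1 for idx, tile in enumerate(board) if tile != 0 and tile != GOAL[idx])


def push_board(heap, board):
    # Manhattan first, Hamming breaks ties, and the board itself is the last
    # resort so heapq never has to compare values of different types.
    heapq.heappush(heap, (manhattan(board), hamming(board), board))
```

For example, (1, 2, 0, 3, 4, 5, 6, 7, 8) and (2, 1, 0, 3, 4, 5, 6, 7, 8) are both two Manhattan steps from the goal, but the second has only one misplaced tile, so it is popped first.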
My isSolvable, manhattan_value and solver functions:
isSolvable function:
#Conditions for unsolvability -->
#https://www.geeksforgeeks.org/check-instance-8-puzzle-solvable/
def isSolvable(self):
    self.one_d_array = []
    for i in range(0,len(self.board)):
        for j in range(0,len(self.board)):
            self.one_d_array.append(self.board[i][j])
    inv_count = 0
    for i in range(0,len(self.one_d_array)-1):
        for j in range(i+1, len(self.one_d_array)):
            if (self.one_d_array[i] != 0 and self.one_d_array[j] != 0 and self.one_d_array[i] > self.one_d_array[j]):
                inv_count = inv_count + 1
    if(inv_count % 2 == 0):
        print("board is solvable")
        return True
    else:
        print("board is not solvable")
        return False
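On the worry about which goal state the parity test assumes: on an odd-width board such as 3x3, sliding a tile left or right does not change the order of the tiles, and sliding one up or down moves it past exactly two other tiles, so the parity of the inversion count never changes. The standard result is that two 3x3 positions can reach each other exactly when their inversion counts have the same parity. Both [[0,1,2],[3,4,5],[6,7,8]] and the assignment's goal with the blank last have zero inversions, so an even count is the right test for either goal. The sketch below is a standalone version of the same check; count_inversions and is_solvable are hypothetical names.

```python
def count_inversions(board):
    """Count pairs i < j (row-major order) with tile_i > tile_j, ignoring the blank (0)."""
    tiles = [t for row in board for t in row if t != 0]
    return sum(1
               for i in range(len(tiles))
               for j in range(i + 1, len(tiles))
               if tiles[i] > tiles[j])


def is_solvable(board):
    # 3x3 only: every goal with zero inversions (blank first or blank last)
    # is reachable exactly from the boards with an even inversion count.
    return count_inversions(board) % 2 == 0
```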
Manhattan function
def manhattan_value(self,data=None):
    manhattan_distance = 0
    for i in range(0,len(data)):
        for j in range(0,len(data)):
            if(data[i][j] != self.goal_state[i][j] and data[i][j] != 0):
                #correct position of the element
                x_goal , y_goal = divmod(data[i][j],3)
                manhattan_distance = manhattan_distance + abs(i-x_goal) + abs(j-y_goal)
    return manhattan_distance
Updated Solver function
#implement A* algorithm
def solver(self):
    moves = 0
    heuristic_value = []
    prev_state = []
    curr_state = self.board
    output = []
    heap_array = []
    store_manhattan_values = []
    if(curr_state == self.goal_state):
        print("goal state reached!")
        print(curr_state)
        print("number of moves required to reach goal state --> {}".format(moves))
    else:
        while(True):
            min_heuristic_value = 99999999999
            min_pos = None
            moves = moves + 1
            output = self.get_neighbours(curr_state)
            for i in range(len(output)):
                store_manhattan_values.append([self.manhattan_value(output[i]),i])
            #print(store_manhattan_values)
            for i in range(len(store_manhattan_values)):
                heapq.heappush(heap_array,store_manhattan_values[i])
            #print(heap_array)
            #print(heapq.heappop(heap_array)[1])
            #if(moves > 1):
            #    return
            return
Please refer to the PASTEBIN link for complete code and all the references (https://pastebin.com/r7TngdFc).
In the linked code (based on my tests and debugging so far):
These functions are working correctly: manhattan_value, hamming_value, append_in_list, get_neighbours.
What these functions do:
isSolvable - tells if the board can be solved or not
manhattan_value - calculates the Manhattan value of the board passed to it.
hamming_value - calculates the Hamming value of the board passed to it.
append_in_list - helper function for getting neighbours. It swaps two values, saves the resulting state in an array, and then swaps them back to restore the original position so that the other possible states can be generated.
get_neighbours - gets all the possible neighbours that can be formed by swapping places with the blank element (the 0 element).
solver - implements the A* algorithm
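The swap-and-swap-back approach in append_in_list works, but if boards are stored as immutable flat tuples (an assumption of this sketch, not how the question's class stores them), each neighbour can be built as a fresh tuple and no undo step is needed. get_neighbours here is a hypothetical standalone function that takes a row-major tuple with 0 as the blank.

```python
def get_neighbours(board, width=3):
    """Return every board reachable by sliding one tile into the blank (0).

    board is a flat tuple in row-major order. Tuples are immutable, so each
    neighbour is a new tuple and the original board is never modified.
    """
    blank = board.index(0)
    row, col = divmod(blank, width)
    neighbours = []
    # Try moving the blank up, down, left and right, in that order.
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < width and 0 <= c < width:
            target = r * width + c
            cells = list(board)
            cells[blank], cells[target] = cells[target], cells[blank]
            neighbours.append(tuple(cells))
    return neighbours
```

A blank in a corner yields two neighbours, on an edge three, and in the centre four.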
I am unable to find my mistake. Please guide me on this problem. Thank you in advance for your help!
I apologize in advance for not providing a minimal version of my code; I can't think of a way to use all the functions and still produce a minimal version.
(Note: this answer differs from the earlier revision, to which many of the comments below relate.)
I don't see how the current code implements a queue. It seems like the while loop in the solver picks one new board state each time from a list of possible moves, then considers the next list generated by this new board state.
On the other hand, a priority queue, from what I understand, would have all the (valid) neighbours from the current board state inserted into it and prioritised such that the next chosen board state to be removed from the queue and examined will be the one with highest priority.
(To be completely sure while debugging, I might add memoisation to detect whether the code ends up revisiting board states. On second thought, I believe the assignment's stipulation that the number of moves made so far be added to the priority would rule out revisiting the same board state if the priority queue is used correctly, so memoisation may not be needed.)
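A minimal A* sketch along the lines described above, assuming the question's goal (0, 1, ..., 8) flattened row by row and boards stored as tuples. Every neighbour of the expanded board is pushed onto a heapq priority queue with priority moves + Manhattan distance, and the board with the smallest priority is always expanded next, so the search can back out of a dead end instead of committing to one neighbour. A parent dictionary records where each board was reached from (the role of "pushing the parent"), and a best_moves dictionary is kept as a cheap safeguard that skips stale queue entries. The names are hypothetical and the sketch does not use the question's class. Check solvability first: on an unsolvable board this search only stops after exhausting every reachable board.

```python
import heapq

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)  # the question's goal state, flattened row by row


def manhattan(board):
    total = 0
    for idx, tile in enumerate(board):
        if tile:  # skip the blank
            r, c = divmod(idx, 3)
            gr, gc = divmod(tile, 3)
            total += abs(r - gr) + abs(c - gc)
    return total


def neighbours(board):
    blank = board.index(0)
    r, c = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            cells = list(board)
            t = nr * 3 + nc
            cells[blank], cells[t] = cells[t], cells[blank]
            yield tuple(cells)


def solve(start):
    """A* search; returns the list of boards from start to GOAL, or None."""
    start = tuple(start)
    heap = [(manhattan(start), 0, start)]  # (moves + manhattan, moves, board)
    parent = {start: None}                 # board -> board it was reached from
    best_moves = {start: 0}                # fewest moves found so far per board
    while heap:
        _, moves, board = heapq.heappop(heap)
        if board == GOAL:
            path = []
            while board is not None:       # walk the parent links back to start
                path.append(board)
                board = parent[board]
            return path[::-1]
        if moves > best_moves[board]:
            continue                       # stale entry: a shorter route was queued
        for nxt in neighbours(board):
            if nxt not in best_moves or moves + 1 < best_moves[nxt]:
                best_moves[nxt] = moves + 1
                parent[nxt] = board
                heapq.heappush(heap, (moves + 1 + manhattan(nxt), moves + 1, nxt))
    return None                            # queue exhausted: GOAL is unreachable
```

Because the Manhattan distance never overestimates the remaining moves, the first time GOAL is popped its path is a shortest one; the number of moves is len(path) - 1.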

About coding this binary tree in Python

I have a question about coding up a binary tree, and I'm currently stuck on it.
Basically, the algorithm is as follows. At the initial node we have the objects 1, 2, 3 placed in order as (None,2,3), with object 3 > object 2 > object 1. At each stage of the tree, one object can move to its immediate right or left and be placed with another object, provided the moving object is smaller than the object already at that position. If the position to the immediate right or left is None, then only the smallest number may move over from the right or left position. For example, the events (None,0,2+3) and (None,1+2,3) are both possible after the initial node.
Hence, the tree is as follows:
I am wondering how to code this up in Python. I don't care about duplicate events, so at each point I am only interested in the unique tuples/events that occur.
I'm not sure, but a rough idea I have is this:
def recursion():
    pos = list(range(1,4))
    index = 0
    values = list(range(1,4))
    tree = []
    if pos[index+1] > pos[index]:
        pos[index + 1] -= pos[index+1]
        pos[index + 2] += pos[index+1]
        tree.append(pos)
    if pos[index+1] < pos[index]:
        pos[index+1] += pos[index]
        pos[index] -= pos[index]
        tree.append(pos)
    else:
        recursion()
    return tree
Any help would be greatly appreciated.
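The movement rules above leave some room for interpretation, so this is only a sketch under one assumed reading: each position holds a group of objects (an empty tuple standing in for None), the smallest object in a group may move one position left or right, and it may join the destination only if the destination is empty or every object already there is larger. Under that reading the tree can be built breadth-first without recursion: tuples are hashable, so a set keeps only the unique states at each level. successors and build_levels are hypothetical names, and a different rule would only require changing successors.

```python
def successors(state):
    """Yield every state reachable in one move under the assumed rule.

    state is a tuple of positions; each position is a sorted tuple of the
    objects grouped there, and an empty tuple plays the role of None.
    """
    for i, here in enumerate(state):
        if not here:
            continue
        mover = here[0]                  # smallest object at position i
        for j in (i - 1, i + 1):
            if 0 <= j < len(state):
                there = state[j]
                # Allowed if the destination is empty or all its objects are larger.
                if not there or mover < there[0]:
                    new = list(state)
                    new[i] = here[1:]
                    new[j] = tuple(sorted(there + (mover,)))
                    yield tuple(new)


def build_levels(start, depth):
    """Breadth-first expansion that keeps only the unique states on each level."""
    levels = [{start}]
    for _ in range(depth):
        next_level = set()
        for state in levels[-1]:
            next_level.update(successors(state))
        levels.append(next_level)
    return levels
```

Under this reading, the first level from ((1,), (2,), (3,)) is ((), (1, 2), (3,)) and ((1,), (), (2, 3)), that is, a 1+2 merge and a 2+3 merge. States can reappear on later levels (the starting position shows up again at depth 2); keeping a set of states seen on earlier levels would drop those too.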

Parameter hint assignment through a function in LMFIT

I want to set the parameter hints for models held in a dictionary. I have created a function that is called to set the hints. First, a primary model is created; then I want to create different models, identical to the primary but with different prefixes. The set_hints function accepts a parameter comp, which defines which hints will be set. This is a simplified part of my code:
import lmfit
def foo (x, a):
    return x + a
def set_hints(mod, comp="2"):
    mod.set_param_hint("a", value=1, vary=True)
    if comp == "2":
        mod.set_param_hint("a", value=0, vary=False)
    return mod.param_hints
m = lmfit.Model(foo)
models = {}
for i in range(2):
    hints = set_hints(m, comp="2")
    models["m%i" % i] = lmfit.Model(m.func, m.independent_vars,
                                    prefix="m%i" % i,
                                    param_names=m.param_names)
    for par in m.param_names:
        models["m%i" % i].param_hints[par] = hints[par]
    # models["m%i" % i].param_hints = hints
for key in models.keys():
    print(key)
    print("value:")
    print(models[key].param_hints["a"]["value"])
    print("vary:")
    print(models[key].param_hints["a"]["vary"])
which outputs:
m0
value:
1
vary:
True
m1
value:
0
vary:
False
This doesn't make any sense to me! The value and vary hints should be 0 and False respectively in both cases. It is as if, in the second iteration of the loop, the condition comp == "2" in set_hints were not satisfied for the first iteration's model, and its hints were changed retroactively! If I uncomment the commented line instead of setting the hints one by one, the result is correct. But what is happening now seems completely absurd to me. Please help me understand what is happening!
The code seems very weird, but I assume it comes from a larger design. I think this must be a bug, though I'm not certain what it is. I will create an issue on the lmfit GitHub site.

Checking that array doesn't contain negative numbers, and running function again if it does

My task today is to create a way of checking whether a function's output contains negative numbers and, if it does, to run the function again until its output contains no negative numbers.
I'll post the full code later in the post, but this is my attempt at a solution:
def evecs(matrixTranspose):
    evectors = numpy.linalg.eig(matrixTranspose)[1][:,0]
    return evectors
if any(x<0 for x in evectors) == False:
    print(evectors)
evecs() is my function and evectors is the output array, but I only want to print evectors if it has no negative entries. Later I also want to add that, if there are negative entries, the code should run the evecs function again until it finds an evectors with no negative entries.
However, whenever I run it I get the error:
global name evectors is not defined
Here's a link to my code, and the full output from the iPython console. http://pastebin.com/3Bk9h1gq
Thanks!
You have not declared the variable evectors anywhere except inside the scope of your function evecs. Assign the function's return value to a variable at module level before you test it:
evectors = evecs(matrixTranspose)
if not any(x < 0 for x in evectors):
    print(evectors)
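For the "run it again until there are no negative entries" part, a small retry helper is one option. numpy.linalg.eig gives the same result for the same matrix, so calling evecs again on an unchanged matrixTranspose cannot help; whatever you retry has to build a new random matrix each time. retry_until and has_no_negatives are hypothetical helpers, and max_tries guards against looping forever.

```python
def retry_until(make, is_ok, max_tries=1000):
    """Call make() until is_ok(result) is true; give up after max_tries calls."""
    for _ in range(max_tries):
        result = make()
        if is_ok(result):
            return result
    raise RuntimeError("no acceptable result after %d tries" % max_tries)


def has_no_negatives(values):
    """True when no entry of values is below zero."""
    return not any(x < 0 for x in values)
```

Once MarkovChain returns matrixTranspose, a call could look like retry_until(lambda: evecs(MarkovChain(4, 10000000000)), has_no_negatives), so that each attempt draws a fresh random matrix.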
EDIT
There are several issues:
Indentation is VERY important in Python. MarkovChain and evecs are two separate functions, but your evecs function was indented one extra level, embedding it within MarkovChain.
MarkovChain should return matrixTranspose if you plan to use it in another function call.
As a result of the above issue, the call to MarkovChain needs to be assigned to a variable, matrixTranspose; otherwise you will get an error stating that matrixTranspose is not defined when you pass it to evecs.
Since matrixTranspose isn't set until the call to MarkovChain has completed, the rest of your logic needs to be reordered to come after that call.
I have applied all the above changes below, with comments marking the changed areas:
import random
import numpy
from tabulate import tabulate
def MarkovChain(n,s) :
    """
    Build a random n x n row-stochastic matrix, raise it to the power s
    and return the transpose of the result.
    """
    # precision is assumed to be a module-level constant defined elsewhere in the full script.
    matrix = []
    for l in range(n) :
        lineLst = []
        sum = 0
        crtPrec = precision
        for i in range(n-1) :
            val = random.randrange(crtPrec)
            sum += val
            lineLst.append(float(val)/precision)
            crtPrec -= val
        lineLst.append(float(precision - sum)/precision)
        matrix.append(lineLst)
    print("The initial probability matrix.")
    print(tabulate(matrix))
    matrix_n = numpy.linalg.matrix_power(matrix, s)
    print("The final probability matrix.")
    print(tabulate(matrix_n))
    # numpy.transpose gives an array; zip(*...) would give an iterator in Python 3
    matrixTranspose = numpy.transpose(matrix_n)
    return matrixTranspose # issue 2
# issue 1
def evecs(matrixTranspose):
    evectors = numpy.linalg.eig(matrixTranspose)[1][:,0]
    return evectors
matrixTranspose = MarkovChain(4, 10000000000) # issue 3
# issue 4
evectors = evecs(matrixTranspose)
if not any(x < 0 for x in evectors):
    print(evectors)
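As a possible alternative to retrying (a different technique from the one asked about): for a row-stochastic matrix P, the stationary distribution is the eigenvector of P transposed for eigenvalue 1. numpy.linalg.eig does not sort its eigenvalues, so column 0 is not guaranteed to be that eigenvector, and eigenvectors are only determined up to a scale factor, so an all-negative result points in the same direction as an all-positive one. The sketch picks the eigenvalue closest to 1 and divides by the sum. It assumes the chain has a single eigenvalue equal to 1, which holds for example when every entry of P is positive; in that case the result has no negative entries, up to rounding. stationary_distribution is a hypothetical name.

```python
import numpy as np


def stationary_distribution(matrix):
    """Return pi with pi @ matrix == pi and sum(pi) == 1 for a row-stochastic matrix."""
    transposed = np.asarray(matrix, dtype=float).T
    eigenvalues, eigenvectors = np.linalg.eig(transposed)
    # eig does not sort its output, so look for the eigenvalue closest to 1.
    k = np.argmin(np.abs(eigenvalues - 1.0))
    vec = np.real(eigenvectors[:, k])
    # Dividing by the sum fixes the arbitrary scale (and sign) of the eigenvector.
    return vec / vec.sum()
```

For P = [[0.9, 0.1], [0.5, 0.5]] this gives approximately [5/6, 1/6].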

How to speed up Python string matching code

I have this code, which computes the longest common subsequence between random strings to see how accurately one can reconstruct an unknown region of the input. To get good statistics I need to iterate it many times, but my current Python implementation is far too slow. Even using PyPy it currently takes 21 seconds to run once, and I would ideally like to run it hundreds of times.
#!/usr/bin/python
import random
import itertools
#test to see how many different unknowns are compatible with a set of LCS answers.
def lcs(x, y):
    n = len(x)
    m = len(y)
    # table is the dynamic programming table: n+1 rows, m+1 columns
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n+1): # i=0,1,...,n
        for j in range(m+1): # j=0,1,...,m
            if i == 0 or j == 0:
                table[i][j] = 0
            elif x[i-1] == y[j-1]:
                table[i][j] = table[i-1][j-1] + 1
            else:
                table[i][j] = max(table[i-1][j], table[i][j-1])
    # Now, table[n][m] is the length of LCS of x and y.
    return table[n][m]
def lcses(pattern, text):
    return [lcs(pattern, text[i:i+2*l]) for i in range(0,l)]
l = 15
#Create the pattern
pattern = [random.choice('01') for i in range(2*l)]
#create text start and end and unknown.
start = [random.choice('01') for i in range(l)]
end = [random.choice('01') for i in range(l)]
unknown = [random.choice('01') for i in range(l)]
lcslist = lcses(pattern, start+unknown+end)
count = 0
for test in itertools.product('01', repeat=l):
    test = list(test)
    testlist = lcses(pattern, start+test+end)
    if (testlist == lcslist):
        count += 1
print(count)
I tried converting it to numpy, but I must have done it badly, as it actually ran more slowly. Can this code be sped up a lot somehow?
Update: following a comment below, it would be better if lcses used a recurrence directly that gave the LCS between the pattern and all sublists of the text of the same length. Is it possible to modify the classic dynamic programming LCS algorithm to do this?
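One way to cut the cost of each call, swapped in for the classic table rather than taken from the question: a well-known bit-parallel recurrence computes the LCS length using one Python integer as a bit vector over the pattern, so the inner loop over pattern positions becomes a handful of integer operations per text character. It does not answer the update's question about all windows at once, but it can replace lcs() inside lcses(). lcs_length, lcs_table and this lcses are hypothetical names; lcs_table is the usual dynamic program, kept for checking, and lcses takes l as a parameter instead of reading the global.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Bit-parallel version: bit i of each integer corresponds to a[i], and a
    Python int serves as a bit vector of any width.
    """
    match = {}                      # symbol -> bitmask of positions in a holding it
    for i, symbol in enumerate(a):
        match[symbol] = match.get(symbol, 0) | (1 << i)
    mask = (1 << len(a)) - 1        # keeps v at len(a) bits
    v = mask                        # start with every bit set
    for symbol in b:
        u = v & match.get(symbol, 0)
        v = ((v + u) | (v - u)) & mask
    # The LCS length equals the number of zero bits left in v.
    return len(a) - bin(v).count("1")


def lcs_table(x, y):
    """Classic dynamic-programming LCS length, kept here to check lcs_length."""
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(x)][len(y)]


def lcses(pattern, text, l):
    """The question's lcses(), using lcs_length and taking l as a parameter."""
    return [lcs_length(pattern, text[i:i + 2 * l]) for i in range(l)]
```

This is likely to be much faster than filling a 31x31 table in pure Python for every window, but the gain should be measured on the real workload before relying on it.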
The recurrence table, table, is recomputed 15 times on every call to lcses(), when it depends only on m and n, where m is at most 2*l and n is at most 3*l.
If your program computed table only once, it would be dynamic programming, which it currently is not. A Python idiom for this would be:
table = None
def use_lcs_table(m, n, l):
    global table
    if table is None:
        table = lcs(2*l, 3*l)
    return table[m][n]
Using a class instance would be cleaner and more extensible than a global table declaration, but this gives you an idea of why it's taking so much time.
Added in reply to a comment:
Dynamic programming is an optimization that trades extra space for less time. In your example you appear to precompute a table in lcs(), but you build the whole table on every single call and then throw it away. I don't claim to understand the algorithm you are trying to implement, but the way you have coded it, it either:
Has no recurrence relation, thus no grounds for DP optimization, or
Has a recurrence relation, the implementation of which you bungled.
