Getting a binary search to work in Python


I am trying to get a binary search to work in Python. I have a massive, sorted list of passwords. The plan is to get a password input from the user and see if it is in the list. I've decided to implement a binary search because of the size of the list.
Here's my code:
Found = False
Password = user_input("Enter a password: ")
with io.open('final.txt', encoding='latin-1') as myfile:
    data = myfile.readlines()
low = 0
high = (int(len(data))+1)
while (low < high) and not Found:
    mid = int((low+high)/2)
    if data[mid] == Password:
        Found = True
        break
    elif Password < str(data[mid]):
        high = mid - 1
    elif Password > str(data[mid]):
        low = mid + 1
I am guessing it is because of the string comparison? Any ideas? The binary search never returns true, even if I explicitly search something that I know is in the list.
I used this code to sort the password list.
import io
with io.open('result.txt', encoding='latin-1') as myfile:
    data = myfile.readlines()
def partition(data, start, end):
    pivot = data[end]  # Partition around the last value
    bottom = start-1  # Start outside the area to be partitioned
    top = end  # Ditto
    done = 0
    while not done:  # Until all elements are partitioned...
        while not done:  # Until we find an out of place element...
            bottom = bottom+1  # ... move the bottom up.
            if bottom == top:  # If we hit the top...
                done = 1  # ... we are done.
                break
            if data[bottom] > pivot:  # Is the bottom out of place?
                data[top] = data[bottom]  # Then put it at the top...
                break  # ... and start searching from the top.
        while not done:  # Until we find an out of place element...
            top = top-1  # ... move the top down.
            if top == bottom:  # If we hit the bottom...
                done = 1  # ... we are done.
                break
            if data[top] < pivot:  # Is the top out of place?
                data[bottom] = data[top]  # Then put it at the bottom...
                break  # ...and start searching from the bottom.
    data[top] = pivot  # Put the pivot in its place.
    return top  # Return the split point
def quicksort(data, start, end):
    if start < end:  # If there are two or more elements...
        split = partition(data, start, end)  # ... partition the sublist...
        quicksort(data, start, split-1)
        quicksort(data, split+1, end)
quicksort(data, 0, (int(len(data))-1))
with io.open('final.txt', 'w', encoding='latin-1') as f:
    for s in data:
        f.write(s)
The sorted list looks something like this: whitespace, then symbols, then numbers, then capital letters (alphabetically sorted), then lowercase letters (alphabetically sorted).

Do not write your own binary search; it's a bit tricky to get right. Use the bisect module instead.
from bisect import bisect_left
def binary_search(lst, el):
    # returns lower bound of key `el` in list `lst`
    index = bisect_left(lst, el)
    # check that: (1) the lower bound is not at the end of the list and
    # (2) the element at the index matches `el`
    return index < len(lst) and lst[index] == el
Usage:
test = ["abc", "def", "ghi"]
print(binary_search(test, "def")) # True
print(binary_search(test, "xyz")) # False

You probably have a newline character at the end of each password after calling readlines. Use rstrip() to remove it:
Found = False
Password = user_input("Enter a password: ")
with io.open('final.txt', encoding='latin-1') as myfile:
    data = myfile.readlines()
low = 0
high = len(data)-1  #no need to cast to int, should be len()-1
while (low <= high) and not Found:  #less than or equal to
    mid = int((low+high)/2)
    if data[mid].rstrip() == Password:  #Remove newline character before compare
        Found = True
        break
    elif Password < str(data[mid]):
        high = mid - 1
    elif Password > str(data[mid]):
        low = mid + 1

If you only need to check whether the password is in your list, note that the line
data = myfile.readlines()
in your code has already loaded all the passwords into memory.
So you can test membership directly. This is a linear scan rather than a binary search, and the entries still carry their trailing newline, so strip them first:
data = [line.rstrip('\n') for line in data]
if Password in data:
    print("Yes, it is present in the list")
else:
    print("Not present in the list")
Hope this helps.

This is an example of a binary search:
def binarySearch(alist, item):
    first = 0
    last = len(alist)-1
    found = False
    while first<=last and not found:
        midpoint = (first + last)//2
        if alist[midpoint] == item:
            found = True
        else:
            if item < alist[midpoint]:
                last = midpoint-1
            else:
                first = midpoint+1
    return found
mylist1 = [0, 1, 2, 8, 9, 17, 19, 32, 42,]
print(binarySearch(mylist1, 3))
print(binarySearch(mylist1, 13))
mylist2 = [0, 1, 2, 8, 9, 17, 19, 32, 42, 99]
print(binarySearch(mylist2, 2))
print(binarySearch(mylist2, 42))
This prints:
False
False
True
True
And yes, I am sure you need to strip the newline character at the end of each password after calling readlines, as Eamon pointed out.

There are two problems.
First, your binary search algorithm is wrong.
The loop condition should be
while (low <= high)
or you can't find the first and the last element.
Second, readlines() keeps the trailing \n on every line, but user_input() does not return one.
That makes `Password` == `Password\n` false every time.
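Both fixes together might look like the sketch below. It assumes the lines were sorted by plain string comparison (as the quicksort in the question does) and that every line, including the last, ends with a newline. The function name contains and the small in-memory list standing in for final.txt are made up for illustration.

```python
def contains(lines, password):
    """Binary search over sorted lines that still carry their trailing newline."""
    # Append the newline to the needle so it compares exactly like the file's lines.
    target = password + "\n"
    low, high = 0, len(lines) - 1      # inclusive bounds
    while low <= high:                 # <= so a one-element range is still checked
        mid = (low + high) // 2
        if lines[mid] == target:
            return True
        if target < lines[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return False


# Hypothetical stand-in for the result of myfile.readlines()
data = sorted(["hunter2\n", "123456\n", "Password\n", "letmein\n"])
print(contains(data, "123456"))    # first element: True
print(contains(data, "letmein"))   # last element: True
print(contains(data, "nope"))      # False
```

Adding the newline to the search key, instead of stripping every line, keeps the comparison consistent with the order the file was sorted in.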

You're skipping parts of your list because of the way you're setting low and high. Because of this, low == high occurs after updating and before checking, causing you to jump out of the loop prematurely.
There are two easy solutions:
Either..
set high = mid instead of mid - 1, so that high acts as an exclusive upper bound and the loop runs one more iteration (keep low = mid + 1; setting low = mid can loop forever once high == low + 1),
or..
check whether high == low and data[low] == Password after the loop
terminates, as you might still find Password there.
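The first option, with high treated as an exclusive upper bound, can be sketched as follows. This is only an illustration: contains_half_open and the sample list are hypothetical names, and the list is assumed to be sorted already.

```python
def contains_half_open(items, target):
    """Binary search where `high` is an exclusive upper bound."""
    low, high = 0, len(items)          # the search window is items[low:high]
    while low < high:                  # the window is non-empty
        mid = (low + high) // 2
        if items[mid] == target:
            return True
        if target < items[mid]:
            high = mid                 # mid is excluded because high is exclusive
        else:
            low = mid + 1              # mid was already checked, so skip it
    return False


words = ["abc", "def", "ghi"]
print([contains_half_open(words, w) for w in words])   # [True, True, True]
print(contains_half_open(words, "xyz"))                # False
```

With this convention the initial value is len(items), not len(items) + 1, and `low < high` is the right loop condition.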

Related

python infinite loop and numpy delete do not work properly

I wrote a function and it does not terminate. Logically len(array) should keep decreasing, but it gets stuck at 227. I think numpy delete is not working properly, or did I make a mistake somewhere?
def segmenting (file, threshold):
    segments = []
    check = True
    count = 0
    while check == True:
        if len(file) <= 2:
            check = False
        sequence = []
        ids = []
        for i in range(1, len(file)):
            vector = [file[i,1] - file[0,1] , file[i,2]- file[0,2] ]
            magnitude = math.sqrt(vector[0]**2 + vector[1]**2)
            print(i)
            if magnitude <= threshold:
                sequence.append(file[i])
                ids.append(i)
            if i == len(file) and len(sequence) == 0:
                file = np.delete(file, 0 , axis = 0)
                break
        if len(ids) >0 and len(sequence)>0 :
            segments.append(sequence)
            file = np.delete(file, ids , axis = 0)
            print('sequence after :',sequence)
            sequence = []
            ids = []
        print(len(file))
    return segments

The following (simplified) logic will never be executed
for i in range(1, len(file)):
    if i == len(file):
        file = np.delete(file, 0)
Without having a way to remove the first line of the file, you have no way to exhaust your array. This check is superfluous anyway since after each iteration you won't need the first line anymore.
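To see why that branch can never run, here is a tiny sketch with made-up names (n, items and seen are not from the question). It shows that range(1, n) stops at n - 1, and that the range is built once when the loop starts, so rebinding the list inside the loop does not change which indices are visited.

```python
n = 5
indices = list(range(1, n))
print(indices)         # [1, 2, 3, 4]
print(n in indices)    # False: the loop variable never equals n

items = [10, 20, 30, 40, 50]
seen = []
for i in range(1, len(items)):   # range(1, 5) is created once, right here
    seen.append(i)
    if i == 1:
        items = items[:2]        # shrinking the list does not shorten the loop
print(seen)            # [1, 2, 3, 4]
```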
As a first fix you can put the check outside the loop and only check whether you've found any matches
for i in range(1, len(file)):
    ...
if len(sequence) == 0:
    file = np.delete(file, 0)
But that way you would have one iteration where you find (and remove) matches and then one more with no more matches where you then remove it. Therefore, as said above, you should always remove the first line after each iteration.
With more simplifications, your code can be reduced down to:
import numpy as np
def segmenting(file, threshold):
    segments = []
    while len(file) > 2:
        # distance of every remaining row to the first row (columns 1 and 2)
        idx = np.sqrt(np.sum((file[1:,1:3] - file[0,1:3])**2, axis=1)) <= threshold
        file = file[1:]
        segments.append(list(file[idx]))
        file = file[np.logical_not(idx)]
    return segments

It's likely because you are removing elements from the file array inside a for loop while also using that array to drive the loop. Try iterating over an unmodified version of the array and doing the deletions on a copy.
For example, one possible fix is to change this line
for i in range(1, len(file)):
to
N = len(file)
for i in range(1, N):
You could also remove the flag variable 'check' and use a break statement instead.

How can I assign positives and negatives to a string?

I'm trying to create this code so that when variable J is present, it is a positive number, but if H is present, it is a negative number. Here is my code.
record = ['1J2H']
def robot_location(record: str):
    if J in record:
        sum()
    if H in record:
        ** I dont know how to subtract them**
print(robot_location(record)
So if record = [1J2H], the output should be ((+1)+(-2)) = -1. How can I do that? Could somebody please explain?

You need to iterate over the string inside the list and check it character by character. This assumes every number is a single digit followed by a letter:
record = ['1J2H']
def robot_location(record: str):
    total = 0
    aux_n = 0
    for a in record:
        if a.isnumeric():
            aux_n = int(a)  # remember the digit until its letter arrives
        else:
            if a == 'H':
                total = total + aux_n*-1
            else:
                total = total + aux_n
            aux_n = 0
    return total
print(robot_location(record[0]))

Here is a concise way to do this via a list comprehension:
import re
record = '1J2H'
nums = re.findall(r'\d+[JH]', record)  # ['1J', '2H']
output = sum([int(x[:-1]) if x[-1] == 'J' else -1*int(x[:-1]) for x in nums])
print(output)  # -1

One way to do that is to turn the string into an iterator with iter. The for loop (a list comprehension in this case) pulls one item per iteration, and calling next inside the loop body pulls the following item and advances the iterator again. As a result, the loop variable only ever sees the numbers, while next only ever returns the letters:
lst = ['1J2H']
def robot_location(record: str):
    record = iter(record)
    numbers = [int(i) if next(record) == 'J' else -int(i) for i in record]
    return sum(numbers)
print(robot_location(lst[0]))
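The way the loop and next share one iterator is easier to see when it is spelled out step by step. This is just an illustrative sketch; the names it and pairs are not from the answer.

```python
it = iter("1J2H")
pairs = []
for digit in it:            # the for loop takes '1', then '2'
    letter = next(it)       # next() takes 'J', then 'H'
    pairs.append((digit, letter))
print(pairs)                # [('1', 'J'), ('2', 'H')]
```

Note that this pairing relies on single-digit numbers, and a string that ends with a digit and no letter would make next raise StopIteration.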

You could modify a string like s = '1J2H' to '1+2*-1+0' and let Python evaluate it:
result = eval(s.replace('J', '+').replace('H', '*-1+') + '0')

Recursive Decompression of Strings

I'm trying to decompress strings using recursion. For example, the input:
3[b3[a]]
should output:
baaabaaabaaa
but I get:
baaaabaaaabaaaabbaaaabaaaabaaaaa
I have the following code but it is clearly off. The first find_end function works as intended. I am absolutely new to using recursion and any help understanding / tracking where the extra letters come from or any general tips to help me understand this really cool methodology would be greatly appreciated.
def find_end(original, start, level):
    if original[start] != "[":
        message = "ERROR in find_error, must start with [:", original[start:]
        raise ValueError(message)
    indent = level * " "
    index = start + 1
    count = 1
    while count != 0 and index < len(original):
        if original[index] == "[":
            count += 1
        elif original[index] == "]":
            count -= 1
        index += 1
    if count != 0:
        message = "ERROR in find_error, mismatched brackets:", original[start:]
        raise ValueError(message)
    return index - 1
def decompress(original, level):
    # set the result to an empty string
    result = ""
    # for any character in the string we have not looked at yet
    for i in range(len(original)):
        # if the character at the current index is a digit
        if original[i].isnumeric():
            # the character of the current index is the number of repetitions needed
            repititions = int(original[i])
            # start = the next index containing the '[' character
            x = 0
            while x < (len(original)):
                if original[x].isnumeric():
                    start = x + 1
                    x = len(original)
                else:
                    x += 1
            # last = the index of the matching ']'
            last = find_end(original, start, level)
            # calculate a substring using `original[start + 1:last]
            sub_original = original[start + 1 : last]
            # RECURSIVELY call decompress with the substring
            # sub = decompress(original, level + 1)
            # concatenate the result of the recursive call times the number of repetitions needed to the result
            result += decompress(sub_original, level + 1) * repititions
            # set the current index to the index of the matching ']'
            i = last
        # else
        else:
            # concatenate the letter at the current index to the result
            if original[i] != "[" and original[i] != "]":
                result += original[i]
    # return the result
    return result
def main():
    passed = True
    ORIGINAL = 0
    EXPECTED = 1
    # The test cases
    provided = [
        ("3[b]", "bbb"),
        ("3[b3[a]]", "baaabaaabaaa"),
        ("3[b2[ca]]", "bcacabcacabcaca"),
        ("5[a3[b]1[ab]]", "abbbababbbababbbababbbababbbab"),
    ]
    # Run the provided tests cases
    for t in provided:
        actual = decompress(t[ORIGINAL], 0)
        if actual != t[EXPECTED]:
            print("Error decompressing:", t[ORIGINAL])
            print(" Expected:", t[EXPECTED])
            print(" Actual: ", actual)
            print()
            passed = False
    # print that all the tests passed
    if passed:
        print("All tests passed")
if __name__ == '__main__':
    main()

From what I gathered from your code, it probably gives the wrong result because of the approach you've taken to find the last matching closing brace at a given level (I'm not 100% sure, the code was a lot). However, I can suggest a cleaner approach using stacks (almost similar to DFS, without the complications):
def decomp(s):
    stack = []
    for i in s:
        if i.isalnum():
            stack.append(i)
        elif i == "]":
            temp = stack.pop()
            count = stack.pop()
            if count.isnumeric():
                stack.append(int(count)*temp)
            else:
                stack.append(count+temp)
    for i in range(len(stack)-2, -1, -1):
        if stack[i].isnumeric():
            stack[i] = int(stack[i])*stack[i+1]
        else:
            stack[i] += stack[i+1]
    return stack[0]
print(decomp("3[b]")) # bbb
print(decomp("3[b3[a]]")) # baaabaaabaaa
print(decomp("3[b2[ca]]")) # bcacabcacabcaca
print(decomp("5[a3[b]1[ab]]")) # abbbababbbababbbababbbababbbab
This works on a simple observation: rather than evaluating a substring as soon as you read a [, evaluate it after encountering the matching ]. That lets you build the result AFTER the pieces have been evaluated individually. (This is similar to prefix/postfix expression evaluation.)
(You can add error checking to this as well, if you wish. It would be easier to check that the string is well formed in one pass and evaluate it in another, rather than doing both in one go.)
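If you would rather stay with recursion, as the question asks, here is a minimal sketch. It assumes well-formed input (every count is followed by a bracketed group), and the names decompress_recursive and _expand are made up for illustration. In the question's code, two things produce the extra letters: assigning i = last inside `for i in range(...)` does not change what the loop visits next, so the characters inside the brackets are processed a second time, and the inner while loop always restarts from index 0, so start always points just after the first digit in the string. The sketch avoids both by having the helper return the position where it stopped.

```python
def decompress_recursive(text):
    """Expand strings such as '3[b3[a]]' by recursive descent."""
    result, _ = _expand(text, 0)
    return result


def _expand(text, pos):
    """Expand text from pos up to an unmatched ']' or the end of the string.

    Returns (expanded_string, index_where_expansion_stopped).
    """
    parts = []
    while pos < len(text) and text[pos] != "]":
        if text[pos].isdigit():
            start = pos
            while pos < len(text) and text[pos].isdigit():   # multi-digit counts
                pos += 1
            count = int(text[start:pos])
            inner, pos = _expand(text, pos + 1)   # pos + 1 skips the '['
            pos += 1                              # skip the matching ']'
            parts.append(inner * count)
        else:
            parts.append(text[pos])
            pos += 1
    return "".join(parts), pos


print(decompress_recursive("3[b3[a]]"))        # baaabaaabaaa
print(decompress_recursive("5[a3[b]1[ab]]"))   # abbbab repeated 5 times
```

Each recursive call handles exactly one bracketed group and hands back the index of its closing bracket, so nothing is scanned twice.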

Here is a solution built on a similar idea to the one above:
we go through the string pushing everything onto a stack until we find ']', then we pop back to the matching '[', collecting the letters, read the number in front of it, multiply, and push the result back onto the stack.
It is cheaper because we build lists and join them instead of repeatedly concatenating strings.
Note: the repeat count can't be more than 9, because it is parsed as a single character.
def decompress(string):
    stack = []
    letters = []
    for i in string:
        if i != ']':
            stack.append(i)
        elif i == ']':
            letter = stack.pop()
            while letter != '[':
                letters.append(letter)
                letter = stack.pop()
            word = ''.join(letters[::-1])
            letters = []
            stack.append(''.join([word for j in range(int(stack.pop()))]))
    return ''.join(stack)
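The single-digit limit noted above can be lifted by accumulating digits until the '[' arrives. A sketch under that assumption follows; decompress_multi is a hypothetical name and the input is assumed to be well formed.

```python
def decompress_multi(text):
    """Stack-based expansion that accepts multi-digit repeat counts."""
    stack = []      # saved (pieces_so_far, repeat_count) for each open '['
    pieces = []     # pieces of the string being built at the current depth
    count = 0       # repeat count currently being read
    for ch in text:
        if ch.isdigit():
            count = count * 10 + int(ch)      # accumulate multi-digit counts
        elif ch == "[":
            stack.append((pieces, count))     # remember the outer context
            pieces, count = [], 0
        elif ch == "]":
            outer, repeat = stack.pop()
            outer.append("".join(pieces) * repeat)
            pieces = outer                    # continue building the outer level
        else:
            pieces.append(ch)
    return "".join(pieces)


print(decompress_multi("3[b3[a]]"))   # baaabaaabaaa
print(decompress_multi("12[a]"))      # twelve a's
```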

Longest chain of last word of line/first word of next

Okay, so I am trying to find, in a text file, the longest chain in which the last word of one line is the first word of the next (works well for poetry). The Python script I have so far works but still takes an immensely long time. I am no coding expert and have no real idea of optimization. Am I running through more options than necessary?
How can I reduce the time it takes to run through a longer text?
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
import sys
# Opening the source text
with open("/text.txt") as g:
    all_lines = g.readlines()
def last_word(particular_line):
    if particular_line != "\n":
        particular_line = re.sub(ur'^\W*|\W*$', "",particular_line)
        if len(particular_line) > 1:
            return particular_line.rsplit(None, 1)[-1].lower()
def first_word(particular_line):
    if particular_line != "\n":
        particular_line = re.sub(ur'^\W*|\W*$', "",particular_line)
        if len(particular_line) > 1:
            return particular_line.split(None, 1)[0].lower()
def chain(start, lines, depth):
    remaining = list(lines)
    del remaining[remaining.index(start)]
    possibles = [x for x in remaining if (len(x.split()) > 2) and (first_word(x) == last_word(start))]
    maxchain = []
    for c in possibles:
        l = chain(c, remaining, depth)
        sys.stdout.flush()
        sys.stdout.write(str(depth) + " of " + str(len(all_lines)) + " \r")
        sys.stdout.flush()
        if len(l) > len(maxchain):
            maxchain = l
            depth = str(depth) + "." + str(len(maxchain))
    return [start] + maxchain
#Start
final_output = []
#Finding the longest chain
for i in range (0, len(all_lines)):
    x = chain(all_lines[i], all_lines, i)
    if len(x) > 2:
        final_output.append(x)
final_output.sort(key = len)
#Output on screen
print "\n\n--------------------------------------------"
if len(final_output) > 1:
    print final_output[-1]
else:
    print "Nothing found"

import itertools
def matching_lines(line_pair):
    return line_pair[0].split()[-1].lower() == line_pair[1].split()[0].lower()
line_pairs = ((line,next_line) for line,next_line in itertools.izip(all_lines,all_lines[1:]))
grouped_pairs = itertools.groupby(line_pairs,matching_lines)
print max([len(list(y))+1 for x,y in grouped_pairs if x])
Although I'm not sure it will be faster (but I think it will be, since it only iterates once and uses mostly builtins).

Yes, this code has a complexity of O(n^2). If your file has n lines, the code performs 1 * (n-1) iterations for the first line, then 1 * (n-2) for the second line, and so on, with n such terms. For a big n, this is roughly n^2. Actually, there's a bug in the code in this line
del remaining[remaining.index(start)]
where you probably meant to run this:
del remaining[:remaining.index(start)]
(notice the ':' in the square brackets) which expands the runtime (now you have (n-1) + (n-1) + .. + (n-1) = n*(n-1), which is slightly bigger than (n-1) + (n-2) + (n-3) ..).
You can optimize the code like so: begin with maxchainlen = 0 and curchainlen = 0. Now iterate through the lines, each time comparing the first word of the current line to the last word of the previous line. If they match, increase curchainlen by 1. If they don't, set maxchainlen = max(maxchainlen, curchainlen) and reset curchainlen to 0. After you finish iterating through the lines, do this check for maxchainlen once more. Example:
lw = last_word(lines[0])
curchainlen = 0  # counts links between consecutive lines
maxchainlen = 0
for l in lines[1:]:
    if lw is not None and lw == first_word(l):
        curchainlen = curchainlen + 1
    else:
        maxchainlen = max(maxchainlen, curchainlen)
        curchainlen = 0
    lw = last_word(l)  # the current line becomes the previous line
maxchainlen = max(maxchainlen, curchainlen)
print(maxchainlen)

I'd try splitting this job into two phases: first finding the chains and then comparing them. That will simplify the code a lot. Since chains will be a small subset of all the lines in the file, finding them first and then sorting them will be quicker than trying to process the whole thing in one big go.
The first part of the problem is a lot easier if you use the python yield keyword, which is similar to return but doesn't end a function. This lets you loop over your content one line at a time and process it in small bites without needing to hold the whole thing in memory at all times.
Here's a basic way to grab a file one line at a time. It uses yield to pull out the chains as it finds them:
def get_chains(*lines):
    # these hold the last token and the
    # members of this chain
    previous = None
    accum = []
    # walk through the lines,
    # seeing if they can be added to the existing chain in `accum`
    for each_line in lines:
        # split the line into words, ignoring case & whitespace at the ends
        pieces = each_line.lower().split()
        if pieces and pieces[0] == previous:
            # match? add to accum
            accum.append(each_line)
        else:
            # no match? yield our chain
            # if it links at least two lines
            if len(accum) > 1:
                yield accum
            # this line may be the start of the next chain
            accum = [each_line] if pieces else []
        # update our idea of the last, and try the next line
        previous = pieces[-1] if pieces else None
    # at the end of the file we need to kick out anything
    # still in the accumulator
    if len(accum) > 1:
        yield accum
When you feed this function a sequence of lines, it will yield chains as it finds them and then continue. Whoever calls the function can capture the yielded chains and do things with them.
Once you've got the chains, it's easy to sort them by length and pick the longest. Since Python has built-in list sorting, just collect a list of chain-length -> chain pairs and sort it. The longest chain will be the last item:
def longest_chain(filename):
    with open(filename, 'rt') as file_handle:
        # if you loop over an open file, you'll get
        # back the lines in the file one at a time
        incoming_chains = get_chains(*file_handle)
        # collect the results into a list, keyed by lengths
        all_chains = [(len(chain), chain) for chain in incoming_chains]
    if all_chains:
        all_chains.sort()
        length, lines = all_chains[-1]
        # found the longest chain; each line still ends with its newline
        return "".join(lines)
    else:
        # for some reason there are no chains of connected lines
        return ""
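For comparison, here is a compact, self-contained variant of the same generator idea, offered as a sketch rather than a drop-in replacement. It takes any iterable of lines (so an open file would be consumed lazily), treats the first line of a chain as part of it, and picks the longest chain with max. The name iter_chains and the sample lines are made up for illustration, and words are split on whitespace only, so punctuation is not stripped.

```python
def iter_chains(lines):
    """Yield runs of consecutive lines in which each line starts with the
    previous line's last word (case-insensitive, whitespace-separated)."""
    chain = []
    previous_last = None
    for line in lines:
        words = line.lower().split()
        if words and words[0] == previous_last:
            chain.append(line)
        else:
            if len(chain) > 1:          # a chain needs at least two lines
                yield chain
            chain = [line] if words else []
        previous_last = words[-1] if words else None
    if len(chain) > 1:
        yield chain


poem = [
    "the river runs to sea\n",
    "sea winds carry salt\n",
    "salt dries on the stone\n",
    "nothing links here\n",
    "here we begin again\n",
]
longest = max(iter_chains(poem), key=len, default=[])
print("".join(longest), end="")   # prints the first three lines
```

The whole scan is a single pass over the lines, so the running time grows linearly with the length of the text.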

Loop not iterating fully

I'm creating a simple RPG as a learning experience. In my code I have an array of tiles that display on a 25x25 grid just fine, and a separate array that contains the True/False values indicating whether each tile is solid. The latter is not working; in my code below I have put a print statement exactly where execution never reaches, and I'm not quite sure what the problem is.
Also, the data for the level is simply a text file with a grid of 25x25 characters representing blocks.
def loadLevel(self, level):
    fyle = open("levels/" + level,'r')
    count = 0
    for lyne in fyle:
        if lyne.startswith("|"):
            dirs = lyne.split('|')
            self.north = dirs[1]
            self.south = dirs[2]
            self.east = dirs[3]
            self.west = dirs[4]
            continue
        for t in range(25):
            tempTile = Tiles.Tile()
            tempTile.value = lyne[t]
            tempTile.x = t
            tempTile.y = count
            self.levelData.append(tempTile)
        count += 1
    rowcount = 0
    colcount = 0
    for rows in fyle:
        print('Doesnt get here!')
        for col in rows:
            if col == 2:
                self.collisionLayer[rowcount][colcount] = False
            else:
                self.collisionLayer[rowcount][colcount] = True
            colcount += 1
            print(self.collisionLayer[rowcount[colcount]])
        if rows == 2:
            self.collisionLayer[rowcount][colcount] = False
        else:
            self.collisionLayer[rowcount][colcount] = True
        rowcount += 1
    print(self.collisionLayer)
Where exactly is the problem? I feel as though it is a quick fix but I'm simply not seeing it. Thanks!

You read through the file once with your first for loop, so there isn't anything left to read for the second loop. Seek back to the beginning of the file before starting the second loop:
fyle.seek(0)
Although I'd just cache the lines as a list, if possible:
with open('filename.txt', 'r') as handle:
    lines = list(handle)
Also, you can replace this:
if rows == 2:
    self.collisionLayer[rowcount][colcount] = False
else:
    self.collisionLayer[rowcount][colcount] = True
With:
self.collisionLayer[rowcount][colcount] = rows != 2

The loop:
for lyne in fyle:
... reads all of fyle and leaves nothing to be read by the loop:
for rows in fyle:

I think you just need to reopen the file. If I recall correctly, Python keeps reading from where you left off, and if there is nothing left, it can't read anything.
You can either reopen it or use fyle.seek(0) to go back to the first character of the first line.
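The behaviour is easy to reproduce without a real file. The sketch below uses io.StringIO as a stand-in for the opened level file (the two rows of text are made up); a file object returned by open behaves the same way when iterated.

```python
import io

handle = io.StringIO("row one\nrow two\n")   # stands in for open(...)
first_pass = [line for line in handle]
second_pass = [line for line in handle]      # the iterator is already at the end
handle.seek(0)                               # rewind to the start
third_pass = [line for line in handle]
print(len(first_pass), len(second_pass), len(third_pass))   # 2 0 2
```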
