Which is the faster method of searching? - python

I'm trying to demonstrate different ways of searching, so I've attempted a brute force iterative way, and a second one where I split the list into 2 halves and check from the front and the back.
Which is quicker? Or is my code just terrible?
I'm very new to Python, so I'm just getting to grips with it.
import itertools
import math

a = ["Rhys", "Jayne", "Brett", "Tool", "Dave", "Paul"]

# Counts the length of the list
Length = 0
for i in a:
    Length = Length + 1
print(Length)

# Brute force, iterative
counter = 0
print("Brute Force Search")
for i in a:
    if i != "Paul":
        counter = counter + 1
        print(counter)
        print("No")
    else:
        print("Yes")
        print(counter)
counter = 0  # reset counter

# Binary Chop Attempt
print(" Binary Search")
i = 0
j = Length - 1
while i <= math.ceil(Length / 2):
    i = i + 1
    while j > math.ceil(Length / 2):
        if a[i] != "Paul" or a[j] != "Paul":
            print(j)
            print("No")
        else:
            print("Yes")
            break
        j = j - 1

# Binary Chop Attempt 2
print(" Binary Search 2")
i = 0
j = Length - 1
found = False
while i <= math.ceil(Length / 2) or j > math.ceil(Length / 2):
    if found == True:
        break
    if a[i] != "Paul" or a[j] != "Paul":
        print("Not in position " + str(i))
    else:
        print("Found in position " + str(i))
        found = True
    if a[j] != "Paul":
        print("Not in position " + str(j))
    else:
        print("Found in position " + str(j))
        found = True
    j = j - 1
    i = i + 1
Thanks

a = ["Rhys", "Jayne", "Brett", "Tool","Dave", "Paul"]
print a.index('Paul')
This is going to be a boatload faster than any C-algorithm-transcribed-to-python you can come up with, up to considerable list sizes.
So the first question would be; isn't that good enough?
If it isn't, the next pythonic place to go looking would be the standard library (note that a binary search requires sorted input!):
a = sorted( ["Rhys", "Jayne", "Brett", "Tool","Dave", "Paul"])
from bisect import bisect_left as bisect
print bisect(a, 'Paul')
Or perhaps a set() or dict() might be more called for; but it all depends on what exactly you are trying to achieve.

Well, your code is not that bad. The general concept is OK. The thing you call "brute force" is actually called a "table scan", at least in the context of databases. Sometimes it is the only option you are left with.
Your second piece of code is not that different from the first one. Since indexing into a Python list is O(1), no matter how you "jump" around you will end up with pretty much the same result (assuming that you know nothing about the list, in particular its order). You could run tests and measure it, though (I'm too lazy to do that).
There are, however, several improvements that can be made:
1) Keep the list sorted. That way you can apply the "division" algorithm: you start in the middle and, if the value there is smaller than the one you are looking for, you go into the middle of the first half; otherwise you go into the middle of the second half. And so on... This will allow you to search in O(log(n)).
2) Use some structure other than a list, for example some kind of B-tree. This will also allow you to search in O(log(n)).
3) Finally, use a dictionary. It's a really good structure which allows you to search for a key in O(1) (impossible to be faster). If you really need to maintain the order of the array, you can use the dictionary like this: keys are elements and values are their positions in that order (a small sketch follows this list).
4) Use an index. That's pretty much the same as one of the points above, except that you use the extra structure in addition to, not instead of, the list. A bit more difficult to maintain, but good when you have a list of complex objects and you want to be able to search efficiently on more than one attribute.
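To make point 3 concrete, here is a minimal sketch, assuming the names in the list are unique:

a = ["Rhys", "Jayne", "Brett", "Tool", "Dave", "Paul"]

# Build a dictionary mapping each name to its position in the list once...
index = {name: pos for pos, name in enumerate(a)}

# ...then every lookup is an O(1) hash lookup instead of a scan.
print(index.get("Paul"))   # 5
print(index.get("Alice"))  # None -> not in the list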

Binary searching only makes sense if the list is ordered. If it's unordered, checking the 1st and last and then the 2nd and second-to-last is no different from checking the first, second, third and fourth. Ultimately, you have to check them all; order doesn't matter.
You have to sort the list if you want binary search to be effective, and then your binary search has to exploit the fact that things are sorted. That's how binary search works; it removes sections as it goes. It's the old "high or low" game. You guess 50, they say high. Now you know it can't be 50 or more, so you only need to search 1-49. Now you guess 25. They say low. So now you know it can't be 1-25. So now you pick the middle of 25 and 50.
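To make that concrete, here is a minimal iterative binary search over a sorted copy of the list (just a sketch, not the original code; it returns the index of the match, or -1 if the name is absent):

def binary_search(sorted_items, target):
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2            # guess the middle
        if sorted_items[mid] == target:
            return mid                  # found it
        elif sorted_items[mid] < target:
            lo = mid + 1                # target can only be in the upper half
        else:
            hi = mid - 1                # target can only be in the lower half
    return -1                           # not present

names = sorted(["Rhys", "Jayne", "Brett", "Tool", "Dave", "Paul"])
print(binary_search(names, "Paul"))     # 3 in the sorted list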

Your "brute force" search is usually called a "linear" search. In Python, that would just be
# Linear search
"Paul" in a
Your "binary chop" is usually required a "binary" search, and it depends on the input list to be sorted. You can use the sorted function to sort the list or just use a set:
# Binary search
"Paul" in set(a)
Whether or not a binary search is faster than a linear search depends on a few things (e.g. how expensive is it to sort the list in the first place?); it's certainly not always faster. If in doubt, use the timeit module to benchmark your code on some representative data.
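For example, a rough timeit comparison might look like this (a sketch; the list contents and repeat counts are made up):

import timeit

setup = 'a = ["Rhys", "Jayne", "Brett", "Tool", "Dave", "Paul"] * 1000; s = set(a)'

# Worst case for the linear scan: the name is not in the list at all.
print(timeit.timeit('"Alice" in a', setup=setup, number=10000))  # linear scan
print(timeit.timeit('"Alice" in s', setup=setup, number=10000))  # hash lookup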

Related

Time complexity for two different solutions

I want to understand the difference in time complexity between these two solutions.
The task is not relevant but if you're curious here's the link with the explanation.
This is my first solution. Scores a 100% in correctness but 0% in performance:
def solution(s, p, q):
    dna = dict({'A': 1, 'C': 2, 'G': 3, 'T': 4})
    result = []
    for i in range(len(q)):
        least = 4
        for c in set(s[p[i]:q[i] + 1]):
            if least > dna[c]: least = dna[c]
        result.append(least)
    return result
This is the second solution. Scores 100% in both correctness and performance:
def solution(s, p, q):
    result = []
    for i in range(len(q)):
        if 'A' in s[p[i]:q[i] + 1]: result.append(1)
        elif 'C' in s[p[i]:q[i] + 1]: result.append(2)
        elif 'G' in s[p[i]:q[i] + 1]: result.append(3)
        else: result.append(4)
    return list(result)
Now this is how I see it. In both solutions I'm iterating through a range of len(q), and on each iteration I'm slicing a different portion of a string, with a length between 1 and 100,000.
Here's where I get confused: in my first solution, on each iteration I slice a portion of the string once and create a set from it to remove all the duplicates. The set can have a length between 1 and 4, so iterating through it must be very quick, and I iterate through it only once per iteration.
In the second solution, on each iteration I slice a portion of the string up to three times and iterate through it, in the worst case three times, each time with a length of up to 100,000.
Then why is the second solution faster? How can the first have a time complexity of O(n*m) and the second O(n+m)?
I thought it was because of the in and for operators, but I tried the same second solution in JavaScript with the indexOf method and it still gets 100% in performance. But why? I could understand it if the in and for operators had different implementations in Python and worked differently behind the scenes, but in JS the indexOf method is just going to apply a for loop. Then isn't it the same as just doing the for loop directly inside my function? Shouldn't that be O(n*m) time complexity?
You haven't specified how the performance rating is obtained, but anyway, the second algorithm is clearly better, mainly because it uses the in operator, which under the hood calls a function implemented in C, which is far more efficient than interpreted Python. More on this topic here.
Also, I'm not sure, but I don't think the Python interpreter is smart enough to slice the string only once and then reuse the same portion for the other checks in the second algorithm.
Creating the set in the first algorithm also seems like a very costly operation.
Lastly, maybe the performance ratings aren't based on the algorithm's complexity, but rather on the execution time over different test strings?
I think the difference in complexity can easily be showcased on an example.
Consider the following input:
s = 'ACGT' * 1000000
# = 'ACGTACGTACGTACGTACGTACGTACGTACGTACGT...ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT'
p = [0]
q = [3999999]
Algorithm 2 very quickly checks that 'A' is in s[0:4000000] (it's the first character - no need to iterate through the whole string to find it!).
Algorithm 1, on the other hand, must iterate through the whole string s[0:4000000] to build the set {'A','C','G','T'}, because iterating through the whole string is the only way to check that there isn't a fifth distinct character hidden somewhere in the string.
Important note
I said algorithm 2 should be fast on this example, because the test 'A' in ... doesn't need to iterate through the whole string to find 'A' if 'A' is at the beginning of the string. However, note a possible important difference in complexity between 'A' in s and 'A' in s[0:4000000]. The problem is that creating a slice of the string might cost time (and memory) if it's copying the string. Instead of slicing, you should use s.find('A', 0, 4000000), which is guaranteed not to build a copy. For more information on this:
Documentation on string.find
Stackoverflow: Time complexity of string slice
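A minimal sketch of the difference (using the same s as above):

s = 'ACGT' * 1000000

# Slicing first copies up to 4,000,000 characters, then searches the copy:
found_by_slice = 'A' in s[0:4000000]

# str.find searches within the given bounds directly, without building a copy:
found_by_find = s.find('A', 0, 4000000) != -1

print(found_by_slice, found_by_find)  # True True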

"in list" vs. "manual searching" in list

My code in Python reads a next element in list and checks whether it already appeared in the list before. If yes, then it moves a left bound of the list behind the previous appearance (the rest of the code does not matter):
while k < len(lst):
    if lst[k] in lst[a:k]:  # a is a left bound
        i = lst.index(lst[k], a)  # could be done more efficiently with exception handling
        a = i + 1
    k += 1
I tried to rewrite this without using high-level tricks (in/index):
while k < len(lst):
    for i in range(a, k + 1):
        if lst[i] == lst[k]:
            break
    if i != k:  # different indices, same values
        a = i + 1
    k += 1
This appears to be about 3.5 times slower than code #1. But I do not think code #2 does anything highly inefficient, if I understand the in operator correctly:
go through all elements in list
compare to the searched element
if they are equal, stop and return True
if at end of the list, return False
(and the index function probably works in the same way; it just also has to remember the index).
My guess is that the Python interpreter executes in as a low-level (C) version of the for loop in code #2, whereas in code #2 it has to interpret my comparison every time I increase the value of i, which makes the code run slowly overall. Am I right about this?
By the way, the list is an ordered list of non-repeating numbers (does not have to be, so no suggestions of binary search), which results in a worst-case complexity for this algorithm, n^2/2.
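For reference, a rough way to measure just the in-vs-manual-loop difference might be something like this (a sketch with made-up data):

import timeit

setup = 'lst = list(range(10000)); target = 9999'
manual = '''
found = False
for x in lst:
    if x == target:
        found = True
        break
'''
print(timeit.timeit('target in lst', setup=setup, number=1000))  # C-level loop
print(timeit.timeit(manual, setup=setup, number=1000))           # interpreted loop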

Is it safe to append to a list during iteration if I want to iterate over the added value?

I often find myself writing code like:
mylist = [247]
while mylist:
    nextlist = []
    for element in mylist:
        print element
        if element % 2 == 0:
            nextlist.append(element / 2)
        elif element != 1:
            nextlist.append(3 * element + 1)
    mylist = nextlist
Okay - it's generally not this simple [and usually it really is with long lists, I just chose this (see xkcd) for fun], but I create a list, iterate over it doing things with those elements. While doing this, I will discover new things that I will need to iterate over, and I put them into a new list which I then iterate over.
It appears to be possible to write:
mylist = [247]
for element in mylist:
    print element
    if element % 2 == 0:
        mylist.append(element / 2)
    elif element != 1:
        mylist.append(element * 3 + 1)
I know that it's considered dangerous to modify a list while iterating over it, but in this case I want to iterate over the new elements.
Are there dangers from doing this? The only one I can think of is that the list may grow and take up a lot of memory (in many of my cases I actually want to have the whole list at the end). Are there others I'm ignoring?
Please note: Python: Adding element to list while iterating is related, but explains ways to create a copy of the list so that we can avoid iterating over the original. I'm asking about whether there is anything wrong in my specific case where I actually want my iteration to be extended.
edit: here is something closer to the real problem. Say we want to generate the "k-core" of a network: that is, delete all nodes with degree less than k, then from the remaining network delete all nodes with degree less than k, and repeat until there are none left to delete. The algorithm would find all nodes with degree less than k to begin with and put them in a to_delete list. Then, as nodes are deleted, if a neighbor's degree becomes k-1, it is added to the list. This could be done by:
delete_list = [node for node in G.nodes() if G.degree(node) < k]
for node in delete_list:
    nbrs = G.neighbors(node)
    for nbr in nbrs:
        if G.degree(nbr) == k:
            delete_list.append(nbr)
    G.remove_node(node)
Yes, it's fairly safe to append to a list you're iterating over, at least in the way that you're doing it. The only issue would be if the list grew so large that it caused memory issues, though that's only going to be an issue for you with very large numbers.
That said, I would probably use a while loop in this case, whether or not you want to have the entire list at the end.
current = 247
result_list = [current]
while current != 1:
    if current % 2 == 0:
        current /= 2
    else:
        current = current * 3 + 1
    result_list.append(current)
Though really I would probably use a generator.
def collatz(start):
    current = start
    yield current
    while current != 1:
        if current % 2 == 0:
            current /= 2
        else:
            current = current * 3 + 1
        yield current
Shout-out to the Collatz conjecture! :D
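Usage of the generator might look like this (a small sketch):

print(list(collatz(247)))    # consume the whole sequence into a list
for value in collatz(27):    # or iterate lazily, one value at a time
    print(value)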
As it's (currently) implemented, yes; as it's specified, no.
That means it's a risky idea to modify the list while iterating through it and to rely on that behaviour remaining. One could of course argue that there is no reason why the behaviour would change in this case, but that is relying on the assumption that changes need a reason to happen.

Finding the most similar numbers across multiple lists in Python

In Python, I have 3 lists of floating-point numbers (angles), in the range 0-360, and the lists are not the same length. I need to find the triplet (with 1 number from each list) in which the numbers are the closest. (It's highly unlikely that any of the numbers will be identical, since this is real-world data.) I was thinking of using a simple lowest-standard-deviation method to measure agreement, but I'm not sure of a good way to implement this. I could loop through each list, comparing the standard deviation of every possible combination using nested for loops, and have a temporary variable save the indices of the triplet that agrees the best, but I was wondering if anyone had a better or more elegant way to do something like this. Thanks!
I wouldn't be surprised if there is an established algorithm for doing this, and if so, you should use it. But I don't know of one, so I'm going to speculate a little.
If I had to do it, the first thing I would try would be just to loop through all possible combinations of all the numbers and see how long it takes. If your data set is small enough, it's not worth the time to invent a clever algorithm. To demonstrate the setup, I'll include the sample code:
# setup
import itertools

def distance(nplet):
    '''Takes a pair or triplet (an "n-plet") as a list, and returns its distance.
    A smaller return value means better agreement.'''
    # your choice of implementation here. Example:
    return variance(nplet)

# algorithm
def brute_force(*lists):
    return min(itertools.product(*lists), key=distance)
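For instance, with variance from the standard library's statistics module as the agreement measure, usage of the brute_force() above could look like this (a sketch; the angle values are made up and wrap-around at 0/360 is ignored):

from statistics import variance

def distance(nplet):
    # smaller variance = tighter agreement between the angles
    return variance(nplet)

list1 = [10.5, 200.2, 359.0]
list2 = [11.3, 150.0, 201.0]
list3 = [9.9, 199.5, 340.0]
print(brute_force(list1, list2, list3))  # -> (10.5, 11.3, 9.9)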
For a large data set, I would try something like this: first create one triplet for each number in the first list, with its first entry set to that number. Then go through this list of partially-filled triplets and for each one, pick the number from the second list that is closest to the number from the first list and set that as the second member of the triplet. Then go through the list of triplets and for each one, pick the number from the third list that is closest to the first two numbers (as measured by your agreement metric). Finally, take the best of the bunch. This sample code demonstrates how you could try to keep the runtime linear in the length of the lists.
def item_selection(listA, listB, listC):
    # make the list of partially-filled triplets
    triplets = [[a] for a in listA]
    iT = 0
    iB = 0
    while iT < len(triplets):
        # make iB the index of a value in listB closest to triplets[iT][0]
        while iB < len(listB) and listB[iB] < triplets[iT][0]:
            iB += 1
        if iB == 0:
            triplets[iT].append(listB[0])
        elif iB == len(listB):
            triplets[iT].append(listB[-1])
        else:
            # look at the values in listB just below and just above triplets[iT][0]
            # and add the closer one as the second member of the triplet
            dist_lower = distance([triplets[iT][0], listB[iB]])
            dist_upper = distance([triplets[iT][0], listB[iB + 1]])
            if dist_lower < dist_upper:
                triplets[iT].append(listB[iB])
            elif dist_lower > dist_upper:
                triplets[iT].append(listB[iB + 1])
            else:
                # if they are equidistant, add both
                triplets[iT].append(listB[iB])
                iT += 1
                triplets[iT:iT] = [[triplets[iT - 1][0], listB[iB + 1]]]
        iT += 1
    # then another loop while iT < len(triplets) to add in the numbers from listC
    return min(triplets, key=distance)
The thing is, I can imagine situations where this wouldn't actually find the best triplet, for instance if a number from the first list is close to one from the second list but not at all close to anything in the third list. So something you could try is to run this algorithm for all 6 possible orderings of the lists. I can't think of a specific situation where that would fail to find the best triplet, but one might still exist. In any case the algorithm will still be O(N) if you use a clever implementation, assuming the lists are sorted.
def symmetrized_item_selection(listA, listB, listC):
    best_results = []
    for ordering in itertools.permutations([listA, listB, listC]):
        best_results.append(item_selection(*ordering))
    return min(best_results, key=distance)
Another option might be to compute all possible pairs of numbers between list 1 and list 2, between list 1 and list 3, and between list 2 and list 3. Then sort all three lists of pairs together, from best to worst agreement between the two numbers. Starting with the closest pair, go through the list pair by pair and any time you encounter a pair which shares a number with one you've already seen, merge them into a triplet. For a suitable measure of agreement, once you find your first triplet, that will give you a maximum pair distance that you need to iterate up to, and once you get up to it, you just choose the closest triplet of the ones you've found. I think that should consistently find the best possible triplet, but it will be O(N^2 log N) because of the requirement for sorting the lists of pairs.
import collections
import itertools

def pair_sorting(listA, listB, listC):
    # make all possible pairs of values from two lists
    # each pair has the structure ((number, origin_list), (number, origin_list))
    # so we know which lists the numbers came from
    all_pairs = []
    all_pairs += [((nA, 0), (nB, 1)) for (nA, nB) in itertools.product(listA, listB)]
    all_pairs += [((nA, 0), (nC, 2)) for (nA, nC) in itertools.product(listA, listC)]
    all_pairs += [((nB, 1), (nC, 2)) for (nB, nC) in itertools.product(listB, listC)]
    all_pairs.sort(key=lambda p: distance([p[0][0], p[1][0]]))
    # make a dict to track which (number, origin_list)s we've already seen
    pairs_by_number_and_list = collections.defaultdict(list)
    min_distance = float('inf')
    min_triplet = None
    # start with the closest pair
    for pair in all_pairs:
        # for the first value of the current pair, see if we've seen that particular
        # (number, origin_list) combination before
        for pair2 in pairs_by_number_and_list[pair[0]]:
            # if so, that means the current pair shares its first value with
            # another pair, so put the 3 unique values together to make a triplet
            this_triplet = (pair[1][0], pair2[0][0], pair2[1][0])
            # check if the triplet agrees more than the previous best triplet
            this_distance = distance(this_triplet)
            if this_distance < min_distance:
                min_triplet = this_triplet
                min_distance = this_distance
        # do the same thing but checking the second element of the current pair
        for pair2 in pairs_by_number_and_list[pair[1]]:
            this_triplet = (pair[0][0], pair2[0][0], pair2[1][0])
            this_distance = distance(this_triplet)
            if this_distance < min_distance:
                min_triplet = this_triplet
                min_distance = this_distance
        # finally, add the current pair to the list of pairs we've seen
        pairs_by_number_and_list[pair[0]].append(pair)
        pairs_by_number_and_list[pair[1]].append(pair)
    return min_triplet
N.B. I've written all the code samples in this answer out a little more explicitly than you'd do it in practice to help you to understand how they work. But when doing it for real, you'd use more list comprehensions and such things.
N.B.2. No guarantees that the code works :-P but it should get the rough idea across.

Fastest way in Python to find a 'startswith' substring in a long sorted list of strings

I've done a lot of Googling, but haven't found anything, so I'm really sorry if I'm just searching for the wrong things.
I am writing an implementation of the Ghost for MIT Introduction to Programming, assignment 5.
As part of this, I need to determine whether a string of characters is the start of any valid word. I have a list of valid words ("wordlist").
Update: I could use something that iterated through the list each time, such as Peter's simple suggestion:
def word_exists(wordlist, word_fragment):
    return any(w.startswith(word_fragment) for w in wordlist)
I previously had:
wordlist = [w for w in wordlist if w.startswith(word_fragment)]
(from here) to narrow the list down to the list of valid words that start with that fragment and consider it a loss if wordlist is empty. The reason that I took this approach was that I (incorrectly, see below) thought that this would save time, as subsequent lookups would only have to search a smaller list.
It occurred to me that this is going through each item in the original wordlist (38,000-odd words) checking the start of each. This seems silly when wordlist is ordered, and the comprehension could stop once it hits something that is after the word fragment. I tried this:
newlist = []
for w in wordlist:
    if w[:len(word_fragment)] > word_fragment:
        # Take advantage of the fact that the list is sorted
        break
    if w.startswith(word_fragment):
        newlist.append(w)
return newlist
but that is about the same speed, which I thought may be because list comprehensions run as compiled code?
I then thought that more efficient again would be some form of binary search in the list to find the block of matching words. Is this the way to go, or am I missing something really obvious?
Clearly it isn't really a big deal in this case, but I'm just starting out with programming and want to do things properly.
UPDATE:
I have since tested the below suggestions with a simple test script. While Peter's binary search/bisect would clearly be better for a single run, I was interested in whether the narrowing list would win over a series of fragments. In fact, it did not:
The totals for all strings "p", "py", "pyt", "pyth", "pytho" are as follows:
In total, Peter's simple test took 0.175472736359
In total, Peter's bisect left test took 9.36985015869e-05
In total, the list comprehension took 0.0499348640442
In total, Neil G's bisect took 0.000373601913452
The overhead of creating a second list etc clearly took more time than searching the longer list. In hindsight, this was likely the best approach regardless, as the "reducing list" approach increased the time for the first run, which was the worst case scenario.
Thanks all for some excellent suggestions, and well done Peter for the best answer!!!
Generator expressions are evaluated lazily, so if you only need to determine whether or not your word is valid, I would expect the following to be more efficient since it doesn't necessarily force it to build the full list once it finds a match:
def word_exists(wordlist, word_fragment):
    return any(w.startswith(word_fragment) for w in wordlist)
Note that the lack of square brackets is important for this to work.
However this is obviously still linear in the worst case. You're correct that binary search would be more efficient; you can use the built-in bisect module for that. It might look something like this:
from bisect import bisect_left

def word_exists(wordlist, word_fragment):
    try:
        return wordlist[bisect_left(wordlist, word_fragment)].startswith(word_fragment)
    except IndexError:
        return False  # word_fragment is greater than all entries in wordlist
bisect_left runs in O(log(n)) so is going to be considerably faster for a large wordlist.
Edit: I would guess that the example you gave loses out if your word_fragment is something really common (like 't'), in which case it probably spends most of its time assembling a large list of valid words, and the gain from only having to do a partial scan of the list is negligible. Hard to say for sure, but it's a little academic since binary search is better anyway.
You're right that you can do this more efficiently given that the list is sorted.
I'm building off of #Peter's answer, which returns a single element. I see that you want all the words that start with a given prefix. Here's how you do that:
from bisect import bisect_left

wordlist[bisect_left(wordlist, word_fragment):
         bisect_left(wordlist, word_fragment[:-1] + chr(ord(word_fragment[-1]) + 1))]
This returns the slice from your original sorted list.
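For example, with a tiny made-up wordlist (a rough illustration of the slice bounds):

from bisect import bisect_left

wordlist = sorted(["apple", "pyre", "python", "pythonic", "pyx", "queue"])
word_fragment = "py"

lo = bisect_left(wordlist, word_fragment)
hi = bisect_left(wordlist, word_fragment[:-1] + chr(ord(word_fragment[-1]) + 1))  # "pz"
print(wordlist[lo:hi])  # ['pyre', 'python', 'pythonic', 'pyx']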
As Peter suggested I would use the Bisect module. Especially if you're reading from a large file of words.
If you really need speed you could make a daemon ( How do you create a daemon in Python? ) that has a pre-processed data structure suited for the task
I suggest you could use "tries"
http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=usingTries
There are many algorithms and data structures to index and search
strings inside a text, some of them are included in the standard
libraries, but not all of them; the trie data structure is a good
example of one that isn't.
Let word be a single string and let dictionary be a large set of words. If we have a dictionary, and we need to know if a single word is inside the dictionary, a trie is a data structure that can help us. But you may be asking yourself, "Why use tries if set and hash tables can do the same?" There are two main reasons:
A trie can insert and find strings in O(L) time (where L represents the length of a single word). This is much faster than a set, and a bit faster than a hash table.
A set or a hash table can only find words in the dictionary that exactly match the single word we are looking for; a trie allows us to find words that have a single different character, a prefix in common, a missing character, etc.
The tries can be useful in TopCoder problems, but also have a
great amount of applications in software engineering. For example,
consider a web browser. Do you know how the web browser can auto
complete your text or show you many possibilities of the text that you
could be writing? Yes, with the trie you can do it very fast. Do you
know how an orthographic corrector can check that every word that you
type is in a dictionary? Again a trie. You can also use a trie for
suggested corrections of the words that are present in the text but
not in the dictionary.
An example would be:
start = {'a': nodea, 'b': nodeb, 'c': nodec, ...}
nodea = {'a': nodeaa, 'b': nodeab, 'c': nodeac, ...}
nodeb = {'a': nodeba, 'b': nodebb, 'c': nodebc, ...}
etc.
Then if you want all the words starting with "ab", you would just traverse start['a']['b'] and that would give you all the words you want.
To build it, you could iterate through your wordlist and, for each word, iterate through its characters, adding a new default dict where required (a small sketch follows).
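A minimal sketch of that idea using nested defaultdicts (the helper names here are made up; a real solution would probably also mark where complete words end, as done with '$' below):

from collections import defaultdict

def make_trie():
    return defaultdict(make_trie)       # each node is a dict of child nodes

def build_trie(wordlist):
    root = make_trie()
    for word in wordlist:
        node = root
        for ch in word:
            node = node[ch]             # creates the child node if missing
        node['$'] = True                # mark the end of a complete word
    return root

def is_valid_prefix(trie, fragment):
    node = trie
    for ch in fragment:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = build_trie(["python", "pyre", "apple"])
print(is_valid_prefix(trie, "pyt"))     # True
print(is_valid_prefix(trie, "pyz"))     # False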
In case of binary search (assuming wordlist is sorted), I'm thinking of something like this:
wordlist = "ab", "abc", "bc", "bcf", "bct", "cft", "k", "l", "m"
fragment = "bc"
a, m, b = 0, 0, len(wordlist)-1
iterations = 0
while True:
if (a + b) / 2 == m: break # endless loop = nothing found
m = (a + b) / 2
iterations += 1
if wordlist[m].startswith(fragment): break # found word
if wordlist[m] > fragment >= wordlist[a]: a, b = a, m
elif wordlist[b] >= fragment >= wordlist[m]: a, b = m, b
if wordlist[m].startswith(fragment):
print wordlist[m], iterations
else:
print "Not found", iterations
It will find one matched word, or none. You will then have to look to the left and right of it to find the other matched words, as in the sketch below. My algorithm might be incorrect; it's just a rough version of my thoughts.
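To collect all of the matches once one index is found, you could then scan outwards from it, something like this (a rough sketch that assumes wordlist is sorted and m is the matching index found above):

def expand_matches(wordlist, fragment, m):
    # walk left and right from a known matching index to gather every match
    left = m
    while left > 0 and wordlist[left - 1].startswith(fragment):
        left -= 1
    right = m
    while right < len(wordlist) - 1 and wordlist[right + 1].startswith(fragment):
        right += 1
    return wordlist[left:right + 1]

print(expand_matches(["ab", "abc", "bc", "bcf", "bct", "cft"], "bc", 3))
# -> ['bc', 'bcf', 'bct']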
Here's my fastest way to narrow wordlist down to the list of valid words starting with a given fragment.
sect() is a generator function that uses Peter's excellent idea of employing bisect, together with the islice() function:
from bisect import bisect_left
from itertools import islice
from time import clock

A, B = [], []
iterations = 5
repetition = 10

with open('words.txt') as f:
    wordlist = f.read().split()

wordlist.sort()
print 'wordlist[0:10]==', wordlist[0:10]

def sect(wordlist, word_fragment):
    lgth = len(word_fragment)
    for w in islice(wordlist, bisect_left(wordlist, word_fragment), None):
        if w[0:lgth] == word_fragment:
            yield w
        else:
            break

def hooloo(wordlist, word_fragment):
    usque = len(word_fragment)
    for w in wordlist:
        if w[:usque] > word_fragment:
            break
        if w.startswith(word_fragment):
            yield w

for rep in xrange(repetition):
    te = clock()
    for i in xrange(iterations):
        newlistA = list(sect(wordlist, 'VEST'))
    A.append(clock() - te)

    te = clock()
    for i in xrange(iterations):
        newlistB = list(hooloo(wordlist, 'VEST'))
    B.append(clock() - te)

print '\niterations =', iterations, ' number of tries:', repetition, '\n'
print newlistA, '\n', min(A), '\n'
print newlistB, '\n', min(B), '\n'
result
wordlist[0:10]== ['AA', 'AAH', 'AAHED', 'AAHING', 'AAHS', 'AAL', 'AALII', 'AALIIS', 'AALS', 'AARDVARK']
iterations = 5 number of tries: 30
['VEST', 'VESTA', 'VESTAL', 'VESTALLY', 'VESTALS', 'VESTAS', 'VESTED', 'VESTEE', 'VESTEES', 'VESTIARY', 'VESTIGE', 'VESTIGES', 'VESTIGIA', 'VESTING', 'VESTINGS', 'VESTLESS', 'VESTLIKE', 'VESTMENT', 'VESTRAL', 'VESTRIES', 'VESTRY', 'VESTS', 'VESTURAL', 'VESTURE', 'VESTURED', 'VESTURES']
0.0286089433154
['VEST', 'VESTA', 'VESTAL', 'VESTALLY', 'VESTALS', 'VESTAS', 'VESTED', 'VESTEE', 'VESTEES', 'VESTIARY', 'VESTIGE', 'VESTIGES', 'VESTIGIA', 'VESTING', 'VESTINGS', 'VESTLESS', 'VESTLIKE', 'VESTMENT', 'VESTRAL', 'VESTRIES', 'VESTRY', 'VESTS', 'VESTURAL', 'VESTURE', 'VESTURED', 'VESTURES']
0.415578236899
sect() is about 14.5 times faster than hooloo().
PS:
I know timeit exists, but here, for such a result, clock() is fully sufficient.
Doing a binary search in the list is not going to guarantee you anything, and I am not sure how that would work here either.
You have a list which is ordered, which is good news. The algorithmic complexity of both your cases is O(n), which is not bad: you just have to iterate through the whole wordlist once.
But in the second case, the practical performance should be better, because you break as soon as you know the remaining items cannot match. Try a list where the 1st element is a match and the remaining 37,999 elements do not match, and you will see the second beat the first.
