My Python code reads the next element in a list and checks whether it has already appeared in the list. If so, it moves the left bound of the list past the previous appearance (the rest of the code does not matter):
while k < len(lst):
    if lst[k] in lst[a:k]:  # a is the left bound
        i = lst.index(lst[k], a)  # could be done more efficiently with exception handling
        a = i + 1
    k += 1
I tried to rewrite this without using high-level tricks (in/index):
while k < len(lst):
    for i in range(a, k + 1):
        if lst[i] == lst[k]:
            break
    if i != k:  # different indices, same values
        a = i + 1
    k += 1
This appears to be roughly 3.5 times slower than code #1. But I do not think code #2 does anything highly inefficient, if I understand the "in" operator correctly:
1) go through all elements in the list
2) compare each to the searched element
3) if they are equal, stop and return True
4) if at the end of the list, return False
(and the index function probably works the same way, except that it also has to remember the index).
My guess is that the Python interpreter executes "in" as a low-level version of the for loop in code #2, whereas in code #2 it has to interpret my comparison every time i increases, which makes the code run slower overall. Am I right about this?
By the way, the list is an ordered list of non-repeating numbers (it does not have to be, so please no binary-search suggestions), which gives this algorithm a worst-case complexity of about n^2/2.
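The guess can be checked empirically. Here is a minimal timeit sketch (the helper names are mine, not from the question) comparing the built-in operator against an equivalent hand-written loop:

```python
import timeit

lst = list(range(1000))

def contains_in(seq, x):
    # the built-in operator: the scan runs in C inside the interpreter
    return x in seq

def contains_loop(seq, x):
    # the same logic, but every comparison is interpreted bytecode
    for item in seq:
        if item == x:
            return True
    return False

t_in = timeit.timeit(lambda: contains_in(lst, 999), number=2000)
t_loop = timeit.timeit(lambda: contains_loop(lst, 999), number=2000)
print(t_in, t_loop)
```

On CPython the hand-written loop is typically a few times slower, which matches the ~3.5x observation: `in` performs its element comparisons inside the interpreter's C loop, while the explicit loop executes bytecode for every iteration.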
I'm trying to create an algorithm in Python to merge two ordered lists into a larger ordered list. Essentially, I began by trying to isolate the minimum element in each list, then compared them to see which was smaller, since that number would also be smallest in the larger list. I then appended that element to the (initially empty) larger list and deleted it from the original list it came from, and tried to loop through the original two lists doing the same thing. Inside the "if" statements, I tried to make the function append the remainder of one list to the larger list if the other is or becomes empty, because at that point there would be no reason to ask which elements of the two lists are smaller.
def merge_cabs(cab1, cab2):
    for (i <= all(j) for j in cab1):
        for (k <= all(l) for l in cab2):
            if cab1 == []:
                newcab.append(cab2)
            if cab2 == []:
                newcab.append(cab1)
            else:
                k = min(min(cab1), min(cab2))
                newcab.append(k)
                if min(cab1) < min(cab2):
                    cab1.remove(min(cab1))
                if min(cab2) < min(cab1):
                    cab2.remove(min(cab2))
    print(newcab)

cab1 = [1,2,5,6,8,9]
cab2 = [3,4,7,10,11]
newcab = []
merge_cabs(cab1, cab2)
Unfortunately, I've had a bit of trouble constructing the for-loop. One way I tried to isolate the minimum values is shown in the two "for" lines above. Right now, Python is returning "SyntaxError: invalid syntax," pointing to the colon in the first "for" line. Another way I've tried to construct the for-loop is like this:
def merge_cabs(cabs1, cabs2):
    for min(i) in cab1:
        for min(j) in cab2:
I've also tried to write the expression all in one line like this:
def merge_cabs(cab1, cab2):
    for min(i) in cabs1 and min(j) in cabs2:
I've also tried looping through a copy of the original lists rather than the lists themselves, because searching the site I found that it can be difficult to remove elements from a list you're looping over, and I've tried wrapping the expressions after the "for" statements in various configurations of parentheses. If someone sees where the problem lies, it would be great if you could point it out; any other observations that could help me better construct this function would also be much appreciated.
Here's a very simple-minded solution to this that uses only very basic Python operations:
def merge_cabs(cab1, cab2):
    len1 = len(cab1)
    len2 = len(cab2)
    i = 0
    j = 0
    newcab = []
    while i < len1 and j < len2:
        v1 = cab1[i]
        v2 = cab2[j]
        if v1 <= v2:
            newcab.append(v1)
            i += 1
        else:
            newcab.append(v2)
            j += 1
    while i < len1:
        newcab.append(cab1[i])
        i += 1
    while j < len2:
        newcab.append(cab2[j])
        j += 1
    return newcab
Things to keep in mind:
You should not have any nested loops. Merging two sorted lists is typically used to implement a merge sort, and the merge step should be linear. I.e., the algorithm should be O(n).
You need to walk both lists together, choosing the smallest value at each step, and advancing only the list that contains the smallest value. When one of the lists is consumed, the remaining elements of the unconsumed list are simply appended in order.
You should not be calling min or max etc. in your loop, since that will effectively introduce a nested loop, turning the merge into an O(n**2) algorithm, which ignores the fact that the lists are known to be sorted.
Similarly, you should not be calling any external sort function to do the merge, since that will result in an O(n*log(n)) merge (or worse, depending on the sort algorithm), and again ignores the fact that the lists are known to be sorted.
Firstly, there's a function in the (standard library) heapq module for doing exactly this, heapq.merge; if this is a real problem (rather than an exercise), you want to use that one instead.
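For the real-problem case, a minimal sketch of heapq.merge applied to the lists from the question:

```python
import heapq

cab1 = [1, 2, 5, 6, 8, 9]
cab2 = [3, 4, 7, 10, 11]

# heapq.merge lazily yields the smallest remaining head of either input,
# so the inputs must already be sorted
merged = list(heapq.merge(cab1, cab2))
print(merged)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```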
If this is an exercise, there are a couple of points:
You'll need to use a while loop rather than a for loop:
while cab1 or cab2:
This will keep repeating the body while there are any items in either of your source lists.
You probably shouldn't delete items from the source lists; that's a relatively expensive operation. In addition, on balance, having a merge_lists function destroy its arguments would be unexpected.
Within the loop you'll refer to cab1[i1] and cab2[i2] (and, in the condition, to i1 < len(cab1)).
(By the time I typed out the explanation, Tom Karzes typed out the corresponding code in another answer...)
I am trying to make word chains, but can't get my head around the recursive searching.
I want to return a list of the words required to get to the target word.
get_words_quicker returns a list of words that can be made by just changing one letter.
def dig(InWord, OutWord, Depth):
    if Depth == 0:
        return False
    else:
        d = Depth - 1
        wordC = 0
        wordS = []
        for q in get_words_quicker(InWord):
            wordC += 1
            if OutWord == q:
                return q
            wordS.append(q)
        for i in range(0, wordC):
            return dig(wordS[i], OutWord, d)
Any help/questions would be much appreciated.
ANALYSIS
Nowhere in your code do you form a list to return. The one place where you build a list is by appending to wordS, but you never return that list, and your recursive call passes along only one element (a single word) from it.
As jasonharper already pointed out, your final loop can iterate once and return whatever the recursion gives it, or it can fall off the end and return None (rather than "nothing").
You have two other returns in the code: one returns False, and the other will return q, but only if q has the same value as OutWord.
Since there is no code where you use the result or alter the return value in any way, the only possibilities for your code's return value are None, False, and OutWord.
REPAIR
I'm afraid that I'm not sure how to fix this routine your way, since you haven't really described how you intended this code to carry out the high-level tasks you describe. The abbreviated variable names hide their purposes. wordC is a counter whose only function is to hold the length of the list returned from get_words_quicker -- which len() would do much more easily.
If you can clean up the code, improve the data flow and/or documentation to something that shows only one or two disruptions in logic, perhaps we can fix the remaining problems. As it stands, I hesitate to try -- you'd have my solution, not yours.
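As a general illustration only (not a repair of the code above), word chains are usually found with a breadth-first search, which naturally returns the list of words in the chain. Here get_words_quicker is replaced by a stand-in over a tiny fixed word list, since the original function wasn't shown:

```python
from collections import deque

WORDS = {"cat", "cot", "cog", "dog", "dot"}

def get_words_quicker(word):
    # Stand-in for the asker's function: all words differing by one letter
    return [w for w in WORDS
            if len(w) == len(word)
            and sum(a != b for a, b in zip(w, word)) == 1]

def dig(in_word, out_word, depth):
    # BFS over chains: returns the list of words from in_word to out_word,
    # or None if no chain of at most `depth` words exists
    queue = deque([[in_word]])
    seen = {in_word}
    while queue:
        path = queue.popleft()
        if len(path) > depth:
            continue
        if path[-1] == out_word:
            return path
        for nxt in get_words_quicker(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(dig("cat", "dog", 5))  # ['cat', 'cot', 'cog', 'dog']
```

Because BFS explores shorter chains first, the first chain found is also a shortest one.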
In the following implementation of the quicksort algorithm in Python:
def quicksort(listT):
    greater = []
    lower = []
    equal = []
    if len(listT) <= 1:
        return listT
    else:
        pivot = listT[0]
        for i in listT:
            if i < pivot:
                lower.append(i)
            elif i > pivot:
                greater.append(i)
            else:
                equal.append(i)
        lower = quicksort(lower)
        greater = quicksort(greater)
        return lower + equal + greater
I was wondering what exactly the first condition does in this implementation. From what I see, as it divides the list into greater and lower parts according to the pivot, there will be a moment at which a list has length lower than 1, but this returned list is not concatenated in any way. Could this condition be changed?
The len(listT)<=1 check is needed to terminate the recursion. Quicksort works by dividing the problem into more easily solved subproblems. When the subproblem is an empty list or a list of length one, it is already solved (no sorting needed), so the result can be returned directly.
If the initial condition is not stated, then the sort will fail at either
pivot=listT[0] # because the list may be empty and it will reference an invalid index, or
lower=quicksort(lower) # actually it will never end because the stack keeps building on this line.
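A quick sketch confirming the first failure mode: a variant with the base case stripped out (my own illustration, otherwise following the structure above) blows up at the pivot line as soon as an empty sublist recurses:

```python
def quicksort_no_base(listT):
    # Same partitioning as above, but without the len(listT) <= 1 check
    lower, equal, greater = [], [], []
    pivot = listT[0]  # raises IndexError once an empty sublist arrives here
    for i in listT:
        if i < pivot:
            lower.append(i)
        elif i > pivot:
            greater.append(i)
        else:
            equal.append(i)
    return quicksort_no_base(lower) + equal + quicksort_no_base(greater)

try:
    quicksort_no_base([3, 1, 2])
except IndexError as exc:
    print("fails without the base case:", exc)
```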
I often find myself writing code like:
mylist = [247]
while mylist:
    nextlist = []
    for element in mylist:
        print(element)
        if element % 2 == 0:
            nextlist.append(element // 2)
        elif element != 1:
            nextlist.append(3 * element + 1)
    mylist = nextlist
Okay - it's generally not this simple [and it usually really does involve long lists; I just chose this one (see xkcd) for fun], but I create a list and iterate over it, doing things with its elements. While doing this, I discover new things that I will need to iterate over, so I put them into a new list which I then iterate over.
It appears to be possible to write:
mylist = [247]
for element in mylist:
    print(element)
    if element % 2 == 0:
        mylist.append(element // 2)
    elif element != 1:
        mylist.append(element * 3 + 1)
I know that it's considered dangerous to modify a list while iterating over it, but in this case I want to iterate over the new elements.
Are there dangers from doing this? The only one I can think of is that the list may grow and take up a lot of memory (in many of my cases I actually want to have the whole list at the end). Are there others I'm ignoring?
Please note: Python: Adding element to list while iterating is related, but explains ways to create a copy of the list so that we can avoid iterating over the original. I'm asking about whether there is anything wrong in my specific case where I actually want my iteration to be extended.
edit: here is something closer to the real problem. Say we want to generate the "k-core" of a network: delete all nodes with degree less than k; from the remaining network, delete all nodes with degree less than k; repeat until none are left to delete. The algorithm first finds all nodes with degree less than k and puts them in a delete_list. Then, as nodes are deleted, any neighbor whose degree drops to k-1 is added to the list. This could be done by:
delete_list = [node for node in G.nodes() if G.degree(node) < k]
for node in delete_list:
    nbrs = G.neighbors(node)
    for nbr in nbrs:
        if G.degree(nbr) == k:
            delete_list.append(nbr)
    G.remove_node(node)
Yes, it's fairly safe to append to a list you're iterating over, at least in the way that you're doing it. The only problem would be if the list grew so large that it caused memory pressure, though that will only happen with very large numbers.
That said, I would probably use a while loop in this case, whether or not you want to have the entire list at the end.
current = 247
result_list = [current]
while current != 1:
    if current % 2 == 0:
        current //= 2
    else:
        current = current * 3 + 1
    result_list.append(current)
Though really I would probably use a generator.
def collatz(start):
    current = start
    yield current
    while current != 1:
        if current % 2 == 0:
            current //= 2
        else:
            current = current * 3 + 1
        yield current
Shout-out to the Collatz conjecture! :D
As it's (currently) implemented, yes; as it's specified, no.
That means it's risky to modify the list while iterating through it if you rely on the behaviour remaining. One could of course argue that there is no reason why the behaviour would change in this case, but that relies on the assumption that changes need a reason to happen.
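A small demonstration of the currently implemented behaviour: CPython's list iterator advances an internal index, so items appended during the loop are visited too. The language reference does not guarantee this, so treat it as an implementation detail:

```python
lst = [0]
for x in lst:
    # each appended item extends the iteration (CPython behaviour)
    if x < 3:
        lst.append(x + 1)
print(lst)  # [0, 1, 2, 3]
```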
I'm trying to demonstrate different ways of searching, so I've attempted a brute force iterative way, and a second one where I split the list into 2 halves and check from the front and the back.
Which is quicker? Or is my code just terrible?
I'm very new to Python so just getting to grips.
import itertools
import math

a = ["Rhys", "Jayne", "Brett", "Tool", "Dave", "Paul"]

# Counts the length of the list
Length = 0
for i in a:
    Length = Length + 1
print(Length)

# Brute force, iterative
counter = 0
print("Brute Force Search")
for i in a:
    if i != "Paul":
        counter = counter + 1
        print(counter)
        print("No")
    else:
        print("Yes")
        print(counter)
counter = 0  # reset counter

# Binary Chop Attempt
print(" Binary Search")
i = 0
j = Length - 1
while i <= math.ceil(Length / 2):
    i = i + 1
while j > math.ceil(Length / 2):
    if a[i] != "Paul" or a[j] != "Paul":
        print(j)
        print("No")
    else:
        print("Yes")
        break
    j = j - 1

# Binary Chop Attempt2
print(" Binary Search 2")
i = 0
j = Length - 1
found = False
while i <= math.ceil(Length / 2) or j > math.ceil(Length / 2):
    if found == True:
        break
    if a[i] != "Paul" or a[j] != "Paul":
        print("Not in position " + str(i))
    else:
        print("Found in position" + str(i))
        found = True
    if a[j] != "Paul":
        print("Not in position " + str(j))
    else:
        print("Found In position " + str(j))
        found = True
    j = j - 1
    i = i + 1
Thanks
a = ["Rhys", "Jayne", "Brett", "Tool", "Dave", "Paul"]
print(a.index('Paul'))
This is going to be a boatload faster than any C-algorithm-transcribed-to-python you can come up with, up to considerable list sizes.
So the first question would be; isn't that good enough?
If it isn't, the next pythonic place to go looking would be the standard library (note that a binary search requires sorted input!):
a = sorted(["Rhys", "Jayne", "Brett", "Tool", "Dave", "Paul"])
from bisect import bisect_left as bisect
print(bisect(a, 'Paul'))
Or perhaps a set() or dict() might be more called for; but it all depends on what exactly you are trying to achieve.
Well, your code is not that bad. The general concept is OK. The thing you call "brute force" is actually called a "table scan", at least in the context of databases. Sometimes it is the only option you are left with.
Your second code is not that different from the first one. Since list indexing in Python is O(1), no matter how you "jump" you will end up with pretty much the same result (assuming that you know nothing about the list, in particular its order). You could do tests and measure it, though (I'm too lazy to do that).
There are however several improvements that can be done:
1) Keep the list sorted. That way you can apply the "division" algorithm: start in the middle, and if the searched value is smaller than the middle element, recurse into the middle of the first half; otherwise recurse into the middle of the second half. And so on... this will let you search in O(log(n)).
2) Use some structure other than lists, such as some kind of B-tree. This will also let you search in O(log(n)).
3) Finally, use a dictionary. It's a really good structure which lets you search for a key in O(1) (impossible to be faster, baby). If you really need to maintain the order of the array, you can use the dictionary like this: keys are the elements and values are their positions in the order.
4) Use an index. That's pretty much the same as one of the points above, except that you use the extra structure in addition to, not instead of, the list. A bit more difficult to maintain, but good when you have a list of complex objects and you want to search efficiently based on more than one attribute.
Binary searching only makes sense if the list is ordered. If it's unordered, checking the 1st and last and then the 2nd and second-to-last is no different from checking the first, second, third and fourth. Ultimately, you have to check them all. Order doesn't matter.
You have to sort the list if you want binary search to be effective, and then your binary search has to exploit the fact that things are sorted. That's how binary search works; it removes sections as it goes. It's the old "high or low" game. You guess 50, they say high. Now you know it can't be 50 or above, so you only need to search 1-49. You guess 25, they say low. Now you know it can't be 25 or below, so you pick the middle of 26 and 49.
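A minimal sketch of that game as code (my own helper, not from the question), assuming the list has been sorted first:

```python
def binary_search(sorted_list, target):
    # Repeatedly halve the search range, as in the "high or low" game
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1  # target is in the upper half
        else:
            hi = mid - 1  # target is in the lower half
    return -1  # not present

names = sorted(["Rhys", "Jayne", "Brett", "Tool", "Dave", "Paul"])
print(binary_search(names, "Paul"))  # 3
```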
Your "brute force" search is usually called a "linear" search. In Python, that would just be
# Linear search
"Paul" in a
Your "binary chop" is usually called a "binary" search, and it requires the input list to be sorted. You can use the sorted function to sort the list, or just use a set (strictly a hash lookup rather than a binary search, but also fast):
# Binary search
"Paul" in set(a)
Whether or not a binary search is faster than a linear search depends on a few things (e.g. how expensive is it to sort the list?), it's certainly not always faster. If in doubt, use the timeit module to benchmark your code on some representative data.
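A rough timeit sketch of such a benchmark (the sizes and the absent name "Zed" are arbitrary choices of mine), comparing list membership against set membership:

```python
import timeit

a = sorted(["Rhys", "Jayne", "Brett", "Tool", "Dave", "Paul"] * 1000)
s = set(a)

# membership on a list scans linearly; on a set it is a hash lookup
t_list = timeit.timeit(lambda: "Zed" in a, number=1000)
t_set = timeit.timeit(lambda: "Zed" in s, number=1000)
print(t_list, t_set)
```

Note that building the set (or sorting the list) has its own cost, so the comparison only favors the set when many lookups are performed against the same data.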