I need some help with Python, a programming language that is new to me.
So, let's say that I have this list:
list= [3, 1, 4, 9, 8, 2]
And I would like to sort it, but without using the built-in function "sort", because otherwise where's all the fun and the studying in that? I want my code to be as simple and as basic as possible, even if that means working a bit harder. Therefore, if you want to help me and offer some ideas and code, please try to keep them very "basic".
Anyway, back to my problem: in order to sort this list, I've decided to compare each number in the list to the last number. First, I'll check 3 and 2. If 3 is smaller than 2 (which is false), then do nothing.
Next, check whether 1 is smaller than 2 (which is true); if so, swap this number with the first element.
On the next pass, it will again check whether each number is smaller than the last number in the list. But this time, if the number is smaller, it will swap places with the second element (and on the third pass with the third element, if it's smaller, of course).
And so on and so on.
In the end, the function will return the sorted list.
Hope you've understood it.
I want to use a recursive function to make the task a bit more interesting, but still basic.
Therefore, I thought about this code:
def func(list):
    if not list:
        for i in range(len(list)):
            if list[-1] > lst[i]:
                # have no idea what to write here in order to change the locations
                i = i + 1
            # return func(lst[i+1:])?
    return list
2 questions:
1. How can I change the locations? Using pop/remove and then insert?
2. I don't know where to put the recursive part or whether I've written it correctly (I think I haven't). The recursive part is the second "#" comment, the first "return".
What do you think? How can I improve this code? What's wrong?
Thanks a lot!
Oh man, sorting. That's one of the most popular problems in programming, with many, many solutions that differ a little in every language. Anyway, the most straightforward algorithm is, I guess, bubble sort. However, it's not very efficient, so it's mostly used for educational purposes. If you want to try something more efficient and common, go for quicksort. I believe it's the most popular sorting algorithm. In Python, however, the default algorithm is a bit different - read here. And as I've said, there are many, many more sorting algorithms around the web.
Now, to answer your specific questions: in Python, replacing an item in a list is as simple as
list[-1] = list[i]
or, to swap two items,
tmp = list[-1]
list[-1] = list[i]
list[i] = tmp
As for recursion, I don't think it's a good idea to use it here; a simple while/for loop is better.
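For instance, a minimal bubble sort built from two plain for loops might look like this (my own sketch, not part of the original answer; lst is used as the name to avoid shadowing the built-in list):

def bubble_sort(lst):
    for end in range(len(lst) - 1, 0, -1):   # after each pass the largest remaining value sits at 'end'
        for i in range(end):
            if lst[i] > lst[i + 1]:          # neighbours out of order: swap them
                lst[i], lst[i + 1] = lst[i + 1], lst[i]
    return lst

print(bubble_sort([3, 1, 4, 9, 8, 2]))   # [1, 2, 3, 4, 8, 9]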
Maybe you can try a quicksort this way:
def quicksort(array, up, down):
    # start sorting in your array from down to up:
    # is array[up] < array[down]? if yes, switch them
    # do it until up <= down
    # then call quicksort recursively:
    #   with the array, middle, up
    #   with the array, down, middle
    # where middle is the value found when the first sort ended
You can check this link: Quicksort on Wikipedia.
It is nearly the same logic.
Hope it will help!
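In case it helps to see that outline filled in, here is one minimal sketch of it (my own, with the assumption that down and up are the lower and upper indices of the range being sorted, and using the middle value as the pivot):

def quicksort(array, down, up):
    # sorts array[down..up] in place (inclusive indices)
    if down >= up:
        return
    pivot = array[(down + up) // 2]
    i, j = down, up
    while i <= j:
        while array[i] < pivot:   # find a value on the left that belongs on the right
            i += 1
        while array[j] > pivot:   # find a value on the right that belongs on the left
            j -= 1
        if i <= j:                # switch them, as in the outline above
            array[i], array[j] = array[j], array[i]
            i += 1
            j -= 1
    quicksort(array, down, j)     # recurse on the lower part
    quicksort(array, i, up)       # recurse on the upper part

nums = [3, 1, 4, 9, 8, 2]
quicksort(nums, 0, len(nums) - 1)
print(nums)   # [1, 2, 3, 4, 8, 9]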
The easiest way to swap the two list elements is by using “parallel assignment”:
list[-1], list[i] = list[i], list[-1]
It doesn't really make sense to use recursion for this algorithm. If you call func(lst[i+1:]), that makes a copy of those elements of the list, and the recursive call operates on the copy, and then the copy is discarded. You could make func take two arguments: the list and i+1.
But your code is still broken. The not list test is incorrect, and the i = i + 1 is incorrect. What you are describing sounds like a variation of selection sort where you're doing a bunch of extra swapping.
Here's how a selection sort normally works.
Find the smallest of all elements and swap it into index 0.
Find the smallest of all remaining elements (all indexes greater than 0) and swap it into index 1.
Find the smallest of all remaining elements (all indexes greater than 1) and swap it into index 2.
And so on.
To simplify, the algorithm is this: find the smallest of all remaining (unsorted) elements, and append it to the list of sorted elements. Repeat until there are no remaining unsorted elements.
We can write it in Python like this:
def func(elements):
    for firstUnsortedIndex in range(len(elements)):
        # elements[0:firstUnsortedIndex] are sorted
        # elements[firstUnsortedIndex:] are not sorted
        bestIndex = firstUnsortedIndex
        for candidateIndex in range(bestIndex + 1, len(elements)):
            if elements[candidateIndex] < elements[bestIndex]:
                bestIndex = candidateIndex
        # Now bestIndex is the index of the smallest unsorted element
        elements[firstUnsortedIndex], elements[bestIndex] = elements[bestIndex], elements[firstUnsortedIndex]
        # Now elements[0:firstUnsortedIndex+1] are sorted, so it's safe to increment firstUnsortedIndex
    # Now all elements are sorted.
Test:
>>> testList = [3, 1, 4, 9, 8, 2]
>>> func(testList)
>>> testList
[1, 2, 3, 4, 8, 9]
If you really want to structure this so that recursion makes sense, here's how. Find the smallest element of the list. Then call func recursively, passing all the remaining elements. (Thus each recursive call passes one less element, eventually passing zero elements.) Then prepend that smallest element onto the list returned by the recursive call. Here's the code:
def func(elements):
    if len(elements) == 0:
        return elements
    bestIndex = 0
    for candidateIndex in range(1, len(elements)):
        if elements[candidateIndex] < elements[bestIndex]:
            bestIndex = candidateIndex
    return [elements[bestIndex]] + func(elements[0:bestIndex] + elements[bestIndex + 1:])
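Note that, unlike the in-place version above, this one builds and returns a new list, so you use its return value:

>>> func([3, 1, 4, 9, 8, 2])
[1, 2, 3, 4, 8, 9]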
Given the following input
[1,2,3,4,5,6,7]
We are supposed to rearrange the numbers so that they come out as follows
[7,1,6,2,5,3,4]
So I figured I can basically create a new list and use two pointers, one starting at the leftmost index and the other at the rightmost index. Since the list is already sorted, the right pointer gives me the max value and the left gives me the min value. So I basically keep adding the max and min values, moving the left pointer to the right and the right pointer to the left. Following is my solution.
def maxMin(lst):
    result = []
    left = 0
    right = len(lst) - 1
    while left < right:
        result.append(lst[right])
        result.append(lst[left])
        left += 1
        right -= 1
    if len(lst) % 2 != 0:
        result.append(lst[left])
    return result
While this solution works, it apparently uses extra space, and there is supposedly a way to come up with a solution that uses only constant space. I came across a video on YouTube, but I had no idea how the author came up with the solution. To me, understanding how to arrive at the solution is more important than the actual solution. I would love it if someone could shed some light.
The extra space complained of is because you create a second list to hold the result. With a small amount of data that is an entirely reasonable way to go about it. But if your list had thousands of entries instead of a handful, then making a copy might not be feasible. In that case you might need to move the elements of the list in place.
We start by observing that the list is sorted and the elements in the last half of the list get moved to earlier in the list, at even-numbered positions (0, 2, 4). So, first calculate how many elements have to be shifted to earlier in the list. Then, for each of the elements to be shifted, calculate its new position (0, 2, 4) and move it there.
The first element to be shifted is at the end of the list. After it has been moved to an earlier point, the next element to be shifted is now at the end of the list. So the item to be moved is always the last in the list, and that can be extracted using pop().
Like this:
a = [1,2,3,4,5,6,7]
shifts = len(a) // 2
for i in range(shifts):
    print(f"shifting {a[-1]} to position {i*2}")
    a.insert(i*2, a.pop())
print(a)
Output from the last print() call is [7, 1, 6, 2, 5, 3, 4]. The print() call inside the loop is simply there to show you what is happening.
This alternative way of doing it trades execution time for space. Every time you insert an element in a list, other than at the end, all of the subsequent elements have to be moved to the right to make room. That does a lot of processing behind your back, even though it is a single line of code.
There is a sort-in-place strategy that minimizes memory movement (inserting into a list shifts all the subsequent elements and may reallocate it internally). It is based on a naive sort algorithm that swaps the value at each position with the subsequent position holding the lowest value. With a small adjustment to the logic, we can make the sort alternately swap in the lowest or the highest value depending on whether the position is odd or even.
This in-place sort performs N-1 swaps and N*(N-1)/2 comparisons, so the memory overhead is minimal and there is no reallocation.
def updownSort(a):
    for i in range(len(a)-1):             # for each position except the last one
        p = i                             # identify the swapping position (p)
        for j in range(i+1, len(a)):      # only look in remaining positions
            if bool(i&1) == (a[p]>a[j]):  # alternate seeking the largest and smallest value
                p = j
        a[p], a[i] = a[i], a[p]           # make the swap with the index that was found

a = [1,2,3,4,5,6,7]
updownSort(a)
print(a)
# [7, 1, 6, 2, 5, 3, 4]
Note that any other "sort in place" algorithm (e.g. bubble sort) could be adapted in the same fashion as long as the comparisons can take into account the positions being compared in addition to the values.
I am trying to write my own quicksort algorithm in Python without looking up how it's done professionally (I will learn more this way). If my idea of how I intend to implement this quicksort seems silly to you (I am aware that it probably will), please don't give me a completely different way of doing it unless my method will never succeed, or at least not without ridiculous measures; please help me reach a solution with my desired method :)
Currently I have defined a function "pivot" which takes the input list and outputs three lists: a list of numbers smaller than the pivot (chosen in this case to be the first number in the list every time), a list of numbers equal to the pivot, and a list of numbers greater than the pivot.
My next step was to define a function "q_sort". First this function creates a list called "finalList" and fills it with 0s so that it is the same length as the list being sorted. Next it pivots the list and adds the numbers equal to the pivot to finalList in what is already their correct position (as there are 0s in place to represent the items smaller than the pivot, and 0s as placeholders again in place of the items bigger than the pivot).
This all works fine.
What doesn't work fine is the next step. I have written what I want to happen next in some poorly thought out pseudo-code below:
numList = [3, 5, 3, 1, 12, 65, 2, 11, 32]

def pivot(aList):
    biggerNum = []
    smallerNum = []
    equalNum = [aList[0]]
    for x in range(1, len(aList)):
        if aList[0] < aList[x]:
            biggerNum.append(aList[x])
        elif aList[0] > aList[x]:
            smallerNum.append(aList[x])
        elif aList[0] == aList[x]:
            equalNum.append(aList[x])
    pivoted = [smallerNum, equalNum, biggerNum]
    return pivoted

def q_sort(aList):
    finalList = []
    for x in range(len(aList)):
        finalList.append(0)
    pivot(aList)
    for i in range(len(pivot(aList)[1])):
        finalList[len(pivot(aList)[0])+i] = pivot(aList)[1][i]
Pseudo Code:
#if len(smallerNum) != 0:
#    q_sort(smallerNum)  <--- I want this to add its pivot to finalList
#if len(biggerNum) != 0:
#    q_sort(biggerNum)   <--- Again I want this to add its pivot to finalList
#return finalList        <--- Now after all the recursion every number has been pivoted and added
What I intend to happen is that if the list of numbers smaller than the pivot actually has any items in it, it will then q_sort this list. This means it will find a new pivot and add its value to the right position in finalList. The way I imagine it working is that the function only reaches "return finalList" once every number from "numList" has been put in its correct position. The recursive nature of including q_sort within q_sort means that after pivoting "smallerNum" (and adding its pivot to finalList) it will have another list to pivot.
The problem is that you're starting over on each call: using the entire list, working from both ends. You need to recur on each partition of the list: the part below the pivot, then the part above the pivot. This is generally done by passing the endpoints of the sub-list, such as ...
def q_sort(aList, low, high):
    if low >= high:
        return

    # find pivot position, "pivot"
    ...

    # arrange list on either side of pivot
    ...

    # recur on each part of list
    q_sort(aList, low, pivot-1)
    q_sort(aList, pivot+1, high)
Is that enough of an outline to get you moving?
If not, try a browser search on "Python quicksort", and you'll find a lot of help, more thorough than we can cover here.
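For reference, here is one minimal way that outline could be filled in (my own sketch, not the answerer's code; it uses the last element of the range as the pivot, the scheme usually called the Lomuto partition):

def q_sort(aList, low, high):
    if low >= high:
        return
    pivotValue = aList[high]               # choose the last element as the pivot
    boundary = low                         # everything left of 'boundary' is < pivotValue
    for j in range(low, high):
        if aList[j] < pivotValue:
            aList[boundary], aList[j] = aList[j], aList[boundary]
            boundary += 1
    aList[boundary], aList[high] = aList[high], aList[boundary]   # the pivot lands at 'boundary'
    q_sort(aList, low, boundary - 1)       # recur on the part below the pivot
    q_sort(aList, boundary + 1, high)      # recur on the part above the pivot

nums = [3, 5, 3, 1, 12, 65, 2, 11, 32]
q_sort(nums, 0, len(nums) - 1)
print(nums)   # [1, 2, 3, 3, 5, 11, 12, 32, 65]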
I was doing one of the course exercises on Codecademy for Python and I had a few questions I couldn't seem to find an answer to:
For this block of code, how exactly does Python check whether something is "in" or "not in" a list? Does it run through each item in the list to check, or does it use a quicker process?
Also, how would this code be affected if it were running with a massive list of numbers (thousands or millions)? Would it slow down as the list size increases, and are there better alternatives?
numbers = [1, 1, 2, 3, 5, 8, 13]

def remove_duplicates(list):
    new_list = []
    for i in list:
        if i not in new_list:
            new_list.append(i)
    return new_list

remove_duplicates(numbers)
Thanks!
P.S. Why does this code not function the same?
numbers = [1, 1, 2, 3, 5, 8, 13]

def remove_duplicates(list):
    new_list = []
    new_list.append(i for i in list if i not in new_list)
    return new_list
In order to execute i not in new_list Python has to do a linear scan of the list. The scanning loop breaks as soon as the result of the test is known, but if i is actually not in the list the whole list must be scanned to determine that. It does that at C speed, so it's faster than doing a Python loop to explicitly check each item. Doing the occasional in some_list test is ok, but if you need to do a lot of such membership tests it's much better to use a set.
On average, with random data, testing membership has to scan through half the list items, and in general the time taken to perform the scan is proportional to the length of the list. In the usual notation the size of the list is denoted by n, and the time complexity of this task is written as O(n).
In contrast, determining membership of a set (or a dict) can be done (on average) in constant time, so its time complexity is O(1). Please see TimeComplexity in the Python Wiki for further details on this topic. Thanks, Serge, for that link.
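As a rough illustration (the sizes here are made up and absolute timings will vary by machine), you can compare the two with timeit:

from timeit import timeit

data_list = list(range(100_000))
data_set = set(data_list)

# membership test for a value near the end, close to the worst case for the list scan
print(timeit("99_999 in data_list", globals=globals(), number=1_000))
print(timeit("99_999 in data_set", globals=globals(), number=1_000))

The set lookup should come out orders of magnitude faster, and the gap grows with the size of the collection.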
Of course, if you're using a set then you get de-duplication for free, since it's impossible to add duplicate items to a set.
One problem with sets is that they generally don't preserve order. But you can use a set as an auxiliary collection to speed up de-duping. Here is an illustration of one common technique to de-dupe a list, or other ordered collection, which does preserve order. I'll use a string as the data source because I'm too lazy to type out a list. ;)
new_list = []
seen = set()
for c in "this is a test":
    if c not in seen:
        new_list.append(c)
        seen.add(c)

print(new_list)
output
['t', 'h', 'i', 's', ' ', 'a', 'e']
Please see How do you remove duplicates from a list whilst preserving order? for more examples. Thanks, Jean-François Fabre, for the link.
As for your PS, that code appends a single generator object to new_list; it doesn't append what the generator would produce.
I assume you already tried to do it with a list comprehension:
new_list = [i for i in list if i not in new_list]
That doesn't work, because the new_list doesn't exist until the list comp finishes running, so doing in new_list would raise a NameError. And even if you did new_list = [] before the list comp, it won't be modified by the list comp, and the result of the list comp would simply replace that empty list object with a new one.
BTW, please don't use list as a variable name (even in example code) since that shadows the built-in list type, which can lead to mysterious error messages.
You are asking multiple questions, and one of them is whether you can do this more efficiently. I'll answer that.
OK, let's say you had thousands or millions of numbers. From where, exactly? Let's say they were stored in some kind of text file; then you would probably want to use numpy (if you are sticking with Python, that is). Example:
import numpy as np
numbers = np.array([1, 1, 2, 3, 5, 8, 13], dtype=np.int32)
numbers = np.unique(numbers).tolist()
This will be more efficient (and above all more memory-efficient) than reading the file with plain Python and performing a list(set(...)):
numbers = [1, 1, 2, 3, 5, 8, 13]
numbers = list(set(numbers))
You are asking for the algorithmic complexity of this function. To find that you need to see what is happening at each step.
You are scanning the list one item at a time, and retrieving each item takes 1 unit of work. This is because retrieving something from a list by index is O(1): if you know the index, it can be fetched in one operation.
The list to which you are going to add the item grows by at most one element per step, so by the end the unique-items list can be of size n.
Now, checking whether the item you picked is already in the unique-items list takes up to n units of work in the worst case, because we may have to scan every item to decide.
So if you sum up the total work across the steps, it is 1 + 2 + 3 + ... + n, which is n(n + 1)/2. So if you have a million items, you can find the total by plugging n = 1,000,000 into the formula.
This is not entirely exact because of how lists work internally, but it is a helpful way to visualize it.
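For instance, as a quick back-of-the-envelope check (not a benchmark):

>>> n = 1_000_000
>>> n * (n + 1) // 2
500000500000

That is roughly 5 * 10**11 membership comparisons in the worst case, which is why the set-based approaches discussed in the other answers scale so much better.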
To answer the question in the title: Python has more efficient data types, but the list() object is just a plain array. If you want a more efficient way to search for values you can use a dict(), which uses a hash of the stored object to place it in a hash table; I assume that is what you were thinking of when you mentioned "a quicker process".
As to the second code snippet:
list().append() adds whatever single value you give it to the end of the list, and i for i in list if i not in new_list is a generator object, so it inserts that generator as one object into the array. list().extend() does what you want: it takes an iterable and appends all of its elements to the list.
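A quick illustration of the difference, with toy data:

source = [1, 1, 2, 3]

appended = []
appended.append(x for x in source)   # a single generator object ends up in the list
print(appended)                      # [<generator object ...>]

extended = []
extended.extend(x for x in source)   # the generator is consumed and its items are added
print(extended)                      # [1, 1, 2, 3]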
Here is merge sort logic in Python (this is the first part; ignore the function merge()). The point in question is converting the recursive logic to a while loop.
Code courtesy: Rosettacode Merge Sort
def merge_sort(m):
    if len(m) <= 1:
        return m
    middle = len(m) // 2
    left = m[:middle]
    right = m[middle:]
    left = merge_sort(left)
    right = merge_sort(right)
    return list(merge(left, right))
Is it possible to do this dynamically in a while loop: as each left and right array breaks into two, a sort of pointer keeps increasing based on the number of left and right arrays, breaking them down until only single-length lists remain?
Because every time the next split happens, on both the left and the right side the array keeps breaking down until only single-length lists remain, so the number of left-sided (left-left, left-right) and right-sided (right-left, right-right) breaks keeps increasing until every piece reaches a list of size 1.
One possible implementation might be this:
def merge_sort(m):
    l = [[x] for x in m]                 # split each element to its own list
    while len(l) > 1:                    # while there's merging to be done
        for x in range(len(l) >> 1):     # take the first len/2 lists
            l[x] = merge(l[x], l.pop())  # and merge each with one of the last len/2 lists
    return l[0] if len(l) else []
Stack frames in the recursive version are used to store progressively smaller lists that need to be merged. You correctly identified that at the bottom of the stack, there's a one-element list for each element in whatever you're sorting. So, by starting from a series of one-element lists, we can iteratively build up larger, merged lists until we have a single, sorted list.
Reposted from "alternative to recursion based merge sort logic" at the request of a reader:
One way to eliminate recursion is to use a queue to manage the outstanding work. For example, using the built-in collections.deque:
from collections import deque
from heapq import merge

def merge_sorted(iterable):
    """Return a list consisting of the sorted elements of 'iterable'."""
    queue = deque([i] for i in iterable)
    if not queue:
        return []
    while len(queue) > 1:
        queue.append(list(merge(queue.popleft(), queue.popleft())))
    return queue[0]
It's said that every recursive function can be written in a non-recursive manner, so the short answer is: yes, it's possible. The only solution I can think of is to use the stack-based approach. When a recursive function invokes itself, it puts some context (its arguments and return address) on the call stack, which isn't directly available to you. Basically, what you need to do in order to eliminate recursion is to write your own stack, and every time you would make a recursive call, put the arguments onto this stack.
For more information you can read this article, or refer to the section named 'Eliminating Recursion' in Robert Lafore's "Data Structures and Algorithms in Java" (although all the examples in this book are given in Java, it's pretty easy to grasp the main idea).
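To make that concrete, here is one rough sketch of the idea applied to merge sort (my own code, assuming a merge() such as heapq.merge is available): each "frame" that recursion would normally create is pushed onto an explicit stack, and intermediate sorted pieces are kept on a separate results stack.

from heapq import merge

def merge_sort_nonrecursive(m):
    call_stack = [(list(m), False)]   # frames of (sub-list, children_done)
    results = []                      # sorted pieces waiting to be merged
    while call_stack:
        seq, children_done = call_stack.pop()
        if len(seq) <= 1:
            results.append(seq)                 # already trivially sorted
        elif not children_done:
            middle = len(seq) // 2
            call_stack.append((seq, True))      # revisit once both halves are sorted
            call_stack.append((seq[:middle], False))
            call_stack.append((seq[middle:], False))
        else:
            first_half = results.pop()          # the two sorted halves of this frame
            second_half = results.pop()
            results.append(list(merge(first_half, second_half)))
    return results[0] if results else []

print(merge_sort_nonrecursive([3, 1, 4, 9, 8, 2]))   # [1, 2, 3, 4, 8, 9]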
Going with Dan's solution above and taking the advice on pop, I still tried eliminating the while loop and the other not-so-Pythonic parts. Here is the solution I came up with:
PS: l = len
My doubt about Dan's solution is: what if L.pop() and L[x] are the same element and a conflict is created, as in the case of an odd length after iterating over half of the length of L?
def merge_sort(m):
    L = [[x] for x in m]   # split each element to its own list
    for x in range(l(L)):
        if x > 0:
            L[x] = merge(L[x-1], L[x])
    return L[-1]
This could go on as an academic discussion, but I got my answer about an alternative to the recursive method.
My problem is about managing insert/append methods within loops.
I have two lists of length N: the first one (let's call it s) indicates the subset to which each element belongs, while the second one represents a quantity x that I want to evaluate. For the sake of simplicity, let's say that every subset contains T elements.
cont = 0;
for i in range(NSUBSETS):
    for j in range(T):
        subcont = 0;
        if (x[(i*T)+j] < 100):
            s.insert(((i+1)*T)+cont, s[(i*T)+j+cont]);
            x.insert(((i+1)*T)+cont, x[(i*T)+j+cont]);
            subcont += 1;
    cont += subcont;
While cycling over all the elements of the two lists, I'd like that, when a certain condition is fulfilled (e.g. x[i] < 100), a copy of that element is put at the end of its subset, and then the loop goes on until the analysis of all the original members of the subset is complete. It would be important to maintain the "order", i.e. to insert the elements right after the last element of the subset they come from.
I thought one way could be to store in two counter variables the number of copies made within the subset and globally, respectively (see the code): this way, I could shift the index of the element I was looking at accordingly. I wonder whether there is a simpler way to do that, maybe using some Python magic.
If the idea is to interpolate your extra copies into the lists without making a complete copy of the whole list, you can try this with a generator. As you loop through your lists, collect the matches you want to append. Yield each item as you process it, then yield each collected item too.
This is a simplified example with only one list, but hopefully it illustrates the idea. You would only get a copy if you do as I've done and expand the generator with a comprehension. If you just wanted to store or further analyze the processed list (e.g., to write it to disk), you would never need to have it all in memory at once.
def append_matches(input_list, start, end, predicate):
    # where predicate is a filter function or lambda
    for item in input_list[start:end]:
        yield item
    for item in filter(predicate, input_list[start:end]):
        yield item

example = lambda p: p < 100
data = [1,2,3,101,102,103,4,5,6,104,105,106]

print([k for k in append_matches(data, 0, 6, example)])
print([k for k in append_matches(data, 5, 11, example)])
[1, 2, 3, 101, 102, 103, 1, 2, 3]
[103, 4, 5, 6, 104, 105, 4, 5, 6]
I'm guessing that your desire not to copy the lists is based on your C background, an assumption that it would be more expensive that way. In Python, lists are not linked lists; they are more like vectors (dynamic arrays), so an insert takes O(n) time because all of the subsequent elements have to be shifted.
Building a new copy with the extra elements would be more efficient than trying to update in-place. If you really want to go that way you would need to write a LinkedList class that held prev/next references so that your Python code really was a copy of the C approach.
The most Pythonic approach would not try to do an in-place update, as it is simpler to express what you want using values rather than references:
def expand(origLs):
    subsets = [origLs[i*T:(i+1)*T] for i in range(NSUBSETS)]
    result = []
    for s in subsets:
        copies = [e for e in s if e < 100]
        result += s + copies
    return result
The main thing to keep in mind is that the underlying cost model for an interpreted garbage-collected language is very different to C. Not all copy operations actually cause data movement, and there are no guarantees that trying to reuse the same memory will be successful or more efficient. The only real answer is to try both techniques on your real problem and profile the results.
I'd be inclined to make a copy of your lists and then, while looping over the originals, whenever you come across the criterion to insert, insert into the copy at the place you need it to be. You can then output the copied and updated lists.
I think I have found a simple solution.
I cycle from the last subset backwards, putting the copies at the end of each subset. This way, I avoid encountering the "new" elements and get rid of the counters and the like.
for i in range(NSUBSETS-1, -1, -1):
    for j in range(T-1, -1, -1):
        if (x[(i*T)+j] < 100):
            s.insert(((i+1)*T), s[(i*T)+j])
            x.insert(((i+1)*T), x[(i*T)+j])
One possibility would be to use numpy's advanced indexing to provide the illusion of copying elements to the ends of the subsets: build a list of "copy" indices for the original list, and add it to an index/slice list that represents each subset. Then combine all the index/slice lists at the end, and use the final index list to access all your items (I believe there's support for doing so generator-style too, which you may find useful, since advanced indexing/slicing returns a copy rather than a view). Depending on how many elements meet the criteria to be copied, this should be decently efficient, as each subset can keep its own indices as a slice object, reducing the number of indices you need to keep track of.
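A rough sketch of that idea with toy data (for simplicity this uses index arrays rather than slice objects for the subsets, and the sizes NSUBSETS = 2 and T = 3 are made up):

import numpy as np

NSUBSETS, T = 2, 3
x = np.array([10, 200, 30, 400, 50, 60])

pieces = []
for i in range(NSUBSETS):
    block = np.arange(i * T, (i + 1) * T)           # indices of the original subset
    copies = block[x[block] < 100]                  # indices of the elements to duplicate
    pieces.append(np.concatenate([block, copies]))  # the copies go right after their subset
final_idx = np.concatenate(pieces)

expanded_x = x[final_idx]   # advanced indexing returns a copy
print(expanded_x)           # [ 10 200  30  10  30 400  50  60  50  60]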