Comparison: Python deque or indexes?

I am required to access the leftmost element of a Python list, while popping elements from the left. One example of this usage could be the merge operation of a merge sort.
There are two good ways of doing it:
1. collections.deque
2. Using an index: keep an increasing integer and advance it on each "pop" from the left.
Both of these methods seem efficient in complexity terms (I should mention that at the beginning of the program I would need to convert the list to a deque, which costs an extra O(n)).
So in terms of style and speed, which is best for Python usage? I did not see an official recommendation and could not find a similar question, which is why I am asking.
Thanks in advance.
Related: Time complexity

Definitely collections.deque. First, it's already written, so you don't have to write it yourself. Second, it's written in C, so it is probably going to be much faster than a Python re-implementation. Third, using an index leaves the head of the list unused, whereas a deque reclaims that memory. Appending to a standard list while only increasing the starting index on each left pop is quite inefficient, because the list would need to be reallocated more often.
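For illustration, here is a minimal sketch of both approaches applied to a merge step; the function names and inputs are made up for the example:

from collections import deque

def merge_with_deque(left, right):
    # O(n) conversion up front, then O(1) popleft per element.
    left, right = deque(left), deque(right)
    merged = []
    while left and right:
        merged.append(left.popleft() if left[0] <= right[0] else right.popleft())
    merged.extend(left)
    merged.extend(right)
    return merged

def merge_with_indexes(left, right):
    # No conversion needed; just advance an integer per list.
    i = j = 0
    merged = []
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_with_deque([1, 3], [2, 4]))    # [1, 2, 3, 4]
print(merge_with_indexes([1, 3], [2, 4]))  # [1, 2, 3, 4]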


Does Python use the best possible algorithms in order to save the most time it can?

I have a question that might be very simple to answer; I couldn't find the answer anywhere.
Does Python use the best possible algorithms in order to save the most time it can?
I just saw on some website that, for example, the max method on lists is O(n) in Python, whereas better time complexities exist, as far as I know.
Is it true?
Should I use algorithms that I know can perform better in order to save time, or does Python already do this for me in its methods?
the max method on lists is O(n) in Python, whereas better time complexities exist. Is it true?
No, this is not true. Finding the maximum value in a list requires that all values in the list be inspected, hence O(n).
You may be confused with lists that have been prepared in some way. For instance:
You have a list that is already sorted (which is an O(n log n) process). In that case you can of course get the maximum in constant time, since you know its index. If the list is sorted in ascending order, it would be unwise to call max on it, as that would indeed be a waste of time. You may know the list is sorted, but Python will not assume this, and will still scan the whole list.
You have a list that has been heapified to a max-heap (which is an O(n) process). Again, in that case you can get the maximum in constant time, since it is stored at index 0. Lists can be heapified with heapq; the default is a min-heap, so a max-heap is usually simulated by negating the values.
So, if you know nothing about your list, then you will have to inspect all values to be sure to identify the maximum. That is what max() does. In case you do know something more that could help to identify the maximum without having to look at all values, then use another, more appropriate method.
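For example, here is a small sketch of the heap case above; since heapq builds min-heaps, the maximum is obtained by negating values on the way in and out:

import heapq

values = [7, 2, 9, 4]
heapq.heapify(values)   # O(n); the smallest value now sits at index 0
print(values[0])        # 2, the minimum, in O(1)

# For a max-heap, store the negated values instead:
neg = [-v for v in [7, 2, 9, 4]]
heapq.heapify(neg)
print(-neg[0])          # 9, the maximum, in O(1)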
Should I use algorithms that I know can perform better in order to save time, or does Python already do this for me in its methods?
You should use the algorithms that you know can perform better (based on what you know about a data structure). In many cases such a better algorithm is available via a Python library. For example, to find a particular value in a sorted list, use bisect.bisect_left rather than list.index.
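A quick sketch of the difference; the list here is just illustrative:

import bisect

sorted_values = [1, 3, 5, 7, 9]

i = bisect.bisect_left(sorted_values, 7)  # O(log n) binary search
# bisect_left returns an insertion point, so confirm the value is present:
print(i, sorted_values[i] == 7)           # 3 True
print(sorted_values.index(7))             # also 3, but via an O(n) linear scan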
Consider a more complex example. Say you have written code that can generate chess moves and simulate a game of chess. You have good ideas about evaluation functions, alpha-beta pruning, killer moves, lookup tables, ... and a bunch of other optimisation techniques. You cannot expect Python to get smart when you issue a naive max on "all" evaluated chess states. You need to implement the complex algorithm yourself to efficiently search and filter the right states, getting the "best" chess move out of that forest of information without wasting time on less promising moves.
A Python list is a sequential and contiguous container. That means that finding the ith element takes constant time, and adding to the end is cheap as long as no reallocation is required.
Finding a value is O(n) (about n/2 comparisons on average), and finding the min or max is O(n).
If you want a list while being able to find its minimum value in O(1), the heapq module, which maintains a binary heap inside a list, is available.
But Python offers few specialized containers in its standard library.
In terms of complexity, you'll find that Python almost always uses algorithms with the best known complexity. Performance may still vary depending on constant factors, and Python is simply not the fastest language compared to C or C++.
In this case, if you're looking for the max value in a list, there is no better solution: to find the maximum value, you have to check every value, so the solution is O(n). That's just how lists work; a list is just a sequence of values with no extra structure. If you were to use some other structure, e.g. a sorted list, accessing the max value would take O(1), but you would pay for that low complexity with a higher complexity for adding and deleting values.
It differs from library to library.
The default Python library functions, such as sort(), use an efficient algorithm by default (Timsort, in sort()'s case).
Sadly, Python is quite slow in general compared to languages like C, C++ or Java.
This is because Python is interpreted: the interpreter reads your script and executes it live, whereas C and C++ compile to native binaries (and Java to JVM bytecode) before executing.

Why is a heap created using heapq.heapify different from a heap created by iterative heapq.heappush?

I noticed that, given a list, if I create a heap using heapq.heapify(), the elements are in a different order than what I obtain if I iterate over the list and call heapq.heappush() for each item.
Could someone help me understand why?
Also, given the iterable, is one way better than the other for creating a heap, and why?
heapify uses an O(n) algorithm, which differs from naively inserting the items one by one, which is O(n log n). Check out Wikipedia's description of it.
I noticed that, given a list, if I create a heap using heapq.heapify(), the elements are in a different order than what I obtain if I iterate over the list and call heapq.heappush() for each item.
Could someone help me understand why?
There's no reason they should be the same. A heap only guarantees that each parent comes before its children; there is more than one valid way to store the same data.
Also, given the iterable, is one way better than the other for creating a heap, and why?
Make a list and hand it to heapify. This is what heapify is for, which makes it simpler, and it also has better asymptotic performance, should that ever matter.
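A quick sketch showing both constructions on the same input; the exact layouts printed below are what CPython's heapq happens to produce, but any arrangement satisfying the heap invariant would be equally valid:

import heapq

data = [4, 3, 2, 1]

h1 = list(data)
heapq.heapify(h1)         # O(n), in place

h2 = []
for x in data:            # O(n log n): one sift per push
    heapq.heappush(h2, x)

print(h1)  # [1, 3, 2, 4]
print(h2)  # [1, 2, 3, 4]: a different, equally valid heap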

How to keep a list sorted as you read elements

What is an efficient way to read elements into a list and keep the list sorted, apart from searching for the place of each new element in the existing sorted list and inserting it there?
Use a specialised data structure. In Python you have the bisect module at your disposal:
This module provides support for maintaining a list in sorted order without having to sort the list after each insertion. For long lists of items with expensive comparison operations, this can be an improvement over the more common approach. The module is called bisect because it uses a basic bisection algorithm to do its work.
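A minimal sketch using bisect.insort, which combines the binary search and the insert:

import bisect

sorted_items = []
for x in [5, 1, 4, 2]:
    bisect.insort(sorted_items, x)  # O(log n) search + O(n) insert
print(sorted_items)                 # [1, 2, 4, 5]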
You're looking for the functions in heapq.
This module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm.
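If you only need to consume the elements in sorted order, rather than keep a real sorted list around, a heap alone is enough. A small sketch:

import heapq

heap = []
for x in [5, 1, 4, 2]:
    heapq.heappush(heap, x)     # O(log n) per insert

while heap:
    print(heapq.heappop(heap))  # prints 1, 2, 4, 5 in order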
@Aswin's comment is interesting. If you sort each time you insert an item, the call to sort() is O(n) rather than the usual O(n log n). This is due to the way the sort (Timsort) is implemented.
However, on top of this, you'd need to shift a bunch of elements along the list to make space. This is also O(n), so overall, calling .sort() after each insert is O(n) per insert.
There isn't a way to keep a sorted list in better than O(n) per insertion, because this shifting is always needed.
If you don't need an actual list, heapq (as mentioned in @Ignacio's answer) often covers the properties you do need in an efficient manner.
Otherwise, one of the many tree data structures may suit your cause better than a list.
An example with SortedList (from the third-party sortedcontainers package):

from sortedcontainers import SortedList

sl = SortedList()
sl.add(2)  # each add inserts into sorted position
sl.add(1)
# sl is now SortedList([1, 2])

Does Python use linked lists for lists? Why is inserting slow?

Just learning Python. Reading through the official tutorials. I ran across this:
While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one).
I would have guessed that a mature language like Python would have all sorts of optimizations, so why doesn't Python [seem to] use linked lists so that inserts can be fast?
Python uses a linear list layout in memory so that indexing is fast (O(1)).
As Greg Hewgill has already pointed out, Python lists use contiguous blocks of memory to make indexing fast. You can use a deque if you want the performance characteristics of a linked list. But your initial premise seems flawed to me: indexed insertion into the middle of a (standard) linked list is also slow.
What Python calls "lists" aren't actually linked lists; they're more like arrays. See the list entry from the Python glossary and also How do you make an array in Python? from the Python FAQ.
list is implemented as a dynamic array (an "array list"). If you want to insert frequently, you can use a deque (but note that traversal to the middle is expensive).
Alternatively, you can use a heap. It's all there if you take the time to look at the docs.
Python lists are implemented using a resizeable array of references to other objects. This provides O(1) lookup compared to O(n) lookup for a linked list implementation.
See How are lists implemented?
As you mentioned, this implementation makes insertions into the beginning or middle of a Python list slow because every element in the array to the right of the insertion point has to be shifted over one element. Also, sometimes the array will have to be resized to accommodate more elements. For inserting into a linked list, you'll still need O(n) time to find the location where you will insert, but the actual insertion itself will be O(1), since you only need to change the references in the nodes immediately before and after your insertion point (assuming a doubly-linked list).
So the decision to make Python lists use dynamic arrays rather than linked lists has nothing to do with the "maturity" of the language implementation. There are simply trade-offs between different data structures and the designers of Python decided that dynamic arrays were the best option overall. They may have assumed indexing a list is more common than inserting data into it, thus making dynamic arrays a better choice in this case.
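A rough way to observe this trade-off yourself; the timings are machine-dependent and the sizes arbitrary:

from timeit import timeit

list_setup = "xs = list(range(100_000))"
deque_setup = "from collections import deque; xs = deque(range(100_000))"

# Inserting at the front of a list shifts every element; a deque does not.
print(timeit("xs.insert(0, None)", setup=list_setup, number=1000))    # O(n) per insert
print(timeit("xs.appendleft(None)", setup=deque_setup, number=1000))  # O(1) per insert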
See the following table in the Dynamic Array wikipedia article for a comparison of various data structure performance characteristics:
https://en.wikipedia.org/wiki/Dynamic_array#Performance

Efficient reduction of a list in Python

So I have a list of 85 items. I would like to continually reduce this list in half (essentially a binary search on the items). My question is: what is the most efficient way to reduce the list? A list comprehension would continually create copies of the list, which is not ideal. I would like in-place removal of ranges of my list until I am left with one element.
I'm not sure if this is relevant but I'm using collections.deque instead of a standard list. They probably work the same way more or less though so I doubt this matters.
For a mere 85 items, truthfully, almost any method you want to use would be more than fast enough. Don't optimize prematurely.
That said, depending on what you're actually doing, a list may be faster than a deque. A deque is faster for adding and removing items at either end, but it doesn't support slicing.
With a list, if you want to copy or delete a contiguous range of items (say, the first 42) you can do this with a slice. Assuming half the list is eliminated at each pass, copying items to a new list would be slower on average than deleting items from the existing list (deleting requires moving the half of the list that's not being deleted "leftward" in memory, which would be about the same time cost as copying the other half, but you won't always need to do this; deleting the latter half of a list won't need to move anything).
To do this with a deque efficiently, you would want to pop() or popleft() the items rather than slicing them (lots of attribute access and method calls, which are relatively expensive in Python), and you'd have to write the loop that controls the operation in Python, which will be slower than the native slice operation.
Since you said it's basically a binary search, it is probably actually fastest to simply find the item you want to keep without modifying the original container at all, and then return a new container holding that single item. A list is going to be faster for this than a deque, since you will be doing a lot of accessing items by index. Indexing into a deque requires Python to walk the linked structure from one of its ends each time you access an item, while accessing an item by index is a simple, fast address calculation for a list.
collections.deque is implemented as a doubly linked list (of blocks), so indexed access is O(n) and a binary search would be slower than a linear scan. Rethink your approach.
Not sure if this is what you really need, but:

x = list(range(100))
while len(x) > 1:
    if condition:            # condition: a placeholder for your own test
        x = x[:len(x) // 2]  # keep the left half
    else:
        x = x[len(x) // 2:]  # keep the right half
1. 85 items are not even worth thinking about. Computers are fast, really.
2. Why would you delete ranges from the list, instead of simply picking the one result?
3. If there is a good reason why you can't do (2): keep the original list and change two indices only, the start and end index of the sublist you're looking at (see the sketch below).
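A sketch of option 3; target_in_left_half is a hypothetical predicate standing in for whatever test decides which half survives:

items = list(range(85))
lo, hi = 0, len(items)

while hi - lo > 1:
    mid = (lo + hi) // 2
    if target_in_left_half(items, lo, mid):  # hypothetical predicate
        hi = mid   # discard the right half by moving the end index
    else:
        lo = mid   # discard the left half by moving the start index

result = items[lo]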
On a previous question I compared a number of techniques for removing items from a list given a predicate (that is, a function which returns True or False for whether to keep a particular item). As I recall, using a list comprehension was the fastest. The fact is, copying is really, really cheap.
The only other way to improve the speed depends on which items you are removing, but you haven't indicated anything about that, so I can't suggest anything.
