A list is used as a queue such that the first element is the head of the queue. I was discussing the complexity of the dequeue operation in such a data structure with my teacher, who said it is O(1). My thinking is, if you're removing the first element of the list, wouldn't that be O(n), since all elements after that first element then need to be shifted over a spot? Am I looking at this the wrong way?
You're correct. A Python list is inefficient at dequeuing because, as you say, all items following the dequeued item have to be copied one position to the left. You can instead use collections.deque, which is implemented as a doubly-linked list (of fixed-size blocks), so you can use its popleft method to dequeue efficiently in O(1) time.
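For illustration, a minimal sketch of that deque-based queue:

from collections import deque

queue = deque()
queue.append("first")    # enqueue at the right end
queue.append("second")
print(queue.popleft())   # dequeue from the left end in O(1) -> "first"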
Your teacher probably thought a Python list is a linked list. It is not. From the docs:
It is also possible to use a list as a queue, where the first element added is the first element retrieved (“first-in, first-out”); however, lists are not efficient for this purpose. While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one).
Indeed, the docs suggest using collections.deque for this purpose.
You are correct. Python's list is an array list, so it is not appropriate for popping from the head. Instead, use collections.deque, which gives O(1) add/remove at both ends.
Related
I am learning about linked lists on Codecademy and there is an instruction saying that
Before moving on, take a moment to think about doubly-linked lists.
What do you think are some possible real-life uses?
followed by some suggested uses:
A music player with “next” and “previous” buttons
An app that shows you where your subway is on the train line
The “undo” and “redo” functionality in a web browser
Would it be simpler to use a list?
What are the benefits of using a linked list to perform these tasks?
For instance, using a list for a music player's next and previous buttons:
counter = 0
playlist = ['song', 'song2', 'song3', 'song4']
current_song = playlist[counter]
next_song = playlist[min(counter + 1, len(playlist) - 1)]
From an algorithmic point of view, there are different sequence container types:
static arrays
This is the simplest possible container, with a static (maximum) size and direct access through a numeric index.
dynamic arrays
You still have direct access with a numeric index, but the size can grow arbitrarily (limited by available memory). Python lists actually fall here. The downside is that they can require a full reallocation and copy when they outgrow the allocated size. Removing elements (other than at the end) is also a costly operation.
singly linked lists
Adding and removing elements at the head is easy, as is inserting a new element (or removing one) after an element you have already found. You can only scan them in one direction, and finding an element by position is slow (no direct access). The overhead is one link per node.
doubly linked lists
Compared to singly linked lists, you can scan them in both directions, and inserting (or removing) before an element is easy. There is still no direct access, and the overhead is two links per node.
Dynamic arrays are the multi-purpose workhorse in most languages, and they are what standard Python lists are. With few additions and removals they offer both ease of use and good performance. But the other containers do have use cases. For example, a FIFO queue can easily be implemented as a singly linked list.
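Here is a rough, illustrative sketch of that idea (the Node and FifoQueue names are made up for the example):

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class FifoQueue:
    def __init__(self):
        self.head = None    # dequeue side
        self.tail = None    # enqueue side

    def enqueue(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def dequeue(self):
        if self.head is None:
            raise IndexError("dequeue from empty queue")
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        return node.value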
I agree on one point: a music player with prev and next buttons over a known list of elements could be implemented as an array (which is what a Python list is). But an undo/redo facility with a limited depth is an excellent use case for a doubly linked list (see the sketch after this list):
you want to be able to remove on both side (once depth has been reached, every addition has to drop the oldest element)
you only need to go one step (on any direction) from one element
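As a concrete, if simplified, sketch of that bounded history: collections.deque (itself built on a doubly linked structure of blocks) with maxlen gives exactly the drop-the-oldest behaviour; the function names below are invented for the example.

from collections import deque

MAX_DEPTH = 100                        # illustrative depth limit
undo_stack = deque(maxlen=MAX_DEPTH)   # oldest entry is dropped automatically
redo_stack = []

def do(action):
    undo_stack.append(action)
    redo_stack.clear()                 # a new action invalidates the redo history

def undo():
    if undo_stack:
        action = undo_stack.pop()
        redo_stack.append(action)
        return action

def redo():
    if redo_stack:
        action = redo_stack.pop()
        undo_stack.append(action)
        return action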
A singly linked list keeps a reference only to the next element, whereas a doubly linked list keeps references to both the previous and next elements.
Since a doubly linked list holds references to both sides, it consumes more memory, but it is more flexible for accessing elements (it supports reverse iteration as well, i.e. back/next).
Here, "reference" means the address of the next/previous element.
Undo and redo: I believe a singly linked list is enough, because it needs less memory and the task only requires inserting and deleting from one side.
For "An app that shows you where your subway is on the train line", please refer to the question "Dictionary best data structure for train routes?".
For a linked list, the time complexity of removal and insertion is O(1), but searching is O(n).
For more about time complexity, see https://www.oreilly.com/library/view/php-7-data/9781786463890/c5319c42-c462-43a1-b33d-d683f3ef7e35.xhtml
To learn more about linked lists in Python, see the question "Does python have built-in linkedList data structure?"
From the official documentation:
Python’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure.
This makes indexing a list a[i] an operation whose cost is independent of the size of the list or the value of the index.
When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize.
So, Python lists are nothing but variable-length arrays. I dug into the CPython source code, and after expanding the macro, the basic structure is defined as:
typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;      /* pointer to the array of element references */
    Py_ssize_t allocated;    /* number of slots currently allocated */
} PyListObject;
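As a rough illustration of the over-allocation the quoted documentation mentions, you can watch a list's byte size grow in steps rather than on every append (exact numbers vary by CPython version and platform):

import sys

lst = []
previous_size = sys.getsizeof(lst)
for i in range(20):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != previous_size:    # size only jumps when the array is regrown
        print(f"len={len(lst):2d}  size={size} bytes")
        previous_size = size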
Coming to the reasons why they chose a doubly linked list (a minimal sketch follows the list below):
• Skip back/forward: because each node in a doubly linked list has pointers to both the previous and next node, it is easy to implement skip forward/backward functionality.
• Play next track: the pointer to the next node also makes it quite easy to start the next track when a track is over.
• Append: when you add a new track to a playlist, you tack it onto the end. In a linked list, adding a new element at the tail is a constant-time, O(1) operation. Note that as the songs are read in from a data source and added to the playlist, this is done as a sequence of calls to append.
• Beginning/end: finally, because a linked list has head and tail properties, this provides an easy way to delineate the beginning and end of a playlist.
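For illustration only, a tiny doubly linked playlist sketch with prev/next pointers (the class and method names are made up for the example):

class Track:
    def __init__(self, title):
        self.title = title
        self.prev = None
        self.next = None

class Playlist:
    def __init__(self):
        self.head = None
        self.tail = None
        self.current = None

    def append(self, title):        # O(1) append at the tail
        node = Track(title)
        if self.tail is None:
            self.head = self.tail = self.current = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node

    def next_track(self):           # "next" button
        if self.current and self.current.next:
            self.current = self.current.next
        return self.current

    def previous_track(self):       # "previous" button
        if self.current and self.current.prev:
            self.current = self.current.prev
        return self.current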
In my experience programming in many different problem domains, there is almost never a reason to use a linked list, whether singly or doubly linked.
The theoretical advantage is that a linked list supports O(1) insertion and removal at arbitrary positions in the list. But on today's hardware, you need a pretty large list (thousands to tens of thousands of items) and you need to be doing very frequent insertion and removal operations before this advantage really starts to matter in practice.
I'm trying to implement an algorithm to solve the skyline problem that involves removing specific elements from the middle of a max-heap. The way I currently do it is maxheap.remove(index), but I have to follow up with heapify(maxheap), otherwise the order is thrown off. I know that in Java you can use something like a TreeMap for this. Is there any way to do this in Python more efficiently than calling two separate methods, each of which takes O(n) time?
Removing an arbitrary item from a heap is an O(log n) operation, provided you know where the item is in the heap. The algorithm is:
Move the last item in the heap to the position that contains the item to remove.
Decrement heap count.
If the moved item is smaller than its parent, bubble it up the heap; otherwise, sift it down the heap.
The primary problem is finding the item's position in the heap. As you've noted, doing so is an O(n) operation unless you maintain more information.
An efficient solution is to maintain a dictionary whose keys are the items and whose values are the items' indexes in the heap. You have to maintain the dictionary, however:
When you insert an item into the heap, add a dictionary entry
When you remove an item from the heap, remove the dictionary entry
Whenever you change the position of an item in the heap, update that item's value in the dictionary.
With that dictionary in place, you have O(1) access to an item's position in the heap, and you can remove it in O(log n).
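A hedged sketch of that idea, written here as a min-heap (the same approach works for a max-heap by flipping the comparisons); the class and method names are invented for the example, and items are assumed to be unique, hashable and comparable.

class IndexedHeap:
    def __init__(self):
        self.heap = []
        self.pos = {}                # item -> index of that item in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i]] = i
        self.pos[self.heap[j]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[i] < self.heap[parent]:
                self._swap(i, parent)
                i = parent
            else:
                break

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child] < self.heap[smallest]:
                    smallest = child
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def push(self, item):            # O(log n)
        self.heap.append(item)
        self.pos[item] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def remove(self, item):          # O(log n), thanks to the position dict
        i = self.pos.pop(item)
        last = self.heap.pop()       # move the last item into the hole
        if i < len(self.heap):
            self.heap[i] = last
            self.pos[last] = i
            self._sift_up(i)
            self._sift_down(i)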
I would have each element be a data structure with a flag for whether to ignore it. When you heappop, just pop again if the popped element was flagged. This is very easy and obvious, and it requires knowing nothing about how the heap works internally; for example, you don't need to know where the element actually is in the heap in order to flag it.
The downside of this approach is that the flagged elements will tend to accumulate over time. Occasionally you can just filter them out then heapify.
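A rough sketch of that flagging idea on top of heapq (the helper names are mine, and values are assumed to be unique):

import heapq

heap = []
entries = {}                         # value -> its [value, removed] entry

def push(value):
    entry = [value, False]
    entries[value] = entry
    heapq.heappush(heap, entry)

def remove(value):
    entries.pop(value)[1] = True     # just flag it; it stays in the heap

def pop():
    while heap:
        value, removed = heapq.heappop(heap)
        if not removed:
            del entries[value]
            return value
    raise IndexError("pop from empty heap")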
If this solution is not sufficient for your needs, you should look for a B-tree implementation in Python of some sort. That will behave like the TreeMap you are used to in Java.
Yes, there is a more efficient way, provided you have the item's index or a pointer to it (depending on the implementation method).
Replace the number at the index/pointer you need to remove with its largest child, and repeat the process recursively (replace that child with its largest child, etc.) until you reach a node that has no children, which you can remove easily.
The complexity of this algorithm is O(log n).
http://algorithms.tutorialhorizon.com/binary-min-max-heap/
I am required to access the leftmost element of a Python list, while popping elements from the left. One example of this usage could be the merge operation of a merge sort.
There are two good ways of doing it.
collections.deque
Using indexes: keep an increasing integer index as I pop from the left.
Both of these methods seem efficient in complexity terms (I should mention that at the beginning of the program I need to convert the list to a deque, which costs an extra O(n)).
So in terms of style and speed, which is best for Python usage? I did not see an official recommendation nor encounter a similar question, which is why I am asking.
Thanks in advance.
Definitely collections.deque. First, it's already written, so you don't have to write it. Second, it's written in C, so it is probably much faster than another Python re-implementation. Third, using an index leaves the head of the list unused, whereas the deque is more clever about memory. Appending to a standard list while only increasing the starting index on each left pop is very inefficient, because the list keeps growing and would need to be reallocated more often.
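For comparison, a rough sketch of the two approaches (the data here is made up):

from collections import deque

data = [3, 1, 4, 1, 5, 9]

# Index approach: the list never shrinks, you just advance a cursor.
i = 0
while i < len(data):
    item = data[i]       # "peek" at the leftmost remaining element
    i += 1               # "pop" it by moving the cursor

# deque approach: one-time O(n) conversion, then O(1) popleft each time.
q = deque(data)
while q:
    item = q.popleft()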
What is an efficient way to read elements into a list and keep the list sorted, apart from searching for the place of each new element in the existing sorted list and inserting it there?
Use a specialised data structure. In Python, you have the bisect module at your disposal:
This module provides support for maintaining a list in sorted order without having to sort the list after each insertion. For long lists of items with expensive comparison operations, this can be an improvement over the more common approach. The module is called bisect because it uses a basic bisection algorithm to do its work.
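For example, a minimal use of bisect.insort (the binary search is O(log n), though the insertion itself still shifts elements, which is O(n)):

import bisect

sorted_items = []
for value in [5, 1, 4, 2, 3]:
    bisect.insort(sorted_items, value)   # keeps the list sorted as you go

print(sorted_items)   # [1, 2, 3, 4, 5]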
You're looking for the functions in heapq.
This module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm.
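A minimal sketch of using heapq: pushes are O(log n), and items come back in sorted order when popped, although the underlying list is heap-ordered rather than fully sorted.

import heapq

heap = []
for value in [5, 1, 4, 2, 3]:
    heapq.heappush(heap, value)      # O(log n) per push

ordered = [heapq.heappop(heap) for _ in range(len(heap))]
print(ordered)   # [1, 2, 3, 4, 5]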
@Aswin's comment is interesting. If you are sorting each time you insert an item, the call to sort() is O(n) rather than the usual O(n log n). This is due to the way the sort (Timsort) is implemented.
However, on top of this you'd need to shift a bunch of elements along the list to make space. This is also O(n), so overall, calling .sort() after each insert is O(n).
There isn't a way to keep a sorted list in better than O(n) per insertion, because this shifting is always needed.
If you don't need an actual list, heapq (as mentioned in @Ignacio's answer) often covers the properties you do need in an efficient manner.
Otherwise you can probably find that one of the many tree data structures will suit your cause better than a list.
An example with SortedList from the third-party sortedcontainers package:
from sortedcontainers import SortedList
sl = SortedList()
sl.add(2)
sl.add(1)
# sl is now SortedList([1, 2])
So I have a list of 85 items. I would like to continually reduce this list in half (essentially a binary search on the items) -- my question is then, what is the most efficient way to reduce the list? A list comprehension would continually create copies of the list which is not ideal. I would like in-place removal of ranges of my list until I am left with one element.
I'm not sure if this is relevant but I'm using collections.deque instead of a standard list. They probably work the same way more or less though so I doubt this matters.
For a mere 85 items, truthfully, almost any method you want to use would be more than fast enough. Don't optimize prematurely.
That said, depending on what you're actually doing, a list may be faster than a deque. A deque is faster for adding and removing items at either end, but it doesn't support slicing.
With a list, if you want to copy or delete a contiguous range of items (say, the first 42), you can do this with a slice. Assuming half the list is eliminated at each pass, copying items to a new list would, on average, be slower than deleting items from the existing list. Deleting requires moving the half of the list that isn't being deleted "leftward" in memory, which costs about the same as copying the other half, but you won't always need to do it: deleting the latter half of a list doesn't need to move anything.
To do this with a deque efficiently, you would want to pop() or popleft() the items rather than slicing them (lots of attribute access and method calls, which are relatively expensive in Python), and you'd have to write the loop that controls the operation in Python, which will be slower than the native slice operation.
Since you said it's basically a binary search, it is probably actually fastest to simply find the item you want to keep without modifying the original container at all, and then return a new container holding that single item. A list is going to be faster for this than a deque since you will be doing a lot of accessing items by index. To do this in a deque will require Python to follow the linked list from the beginning each time you access an item, while accessing an item by index is a simple, fast calculation for a list.
collections.deque is implemented via a linked list, hence binary search would be much slower than a linear search. Rethink your approach.
Not sure whether this is what you really need, but:

x = list(range(100))
while len(x) > 1:
    if condition:            # `condition` stands for whatever test picks a half
        x = x[:len(x) // 2]
    else:
        x = x[len(x) // 2:]
1. 85 items are not even worth thinking about. Computers are fast, really.
2. Why would you delete ranges from the list, instead of simply picking the one result?
3. If there is a good reason why you can't do (2): keep the original list and change only two indices, the start and end index of the sublist you're looking at (see the sketch below).
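A rough sketch of that two-index idea, with a made-up predicate standing in for whatever test decides which half to keep:

items = list(range(85))          # stand-in data

def keep_left(mid):              # hypothetical test for this example
    return items[mid] > 42

lo, hi = 0, len(items)           # the current sublist is items[lo:hi]
while hi - lo > 1:
    mid = (lo + hi) // 2
    if keep_left(mid):
        hi = mid                 # keep the left half
    else:
        lo = mid                 # keep the right half

print(items[lo])                 # the single remaining element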
On a previous question I compared a number of techniques for removing a list of items given a predicate. (That is, I have a function which returns True or False for whether to keep a particular item.) As I recall using a list comprehension was the fastest. The fact is, copying is really really cheap.
What you can do to improve the speed depends on which items you are removing, but you haven't indicated anything about that, so I can't suggest anything more specific.