Python heappush vs simple append - what is the difference?

From https://www.tutorialspoint.com/heap-queue-or-heapq-in-python:
heappush – This function adds an element to the heap without altering the current heap.
If the current heap is not altered, why don't we use the append() list method? Is the list with the new element heapified only when heappop() is called?
Am I misunderstanding "without altering the current heap"? Or something else?

This is not official reference documentation, so it contains whatever its author chose to write.
If you consult the official Python Standard Library reference, you will find:
heapq.heappush(heap, item): Push the value item onto the heap, maintaining the heap invariant.
Here what happens is clear: the new item is added to the collection, and the internal structure is adapted as needed so that the binary tree respects the heap invariant: every parent node has a value less than or equal to any of its children.
After a second look at the tutorial, I think what is meant is that heappush adds the new element without altering the other elements in the heap, as opposed to heappop or heapreplace, which remove the current smallest item.

I believe "without altering the current heap" means "maintaining the heap property that each node has a key no larger than its children's". If you need the heap data structure, list.append() would not suffice. You may like to refer to https://www.cs.yale.edu/homes/aspnes/pinewiki/Heaps.html.
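To make the difference concrete, here is a small sketch (standard heapq usage; the sample values are mine). A plain append can break the heap invariant, while heappush restores it as it inserts:

import heapq

h = [1, 3, 5, 7]        # a valid min-heap
h.append(0)             # plain append: 0 becomes a child of 3, invariant broken
print(h[0])             # 1 -- heappop(h) would now return the wrong element

h = [1, 3, 5, 7]
heapq.heappush(h, 0)    # sifts 0 up in O(log n), invariant maintained
print(h[0])             # 0 -- the smallest element is at the root again

If you start from a plain list, list.append() followed by heapq.heapify() also yields a valid heap, but heapify is O(n) over the whole list versus O(log n) for a single heappush onto an existing heap.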

Related

What does "..." mean in a Python def?

This might be a very stupid question, but I can't understand what the three dots stand for in a Python def. I was trying to understand the cost of the in operator on a deque object (from the collections module), so I navigated through the code and here's what I found:
I thought it meant the method would use the "upper" definition when called, but if I navigate to the overridden method I can't find anything except an abstract method in the Container class, so I still don't get how the in operator works on a deque object.
You are looking at a .pyi stub file. Referring to this post, a stub file, as the name suggests, is only meant to describe the interface, not the implementation. Hence, ... in a Python def really means that the file only contains the signature, and you cannot find the implementation here.
Regarding your question about the cost of in operator in deque, refer to https://wiki.python.org/moin/TimeComplexity
It mentions deque is represented internally as a doubly-linked list, and also mentions that in operator for a list has O(n) complexity. I don't think it being a doubly-linked list changes the time complexity, as you would still need to go through each element, i.e., O(n).
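For illustration, a stub entry looks something like this (a simplified sketch in the style of typeshed, not the actual file):

class deque:
    def __contains__(self, x: object) -> bool: ...   # declaration only, no body
    def append(self, x) -> None: ...                 # "..." is the Ellipsis literal

Because ... (Ellipsis) is a valid Python expression, the stub file parses and imports fine; the real implementation of deque lives in the C source of the collections module.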

delete the hashed node in deque in O(1)

How do I hash a built-in node of a deque (which is a doubly linked list) and delete a node in the middle in O(1)? Is the built-in node exposed?
For example, I want to save a deque's node in dict so I can delete the node in constant time later.
This is a use case in LRU, using deque so I don't need to write my own double linked list.
from collections import deque

class LRU:
    def __init__(self):
        self.nodes = deque()
        self.key2node = {}

    def insertThenDelete(self):
        # insert
        node = deque.Node('k', 'v')  # imagine you can expose a deque node here
        self.nodes.appendleft(node)
        self.key2node = {'k': node}

        # delete
        self.key2node['k'].deleteInDeque()  # HERE should remove the node in the DLL!
        del self.key2node['k']
I know you can do del mydeque[2] to delete by index, but I want to do key2node['k'].deleteInDeque() to delete by reference.
The deque API doesn't support direct reference to internal nodes or direct deletion of internal nodes, so what you're trying to do isn't possible with collections.deque().
In addition, the deque implementation is a doubly-linked list of fixed-length blocks, where each block is an array of object pointers, so even if you could get a reference to a node, there would be no easy way to delete just part of a block (it is fixed length).
Your best bet is to create your own doubly-linked list from scratch. See the source code for functools.lru_cache() which does exactly what you're describing: https://github.com/python/cpython/blob/3.7/Lib/functools.py#L405
Hope this helps :-)
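A minimal sketch of that roll-your-own approach (all class and method names here are mine, not a real deque API): each node is an ordinary Python object, so you can keep a reference to it in a dict and unlink it from the middle in O(1).

class Node:
    __slots__ = ('key', 'value', 'prev', 'next')
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.prev = self.next = None

class DLList:
    def __init__(self):
        self.head = Node(None, None)   # sentinel nodes simplify edge cases
        self.tail = Node(None, None)
        self.head.next = self.tail
        self.tail.prev = self.head

    def append_left(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    @staticmethod
    def remove(node):                  # O(1) unlink by reference
        node.prev.next = node.next
        node.next.prev = node.prev

# usage matching the question's pseudocode:
nodes, key2node = DLList(), {}
n = Node('k', 'v')
nodes.append_left(n)
key2node['k'] = n
DLList.remove(key2node.pop('k'))       # delete by reference in O(1)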

Python iter() time complexity?

I was looking up an efficient way to retrieve an (any) element from a set in Python and came across this method:
anyElement = next(iter(SET))
What exactly happens when you generate an iterator out of a container such as a set? Does it simply create a pointer to the location of the object in memory and move that pointer whenever next is called? Or does it convert the set to a list then create an iterator out of that?
My main concern is if it were the latter, it seems iter() would be an O(n) operation. At that point it would be better to just pop an item from the set, store the popped item in a variable, then re-insert the popped item back into the set.
Thanks for any information in advance!
Sets are iterable, but don't have a .__next__() method, so iter() calls the .__iter__() method of the set instance, returning an iterator which does have the __next__ method.
Creating that iterator is an O(1) operation, and each subsequent next() call is O(1) as well; the set is not copied into a list.
https://wiki.python.org/moin/TimeComplexity
See also Retrieve an arbitrary key from python3 dict in O(1) time for an extended answer on .__next__()!
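A quick illustration (the iterator type name shown is a CPython detail):

s = {10, 20, 30}
it = iter(s)          # calls s.__iter__(); the set is not copied
print(type(it))       # <class 'set_iterator'> in CPython
print(next(it) in s)  # True -- an arbitrary element of the set, fetched in O(1)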

How does a linked list work?

I do not have a computer science background. I am trying to learn coding by myself, partly by solving problems on LeetCode.
Anyway, there are problems there that use linked lists, and I have already found out that linked lists have to be simulated in Python. My problem is that I really cannot grasp what is behind a linked list. For instance, what kind of problems are they supposed to target?
And in general, how do linked lists function? Any link to such info would be really helpful.
A recent problem I looked at on LeetCode asks to swap every two adjacent nodes and return the head. LeetCode offers the following solution, which I cannot actually figure out how it works:
# Definition for singly-linked list.
# class ListNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.next = None

class Solution(object):
    def swapPairs(self, head):
        """
        :type head: ListNode
        :rtype: ListNode
        """
        pre = self
        pre.next = head
        while pre.next and pre.next.next:
            a = pre.next
            b = a.next
            # simultaneous assignment: the right-hand side is evaluated
            # first, so a.next receives b's old successor
            pre.next, b.next, a.next = b, a, b.next
            pre = a
        return self.next
As I said, I do not understand this solution. I tried it on the example list 1->2->3->4, which should return the list 2->1->4->3.
All I managed was to trace one pass through the loop, and then the computer should exit the loop, but then what happens? How do the last two numbers get switched? And how does this code work at all if the list has only 2 elements? To me it seems impossible.
If you could just direct me to the online literature that explains something like this, I would be most grateful.
Thanks.
A linked list acts almost the same as an array, but with a few key differences. In a linked list, the memory used isn't (and almost never is) contiguous. In an array of 5 items, all 5 items sit right next to each other in memory (for the most part). Each 'item' in a linked list, however, holds a pointer directly to the next item, removing the need for contiguous memory. So an array is a 'list' of items that exist contiguously in memory, while a linked list is a 'list' of objects that each hold an item and a pointer to the next one. That is a singly linked list, since traversal is only possible in one direction. There is also a doubly linked list, where each node has a pointer to the next node and another pointer to the previous node, allowing traversal in both directions.
https://www.cs.cmu.edu/~adamchik/15-121/lectures/Linked%20Lists/linked%20lists.html
The link will help you get familiar with visualizing how these linked lists work. I would focus on inserting before and after a node, as that should help you understand what your loop is doing; see the sketch below.
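For a concrete picture, here is a minimal sketch of the two node shapes described above (the class names are mine):

class SinglyNode:
    def __init__(self, value):
        self.value = value
        self.next = None    # pointer to the next node only

class DoublyNode:
    def __init__(self, value):
        self.value = value
        self.next = None    # pointer to the next node
        self.prev = None    # pointer to the previous node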
Linked lists don't "exist" in Python as a separate builtin; the language gives you the iterable list object instead. (Under the hood, CPython's list is actually implemented as a dynamic array of pointers, not a linked list.)
The main feature of a linked list is that it is easily extendible, whereas an array has to be manually resized if you wish to expand it. Again, in Python these details are all abstracted away, so trying to work through an example of linked lists in Python is pointless in my opinion, as you won't learn anything.
You should be doing this in C to get an actual understanding of memory allocation and pointers.
That said, given your example, each ListNode contains a value (like an array element), but in addition it has a variable next, where you store another ListNode object. That object, just like the first, has a value and a variable that stores another ListNode object. This can continue for as many objects as desired.
The way the code works is that when we say pre.next, this refers to the ListNode object stored there, and the next object after that is pre.next.next. This works because pre.next is a ListNode object, which has a variable next.
Again, read up on linked lists in C. If you plan to work in higher level languages, I would say you don't really need an understanding of linked lists, as these data structures come "free" with most high level languages.
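To see the quoted solution actually run, here is a small driver (ListNode is the class from the comment in the question; build_list and to_list are my own helper names):

class ListNode(object):
    def __init__(self, x):
        self.val = x
        self.next = None

def build_list(values):
    # builds 1->2->3->4 from [1, 2, 3, 4]
    head = None
    for v in reversed(values):
        node = ListNode(v)
        node.next = head
        head = node
    return head

def to_list(head):
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out

head = build_list([1, 2, 3, 4])
print(to_list(Solution().swapPairs(head)))   # [2, 1, 4, 3]

Tracing the loop on this list: on the first pass a=1 and b=2, and the simultaneous assignment rewires the list to 2->1->3->4; on the second pass pre is the node 1, a=3 and b=4, giving 2->1->4->3; then pre.next.next is None, so the loop exits. With only 2 elements the loop simply runs once and stops.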

Python: delete element from heap

Python has the heapq module, which implements the heap data structure and supports some basic operations (push, pop).
How to remove i-th element from the heap in O(log n)? Is it even possible with heapq or do I have to use another module?
Note, there is an example at the bottom of the documentation:
http://docs.python.org/library/heapq.html
which suggests a possible approach; this is not what I want. I want the element actually removed, not merely marked as removed.
You can remove the i-th element from a heap quite easily:
h[i] = h[-1]
h.pop()
heapq.heapify(h)
Just replace the element you want to remove with the last element, remove the last element, and then re-heapify the heap. This is O(n); if you want, you can do the same thing in O(log n), but you'll need to call a couple of the internal heapify functions, or, better, as larsmans pointed out, just copy the source of _siftup/_siftdown out of heapq.py into your own code:
h[i] = h[-1]
h.pop()
if i < len(h):
    heapq._siftup(h, i)
    heapq._siftdown(h, 0, i)
Note that in each case you can't just do h[i] = h.pop() as that would fail if i references the last element. If you special case removing the last element then you could combine the overwrite and pop.
Note that, depending on the typical size of your heap, just calling heapify, while theoretically less efficient, could be faster than reusing _siftup/_siftdown: a little introspection will reveal that heapify is probably implemented in C, while the C implementations of the internal functions aren't exposed. If performance matters to you, consider doing some timing tests on typical data to see which is best. Unless you have really massive heaps, big-O may not be the most important factor.
Edit: someone tried to edit this answer to remove the call to _siftdown with a comment that:
_siftdown is not needed. New h[i] is guaranteed to be the smallest of the old h[i]'s children, which is still larger than old h[i]'s parent
(new h[i]'s parent). _siftdown will be a no-op. I have to edit since I
don't have enough rep to add a comment yet.
What they've missed in this comment is that h[-1] might not be a child of h[i] at all. The new value inserted at h[i] could come from a completely different branch of the heap so it might need to be sifted in either direction.
Also, to the comment asking why not just use sort() to restore the heap: _siftup and _siftdown are both O(log n) operations, heapify is O(n), and sort() is O(n log n). It is quite possible that calling sort would be fast enough, but for large heaps it is unnecessary overhead.
Edited to avoid the issue pointed out by @Seth Bruder: when i references the last element, the _siftup() call would fail, but in that case popping an element off the end of the heap doesn't break the heap invariant.
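Putting the O(log n) version together as a single helper (a sketch relying on heapq's private _siftup/_siftdown, which are CPython implementation details and could change):

import heapq

def heap_delete(h, i):
    # Remove h[i] in O(log n) by overwriting it with the last element.
    h[i] = h[-1]
    h.pop()
    if i < len(h):                  # nothing to fix if we removed the last slot
        heapq._siftup(h, i)         # sift down if the new value is too large
        heapq._siftdown(h, 0, i)    # sift up if it is smaller than its parent

h = [1, 3, 2, 7, 5]
heap_delete(h, 1)                   # remove the 3
print(h)                            # [1, 5, 2, 7] -- still a valid heap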
(a) Consider why you don't want to lazy delete. It is the right solution in a lot of cases.
(b) A heap is a list. You can delete an element by index, just like any other list, but then you will need to re-heapify it, because it will no longer satisfy the heap invariant.
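For (a), a minimal lazy-deletion sketch (my own wrapper class, similar in spirit to the approach in the heapq docs; it assumes items are hashable and unique): removal just marks the entry in O(1), and stale entries are skipped at pop time.

import heapq

class LazyHeap:
    def __init__(self):
        self._heap = []
        self._removed = set()       # entries marked deleted but still in the heap

    def push(self, item):
        heapq.heappush(self._heap, item)

    def remove(self, item):
        self._removed.add(item)     # O(1); the real removal is deferred

    def pop(self):
        while self._heap:
            item = heapq.heappop(self._heap)
            if item not in self._removed:
                return item
            self._removed.discard(item)
        raise IndexError("pop from empty heap")

lh = LazyHeap()
for x in (5, 1, 3):
    lh.push(x)
lh.remove(1)
print(lh.pop())                     # 3 -- the marked 1 is skipped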
