delete the hashed node in deque in O(1)


How do I hash a built-in node in a deque (which is a doubly linked list) and delete that node from the middle in O(1)? Is the built-in node exposed?
For example, I want to save a deque's node in a dict so that I can delete the node in constant time later.
This is a use case in an LRU cache; using deque would mean I don't need to write my own doubly linked list.
from collections import deque
class LRU:
    def __init__(self):
        self.nodes = deque()
        self.key2node = {}
    def insertThenDelete(self):
        # insert
        node = deque.Node('k', 'v')  # hypothetical: imagine deque exposed its nodes here
        self.nodes.appendleft(node)
        self.key2node = {'k': node}
        # delete
        self.key2node['k'].deleteInDeque()  # HERE should remove the node from the DLL!
        del self.key2node['k']
I know you can do del mydeque[2] to delete by index,
but I want to do key2node['k'].deleteInDeque() to delete by reference.

The deque API doesn't support direct reference to internal nodes or direct deletion of internal nodes, so what you're trying to do isn't possible with collections.deque().
In addition, the deque implementation is a doubly-linked list of fixed-length blocks, where each block is an array of object pointers, so even if you could get a reference, there would be no easy way to delete just part of a block (it is fixed length).
Your best bet is to create your own doubly-linked list from scratch. See the source code for functools.lru_cache() which does exactly what you're describing: https://github.com/python/cpython/blob/3.7/Lib/functools.py#L405
Hope this helps :-)
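To make the "roll your own doubly-linked list" suggestion concrete, here is a minimal sketch. It is not the functools.lru_cache() implementation; the names _Node, LRU, put, get, delete and keys are made up for illustration, and it assumes capacity is at least 1.

```python
class _Node:
    """One list entry (hypothetical helper, not part of collections)."""
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key = key
        self.value = value
        self.prev = self.next = None


class LRU:
    def __init__(self, capacity):
        self.capacity = capacity  # assumed to be >= 1
        self.key2node = {}
        # Sentinel head/tail nodes so unlinking never has to check for None.
        self.head = _Node()
        self.tail = _Node()
        self.head.next = self.tail
        self.tail.prev = self.head

    def _unlink(self, node):
        # O(1): only the two neighbours are touched.
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.prev = self.head
        node.next = self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key, default=None):
        node = self.key2node.get(key)
        if node is None:
            return default
        self._unlink(node)
        self._push_front(node)  # mark as most recently used
        return node.value

    def put(self, key, value):
        node = self.key2node.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
        else:
            if len(self.key2node) >= self.capacity:
                oldest = self.tail.prev  # least recently used entry
                self._unlink(oldest)
                del self.key2node[oldest.key]
            node = _Node(key, value)
            self.key2node[key] = node
        self._push_front(node)

    def delete(self, key):
        # The "delete by reference" from the question: dict lookup, then unlink.
        self._unlink(self.key2node.pop(key))

    def keys(self):
        # Most recently used first; O(n), meant for inspection only.
        out, node = [], self.head.next
        while node is not self.tail:
            out.append(node.key)
            node = node.next
        return out
```

The dict holds the node references, so delete() never has to walk the list. If writing the list by hand is not a goal in itself, collections.OrderedDict (with move_to_end() and popitem(last=False)) covers the same use case from the standard library.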

Related

Python heappush vs simple append - what is the difference?

From https://www.tutorialspoint.com/heap-queue-or-heapq-in-python:
heappush – This function adds an element to the heap without altering the current heap.
If the current heap is not altered, why don't we use the append() list method? Is the list with the new element heapified only when heappop() is called?
Am I misunderstanding "without altering the current heap"? Or something else?

That tutorial is not official reference documentation, so it contains whatever its author chose to write.
If you consult the official Python Standard Library reference, you will find:
heapq.heappush(heap, item): Push the value item onto the heap, maintaining the heap invariant.
Here what happens is clear: the new item is added to the collection, and the internal structure is then adjusted so that the binary tree respects the invariant: every parent node has a value less than or equal to any of its children.
After a second look at the tutorial, I think what is meant is that heappush adds the new element without removing other elements from the heap, as opposed to heappop or heapreplace, which remove the current smallest item.

I believe "without altering the current heap" means "maintaining the heap property that each node's key is no larger than its children's keys". If you need the heap data structure, list.append() would not suffice. You may like to refer to https://www.cs.yale.edu/homes/aspnes/pinewiki/Heaps.html.
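A small sketch of the difference between the two calls. The helper is_min_heap is made up for this example; only heapq.heappush and list.append are real library calls.

```python
import heapq

heap = [1, 3, 5, 7]        # already a valid min-heap
as_list = list(heap)       # same starting contents

heapq.heappush(heap, 0)    # appends, then sifts the new item up to its place
as_list.append(0)          # just appends; no reordering happens


def is_min_heap(items):
    # Heap invariant: every parent is <= each of its children.
    return all(items[(i - 1) // 2] <= items[i] for i in range(1, len(items)))
```

After these calls heap[0] is the new smallest item, while as_list still has its old first element and no longer satisfies the invariant, so a later heappop on it would not be reliable.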

How the linked list works?

I do not have a computer science background. I am trying to learn to code by myself, partly by solving problems on LeetCode.
Some of those problems use linked lists, and I have already found out that linked lists have to be simulated in Python. My problem is that I really cannot grasp the idea behind a linked list. For instance, what kind of problems are they supposed to solve?
And how do linked lists work in general? Any link to such information would be really helpful.
The most recent problem I looked at on LeetCode asks you to swap every two adjacent nodes and return the head of the list. LeetCode offers the following solution, and I cannot figure out how it actually works.
# Definition for singly-linked list.
# class ListNode(object):
#     def __init__(self, x):
#         self.val = x
#         self.next = None
class Solution(object):
    def swapPairs(self, head):
        """
        :type head: ListNode
        :rtype: ListNode
        """
        pre = self  # the Solution instance doubles as a dummy node in front of head
        pre.next = head
        while pre.next and pre.next.next:
            a = pre.next
            b = a.next
            pre.next = b
            a.next = b.next  # must be read before b.next is overwritten
            b.next = a
            pre = a
        return self.next
As I said, I do not understand this solution. I tried it with the example list 1->2->3->4, which should return the list 2->1->4->3.
All I managed was to trace one pass through the loop, after which it seems to me the computer should exit the loop. But then what happens? How are the last two numbers swapped? And how does this code work at all if the list has only 2 elements? To me it seems impossible.
If you could just direct me to online literature that explains something like this, I would be most grateful.
Thanks.

A linked list acts almost the same as an array, with a few main differences. In a linked list, the memory used does not have to be (and almost never is) contiguous. In an array, if you have 5 items and you look at the memory, all 5 items will be right next to each other (for the most part). Each 'item' in a linked list, however, has a pointer that points directly to the next item, removing the need for contiguous memory. So an array is a 'list' of items that exist contiguously in memory, and a linked list is a 'list' of objects that each hold an item and a pointer to the next item. This is called a singly linked list, as traversal is only possible in one direction. There is also the doubly linked list, where each node has a pointer to the next node and another pointer to the previous node, allowing traversal in both directions.
https://www.cs.cmu.edu/~adamchik/15-121/lectures/Linked%20Lists/linked%20lists.html
The link will help you get familiar with visualizing how these linked lists work. I would probably focus on inserting before and after a node, as these should help you understand what your loop is doing.

Linked lists don't "exist" as a builtin in Python, since the language provides an iterable builtin list object instead. Under the hood, CPython (the most common implementation of Python) implements list as a dynamically resized array of object pointers, not as a linked list.
The main feature of a linked list is that it is easily extendable, whereas an array has to be manually resized if you wish to expand it. Again, in Python these details are all abstracted away. So working through an example of linked lists in Python is pointless in my opinion, as you won't learn anything.
You should be doing this in C to get an actual understanding of memory allocation and pointers.
That said, given your example, each ListNode contains a value (like an array element), but in addition it has a variable 'next' where you store another ListNode object. This object, just like the first, has a value and a variable that stores another ListNode object. This can continue for as many objects as desired.
The way the code works is that when we say pre.next, this refers to the ListNode object stored there, and the next object after that is pre.next.next. This works because pre.next is a ListNode object, which has a variable next.
Again, read up on linked lists in C. If you plan to work in higher level languages, I would say you don't really need an understanding of linked lists, as these data structures come "free" with most high level languages.
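For readers who want to run the example from the question, here is a self-contained sketch. The helper names from_values and to_values are invented for this illustration, and an explicit dummy node replaces the `pre = self` trick.

```python
class ListNode:
    def __init__(self, x):
        self.val = x
        self.next = None


def from_values(values):
    """Build a linked list from a Python list and return its head (or None)."""
    dummy = ListNode(None)
    tail = dummy
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next


def to_values(head):
    """Collect the values of a linked list into a Python list."""
    out = []
    while head:
        out.append(head.val)
        head = head.next
    return out


def swap_pairs(head):
    dummy = ListNode(None)  # plays the role of `self` in the LeetCode solution
    dummy.next = head
    pre = dummy
    while pre.next and pre.next.next:
        a = pre.next
        b = a.next
        # The right-hand side is evaluated first, so the old b.next is read
        # before b.next is overwritten.
        pre.next, b.next, a.next = b, a, b.next
        pre = a  # a is now the second node of the swapped pair
    return dummy.next
```

Tracing 1->2->3->4: in the first pass a is node 1 and b is node 2, giving dummy->2->1->3->4, and pre moves to node 1. Both pre.next (node 3) and pre.next.next (node 4) exist, so the loop runs a second time and swaps 3 and 4. With only two elements the loop runs exactly once, which is all that is needed.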

How to make list lookup faster in this recursive function

I have a recursive function which creates a json object
def add_to_tree(name, parent, start_tree):
    for x in start_tree:
        if x["name"] == parent:
            x["children"].append({"name":name, "parent":parent, "children":[]})
        else:
            add_to_tree(name, parent, x["children"])
It is called from another function
def caller():
    start_tree = [{"name":"root", "parent":"null", "children":[]}] # basic structure of the json object which holds the d3.js tree data
    for x in new_list:
        name = x.split('/')[-2]
        parent = x.split('/')[-3]
        add_to_tree(name, parent, start_tree)
new_list is a list which holds links in this form
/root/A/
/root/A/B/
/root/A/B/C/
/root/A/D/
/root/E/
/root/E/F/
/root/E/F/G/
/root/E/F/G/H/
...
Everything is working fine except for the fact that the run time grows exponentially with the input size.
Normally new_list has ~500k links, and the depth of these links can be more than 10, so there is a lot of looping and lookups involved in the add_to_tree() function.
Any ideas on how to make this faster?

You are searching your whole tree each time you add a new entry. This is hugely inefficient as your tree grows; you can easily end up with O(N^2) work this way, because for each new element you search the whole tree again.
You could use a dictionary mapping names to specific tree entries, for fast O(1) lookups; this lets you avoid having to traverse the tree each time. It can be as simple as treeindex[parent]. This'll take some more memory however, and you may need to handle the case where the parent is added after the children (using a queue).
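A minimal sketch of the dictionary idea. It makes a few assumptions that are not in the question: the index is keyed by full path rather than by bare name (so two nodes named B in different branches do not collide), every parent path appears before its children as in the sample input, and "/root/" itself is not in the list. The name build_tree is made up.

```python
def build_tree(paths):
    root = {"name": "root", "parent": "null", "children": []}
    treeindex = {"/root/": root}  # full path -> node, O(1) lookup
    for path in paths:
        parts = path.strip("/").split("/")  # "/root/A/B/" -> ["root", "A", "B"]
        name, parent = parts[-1], parts[-2]
        parent_path = "/" + "/".join(parts[:-1]) + "/"
        node = {"name": name, "parent": parent, "children": []}
        # Assumption: the parent was already added, otherwise this raises KeyError.
        treeindex[parent_path]["children"].append(node)
        treeindex[path] = node
    return [root]


sample = ["/root/A/", "/root/A/B/", "/root/A/B/C/", "/root/A/D/",
          "/root/E/", "/root/E/F/"]
tree = build_tree(sample)
```

Each insert is one dictionary lookup plus one append, so the total work is linear in the number of links (times the path length for the string handling).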
However, since your input list appears to be sorted, you could just process your list recursively or use a stack and take advantage of the fact you just found the parent already. If your path is longer than the previous entry, it'll be a child of that entry. If the path is equal or shorter, it'll be a sibling entry to the previous node or a parent of that node, so return or pop the stack.
For example, for these three elements:
/root/A/B/
/root/A/B/C/
/root/A/D/
/root/A/B/C does not have to search the tree from the root for /root/A/B, it was the previously processed entry. That'll be the parent call for this recursive iteration, or the top of the stack. Just add to that parent directly.
/root/A/D is a sibling of a parent; the path is shorter than /root/A/B/C/, so return or pop that entry off the stack. The length is equal to /root/A/B/, so it is a direct sibling; again return or pop the stack. Now you'll be at the /root/A level, and /root/A/D/ is a child. Add, and continue your process.
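The stack variant described above could look roughly like this. It assumes the input is sorted in the depth-first order shown in the question and that no level is ever skipped; the function name build_tree_sorted is made up.

```python
def build_tree_sorted(paths):
    root = {"name": "root", "parent": "null", "children": []}
    stack = [root]  # stack[d] is the currently open node at depth d
    for path in paths:
        parts = path.strip("/").split("/")  # "/root/A/" -> ["root", "A"]
        depth = len(parts) - 1              # root is depth 0, "/root/A/" is depth 1
        del stack[depth:]                   # pop back until the parent is on top
        parent = stack[-1]
        node = {"name": parts[-1], "parent": parent["name"], "children": []}
        parent["children"].append(node)
        stack.append(node)
    return [root]


sample = ["/root/A/", "/root/A/B/", "/root/A/B/C/", "/root/A/D/",
          "/root/E/", "/root/E/F/", "/root/E/F/G/", "/root/E/F/G/H/"]
tree = build_tree_sorted(sample)
```

No searching happens at all: the parent of each entry is always the top of the stack after popping, so the extra memory is bounded by the maximum depth rather than by the number of nodes.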

I have not tested this, but it looks like the loop does not stop when an insertion has been made, so every entry in new_list will cause a recursive search through all of the tree. This should speed it up:
def add_to_tree(name, parent, start_tree):
    for x in start_tree:
        if x["name"] == parent:
            x["children"].append({"name":name, "parent":parent, "children":[]})
            return True
        elif add_to_tree(name, parent, x["children"]):
            return True
    return False
It stops searching as soon as the parent is found.
That said, I think there is a bug in the approach. What if you have:
/root/A/B/C/
/root/D/B/E/
Your algorithm only looks at the last two path elements, so it seems that both C and E will be placed under the same B. I think you will need to take all elements into account and make your way down the tree element by element. That is better anyway, since you will know at each level which branch to take, and the correct version will be much faster: each insert only visits the nodes along its own path (roughly O(log N) for a reasonably balanced tree) instead of the whole tree.
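A sketch of that element-by-element descent, using the same node dictionaries as the question. The function name add_path is made up, and creating missing intermediate nodes on the way down is an assumption about the desired behaviour.

```python
def add_path(path, tree):
    """Insert `path` (e.g. "/root/A/B/") by descending one element at a time."""
    parts = path.strip("/").split("/")
    level = tree            # list of sibling nodes at the current depth
    parent_name = "null"
    for name in parts:
        for node in level:
            if node["name"] == name:
                break       # found the branch to follow
        else:
            # No sibling with this name yet: create it (assumed behaviour).
            node = {"name": name, "parent": parent_name, "children": []}
            level.append(node)
        level = node["children"]
        parent_name = name


tree = [{"name": "root", "parent": "null", "children": []}]
add_path("/root/A/B/C/", tree)
add_path("/root/D/B/E/", tree)
```

With this version the two B nodes stay separate, because the search at each level is restricted to the children of the node chosen one level up.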

Directed graph nodes: Keep track of successors and predecessors

I am trying to implement a class Node representing a node in a directed graph, which in particular has a set of successors and a set of predecessors. I would like Node.successors and Node.predecessors to behave like sets: in particular, I want to iterate over their elements, add and remove elements, check containment, and set them from an iterable. However, after node_1.successors.add(node_2), node_1 in node_2.predecessors should be True.
It seems possible to write a new subclass of set that implements this magic, but as far as I can see, an implementation of such a class would be quite cumbersome: it would have to know which Node object it belongs to and whether it holds predecessors or successors, and it would need some special methods for addition and so on, so that node_1.successors.add(node_2) will not call node_2.predecessors.add(node_1) and thus lead to an infinite loop.
Generating one of the two attributes on the fly (node for node in all_nodes if self in node.successors) should be possible, but then I need to keep track of all Nodes belonging to a graph. That is easy if I have only one graph (adding each node to a weakref.WeakSet class attribute in __init__), but using one big set for all nodes leads to a large computational effort if I have multiple disjoint graphs, and I do not see how I could then modify the set of predecessors.
Does anybody have a good solution for this?

What if you wrap the add method in your class, and inside that wrapper method update both attributes, predecessors and successors?
Something like this is the first solution that would come to my mind:
class Node:
    def __init__(self):
        self.pred = set()
        self.suce = set()
    def addSucessor(self, node):
        self.suce.add(node)
        node.pred.add(self)
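Extending the same idea to removal and to setting from an iterable might look like the sketch below. All method names here (add_successor, remove_successor, set_successors) and the name argument are invented for the example. The sets are exposed only as read-only frozenset copies, which is one possible way to stop callers from bypassing the bookkeeping.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self._succ = set()
        self._pred = set()

    @property
    def successors(self):
        # Read-only snapshot: iteration and containment work, mutation does not.
        return frozenset(self._succ)

    @property
    def predecessors(self):
        return frozenset(self._pred)

    def add_successor(self, node):
        # Both sides are updated directly, so there is no mutual recursion.
        self._succ.add(node)
        node._pred.add(self)

    def remove_successor(self, node):
        self._succ.discard(node)
        node._pred.discard(self)

    def set_successors(self, nodes):
        # Drop the old edges first so the old successors forget this node.
        for old in list(self._succ):
            self.remove_successor(old)
        for new in nodes:
            self.add_successor(new)
```

Because each node only touches the private sets of its direct neighbours, no global registry of nodes is needed, and disjoint graphs do not affect each other.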

OrderedDict performance (compared to deque)

I've been trying to optimize the performance of a BFS implementation in Python. My original implementation used a deque to store the queue of nodes to expand and a dict to store the same nodes, so that I would have an efficient lookup to see whether a node is already open.
I attempted to optimize (for simplicity and efficiency) by moving to an OrderedDict. However, this takes significantly more time: 400 sample searches take 2 seconds with deque/dict and 3.5 seconds with just an OrderedDict.
My question is, if OrderedDict provides the same functionality as the two original data structures, should it not at least be similar in performance? Or am I missing something here? Code examples below.
Using just an OrderedDict:
open_nodes = OrderedDict()
closed_nodes = {}
current = Node(start_position, None, 0)
open_nodes[current.position] = current
while open_nodes:
    current = open_nodes.popitem(False)[1]
    closed_nodes[current.position] = (current)
    if goal(current.position):
        return trace_path(current, open_nodes, closed_nodes)
    # Nodes bordering current
    for neighbor in self.environment.neighbors[current.position]:
        new_node = Node(neighbor, current, current.depth + 1)
        open_nodes[new_node.position] = new_node
Using both a deque and a dictionary:
open_queue = deque()
open_nodes = {}
closed_nodes = {}
current = Node(start_position, None, 0)
open_queue.append(current)
open_nodes[current.position] = current
while open_queue:
    current = open_queue.popleft()
    del open_nodes[current.position]
    closed_nodes[current.position] = (current)
    if goal_function(current.position):
        return trace_path(current, open_nodes, closed_nodes)
    # Nodes bordering current
    for neighbor in self.environment.neighbors[current.position]:
        new_node = Node(neighbor, current, current.depth + 1)
        open_queue.append(new_node)
        open_nodes[new_node.position] = new_node

Both deque and dict are implemented in C and will run faster than OrderedDict which is implemented in pure Python.
The advantage of the OrderedDict is that it has O(1) getitem, setitem, and delitem just like regular dicts. This means that it scales very well, despite the slower pure python implementation.
Competing implementations using deques, lists, or binary trees usually forgo fast big-Oh times in one of those categories in order to get a speed or space benefit in another category.
Update: Starting with Python 3.5, OrderedDict() has a C implementation. Though it hasn't been as highly optimized as some of the other containers, it should run much faster than the pure Python implementation. Then, starting with Python 3.6, regular dictionaries have been ordered (an implementation detail in 3.6 that became a language guarantee in 3.7). Those should run faster still :-)
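To compare the two designs on your own interpreter, a simplified stand-in for the question's open list can be useful. The sketch below is not the BFS from the question: both function names are made up, plain keys replace Node objects, and the deque version skips keys that are already open so that both variants hold each key at most once.

```python
from collections import OrderedDict, deque


def drain_deque_dict(items):
    """FIFO queue plus a dict for membership tests."""
    queue, index, order = deque(), {}, []
    for key in items:
        if key not in index:  # skip keys that are already open
            queue.append(key)
            index[key] = True
    while queue:
        key = queue.popleft()
        del index[key]
        order.append(key)
    return order


def drain_ordereddict(items):
    """Single OrderedDict acting as both queue and index."""
    open_nodes, order = OrderedDict(), []
    for key in items:
        open_nodes[key] = True  # re-assigning an existing key keeps its position
    while open_nodes:
        key, _ = open_nodes.popitem(last=False)  # pop the oldest entry
        order.append(key)
    return order
```

Both functions return the keys in the same first-seen order, so wrapping each call in timeit.timeit gives a like-for-like measurement; the numbers will depend on the Python version for the reasons given above.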

Like Sven Marnach said, OrderedDict is implemented in Python. I want to add that it is implemented using dict and list.
dict in Python is implemented as a hash table. I am not sure how deque is implemented, but the documentation says that deque is optimized for quickly adding or accessing the first/last elements, so I guess that deque is implemented as a linked list.
I think that when you pop from an OrderedDict, Python does a hash table lookup, which is slower than a linked list with direct pointers to its first and last elements. Adding an element to the end of a linked list is also faster than adding one to a hash table.
So the primary reason why OrderedDict is slower in your example is that it is faster to access the last element of a linked list than to access any element through a hash table.
My thoughts are based on information from the book Beautiful Code, which describes the implementation details behind dict. However, I do not know many details behind list and deque; this answer is just my intuition of how things work, so in case I am wrong, I really deserve down-votes for talking about things I am not sure about. Why do I talk about things I am not sure about? Because I want to test my intuition :)
