Suppose I am maintaining a set of integers (elements will be added and removed dynamically, and values may be duplicated), and I need to efficiently find the max/min element of the current set. Is there a better solution?
My current approach is to maintain a max heap and a min heap. I am using Python 2.7.x and am open to any third-party pip packages that fit my problem.
Just use the min and max functions. There is no point in maintaining a heap unless you need to get the min/max many times while adding and removing elements.
min(list_of_ints)
will yield the minimum of the list, and
max(list_of_ints)
will yield the maximum.
Do not forget that building a heap takes O(n) time, with a constant of about 2 (as far as I remember). Only then can you use its O(log(n)) operations.
P.S. OK, now that you have said you have to call min/max many times, you should maintain two heaps (a min heap and a max heap). It will take O(n) to construct each heap; then each operation (add element, remove element, find min/max) will take O(log(n)): you add/remove elements in both heaps and read the min/max from the corresponding heap.
Whereas if you go with the min/max functions over a plain list, you do not need to construct anything, and add will take O(1), but remove is O(n) and min/max is O(n), which is far worse than the heaps.
P.P.S. Python has heaps in the standard library (the heapq module).
Hope that helps.
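If you do go with the two-heap approach, note that heapq only provides min-heaps and has no delete-by-value operation, so a common workaround is lazy deletion. Here is a minimal sketch (the class and method names are my own invention, not a standard API): removed values are counted per heap and skipped whenever they surface at a heap's top.

```python
import heapq
from collections import Counter

class MinMaxBag:
    """Multiset with O(log n) add/remove and amortized O(log n) min/max,
    built on two heaps with lazy deletion."""

    def __init__(self):
        self._min = []              # min-heap of values
        self._max = []              # max-heap (values stored negated)
        self._rm_min = Counter()    # deletions not yet applied to _min
        self._rm_max = Counter()    # deletions not yet applied to _max

    def add(self, x):
        heapq.heappush(self._min, x)
        heapq.heappush(self._max, -x)

    def remove(self, x):
        # Lazy removal: just remember it; each heap skips it later.
        self._rm_min[x] += 1
        self._rm_max[x] += 1

    @staticmethod
    def _clean(heap, removed, sign):
        # Pop entries that were logically removed until a live one is on top.
        while heap and removed[sign * heap[0]]:
            removed[sign * heap[0]] -= 1
            heapq.heappop(heap)

    def min(self):
        self._clean(self._min, self._rm_min, 1)
        return self._min[0]

    def max(self):
        self._clean(self._max, self._rm_max, -1)
        return -self._max[0]
```

Duplicates are handled naturally because the counters track how many copies are pending removal, and nothing here requires Python 3, so it also runs on 2.7.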
Using a sorted list may help: the first element is always the minimum and the last element is always the maximum, so reading either is O(1).
Using binary search (the bisect module) to locate a position in the sorted list is O(log(n)), though inserting into or deleting from a Python list still shifts the trailing elements, so add/remove is O(n) overall.
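A sketch of the sorted-list approach with the standard bisect module (the helper names add and remove are just for illustration):

```python
import bisect

sorted_list = []

def add(x):
    # bisect finds the insertion point in O(log n),
    # but list.insert then shifts trailing elements: O(n) overall.
    bisect.insort(sorted_list, x)

def remove(x):
    i = bisect.bisect_left(sorted_list, x)
    if i < len(sorted_list) and sorted_list[i] == x:
        sorted_list.pop(i)  # also O(n) because of the shift
    else:
        raise ValueError("%r not found" % (x,))

for v in [3, 1, 4, 1, 5]:
    add(v)
# sorted_list[0] is the minimum, sorted_list[-1] is the maximum: O(1)
```

Despite the O(n) inserts, shifting a contiguous array has a small constant factor, so this can still be competitive for modest n.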
I know that popping the last element of a list takes O(1).
And after reading this post
What is the time complexity of popping elements from list in Python?
I see that popping an element from an arbitrary position in a list takes O(n), since all the subsequent elements need to shift one position.
But a set has no order and no indices, so I am not sure whether anything similar has to shift in a set.
If not, is pop() for a set O(1)?
Thanks.
On modern CPython implementations, pop takes amortized constant-ish time (I'll explain further). On Python 2, it's usually the same, but performance can degrade heavily in certain cases.
A Python set is based on a hash table, and pop has to find an occupied entry in the table to remove and return. If it searched from the start of the table every time, this would take time proportional to the number of empty leading entries, and it would get slower with every pop.
To avoid this, the standard CPython implementation tries to remember the position of the last popped entry, to speed up sequences of pops. CPython 3.5+ has a dedicated finger member in the set memory layout to store this position, but earlier versions abuse the hash field of the first hash table entry to store this index.
On any Python version, removing all elements from a set with a sequence of pop operations takes time proportional to the size of the underlying hash table, which is usually within a small constant factor of the original number of elements (unless you had already removed a bunch of elements). Mixing insertions with pops can interfere heavily with this on Python 2 if an inserted element lands in hash table index 0, trashing the search finger; this is much less of an issue on Python 3.
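A quick illustration of the draining behaviour described above: pop() removes and returns an arbitrary element, and draining the whole set yields every element exactly once.

```python
s = set(range(1000))
popped = []
while s:
    popped.append(s.pop())  # arbitrary element, no guaranteed order

# Every element comes out exactly once.
assert sorted(popped) == list(range(1000))
```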
I'm implementing A* (A star) algorithm in Python.
As you know, we have an "open set", which holds the nodes we plan to explore.
In the algorithm we take the node with the lowest F(n) value (estimated total cost) from the open set.
We would normally use a PriorityQueue, but for some reason I don't understand, my PriorityQueue doesn't return the node with the lowest value.
So instead I made a regular Python list named "frontier" and keep the "open set" there.
There are two ways to use that like PriorityQueue.
currentNode = min(frontier)
to get the minimum value from the open set.
Or we can sort the list every time we add a new node to it, and just use "pop" to get the lowest value.
#adding a node
frontier.append(someNode)
frontier.sort()  # sort in place; sorted(frontier) would build a new list and discard it
#taking out the node
currentNode = frontier.pop(0)
Which one is faster?
Using min() or first sorting and then using pop() ?
What algorithm does "min()" in Python use to get the minimum value from an array?
There are three operations that this data structure, call it s, needs to perform:
Get the element with lowest F(n)
Add an element
Remove an element
The easiest way to implement this is to keep a plain list and use min(s) to get the minimum value, or min(range(len(s)), key=s.__getitem__) to get its index.
This runs in O(n) because it looks at every element once. Adding an element to the end of the list runs in O(1) and removing from an arbitrary index runs in O(n) because all successive elements need to be shifted.
The idea of sorting the list every time is slightly worse, as sorting runs in O(n log n). You can save some time by sorting in descending order, so the minimum ends up at the end of the list where it can be removed in O(1), but in total it will still be slower.
The best solution is to have a heap or priority queue. It maintains the min element at the top where it can be retrieved in O(1) and allows inserting and removing elements in O(log n).
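A sketch of that heap-based frontier with the standard heapq module: keep (f_score, tie_breaker, node) tuples on the heap, where a running counter breaks ties so the nodes themselves never have to be comparable (frontier, push and pop_lowest are illustrative names):

```python
import heapq
import itertools

counter = itertools.count()  # tie-breaker so nodes never compare directly
frontier = []                # heap of (f_score, tie, node) tuples

def push(node, f_score):
    heapq.heappush(frontier, (f_score, next(counter), node))

def pop_lowest():
    # O(log n): the smallest f-score is always at frontier[0]
    f_score, _, node = heapq.heappop(frontier)
    return node

push("a", 7.0)
push("b", 3.5)
push("c", 5.0)
assert pop_lowest() == "b"   # lowest f-score comes out first
assert pop_lowest() == "c"
```

heappush and heappop are both O(log n), and frontier[0] peeks at the lowest f-score in O(1).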
I need a data structure to store positive (not necessarily integer) values. It must support the following two operations in sublinear time:
Add an element.
Remove the largest element.
Also, the largest key may scale as N^2, N being the number of elements. In principle an O(N^2) space requirement wouldn't be a big problem, but if a more storage-efficient option exists, that would be preferable.
I am working in Python, so if such a data structure exists, it would be of help to have an implementation in this language.
No data structure can support both operations in constant time. If one existed, sorting would run in worst-case linear time: add all N elements in O(N) time, then remove the largest remaining element N times, again in O(N) total time.
The best data structure you can choose for these operations is the heap: https://www.tutorialspoint.com/python_data_structure/python_heaps.htm#:~:text=Heap%20is%20a%20special%20tree,is%20called%20a%20max%20heap.
With this data structure, both adding an element and removing the max are O(log(n)).
This is the usual choice when you need many operations on the max element; for example, it is commonly used to implement priority queues.
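Python's heapq only implements a min-heap, so the usual trick for a max-heap is to store negated values. A minimal sketch (the function names are mine):

```python
import heapq

heap = []  # heapq is a min-heap, so we store negated values

def add(x):
    heapq.heappush(heap, -x)        # O(log n)

def remove_largest():
    return -heapq.heappop(heap)     # O(log n)

for v in [2.5, 9.0, 4.0]:           # works for non-integer values too
    add(v)
assert remove_largest() == 9.0
assert remove_largest() == 4.0
```

Note that the space used is O(N) in the number of elements, regardless of how large the key values themselves grow.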
Although constant time may be impossible, depending on your input constraints you might consider a y-fast trie, which has O(log log m) time operations and O(n) space, where m is the range of the keys. Y-fast tries work with integers, taking advantage of their bit structure. One of the supported operations is finding the next higher or lower element, which could let you keep track of the largest remaining element after the current maximum is removed.
min and max have O(N) time complexity because they have to loop over the given list/string and check every element to find the min/max. But I am wondering: what would the time complexity of min/max be if used on a set? For example:
s = {1,2,3,4} # s is a set
using min/max we get:
min(s) = 1
max(s) = 4
Since sets do not use indices like lists and strings, but instead store elements in hash buckets that can be accessed directly, does the time complexity of min/max differ from the general case?
Thank you!
As pointed out in the comments above, Python is a well-documented language, and one should always refer to the docs first.
Answering the question, according to the docs,
A set object is an unordered collection of distinct hashable objects.
Being unordered means that to evaluate maximum or minimum among all the elements using any means (inbuilt or not) would at least require one to look at each element, which means O(n) complexity at best.
On top of that, Python's max and min functions iterate over every element and are O(n) in all cases.
You can always look up the source code yourself.
I'm trying to implement an algorithm to solve the skyline problem, which involves removing specific elements from the middle of a max heap. The way I currently do it is maxheap.remove(index), but I have to follow up with heapify(maxheap), otherwise the heap order is broken. I know that in Java you can use something like a TreeMap for this. Is there any way to do this in Python more efficiently than calling two separate methods, each of which takes O(n) time?
Removing an arbitrary item from a heap is an O(log n) operation, provided you know where the item is in the heap. The algorithm is:
Move the last item in the heap to the position that contains the item to remove.
Decrement heap count.
If the item is smaller than its parent
bubble it up the heap
else
sift it down the heap
The primary problem is finding the item's position in the heap. As you've noted, doing so is an O(n) operation unless you maintain more information.
An efficient solution is to keep a dictionary that maps each item to its index in the heap. You have to maintain the dictionary, however:
When you insert an item into the heap, add a dictionary entry
When you remove an item from the heap, remove the dictionary entry
Whenever you change the position of an item in the heap, update that item's value in the dictionary.
With that dictionary in place, you have O(1) access to an item's position in the heap, and you can remove it in O(log n).
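A sketch of that bookkeeping, assuming items are hashable and distinct (with duplicates you would need counters or unique wrapper objects instead); the class name and API here are my own:

```python
class IndexedMaxHeap:
    """Max-heap plus a dict from item to heap index,
    so an arbitrary item can be removed in O(log n)."""

    def __init__(self):
        self._heap = []
        self._pos = {}   # item -> index in self._heap

    def _swap(self, i, j):
        # Every position change goes through here, keeping _pos consistent.
        h, p = self._heap, self._pos
        h[i], h[j] = h[j], h[i]
        p[h[i]], p[h[j]] = i, j

    def _up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[i] <= self._heap[parent]:
                break
            self._swap(i, parent)
            i = parent

    def _down(self, i):
        n = len(self._heap)
        while True:
            big = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self._heap[c] > self._heap[big]:
                    big = c
            if big == i:
                break
            self._swap(i, big)
            i = big

    def push(self, item):
        self._heap.append(item)
        self._pos[item] = len(self._heap) - 1
        self._up(len(self._heap) - 1)

    def remove(self, item):
        i = self._pos.pop(item)          # O(1) position lookup
        last = self._heap.pop()          # move the last item into the hole
        if i < len(self._heap):
            self._heap[i] = last
            self._pos[last] = i
            if i > 0 and last > self._heap[(i - 1) // 2]:
                self._up(i)              # bubble up past a smaller parent
            else:
                self._down(i)            # otherwise sift down

    def peek_max(self):
        return self._heap[0]
```

remove is one dictionary lookup plus one sift, so it is O(log n) overall.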
I would have each element be a data structure with a flag for whether to ignore it. When you heappop, you just pop again if the element you got was flagged. This is easy and obvious, and requires knowing nothing about how the heap works internally; for example, you don't need to know where the element actually is in the heap in order to flag it.
The downside of this approach is that the flagged elements will tend to accumulate over time. Occasionally you can just filter them out then heapify.
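A sketch of that flagging approach with heapq (a min-heap here, but the same idea applies to a max-heap; the names entries, mark_removed and so on are illustrative, and values are assumed distinct):

```python
import heapq

heap = []     # entries are [value, alive] lists, mutable so we can flag them
entries = {}  # value -> its heap entry (assumes distinct values)

def push(value):
    entry = [value, True]
    entries[value] = entry
    heapq.heappush(heap, entry)

def mark_removed(value):
    # O(1): no need to know where the entry sits inside the heap.
    entries.pop(value)[1] = False

def pop_min():
    # Skip any entries that were flagged as removed.
    while heap:
        value, alive = heapq.heappop(heap)
        if alive:
            del entries[value]
            return value
    raise KeyError("pop from an empty heap")
```

This is essentially the "mark invalid and skip" recipe; dead entries cost a little extra memory until they surface and get discarded.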
If this solution is not sufficient for your needs, you should look for a btree implementation in Python of some sort. That will behave like the treemap that you are used to in Java.
Yes, there is a more efficient way, provided you know the item's index (or a pointer to it, depending on the implementation).
Replace the item at that index with its largest child, and repeat the process recursively (replace that child with its own largest child, and so on) until you reach a node that has no children, which you can then remove easily.
The complexity of this algorithm is O(log n).
http://algorithms.tutorialhorizon.com/binary-min-max-heap/