I have a set in Python and I want to sample one element from it, like with the random.sample() method. The problem is that sample() converts the set to a tuple internally, which is O(n), and I need to do this as efficiently as possible.
Is there a function I can use to sample an element from a set in O(1) time, or is the only way to do this to create my own implementation of a set?
Because the data layout of a hash-based set is irregular, it is impossible to sample uniformly from it in O(1). The exception is when you will make ω(n) queries against the same set, in which case you can preprocess it into some sort of array and amortize the cost. (Such an array could be maintained while building the set, of course, but that's not the starting point given, and maintaining it isn't cheaper than the tuple conversion.)
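If many samples are drawn from the same unchanging set, a minimal sketch of the amortized approach described above (the one-time O(n) conversion is paid once, and each subsequent draw is O(1)):

import random

s = {"a", "b", "c", "d"}

# One-time O(n) conversion; reuse the tuple for as long as the set is unchanged.
pool = tuple(s)

# Each individual sample is now O(1).
samples = [random.choice(pool) for _ in range(10)]
print(samples)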
Related
I need a data structure to store positive (not necessarily integer) values. It must support the following two operations in sublinear time:
Add an element.
Remove the largest element.
Also, the largest key may scale as N^2, N being the number of elements. In principle, an O(N^2) space requirement wouldn't be a big problem, but if a more space-efficient option exists, it would work better.
I am working in Python, so if such a data structure exists, it would be of help to have an implementation in this language.
There is no such data structure with constant-time operations. If there were, sorting would be worst-case linear time: add all N elements in O(N) time, then remove the largest remaining element N times, again in O(N) total time, contradicting the Ω(N log N) lower bound for comparison sorting.
The best data structure you can choose for these operations is the heap: https://www.tutorialspoint.com/python_data_structure/python_heaps.htm#:~:text=Heap%20is%20a%20special%20tree,is%20called%20a%20max%20heap.
With this data structure, both adding an element and removing the max are O(log(n)).
This is the most commonly used data structure when you need many operations on the max element; for example, it is commonly used to implement priority queues.
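A minimal sketch using the standard-library heapq module, which implements a min-heap; negating the values to get the max-heap behaviour described above is an assumption of this sketch, not something the answer spells out:

import heapq

heap = []

# Add elements: O(log n) each. Values are negated so the smallest stored
# item corresponds to the largest original value.
for value in (3.5, 12.0, 7.25, 1.0):
    heapq.heappush(heap, -value)

# Remove the largest element: O(log n).
largest = -heapq.heappop(heap)
print(largest)  # 12.0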
Although constant time may be impossible, depending on your input constraints you might consider a y-fast trie, which has O(log log m) time operations and O(n) space, where m is the range of the keys. They work only with integers, taking advantage of their bit structure. One of the supported operations is finding the next higher or lower element, which lets you keep track of the new maximum after the current one is removed.
I have two big string lists in Python. I want to subtract these two lists quickly, in O(n). I found some approaches, such as removing the second list's elements from the first in a loop, or converting the lists to set() (problem: this changes the order of the list) and using the minus (-) operator, but these methods are not efficient. Is there any way to do this operation?
a=['1','2','3',...,'500000']
b=['1','2','3',...,'200000']
c=a-b
c=['200001','200002',...,'500000']
Your problem, as formulated, is:
Go through A
For each element, search it in B and take it if it's not found
No assumptions about the elements are made
For arbitrary data, list search is O(N), set search is O(1), converting to set is O(N). Going through A is O(N).
So it's O(N^2) with only lists and O(N) if converting B to a set.
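A minimal sketch of that O(N) version, converting B to a set once and keeping A's original order (the list contents here just mirror the example above):

a = [str(i) for i in range(1, 500001)]
b = [str(i) for i in range(1, 200001)]

b_set = set(b)                          # O(len(b)) one-time conversion
c = [x for x in a if x not in b_set]    # O(len(a)), each membership test is O(1)
print(c[:3], c[-1])                     # ['200001', '200002', '200003'] 500000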
The only way you can speed it up further is to make either the iteration or the search more efficient, which is impossible without using some additional knowledge about your data. E.g.:
In your example, your data are sequential numbers, so you can take A[len(B):].
If you are going to use the same B multiple times, you can cache the set
You can make B a set right off the bat (if order needs to be preserved, you can use an ordered set)
If all the data are of the same type and the strings are short, you can use NumPy arrays and their fast setdiff1d (see the NumPy sketch after this list)
etc
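A sketch of that NumPy option; note that np.setdiff1d returns the sorted unique values of a that are not in b, so the result comes back in lexicographic order rather than a's original order:

import numpy as np

a = np.array([str(i) for i in range(1, 500001)])
b = np.array([str(i) for i in range(1, 200001)])

# Sorted unique elements of a that are not in b.
c = np.setdiff1d(a, b)
print(len(c))  # 300000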
min and max have O(N) time complexity because they have to loop over the given list/string and check every index to find the min/max. But I am wondering what the time complexity of min/max would be if used on a set. For example:
s = {1,2,3,4} # s is a set
using min/max we get:
min(s) = 1
max(s) = 4
Since sets do not use indices like lists and strings, but instead operate using buckets that can be accessed directly, does the time complexity of min/max differ from the general case?
Thank you!
As pointed out in the comments above, Python is a well-documented language and one should always refer to the docs first.
Answering the question, according to the docs,
A set object is an unordered collection of distinct hashable objects.
Being unordered means that evaluating the maximum or minimum among all the elements by any means (built-in or not) requires looking at every element at least once, so you cannot do better than O(n).
On top of that, Python's max and min functions iterate over every element and are O(n) in all cases.
You can always look up the source code yourself.
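A rough sketch of what min does with any iterable, a set included; this is not the actual CPython implementation, just an equivalent loop that shows why every element has to be visited:

def my_min(iterable):
    it = iter(iterable)
    smallest = next(it)        # StopIteration here if the iterable is empty
    for item in it:            # every remaining element is compared once
        if item < smallest:
            smallest = item
    return smallest

s = {1, 2, 3, 4}
print(my_min(s), max(s))  # 1 4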
In Python, what are the running time and space complexities if a list is converted to a set?
Example:
data = [1,2,3,4,5,5,5,5,6]
# this turns the list into a set and overwrites the list variable
data = set(data)
print(data)
# output will be {1, 2, 3, 4, 5, 6}
Converting a list to a set requires that every item in the list be visited once, O(n). Inserting an element into a set is O(1), so the overall time complexity would be O(n).
Space required for the new set is less than or equal to the length of the list, so that is also O(n).
Here's a good reference for Python data structures.
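To see the linear scaling in practice, a quick timeit sketch (the absolute numbers are machine-dependent; what matters is that the time grows roughly in proportion to n):

import timeit

for n in (10_000, 100_000, 1_000_000):
    t = timeit.timeit("set(data)", setup=f"data = list(range({n}))", number=10)
    print(n, round(t, 4))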
You have to iterate through the entire list, which is O(n) time, and then insert each into a set, which is O(1) time. So the overall time complexity is O(n), where n is the length of the list.
No space other than that of the set being created and the list being used is needed.
As others have stated regarding the runtime, the set creation time is O(N) for the entire list, and set existence check is O(1) for each item.
But their comments on memory usage being the same between lists and sets are incorrect.
In Python, sets can use 3x to 10x more memory than lists. Set memory usage still grows as O(N), but it's always at least about 3x more than for lists, perhaps because the underlying hash table stores each element's hash and keeps spare slots to limit collisions.
related: https://stackoverflow.com/a/54891295/1163355
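A quick way to check this on your own interpreter; sys.getsizeof measures only the container itself (not the stored objects) for both the list and the set, so the comparison is like for like, and the exact ratio depends on the Python version:

import sys

data = list(range(100_000))

print(sys.getsizeof(data))       # size of the list's pointer array
print(sys.getsizeof(set(data)))  # size of the set's hash table, typically several times larger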
I am designing software in Python and I got a little curious about whether there is any difference in time between popping items from a very small dictionary and popping items from a very large dictionary, or whether it is the same in all cases.
You can easily answer this question for yourself using the timeit module. But the entire point of a dictionary is near-instant access to any element by key, so I would not expect a large difference between the two scenarios.
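A minimal timeit sketch along those lines; the popped key is re-inserted inside the timed statement so it can be repeated, and the absolute timings are machine-dependent:

import timeit

# Pop a key and immediately put it back so the statement can run repeatedly.
stmt = "d.pop(5); d[5] = 5"

small = timeit.timeit(stmt, setup="d = {i: i for i in range(10)}", number=1_000_000)
large = timeit.timeit(stmt, setup="d = {i: i for i in range(1_000_000)}", number=1_000_000)

print(f"small dict: {small:.3f}s  large dict: {large:.3f}s")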
Check out this article on Python TimeComplexity:
The Average Case times listed for dict objects assume that the hash function for the objects is sufficiently robust to make collisions uncommon. The Average Case assumes the keys used in parameters are selected uniformly at random from the set of all keys.
Note that there is a fast-path for dicts that (in practice) only deal with str keys; this doesn't affect the algorithmic complexity, but it can significantly affect the constant factors: how quickly a typical program finishes.
According to this article, for a 'Get Item' operation the average case is O(1), with a worst case of O(n). In other words, in the worst case the time increases linearly with the size of the dictionary. See Big O Notation on Wikipedia for more information.
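A contrived sketch of where that O(n) worst case comes from: if every key hashes to the same value, lookups (and pops) degrade to a linear scan through the colliding entries.

class BadHash:
    """Every instance hashes to the same value, forcing collisions."""
    def __init__(self, x):
        self.x = x
    def __hash__(self):
        return 1
    def __eq__(self, other):
        return isinstance(other, BadHash) and self.x == other.x

d = {BadHash(i): i for i in range(1000)}
print(d[BadHash(999)])  # each lookup probes through many colliding slots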