I have been thinking about the following question about computer architecture. Suppose that, in Python, I do
from bisect import bisect
index = bisect(x, a) # O(log n) (also, shouldn't it be a standard list function?)
x.insert(index, a) # O(1) + memcpy()
which takes O(log n) plus, if I understand it correctly, a memory copy operation for x[index:]. Now, I recently read that the bottleneck is usually in the communication between the processor and the memory, so the memory copy could be done by the RAM quite quickly. Is that how it works?
Python is a language. Multiple implementations exist, and they may have different implementations for lists. So, without looking at the code of an actual implementation, you cannot know for sure how lists are implemented and how they behave under certain circumstances.
My bet would be that the references to the objects in a list are stored in contiguous memory (certainly not as a linked list...). If that is indeed so, then insertion using x.insert will cause all elements after the inserted element to be moved. This may be done efficiently by the hardware, but the complexity would still be O(n).
For small lists the bisect operation may take more time than x.insert, even though the former is O(log n) while the latter is O(n). For long lists, however, I'd hazard a guess that x.insert is the bottleneck. In such cases you must consider using a different data structure.
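As a rough timing sketch of the two costs (the absolute numbers depend on your machine and Python version, but the shape of the result should hold):

import timeit

setup = "from bisect import bisect; x = list(range(100_000)); a = 50_000"

# Finding the insertion point only: O(log n)
print(timeit.timeit("bisect(x, a)", setup=setup, number=1_000))

# Finding the insertion point and shifting everything after it: O(log n) + O(n)
print(timeit.timeit("x.insert(bisect(x, a), a)", setup=setup, number=1_000))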
Use the blist module if you need a list with better insert performance.
CPython lists are contiguous arrays. Whether the O(log n) bisect or the O(n) insert dominates your performance profile depends on the size of your list and also on the constant factors hidden inside the O(). In particular, the comparison function invoked by bisect can be expensive, depending on the type of the objects in the list.
If you need to hold potentially large mutable sorted sequences, then the linear array underlying Python's list type isn't a good choice. Depending on your requirements, heaps, trees, or skip-lists might be appropriate.
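For instance, if the requirement is just inserting items and later retrieving them in sorted order, the standard-library heapq module gives O(log n) insertion; a minimal sketch (not a drop-in list replacement):

import heapq

heap = []
for value in (5, 1, 4, 2, 3):
    heapq.heappush(heap, value)   # O(log n) per insertion

# Popping repeatedly yields the values in sorted order, O(log n) per pop.
print([heapq.heappop(heap) for _ in range(len(heap))])   # [1, 2, 3, 4, 5]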
People coming to Python from other programming languages often ask how they should pre-allocate or initialize their lists. This is especially true for people coming from MATLAB, where code such as
l = []
for i = 1:100
    l(end+1) = 1;
end
produces a warning that explicitly suggests you initialize the list.
There are several posts on SO explaining (and showing through tests) that list initialization isn't required in Python. A good example with a fair bit of discussion is this one (but the list could be very long): Create a list with initial capacity in Python
The other day, however, while looking up the complexity of operations in Python, I stumbled upon this sentence on the official Python wiki:
the largest [cost for list operations] come from growing beyond the current allocation size (because everything must move),
This seems to suggest that lists do indeed have a pre-allocation size and that growing beyond that size causes the whole list to move.
This shook my foundations a bit. Can list pre-allocation reduce the overall complexity (in terms of number of operations) of a piece of code? If not, what does that sentence mean?
EDIT:
Clearly my question regards the (very common) code:
container = ...  # some iterable with 1 gazillion elements
new_list = []
for x in container:
    ...  # do whatever you want with x
    new_list.append(x)  # or append something computed using x
In this case the interpreter cannot know how many items there are in container, so new_list could potentially require its allocated memory to change an incredible number of times, if what that sentence says is true.
I know that this is different for list-comprehensions
Can list pre-allocation reduce the overall complexity (in terms of number of operations) of a code?
No, the overall time complexity of the code will be the same, because the time cost of reallocating the list is O(1) when amortised over all of the operations which increase the size of the list.
If not, what does that sentence means?
In principle, pre-allocating the list could reduce the running time by some constant factor, by avoiding multiple re-allocations. This doesn't mean the complexity is lower, but it may mean the code is faster in practice. If in doubt, benchmark or profile the relevant part of your code to compare the two options; in most circumstances it won't matter, and when it does, there are likely to be better alternatives anyway (e.g. NumPy arrays) for achieving the same goal.
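For example, a rough timeit comparison of appending versus filling a pre-allocated list (a sketch only; the absolute numbers depend on your machine and Python version):

import timeit

n = 100_000

append_time = timeit.timeit(
    "lst = []\nfor i in range(n):\n    lst.append(i)",
    globals={"n": n}, number=50)

prealloc_time = timeit.timeit(
    "lst = [None] * n\nfor i in range(n):\n    lst[i] = i",
    globals={"n": n}, number=50)

# Expect the same O(n) complexity for both, with a modest constant-factor difference.
print(f"append: {append_time:.3f}s  pre-allocated: {prealloc_time:.3f}s")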
new_list could potentially require its allocated memory to change an incredible number of times
List reallocation follows a geometric progression, so if the final length of the list is n then the list is reallocated only O(log n) times along the way; not an "incredible number of times". The way the maths works out, the average number of times each element gets copied to a new underlying array is a constant regardless of how large the list gets, hence the O(1) amortised cost of appending to the list.
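You can watch the geometric growth directly with sys.getsizeof (a small experiment; the exact lengths at which CPython reallocates vary between versions):

import sys

lst = []
last_size = sys.getsizeof(lst)
for i in range(64):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:
        # A change in size means the underlying array was reallocated.
        print(f"reallocated at len={len(lst)}, now {size} bytes")
        last_size = size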
I have been using .pop() and .append() extensively for Leetcode-style programming problems, especially in cases where you have to accumulate palindromes, subsets, permutations, etc.
Would I get a substantial performance gain from migrating to a fixed-size list instead? My concern is that, internally, the Python list reallocates to a smaller internal array when I execute a bunch of pops, and then has to "allocate up" again when I append.
I know that the amortized time complexity of append and pop is O(1), but I want to get better performance if I can.
Yes.
Python (at least the CPython implementation) uses magic under the hood to make lists as efficient as possible. According to this blog post (2011), calls to append and pop will dynamically allocate and deallocate memory in chunks (overallocating where necessary) for efficiency. The list will only deallocate memory if it shrinks below the chunk size. So, in most cases, if you are doing a lot of appends and pops, no memory allocation/deallocation will be performed.
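A quick way to see that a few pops don't force a reallocation (a rough sketch; exact byte counts depend on the CPython version and platform):

import sys

lst = list(range(1000))
print(len(lst), sys.getsizeof(lst))   # size of the underlying array

for _ in range(10):
    lst.pop()

# The list hasn't shrunk below CPython's shrink threshold, so the
# underlying array is unchanged and no reallocation has happened.
print(len(lst), sys.getsizeof(lst))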
Basically, the idea with these high-level languages is that you should be able to use the data structure best suited to your use case, and the interpreter will ensure that you don't have to worry about the background workings (e.g. avoid micro-optimisation and instead focus on the efficiency of the algorithms in general). If you're that worried about performance, I'd suggest using a language where you have more control over the memory, such as C/C++ or Rust.
Python guarantees amortised O(1) complexity for appends and pops, as you noted, so it sounds like it will be perfectly suited for your case. If you wanted to use the list like a queue, with things like list.pop(1) or list.insert(0, obj), which are slower, then you could look into a dedicated queue data structure instead (for example collections.deque, sketched below).
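If the queue-like usage does apply to you, collections.deque is the usual choice; a minimal sketch:

from collections import deque

queue = deque()
queue.append("first")      # enqueue at the right end: O(1)
queue.append("second")
print(queue.popleft())     # dequeue from the left end: O(1), unlike list.pop(0)
print(queue.popleft())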
Is there any Python complexity reference? On cppreference, for example, many functions (such as std::array::size or std::array::fill) have a complexity section which describes their running complexity, in terms such as linear in the size of the container, or constant.
I would expect the same information to appear on the Python website, perhaps at least for the CPython implementation. For example, in the list reference, under list.insert I would expect to see "complexity: linear"; I know this case (and many other container-related operations) is covered here, but many other cases are not. Here are a few examples:
What is the complexity of tuple.__le__? It seems that when comparing two tuples of sizes n and k, the complexity is about O(min(n,k)) (however, for small n it looks different).
What is the complexity of random.shuffle? It appears to be O(n). It also appears that the complexity of random.randint is O(1).
What is the complexity of the __format__ method of strings? It appears to be linear in the size of the input string; however, it also grows when the number of relevant arguments grow (compare ("{0}"*100000).format(*(("abc",)*100000)) with ("{}"*100000).format(*(("abc",)*100000))).
I'm aware that (a) each of these questions may be answered by itself, (b) one may look at the code of these modules (even though some are written in C), and (c) StackExchange is not a Python mailing list for user requests. So: this is not a doc-feature request, just a question in two parts:
Do you know if such a resource exists?
If not, do you know where the right place to ask for one is, or can you suggest why I don't need one?
CPython is pretty good about its algorithms, and the time complexity of an operation is usually just the best you would expect of a good standard library.
For example:
Tuple ordering has to be O(min(n,m)), because it works by comparing element-wise.
random.shuffle is O(n), because that's the complexity of the modern Fisher–Yates shuffle (sketched below).
.format I imagine is linear, since it only requires one scan through the template string. As for the difference you see, CPython might just be clever enough to cache the same format code used twice.
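For reference, a minimal sketch of the Fisher–Yates algorithm mentioned above (an illustration of why a single pass suffices, not CPython's actual random.shuffle implementation):

import random

def fisher_yates_shuffle(seq):
    # One swap per position, working from the end toward the front: O(n) overall.
    for i in range(len(seq) - 1, 0, -1):
        j = random.randint(0, i)          # pick from the not-yet-fixed prefix
        seq[i], seq[j] = seq[j], seq[i]

items = list(range(10))
fisher_yates_shuffle(items)
print(items)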
The docs do mention time complexity, but generally only when it's not what you would expect; for example, because a deque is implemented with a doubly-linked list, indexing into the middle is explicitly documented as O(n).
Would the docs benefit from having time complexity called out everywhere it's appropriate? I'm not sure. The docs generally present builtins by what they should be used for and have implementations optimized for those use cases. Emphasizing time complexity seems like it would either be useless noise or encourage developers to second-guess the Python implementation itself.
Goal: sorting a sequence in a functional way without using the builtin sorted() function.
def my_sorted(seq):
    """returns an iterator"""
    pass
Motivation: in the FP style, I am constrained to:
never mutate seq (which could be an iterator or a realized list)
By implication, no in-place sorting.
Question 1: Since I cannot mutate seq, I would need to maintain a separate mutable data structure to store the sorted sequence. That seems wasteful compared to an in-place list.sort(). How do other functional programming languages handle this?
Question 2: If I return a mutable sequence, is that OK in the functional paradigm?
Of course sorting cannot be totally lazy (the last element of the input could be the first in the output), but you could implement a computationally lazy sort that, after reading the whole sequence, only generates the exact sorted output on request, element by element. You can also delay reading the input until at least one output element is requested, so sorting and then ignoring the result requires no computation.
For this computationally lazy approach, the best candidate I know of is the heapsort algorithm (you only do the heap-building step upfront).
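A minimal sketch of that idea using the standard-library heapq (the name lazy_sorted is made up for illustration):

import heapq

def lazy_sorted(seq):
    """Yield the elements of seq in sorted order without mutating seq.

    Because this is a generator, nothing below runs until the first element
    is requested; the O(n) heap-building step is deferred, and each further
    element then costs O(log n) on demand.
    """
    heap = list(seq)        # private copy, so the input is never mutated
    heapq.heapify(heap)     # heapsort's heap-building step, O(n)
    while heap:
        yield heapq.heappop(heap)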
Mutation in-place is only safe if no one else has references to the data, expecting it to be as it was prior to the sort. So it isn't really wasteful to have a new structure for the sorted results, in general. The in-place optimization is only safe if you're using the data in a linear fashion.
So, just allocate a new structure, since that is more generally useful. The in-place version is a special case.
The appropriate defensive programming is wasteful at times, but there's also nothing you can do about it.
This is why languages built to support functional use from the ground up use structural sharing for their natively immutable types; programming in a functional style in a language which isn't built for it (such as Python) isn't going to be as well-supported as a matter of course. That said, a sort operation isn't necessarily a good candidate for structural sharing (if more than minor changes need to be made).
As such, there often is at least one copy operation involved in a sort, even in other functional languages. Clojure, for instance, delegates to Java's native (highly optimized) sort operation on a temporary mutable array, and returns a seq wrapping that array (thus making the result just as immutable as the input which was used to populate it). If the inputs are immutable, and the outputs are immutable, and what happens in between isn't visible to the outside world (particularly, to any other thread), transient mutability is often a necessary and appropriate thing.
Use a sorting algorithm that can be performed in a manner that creates a new data structure, such as heapsort or mergesort.
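For example, a purely functional-style merge sort never touches its input and only builds new lists (a sketch; the internal appends to merged are local, transient mutation only):

def merge_sorted(seq):
    """Return a new sorted list without mutating the input sequence."""
    items = list(seq)                 # copy; the caller's data is untouched
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sorted(items[:mid])
    right = merge_sorted(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]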
Wasteful of what? Bits? Electricity? Wall-clock time? A parallel merge sort may be the quickest to complete if you have enough CPUs and a large amount of data, but it may produce many intermediate representations.
In general, parallelising an algorithm may lead to a very different optimisation strategy than a serial algorithm: for instance, due to Amdahl's Law, re-performing redundant work locally in order to avoid sharing. This may be considered "wasteful" in a serial context, but it leads to a much more scalable algorithm.
What is the time complexity of each of Python's set operations in Big O notation?
I am using Python's set type for an operation on a large number of items. I want to know how each operation's performance will be affected by the size of the set. For example, add, and the test for membership:
myset = set()
myset.add('foo')
'foo' in myset
Googling around hasn't turned up any resources, but it seems reasonable that the time complexity for Python's set implementation would have been carefully considered.
If it exists, a link to something like this would be great. If nothing like this is out there, then perhaps we can work it out?
Extra marks for finding the time complexity of all set operations.
According to the Python wiki's Time Complexity page, set is implemented as a hash table, so you can expect lookups/inserts/deletes to be O(1) on average. If your hash table's load factor is too high, however, you face collisions and O(n).
P.S. For some reason they claim O(n) for the delete operation, which looks like a typo.
P.P.S. This is true for CPython, pypy is a different story.
The other answers do not talk about two crucial operations on sets: unions and intersections. In the worst case, a union will take O(n+m), whereas an intersection will take O(min(n,m)), provided that there are not many elements in the sets with the same hash. A list of time complexities of common operations can be found here: https://wiki.python.org/moin/TimeComplexity
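For illustration, a tiny example with the average-case costs from that table in the comments (a sketch, assuming reasonably distributed hashes):

a = {1, 2, 3, 4}
b = {3, 4, 5}

print(a | b)   # union: O(len(a) + len(b))
print(a & b)   # intersection: roughly O(min(len(a), len(b))) on average
print(a - b)   # difference: O(len(a)) on average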
The in operation should be independent of the size of the container, i.e. O(1), given an optimal hash function. This should be nearly true for Python strings. Hashing strings is always critical; Python should be clever there, and thus you can expect near-optimal results.