I am writing a blog post on the Python list.clear() method, where I also want to mention the time and space complexity of the underlying algorithm. I expected the time complexity to be O(N): iterate over the elements and free the memory. But I found an article which says that it is actually an O(1) operation. Then I searched the source code of the method in the CPython implementation and found a method which I believe is the actual internal implementation of list.clear(); however, I am not really sure it is. Here's the source code of the method:
static int
_list_clear(PyListObject *a)
{
    Py_ssize_t i;
    PyObject **item = a->ob_item;
    if (item != NULL) {
        /* Because XDECREF can recursively invoke operations on
           this list, we make it empty first. */
        i = Py_SIZE(a);
        Py_SIZE(a) = 0;
        a->ob_item = NULL;
        a->allocated = 0;
        while (--i >= 0) {
            Py_XDECREF(item[i]);
        }
        PyMem_FREE(item);
    }
    /* Never fails; the return value can be ignored.
       Note that there is no guarantee that the list is actually empty
       at this point, because XDECREF may have populated it again! */
    return 0;
}
I could be wrong but it does look like O(N) to me. Also, I found a similar question here, but there's no clear answer there. Just want to confirm the actual time and space complexity of list.clear(), and maybe a little explanation supporting the answer. Any help appreciated. Thanks.
As you correctly noticed, the CPython implementation of list.clear is O(n). The code iterates over the elements in order to decrease the reference count of each one, without a way to avoid it. There is no doubt that it is an O(n) operation and, given a large enough list, you can measure the time spent in clear() as a function of list size:
import time

for size in 1_000_000, 10_000_000, 100_000_000, 1_000_000_000:
    l = [None] * size
    t0 = time.time()
    l.clear()
    t1 = time.time()
    print(size, t1 - t0)
The output shows linear complexity; on my system with Python 3.7 it prints the following:
1000000 0.0023756027221679688
10000000 0.02452826499938965
100000000 0.23625731468200684
1000000000 2.31496524810791
The time per element is of course tiny because the loop is coded in C and each iteration does very little work. But, as the above measurement shows, even a minuscule per-element factor eventually adds up. A small per-element constant is not a reason to ignore the cost of an operation, or the same would apply to the loop that shifts the list elements in l.insert(0, ...), which is also very efficient - and yet few would claim insertion at the beginning to be O(1). (And clear potentially does more work because a decref will run an arbitrary chain of destructors for an object whose reference count actually reaches zero.)
On a philosophical level, one could argue that costs of memory management should be ignored when assessing complexity because otherwise it would be impossible to analyze anything with certainty, as any operation could trigger a GC. This argument has merit; GC does come occasionally and unpredictably, and its cost can be considered amortized across all allocations. In a similar vein complexity analysis tends to ignore the complexity of malloc because the parameters it depends on (like memory fragmentation) are typically not directly related to allocation size or even to the number of already allocated blocks. However, in case of list.clear there is only one allocated block, no GC is triggered, and the code is still visiting each and every list element. Even with the assumption of O(1) malloc and amortized O(1) GC, list.clear still takes the time proportional to the number of elements in the list.
The article linked from the question is about Python the language and doesn't mention a particular implementation. Python implementations that don't use reference counting, such as Jython or PyPy, are likely to have true O(1) list.clear, and for them the claim from the article would be entirely correct. So, when explaining the Python list on a conceptual level, it is not wrong to say that clearing the list is O(1) - after all, all the object references are in a contiguous array, and you free it only once. This is the point your blog post probably should make, and that is what the linked article is trying to say. Taking the cost of reference counting into account too early might confuse your readers and give them completely wrong ideas about Python's lists (e.g. they could imagine that they are implemented as linked lists).
Finally, at some point one must accept that memory management strategy does change complexity of some operations. For example, destroying a linked list in C++ is O(n) from the perspective of the caller; discarding it in Java or Go would be O(1). And not in the trivial sense of a garbage-collected language just postponing the same work for later - it is quite possible that a moving collector will only traverse reachable objects and will indeed never visit the elements of the discarded linked list. Reference counting makes discarding large containers algorithmically similar to manual collection, and GC can remove that. While CPython's list.clear has to touch every element to avoid a memory leak, it is quite possible that PyPy's garbage collector never needs to do anything of the sort, and thus has a true O(1) list.clear.
It's O(1) neglecting memory management. It's not quite right to say it's O(N) accounting for memory management, because accounting for memory management is complicated.
Most of the time, for most purposes, we treat the costs of memory management separately from the costs of the operations that triggered it. Otherwise, just about everything you could possibly do becomes O(who even knows), because almost any operation could trigger a garbage collection pass or an expensive destructor or something. Heck, even in languages like C with "manual" memory management, there's no guarantee that any particular malloc or free call will be fast.
There's an argument to be made that refcounting operations should be treated differently. After all, list.clear explicitly performs a number of Py_XDECREF operations equal to the list's length, and even if no objects are deallocated or finalized as a result, the refcounting itself will necessarily take time proportional to the length of the list.
If you count the Py_XDECREF operations list.clear performs explicitly, but ignore any destructors or other code that might be triggered by the refcounting operations, and you assume PyMem_FREE is constant time, then list.clear is O(N), where N is the original length of the list. If you discount all memory management overhead, including the explicit Py_XDECREF operations, list.clear is O(1). If you count all memory management costs, then the runtime of list.clear cannot be asymptotically bounded by any function of the list's length.
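To illustrate that last point, here is a rough sketch (the Noisy class and the numbers are made up for this example, and timings are machine-dependent): two lists of the same length can take wildly different amounts of time to clear once destructors get involved.

import time

class Noisy:
    def __del__(self):
        sum(range(1_000))    # simulate an expensive finalizer

def timed_clear(lst):
    t0 = time.perf_counter()
    lst.clear()              # drops the last reference to every element
    return time.perf_counter() - t0

print("10_000 ints:         ", timed_clear([0] * 10_000))
print("10_000 Noisy objects:", timed_clear([Noisy() for _ in range(10_000)]))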
As the other answers have noted, it takes O(n) time to clear a list of length n. But I think there is an additional point to be made about amortized complexity here.
If you start with an empty list, and do N append or clear operations in any order, then the total running time across all of those operations is always O(N), giving an average per operation of O(1), however long the list gets in the process, and however many of those operations are clear.
Like clear, the worst case for append is also O(n) time where n is the length of the list. That's because when the capacity of the underlying array needs to be increased, we have to allocate a new array and copy everything across. But the cost of copying each element can be "charged" to one of the append operations which got the list to a length where the array needs to be resized, in such a way that N append operations starting from an empty list always take O(N) time.
Likewise, the cost of decrementing an element's refcount in the clear method can be "charged" to the append operation which inserted that element in the first place, because each element can only get cleared once. The conclusion is that if you are using a list as an internal data structure in your algorithm, and your algorithm repeatedly clears that list inside a loop, then for the purpose of analysing your algorithm's time complexity you should count clear on that list as an O(1) operation, just as you'd count append as an O(1) operation in the same circumstances.
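A rough way to see this amortized behaviour (a timing sketch rather than a proof; the numbers will vary by machine): the total time for N operations stays roughly proportional to N whether or not the list keeps getting cleared along the way.

import time

def run_ops(n_ops, clear_every):
    lst = []
    start = time.perf_counter()
    for i in range(n_ops):
        lst.append(i)
        if clear_every and i % clear_every == 0:
            lst.clear()
    return time.perf_counter() - start

N = 2_000_000
print("appends only:            ", run_ops(N, clear_every=0))
print("clear every 1000 appends:", run_ops(N, clear_every=1000))
print("clear every 10 appends:  ", run_ops(N, clear_every=10))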
A quick time check indicates that it is O(n).
Let's create the lists beforehand, so that their construction is not part of the measurement:
import time
import random
list_1000000 = [random.randint(0,10) for i in range(1000000)]
list_10000000 = [random.randint(0,10) for i in range(10000000)]
list_100000000 = [random.randint(0,10) for i in range(100000000)]
Now check the time it takes to clear these lists of different sizes as follows (with my_list standing in for each list in turn):
start = time.time()
list.clear(my_list)
end = time.time()
print(end - start)
Results:
list.clear(list_1000000) takes 0.015
list.clear(list_10000000) takes 0.074
list.clear(list_100000000) takes 0.64
A more robust time measurement is needed, since these numbers can deviate each time it is run, but the results indicate that the execution time grows roughly linearly with the input size. Hence we can conclude an O(n) complexity.
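For instance, a slightly more careful sketch would rebuild the list before each measurement, use time.perf_counter, and average several runs per size:

import time

def time_clear(size, repeats=3):
    total = 0.0
    for _ in range(repeats):
        lst = [None] * size                  # fresh list for every run
        t0 = time.perf_counter()
        lst.clear()
        total += time.perf_counter() - t0
    return total / repeats

for size in (1_000_000, 10_000_000, 100_000_000):
    print(size, time_clear(size))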
I have the following function, that checks for the first duplicate value in an array.
Example: an input of [1, 2, 3, 3, 4, 4] would return 3.
I believe the space complexity is O(1) since the total space being used remains constant. Is this correct?
#o(n) time.
#o(1) space or o(n) space?
def firstDuplicateValue(array):
    found = set()
    while len(array) > 0:
        x = array.pop(0)  # we remove elements from the array as we add them to the set
        if x in found:
            return x
        else:
            found.add(x)
    return -1
Space complexity
I have to answer "no, the space complexity is not O(1)" for two reasons. The first is theoretical and pedantic, the second is more practical.
We typically consider space complexity to be the extra space which your program needs access to in addition to the input it is given.
Some programs read the input but are not allowed to modify it. For these programs, your question isn't raised and the space complexity is exactly the space taken by all the extra variables and data structures created in the program.
Some programs are explicitly allowed to modify the input. Still, for those programs, the space complexity only counts the extra space, not the space already in the input. For these programs it is quite possible to have a space complexity of O(log(n)). A space complexity of O(1) is unheard of in theory, because to iterate over an input consisting of an array of n elements, you need a counter variable that can count up to n, which requires log(n) bits. So, when people say O(1) in practice, they probably mean O(log(n)) in theory. This is my first reason to answer no: if we're being pedantic, the space complexity cannot be better than O(log(n)) in theory.
Your program modifies the input array by deleting elements from it, and stores these elements in an extra data structure found. In theory, this new data structure could fit exactly in the space liberated from the input array, and if that were the case, you would be right that the complexity of your algorithm is better than O(n). In practice, it looks shady because:
there is no disclaimer in your function that warns the user that it might destroy the input;
there is no guarantee by the python interpreter that array.pop() will really free space on the computer.
In practice, a Python interpreter typically does not free space when you call array.pop() until you've popped about half the values in the array. This is my second reason to say no: you need to pop at least n/2 values from the input array before any space will be freed. Before that happens, you will have used n/2 space in your extra data structure found. Hence the space complexity of your function will not be better than O(n) with a standard python interpreter.
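You can get a rough feel for this behaviour with sys.getsizeof, which reports the list's currently allocated size (this is a CPython implementation detail, and the exact numbers vary by version):

import sys

lst = list(range(1_000))
initial = sys.getsizeof(lst)     # allocated size before any pops
pops_before_first_shrink = 0
while lst and sys.getsizeof(lst) >= initial:
    lst.pop()
    pops_before_first_shrink += 1

# On CPython this prints a number close to half the list's length.
print("pops before the allocation first shrank:", pops_before_first_shrink)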
Time complexity
Your first comment in the code says #o(n) time.. But that is not correct for two reasons. First, don't confuse o() and O(), which are very different. Second, this function is O(n^2), not O(n). This is because of the repeated use of array.pop(0). As a good rule of thumb: never use list.pop(0) in python. Use list.pop() if you can, or find some other way.
list.pop(0) is terribly inefficient, as it removes the first element, then moves every other element one space up to fill the gap. Thus, every call to list.pop(0) has O(n) time complexity, and your function makes up to n calls to list.pop(0), so the function has O(n^2) time complexity.
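For example, a version with the same return values that avoids pop(0) entirely (and no longer consumes the input) keeps the overall time at O(n):

def firstDuplicateValue(array):
    found = set()
    for x in array:          # iterate instead of repeatedly popping index 0
        if x in found:
            return x
        found.add(x)
    return -1

print(firstDuplicateValue([1, 2, 3, 3, 4, 4]))  # 3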
People coming from other coding languages to python often ask how they should pre-allocate or initialize their list. This is especially true for people coming from Matlab, where code such as
l = []
for i = 1:100
    l(end+1) = 1;
end
produces a warning that explicitly suggests initializing the list.
There are several posts on SO explaining (and showing through tests) that list initialization isn't required in python. A good example with a fair bit of discussion is this one (but the list could be very long): Create a list with initial capacity in Python
The other day, however, while looking up the complexity of list operations in python, I stumbled upon this sentence on the official python wiki:
the largest [cost for list operations] come from growing beyond the current allocation size (because everything must move),
This seems to suggest that lists do indeed have a pre-allocation size and that growing beyond that size causes the whole list to move.
This shook my foundations a bit. Can list pre-allocation reduce the overall complexity (in terms of number of operations) of a code? If not, what does that sentence mean?
EDIT:
Clearly my question regards the (very common) code:
container = ...  # some iterable with 1 gazillion elements
new_list = []
for x in container:
    ...  # do whatever you want with x
    new_list.append(x)  # or something computed using x
In this case the compiler cannot know how many items there are in container, so new_list could potentially require its allocated memory to change an incredible number of times if what is said in that sentence is true.
I know that this is different for list-comprehensions
Can list pre-allocation reduce the overall complexity (in terms of number of operations) of a code?
No, the overall time complexity of the code will be the same, because the time cost of reallocating the list is O(1) when amortised over all of the operations which increase the size of the list.
If not, what does that sentence mean?
In principle, pre-allocating the list could reduce the running time by some constant factor, by avoiding multiple re-allocations. This doesn't mean the complexity is lower, but it may mean the code is faster in practice. If in doubt, benchmark or profile the relevant part of your code to compare the two options; in most circumstances it won't matter, and when it does, there are likely to be better alternatives anyway (e.g. NumPy arrays) for achieving the same goal.
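For example, here is a rough benchmark sketch of the two options (the numbers will vary by machine and Python version): growing a list by appending versus filling a pre-allocated [None] * n list by index. Both are O(n); any difference is a constant factor.

import timeit

n = 1_000_000

def grow_by_append():
    new_list = []
    for i in range(n):
        new_list.append(i)
    return new_list

def fill_preallocated():
    new_list = [None] * n
    for i in range(n):
        new_list[i] = i
    return new_list

print("append:       ", timeit.timeit(grow_by_append, number=10))
print("pre-allocated:", timeit.timeit(fill_preallocated, number=10))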
new_list could potentially require its allocated memory to change an incredible number of times
List reallocation follows a geometric progression, so if the final length of the list is n then the list is reallocated only O(log n) times along the way; not an "incredible number of times". The way the maths works out, the average number of times each element gets copied to a new underlying array is a constant regardless of how large the list gets, hence the O(1) amortised cost of appending to the list.
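You can observe this directly in CPython (an implementation detail, so the exact counts vary by version) by watching how rarely sys.getsizeof reports a new allocation while appending:

import sys

lst = []
reallocations = 0
last_size = sys.getsizeof(lst)
for i in range(1_000_000):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last_size:        # the underlying array was reallocated
        reallocations += 1
        last_size = size

print("appends:", len(lst), "reallocations:", reallocations)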
I see two sentences:
total amortized cost of a sequence of operations must be an upper bound on the total actual cost of the sequence

When assigning amortized costs to operations on a data structure, you need to ensure that, for any sequence of operations performed, the sum of the amortized costs is always at least as big as the sum of the actual costs of those operations.
My confusion is about two things:
A) Do both of them mean amortized cost >= real cost of the operation? I thought the amortized cost was (n * real cost).
B) Is there a short, concrete example that would make this clearer?
The problem that amortization solves is that common operations may trigger occasional slow ones. Therefore if we add up the worst cases, we are effectively looking at how the program would perform if garbage collection were always running and every data structure had to be moved in memory every time. But if we ignore the worst cases, we are effectively ignoring that garbage collection sometimes does run, and that large lists sometimes do run out of allocated space and have to be moved to a bigger bucket.
We solve this by gradually writing off occasional big operations over time. We write it off as soon as we realize that it may be needed some day. This means that the amortized cost is usually bigger than the real cost, because it includes that future work, but occasionally the real cost is way bigger than the amortized one. And, on average, they come out to around the same.
The standard example people start with is a list implementation where we allocate 2x the space we currently need, and then reallocate and move it if we use up the space. When I run foo.append(...) in this implementation, usually I just insert. But occasionally I have to copy the whole large list. However, if I have just copied and the list had n items, then after I append n more times I will need to copy 2n items to a bigger space. Therefore my amortized analysis of what it costs to append includes the cost of an insert plus the cost of moving 2 items. And over the next n times I call append, my estimate exceeds the real cost n-1 times and falls short the nth time, but it averages out exactly right.
(Python's real list implementation works like this except that the new list is around 9/8 the size of the old one.)
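Here is a toy sketch of that doubling-array example which counts element copies instead of measuring time; over n appends the total number of copies stays below 2n, so the amortized cost per append is O(1) even though an individual append occasionally costs O(n):

class DoublingArray:
    def __init__(self):
        self.items = [None]          # capacity 1 to start
        self.size = 0
        self.copies = 0              # bookkeeping for the analysis only

    def append(self, value):
        if self.size == len(self.items):
            bigger = [None] * (2 * len(self.items))
            for i in range(self.size):       # the occasional O(n) move
                bigger[i] = self.items[i]
                self.copies += 1
            self.items = bigger
        self.items[self.size] = value
        self.size += 1

arr = DoublingArray()
n = 100_000
for i in range(n):
    arr.append(i)
print(f"{n} appends caused {arr.copies} copies ({arr.copies / n:.2f} per append)")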
CPython deque is implemented as a doubly-linked list of 64-item sized "blocks" (arrays). The blocks are all full, except for the ones at either end of the linked list. IIUC, the blocks are freed when a pop / popleft removes the last item in the block; they are allocated when append/appendleft attempts to add a new item and the relevant block is full.
I understand the listed advantages of using a linked list of blocks rather than a linked list of items:
reduce memory cost of pointers to prev and next in every item
reduce runtime cost of doing malloc/free for every item added/removed
improve cache locality by placing consecutive pointers next to each other
But why wasn't a single dynamically-sized circular array used instead of the doubly-linked list in the first place?
AFAICT, the circular array would preserve all the above advantages, and maintain the (amortized) cost of pop*/append* at O(1) (by overallocating, just like in list). In addition, it would improve the cost of lookup by index from the current O(n) to O(1). A circular array would also be simpler to implement, since it can reuse much of the list implementation.
I can see an argument in favor of a linked list in languages like C++, where removal of an item from the middle can be done in O(1) using a pointer or iterator; however, python deque has no API to do this.
Adapted from my reply on the python-dev mailing list:
The primary point of a deque is to make popping and pushing at both ends efficient. That's what the current implementation does: worst-case constant time per push or pop regardless of how many items are in the deque. That beats "amortized O(1)" in the small and in the large. That's why it was done this way.
Some other deque methods are consequently slower than they are for lists, but who cares? For example, the only indices I've ever used with a deque are 0 and -1 (to peek at one end or the other of a deque), and the implementation makes accessing those specific indices constant-time too.
Indeed, the message from Raymond Hettinger referenced by Jim Fasarakis Hilliard in his comment:
https://www.mail-archive.com/python-dev@python.org/msg25024.html
confirms that
The only reason that __getitem__ was put in was to support fast access to the head and tail without actually popping the value
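A quick timing sketch of that point (numbers are machine-dependent): indexing a deque at either end is cheap, while indexing into the middle has to walk the chain of blocks.

import timeit
from collections import deque

d = deque(range(1_000_000))
for idx in (0, -1, 500_000):
    t = timeit.timeit(lambda: d[idx], number=10_000)
    print(f"d[{idx}]: {t:.4f} s")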
In addition to accepting @TimPeters' answer, I wanted to add a couple of additional observations that don't fit into a comment format.
Let's just focus on a common use case where deque is used as a simple FIFO queue.
Once the queue reaches its peak size, the circular array needs no more allocations of memory from the heap. I thought of it as an advantage, but it turns out the CPython implementation achieved the same by keeping a list of reusable memory blocks. A tie.
While the queue size is growing, both the circular array and CPython need memory from the heap. CPython needs a simple malloc, while the array needs a (potentially much more expensive) realloc: unless extra space happens to be available past the end of the original memory block, it has to free the old memory and copy the data over. Advantage to CPython.
If the queue peaked out at a much larger size than its stable size, both CPython and the array implementation would waste the unused memory (the former by saving it in a reusable block list, the latter by leaving the unused empty space in the array). A tie.
As @TimPeters pointed out, the cost of each FIFO queue put / get is always O(1) for CPython, but only amortized O(1) for the array. Advantage to CPython.
I need to create a list with n items that all equal 0; I used this method:
list = [0] * n
Is the time-complexity O(n) or O(1)?
If it is O(n), is there a way to create this list with O(1) complexity?
One way to allocate a large object in O(1) time can be (depending on your underlying operating system) mmap.mmap, see https://docs.python.org/2/library/mmap.html . However, it's not a list with all its flexibility -- it behaves rather like a mutable array of characters.
Sometimes that can still be helpful. In recent versions of Python 3, you can put a memoryview on the result of mmap, and e.g. call .cast('L') to obtain a view of the same memory as a sequence of integers (both operations are O(1)). Still not a list -- you don't get a list's rich flexibility and panoply of methods -- but more likely to be helpful...
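A minimal sketch of that idea (details such as the element width of 'L' depend on the platform, and whether pages are populated lazily depends on the operating system):

import mmap

n = 10_000_000
buf = mmap.mmap(-1, n * 8)           # anonymous, zero-filled mapping
view = memoryview(buf).cast('L')     # view the same bytes as unsigned ints
print(len(view), view[0], view[-1])  # all zeros, with no per-element work done up front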
Added: this does however remind me of a time many years ago when I was working to port a big CAD application to a then-brand-new AIX version and a Power-based IBM workstation. malloc was essentially instantaneous -- the kernel was just doing some magic behind the scenes with memory mapping hardware in the CPU.
However... actually accessing an item far into the malloced area (say N bytes from its start) for the first time could take O(N) as all needed pages actually got allocated -- and could even cause a crash if the system couldn't actually find or free that many pages within your process's address space (IOW, that malloc was "overcommitting" memory in the first place!).
As you can imagine this wreaked havoc on an application originally developed for more traditional implementations of malloc: the app checked malloc's return value -- if 0, it took appropriate remedial action (worst case, just letting the user know their desired operation could not be performed due to lack of memory); otherwise, it assumed the memory it just allocated was actually there... and sometimes ended up crashing in "random" spots down the road:-).
The solution was only moderately hard -- wrap malloc into a function with a post-allocation step that actually touched the right subset of the "nominally allocated" pages to ensure they were actually all there (it was hard to catch the low-level errors that could come at such times, but eventually I managed to find a way, with a little assembly-language coding as I recall). Of course, this did make the wrapped malloc O(N) all over again... there does appear to be no such thing as a free lunch, who'd have thought?!-)
I'm not sure how mmap.mmap(-1, lotsofbytes) actually behaves from this point of view on all platforms of your interest -- nothing for it but to actually experiment, I guess!-) (If it does turn out to offer a free lunch, at least on popular platforms, now that would be worth celebrating:-).
Creating a (dense) list of n zeros can't be done in O(1). The only option you have is to look at some sparse or lazy data structures. Depending on what you want to use this for, one option might be itertools.repeat().
a = itertools.repeat(0, n)
will be executed in O(1). Iterating over it will, of course, still be O(n).
You can roll your own generator for this too:
def foo(num, times):
    for a in range(0, times):
        yield num
After profiling on my machine:
import timeit

code = """
def foo(num, times):
    for a in range(0, times):
        yield num
"""

print timeit.timeit("foo(0,1)", setup=code)
print timeit.timeit("foo(0,2**8)", setup=code)
# 0.310677051544
# 0.32208776474