amortized analysis and one basic question (is there any simple example)? - python

I see two sentences:
total amortized cost of a sequence of operations must be an upper
bound on the total actual cost of the sequence
When assigning amortized costs to operations on a data structure, you
need to ensure that, for any sequence of operations performed, that
the sum of the amortized costs is always at least as big as the sum of
the actual costs of those operations.
my challenge is two things:
A) Do both of them mean: amortized cost >= real cost of an operation? I thought the amortized cost was (n * real cost).
B) Is there an example that would make this clearer to me? A real and short example?

The problem that amortization solves is that common operations may trigger occasional slow ones. Therefore if we add up the worst cases, we are effectively looking at how the program would perform if garbage collection is always running and every data structure had to be moved in memory every time. But if we ignore the worst cases, we are effectively ignoring that garbage collection sometimes does run, and large lists sometimes do run out of allocated space and have to be moved to a bigger bucket.
We solve this by gradually writing off occasional big operations over time. We write it off as soon as we realize that it may be needed some day. Which means that the amortized cost is usually bigger than the real cost, because it includes that future work, but occasionally the real cost is way bigger than the amortized. And, on average, they come out to around the same.
The standard example people start with is a list implementation where we allocate 2x the space we currently need, and then reallocate and move everything when we use up that space. When I run foo.append(...) in this implementation, usually I just insert. But occasionally I have to copy the whole large list. However, if I just copied and the list had n items, then after I append n more times I will need to copy 2n items to a bigger space. Therefore my amortized estimate of what it costs to append includes the cost of an insert plus the cost of moving 2 items. Over the next n times I call append, my estimate exceeds the real cost n-1 times and falls short the nth time, but it averages out exactly right.
(Python's real list implementation works like this except that the new list is around 9/8 the size of the old one.)
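As a minimal sketch of that accounting (a toy doubling array, not CPython's actual implementation), we can count how many element copies reallocation causes in total; the average per append stays a small constant:

class DoublingList:
    """Toy dynamic array that doubles its capacity when it fills up."""
    def __init__(self):
        self.capacity = 1
        self.items = [None]
        self.size = 0
        self.copies = 0                       # elements moved during reallocations

    def append(self, x):
        if self.size == self.capacity:        # the occasional expensive operation
            self.capacity *= 2
            new_items = [None] * self.capacity
            new_items[:self.size] = self.items
            self.items = new_items
            self.copies += self.size
        self.items[self.size] = x             # the common cheap operation
        self.size += 1

lst = DoublingList()
n = 1_000_000
for i in range(n):
    lst.append(i)
print(lst.copies / n)                         # roughly 1 copy per append on average

The occasional copy of the whole list is paid for, on average, by the many cheap appends around it, which is exactly the amortized bookkeeping described above.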

Does this sentence contradict the Python paradigm "lists should not be initialized"?

People coming from other programming languages to Python often ask how they should pre-allocate or initialize their list. This is especially true for people coming from Matlab, where code such as
l = []
for i = 1:100
    l(end+1) = 1;
end
returns a warning that explicitly suggests you pre-allocate the array.
There are several posts on SO explaining (and showing through tests) that list initialization isn't required in Python. A good example with a fair bit of discussion is this one (but the list of such posts could be made very long): Create a list with initial capacity in Python
The other day, however, while looking for operations complexity in python, I stumbled this sentence on the official python wiki:
the largest [cost for list operations] come from growing beyond the current allocation size (because everything must move),
This seems to suggest that indeed lists do have a pre-allocation size and that growing beyond that size cause the whole list to move.
This shook my foundations a bit. Can list pre-allocation reduce the overall complexity (in terms of number of operations) of the code? If not, what does that sentence mean?
EDIT:
Clearly my question regards the (very common) code:
container = ...   # some iterable with 1 gazillion elements
new_list = []
for x in container:
    ...                        # do whatever you want with x
    new_list.append(x)         # or something computed using x
In this case Python cannot know in advance how many items there are in container, so new_list could potentially require its allocated memory to change an incredible number of times, if what is said in that sentence is true.
I know that this is different for list comprehensions.
Can list pre-allocation reduce the overall complexity (in terms of number of operations) of a code?
No, the overall time complexity of the code will be the same, because the time cost of reallocating the list is O(1) when amortised over all of the operations which increase the size of the list.
If not, what does that sentence means?
In principle, pre-allocating the list could reduce the running time by some constant factor, by avoiding multiple re-allocations. This doesn't mean the complexity is lower, but it may mean the code is faster in practice. If in doubt, benchmark or profile the relevant part of your code to compare the two options; in most circumstances it won't matter, and when it does, there are likely to be better alternatives anyway (e.g. NumPy arrays) for achieving the same goal.
new_list could potentially require his allocated memory to change an incredible number of times
List reallocation follows a geometric progression, so if the final length of the list is n then the list is reallocated only O(log n) times along the way; not an "incredible number of times". The way the maths works out, the average number of times each element gets copied to a new underlying array is a constant regardless of how large the list gets, hence the O(1) amortised cost of appending to the list.
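If you want to see that geometric growth in practice, here is a quick sketch that counts how often CPython actually reallocates while appending a million items, using changes in sys.getsizeof as a rough proxy for reallocations (the exact count is an implementation detail and may vary between versions):

import sys

l = []
last = sys.getsizeof(l)
reallocations = 0
for i in range(1_000_000):
    l.append(i)
    size = sys.getsizeof(l)
    if size != last:          # the underlying array grew, i.e. it was reallocated
        reallocations += 1
        last = size
print(reallocations)          # on the order of a hundred, not 1,000,000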

Python list.clear() time and space complexity?

I am writing a blog post on the Python list.clear() method, where I also want to mention the time and space complexity of the underlying algorithm. I expected the time complexity to be O(N): iterate over the elements and free the memory. But I found an article where it is mentioned that it is actually an O(1) operation. Then I searched the source code of the method in the CPython implementation and found a method which I believe is the actual internal implementation of list.clear(), although I am not really sure it is. Here's the source code of the method:
static int
_list_clear(PyListObject *a)
{
    Py_ssize_t i;
    PyObject **item = a->ob_item;
    if (item != NULL) {
        /* Because XDECREF can recursively invoke operations on
           this list, we make it empty first. */
        i = Py_SIZE(a);
        Py_SIZE(a) = 0;
        a->ob_item = NULL;
        a->allocated = 0;
        while (--i >= 0) {
            Py_XDECREF(item[i]);
        }
        PyMem_FREE(item);
    }
    /* Never fails; the return value can be ignored.
       Note that there is no guarantee that the list is actually empty
       at this point, because XDECREF may have populated it again! */
    return 0;
}
I could be wrong, but it does look like O(N) to me. Also, I found a similar question here, but there's no clear answer there. I just want to confirm the actual time and space complexity of list.clear(), and maybe a little explanation supporting the answer. Any help appreciated. Thanks.
As you correctly noticed, the CPython implementation of list.clear is O(n). The code iterates over the elements in order to decrease the reference count of each one, without a way to avoid it. There is no doubt that it is an O(n) operation and, given a large enough list, you can measure the time spent in clear() as function of list size:
import time

for size in 1_000_000, 10_000_000, 100_000_000, 1_000_000_000:
    l = [None] * size
    t0 = time.time()
    l.clear()
    t1 = time.time()
    print(size, t1 - t0)
The output shows linear complexity; on my system with Python 3.7 it prints the following:
1000000 0.0023756027221679688
10000000 0.02452826499938965
100000000 0.23625731468200684
1000000000 2.31496524810791
The time per element is of course tiny because the loop is coded in C and each iteration does very little work. But, as the above measurement shows, even a minuscule per-element cost eventually adds up. A small per-element constant is not a reason to ignore the cost of an operation, or the same argument would apply to the loop that shifts the list elements in l.insert(0, ...), which is also very efficient - and yet few would claim insertion at the beginning to be O(1). (And clear potentially does more work, because a decref will run an arbitrary chain of destructors for an object whose reference count actually reaches zero.)
On a philosophical level, one could argue that costs of memory management should be ignored when assessing complexity because otherwise it would be impossible to analyze anything with certainty, as any operation could trigger a GC. This argument has merit; GC does come occasionally and unpredictably, and its cost can be considered amortized across all allocations. In a similar vein complexity analysis tends to ignore the complexity of malloc because the parameters it depends on (like memory fragmentation) are typically not directly related to allocation size or even to the number of already allocated blocks. However, in case of list.clear there is only one allocated block, no GC is triggered, and the code is still visiting each and every list element. Even with the assumption of O(1) malloc and amortized O(1) GC, list.clear still takes the time proportional to the number of elements in the list.
The article linked from the question is about Python the language and doesn't mention a particular implementation. Python implementations that don't use reference counting, such as Jython or PyPy, are likely to have true O(1) list.clear, and for them the claim from the article would be entirely correct. So, when explaining the Python list on a conceptual level, it is not wrong to say that clearing the list is O(1) - after all, all the object references are in a contiguous array, and you free it only once. This is the point your blog post probably should make, and that is what the linked article is trying to say. Taking the cost of reference counting into account too early might confuse your readers and give them completely wrong ideas about Python's lists (e.g. they could imagine that they are implemented as linked lists).
Finally, at some point one must accept that memory management strategy does change complexity of some operations. For example, destroying a linked list in C++ is O(n) from the perspective of the caller; discarding it in Java or Go would be O(1). And not in the trivial sense of a garbage-collected language just postponing the same work for later - it is quite possible that a moving collector will only traverse reachable objects and will indeed never visit the elements of the discarded linked list. Reference counting makes discarding large containers algorithmically similar to manual collection, and GC can remove that. While CPython's list.clear has to touch every element to avoid a memory leak, it is quite possible that PyPy's garbage collector never needs to do anything of the sort, and thus has a true O(1) list.clear.
It's O(1) neglecting memory management. It's not quite right to say it's O(N) accounting for memory management, because accounting for memory management is complicated.
Most of the time, for most purposes, we treat the costs of memory management separately from the costs of the operations that triggered it. Otherwise, just about everything you could possibly do becomes O(who even knows), because almost any operation could trigger a garbage collection pass or an expensive destructor or something. Heck, even in languages like C with "manual" memory management, there's no guarantee that any particular malloc or free call will be fast.
There's an argument to be made that refcounting operations should be treated differently. After all, list.clear explicitly performs a number of Py_XDECREF operations equal to the list's length, and even if no objects are deallocated or finalized as a result, the refcounting itself will necessarily take time proportional to the length of the list.
If you count the Py_XDECREF operations list.clear performs explicitly, but ignore any destructors or other code that might be triggered by the refcounting operations, and you assume PyMem_FREE is constant time, then list.clear is O(N), where N is the original length of the list. If you discount all memory management overhead, including the explicit Py_XDECREF operations, list.clear is O(1). If you count all memory management costs, then the runtime of list.clear cannot be asymptotically bounded by any function of the list's length.
As the other answers have noted, it takes O(n) time to clear a list of length n. But I think there is an additional point to be made about amortized complexity here.
If you start with an empty list, and do N append or clear operations in any order, then the total running time across all of those operations is always O(N), giving an average per operation of O(1), however long the list gets in the process, and however many of those operations are clear.
Like clear, the worst case for append is also O(n) time where n is the length of the list. That's because when the capacity of the underlying array needs to be increased, we have to allocate a new array and copy everything across. But the cost of copying each element can be "charged" to one of the append operations which got the list to a length where the array needs to be resized, in such a way that N append operations starting from an empty list always take O(N) time.
Likewise, the cost of decrementing an element's refcount in the clear method can be "charged" to the append operation which inserted that element in the first place, because each element can only get cleared once. The conclusion is that if you are using a list as an internal data structure in your algorithm, and your algorithm repeatedly clears that list inside a loop, then for the purpose of analysing your algorithm's time complexity you should count clear on that list as an O(1) operation, just as you'd count append as an O(1) operation in the same circumstances.
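As a rough empirical check of this accounting argument (a sketch, not a proof), the total time for N operations stays roughly linear in N no matter how appends and clears are interleaved:

import random
import time

def mixed_ops(n):
    lst = []
    start = time.perf_counter()
    for _ in range(n):
        if lst and random.random() < 0.001:   # occasionally clear the whole list
            lst.clear()
        else:
            lst.append(None)
    return time.perf_counter() - start

for n in (10**5, 10**6, 10**7):
    print(n, mixed_ops(n))                    # timings grow roughly linearly with n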
A quick time check indicates that it is O(n).
Let's execute the following and create the lists beforehand to avoid overhead:
import time
import random
list_1000000 = [random.randint(0,10) for i in range(1000000)]
list_10000000 = [random.randint(0,10) for i in range(10000000)]
list_100000000 = [random.randint(0,10) for i in range(100000000)]
Now check the time it takes to clear each of these three lists of different sizes, as follows:
start = time.time()
list.clear(my_list)    # my_list is each of the lists above in turn
end = time.time()
print(end - start)
Results:
list.clear(list_1000000) takes 0.015
list.clear(list_10000000) takes 0.074
list.clear(list_100000000) takes 0.64
A more robust time measurement is needed, since these numbers can deviate each time the code is run, but the results indicate that the execution time grows pretty much linearly with the input size. Hence we can conclude an O(n) complexity.

Using heap for big disk sorts

On the official Python docs here, it is mentioned that:
Heaps are also very useful in big disk sorts. You most probably all
know that a big sort implies producing “runs” (which are pre-sorted
sequences, whose size is usually related to the amount of CPU memory),
followed by a merging passes for these runs, which merging is often
very cleverly organised.
It is very important that the initial
sort produces the longest runs possible. Tournaments are a good way
to achieve that. If, using all the memory available to hold a
tournament, you replace and percolate items that happen to fit the
current run, you’ll produce runs which are twice the size of the
memory for random input, and much better for input fuzzily ordered.
Moreover, if you output the 0’th item on disk and get an input which
may not fit in the current tournament (because the value “wins” over
the last output value), it cannot fit in the heap, so the size of the
heap decreases. The freed memory could be cleverly reused
immediately for progressively building a second heap, which grows at
exactly the same rate the first heap is melting.
When the first heap
completely vanishes, you switch heaps and start a new run. Clever and
quite effective!
I am aware of an algorithm called External sorting in which we:
Break down the input into smaller chunks.
Sort all the chunks individually and write them back to disk one-by-one.
Create a heap and do a k-way merge among all the sorted chunks.
I completely understood external sorting as described on Wikipedia, but am not able to understand the author when they say:
If, using all the memory available to hold a tournament, you replace
and percolate items that happen to fit the current run, you’ll produce
runs which are twice the size of the memory for random input, and much
better for input fuzzily ordered.
and:
Moreover, if you output the 0’th item on disk and get an input which
may not fit in the current tournament (because the value “wins” over
the last output value), it cannot fit in the heap, so the size of the
heap decreases.
What is this heap melting?
Heap melting is not a thing. It's just the word the author uses for the heap getting smaller as you pull out the smallest items.
The idea he's talking about is a clever replacement for the "divide the input into chunks and sort the chunks" part of the external sort. It produces larger sorted chunks.
The idea is that you first read the biggest chunk you can into memory and arrange it into a heap, then you start writing out the smallest elements from the heap as you read new elements in.
When you read in an element that is smaller than an element you have already written out, it can't go in the current chunk (it would ruin the sort), so you remember it for the next chunk. Elements that are not smaller than the last one you wrote out can be inserted into the heap. They will make it out into the current chunk, making the current chunk larger.
Eventually your heap will be empty. At that point you're done with the current chunk -- heapify all the elements you remembered and start writing out the next chunk.
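Here is a minimal sketch of that run-generation idea (often called replacement selection) using heapq. The function name and the in-memory lists standing in for disk I/O are just for illustration, not part of the original docs:

import heapq

_DONE = object()

def replacement_selection(items, memory_size):
    # Yield sorted runs. With random input, the runs average about twice
    # memory_size, which is the claim in the quoted paragraph.
    it = iter(items)
    heap = [x for _, x in zip(range(memory_size), it)]
    heapq.heapify(heap)
    pending = []                           # items that must wait for the next run
    while heap:
        run = []
        while heap:
            smallest = heapq.heappop(heap)
            run.append(smallest)           # "output the 0'th item on disk"
            nxt = next(it, _DONE)
            if nxt is _DONE:
                continue
            if nxt >= smallest:
                heapq.heappush(heap, nxt)  # still fits the current run
            else:
                pending.append(nxt)        # the heap "melts" by one slot
        yield run                          # current run finished; switch heaps
        heap = pending
        heapq.heapify(heap)
        pending = []

print(list(replacement_selection([5, 1, 3, 2, 4], 2)))   # [[1, 3, 5], [2, 4]]

The runs produced this way would then be merged in the usual k-way merge step, for example with heapq.merge.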

Is my understanding of Hashsets correct? (Python)

I'm teaching myself data structures through this Python book and I'd appreciate it if someone could correct me if I'm wrong, since a hash set seems to be extremely similar to a hash map.
Implementation:
A Hashset is a list [] or array where each index points to the head of a linkedlist
So some hash(some_item) --> key, and then list[key] and then add to the head of a LinkedList. This occurs in O(1) time
When removing a value from the linkedlist, in python we replace it with a placeholder because hashsets are not allowed to have Null/None values, correct?
When the list[] gets over a certain % of load/fullness, we copy it over to another list
Regarding Time Complexity Confusion:
So one question is: why is average search/access O(1) if there can be a linked list of N items at a given index?
Wouldn't the average case be that the search item is in the middle of its indexed linked list, so it should be O(n/2) -> O(n)?
Also, when removing an item, if we are replacing it with a placeholder value, isn't this considered a waste of memory if the placeholder is never used?
And finally, what is the difference between this and a HashMap other than HashMaps can have nulls? And HashMaps are key/value while Hashsets are just value?
For your first question - why is the average time complexity of a lookup O(1)? - this statement is in general only true if you have a good hash function. An ideal hash function is one that causes a nice spread on its elements. In particular, hash functions are usually chosen so that the probability that any two elements collide is low. Under this assumption, it's possible to formally prove that the expected number of elements to check is O(1). If you search online for "universal family of hash functions," you'll probably find some good proofs of this result.
As for using placeholders - there are several different ways to implement a hash table. The approach you're using is called "closed addressing" or "hashing with chaining," and in that approach there's little reason to use placeholders. However, other hashing strategies exist as well. One common family of approaches is called "open addressing" (the most famous of which is linear probing hashing), and in those setups placeholder elements are necessary to avoid false negative lookups. Searching online for more details on this will likely give you a good explanation about why.
As for how this differs from HashMap, the HashMap is just one possible implementation of a map abstraction backed by a hash table. Java's HashMap does support nulls, while other approaches don't.
The lookup time wouldn't be O(n) because not all items need to be searched, it also depends on the number of buckets. More buckets would decrease the probability of a collision and reduce the chain length.
The number of buckets can be kept as a constant factor of the number of entries by resizing the hash table as needed. Along with a hash function that evenly distributes the values, this keeps the expected chain length bounded, giving constant time lookups.
The hash tables used by hashmaps and hashsets are the same except they store different values. A hashset will contain references to a single value, and a hashmap will contain references to a key and a value. Hashsets can be implemented by delegating to a hashmap where the keys and values are the same.
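To make the chaining picture concrete, here is a minimal sketch of a chained hash set with resizing. The class name, the 0.75 load-factor threshold, and the growth factor are illustrative choices; CPython's built-in set actually uses open addressing rather than chaining:

class ChainedHashSet:
    """Toy hash set using separate chaining (a Python list per bucket)."""
    def __init__(self):
        self.buckets = [[] for _ in range(8)]
        self.count = 0

    def _bucket(self, value):
        return self.buckets[hash(value) % len(self.buckets)]

    def add(self, value):
        bucket = self._bucket(value)
        if value not in bucket:            # chains stay short on average
            bucket.append(value)
            self.count += 1
            if self.count > 0.75 * len(self.buckets):
                self._resize()             # occasional O(n) step, amortized O(1)

    def __contains__(self, value):
        return value in self._bucket(value)

    def _resize(self):
        old_buckets = self.buckets
        self.buckets = [[] for _ in range(2 * len(old_buckets))]
        for bucket in old_buckets:         # every element is rehashed and moved
            for value in bucket:
                self._bucket(value).append(value)

s = ChainedHashSet()
for word in ["a", "b", "a", "c"]:
    s.add(word)
print("b" in s, "z" in s)                  # True False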
A lot has been written here about open hash tables, but some fundamental points are missed.
Practical implementations generally have expected O(1) lookup and delete because they keep the average number of items per bucket (the load factor) below a fixed bound. But this means they can only achieve amortized O(1) time for insert, because the table needs to be reorganized periodically as it grows.
(Some may opt to reorganize on delete as well, to shrink the table when the load factor reaches some bottom threshold, but this only affects space, not asymptotic run time.)
Reorganization means increasing (or decreasing) the number of buckets and re-assigning all elements into their new bucket locations. There are schemes, e.g. extensible hashing, to make this a bit cheaper. But in general it means touching each element in the table.
Reorganization, then, is O(n). How can insert be O(1) when any given one may incur this cost? The secret is amortization and the power of powers. When the table is grown, it must be grown by a factor greater than one, two being most common. If the table starts with 1 bucket and doubles each time the load factor reaches F, then the cost of N reorganizations is
F + 2F + 4F + 8F ... (2^(N-1))F = (2^N - 1)F
At this point the table contains (2^(N-1))F elements, the number in the table during the last reorganization. I.e. we have done (2^(N-1))F inserts, and the total cost of reorganization is as shown on the right. The interesting part is the average cost per element in the table (or insert, take your pick):
(2^N - 1)F / ((2^(N-1))F)  ≈  2^N / 2^(N-1)  =  2
That's where the amortized O(1) comes from.
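A quick numeric check of that geometric-series accounting (F = 4 here is an arbitrary load factor chosen for illustration):

F = 4                          # load factor: elements per bucket before resizing
buckets = 1
reorg_cost = 0
for _ in range(20):            # simulate 20 doublings
    reorg_cost += buckets * F  # every element is touched when the table doubles
    buckets *= 2
inserts = (buckets // 2) * F   # elements present at the time of the last doubling
print(reorg_cost / inserts)    # approaches 2, matching the formula above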
One additional point is that for modern processors, linked lists aren't a great idea for the bucket lists. With 8-byte pointers, the overhead is meaningful. More importantly, heap-allocated nodes in a single list will almost never be contiguous in memory. Traversing such a list kills cache performance, which can slow things down by orders of magnitude.
Arrays (with an integer count for number of data-containing elements) are likely to work out better. If the load factor is small enough, just allocate an array equal in size to the load factor at the time the first element is inserted in the bucket. Otherwise, grow these element arrays by factors the same way as the bucket array! Everything will still amortize to O(1).
To delete an item from such a bucket, don't mark it deleted. Just copy the last array element to the location of the deleted one and decrement the element count. Of course this won't work if you allow external pointers into the hash buckets, but that's a bad idea anyway.
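A minimal sketch of that swap-with-last deletion for an array bucket (here the bucket is just a plain Python list used for illustration):

def delete_from_bucket(bucket, value):
    # Order inside a bucket does not matter, so overwrite the deleted slot
    # with the last element and shrink: no placeholder, no shifting.
    i = bucket.index(value)    # O(chain length), bounded by the load factor
    bucket[i] = bucket[-1]
    bucket.pop()

bucket = [10, 42, 7, 99]
delete_from_bucket(bucket, 42)
print(bucket)                  # [10, 99, 7]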

Why is the following algorithm O(1) space?

Input: A list of positive integers where one entry occurs exactly once, and all other entries occur exactly twice (for example [1,3,2,5,3,4,1,2,4])
Output: The unique entry (5 in the above example)
The following algorithm is supposed to be O(m) time and O(1) space where m is the size of the list.
def get_unique(intlist):
    unique_val = 0
    for int in intlist:
        unique_val ^= int
    return unique_val
My analysis: Given a list of length m there will be (m + 1)/2 distinct positive integers in the input list, so the smallest possible maximum integer in the list is (m+1)/2. Even in this best case, when taking the XOR sum the variable unique_val will require ceil(log2((m+1)/2)) bits in memory, so I thought the space complexity should be at least O(log m).
Your analysis is certainly one correct answer, particularly in a language like Python which gracefully handles arbitrarily large numbers.
It's important to be clear about what you're trying to measure when thinking about space and time complexity. A reasonable assumption might be that the size of an integer is constant (e.g. you're using 64-bit integers). In that case, the space complexity is certainly O(1), but the time complexity is still O(m).
Now, you could also argue that using a fixed-size integer means you have a constant upper-bound on the size of m, so perhaps the time complexity is also O(1). But in most cases where you need to analyze the running time of this sort of algorithm, you're probably very interested in the difference between a list of length 10 and one of length 1 billion.
I'd say it's important to clarify and state your assumptions when analyzing space- and time-complexity. In this case, I would assume we have a fixed size integer and a value of m much smaller than the maximum integer value. In that case, O(1) space and O(m) time are probably the best answers.
EDIT (based on discussion in other answers)
Since all m gives you is a lower bound on the maximum value in the list, you really can't provide a worst-case estimate of the space. That is, a number in the list can be arbitrarily large. To have any reasonable answer as to the space complexity of this algorithm, you need to make some assumption about the maximum size of the input values.
The (space/time) complexity analysis is usually applied to algorithms on a higher level. While you can drop down to specific language implementation level, it may not be useful in all cases.
Your analysis is both right and possibly wrong. It's right for the current CPython implementation, where integers do not have a maximum value. It's OK if all your integers are relatively small and fit into the implementation-specific representation of small numbers.
But it doesn't have to be valid for all other implementations of Python. For example, you could have an optimizing implementation which figures out that intlist is not used again and, instead of using unique_val, reuses the space of the consumed list elements (basically transforming this function into a space-optimized reduce call).
Then again, can we even talk about space complexity in a garbage-collected language with heap-allocated integers? Your analysis of the complexity is also shaky, because a ^= b allocates new memory for a large result, and the size of that allocation depends on the system, the architecture, the Python version, and luck.
Your original question is however "Why is the following algorithm O(1) space?". If you look at the algorithm itself and assume you have some arbitrary maximum integer limits, or your language can represent any number in a limited space, then the answer is yes. The algorithm itself with those conditions uses constant space.
The complexity of an algorithm is always dependent on the machine model (= platform) you use. E.g. we often say that multiplying and dividing IEEE floating point numbers is of run-time complexity O(1) - which is not always the case (e.g. on an 8086 processor without FPU).
For the above algorithm, the O(1) space complexity only holds as long as your input list has no element larger than sys.maxint (2147483647 on a 32-bit build of Python 2). For integers in that range, CPython stores the value in a single machine word; your processor has all the relevant operations implemented in hardware, they generally take only a constant number of clock cycles (in most cases only one), and only a constant amount of memory is occupied to store the result (= space complexity O(1)).
However, if your input exceeds that limit, Python switches to a software-implemented arbitrary-precision integer type (long in Python 2, the only int type in Python 3). Operations on these are no longer O(1), and the values require more than constant space.
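To see how much the fixed-size-integer assumption matters in CPython specifically, you can watch the memory footprint of an int grow with its magnitude (sys.getsizeof results are CPython-specific and given in bytes):

import sys

for exponent in (1, 30, 100, 1_000, 100_000):
    value = 2 ** exponent
    print(exponent, sys.getsizeof(value))   # the footprint grows with the bit length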
