Does string slicing perform copy in memory? [duplicate]

Does string slicing perform copy in memory? [duplicate] - python

This question already has answers here:
Does Python do slice-by-reference on strings?
(2 answers)
Closed 2 years ago.
I'm wondering if :
a = "abcdef"
b = "def"
if a[3:] == b:
print("something")
does actually perform a copy of the "def" part of a somewhere in memory, or if the letters checking is done in-place ?
Note : I'm speaking about a string, not a list (for which I know the answer)

String slicing makes a copy in CPython.
Looking in the source, this operation is handled in unicodeobject.c:unicode_subscript. There is evidently a special-case to re-use memory when the step is 1, start is 0, and the entire content of the string is sliced - this goes into unicode_result_unchanged and there will not be a copy. However, the general case calls PyUnicode_Substring where all roads lead to a memcpy.
To empirically verify these claims, you can use a stdlib memory profiling tool tracemalloc:
# s.py
import tracemalloc
tracemalloc.start()
before = tracemalloc.take_snapshot()
a = "." * 7 * 1024**2 # 7 MB of ..... # line 6, first alloc
b = a[1:] # line 7, second alloc
after = tracemalloc.take_snapshot()
for stat in after.compare_to(before, 'lineno')[:2]:
print(stat)
You should see the top two statistics output like this:
/tmp/s.py:6: size=7168 KiB (+7168 KiB), count=1 (+1), average=7168 KiB
/tmp/s.py:7: size=7168 KiB (+7168 KiB), count=1 (+1), average=7168 KiB
This result shows two allocations of 7 meg, strong evidence of the memory copying, and the exact line numbers of those allocations will be indicated.
Try changing the slice from b = a[1:] into b = a[0:] to see that entire-string-special-case in effect: there should be only one large allocation now, and sys.getrefcount(a) will increase by one.
In theory, since strings are immutable, an implementation could re-use memory for substring slices. This would likely complicate any reference-counting based garbage collection process, so it may not be a useful idea in practice. Consider the case where a small slice from a much larger string was taken - unless you implemented some kind of sub-reference counting on the slice, the memory from the much larger string could not be freed until the end of the substring's lifetime.
For users that specifically need a standard type which can be sliced without copying the underlying data, there is memoryview. See What exactly is the point of memoryview in Python for more information about that.

Possible talking point (feel free to edit adding information).
I have just written this test to verify empirically what the answer to the question might be (this cannot and does not want to be a certain answer).
import sys
a = "abcdefg"
print("a id:", id(a))
print("a[2:] id:", id(a[2:]))
print("a[2:] is a:", a[2:] is a)
print("Empty string memory size:", sys.getsizeof(""))
print("a memory size:", sys.getsizeof(a))
print("a[2:] memory size:", sys.getsizeof(a[2:]))
Output:
a id: 139796109961712
a[2:] id: 139796109962160
a[2:] is a: False
Empty string memory size: 49
a memory size: 56
a[2:] memory size: 54
As we can see here:
the size of an empty string object is 49 bytes
a single character occupies 1 byte (Latin-1 encoding)
a and a[2:] ids are different
the occupied memory of each a and a[2:] is consistent with the memory occupied by a string with that number of char assigned

Related

Is there away to allocate memory to a long string in python?

I am working with strings that are recursively concatenated to lengths of around 80 million characters. Python slows down dramatically as the string length increases.
Consider the following loop:
s = ''
for n in range(0,r):
s += 't'
I measure the time it takes to run to be 86ms for r = 800,000, 3.11 seconds for r = 8,000,000 and 222 seconds for r = 80,000,000
I am guessing this has something to do with how python allocates additional memory for the string. Is there a way to speed this up, such as allocating the full 80MB to the string s when it is declared?

When you have a string value and change it in your program, then the previous value will remain in a part of memory and the changed string will be placed in a new part of RAM.
As a result, the old values in RAM remain unused.
For this purpose, Garbage Collector is used and cleans your RAM from old, unused values But it will take time.
You can do this yourself. You can use the gc module to see different generations of your objects See this:
import gc
print(gc.get_threshold())
result:
(596, 2, 1)
In this example, we have 596 objects in our youngest generation, two objects in the next generation, and one object in the oldest generation.
For this reason, the allocation speed may be slow and your program may slow down
use this link to efficient String Concatenation in Python
Good luck.

It can't be done in a straightforward way with text (string) objects, but it it is trivial if you are dealing with bytes - in that case, you can create a bytearray object larger than your final outcome and insert your values into it.
If you need the final object as text, you can then decode it to text, the single step will be fast enough.
As you don't state what is the nature of your data, it may become harder - if no single-byte encoding can cover all the characters you need, you have to resort to a variable-lenght encoding, such as utf-8, or a multibyte encoding, such as utf-16 or 32. In both cases, it is no problem if you keep proper track of your insetion index - which will be also your final datasize for re-encoding. (If all you are using are genetic "GATACA" strings, just use ASCII encoding, and you are done)
data = bytearray(100_000_000) # 100 million positions -
index = 0
for character in input_data:
v = character.encode("utf-8")
s = len(v)
if s == 1:
data[index] = v
else:
data[index: index + len(v)] = v
index += len(v)
data_as_text = data[:index].decode("utf-8")

Why tempory variable calculating in python for-loop takes so much memory usage? [duplicate]

This question already has answers here:
Python string interning
(2 answers)
About the changing id of an immutable string
(5 answers)
Closed 3 years ago.
The following two codes are equivalent, but the first one takes about 700M memory, the latter one takes only about 100M memory(via windows task manager). What happens here?
def a():
lst = []
for i in range(10**7):
t = "a"
t = t * 2
lst.append(t)
return lst
_ = a()
def a():
lst = []
for i in range(10**7):
t = "a" * 2
lst.append(t)
return lst
_ = a()

#vurmux presented the right reason for the different memory usage: string interning, but some important details seem to be missing.
CPython-implementation interns some strings during the compilation, e.g "a"*2 - for more info about how/why "a"*2 gets interned see this SO-post.
Clarification: As #MartijnPieters has correctly pointed out in his comment: the important thing is whether the compiler does the constant-folding (e.g. evaluates the multiplication of two constants "a"*2) or not. If constant-folding is done, the resulting constant will be used and all elements in the list will be references to the same object, otherwise not. Even if all string constants get interned (and thus constant folding performed => string interned) - still it was sloppy to speak about interning: constant folding is the key here, as it explains the behavior also for types which have no interning at all, for example floats (if we would use t=42*2.0).
Whether constant folding has happened, can be easily verified with dis-module (I call your second version a2()):
>>> import dis
>>> dis.dis(a2)
...
4 18 LOAD_CONST 2 ('aa')
20 STORE_FAST 2 (t)
...
As we can see, during the run time the multiplication isn't performed, but directly the result (which was computed during the compiler time) of the multiplication is loaded - the resulting list consists of references to the same object (the constant loaded with 18 LOAD_CONST 2):
>>> len({id(s) for s in a2()})
1
There, only 8 bytes per reference are needed, that means about 80Mb (+overalocation of the list + memory needed for the interpreter) memory needed.
In Python3.7 constant folding isn't performed if the resulting string has more than 4096 characters, so replacing "a"*2 with "a"*4097 leads to the following byte-code:
>>> dis.dis(a1)
...
4 18 LOAD_CONST 2 ('a')
20 LOAD_CONST 3 (4097)
22 BINARY_MULTIPLY
24 STORE_FAST 2 (t)
...
Now, the multiplication isn't precalculated, the references in the resulting string will be of different objects.
The optimizer is yet not smart enough to recognize, that t is actually "a" in t=t*2, otherwise it would be able to perform the constant folding, but for now the resulting byte-code for your first version (I call it a2()):
...
5 22 LOAD_CONST 3 (2)
24 LOAD_FAST 2 (t)
26 BINARY_MULTIPLY
28 STORE_FAST 2 (t)
...
and it will return a list with 10^7 different objects (but all object being equal) inside:
>>> len({id(s) for s in a1()})
10000000
i.e. you will need about 56 bytes per string (sys.getsizeof returns 51, but because the pymalloc-memory-allocator is 8-byte aligned, 5 bytes will be wasted) + 8 bytes per reference (assuming 64bit-CPython-version), thus about 610Mb (+overalocation of the list + memory needed for the interpreter).
You can enforce the interning of the string via sys.intern:
import sys
def a1_interned():
lst = []
for i in range(10**7):
t = "a"
t = t * 2
# here ensure, that the string-object gets interned
# returned value is the interned version
t = sys.intern(t)
lst.append(t)
return lst
And realy, we can now not only see, that less memory is needed, but also that the list has references to the same object (see it online for a slightly smaller size(10^5) here):
>>> len({id(s) for s in a1_interned()})
1
>>> all((s=="aa" for s in a1_interned())
True
String interning can save a lot of memory, but it is sometimes tricky to understand, whether/why a string gets interned or not. Calling sys.intern explicitly eliminates this uncertainty.
The existence of additional temporary objects referenced by t is not the problem: CPython uses reference counting for memory managment, so an object gets deleted as soon as there is no references to it - without any interaction from the garbage collector, which in CPython is only used to break-up cycles (which is different to for example Java's GC, as Java doesn't use reference counting). Thus, temporary variables are really temporaries - those objects cannot be accumulated to make any impact on memory usage.
The problem with the temporary variable t is only that it prevents peephole optimization during the compilation, which is performed for "a"*2 but not for t*2.

This difference is exist because of string interning in Python interpreter:
String interning is the method of caching particular strings in memory as they are instantiated. The idea is that, since strings in Python are immutable objects, only one instance of a particular string is needed at a time. By storing an instantiated string in memory, any future references to that same string can be directed to refer to the singleton already in existence, instead of taking up new memory.
Let me show it in a simple example:
>>> t1 = 'a'
>>> t2 = t1 * 2
>>> t2 is 'aa'
False
>>> t1 = 'a'
>>> t2 = 'a'*2
>>> t2 is 'aa'
True
When you use the first variant, the Python string interning is not used so the interpreter creates additional internal variables to store temporal data. It can't optimize many-lines-code this way.
I am not a Python guru, but I think the interpreter works this way:
t = "a"
t = t * 2
In the first line it creates an object for t. In the second line it creates a temporary object for t right of the = sign and writes the result in the third place in the memory (with GC called later). So the second variant should use at least 3 times less memory than the first.
P.S. You can read more about string interning here.

Why does python forbid the use of sum with strings? [duplicate]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
Python has a built in function sum, which is effectively equivalent to:
def sum2(iterable, start=0):
return start + reduce(operator.add, iterable)
for all types of parameters except strings. It works for numbers and lists, for example:
sum([1,2,3], 0) = sum2([1,2,3],0) = 6 #Note: 0 is the default value for start, but I include it for clarity
sum({888:1}, 0) = sum2({888:1},0) = 888
Why were strings specially left out?
sum( ['foo','bar'], '') # TypeError: sum() can't sum strings [use ''.join(seq) instead]
sum2(['foo','bar'], '') = 'foobar'
I seem to remember discussions in the Python list for the reason, so an explanation or a link to a thread explaining it would be fine.
Edit: I am aware that the standard way is to do "".join. My question is why the option of using sum for strings was banned, and no banning was there for, say, lists.
Edit 2: Although I believe this is not needed given all the good answers I got, the question is: Why does sum work on an iterable containing numbers or an iterable containing lists but not an iterable containing strings?

Python tries to discourage you from "summing" strings. You're supposed to join them:
"".join(list_of_strings)
It's a lot faster, and uses much less memory.
A quick benchmark:
$ python -m timeit -s 'import operator; strings = ["a"]*10000' 'r = reduce(operator.add, strings)'
100 loops, best of 3: 8.46 msec per loop
$ python -m timeit -s 'import operator; strings = ["a"]*10000' 'r = "".join(strings)'
1000 loops, best of 3: 296 usec per loop
Edit (to answer OP's edit): As to why strings were apparently "singled out", I believe it's simply a matter of optimizing for a common case, as well as of enforcing best practice: you can join strings much faster with ''.join, so explicitly forbidding strings on sum will point this out to newbies.
BTW, this restriction has been in place "forever", i.e., since the sum was added as a built-in function (rev. 32347)

You can in fact use sum(..) to concatenate strings, if you use the appropriate starting object! Of course, if you go this far you have already understood enough to use "".join(..) anyway..
>>> class ZeroObject(object):
... def __add__(self, other):
... return other
...
>>> sum(["hi", "there"], ZeroObject())
'hithere'

Here's the source: http://svn.python.org/view/python/trunk/Python/bltinmodule.c?revision=81029&view=markup
In the builtin_sum function we have this bit of code:
/* reject string values for 'start' parameter */
if (PyObject_TypeCheck(result, &PyBaseString_Type)) {
PyErr_SetString(PyExc_TypeError,
"sum() can't sum strings [use ''.join(seq) instead]");
Py_DECREF(iter);
return NULL;
}
Py_INCREF(result);
}
So.. that's your answer.
It's explicitly checked in the code and rejected.

From the docs:
The preferred, fast way to concatenate a
sequence of strings is by calling
''.join(sequence).
By making sum refuse to operate on strings, Python has encouraged you to use the correct method.

Short answer: Efficiency.
Long answer: The sum function has to create an object for each partial sum.
Assume that the amount of time required to create an object is directly proportional to the size of its data. Let N denote the number of elements in the sequence to sum.
doubles are always the same size, which makes sum's running time O(1)×N = O(N).
int (formerly known as long) is arbitary-length. Let M denote the absolute value of the largest sequence element. Then sum's worst-case running time is lg(M) + lg(2M) + lg(3M) + ... + lg(NM) = N×lg(M) + lg(N!) = O(N log N).
For str (where M = the length of the longest string), the worst-case running time is M + 2M + 3M + ... + NM = M×(1 + 2 + ... + N) = O(N²).
Thus, summing strings would be much slower than summing numbers.
str.join does not allocate any intermediate objects. It preallocates a buffer large enough to hold the joined strings, and copies the string data. It runs in O(N) time, much faster than sum.

The Reason Why
#dan04 has an excellent explanation for the costs of using sum on large lists of strings.
The missing piece as to why str is not allowed for sum is that many, many people were trying to use sum for strings, and not many use sum for lists and tuples and other O(n**2) data structures. The trap is that sum works just fine for short lists of strings, but then gets put in production where the lists can be huge, and the performance slows to a crawl. This was such a common trap that the decision was made to ignore duck-typing in this instance, and not allow strings to be used with sum.

Edit: Moved the parts about immutability to history.
Basically, its a question of preallocation. When you use a statement such as
sum(["a", "b", "c", ..., ])
and expect it to work similar to a reduce statement, the code generated looks something like
v1 = "" + "a" # must allocate v1 and set its size to len("") + len("a")
v2 = v1 + "b" # must allocate v2 and set its size to len("a") + len("b")
...
res = v10000 + "$" # must allocate res and set its size to len(v9999) + len("$")
In each of these steps a new string is created, which for one might give some copying overhead as the strings are getting longer and longer. But that’s maybe not the point here. What’s more important, is that every new string on each line must be allocated to it’s specific size (which. I don’t know it it must allocate in every iteration of the reduce statement, there might be some obvious heuristics to use and Python might allocate a bit more here and there for reuse – but at several points the new string will be large enough that this won’t help anymore and Python must allocate again, which is rather expensive.
A dedicated method like join, however has the job to figure out the real size of the string before it starts and would therefore in theory only allocate once, at the beginning and then just fill that new string, which is much cheaper than the other solution.

I dont know why, but this works!
import operator
def sum_of_strings(list_of_strings):
return reduce(operator.add, list_of_strings)

Why do tuples take less space in memory than lists?

A tuple takes less memory space in Python:
>>> a = (1,2,3)
>>> a.__sizeof__()
48
whereas lists takes more memory space:
>>> b = [1,2,3]
>>> b.__sizeof__()
64
What happens internally on the Python memory management?

I assume you're using CPython and with 64bits (I got the same results on my CPython 2.7 64-bit). There could be differences in other Python implementations or if you have a 32bit Python.
Regardless of the implementation, lists are variable-sized while tuples are fixed-size.
So tuples can store the elements directly inside the struct, lists on the other hand need a layer of indirection (it stores a pointer to the elements). This layer of indirection is a pointer, on 64bit systems that's 64bit, hence 8bytes.
But there's another thing that lists do: They over-allocate. Otherwise list.append would be an O(n) operation always - to make it amortized O(1) (much faster!!!) it over-allocates. But now it has to keep track of the allocated size and the filled size (tuples only need to store one size, because allocated and filled size are always identical). That means each list has to store another "size" which on 64bit systems is a 64bit integer, again 8 bytes.
So lists need at least 16 bytes more memory than tuples. Why did I say "at least"? Because of the over-allocation. Over-allocation means it allocates more space than needed. However, the amount of over-allocation depends on "how" you create the list and the append/deletion history:
>>> l = [1,2,3]
>>> l.__sizeof__()
64
>>> l.append(4) # triggers re-allocation (with over-allocation), because the original list is full
>>> l.__sizeof__()
96
>>> l = []
>>> l.__sizeof__()
40
>>> l.append(1) # re-allocation with over-allocation
>>> l.__sizeof__()
72
>>> l.append(2) # no re-alloc
>>> l.append(3) # no re-alloc
>>> l.__sizeof__()
72
>>> l.append(4) # still has room, so no over-allocation needed (yet)
>>> l.__sizeof__()
72
Images
I decided to create some images to accompany the explanation above. Maybe these are helpful
This is how it (schematically) is stored in memory in your example. I highlighted the differences with red (free-hand) cycles:
That's actually just an approximation because int objects are also Python objects and CPython even reuses small integers, so a probably more accurate representation (although not as readable) of the objects in memory would be:
Useful links:
tuple struct in CPython repository for Python 2.7
list struct in CPython repository for Python 2.7
int struct in CPython repository for Python 2.7
Note that __sizeof__ doesn't really return the "correct" size! It only returns the size of the stored values. However when you use sys.getsizeof the result is different:
>>> import sys
>>> l = [1,2,3]
>>> t = (1, 2, 3)
>>> sys.getsizeof(l)
88
>>> sys.getsizeof(t)
72
There are 24 "extra" bytes. These are real, that's the garbage collector overhead that isn't accounted for in the __sizeof__ method. That's because you're generally not supposed to use magic methods directly - use the functions that know how to handle them, in this case: sys.getsizeof (which actually adds the GC overhead to the value returned from __sizeof__).

I'll take a deeper dive into the CPython codebase so we can see how the sizes are actually calculated. In your specific example, no over-allocations have been performed, so I won't touch on that.
I'm going to use 64-bit values here, as you are.
The size for lists is calculated from the following function, list_sizeof:
static PyObject *
list_sizeof(PyListObject *self)
{
Py_ssize_t res;
res = _PyObject_SIZE(Py_TYPE(self)) + self->allocated * sizeof(void*);
return PyInt_FromSsize_t(res);
}
Here Py_TYPE(self) is a macro that grabs the ob_type of self (returning PyList_Type) while _PyObject_SIZE is another macro that grabs tp_basicsize from that type. tp_basicsize is calculated as sizeof(PyListObject) where PyListObject is the instance struct.
The PyListObject structure has three fields:
PyObject_VAR_HEAD # 24 bytes
PyObject **ob_item; # 8 bytes
Py_ssize_t allocated; # 8 bytes
these have comments (which I trimmed) explaining what they are, follow the link above to read them. PyObject_VAR_HEAD expands into three 8 byte fields (ob_refcount, ob_type and ob_size) so a 24 byte contribution.
So for now res is:
sizeof(PyListObject) + self->allocated * sizeof(void*)
or:
40 + self->allocated * sizeof(void*)
If the list instance has elements that are allocated. the second part calculates their contribution. self->allocated, as it's name implies, holds the number of allocated elements.
Without any elements, the size of lists is calculated to be:
>>> [].__sizeof__()
40
i.e the size of the instance struct.
tuple objects don't define a tuple_sizeof function. Instead, they use object_sizeof to calculate their size:
static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
Py_ssize_t res, isize;
res = 0;
isize = self->ob_type->tp_itemsize;
if (isize > 0)
res = Py_SIZE(self) * isize;
res += self->ob_type->tp_basicsize;
return PyInt_FromSsize_t(res);
}
This, as for lists, grabs the tp_basicsize and, if the object has a non-zero tp_itemsize (meaning it has variable-length instances), it multiplies the number of items in the tuple (which it gets via Py_SIZE) with tp_itemsize.
tp_basicsize again uses sizeof(PyTupleObject) where the PyTupleObject struct contains:
PyObject_VAR_HEAD # 24 bytes
PyObject *ob_item[1]; # 8 bytes
So, without any elements (that is, Py_SIZE returns 0) the size of empty tuples is equal to sizeof(PyTupleObject):
>>> ().__sizeof__()
24
huh? Well, here's an oddity which I haven't found an explanation for, the tp_basicsize of tuples is actually calculated as follows:
sizeof(PyTupleObject) - sizeof(PyObject *)
why an additional 8 bytes is removed from tp_basicsize is something I haven't been able to find out. (See MSeifert's comment for a possible explanation)
But, this is basically the difference in your specific example. lists also keep around a number of allocated elements which helps determine when to over-allocate again.
Now, when additional elements are added, lists do indeed perform this over-allocation in order to achieve O(1) appends. This results in greater sizes as MSeifert's covers nicely in his answer.

MSeifert answer covers it broadly; to keep it simple you can think of:
tuple is immutable. Once set, you can't change it. So you know in advance how much memory you need to allocate for that object.
list is mutable. You can add or remove items to or from it. It has to know its current size. It resizes as needed.
There are no free meals - these capabilities comes with a cost. Hence the overhead in memory for lists.

The size of the tuple is prefixed, i.e. at tuple initialization the interpreter allocates enough space for the contained data and hence it's immutable (can't be modified). Whereas a list is a mutable object, hence implying dynamic allocation of memory, so to avoid allocating space each time you append or modify the list (allocate enough space to contain the changed data and copy the data to it), it allocates additional space for future runtime changes, like appends and modifications.
That pretty much sums it up.

Size of list in memory

I just experimented with the size of python data structures in memory. I wrote the following snippet:
import sys
lst1=[]
lst1.append(1)
lst2=[1]
print(sys.getsizeof(lst1), sys.getsizeof(lst2))
I tested the code on the following configurations:
Windows 7 64bit, Python3.1: the output is: 52 40 so lst1 has 52 bytes and lst2 has 40 bytes.
Ubuntu 11.4 32bit with Python3.2: output is 48 32
Ubuntu 11.4 32bit Python2.7: 48 36
Can anyone explain to me why the two sizes differ although both are lists containing a 1?
In the python documentation for the getsizeof function I found the following: ...adds an additional garbage collector overhead if the object is managed by the garbage collector. Could this be the case in my little example?

Here's a fuller interactive session that will help me explain what's going on (Python 2.6 on Windows XP 32-bit, but it doesn't matter really):
>>> import sys
>>> sys.getsizeof([])
36
>>> sys.getsizeof([1])
40
>>> lst = []
>>> lst.append(1)
>>> sys.getsizeof(lst)
52
>>>
Note that the empty list is a bit smaller than the one with [1] in it. When an element is appended, however, it grows much larger.
The reason for this is the implementation details in Objects/listobject.c, in the source of CPython.
Empty list
When an empty list [] is created, no space for elements is allocated - this can be seen in PyList_New. 36 bytes is the amount of space required for the list data structure itself on a 32-bit machine.
List with one element
When a list with a single element [1] is created, space for one element is allocated in addition to the memory required by the list data structure itself. Again, this can be found in PyList_New. Given size as argument, it computes:
nbytes = size * sizeof(PyObject *);
And then has:
if (size <= 0)
op->ob_item = NULL;
else {
op->ob_item = (PyObject **) PyMem_MALLOC(nbytes);
if (op->ob_item == NULL) {
Py_DECREF(op);
return PyErr_NoMemory();
}
memset(op->ob_item, 0, nbytes);
}
Py_SIZE(op) = size;
op->allocated = size;
So we see that with size = 1, space for one pointer is allocated. 4 bytes (on my 32-bit box).
Appending to an empty list
When calling append on an empty list, here's what happens:
PyList_Append calls app1
app1 asks for the list's size (and gets 0 as an answer)
app1 then calls list_resize with size+1 (1 in our case)
list_resize has an interesting allocation strategy, summarized in this comment from its source.
Here it is:
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
/* check for integer overflow */
if (new_allocated > PY_SIZE_MAX - newsize) {
PyErr_NoMemory();
return -1;
} else {
new_allocated += newsize;
}
Let's do some math
Let's see how the numbers I quoted in the session in the beginning of my article are reached.
So 36 bytes is the size required by the list data structure itself on 32-bit. With a single element, space is allocated for one pointer, so that's 4 extra bytes - total 40 bytes. OK so far.
When app1 is called on an empty list, it calls list_resize with size=1. According to the over-allocation algorithm of list_resize, the next largest available size after 1 is 4, so place for 4 pointers will be allocated. 4 * 4 = 16 bytes, and 36 + 16 = 52.
Indeed, everything makes sense :-)

sorry, previous comment was a bit curt.
what's happening is that you're looking at how lists are allocated (and i think maybe you just wanted to see how big things were - in that case, use sys.getsizeof())
when something is added to a list, one of two things can happen:
the extra item fits in spare space
extra space is needed, so a new list is made, and the contents copied across, and the extra thing added.
since (2) is expensive (copying things, even pointers, takes time proportional to the number of things to be copied, so grows as lists get large) we want to do it infrequently. so instead of just adding a little more space, we add a whole chunk. typically the size of the amount added is similar to what is already in use - that way the maths works out that the average cost of allocating memory, spread out over many uses, is only proportional to the list size.
so what you are seeing is related to this behaviour. i don't know the exact details, but i wouldn't be surprised if [] or [1] (or both) are special cases, where only enough memory is allocated (to save memory in these common cases), and then appending does the "grab a new chunk" described above that adds more.
but i don't know the exact details - this is just how dynamic arrays work in general. the exact implementation of lists in python will be finely tuned so that it is optimal for typical python programs. so all i am really saying is that you can't trust the size of a list to tell you exactly how much it contains - it may contain extra space, and the amount of extra free space is difficult to judge or predict.
ps a neat alternative to this is to make lists as (value, pointer) pairs, where each pointer points to the next tuple. in this way you can grow lists incrementally, although the total memory used is higher. that is a linked list (what python uses is more like a vector or a dynamic array).
[update] see Eli's excellent answer. they explain that both [] and [1] are allocated exactly, but that appending to [] allocates an extra chunk. the comment in the code is what i am saying above (this is called "over-allocation" and the amount is porportional to what we have so that the average ("amortised") cost is proportional to size).

Here's a quick demonstration of the list growth pattern. Changing the third argument in range() will change the output so it doesn't look like the comments in listobject.c, but the result when simply appending one element seem to be perfectly accurate.
allocated = 0
for newsize in range(0,100,1):
if (allocated < newsize):
new_allocated = (newsize >> 3) + (3 if newsize < 9 else 6)
allocated = newsize + new_allocated;
print newsize, allocated

formula changes based on the system architecture
(size-36)/4 for 32 bit machines and
(size-64)/8 for 64 bit machines
36,64 - size of an empty list based on machine
4,8 - size of a single element in the list based on machine

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.