What causes [*a] to overallocate?

Apparently list(a) doesn't overallocate, [x for x in a] overallocates at some points, and [*a] overallocates all the time?
Here are sizes n from 0 to 12 and the resulting sizes in bytes for the three methods:
 n  list(a)  [x for x in a]  [*a]
 0       56              56    56
 1       64              88    88
 2       72              88    96
 3       80              88   104
 4       88              88   112
 5       96             120   120
 6      104             120   128
 7      112             120   136
 8      120             120   152
 9      128             184   184
10      136             184   192
11      144             184   200
12      152             184   208
Computed like this, reproducible at repl.it, using Python 3.8:
from sys import getsizeof
for n in range(13):
    a = [None] * n
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]))
So: How does this work? How does [*a] overallocate? Actually, what mechanism does it use to create the result list from the given input? Does it use an iterator over a and use something like list.append? Where is the source code?
(The original post includes two plots of these sizes, one zoomed in to smaller n and one zoomed out to larger n, plus a link to a Colab with the data and code that produced them.)

[*a] is internally doing the C equivalent of:
Make a new, empty list
Call newlist.extend(a)
Return the list.
So if you expand your test to:
from sys import getsizeof

for n in range(13):
    a = [None] * n
    l = []
    l.extend(a)
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]),
             getsizeof(l))
Try it online!
you'll see the results for getsizeof([*a]) and l = []; l.extend(a); getsizeof(l) are the same.
This is usually the right thing to do; when extending you're usually expecting to add more later, and similarly for generalized unpacking, it's assumed that multiple things will be added one after the other. [*a] is not the normal case; Python assumes there are multiple items or iterables being added to the list ([*a, b, c, *d]), so overallocation saves work in the common case.
By contrast, a list constructed from a single, presized iterable (with list()) may not grow or shrink during use, and overallocating is premature until proven otherwise; Python recently fixed a bug that made the constructor overallocate even for inputs with known size.
As for list comprehensions, they're effectively equivalent to repeated appends, so you're seeing the final result of the normal overallocation growth pattern when adding an element at a time.
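If you want to see that equivalence directly, a quick check (my own sketch, same environment as above: 64-bit CPython 3.8) is to compare a comprehension against a list built by appending one element at a time; the reported sizes should line up:

from sys import getsizeof

for n in range(13):
    a = [None] * n
    appended = []
    for x in a:
        appended.append(x)   # grows via the normal append overallocation pattern
    print(n, getsizeof([x for x in a]), getsizeof(appended))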
To be clear, none of this is a language guarantee. It's just how CPython implements it. The Python language spec is generally unconcerned with specific growth patterns in list (aside from guaranteeing amortized O(1) appends and pops from the end). As noted in the comments, the specific implementation changes again in 3.9; while it won't affect [*a], it could affect other cases where what used to be "build a temporary tuple of individual items and then extend with the tuple" now becomes multiple applications of LIST_APPEND, which can change when the overallocation occurs and what numbers go into the calculation.

Full picture of what happens, building on the other answers and comments (especially ShadowRanger's answer, which also explains why it's done like that).
Disassembling shows that BUILD_LIST_UNPACK gets used:
>>> import dis
>>> dis.dis('[*a]')
1 0 LOAD_NAME 0 (a)
2 BUILD_LIST_UNPACK 1
4 RETURN_VALUE
That's handled in ceval.c, which builds an empty list and extends it (with a):
case TARGET(BUILD_LIST_UNPACK): {
    ...
    PyObject *sum = PyList_New(0);
    ...
    none_val = _PyList_Extend((PyListObject *)sum, PEEK(i));
_PyList_Extend uses list_extend:
_PyList_Extend(PyListObject *self, PyObject *iterable)
{
    return list_extend(self, iterable);
}
Which calls list_resize with the sum of the sizes:
list_extend(PyListObject *self, PyObject *iterable)
    ...
    n = PySequence_Fast_GET_SIZE(iterable);
    ...
    m = Py_SIZE(self);
    ...
    if (list_resize(self, m + n) < 0) {
And that overallocates as follows:
list_resize(PyListObject *self, Py_ssize_t newsize)
{
    ...
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
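For example, with newsize = 12 that gives 12 + (12 >> 3) + 6 = 12 + 1 + 6 = 19 slots, i.e. 56 + 19 * 8 = 208 bytes on a 64-bit build, which matches the 208 bytes measured for n = 12 above.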
Let's check that. Compute the expected number of spots with the formula above, and compute the expected byte size by multiplying it by 8 (as I'm using 64-bit Python here) and adding an empty list's byte size (i.e., a list object's constant overhead):
from sys import getsizeof
for n in range(13):
    a = [None] * n
    expected_spots = n + (n >> 3) + (3 if n < 9 else 6)
    expected_bytesize = getsizeof([]) + expected_spots * 8
    real_bytesize = getsizeof([*a])
    print(n,
          expected_bytesize,
          real_bytesize,
          real_bytesize == expected_bytesize)
Output:
0 80 56 False
1 88 88 True
2 96 96 True
3 104 104 True
4 112 112 True
5 120 120 True
6 128 128 True
7 136 136 True
8 152 152 True
9 184 184 True
10 192 192 True
11 200 200 True
12 208 208 True
That matches except for n = 0, which list_extend actually special-cases with an early return, so that matches too:
if (n == 0) {
    ...
    Py_RETURN_NONE;
}
...
if (list_resize(self, m + n) < 0) {

These are going to be implementation details of the CPython interpreter, and so may not be consistent across other interpreters.
That said, you can see where the comprehension and list(a) behaviors come in here:
https://github.com/python/cpython/blob/master/Objects/listobject.c#L36
Specifically for the comprehension:
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
...
new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
Just below those lines, there is list_preallocate_exact which is used when calling list(a).
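You can see the effect of that exact preallocation in the numbers from the question: list(a) always comes out at exactly the empty-list overhead plus one 8-byte pointer per element (a quick sketch of mine, again assuming 64-bit CPython 3.8):

from sys import getsizeof

for n in range(13):
    a = [None] * n
    exact = getsizeof([]) + n * 8   # constant list overhead + one pointer slot per element
    print(n, getsizeof(list(a)), exact, getsizeof(list(a)) == exact)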

Related

How to measure the true size of an object in Python (including its elements/children) [duplicate]

I am trying to create a list of size 1 MB. While the following code works:
dummy = ['a' for i in xrange(0, 1024)]
sys.getsizeof(dummy)
Out[1]: 9032
The following code does not work.
import os
import sys
dummy = []
dummy.append(os.urandom(1024))
sys.getsizeof(dummy)
Out[1]: 104
Can someone explain why?
If you're wondering why I am not using the first code snippet, I am writing a program to benchmark my memory by writing a for loop that writes blocks (of size 1 B, 1 KB and 1 MB) into memory.
start = time.time()
for i in xrange(1, (1024 * 10)):
    dummy.append(os.urandom(1024))  # loop to write 1 MB blocks into memory
end = time.time()
If you check the size of a list, it will give the size of the list data structure itself, including the pointers to its constituent elements. It won't include the size of the elements themselves.
str1_size = sys.getsizeof(['a' for i in xrange(0, 1024)])
str2_size = sys.getsizeof(['abc' for i in xrange(0, 1024)])
int_size = sys.getsizeof([123 for i in xrange(0, 1024)])
none_size = sys.getsizeof([None for i in xrange(0, 1024)])
str1_size == str2_size == int_size == none_size
The size of empty list: sys.getsizeof([]) == 72
Add an element: sys.getsizeof([1]) == 80
Add another element: sys.getsizeof([1, 1]) == 88
So each element adds 8 bytes.
To get 1024 bytes, we need (1024 - 72) / 8 = 119 elements.
The size of the list with 119 elements: sys.getsizeof([None for i in xrange(0, 119)]) == 1080.
This is because a list maintains an extra buffer for inserting more items, so that it doesn't have to resize every time. (The size comes out to the same 1080 bytes for any number of elements between 107 and 126.)
So what we need is an immutable data structure, which doesn't need to keep this buffer: a tuple.
empty_tuple_size = sys.getsizeof(()) # 56
single_element_size = sys.getsizeof((1,)) # 64
pointer_size = single_element_size - empty_tuple_size # 8
n_1mb = (1024 - empty_tuple_size) / pointer_size # (1024 - 56) / 8 = 121
tuple_1mb = (1,) * n_1mb
sys.getsizeof(tuple_1mb) == 1024
So this is your answer to get a 1024-byte data structure: (1,)*121
But note that this is only the size of tuple and the constituent pointers. For the total size, you actually need to add up the size of individual elements.
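If you actually want the "true" size from the question's title, including the elements themselves, you have to walk the object graph; here is a minimal sketch (the helper name deep_getsizeof and the set of handled container types are my own choices, not a standard API):

import sys

def deep_getsizeof(obj, seen=None):
    """Rough recursive size: the object itself plus everything it references."""
    if seen is None:
        seen = set()
    if id(obj) in seen:          # don't double-count shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(x, seen) for x in obj)
    return size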
Alternate:
sys.getsizeof('') == 37
sys.getsizeof('1') == 38 # each character adds 1 byte
For 1024 bytes, we need 987 characters:
sys.getsizeof('1'*987) == 1024
And this is the actual size, not just the size of pointers.

python random binary list, evenly distributed

I have code to make a binary list of any length I want, with a random number of bits turned on:
rand_binary_list = lambda n: [random.randint(0,1) for b in range(1,n+1)]
rand_binary_list(10)
this returns something like this:
[0,1,1,0,1,0,1,0,0,0]
and if you run it a million times you'll get a bell-curve distribution where sum(rand_binary_list(10)) comes out around 5 far more often than 1 or 10.
What I'd prefer is that having 1 bit turned on out of 10 is equally as likely as having half of them turned on. The number of bits turned on should be uniformly distributed.
I'm not sure how this can be done without compromising the integrity of the randomness. Any ideas?
EDIT:
I wanted to show this bell curve phenomenon explicitly so here it is:
>>> import random
>>> rand_binary_list = lambda n: [random.randint(0,1) for b in range(1,n+1)]
>>> counts = {0:0,1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0}
>>> for i in range(10000):
...     x = sum(rand_binary_list(10))
...     counts[x] = counts[x] + 1
...
>>> counts[0]
7
>>> counts[1]
89
>>> counts[2]
454
>>> counts[3]
1217
>>> counts[4]
2017
>>> counts[5]
2465
>>> counts[6]
1995
>>> counts[7]
1183
>>> counts[8]
460
>>> counts[9]
107
>>> counts[10]
6
See how the chances of getting 5 bits turned on are much higher than the chances of getting 1 bit turned on?
Something like this:
import random

def randbitlist(n=10):
    n_on = random.randint(0, n)       # how many bits are on: uniform over 0..n inclusive
    n_off = n - n_on
    result = [1]*n_on + [0]*n_off
    random.shuffle(result)            # spread the on-bits uniformly over the positions
    return result
The number of bits "on" should be uniformly distributed in [0, n] inclusive, and then those bits selected will be uniformly distributed throughout the list.
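To sanity-check the uniformity claim (a quick sketch of mine, using randbitlist as defined above):

from collections import Counter

counts = Counter(sum(randbitlist(10)) for _ in range(110000))
print(sorted(counts.items()))   # each total 0..10 should appear roughly 10000 times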

XTEA testvectors in Python

I am trying to test this implementation of the XTEA algorithm in Python. The only test vectors I have found are these.
How can I test the output of the algorithm so that I can compare it bytewise?
Which password/key should I choose? Which endian would be best?
(I am on 64 bit xubuntu/x86/little endian)
XTEA
# 64 bit block of data to encrypt
v0, v1 = struct.unpack(endian + "2L", block)
# 128 bit key
k = struct.unpack(endian + "4L", key)
sum, delta, mask = 0L, 0x9e3779b9L, 0xffffffffL
for round in range(n):
    v0 = (v0 + (((v1<<4 ^ v1>>5) + v1) ^ (sum + k[sum & 3]))) & mask
    sum = (sum + delta) & mask
    v1 = (v1 + (((v0<<4 ^ v0>>5) + v0) ^ (sum + k[sum>>11 & 3]))) & mask
return struct.pack(endian + "2L", v0, v1)
Initial 64 bit test input
# pack 000000 in 64 bit string
byte_string = ''
for c in range(56, -8, -8):
    byte_string += chr(000000 >> c & 0xff)
Testvectors (copied from here)
tean values
These are made by starting with a vector of 6 zeroes,
data followed by key, and coding with one cycle then
moving the six cyclically so that n becomes n-1 modulo 6.
We repeat with 2-64 cycles printing at powers of 2 in
hexadecimal. The process is reversed decoding back
to the original zeroes which are printed.
1 0 9e3779b9 0 0 0 0
2 ec01a1de aaa0256d 0 0 0 0
4 bc3a7de2 4e238eb9 0 0 ec01a1de 114f6d74
8 31c5fa6c 241756d6 bc3a7de2 845846cf 2794a127 6b8ea8b8
16 1d8e6992 9a478905 6a1d78c8 8c86d67 2a65bfbe b4bd6e46
32 d26428af a202283 27f917b1 c1da8993 60e2acaa a6eb923d
64 7a01cbc9 b03d6068 62ee209f 69b7afc 376a8936 cdc9e923
1 0 0 0 0 0 0
The C code you linked to seems to assume that a long has 32 bits -- XTEA uses a 64-bit block made of two uint32; the code uses a couple of long and doesn't do anything to handle the overflow which happens when you sum/leftshift (and propagates into later computations).
The python code lets you choose endianness, while the C code treats those numbers as... well, numbers, so if you want to compare them, you need to pick endianness (or if you're lazy, try both and see if one matches :)
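For the byte-wise comparison you eventually have to commit to a byte order when packing the test-vector words; a tiny sketch of checking both orders (nothing XTEA-specific, just struct):

import struct, binascii

word = 0x9e3779b9
print(binascii.hexlify(struct.pack('<L', word)))   # 'b979379e' -- little-endian byte order
print(binascii.hexlify(struct.pack('>L', word)))   # '9e3779b9' -- big-endian byte order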
Regarding the key, I'm not sure what your problem is, so I'll guess: in case you're not a C programmer, the line static long pz[1024], n, m; is a static declaration, meaning that all those values are implicitly initialized to zero.
Anything else I missed?

Recursive Permutation Generator, swapping list items not working

I want to systematically generate permutations of the alphabet.
I can't (and don't want to) use python itertools.permutations, because pregenerating a list of every permutation causes my computer to crash (first time I actually got it to force a shutdown, it was pretty great).
Therefore, my new approach is to generate and test each key on the fly. Currently, I am trying to handle this with recursion.
My idea is to start with the largest list (I'll use a 3-element list as an example) and recurse into smaller lists until the list is two elements long. Then it will print the list, swap the last two, print the list again, return up one level, and repeat.
For example, for 123
123 (swap position 0 with position 0)
23 --> 123 (swap position 1 with position 1)
32 --> 132 (swap position 1 with position 2)
213 (swap position 0 with position 1)
13 --> 213 (swap position 1 with position 1)
31 --> 231 (swap position 1 with position 2)
321 (swap position 0 with position 2)
21 --> 321 (swap position 1 with position 1)
12 --> 312 (swap position 1 with position 2)
For a four-element example (1234):
1234 (swap position 0 with position 0)
234 (swap position 1 with position 1)
34 --> 1234
43 --> 1243
324 (swap position 1 with position 2)
24 --> 1324
42 --> 1342
432 (swap position 1 with position 3)
32 --> 1432
23 --> 1423
2134 (swap position 0 for position 1)
134 (swap position 1 with position 1)
34 --> 2134
43 --> 2143
314 (swap position 1 with position 2)
14--> 2314
41--> 2341
431 (swap position 1 with position 3)
31--> 2431
13 -->2413
This is the code I currently have for the recursion, but it's causing me a lot of grief, recursion not being my strong suit. Here's what I have.
def perm(x, y, key):
    print "Perm called: X=", x, ", Y=", y, ", key=", key
    while (x < y):
        print "\tLooping Inward"
        print "\t", x, " ", y, " ", key
        x = x + 1
        key = perm(x, y, key)
        swap(x, y, key)
        print "\tAfter 'swap':", x, " ", y, " ", key, "\n"
    print "\nFull Depth Reached"
    #print key, " SWAPPED:? ", swap(x, y, key)
    print swap(x, y, key)
    print " X=", x, ", Y=", y, ", key=", key
    return key

def swap(x, y, key):
    v = key[x]
    key[x] = key[y]
    key[y] = v
    return key
Any help would be greatly appreciated, this is a really cool project and I don't want to abandon it.
Thanks to all! Comments on my method or anything are welcome.
Happened upon my old question later in my career
To efficiently do this, you want to write a generator.
Instead of returning a list of all of the permutations, which requires that you store them (all of them) in memory, a generator returns one permutation (one element of this list), then pauses, and then computes the next one when you ask for it.
The advantages to generators are:
They take up much less space.
A generator takes up between 40 and 80 bytes of space, yet one generator can generate millions of items.
(A list with one item takes up 40 bytes; a list with 1000 items takes up 4560 bytes.)
They are more efficient.
A generator only computes as many values as you need. In permuting the alphabet, if the correct permutation is found before the end of the list, the time spent generating all of the other permutations would have been wasted.
(itertools.permutations is an example of a generator.)
How do I Write a Generator?
Writing a generator in python is actually very easy.
Basically, write code that would work for generating a list of permutations. Now, instead of writing resultList += [resultItem], write yield resultItem.
Now you've made a generator. If I wanted to loop over my generator, I could write
for i in myGenerator:
It's that easy.
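For instance, a toy generator (my own example, not part of the permutation code below):

def countdown(n):
    while n > 0:
        yield n       # hand back one value, then pause until the next one is requested
        n -= 1

for i in countdown(3):
    print i           # prints 3, then 2, then 1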
Below is a generator for the code that I tried to write long ago:
def permutations(iterable, r=None):
    # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --> 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
        return
    indices = range(n)
    cycles = range(n, n-r, -1)
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
            else:
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])
                break
        else:
            return
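Using it to walk the permutations one at a time:

for p in permutations('abc'):
    print ''.join(p)   # abc, acb, bac, bca, cab, cba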
I think you have a really good idea, but keeping track of the positions might get a bit difficult to deal with. The general way I've seen for generating permutations recursively is a function which takes two string arguments: one to strip characters from (str) and one to add characters to (soFar).
When generating a permutation, we can think of taking characters from str and adding them to soFar. Assume we have a function perm that takes these two arguments and finds all permutations of str. We'll have permutations beginning with each character in str, so we just need to loop over str, using each of those characters as the initial character and calling perm on the characters remaining in the string:
# half Python, half pseudocode in the original; here as runnable Python:
def perm(str, soFar):
    if str == "":
        print soFar                        # here we have a valid permutation
        return
    for i in range(len(str)):
        next = soFar + str[i]              # choose str[i] as the next character
        remaining = str[:i] + str[i+1:]    # everything except str[i]
        perm(remaining, next)
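Calling it on the three-element example from the question:

perm("123", "")   # prints 123, 132, 213, 231, 312, 321, one per line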

Poisson simulation not working as expected?

I have a simple script to set up a Poisson distribution by constructing an array of "events" of probability = 0.1, and then counting the number of successes in each group of 10. It almost works, but the distribution is not quite right (P(0) should equal P(1), but is instead about 90% of P(1)). It's like there's an off-by-one kind of error, but I can't figure out what it is. The script uses the Counter class from here (because I have Python 2.6 and not 2.7) and the grouping uses itertools as discussed here. It's not a stochastic issue, repeats give pretty tight results, and the overall mean looks good, group size looks good. Any ideas where I've messed up?
from itertools import izip_longest
import numpy as np
import Counter
def groups(iterable, n=3, padvalue=0):
    "groups('abcde', 3, 'x') --> ('a','b','c'), ('d','e','x')"
    return izip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

def event():
    f = 0.1
    r = np.random.random()
    if r < f: return 1
    return 0

L = [event() for i in range(100000)]
rL = [sum(g) for g in groups(L, n=10)]
print len(rL)
print sum(list(L))
C = Counter.Counter(rL)
for i in range(max(C.keys())+1):
    print str(i).rjust(2), C[i]
$ python script.py
10000
9949
0 3509
1 3845
2 1971
3 555
4 104
5 15
6 1
$ python script.py
10000
10152
0 3417
1 3879
2 1978
3 599
4 115
5 12
I did a combinatorial reality check on your math, and it looks like your results are actually correct. P(0) should not be roughly equal to P(1):
.9^10 = 0.34867844 = probability of 0 events
.1 * .9^9 * (10 choose 1) = .1 * .9^9 * 10 = 0.387420489 = probability of 1 event
I wonder if you accidentally did your math thusly:
.1 * .9^10 * (10 choose 1) = 0.34867844 = incorrect probability of 1 event
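A quick way to see that your observed counts match the binomial expectation (my own sketch, not part of the original answer):

from math import factorial

def binom_pmf(k, n=10, p=0.1):
    choose = factorial(n) / (factorial(k) * factorial(n - k))
    return choose * p**k * (1 - p)**(n - k)

for k in range(7):
    # expected counts out of 10000 groups of 10: 3487, 3874, 1937, 574, 112, 15, 1
    print k, int(round(10000 * binom_pmf(k)))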
