Which takes less memory, a frozenset or a tuple? - python

I have an object which needs to be "tagged" with 0-3 strings (out of a set of 20-some possibilities); these values are all unique and order doesn't matter. The only operation that needs to be done on the tags is checking if a particular one is present or not (specific_value in self.tags).
However, there's an enormous number of these objects in memory at once, to the point that it pushes the limits of my old computer's RAM. So saving a few bytes can add up.
With so few tags on each object, I doubt the lookup time is going to matter much. But: is there a memory difference between using a tuple and a frozenset here? Is there any other real reason to use one over the other?

Tuples are very compact. Sets are based on hash tables, and depend on having "empty" slots to make hash collisions less likely.
For a recent enough version of CPython, sys._debugmallocstats() displays lots of potentially interesting info. Here under a 64-bit Python 3.7.3:
>>> from sys import _debugmallocstats as d
>>> tups = [tuple("abc") for i in range(1000000)]
tuple("abc") creates a tuple of 3 1-character strings, ('a', 'b', 'c'). Here I'll edit out almost all the output:
>>> d()
Small block threshold = 512, in 64 size classes.
class   size   num pools   blocks in use   avail blocks
-----   ----   ---------   -------------   ------------
...
    8     72       17941         1004692              4
Since we created a million tuples, it's a very good bet that the size class using 1004692 blocks is the one we want ;-) Each of the blocks consumes 72 bytes.
Switching to frozensets instead, the output shows that those consume 224 bytes each, a bit over 3x more:
>>> tups = [frozenset(t) for t in tups]
>>> d()
Small block threshold = 512, in 64 size classes.
class   size   num pools   blocks in use   avail blocks
-----   ----   ---------   -------------   ------------
...
   27    224       55561         1000092              6
In this particular case, the other answer you got happens to give the same results:
>>> import sys
>>> sys.getsizeof(tuple("abc"))
72
>>> sys.getsizeof(frozenset(tuple("abc")))
224
While that's often true, it's not always so, because an object may require allocating more bytes than it actually needs, to satisfy HW alignment requirements. getsizeof() doesn't know anything about that, but _debugmallocstats() shows the number of bytes Python's small-object allocator actually needs to use.
For example,
>>> sys.getsizeof("a")
50
On a 32-bit box, 52 bytes actually need to be used, to provide 4-byte alignment. On a 64-bit box, 8-byte alignment is currently required, so 56 bytes need to be used. Under Python 3.8 (not yet released), on a 64-bit box 16-byte alignment is required, and 64 bytes will need to be used.
But ignoring all that, a tuple will always need less memory than any form of set with the same number of elements - and even less than a list with the same number of elements.

sys.getsizeof seems like the stdlib option you want... but I feel queasy about your whole use case
import sys
t = ("foo", "bar", "baz")
f = frozenset(("foo","bar","baz"))
print(sys.getsizeof(t))
print(sys.getsizeof(f))
https://docs.python.org/3.7/library/sys.html#sys.getsizeof
All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific.
...So don't get comfy with this solution
EDIT: Obviously @TimPeters' answer is more correct...

If you're trying to save memory, consider
Trading off some elegance for some memory savings by moving the record of which tags are present on each object out into an external (singleton) data structure
Using a "flags" (bitmap) type approach, where each tag is mapped to a bit of a 32-bit integer. Then all you need is a (singleton) dict mapping from the object (identity) to a 32-bit integer (flags). If no flags are present, no entry in the dictionary.

There is a possibility to reduce memory if you replace the tuple with a type from the recordclass library:
>>> from recordclass import make_arrayclass
>>> Triple = make_arrayclass("Triple", 3)
>>> from sys import getsizeof as sizeof
>>> sizeof(Triple("ab","cd","ef"))
40
>>> sizeof(("ab","cd","ef"))
64
The difference is equal to the sizeof(PyGC_Head) + sizeof(Py_ssize_t).
P.S.: The numbers are measured on 64-bit Python 3.8.

Related

Clarification for "it should be possible to change the value of 1" from the CPython documentation

See this link: https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong
The current implementation keeps an array of integer objects for all integers between -5 and 256; when you create an int in that range, you actually just get back a reference to the existing object. So, it should be possible to change the value of 1. I suspect the behavior of Python, in this case, is undefined. :-)
What do the bold lines mean in this context?
It means that integers in Python are actual objects with a "value"-field to hold the integer's value. In Java, you could express Python's integers like so (leaving out a lot of details, of course):
class PyInteger {
    private int value;

    public PyInteger(int val) {
        this.value = val;
    }

    public PyInteger __add__(PyInteger other) {
        return new PyInteger(this.value + other.value);
    }
}
In order not to have hundreds of Python integers with the same value around, Python caches some integers, along the lines of:
PyInteger[] cache = {
    new PyInteger(0),
    new PyInteger(1),
    new PyInteger(2),
    ...
}
However, what would happen if you did something like this (let's ignore that value is private for a moment):
PyInteger one = cache[1]; // the PyInteger representing 1
one.value = 3;
Suddenly, every time you used 1 in your program, you would actually get back 3, because the object representing 1 has an effective value of 3.
Indeed, you can do that in Python! That is: it is possible to change the effective numeric value of an integer in Python. There is an answer in this reddit post. I copy it here for completeness, though (original credits go to Veedrac):
import ctypes

def deref(addr, typ):
    return ctypes.cast(addr, ctypes.POINTER(typ))

deref(id(29), ctypes.c_int)[6] = 100
#>>>

29
#>>> 100

29 ** 0.5
#>>> 10.0
The Python specification itself does not say anything about how integers are to be stored or represented internally. It also does not say which integers should be cached, or that any should be cached at all. In short: there is nothing in the Python specifications defining what should happen if you do something silly like this ;-).
We could even go slightly further...
In reality, the field value above is actually an array of integers, emulating an arbitrarily large integer value (for a 64-bit integer, you just combine two 32-bit fields, etc). However, when integers start to get large and outgrow a standard 32-bit integer, caching is no longer a viable option. Even if you used a dictionary, comparing integer arrays for equality would be too much of an overhead with too little gain.
You can actually check this yourself by using is to compare identities:
>>> 3 * 4 is 12
True
>>> 300 * 400 is 120000
False
>>> 300 * 400 == 120000
True
In a typical Python system, there is exactly one object representing the number 12. 120000, on the other hand, is hardly ever cached. So, above, 300 * 400 yields a new object representing 120000, which is different from the object created for the number on the right hand side.
Why is this relevant? If you change the value of a small number like 1 or 29, it will affect all calculations that use that number. You will most likely seriously break your system (until you restart). But if you change the value of a large integer, the effects will be minimal.
Changing the value of 12 to 13 means that 3 * 4 will yield 13. Changing the value of 120000 to 130000 has much less effect and 300 * 400 will still yield (a new) 120000 and not 130000.
As soon as you take other Python implementations into the picture, things can get even harder to predict. MicroPython, for instance, does not have objects for small numbers, but emulates them on the fly, and PyPy might well just optimise your changes away.
Bottomline: the exact behaviour of numbers that you tinker with is truly undefined, but depends on several factors and the exact implementation.
Answer to a question in the comments: What is the significance of 6 in Veedrac's code above?
All objects in Python share a common memory layout. The first field is a reference counter that tells you how many other objects are currently referring to this object. The second field is a reference to the object's class or type. Since integers do not have a fixed size, the third field is the size of the data part (you can find the relevant definitions here (general objects) and here (integers/longs)):
struct longObject {
    native_int     ref_counter;   // offset: +0 / +0
    PyObject*      type;          // offset: +1 / +2
    native_int     size;          // offset: +2 / +4
    unsigned short value[];       // offset: +3 / +6
}
On a 32-bit system, native_int and PyObject* both occupy 32 bits, whereas on a 64-bit system they occupy 64 bits, naturally. So, if we access the data as 32 bits (using ctypes.c_int) on a 64-bit system, the actual value of the integer is to be found at offset +6. If you change the type to ctypes.c_long, on the other hand, the offset is +3.
Because id(x) in CPython returns the memory address of x, you can actually check this yourself. Based on the above deref function, let's do:
>>> deref(id(29), ctypes.c_long)[3]
29
>>> deref(id(29), ctypes.c_long)[1]
10277248
>>> id(int) # memory address of class "int"
10277248
Since the object is returned by reference, if you change the object, it changes for everything in the program.
So taking the value 1 as an example, you could change it to 42.
This is only possible because the C API gives you internal access to the Python interpreter; it feels unlikely you could do this from within a Python script itself (without using something like cffi for example).
Another way to think about what would happen if you “changed the value at 1's address to 17” in the internals: just print each element of range(3) and you would see 0, 17, 2.
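As a concrete sketch of that (an assumption-laden one: it presumes a 64-bit CPython build where the +6 c_int offset from the other answer applies, and it will corrupt the running interpreter, so only try it in a throwaway session):
import ctypes

def deref(addr, typ):
    return ctypes.cast(addr, ctypes.POINTER(typ))

deref(id(1), ctypes.c_int)[6] = 17    # overwrite the digit of the cached 1 object

print(list(range(3)))                 # expected on such a build: [0, 17, 2]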

Python OrderDict sputtering as compared to dict()

This one has me entirely baffled.
asset_hist = []
for key_host, val_hist_list in am_output.asset_history.items():
    for index, hist_item in enumerate(val_hist_list):
        #row = collections.OrderedDict([("computer_name", key_host), ("id", index), ("hist_item", hist_item)])
        row = {"computer_name": key_host, "id": index, "hist_item": hist_item}
        asset_hist.append(row)
This code works perfectly with the collections line commented out. However, when I comment out the row = dict line and remove the comment from the collections line things get very strange. There are about 4 million of these rows being generated and appended to asset_hist.
So, when I use row=dict, the entire loop finishes in about 10 milliseconds, lightning fast. When I use the ordered dictionary, I've waited over 10 minutes and it still didn't finish. Now, I know OrderedDict is supposed to be a little slower than a dict, but it's supposed to be about 10x slower at worst, and by my math it's actually about 100,000 times slower in this function.
I decided to print the index in the lowest loop to see what was happening. Interestingly enough, I noticed a sputtering in console output. The index would print very fast on the screen and then stop for about 3-5 seconds before continuing on.
am_output.asset_history is a dictionary which has one key, host, and every row is a list of strings. E.g.
am_output.asset_history = {"host1": ["string1", "string2", ...], "host2": ["string1", "string2", ...], ...}
EDIT: Sputter Analysis with OrderedDict
Total Memory on this VM Server: Only 8GB... need to get more provisioned.
LOOP NUM
184796 (~5 second wait, ~60% memory usage)
634481 (~5 second wait, ~65% memory usage)
1197564 (~5 second wait, ~70% memory usage)
1899247 (~5 second wait, ~75% memory usage)
2777296 (~5 second wait, ~80% memory usage)
3873730 (LONG WAIT... waited 20 minutes and gave up!, 88.3% memory usage, process is still running)
Where the wait happens changes with each run.
EDIT: Ran it again; this time it stopped on 3873333, close to the spot it stopped before. It stopped after forming the row, while trying to append... I didn't notice this on the last attempt but it was there then too... the problem is with the append line, not the row line... I'm still baffled. Here's the row it produced right before the long stop (added the row to the print statement)... hostname changed to protect the innocent:
3873333: OrderedDict([('computer_name', 'bg-fd5612ea'), ('id', 1), ('hist_item', "sys1 Normalizer (sys1-4): Domain Name cannot be determined from sys1 Name 'bg-fd5612ea'.")])
As your own tests prove, you're running out of memory. Even on CPython 3.6 (where plain dict is actually ordered, though not as a language guarantee yet), OrderedDict has significant memory overhead compared to dict; it's still implemented with a side-band linked list to preserve the order and support easy iteration, reordering with move_to_end, etc. You can tell just by checking with sys.getsizeof (exact results will differ by Python version and build bitwidth, 32 vs. 64 bit):
>>> od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])
>>> d = {**od}
>>> sys.getsizeof(od)
464 # On 3.5 x64 it's 512
>>> sys.getsizeof(d)
240 # On 3.5 x64 it's 288
Ignoring the data stored, the overhead for the OrderedDict here is nearly twice that of the plain dict. If you're making 4 million of these items, on my machine that would add overhead of a titch over 850 MB (on both 3.5 and 3.6).
It's likely the combination of all the other programs on your system, plus your Python program, is exceeding the RAM allocated to your machine, and you're stuck swap thrashing. In particular, whenever asset_hist has to expand for new entries, it's likely needing to page in large parts of it (that got paged out for lack of use), and whenever a cyclic garbage collection run triggers (a full GC occurs roughly every 70,000 allocations and deallocations by default), all the OrderedDicts get paged back in to check if they're still referenced outside of cycles (you could check if the GC runs are the main problem by disabling cyclic GC via gc.disable()).
Given your particular use case, I'd strongly recommend avoiding both dict and OrderedDict though. The overhead of even dict, even the cheaper form on Python 3.6, is kind of extreme when you have a set of exactly three fixed keys over and over. Instead, use collections.namedtuple, which is designed for lightweight objects referenceable by either name or index (they act like regular tuples, but also allow accessing each value as a named attribute), which would dramatically reduce the memory cost of your program (and likely speed it up even when memory is not an issue).
For example:
from collections import namedtuple

ComputerInfo = namedtuple('ComputerInfo', ['computer_name', 'id', 'hist_item'])

asset_hist = []
for key_host, val_hist_list in am_output.asset_history.items():
    for index, hist_item in enumerate(val_hist_list):
        asset_hist.append(ComputerInfo(key_host, index, hist_item))
Only difference in use is that you replace row['computer_name'] with row.computer_name, or if you need all the values, you can unpack it like a regular tuple, e.g. comphost, idx, hist = row. If you need a true OrderedDict temporarily (don't store them for everything), you can call row._asdict() to get an OrderedDict with the same mapping as the namedtuple, but that's not normally needed. The memory savings are meaningful; on my system, the three element namedtuple drops the per-item overhead to 72 bytes, less than a third that of even the 3.6 dict and less than a sixth of a 3.6 OrderedDict (and three element namedtuple remains 72 bytes on 3.5, where dict/OrderedDict are larger pre-3.6). It may save even more than that; tuples (and namedtuple by extension) are allocated as a single contiguous C struct, while dict and company are at least two allocations (one for the object structure, one or more for the dynamically resizable parts of the structure), each of which may pay allocator overhead and alignment costs.
Either way, for your four million row scenario, using namedtuple would mean paying (beyond the cost of the values) overhead of around 275 MB total, vs. 915 (3.6) - 1100 (3.5) MB for dict and 1770 (3.6) - 1950 (3.5) MB for OrderedDict. When you're talking about an 8 GB system, shaving 1.5 GB off your overhead is a major improvement.
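For reference, a quick illustration of the namedtuple access patterns described above (the field values are made up):
row = ComputerInfo('bg-fd5612ea', 1, 'some history entry')

row.computer_name            # attribute access replaces row['computer_name']
comphost, idx, hist = row    # unpacks like a regular tuple
row._asdict()                # an OrderedDict with the same mapping, if you ever need one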

Generating very large 2D-array in Python?

I'd like to generate very large 2D-array (or, in other terms, a matrix) using list of lists. Each element should be a float.
So, just to give an example, let's assume to have the following code:
import numpy as np

N = 32000

def largeMat():
    m = []
    for i in range(N):
        l = list(np.ones(N))
        m.append(l)
        if i % 1000 == 0:
            print i
    return m

m = largeMat()
I have 12GB of RAM, but as the code reaches the 10000-th line of the matrix, my RAM is already full. Now, if I'm not wrong, each float is 64 bits (8 bytes), so the total occupied RAM should be:
32000 * 32000 * 8 bytes = 8192 MB
Why does python fill my whole RAM and even start to allocate into swap?
Python does not necessarily store list items in the most compact form, as lists require pointers to the next item, etc. This is a side effect of having a data type which allows deletes, inserts, etc. For a simple two-way linked list the usage would be two pointers plus the value, in a 64-bit machine that would be 24 octets per float item in the list. In practice the implementation is not that stupid, but there is still some overhead.
If you want to have a concise format, I'd suggest using a numpy.array as it will take exactly as many bytes you think it'd take (plus a small overhead).
Edit Oops. Not necessarily. Explanation wrong, suggestion valid. numpy is the right tool as numpy.array exists for this reason. However, the problem is most probably something else. My computer will run the procedure even though it takes a lot of time (appr. 2 minutes). Also, quitting python after this takes a long time (actually, it hung). Memory use of the python process (as reported by top) peaks at 10 000 MB and then falls down to slightly below 9 000 MB. Probably the allocated numpy arrays are not garbage collected very fast.
But about the raw data size in my machine:
>>> import sys
>>> l = [0.0] * 1000000
>>> sys.getsizeof(l)
8000072
So there seems to be a fixed overhead of 72 octets per list.
>>> listoflists = [ [1.0*i] * 1000000 for i in range(1000)]
>>> sys.getsizeof(listoflists)
9032
>>> sum([sys.getsizeof(l) for l in listoflists])
8000072000
So, this is as expected.
On the other hand, reserving and filling the long list of lists takes a while (about 10 s). Also, quitting python takes a while. The same for numpy:
>>> a = numpy.empty((1000,1000000))
>>> a[:] = 1.0
>>> a.nbytes
8000000000
(The byte count is not entirely reliable, as the object itself takes some space for its metadata, etc. There has to be the pointer to the start of the memory block, data type, array shape, etc.)
This takes much less time. The creation of the array is almost instantaneous, inserting the numbers takes maybe a second or two. Allocating and freeing a lot of small memory chunks is time consuming and while it does not cause fragmentation problems in a 64-bit machine, it is still much easier to allocate a big chunk of data.
If you have a lot of data which can be put into an array, you need a good reason for not using numpy.
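For the question's example, a minimal numpy-only sketch (same N as above, assuming all you need is a dense float matrix):
import numpy as np

N = 32000
# One contiguous block of 32000 * 32000 float64 values, about 8 GB,
# with no per-element Python objects or list-of-pointers overhead.
m = np.ones((N, N), dtype=np.float64)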

Why do dicts of defaultdict(int)'s use so much memory? (and other simple python performance questions)

I do understand that querying a non-existent key in a defaultdict the way I do will add items to the defaultdict. That is why it is fair to compare my 2nd code snippet to my first one in terms of performance.
import numpy as num
from collections import defaultdict

topKeys = range(16384)
keys = range(8192)

table = dict((k, defaultdict(int)) for k in topKeys)
dat = num.zeros((16384, 8192), dtype="int32")

print "looping begins"
# how much memory should this use? I think it shouldn't use more than a few
# times the memory required to hold (16384*8192) int32's (512 MB), but
# it uses 11 GB!
for k in topKeys:
    for j in keys:
        dat[k, j] = table[k][j]
print "done"
What is going on here? Furthermore, this similar script takes eons to run compared to the first one, and also uses an absurd quantity of memory.
topKeys = range(16384)
keys = range(8192)
table = [(j,0) for k in topKeys for j in keys]
I guess python ints might be 64 bit ints, which would account for some of this, but do these relatively natural and simple constructions really produce such a massive overhead?
I guess these scripts show that they do, so my question is: what exactly is causing the high memory usage in the first script and the long runtime and high memory usage of the second script and is there any way to avoid these costs?
Edit:
Python 2.6.4 on 64 bit machine.
Edit 2: I can see why, to a first approximation, my table should take up 3 GB
16384*8192*(12+12) bytes
and 6GB with a defaultdict load factor that forces it to reserve double the space.
Then inefficiencies in memory allocation eat up another factor of 2.
So here are my remaining questions:
Is there a way for me to tell it to use 32 bit ints somehow?
And why does my second code snippet take FOREVER to run compared to the first one? The first one takes about a minute and I killed the second one after 80 minutes.
Python ints are internally represented as C longs (it's actually a bit more complicated than that), but that's not really the root of your problem.
The biggest overhead is your usage of dicts. (defaultdicts and dicts are about the same in this description). dicts are implemented using hash tables, which is nice because it gives quick lookup of pretty general keys. (It's not so necessary when you only need to look up sequential numerical keys, since they can be laid out in an easy way to get to them.)
A dict can have many more slots than it has items. Let's say you have a dict with 3x as many slots as items. Each of these slots needs room for a pointer to a key and a pointer serving as the end of a linked list. That's 6x as many pointers as numbers, plus all the pointers to the items you're interested in. Consider that each of these pointers is 8 bytes on your system and that you have 16384 defaultdicts in this situation. As a rough, handwavey look at this, 16384 occurrences * (8192 items/occurrence) * 7 (pointers/item) * 8 (bytes/pointer) = 7 GB. This is before I've gotten to the actual numbers you're storing (each unique number of which is itself a Python object), the outer dict, that numpy array, or the stuff Python's keeping track of to try to optimize some.
Your overhead sounds a little higher than I suspect and I would be interested in knowing whether that 11GB was for a whole process or whether you calculated it for just table. In any event, I do expect the size of this dict-of-defaultdicts data structure to be orders of magnitude bigger than the numpy array representation.
As to "is there any way to avoid these costs?" the answer is "use numpy for storing large, fixed-size contiguous numerical arrays, not dicts!" You'll have to be more specific and concrete about why you found such a structure necessary for better advice about what the best solution is.
Well, look at what your code is actually doing:
topKeys = range(16384)
table = dict((k,defaultdict(int)) for k in topKeys)
This creates a dict holding 16384 defaultdict(int)'s. A dict has a certain amount of overhead: the dict object itself is between 60 and 120 bytes (depending on the size of pointers and ssize_t's in your build.) That's just the object itself; unless the dict is less than a couple of items, the data is a separate block of memory, between 12 and 24 bytes, and it's always between 1/2 and 2/3rds filled. And defaultdicts are 4 to 8 bytes bigger because they have this extra thing to store. And ints are 12 bytes each, and although they're reused where possible, that snippet won't reuse most of them. So, realistically, in a 32-bit build, that snippet will take up 60 + (16384*12) * 1.8 (fill factor) bytes for the table dict, 16384 * 64 bytes for the defaultdicts it stores as values, and 16384 * 12 bytes for the integers. So that's just over a megabyte and a half without storing anything in your defaultdicts. And that's in a 32-bit build; a 64-bit build would be twice that size.
Then you create a numpy array, which is actually pretty conservative with memory:
dat = num.zeros((16384,8192), dtype="int32")
This will have some overhead for the array itself, the usual Python object overhead plus the dimensions and type of the array and such, but it wouldn't be much more than 100 bytes, and only for the one array. It does store 16384*8192 int32's, though, which is your 512 MB.
And then you have this rather peculiar way of filling this numpy array:
for k in topKeys:
    for j in keys:
        dat[k,j] = table[k][j]
The two loops themselves don't use much memory, and they re-use it each iteration. However, table[k][j] creates a new Python integer for each value you request, and stores it in the defaultdict. The integer created is always 0, and it so happens that that always gets reused, but storing the reference to it still uses up space in the defaultdict: the aforementioned 12 bytes per entry, times the fill factor (between 1.66 and 2.) That lands you close to 3 GB of actual data right there, and 6 GB in a 64-bit build.
On top of that the defaultdicts, because you keep adding data, have to keep growing, which means they have to keep reallocating. Because of Python's malloc frontend (obmalloc) and how it allocates smaller objects in blocks of its own, and how process memory works on most operating systems, this means your process will allocate more and not be able to free it; it won't actually use all of the 11 GB, and Python will re-use the available memory in between the large blocks for the defaultdicts, but the total mapped address space will be that 11 GB.
Mike Graham gives a good explanation of why dictionaries use more memory, but I thought that I'd explain why your table dict of defaultdicts starts to take up so much memory.
The way that the defaultdict (DD) is set-up right now, whenever you retrieve an element that isn't in the DD, you get the default value for the DD (0 for your case) but also the DD now stores a key that previously wasn't in the DD with the default value of 0. I personally don't like this, but that's how it goes. However, it means that for every iteration of the inner loop, new memory is being allocated which is why it is taking forever. If you change the lines
for k in topKeys:
    for j in keys:
        dat[k,j] = table[k][j]
to
for k in topKeys:
    for j in keys:
        if j in table[k]:
            dat[k,j] = table[k][j]
        else:
            dat[k,j] = 0
then default values aren't being assigned to keys in the DDs and so the memory stays around 540 MB for me which is mostly just the memory allocated for dat. DDs are decent for sparse matrices though you probably should just use the sparse matrices in Scipy if that's what you want.
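If sparsity is what you actually need, a minimal sketch with scipy.sparse (whether your real data is sparse enough to benefit is an assumption):
from scipy.sparse import dok_matrix

# Dictionary-of-keys sparse matrix: only explicitly set entries take memory,
# and missing entries read back as 0, much like the defaultdict(int) pattern.
table = dok_matrix((16384, 8192), dtype="int32")
table[5, 7] = 3
print(table[5, 7])      # 3
print(table[0, 0])      # 0, never stored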

How many bytes per element are there in a Python list (tuple)?

For example, how much memory is required to store a list of one million (32-bit) integers?
alist = range(1000000) # or list(range(1000000)) in Python 3.0
"It depends." Python allocates space for lists in such a way as to achieve amortized constant time for appending elements to the list.
In practice, what this means with the current implementation is that the list over-allocates: it keeps some spare slots beyond its current length so that repeated appends don't trigger a reallocation every time. The exact growth pattern is an implementation detail, and a list built in one go from a known-length source such as range(1000000) is sized for essentially exactly the number of elements it holds.
This is only the space required to store the list structure itself (which is an array of pointers to the Python objects for each element). A 32-bit system will require 4 bytes per element, a 64-bit system will use 8 bytes per element.
Furthermore, you need space to store the actual elements. This varies widely. For small integers (-5 to 256 currently), no additional space is needed, but for larger numbers Python allocates a new object for each integer, which takes 10-100 bytes and tends to fragment memory.
Bottom line: it's complicated and Python lists are not a good way to store large homogeneous data structures. For that, use the array module or, if you need to do vectorized math, use NumPy.
PS- Tuples, unlike lists, are not designed to have elements progressively appended to them. I don't know how the allocator works, but don't even think about using it for large data structures :-)
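As a quick illustration of the array-module route mentioned above (the exact byte counts are implementation- and version-dependent):
import sys
from array import array

a = array('i', range(1000000))    # 'i' = signed C int, 4 bytes per element
print(sys.getsizeof(a))           # roughly 4 MB, vs. a pointer per element (plus
                                  # an int object per element) for a plain list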
Useful links:
How to get memory size/usage of python object
Memory sizes of python objects?
if you put data into dictionary, how do we calculate the data size?
However they don't give a definitive answer. The way to go:
Measure memory consumed by Python interpreter with/without the list (use OS tools).
Use a third-party extension module which defines some sort of sizeof(PyObject).
Update:
Recipe 546530: Size of Python objects (revised)
import asizeof
N = 1000000
print asizeof.asizeof(range(N)) / N
# -> 20 (python 2.5, WinXP, 32-bit Linux)
# -> 33 (64-bit Linux)
Addressing "tuple" part of the question
Declaration of CPython's PyTuple in a typical build configuration boils down to this:
struct PyTuple {
    size_t      refcount;   // tuple's reference count
    typeobject *type;       // tuple type object
    size_t      n_items;    // number of items in tuple
    PyObject   *items[1];   // contains space for n_items elements
};
The size of a PyTuple instance is fixed during its construction and cannot be changed afterwards. The number of bytes occupied by PyTuple can be calculated as
sizeof(size_t) x 2 + sizeof(void*) x (n_items + 1).
This gives shallow size of tuple. To get full size you also need to add total number of bytes consumed by object graph rooted in PyTuple::items[] array.
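As a quick sanity check of that formula on a 64-bit build: it predicts 48 bytes for a 3-tuple, while sys.getsizeof reports a little more because it also counts the garbage-collector header for GC-tracked objects (24 bytes on CPython 3.7, 16 on 3.8, matching the 72 and 64 seen earlier in this thread):
import sys

n_items = 3
shallow = 8 * 2 + 8 * (n_items + 1)       # sizeof(size_t)*2 + sizeof(void*)*(n_items+1)
print(shallow)                            # 48
print(sys.getsizeof(("a", "b", "c")))     # 72 on CPython 3.7 (48 + 24-byte GC head)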
It's worth noting that tuple construction routines make sure that only single instance of empty tuple is ever created (singleton).
References:
Python.h,
object.h,
tupleobject.h,
tupleobject.c
A new function, getsizeof(), takes a Python object and returns the amount of memory used by the object, measured in bytes. Built-in objects return correct results; third-party extensions may not, but can define a __sizeof__() method to return the object's size.
kveretennicov#nosignal:~/py/r26rc2$ ./python
Python 2.6rc2 (r26rc2:66712, Sep 2 2008, 13:11:55)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
>>> import sys
>>> sys.getsizeof(range(1000000))
4000032
>>> sys.getsizeof(tuple(range(1000000)))
4000024
Obviously returned numbers don't include memory consumed by contained objects (sys.getsizeof(1) == 12).
This is implementation specific, I'm pretty sure. Certainly it depends on the internal representation of integers - you can't assume they'll be stored as 32-bit since Python gives you arbitrarily large integers so perhaps small ints are stored more compactly.
On my Python (2.5.1 on Fedora 9 on core 2 duo) the VmSize before allocation is 6896kB, after is 22684kB. After one more million element assignment, VmSize goes to 38340kB. This very grossly indicates around 16000kB for 1000000 integers, which is around 16 bytes per integer. That suggests a lot of overhead for the list. I'd take these numbers with a large pinch of salt.
