I was testing the speeds of a few different ways to do complex iterations over some of my data, and I found something weird. It seems that having a large list local to some function slows down that function considerably, even if it is not touching that list. For example, creating 2 independent lists via 2 instances of the same generator function is about 2.5x slower the second time. If the first list is removed prior to creating the second, both iterations run at the same speed.
def f():
    l1, l2 = [], []
    for c1, c2 in generatorFxn():
        l1.append((c1, c2))
    # destroying l1 here fixes the problem
    for c3, c4 in generatorFxn():
        l2.append((c3, c4))
The lists end up about 3.1 million items long each, but I saw the same effect with smaller lists too. The first for loop takes about 4.5 seconds to run, the second takes 10.5. If I insert l1 = [] or l1 = len(l1) at the comment position, both for loops take 4.5 seconds.
Why does the speed of local memory allocation in a function have anything to do with the current size of that function's variables?
EDIT:
Disabling the garbage collector fixes everything, so it must be due to the collector running constantly. Case closed!
When you create that many new objects (3 million tuples), the garbage collector gets bogged down. If you turn off garbage collection with gc.disable(), the issue goes away (and the program runs 4x faster to boot).
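For instance, here is a minimal sketch of how the original f() could bracket the list-building with the collector turned off (generatorFxn is the placeholder from the question):

import gc

def f():
    l1, l2 = [], []
    gc.disable()           # stop the cyclic collector from repeatedly scanning millions of new tuples
    try:
        for c1, c2 in generatorFxn():
            l1.append((c1, c2))
        for c3, c4 in generatorFxn():
            l2.append((c3, c4))
    finally:
        gc.enable()        # restore collection even if the generator raises
    return l1, l2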
It's impossible to say without more detailed instrumentation.
As a very, very preliminary step, check your main memory usage. If your RAM is all filled up and your OS is paging to disk, your performance will be quite dreadful. In such a case, you may be best off taking your intermediate products and putting them somewhere other than in memory. If you only need sequential reads of your data, consider writing to a plain file; if your data follows a strict structure, consider persisting into a relational database.
My guess is that when the first list is made, there is more memory available, meaning less chance that the list needs to be reallocated as it grows. After you take up a decent chunk of memory with the first list, the second list has a higher chance of needing to be reallocated as it grows, since Python lists are dynamically sized.
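You can watch this over-allocation happen with sys.getsizeof; a minimal sketch (the exact byte counts depend on the Python version and platform):

import sys

lst = []
last = sys.getsizeof(lst)
for i in range(32):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:       # the size only jumps when the backing array is reallocated
        print('len=%d getsizeof=%d' % (len(lst), size))
        last = size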
The memory used by the data local to the function isn't going to be garbage-collected until the function returns. Unless you have a need to do slicing, using lists for large collections of data is not a great idea.
From your example it's not entirely clear what the purpose of creating these lists is. You might want to consider using generators instead of lists, especially if the lists are just going to be iterated. If you need to do slicing on the returned data, cast the generators to lists at that time.
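As a sketch only (pairs() and process() are illustrative names, and generatorFxn is the placeholder from the question), that could look like:

def pairs():
    # Re-yield the values lazily instead of storing ~3 million tuples at once.
    for c1, c2 in generatorFxn():
        yield (c1, c2)

# Iterate without ever holding the whole collection in memory...
for c1, c2 in pairs():
    process(c1, c2)

# ...or materialise a list only when slicing is actually needed.
l1 = list(pairs())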
Related
I need to find an optimal selection of media, based on certain constraints. I am doing it with FOUR nested for loops, and since it takes about O(n^4) iterations, it is slow. I have been trying to make it faster, but it is still damn slow. My variables can be as high as a couple of thousand.
Here is a small example of what I am trying to do:
max_disks = 5
max_ssds = 5
max_tapes = 1
max_BR = 1
allocations = []
for i in range(max_disks):
    for j in range(max_ssds):
        for k in range(max_tapes):
            for l in range(max_BR):
                allocations.append((i, j, k, l))  # just for example; in the actual program I do processing here, like checking bandwidth and cost constraints, and choosing the allocation based on that
It wasn't slow for up to hundreds of each media type but would slow down for thousands.
The other way I tried is:
max_disks = 5
max_ssds = 5
max_tapes = 1
max_BR = 1
allocations = [(i,j,k,l) for i in range(max_disks) for j in range(max_ssds) for k in range(max_tapes) for l in range(max_BR)]
This way it is slow even for such small numbers.
Two questions:
Why is the second one slow even for small numbers?
How can I make my program work for big numbers (in thousands)?
Here is the version with itertools.product:
import itertools

max_disks = 500
max_ssds = 100
max_tapes = 100
max_BR = 100
# allocations = []
for i, j, k, l in itertools.product(range(max_disks), range(max_ssds),
                                    range(max_tapes), range(max_BR)):
    pass
It takes 19.8 seconds to finish with these numbers.
From the comments, I got that you're working on a problem that can be rewritten as an ILP. You have several constraints, and need to find a (near) optimal solution.
Now, ILPs are quite difficult to solve, and brute-forcing them quickly becomes intractable (as you've already witnessed). This is why there are several really clever algorithms used in the industry that truly work magic.
For Python, there are quite a few interfaces that hook up to modern solvers; for more details, see e.g. this SO post. You could also consider using an optimizer, like SciPy optimize, but those generally don't do integer programming.
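As a purely illustrative sketch (assuming the PuLP package, which is one such solver interface, and with invented cost and bandwidth coefficients), the allocation problem might be posed to a solver rather than enumerated:

from pulp import LpProblem, LpMinimize, LpVariable, LpInteger

prob = LpProblem('media_allocation', LpMinimize)

# Integer decision variables: how many units of each medium to use.
disks = LpVariable('disks', lowBound=0, upBound=2000, cat=LpInteger)
ssds = LpVariable('ssds', lowBound=0, upBound=2000, cat=LpInteger)
tapes = LpVariable('tapes', lowBound=0, upBound=2000, cat=LpInteger)
br = LpVariable('br', lowBound=0, upBound=2000, cat=LpInteger)

# Objective and constraints -- all coefficients here are made up.
prob += 5 * disks + 9 * ssds + 2 * tapes + 1 * br         # minimise total cost
prob += 100 * disks + 500 * ssds + 10 * tapes >= 10000    # bandwidth requirement
prob += disks + ssds + tapes + br <= 3000                 # capacity limit

prob.solve()
print(disks.value(), ssds.value(), tapes.value(), br.value())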
Doing any operation in Python a trillion times is going to be slow. However, that's not all you're doing. By attempting to store all the trillion items in a single list you are storing lots of data in memory and manipulating it in a way that creates a lot of work for the computer to swap memory in and out once it no longer fits in RAM.
The way Python lists work is that they allocate some amount of memory to store the items in the list. When you fill up that allocation and more is needed, Python allocates a larger block (over-allocating by a constant growth factor) and copies all the old entries into the new storage space. This is fine as long as everything fits in memory: even though it has to copy the contents of the list each time it expands the storage, it does so less and less frequently as the allocation grows geometrically. The problem comes when it runs out of memory and the OS has to swap memory out to disk. The next time it tries to resize the list, it has to reload all the entries that are now swapped out, then swap them back out again to make space for the new entries. This creates lots of slow disk operations that get in the way of your task and slow it down even more.
Do you really need to store every item in a list? What are you going to do with them when you're done? You could perhaps write them out to disk as you're going instead of accumulating them in a giant list, though if you have a trillion of them, that's still a very large amount of data! Or perhaps you're filtering most of them out? That will help.
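A sketch of the write-as-you-go idea, with a hypothetical meets_constraints() standing in for the real bandwidth/cost checks:

import csv
import itertools

max_disks, max_ssds, max_tapes, max_BR = 500, 100, 100, 100

# Stream each acceptable candidate to disk instead of accumulating a giant list.
with open('allocations.csv', 'w', newline='') as fh:
    writer = csv.writer(fh)
    for i, j, k, l in itertools.product(range(max_disks), range(max_ssds),
                                        range(max_tapes), range(max_BR)):
        if meets_constraints(i, j, k, l):   # hypothetical stand-in for the real checks
            writer.writerow((i, j, k, l))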
All that said, without seeing the actual program itself, it's hard to know if you have a hope of completing this work by an exhaustive search. Can all the variables be on the thousands scale at once? Do you really need to consider every combination of these variables? When max_disks==2000, do you really need to distinguish the results for i=1731 from i=1732? For example, perhaps you could consider values of i 1,2,3,4,5,10,20,30,40,50,100,200,300,500,1000,2000? Or perhaps there's a mathematical solution instead? Are you just counting items?
I'm able to find a bevy of information online (on Stack Overflow and otherwise) about how it's a very inefficient and bad practice to use + or += for concatenation in Python.
I can't seem to find WHY += is so inefficient. Outside of a mention here that "it's been optimized for 20% improvement in certain cases" (still not clear what those cases are), I can't find any additional information.
What is happening on a more technical level that makes ''.join() superior to other Python concatenation methods?
Let's say you have this code to build up a string from three strings:
x = 'foo'
x += 'bar' # 'foobar'
x += 'baz' # 'foobarbaz'
In this case, Python first needs to allocate and create 'foobar' before it can allocate and create 'foobarbaz'.
So for each += that gets called, the entire contents of the string and whatever is getting added to it need to be copied into an entirely new memory buffer. In other words, if you have N strings to be joined, you need to allocate approximately N temporary strings and the first substring gets copied ~N times. The last substring only gets copied once, but on average, each substring gets copied ~N/2 times.
With .join, Python can play a number of tricks since the intermediate strings do not need to be created. CPython figures out how much memory it needs up front and then allocates a correctly-sized buffer. Finally, it then copies each piece into the new buffer which means that each piece is only copied once.
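A rough timing sketch of the difference (CPython; absolute numbers will vary by machine and version):

import timeit

pieces = ['x' * 10] * 10000

def concat_plus():
    s = ''
    for p in pieces:
        s += p                    # may copy everything accumulated so far on each step
    return s

def concat_join():
    return ''.join(pieces)        # one allocation; each piece is copied exactly once

print(timeit.timeit(concat_plus, number=100))
print(timeit.timeit(concat_join, number=100))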
There are other viable approaches which could lead to better performance for += in some cases. E.g. if the internal string representation is actually a rope or if the runtime is actually smart enough to somehow figure out that the temporary strings are of no use to the program and optimize them away.
However, CPython certainly does not do these optimizations reliably (though it may for a few corner cases) and since it is the most common implementation in use, many best-practices are based on what works well for CPython. Having a standardized set of norms also makes it easier for other implementations to focus their optimization efforts as well.
I think this behaviour is best explained in Lua's string buffer chapter.
To rewrite that explanation in context of Python, let's start with an innocent code snippet (a derivative of the one at Lua's docs):
s = ""
for l in some_list:
    s += l
Assume that each l is 20 bytes and that s has already grown to a size of 50 KB. When Python concatenates s + l, it creates a new string of 50,020 bytes and copies the 50 KB of s into this new string. That is, for each new line, the program moves 50 KB of memory, and growing. After reading 100 new lines (only 2 KB), the snippet has already moved more than 5 MB of memory. To make things worse, after the assignment
s += l
the old string is now garbage. After two loop cycles, there are two old strings making a total of more than 100 KB of garbage. So, the language compiler decides to run its garbage collector and frees those 100 KB. The problem is that this will happen every two cycles and the program will run its garbage collector two thousand times before reading the whole list. Even with all this work, its memory usage will be a large multiple of the list's size.
And, at the end:
This problem is not peculiar to Lua: Other languages with true garbage
collection, and where strings are immutable objects, present a similar
behavior, Java being the most famous example. (Java offers the
structure StringBuffer to ameliorate the problem.)
Python strings are also immutable objects.
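The idiomatic Python equivalents of StringBuffer are to collect the pieces and join them once, or to write into an io.StringIO buffer; a minimal sketch of both, reusing the some_list name from the snippet above:

import io

# Collect the pieces and copy each one exactly once at the end.
parts = []
for l in some_list:
    parts.append(l)
s = ''.join(parts)

# Or write into an in-memory text buffer instead of rebuilding the string each time.
buf = io.StringIO()
for l in some_list:
    buf.write(l)
s = buf.getvalue()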
I notice that when using sys.getsizeof() to check the size of list and dictionary, something interesting happens.
I have:
a = [1,2,3,4,5]
with a size of 56 bytes (an empty list has a size of 36, so it makes sense because 20/5 = 4 bytes per item)
however, after I remove all the items in the list (using .remove or del), the size is still 56. This is strange to me. Shouldn't the size be back to 36?
Any explanation?
The list doesn't promise to release memory when you remove elements. Lists are over-allocated, which is how they can have amortized O(1) performance for appending elements.
Details of the time performance of the data structures: http://wiki.python.org/moin/TimeComplexity
Increasing the size of a container can be an expensive operation, since it may require that a lot of things be moved around in memory. So Python almost always allocates more memory than is needed for the current contents of a list, allowing any individual addition to the list to have a very good chance of being performed without needing to move memory. For similar reasons, a list may not release the memory for deleted elements immediately, or ever.
However, if you delete all the elements at once using a slice assignment:
a[:] = []
that seems to reset it. This is an implementation detail, however.
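For example (the exact byte counts depend on the Python build):

import sys

a = [1, 2, 3, 4, 5]
print(sys.getsizeof(a))   # some baseline size

a[:] = []                 # clear the list in place via slice assignment
print(sys.getsizeof(a))   # typically drops back to the empty-list size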
When you append an item to a Python list, it allocates a given amount of memory if the already allocated memory for the list is full. When you remove an item from a list, it keeps memory allocated for the next time you would append items to the list. See this related post for an example.
I have a Python GAE app that stores data in each instance, and the memory usage is much higher than I’d expected. As an illustration, consider this test code which I’ve added to my app:
from google.appengine.ext import webapp
bucket = []
class Memory(webapp.RequestHandler):
    def get(self):
        global bucket
        n = int(self.request.get('n'))
        size = 0
        for i in range(n):
            text = '%10d' % i
            bucket.append(text)
            size += len(text)
        self.response.out.write('Total number of characters = %d' % size)
A call to this handler with a value for query variable n will cause the instance to add n strings to its list, each 10 characters long.
If I call this with n=1 (to get everything loaded) and then check the instance memory usage on the production server, I see a figure of 29.4MB. If I then call it with n=100000 and check again, memory usage has jumped to 38.9MB. That is, my memory footprint has increased by 9.5MB to store only one million characters, nearly ten times what I’d expect. I believe that characters consume only one byte each, but even if that’s wrong there’s still a long way to go. Overhead of the list structure surely can’t explain it. I tried adding an explicit garbage collection call, but the figures didn’t change. What am I missing, and is there a way to reduce the footprint?
(Incidentally, I tried using a set instead of a list and found that after calling with n=100000 the memory usage increased by 13MB. That suggests that the set overhead for 100000 strings is 3.5MB more than that of lists, which is also much greater than expected.)
I know that I'm really late to the party here, but this isn't surprising at all...
Consider a string of length 1:
s = '1'
That's pretty small, right? Maybe somewhere on the order of 1 byte? Nope.
>>> import sys
>>> sys.getsizeof('1')
38
So there are approximately 37 bytes of overhead associated with each string that you create (the object header, reference count, length, cached hash, and so on all need to be stored somewhere).
Additionally, it's usually most efficient for your CPU to store items aligned to "word size" rather than byte size (on lots of systems, a "word" is 4 or 8 bytes). I don't know for certain, but I wouldn't be surprised if Python's memory allocator plays tricks there too to keep it running fairly quickly.
Also, don't forget that lists are represented as over-allocated arrays (to prevent huge performance problems each time you .append). It is possible that, when you make a list of 100k elements, Python actually allocates space for 110k pointers or more.
Finally, regarding set: that's probably fairly easily explained by the fact that sets are even more over-allocated than lists (they need to avoid all those hash collisions, after all). They end up having large jumps in memory usage as the set grows, in order to keep enough free slots in the array to avoid hash collisions:
>>> sys.getsizeof(set([1]))
232
>>> sys.getsizeof(set([1, 2]))
232
>>> sys.getsizeof(set([1, 2, 3]))
232
>>> sys.getsizeof(set([1, 2, 3, 4]))
232
>>> sys.getsizeof(set([1, 2, 3, 4, 5]))
232
>>> sys.getsizeof(set([1, 2, 3, 4, 5, 6])) # resize!
744
The overhead of the list structure doesn't explain what you're seeing directly, but memory fragmentation does. And strings have a non-zero overhead in terms of underlying memory, so counting string lengths is going to undercount significantly.
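To get a rough lower bound on the real cost, you can sum sys.getsizeof over the list and its items; this still ignores fragmentation and allocator overhead, so the true footprint is larger:

import sys

bucket = ['%10d' % i for i in range(100000)]

payload = sum(len(s) for s in bucket)   # 1,000,000 characters of actual data
footprint = sys.getsizeof(bucket) + sum(sys.getsizeof(s) for s in bucket)
print(payload, footprint)               # the second number is several times the first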
I'm not an expert, but this is an interesting question. It seems like it's more of a python memory management issue than a GAE issue. Have you tried running it locally and comparing the memory usage on your local dev_appserver vs deployed on GAE? That should indicate whether it's the GAE platform, or just python.
Secondly, the Python code you used is simple but not very efficient; a list comprehension instead of the for loop should be more efficient and should reduce the memory usage a bit:
''.join(['%10d' % i for i in range(n)])
Under the covers your growing string must be constantly reallocated. Every time through the for loop, there's a discarded string left lying around. I would have expected that triggering the garbage collector after your for loop should have cleaned up the extra strings though.
Try triggering the garbage collector before you check the memory usage.
import gc
gc.collect()
return len(gc.get_objects())
That should give you an idea if the garbage collector hasn't cleaned out some of the extra strings.
This is largely a response to dragonx.
The sample code exists only to illustrate the problem, so I wasn't concerned with small efficiencies. I am instead concerned about why the application consumes around ten times as much memory as there is actual data. I can understand there being some memory overhead, but this much?
Nonetheless, I tried using a list comprehension (without the join, to match my original) and the memory usage increases slightly, from 9.5MB to 9.6MB. Perhaps that's within the margin of error. Or perhaps the large range() expression sucks it up; it's released, no doubt, but better to use xrange(), I think. With the join the instance variable is set to one very long string, and the memory footprint unsurprisingly drops to a sensible 1.1MB, but this isn't the same case at all. You get the same 1.1MB just setting the instance variable to one million characters without using a list comprehension.
I'm not sure I agree that with my loop "there's a discarded string left lying around." I believe that the string is added to the list (by reference, if that's proper to say) and that no strings are discarded.
I had already tried explicit garbage collection, as my original question states. No help there.
Here's a telling result. Changing the length of the strings from 10 to some other number causes a proportional change in memory usage, but there's a constant in there as well. My experiments show that for every string added to the list there's an 85 byte overhead, no matter what the string length. Is this the cost for strings or for putting the strings into a list? I lean toward the latter. Creating a list of 100,000 None’s consumes 4.5MB, or around 45 bytes per None. This isn't as bad as for strings, but it's still pretty bad. And as I mentioned before, it's worse for sets than it is for lists.
I wish I understood why the overhead (or fragmentation) was this bad, but the inescapable conclusion seems to be that large collections of small objects are extremely expensive. You're probably right that this is more of a Python issue than a GAE issue.
Many methods that used to return lists in Python 2.x now seem to return iterators in Py3k.
Are iterators also generator expressions? Lazy evaluation?
Thus, the memory footprint of Python is going to be reduced drastically, isn't it?
What about for the programs converted from 2to3 using the builtin script?
Does the built-in tool explicitly convert all the returned iterators into lists, for compatibility? If so, then the lower-memory-footprint benefit of Py3k is not really apparent in the converted programs, is it?
Many of them are not exactly iterators, but special view objects. For instance range() now returns something similar to the old xrange object - it can still be indexed, but lazily constructs the integers as needed.
Similarly dict.keys() gives a dict_keys object implementing a view on the dict, rather than creating a new list with a copy of the keys.
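A small Python 3 sketch of both behaviours:

r = range(10 ** 9)         # no billion-element list is ever built
print(r[123456789])        # indexing still works; the value is computed on demand

d = {'a': 1, 'b': 2}
keys = d.keys()            # a live view, not a copied list
d['c'] = 3
print(list(keys))          # ['a', 'b', 'c'] -- the view reflects the new key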
How this affects memory footprints probably depends on the program. Certainly there's more of an emphasis on using iterators unless you really need lists, whereas lists were generally the default in Python 2, so the average program will probably be a bit more memory efficient. The cases with really big savings, however, are probably already implemented as iterators in Python 2 programs, since really large memory usage stands out and is more likely to have been addressed already (e.g. the file iterator is already much more memory efficient than the older file.readlines() method).
Converting is done by the 2to3 tool, and will generally convert things like range() to iterators where it can safely determine a real list isn't needed, so code like:
for x in range(10): print x
will switch to the new range() object, no longer creating a list, and so will obtain the reduced memory benefit, but code like:
x = range(20)
will be converted as:
x = list(range(20))
as the converter can't know if the code expects a real list object in x.
Are iterators also generator expressions? Lazy evaluation?
An iterator is just an object with a __next__ method (next in Python 2). What the documentation means most of the time when saying that a function returns an iterator is that its result is lazily produced.
Thus, with this the memory footprint of python is going to reduce drastically. Isn't it?
It depends. I'd guess that the average program wouldn't notice a huge difference though. The performance advantages of iterators over lists is really only significant if you have a large dataset. You may want to see this question.
One of the biggest benefits of iterators over lists isn't memory, it is actually computation time. For instance, in Python 2:
for i in range(1000000):  # spend a bunch of time making a big list
    if i == 0:
        break  # building the list was a waste since we only looped once
Now take for instance:
for i in xrange(1000000):  # starts the loop almost immediately
    if i == 0:
        break  # we didn't waste time even if we break early
Although the example is contrived, the use case isn't: loops are often broken out of mid-way. Building an entire list only to use part of it is a waste unless you are going to use it more than once. If that is the case, you can explicitly build a list: r = list(range(100)). This is why iterators are the default in more places in Python 3; you don't lose anything, since you can still explicitly create lists (or other containers) when you need them. But you aren't forced to when all you plan to do is iterate over an iterable once (which I would argue is the much more common case).