python read() and write() in large blocks / memory management

I'm writing some python code that splices together large files at various points. I've done something similar in C where I allocated a 1MB char array and used that as the read/write buffer. And it was very simple: read 1MB into the char array then write it out.
But with python I'm assuming it is different: each time I call read() with size = 1M, it will allocate a 1M long character string. And hopefully when the buffer goes out of scope it will be freed in the next gc pass.
Would python handle the allocation this way? If so, is the constant allocation/deallocation cycle computationally expensive?
Can I tell python to use the same block of memory just like in C? Or is the python vm smart enough to do it itself?
I guess what I'm essentially aiming for is kinda like an implementation of dd in python.

Search site docs.python.org for readinto to find docs appropriate for the version of Python you're using. readinto is a low-level feature. They'll look a lot like this:
readinto(b)
Read up to len(b) bytes into bytearray b and return the number of bytes read.
Like read(), multiple reads may be issued to the underlying raw stream, unless the latter is interactive.
A BlockingIOError is raised if the underlying raw stream is in non blocking-mode, and has no data available at the moment.
But don't worry about it prematurely. Python allocates and deallocates dynamic memory at a ferocious rate, and it's likely that the cost of repeatedly getting & free'ing a measly megabyte will be lost in the noise. And note that CPython is primarily reference-counted, so your buffer will get reclaimed "immediately" when it goes out of scope. As to whether Python will reuse the same memory space each time, the odds are decent but it's not assured. Python does nothing to try to force that, but depending on the entire allocation/deallocation pattern and the details of the system C's malloc()/free() implementation, it's not impossible it will get reused ;-)
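As a rough illustration of the readinto approach, here is a minimal sketch of a dd-style copy loop that reuses one preallocated buffer. The function name copy_stream and the 1 MiB default are assumptions made for this example, not anything from the docs quoted above; the memoryview is used so that writing a partial block does not copy the data into a new bytes object.

```python
import io


def copy_stream(src, dst, bufsize=1024 * 1024):
    """Copy src to dst using a single reusable buffer (hypothetical helper)."""
    buf = bytearray(bufsize)      # allocated once, reused for every block
    view = memoryview(buf)        # lets us slice without copying
    total = 0
    while True:
        n = src.readinto(buf)     # fills buf in place, returns bytes read
        if not n:                 # 0 at EOF (None if a non-blocking stream has no data)
            break
        dst.write(view[:n])       # write only the part that was filled
        total += n
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 3_000_000 + b"y")
    sink = io.BytesIO()
    print(copy_stream(source, sink), "bytes copied")
```

With real files you would open both in binary mode ("rb" and "wb"). Whether this is measurably faster than plain read(size) is something to profile rather than assume.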

Related

When is a write to disk triggered?

In Python, I can open a file with f= open(<filename>,<permissions>). This returns an object f which I can write to using f.write(<some data>).
If, at this point, I access the original file (e.g. with cat from a terminal), it appears empty: Python stored the data I wrote in the object f and not in the actual on-disk file. If I then call f.close(), the data in f is persisted to the on-disk file (and I can access it from other programs).
I assume data is buffered to improve latency. However, what happens if the buffered data grows a lot? Will Python initiate a write? If so, details on the internals (what influences the buffer size? is the disk I/O handled within Python or by another program/thread? is there a chance Python will just hang during the write?) would be much appreciated.

The general subject of I/O buffering has been treated many times (including in questions linked from the comments). But to answer your specific questions:
By default, when writing to a terminal (“the screen”), a newline causes the text to be flushed up through it. For all files, the buffer is flushed each time it fills. (Large single writes might flush any existing buffer contents and then bypass it.)
The buffer has a fixed size and is allocated before any data is written; Python 3 doesn’t use stdio, so it chooses its own buffer sizes. (A few kB is typical.)
The “disk I/O” (really kernel I/O, which is distinguishable only in certain special circumstances like network/power failure) happens within whatever Python write triggers the flush.
Yes, it can hang, if the file is a pipe to a busy process, a socket over a slow network, a special device, or even a regular file mounted from a remote machine.
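The buffering behaviour described above can be observed directly. This is a small sketch, assuming CPython 3 and a regular file on a local filesystem; the helper name and the explicit 4096-byte buffer are choices made for the example only.

```python
import os
import tempfile


def sizes_before_and_after_flush(payload=b"hello"):
    """Return the on-disk file size before and after flush() (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Explicit buffer size; the payload must be smaller than this to stay buffered.
        with open(path, "wb", buffering=4096) as f:
            f.write(payload)                   # lands in Python's buffer only
            before = os.path.getsize(path)     # nothing has reached the kernel yet
            f.flush()                          # the write to the OS happens here
            after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(sizes_before_and_after_flush())
```

Note that flush() only hands the data to the kernel; if you need it on the physical disk, os.fsync(f.fileno()) is the usual next step.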

Release memory in python when removing item from a list [duplicate]

I have a few related questions regarding memory usage in the following example.
If I run in the interpreter,
foo = ['bar' for _ in xrange(10000000)]
the real memory used on my machine goes up to 80.9mb. I then,
del foo
real memory goes down, but only to 30.4mb. The interpreter uses 4.4mb baseline so what is the advantage in not releasing 26mb of memory to the OS? Is it because Python is "planning ahead", thinking that you may use that much memory again?
Why does it release 50.5mb in particular - what is the amount that is released based on?
Is there a way to force Python to release all the memory that was used (if you know you won't be using that much memory again)?
NOTE
This question is different from How can I explicitly free memory in Python?
because this question primarily deals with the increase of memory usage from baseline even after the interpreter has freed objects via garbage collection (with use of gc.collect or not).

I'm guessing the question you really care about here is:
Is there a way to force Python to release all the memory that was used (if you know you won't be using that much memory again)?
No, there is not. But there is an easy workaround: child processes.
If you need 500MB of temporary storage for 5 minutes, but after that you need to run for another 2 hours and won't touch that much memory ever again, spawn a child process to do the memory-intensive work. When the child process goes away, the memory gets released.
This isn't completely trivial and free, but it's pretty easy and cheap, which is usually good enough for the trade to be worthwhile.
First, the easiest way to create a child process is with concurrent.futures (or, for 3.1 and earlier, the futures backport on PyPI):
import concurrent.futures
with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    result = executor.submit(func, *args, **kwargs).result()
If you need a little more control, use the multiprocessing module.
The costs are:
Process startup is kind of slow on some platforms, notably Windows. We're talking milliseconds here, not minutes, and if you're spinning up one child to do 300 seconds' worth of work, you won't even notice it. But it's not free.
If the large amount of temporary memory you use really is large, doing this can cause your main program to get swapped out. Of course you're saving time in the long run, because if that memory hung around forever it would have to lead to swapping at some point. But this can turn gradual slowness into very noticeable all-at-once (and early) delays in some use cases.
Sending large amounts of data between processes can be slow. Again, if you're talking about sending over 2K of arguments and getting back 64K of results, you won't even notice it, but if you're sending and receiving large amounts of data, you'll want to use some other mechanism (a file, mmapped or otherwise; the shared-memory APIs in multiprocessing; etc.).
Sending large amounts of data between processes means the data have to be pickleable (or, if you stick them in a file or shared memory, struct-able or ideally ctypes-able).
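To make the trade-offs above concrete, here is a minimal sketch of the same idea using subprocess instead of concurrent.futures, so that nothing has to be pickled: the child does the memory-hungry work and sends back only a small JSON result on stdout. The child script, the function name, and the choice of JSON are all assumptions made for this example.

```python
import json
import subprocess
import sys

# Code run in the child; the big list exists only in the child's address space.
CHILD_CODE = """
import json, sys
n = int(sys.argv[1])
big = list(range(n))          # large temporary allocation
json.dump({"count": len(big), "total": sum(big)}, sys.stdout)
"""


def summarize_in_child(n):
    """Run the memory-heavy step in a child process (hypothetical helper)."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, str(n)],
        check=True,
        capture_output=True,
        text=True,
    )
    # Only the small summary crosses the process boundary.
    return json.loads(completed.stdout)


if __name__ == "__main__":
    print(summarize_in_child(10_000_000))
```

When the child exits, all of its memory goes back to the OS regardless of what the allocators inside it were doing.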

Memory allocated on the heap can be subject to high-water marks. This is complicated by Python's internal optimizations for allocating small objects (PyObject_Malloc) in 4 KiB pools, classed for allocation sizes at multiples of 8 bytes -- up to 256 bytes (512 bytes in 3.3). The pools themselves are in 256 KiB arenas, so if just one block in one pool is used, the entire 256 KiB arena will not be released. In Python 3.3 the small object allocator was switched to using anonymous memory maps instead of the heap, so it should perform better at releasing memory.
Additionally, the built-in types maintain freelists of previously allocated objects that may or may not use the small object allocator. The int type maintains a freelist with its own allocated memory, and clearing it requires calling PyInt_ClearFreeList(). This can be called indirectly by doing a full gc.collect.
Try it like this, and tell me what you get. Here's the link for psutil.Process.memory_info.
import os
import gc
import psutil
proc = psutil.Process(os.getpid())
gc.collect()
mem0 = proc.memory_info().rss
# create approx. 10**7 int objects and pointers
foo = ['abc' for x in range(10**7)]
mem1 = proc.memory_info().rss
# unreference, including x == 9999999
del foo, x
mem2 = proc.memory_info().rss
# collect() calls PyInt_ClearFreeList()
# or use ctypes: pythonapi.PyInt_ClearFreeList()
gc.collect()
mem3 = proc.memory_info().rss
pd = lambda x2, x1: 100.0 * (x2 - x1) / mem0
print "Allocation: %0.2f%%" % pd(mem1, mem0)
print "Unreference: %0.2f%%" % pd(mem2, mem1)
print "Collect: %0.2f%%" % pd(mem3, mem2)
print "Overall: %0.2f%%" % pd(mem3, mem0)
Output:
Allocation: 3034.36%
Unreference: -752.39%
Collect: -2279.74%
Overall: 2.23%
Edit:
I switched to measuring relative to the process VM size to eliminate the effects of other processes in the system.
The C runtime (e.g. glibc, msvcrt) shrinks the heap when contiguous free space at the top reaches a constant, dynamic, or configurable threshold. With glibc you can tune this with mallopt (M_TRIM_THRESHOLD). Given this, it isn't surprising if the heap shrinks by more -- even a lot more -- than the block that you free.
In 3.x range doesn't create a list, so the test above won't create 10 million int objects. Even if it did, the int type in 3.x is basically a 2.x long, which doesn't implement a freelist.
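For Python 3, a rough way to see the Python-side half of this picture without psutil is the standard library's tracemalloc. This is only a sketch: tracemalloc reports what Python's allocators have handed out, not the process RSS, so it shows that the objects really were freed but says nothing about whether the OS got the pages back. The function name and the default n are assumptions for the example.

```python
import tracemalloc


def measure_alloc_and_release(n=200_000):
    """Return traced bytes at baseline, while holding data, and after del."""
    tracemalloc.start()
    baseline, _ = tracemalloc.get_traced_memory()
    data = [str(i) for i in range(n)]     # many small objects plus one big list
    held, _ = tracemalloc.get_traced_memory()
    del data                              # refcount hits zero, objects are freed
    released, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return baseline, held, released


if __name__ == "__main__":
    base, held, released = measure_alloc_and_release()
    print("baseline:", base, "held:", held, "after del:", released)
```

Comparing these numbers with RSS from psutil (as in the script above) is one way to see the gap between memory Python has freed and memory the process still holds.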

eryksun has answered question #1, and I've answered question #3 (the original #4), but now let's answer question #2:
Why does it release 50.5mb in particular - what is the amount that is released based on?
What it's based on is, ultimately, a whole series of coincidences inside Python and malloc that are very hard to predict.
First, depending on how you're measuring memory, you may only be measuring pages actually mapped into memory. In that case, any time a page gets swapped out by the pager, memory will show up as "freed", even though it hasn't been freed.
Or you may be measuring in-use pages, which may or may not count allocated-but-never-touched pages (on systems that optimistically over-allocate, like Linux), pages that are allocated but tagged MADV_FREE, etc.
If you really are measuring allocated pages (which is actually not a very useful thing to do, but it seems to be what you're asking about), and pages have really been deallocated, there are two circumstances in which this can happen: either you've used brk or equivalent to shrink the data segment (very rare nowadays), or you've used munmap or similar to release a mapped segment. (There's also theoretically a minor variant to the latter, in that there are ways to release part of a mapped segment, e.g., steal it with MAP_FIXED for a MADV_FREE segment that you immediately unmap.)
But most programs don't directly allocate things out of memory pages; they use a malloc-style allocator. When you call free, the allocator can only release pages to the OS if you just happen to be freeing the last live object in a mapping (or in the last N pages of the data segment). There's no way your application can reasonably predict this, or even detect that it happened in advance.
CPython makes this even more complicated—it has a custom 2-level object allocator on top of a custom memory allocator on top of malloc. (See the source comments for a more detailed explanation.) And on top of that, even at the C API level, much less Python, you don't even directly control when the top-level objects are deallocated.
So, when you release an object, how do you know whether it's going to release memory to the OS? Well, first you have to know that you've released the last reference (including any internal references you didn't know about), allowing the GC to deallocate it. (Unlike other implementations, at least CPython will deallocate an object as soon as it's allowed to.) This usually deallocates at least two things at the next level down (e.g., for a string, you're releasing the PyString object, and the string buffer).
If you do deallocate an object, to know whether this causes the next level down to deallocate a block of object storage, you have to know the internal state of the object allocator, as well as how it's implemented. (It obviously can't happen unless you're deallocating the last thing in the block, and even then, it may not happen.)
If you do deallocate a block of object storage, to know whether this causes a free call, you have to know the internal state of the PyMem allocator, as well as how it's implemented. (Again, you have to be deallocating the last in-use block within a malloced region, and even then, it may not happen.)
If you do free a malloced region, to know whether this causes an munmap or equivalent (or brk), you have to know the internal state of the malloc, as well as how it's implemented. And this one, unlike the others, is highly platform-specific. (And again, you generally have to be deallocating the last in-use malloc within an mmap segment, and even then, it may not happen.)
So, if you want to understand why it happened to release exactly 50.5mb, you're going to have to trace it from the bottom up. Why did malloc unmap 50.5mb worth of pages when you did those one or more free calls (for probably a bit more than 50.5mb)? You'd have to read your platform's malloc, and then walk the various tables and lists to see its current state. (On some platforms, it may even make use of system-level information, which is pretty much impossible to capture without making a snapshot of the system to inspect offline, but luckily this isn't usually a problem.) And then you have to do the same thing at the 3 levels above that.
So, the only useful answer to the question is "Because."
Unless you're doing resource-limited (e.g., embedded) development, you have no reason to care about these details.
And if you are doing resource-limited development, knowing these details is useless; you pretty much have to do an end-run around all those levels and specifically mmap the memory you need at the application level (possibly with one simple, well-understood, application-specific zone allocator in between).
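As an illustration of mapping memory at the application level, here is a minimal sketch using an anonymous mmap from the standard library. The function name, the 1 MiB default, and the toy checksum are assumptions for the example; close() unmaps the region, though exactly when the OS reclaims the pages is still up to the platform.

```python
import mmap


def checksum_via_scratch_map(payload, size=1024 * 1024):
    """Use an anonymous mapping as scratch space, then unmap it (hypothetical helper)."""
    if len(payload) > size:
        raise ValueError("payload does not fit in the scratch mapping")
    scratch = mmap.mmap(-1, size)          # anonymous mapping, not backed by a file
    try:
        scratch[:len(payload)] = payload   # work inside the mapping
        return sum(scratch[:len(payload)]) & 0xFFFFFFFF
    finally:
        scratch.close()                    # releases the mapping explicitly


if __name__ == "__main__":
    print(checksum_via_scratch_map(b"abc"))
```

The point of the design is that the lifetime of the memory is tied to one explicit close() call instead of to the state of several stacked allocators.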

First, you may want to install glances:
sudo apt-get install python-pip build-essential python-dev lm-sensors
sudo pip install psutil logutils bottle batinfo https://bitbucket.org/gleb_zhulik/py3sensors/get/tip.tar.gz zeroconf netifaces pymdstat influxdb elasticsearch potsdb statsd pystache docker-py pysnmp pika py-cpuinfo bernhard
sudo pip install glances
Then run it in the terminal!
glances
In your Python code, add the following at the beginning of the file:
import os
import gc # Garbage Collector
After you have finished using the "big" variable (for example, myBigVar) whose memory you would like to release, add the following to your Python code:
del myBigVar
gc.collect()
In another terminal, run your python code and observe in the "glances" terminal, how the memory is managed in your system!
Good luck!
P.S. I assume you are working on a Debian or Ubuntu system

Is there any advantage to reading the entire file

Are there any advantages/disadvantages to reading an entire file in one go rather than reading the bytes as required? So is there any advantage to:
file_handle = open("somefile", "rb")
file_contents = file_handle.read()
# do all the things using file_contents
compared to:
file_handle = open("somefile", "rb")
part1 = file_handle.read(10)
# do some stuff
part2 = file_handle.read(8)
# do some more stuff etc
Background: I am writing a p-code (bytecode) interpreter in Python and have initially just written a naive implementation that reads bytes from the file as required and performs the necessary actions etc. A friend I was showing the program has suggested that I should instead read the entire file into memory (Python list?) and then process it from memory to avoid lots of slow disk reads. The test files are currently less than 1KB and will probably be at most a few 100KB so I would have expected the Operating System and disk controller system to cache the file obviating any performance issues caused by repeatedly reading small chunks of the file.

Cache aside, you still have system calls. Each read() results in a mode switch to trigger the kernel. You can see this with strace or another tool to look at system calls.
This might be premature for a 100 KB file though. As always, test your code to know for sure.

If you want to do any kind of random access then putting it in a list is going to be much faster than seeking from disk. Even if the OS does cache disk access, you are hitting another layer of cache. In any case, you can't be sure how the OS will behave.
Here are 3 cases I can think of that would motivate doing it in-memory:
You might have a jump instruction which you can execute by adding a number to your program counter. Doing that to the index of an array vs seeking a file is a good use case.
You may want to optimise your VM's behaviour, and that may involve reading the file more than once. Scanning a list twice vs reading a file twice will be much quicker.
Depending on opcodes and the grammar of your language you may want to look ahead in a 'cycle' to speed up execution. If that ends up doing two seeks then this could end up degrading performance.
If your file will always be small enough fit in RAM then it's probably worth reading it all into memory. Profile it with a real program and see if it's noticeably faster.
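The jump case above might look something like this once the whole file is in memory as a bytes object. This is only a sketch: the opcodes, their encoding, and the relative-jump convention are invented for the example and have nothing to do with the asker's actual p-code format.

```python
# Hypothetical opcodes, made up for this sketch only.
OP_HALT, OP_PUSH, OP_ADD, OP_JMP = 0, 1, 2, 3


def run(code):
    """Interpret a bytes object; the program counter is just an index into it."""
    pc = 0
    stack = []
    while True:
        op = code[pc]
        if op == OP_HALT:
            return stack
        elif op == OP_PUSH:
            stack.append(code[pc + 1])    # one-byte immediate operand
            pc += 2
        elif op == OP_ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == OP_JMP:
            pc += code[pc + 1]            # relative jump: plain integer arithmetic
        else:
            raise ValueError("unknown opcode %d at %d" % (op, pc))


if __name__ == "__main__":
    # PUSH 2; JMP +4 (skips PUSH 99); PUSH 3; ADD; HALT
    program = bytes([OP_PUSH, 2, OP_JMP, 4, OP_PUSH, 99, OP_PUSH, 3, OP_ADD, OP_HALT])
    print(run(program))
```

In practice the bytes would come from open(path, "rb").read(); a jump is then an addition to pc rather than a seek() followed by a read().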

A single call to read() will be faster than multiple calls to read(). The tradeoff is that with a single call you must be able to fit all data in memory at once, whereas with multiple reads you only have to retain a fraction of the total amount of data. For files that are just a few kilobytes or megabytes, the difference won't be noticeable. For files that are several gigs in size, memory becomes more important.
Also, to do a single read means all of the data must be present, whereas multiple reads can be used to process data as it is streaming in from an external source.

If you are looking for performance, I would recommend using generators. Since your files are small, memory would not be a big concern, but it's still good practice. Reading a file from disk multiple times, however, is a definite bottleneck for a scalable solution.
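One way to read the generator suggestion is a generator that yields fixed-size chunks, so the caller can process a file of any size without holding it all in memory. This is a sketch; the name iter_chunks and the 4096-byte default are assumptions for the example.

```python
import io
from functools import partial


def iter_chunks(stream, chunk_size=4096):
    """Yield successive chunks from a binary stream until EOF (hypothetical helper)."""
    # iter(callable, sentinel) keeps calling read() until it returns b"".
    for chunk in iter(partial(stream.read, chunk_size), b""):
        yield chunk


if __name__ == "__main__":
    for chunk in iter_chunks(io.BytesIO(b"a" * 10), chunk_size=4):
        print(chunk)
```

For files of a few hundred KB this is unlikely to beat a single read(), so it is more about scaling than speed.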

Virtual memory management in Python (2.6-2.7) - Will it reuse the allocated memory for any type of data?

I have a question about the virtual memory in Python.
When a process consumes a relatively large amount of memory, it doesn't "release" the unused memory. For example, suppose I create a massive list of strings that uses 30MB of memory, so the entire process takes roughly 40MB. When the list is deleted, the process is still consuming 40MB, but if another list with the same amount of data is created, the process will not take more memory, because it will use the virtual memory that is available but not released to the OS.
My question is: What kind of data will reuse that non-released virtual memory? I mean, that 30MB was "taken" from the OS when I created a list of strings, and even when I delete it, the next list of strings will not take more memory from the OS as long as it fits in the 30MB. But if, instead of a list of strings, another type of data is created, like a QPixmap (from Qt, using PyQt), will it use that 30MB originally allocated by the list of strings?
Thank you in advance.
Edit: Well, this question sounds lazy. I know I could simply test this specific case, but I want to know the theory. I don't want the answer for this specific "list of strings and QPixmap" case, but in general.

At the C level (CPython's implementation), anything that is allocated on the heap with malloc() will consume memory, and this memory is typically not released to the OS when it is freed with free(). (Some allocators trim the heap or unmap large blocks, but you can't count on it.) In the worst case it is only returned when the process dies. But when new blocks are allocated with malloc() they will use the freed-up memory.
(Unless the free memory is really badly fragmented and there is not enough contiguous free space in the freed-up zones to accommodate new allocations. But let's not worry about this pathological case.)
Every Python object is implemented by CPython as one or more blocks of memory allocated with malloc() so the answer to your question is: pretty much any piece of Python data can reuse the space that was freed by the deallocation of some other piece of Python data.

There are two parts to the problem of "freeing" memory: first, getting Python to garbage collect the objects, and second, getting unused memory returned to the OS at the C level.
If you are having problems with process size growing without bounds, you are almost certainly not allowing objects to be garbage collected. 99.9% of the time (to 0 significant digits :) ) if you are trying to second-guess Python's C-level memory management, you are in a bunny hole.
Remember that in Python your objects are not even candidates to be garbage collected until there are no more live objects with references to them. You can very easily squirrel away a reference to an object somewhere without realizing it.
There's a Python tool called Dowser that is very helpful at finding leaks of memory caused by keeping around references to objects. If you see your object count for a certain class growing without bounds over time.... there's your memory problem.
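If you just want a quick look at object counts without installing anything, the same idea can be approximated with the gc module. This is a stdlib stand-in for illustration, not how Dowser itself works, and the function names are made up for the example. It only sees objects tracked by the garbage collector (containers and class instances, not bare ints or strings).

```python
import gc
from collections import Counter


def count_instances_by_type():
    """Count gc-tracked objects by type name (hypothetical helper)."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=5):
    """Return the type names whose counts grew the most between two snapshots."""
    return (after - before).most_common(top)   # Counter subtraction drops non-positive counts


if __name__ == "__main__":
    class Leaky:
        pass

    snapshot1 = count_instances_by_type()
    hoard = [Leaky() for _ in range(1000)]     # references squirrelled away
    snapshot2 = count_instances_by_type()
    print(growth(snapshot1, snapshot2))
```

Taking snapshots at intervals in a long-running process and watching which counts keep climbing is usually enough to point at the class being held onto.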
Good luck!
