I was tracking down an out-of-memory bug and was horrified to find that Python's multiprocessing appears to copy large arrays, even if I have no intention of using them.
Why is Python (on Linux) doing this? I thought copy-on-write would protect me from any extra copying; I imagined that whenever I reference the object, some kind of trap is invoked and only then is the copy made.
Is the correct way to solve this problem for an arbitrary data type, like a 30-gigabyte custom dictionary, to use a Monitor? Is there some way to build Python so that it doesn't have this nonsense?
import numpy as np
import psutil
from multiprocessing import Process

mem = psutil.virtual_memory()
large_amount = int(0.75 * mem.available)

def florp():
    print("florp")

def bigdata():
    return np.ones(large_amount, dtype=np.int8)

if __name__ == '__main__':
    foo = bigdata()  # Allocates 0.75 of the available RAM, no problems
    p = Process(target=florp)
    p.start()  # Out of memory because bigdata is copied?
    print("Wow")
    p.join()
Running:
[ebuild R ] dev-lang/python-3.4.1:3.4::gentoo USE="gdbm ipv6 ncurses readline ssl threads xml -build -examples -hardened -sqlite -tk -wininst" 0 KiB
I'd expect this behavior -- when you pass code to Python to compile, anything that isn't guarded behind a function or object is immediately executed for evaluation.
In your case, bigdata = np.ones(large_amount, dtype=np.int8) has to be evaluated -- unless your actual code behaves differently, the fact that florp() is never called has nothing to do with it.
To see an immediate example:
>>> f = 0/0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> def f():
... return 0/0
...
>>>
To apply this to your code, put bigdata = np.ones(large_amount, dtype=np.int8) behind a function and call it as you need it; otherwise, Python is trying to be helpful by having that variable available to you at runtime.
If bigdata doesn't change, you could write a function that gets or sets it on an object that you keep around for the duration of the process.
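A minimal sketch of that idea (the _cache dictionary and get_bigdata helper are illustrative names, not part of the original answer):

import numpy as np

_cache = {}

def get_bigdata(large_amount):
    # Allocate the array the first time it is requested, then reuse it.
    if 'bigdata' not in _cache:
        _cache['bigdata'] = np.ones(large_amount, dtype=np.int8)
    return _cache['bigdata']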
edit: Coffee just started working. When you make a new process, Python will need to copy all objects into that new process for access. You can avoid this by using threads, or by a mechanism that allows you to share memory between processes, such as shared memory maps or shared ctypes.
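As a rough sketch of the shared-ctypes approach (not from the original answer; the worker function and the small size are illustrative), the parent allocates the buffer once in shared memory and the child wraps that same buffer instead of copying it:

import ctypes
import numpy as np
from multiprocessing import Process
from multiprocessing.sharedctypes import RawArray

def worker(shared):
    # Wrap the shared buffer as a NumPy array without copying it.
    arr = np.frombuffer(shared, dtype=np.int8)
    print(arr[:5])

if __name__ == '__main__':
    n = 10 ** 6                              # kept small for the example
    shared = RawArray(ctypes.c_int8, n)      # lives in shared memory, not the process heap
    np.frombuffer(shared, dtype=np.int8)[:] = 1
    p = Process(target=worker, args=(shared,))
    p.start()
    p.join()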
The problem was that by default Linux checks for the worst-case memory usage, which can indeed exceed memory capacity. This is true even if the Python language doesn't expose the variables. You need to turn off "overcommit" system-wide to achieve the expected COW behavior.
sysctl vm.overcommit_memory=2
See https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
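If you want to confirm the setting from inside a program before forking, the same knob can be read back from /proc (a Linux-only illustrative sketch, not part of the original answer):

# 0 = heuristic overcommit (the default), 1 = always overcommit, 2 = strict accounting
with open('/proc/sys/vm/overcommit_memory') as f:
    print('vm.overcommit_memory =', f.read().strip())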
Running Python 3.4 on Windows 7, the close function of Gio.MemoryInputStream does not free the memory as it should. The test code is:
from gi.repository import Gio
import os, psutil

process = psutil.Process(os.getpid())
for i in range(1, 10):
    input_stream = Gio.MemoryInputStream.new_from_data(b"x" * 10**7)
    x = input_stream.close()
    y = int(process.memory_info().rss / 10**6)  # Size of memory used by the program, in MB
    print(x, y)
This returns :
True 25
True 35
True 45
True 55
True 65
True 75
True 85
True 95
True 105
This shows that on each loop iteration the memory used by the program increases by 10 MB, even though the close function returned True.
How is it possible to free the memory once the stream is closed?
Another good solution would be to reuse the stream, but set_data or replace_data raises the following error:
'Data access methods are unsupported. Use normal Python attributes instead'
Fine, but which property?
I need a stream in memory in Python 3.4. I create a PDF file with PyPDF2, and then I want to preview it with Poppler. Due to a bug in Poppler (see "Has anyone been able to use poppler new_from_data in python?") I cannot use the new_from_data function and would like to use the new_from_stream function.
This is a bug in GLib’s Python bindings which can’t be trivially fixed.
Instead, you should use g_memory_input_stream_new_from_bytes(), which handles freeing memory differently, and shouldn’t suffer from the same bug.
In more detail, the bug with new_from_data() is caused by the introspection annotations (which GLib uses to allow language bindings to automatically expose all of its API) not supporting the GDestroyNotify parameter of new_from_data(), which needs to be set to a non-NULL function that frees the allocated memory passed in through the other arguments. Running your script under gdb shows that pygobject passes NULL for the GDestroyNotify parameter. It can’t do any better, since there is currently no way of expressing that the memory management semantics of the data parameter depend on what’s passed to destroy.
Thanks for your answer, @Philip Withnall. I tested the solution you propose, and it works. To help others understand, here is my test code:
from gi.repository import Gio, GLib
import os, psutil

process = psutil.Process(os.getpid())
for i in range(1, 10):
    input_stream = Gio.MemoryInputStream.new_from_bytes(GLib.Bytes(b"x" * 10**7))
    x = input_stream.close()
    y = int(process.memory_info().rss / 10**6)  # Size of memory used by the program, in MB
    print(x, y)
Now y no longer grows.
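From there, the original goal (previewing a PyPDF2-generated PDF with Poppler) could look roughly like the sketch below. This is only an illustration under the assumption that Poppler's new_from_stream binding takes a stream, a length, a password and a cancellable; the blank page stands in for the real PDF content:

import io
import gi
gi.require_version('Poppler', '0.18')   # typelib version assumed; adjust to your install
from gi.repository import Gio, GLib, Poppler
from PyPDF2 import PdfFileWriter

writer = PdfFileWriter()
writer.addBlankPage(width=200, height=200)   # stand-in for the real document

buf = io.BytesIO()
writer.write(buf)                            # serialize the PDF into memory

data = GLib.Bytes(buf.getvalue())
stream = Gio.MemoryInputStream.new_from_bytes(data)
document = Poppler.Document.new_from_stream(stream, data.get_size(), None, None)
print(document.get_n_pages())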
I'll start this by saying I'm not the most familiar with python, and this issue could be a more general python thing that I don't get (i.e. a glaringly obvious duplicate).
In the python bindings for the ev3, a motor is referenced like this:
# hardware.py #
import ev3dev.ev3 as ev3
motor = ev3.LargeMotor('outA')
motor.connected
Where 'outA' is the output port on the robot that the motor is connected to.
If I then do:
$:python hardware.py
I get no issues and I can use the motor normally. However, if I write a new file
# do_something.py #
from hardware import *
I get an error:
Exception TypeError: "'NoneType' object is not callable" in <bound method LargeMotor.__del__ of <ev3dev.core.LargeMotor object at 0xb67d2fd0>> ignored
Does anyone know why this is happening? Is it a python thing or an ev3 thing?
My reason for wanting to import in this way is so that I can do all of the hardware setup in one file (a sizeable chunk of code) and then import this to the files that actually make the robot perform tasks.
I know that NoneType is the type of None in Python; I just don't know why a direct compile works but an import doesn't.
1st Edit:
Ok, so I compiled it as:
$:python hardware.py do_something.py
$:python do_something.py
And this gave no errors.
However, upon request, I've added more code, hardware.py is the same:
# do_something.py #
from hardware import *
counter = 0
while True:
if (counter >= 1000):
break
motor.run_direct(duty_cycle_sp = 20)
counter += 1
I.e. run the motor at a duty cycle of 20 until we've been through a thousand loop iterations. This works, and runs until the loop breaks and the script ends. The same NoneType error is then given, and the motor continues to run even though the script has finished. The behaviour is the same with a KeyboardInterrupt. There is no traceback given, just that error message.
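As an aside, one way to make sure the motor is halted when the script exits (whether normally or via Ctrl-C) is to stop it in a finally block; this is only a sketch and assumes ev3dev's motor.stop() method:

# do_something.py -- sketch; motor comes from hardware.py as above
from hardware import *

try:
    counter = 0
    while counter < 1000:
        motor.run_direct(duty_cycle_sp=20)
        counter += 1
finally:
    motor.stop()   # stop the motor even if the loop is interrupted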
First of all, Python is a language whose code is made up of words, while on the other hand the Lego Mindstorms "language" is made up of simple blocks. That said, logically the two languages cannot be mixed together and have nothing in common. And having vast experience with both, I have never found anything in common between them.
For example, I have code that produces many integers:
import sys
import random
a = [random.randint(0, sys.maxint) for i in xrange(10000000)]
After running it, I get VIRT 350M, RES 320M (as reported by htop).
Then I do:
del a
But the memory is still VIRT 272M, RES 242M (before producing the integers it was VIRT 24M, RES 6M).
The pmap output for the process shows two big pieces of [anon] memory.
Python 3.4 does not have this behavior: the memory is freed when I delete the list there!
What happens? Does Python leave the integers in memory?
Here's how I can duplicate it. If I start python 2.7, the interpreter uses about 4.5 MB of memory. (I'm quoting "Real Mem" values from the Mac OS X Activity Monitor.app).
>>> a = [random.randint(0, sys.maxint) for i in xrange(10000000)]
Now, memory usage is ~ 305.7 MB.
>>> del a
Removing a seems to have no effect on memory.
>>> import gc
>>> gc.collect() # perform a full collection
Now, memory usage is 27.7 MB. Sometimes, the first call to collect() doesn't seem to do anything, but a second collect() call will clean things up.
But this behavior is by design; Python isn't leaking. This old FAQ on effbot.org explains a bit more about what's happening:
“For speed”, Python maintains an internal free list for integer objects. Unfortunately, that free list is both immortal and unbounded in size. floats also use an immortal & unbounded free list.
Essentially, python is treating the integers as singletons, under the assumption that you might use them more than once.
Consider this:
# 4.5 MB
>>> a = [object() for i in xrange(10000000)]
# 166.7 MB
>>> del a
# 9.1 MB
In this case, it's pretty obvious that Python is not keeping the objects around in memory, and removing a triggers a garbage collection which cleans everything up.
As I recall, Python will actually keep low-valued integers in memory forever (the small-integer cache, -5 through 256 in CPython). This may explain why the gc.collect() call doesn't return as much memory as removing the list of objects does.
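That small-integer cache is easy to see in an interactive session (a CPython implementation detail, shown here purely as an illustration):

>>> a = 100
>>> b = 100
>>> a is b        # small integers are cached, so both names refer to one object
True
>>> a = 100000
>>> b = 100000
>>> a is b        # larger integers entered separately are distinct objects
False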
I looked around through the PEPs a bit to figure out why Python3 is different. However, I didn't see anything obvious. If you really wanted to know, you could dig around in the source code.
Suffice it to say that in Python 3, either the integer free-list behavior changed or the garbage collector got better.
Many things are better in Python 3.
I'm thoroughly confused about the memory usage of a specific python script. I guess I don't really know how to profile the usage despite advice from several SO Questions/Answers.
My questions are: What's the difference between memory_profiler and guppy.hpy? Why is one telling me I'm using huge amounts of memory, and the other is telling me I'm not?
I'm working with pysam, a library for accessing bioinformatics SAM/BAM files. My main script is running out of memory quickly when converting SAM (ASCII) to BAM (Binary) and manipulating the files in between.
I created a small test example to understand how much memory gets allocated at each step.
# test_pysam.py:
import pysam
#from guppy import hpy

TESTFILENAME = ('/projectnb/scv/yannpaul/MAR_CEJ082/' +
                'test.sam')
#H = hpy()

@profile  # for memory_profiler
def samopen(filename):
#    H.setrelheap()
    samf = pysam.Samfile(filename)
#    print H.heap()
    pass

if __name__ == "__main__":
    samopen(TESTFILENAME)
Monitoring the memory usage with memory_profiler (python -m memory_profiler test_pysam.py) results in the following output:
Filename: test_pysam.py

Line #    Mem usage   Increment   Line Contents
================================================
    10                            @profile  # for memory_profiler
    11                            def samopen(filename):
    12   10.48 MB     0.00 MB     #    print H.setrelheap()
    13  539.51 MB   529.03 MB         samf = pysam.Samfile(filename)
    14                            #    print H.heap()
    15  539.51 MB     0.00 MB         pass
Then, commenting out the @profile decorator and uncommenting the guppy-related lines, I get the following output (python test_pysam.py):
Partition of a set of 3 objects. Total size = 624 bytes.
 Index  Count   %     Size   %  Cumulative  %  Kind (class / dict of class)
     0      1  33      448  72         448  72  types.FrameType
     1      1  33       88  14         536  86  __builtin__.weakref
     2      1  33       88  14         624 100  csamtools.Samfile
The total size of line 13 is 529.03 MB in one case and 624 bytes in the other. What's actually going on here? 'test.sam' is a ~52MB SAM (again an ASCII format) file. It's a bit tricky for me to dig deep into pysam, as it's a wrapper around a C library related to samtools. Regardless of what a Samfile actually is, I think I should be able to learn how much memory is allocated to create it. What procedure should I use to correctly profile the memory usage of each step of my larger, more complex python program?
What's the difference between memory_profiler and guppy.hpy?
Do you understand the difference between your internal view of the heap and the OS's external view of your program? (For example, when the Python interpreter calls free on 1MB, that doesn't immediately—or maybe even ever—return 1MB worth of pages to the OS, for multiple reasons.) If you do, then the answer is pretty easy: memory_profiler is asking the OS for your memory use; guppy is figuring it out internally from the heap structures.
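To make the two views concrete, here is a small illustrative sketch (not from the original answer) that prints both side by side; it assumes Python 2 with psutil and guppy installed, as in the question:

import gc
import os
import psutil
from guppy import hpy

h = hpy()
process = psutil.Process(os.getpid())

def report(label):
    # External view (pages the OS has given the process) vs internal view (live Python objects).
    rss_mb = process.memory_info().rss / 10**6
    heap_mb = h.heap().size / 10**6
    print('%-10s rss=%6.1f MB  heap=%6.1f MB' % (label, rss_mb, heap_mb))

report('start')
data = [object() for _ in xrange(10**6)]
report('allocated')
del data
gc.collect()
report('freed')    # the heap view shrinks; rss often stays higher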
Beyond that, memory_profiler has one feature guppy doesn't: it automatically instruments your function to print a report after each line of code. It's otherwise much simpler and easier, but less flexible. If there's something you know you want to do and memory_profiler doesn't seem to do it, it probably can't; with guppy, maybe it can, so study the docs and the source.
Why is one telling me I'm using huge amounts of memory, and the other is telling me I'm not?
It's hard to be sure, but here are some guesses; the answer is likely to be a combination of more than one:
Maybe samtools uses mmap to map small enough files entirely into memory. This would increase your page usage by the size of the file, but not increase your heap usage at all.
Maybe samtools or pysam creates a lot of temporary objects that are quickly freed. You could have lots of fragmentation (only a couple live PyObjects on each page), or your system's malloc may have decided it should keep lots of nodes in its freelist because of the way you've been allocating, or it may not have returned pages to the OS yet, or the OS's VM may not have reclaimed pages that were returned. The exact reason is almost always impossible to guess; the simplest thing to do is to assume that freed memory is never returned.
What procedure should I use to correctly profile the memory usage of each step of my larger, more complex python program?
If you're asking about memory usage from the OS point of view, memory_profiler is doing exactly what you want. While major digging into pysam may be difficult, it should be trivial to wrap a few of the functions with the @profile decorator. Then you'll know which C functions are responsible for the memory; if you want to dig any deeper, you obviously have to profile at the C level (unless there's information in the samtools docs or from the samtools community).
I think I have the opposite of the problem described here. I have one process writing data to a log, and I want a second process to read it, but I don't want that second process to be able to modify the contents. This is potentially a large file and I need random access, so I'm using Python's mmap module.
If I create the mmap as read/write (for the 2nd process), I have no problem creating ctypes object as a "view" of the mmap object using from_buffer. From a cursory look at the c-code, it looks like this is a cast, not a copy, which is what I want. However, this breaks if I make the mmap ACCESS_READ, throwing an exception that from_buffer requires write privileges.
I think I want to use ctypes from_address() method instead, which doesn't appear to need write access. I'm probably missing something simple, but I'm not sure how to get the address of the location within an mmap. I know I can use ACCESS_COPY (so write operations show up in memory, but aren't persisted to disk), but I'd rather keep things read only.
Any suggestions?
I ran into a similar issue (unable to set up a read-only mmap), but I was using only the Python mmap module: Python mmap 'Permission denied' on Linux.
I'm not sure it is of any help to you, since you don't want the mmap to be private?
Ok, from looking at the mmap C code, I don't believe it supports this use case. Also, I found that the performance pretty much sucks for my use case. I'd be curious what kind of performance others see, but I found that it took about 40 seconds to walk through a 500 MB binary file in Python. This is creating an mmap, turning each location into a ctypes object with from_buffer(), and using the ctypes object to work out the size of the object so I could step to the next one. I tried doing the same thing directly in C++ with MSVC. Obviously there I could cast directly into an object of the correct type, and it was fast - less than a second (this is with a Core 2 Quad and an SSD).
I did find that I could get a pointer with the following:
firstHeader = CEL_HEADER.from_buffer(map, 0) #CEL_HEADER is a ctypes Structure
pHeader = pointer(firstHeader)
#Now I can use pHeader[ind] to get a CEL_HEADER object
#at an arbitrary point in the file
This doesn't get around the original problem - the mmap isn't read-only, since I still need to use from_buffer for the first call. In this configuration it still took around 40 seconds to process the whole file, so it looks like the conversion from a pointer into ctypes structs is killing the performance. That's just a guess, but I don't see a lot of value in tracking it down further.
I'm not sure my plan will help anyone else, but I'm going to try to create a C module specific to my needs, based on the mmap code. I think I can use the fast C code to index the binary file, then expose only small parts of the file at a time through calls into ctypes/Python objects. Wish me luck.
Also, as a side note, Python 2.7.2 was released today (6/12/11), and one of the changes is an update to the mmap code so that you can use a python long to set the file offset. This lets you use mmap for files over 4GB on 32-bit systems. See Issue #4681 here
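For reference, here is a minimal sketch of mapping a window far into a large file with that offset parameter (the file name is hypothetical, and the offset must be a multiple of mmap.ALLOCATIONGRANULARITY):

import mmap

with open('big.bin', 'rb') as f:              # hypothetical multi-gigabyte file
    gran = mmap.ALLOCATIONGRANULARITY
    offset = (5 * 2**30 // gran) * gran       # ~5 GB in, rounded down to the granularity
    window = mmap.mmap(f.fileno(), 4096, access=mmap.ACCESS_READ, offset=offset)
    print(window[:16])
    window.close()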
I ran into this same problem: we needed the from_buffer interface and wanted read-only access. From the Python docs (https://docs.python.org/3/library/mmap.html): "Assignment to an ACCESS_COPY memory map affects memory but does not update the underlying file."
If it's acceptable for you to use an anonymous file backing, you can use ACCESS_COPY.
An example: open two cmd.exe windows or terminals, and in one of them run:
import ctypes
import mmap

mm_file_write = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
mm_file_read = mmap.mmap(-1, 4096, access=mmap.ACCESS_COPY, tagname="shmem")
write = ctypes.c_int.from_buffer(mm_file_write)
read = ctypes.c_int.from_buffer(mm_file_read)
try:
    while True:
        value = int(input('enter an integer using mm_file_write: '))
        write.value = value
        print('updated value')
        value = int(input('enter an integer using mm_file_read: '))
        # read.value assignment doesn't update the anonymous backed file
        read.value = value
        print('updated value')
except KeyboardInterrupt:
    print('got exit event')
In the other terminal do:
import mmap
import struct
import time

mm_file = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
i = None
try:
    while True:
        new_i = struct.unpack('i', mm_file[:4])
        if i != new_i:
            print('i: {} => {}'.format(i, new_i))
            i = new_i
        time.sleep(0.1)
except KeyboardInterrupt:
    print('Stopped . . .')
And you will see that the second process does not receive updates when the first process writes through the ACCESS_COPY mapping.