Running Python 3.4 on Windows 7, the close function of Gio.MemoryInputStream does not free the memory as it should. The test code is:
from gi.repository import Gio
import os, psutil

process = psutil.Process(os.getpid())
for i in range(1, 10):
    input_stream = Gio.MemoryInputStream.new_from_data(b"x" * 10**7)
    x = input_stream.close_async(2)
    y = int(process.memory_info().rss / 10**6)  # Size of memory used by the program, in MB
    print(x, y)
This returns:
True 25
True 35
True 45
True 55
True 65
True 75
True 85
True 95
True 105
This shows that on each loop iteration, the memory used by the program increases by 10 MB, even though the close function returned True.
How can the memory be freed once the stream is closed?
Another good solution would be to reuse the stream, but set_data or replace_data raises the following error:
'Data access methods are unsupported. Use normal Python attributes instead'
Fine, but which property?
I need an in-memory stream in Python 3.4. I create a PDF file with PyPDF2 and then want to preview it with Poppler. Due to a bug in Poppler (see "Has anyone been able to use poppler new_from_data in python?"), I cannot use the new_from_data function and would like to use the new_from_stream function instead.
This is a bug in GLib’s Python bindings which can’t be trivially fixed.
Instead, you should use g_memory_input_stream_new_from_bytes(), which handles freeing memory differently, and shouldn’t suffer from the same bug.
In more detail, the bug with new_from_data() is caused by the introspection annotations (which GLib uses to let language bindings automatically expose all of its API) not supporting the GDestroyNotify parameter of new_from_data(), which needs to be set to a non-NULL function that frees the allocated memory passed in the other arguments. Running your script under gdb shows that PyGObject passes NULL for the GDestroyNotify parameter. It can't do any better, since there is currently no way of expressing that the memory-management semantics of the data parameter depend on what's passed as destroy.
Thanks for your answer, @Philip Withnall. I tested the solution you propose, and it works. To help others understand, here is my test code:
from gi.repository import Gio, GLib
import os, psutil

process = psutil.Process(os.getpid())
for i in range(1, 10):
    input_stream = Gio.MemoryInputStream.new_from_bytes(GLib.Bytes(b"x" * 10**7))
    x = input_stream.close()
    y = int(process.memory_info().rss / 10**6)  # Size of memory used by the program, in MB
    print(x, y)
Now y no longer grows.
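For completeness, the original goal from the question (previewing a PyPDF2-generated PDF with Poppler's new_from_stream) can then be sketched roughly like this. This is only a sketch: it assumes the GObject-introspected Poppler.Document.new_from_stream(stream, length, password, cancellable) entry point and PyPDF2's old PdfFileWriter API, neither of which is shown in the question.

import io
import gi
gi.require_version('Poppler', '0.18')
from gi.repository import Gio, GLib, Poppler
from PyPDF2 import PdfFileWriter

# Build a small PDF in memory with PyPDF2.
writer = PdfFileWriter()
writer.addBlankPage(width=200, height=200)
buf = io.BytesIO()
writer.write(buf)
data = buf.getvalue()

# Wrap the bytes in GLib.Bytes so GLib owns (and later frees) its copy.
stream = Gio.MemoryInputStream.new_from_bytes(GLib.Bytes(data))

# Hand the stream to Poppler instead of using new_from_data.
document = Poppler.Document.new_from_stream(stream, len(data), None, None)
print(document.get_n_pages())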
I've asked this question before about killing a process that uses too much memory, and I've got most of a solution worked out.
However, there is one problem: calculating massive numbers seems to be untouched by the method I'm trying to use. This code below is intended to put a 10 second CPU time limit on the process.
import resource
import os
import signal

def timeRanOut(n, stack):
    raise SystemExit('ran out of time!')

signal.signal(signal.SIGXCPU, timeRanOut)

soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
print(soft, hard)
resource.setrlimit(resource.RLIMIT_CPU, (10, 100))

y = 10**(10**10)
What I expect to see when I run this script (on a Unix machine) is this:
-1 -1
ran out of time!
Instead, I get no output. The only way I get output is with Ctrl + C, and I get this if I Ctrl + C after 10 seconds:
^C-1 -1
ran out of time!
CPU time limit exceeded
If I Ctrl + C before 10 seconds, then I have to do it twice, and the console output looks like this:
^C-1 -1
^CTraceback (most recent call last):
  File "procLimitTest.py", line 18, in <module>
    y = 10**(10**10)
KeyboardInterrupt
In the course of experimenting and trying to figure this out, I've also put time.sleep(2) between the print and the large-number calculation. It doesn't seem to have any effect. If I change y = 10**(10**10) to y = 10**10, then the print and sleep statements work as expected. Adding flush=True to the print statement, or sys.stdout.flush() after the print statement, doesn't help either.
Why can I not limit CPU time for the calculation of a very large number? How can I fix or at least mitigate this?
Additional information:
Python version: 3.3.5 (default, Jul 22 2014, 18:16:02) \n[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)]
Linux information: Linux web455.webfaction.com 2.6.32-431.29.2.el6.x86_64 #1 SMP Tue Sep 9 21:36:05 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
TL;DR: Python precomputes constant literals in the code. If a very large number is instead calculated with at least one non-constant intermediate step, the process can be limited by CPU time as expected.
It took quite a bit of searching, but I have discovered evidence that Python 3 does precompute constant literals that it finds in the code before evaluating anything. One of them is this webpage: A Peephole Optimizer for Python. I've quoted some of it below.
ConstantExpressionEvaluator
This class precomputes a number of constant expressions and stores them in the function's constants list, including obvious binary and unary operations and tuples consisting of just constants. Of particular note is the fact that complex literals are not represented by the compiler as constants but as expressions, so 2+3j appears as
LOAD_CONST n (2)
LOAD_CONST m (3j)
BINARY_ADD
This class converts those to
LOAD_CONST q (2+3j)
which can result in a fairly large performance boost for code that uses complex constants.
The fact that 2+3j is used as an example very strongly suggests that not only small constants are being precomputed and cached, but also any constant literals in the code. I also found this comment on another Stack Overflow question (Are constant computations cached in Python?):
Note that for Python 3, the peephole optimizer does precompute the 1/3 constant. (CPython specific, of course.) – Mark Dickinson Oct 7 at 19:40
These are supported by the fact that replacing
y = 10**(10**10)
with this also hangs, even though I never call the function!
def f():
    y = 10**(10**10)
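One way to confirm that the literal is folded at compile time, without triggering the huge computation, is to disassemble a function containing a smaller constant expression with the dis module (a quick check; the exact output format and constant indices vary between CPython versions):

import dis

def g():
    return 2 + 3j   # written as an expression, folded at compile time

dis.dis(g)
# The disassembly contains the already-folded constant (2+3j) as a single
# constant load/return, not LOAD_CONST / LOAD_CONST / BINARY_ADD.
print(g.__code__.co_consts)   # the folded constant appears among the constants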
The good news
Luckily for me, I don't have any such giant literal constants in my code. Any computation of such constants will happen later, which can be and is limited by the CPU time limit. I changed
y = 10**(10**10)
to this,
x = 10
print(x)
y = 10**x
print(y)
z = 10**y
print(z)
and got this output, as desired!
-1 -1
10
10000000000
ran out of time!
The moral of the story: Limiting a process by CPU time or memory consumption (or some other method) will work if there is not a large literal constant in the code that Python tries to precompute.
Use a function.
It does seem that Python tries to precompute integer literals (I only have empirical evidence; if anyone has a source please let me know). This would normally be a helpful optimization, since the vast majority of literals in scripts are probably small enough to not incur noticeable delays when precomputing. To get around this, you need to make your literal be the result of a non-constant computation, like a function call with parameters.
Example:
import resource
import os
import signal

def timeRanOut(n, stack):
    raise SystemExit('ran out of time!')

signal.signal(signal.SIGXCPU, timeRanOut)

soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
print(soft, hard)
resource.setrlimit(resource.RLIMIT_CPU, (10, 100))

f = lambda x=10: x**(x**x)
y = f()
This gives the expected result:
xubuntu@xubuntu-VirtualBox:~/Desktop$ time python3 hang.py
-1 -1
ran out of time!
real 0m10.027s
user 0m10.005s
sys 0m0.016s
For example, I have code (running under Python 2.7) that produces many integers.
import sys
import random
a = [random.randint(0, sys.maxint) for i in xrange(10000000)]
After running it, I got VIRT 350M, RES 320M (as seen in htop).
Then I do:
del a
But the memory usage is still VIRT 272M, RES 242M (before producing the integers it was VIRT 24M, RES 6M).
pmap of the process shows two big pieces of [anon] memory.
Python 3.4 does not show this behavior: there the memory is freed when I delete the list!
What is happening? Does Python keep the integers in memory?
Here's how I can duplicate it. If I start python 2.7, the interpreter uses about 4.5 MB of memory. (I'm quoting "Real Mem" values from the Mac OS X Activity Monitor.app).
>>> a = [random.randint(0, sys.maxint) for i in xrange(10000000)]
Now, memory usage is ~ 305.7 MB.
>>> del a
Removing a seems to have no effect on memory.
>>> import gc
>>> gc.collect() # perform a full collection
Now, memory usage is 27.7 MB. Sometimes, the first call to collect() doesn't seem to do anything, but a second collect() call will clean things up.
But, this behavior is by design, Python isn't leaking. This old FAQ on effbot.org explains a bit more about what's happening:
“For speed”, Python maintains an internal free list for integer objects. Unfortunately, that free list is both immortal and unbounded in size. floats also use an immortal & unbounded free list.
Essentially, Python is hanging on to the integer objects, under the assumption that you might want to use integers again.
Consider this:
# 4.5 MB
>>> a = [object() for i in xrange(10000000)]
# 166.7 MB
>>> del a
# 9.1 MB
In this case, it's pretty obvious that Python is not keeping the objects around in memory, and removing a triggers a garbage collection that cleans everything up.
As I recall, Python will actually keep low-valued integers in memory forever (CPython caches small integers, roughly -5 through 256). This may explain why the gc.collect() call doesn't return as much memory as removing the list of objects does.
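A quick, CPython-specific way to see the small-integer cache at work in the interactive interpreter (results outside the cached range can vary between versions and contexts):

>>> a = 100
>>> b = 100
>>> a is b        # both names refer to the one cached object
True
>>> x = 1000 + 7
>>> y = 1000 + 7
>>> x is y        # typically False in the REPL: each line builds its own object
False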
I looked around through the PEPs a bit to figure out why Python3 is different. However, I didn't see anything obvious. If you really wanted to know, you could dig around in the source code.
Suffice it to say that in Python 3, either the integer free-list behavior has changed or the garbage collector got better.
Many things are better in Python 3.
I was tracking down an out-of-memory bug and was horrified to find that Python's multiprocessing appears to copy large arrays, even if I have no intention of using them in the child.
Why is Python (on Linux) doing this? I thought copy-on-write would protect me from any extra copying. I imagined that whenever I reference the object, some kind of trap is invoked and only then is the copy made.
Is the correct way to solve this problem for an arbitrary data type, like a 30-gigabyte custom dictionary, to use a Monitor? Is there some way to build Python so that it doesn't have this nonsense?
import numpy as np
import psutil
from multiprocessing import Process

mem = psutil.virtual_memory()
large_amount = int(0.75 * mem.available)

def florp():
    print("florp")

def bigdata():
    return np.ones(large_amount, dtype=np.int8)

if __name__ == '__main__':
    foo = bigdata()  # Allocates 0.75 of the RAM, no problems
    p = Process(target=florp)
    p.start()        # Out of memory because bigdata is copied?
    print("Wow")
    p.join()
Running:
[ebuild R ] dev-lang/python-3.4.1:3.4::gentoo USE="gdbm ipv6 ncurses readline ssl threads xml -build -examples -hardened -sqlite -tk -wininst" 0 KiB
I'd expect this behavior: when you pass code to Python to compile, anything that's not guarded behind a function or object is immediately executed for evaluation.
In your case, bigdata=np.ones(large_amount,dtype=np.int8) has to be evaluated -- unless your actual code has different behavior, florp() not being called has nothing to do with it.
To see an immediate example:
>>> f = 0/0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> def f():
... return 0/0
...
>>>
To apply this to your code, put bigdata = np.ones(large_amount, dtype=np.int8) behind a function and call it as you need it; otherwise, Python is trying to be helpful by having that variable available to you at runtime.
If bigdata doesn't change, you could write a function that gets or sets it on an object that you keep around for the duration of the process.
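As a rough sketch of that suggestion (a rearrangement of the question's own example, not a tested fix for the original environment): start the child process first, then allocate the big array, so nothing large exists at fork time.

import numpy as np
import psutil
from multiprocessing import Process

def florp():
    print("florp")

def bigdata(amount):
    # Allocate only when explicitly called.
    return np.ones(amount, dtype=np.int8)

if __name__ == '__main__':
    large_amount = int(0.75 * psutil.virtual_memory().available)
    p = Process(target=florp)
    p.start()                    # the child is created before the big array exists
    foo = bigdata(large_amount)  # the parent allocates afterwards
    print("Wow")
    p.join()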
edit: Coffee just started working. When you make a new process, Python will need to copy all objects into that new process for access. You can avoid this by using threads, or by a mechanism that allows you to share memory between processes, such as shared memory maps or shared ctypes objects.
The problem was that, by default, Linux checks for the worst-case memory usage, which can indeed exceed memory capacity. This is true even if the Python language doesn't expose the variables to the child. You need to turn off "overcommit" system-wide to achieve the expected COW behavior.
sysctl vm.overcommit_memory=2
See https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
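If it helps, the current setting can be read from /proc before relying on COW behavior (a small helper, not part of the original answer; the 0/1/2 meanings are from the kernel documentation linked above):

def overcommit_mode():
    # 0 = heuristic overcommit (default), 1 = always overcommit, 2 = strict accounting
    with open('/proc/sys/vm/overcommit_memory') as f:
        return int(f.read().strip())

print(overcommit_mode())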
I'm thoroughly confused about the memory usage of a specific python script. I guess I don't really know how to profile the usage despite advice from several SO Questions/Answers.
My questions are: What's the difference between memory_profiler and guppy.hpy? Why is one telling me I'm using huge amounts of memory, and the other is telling me I'm not?
I'm working with pysam, a library for accessing bioinformatics SAM/BAM files. My main script is running out of memory quickly when converting SAM (ASCII) to BAM (Binary) and manipulating the files in between.
I created a small test example to understand how much memory gets allocated at each step.
# test_pysam.py:

import pysam
#from guppy import hpy

TESTFILENAME = ('/projectnb/scv/yannpaul/MAR_CEJ082/' +
                'test.sam')
#H = hpy()

@profile  # for memory_profiler
def samopen(filename):
    # H.setrelheap()
    samf = pysam.Samfile(filename)
    # print H.heap()
    pass

if __name__ == "__main__":
    samopen(TESTFILENAME)
Monitoring the memory usage with memory_profiler (python -m memory_profiler test_pysam.py) results in the following output:
Filename: test_pysam.py
Line #    Mem usage    Increment   Line Contents
================================================
    10                             @profile  # for memory_profiler
    11                             def samopen(filename):
    12    10.48 MB      0.00 MB        # print H.setrelheap()
    13   539.51 MB    529.03 MB        samf = pysam.Samfile(filename)
    14                                 # print H.heap()
    15   539.51 MB      0.00 MB        pass
Then, commenting out the @profile decorator and uncommenting the guppy-related lines, I get the following output (python test_pysam.py):
Partition of a set of 3 objects. Total size = 624 bytes.
 Index  Count   %     Size   %  Cumulative   %  Kind (class / dict of class)
     0      1  33      448  72         448  72  types.FrameType
     1      1  33       88  14         536  86  __builtin__.weakref
     2      1  33       88  14         624 100  csamtools.Samfile
Line 13 shows an increase of 529.03 MB in one case, while the total size reported is 624 bytes in the other. What's actually going on here? 'test.sam' is a ~52 MB SAM (again, an ASCII format) file. It's a bit tricky for me to dig deep into pysam, as it's a wrapper around a C library related to samtools. Regardless of what a Samfile actually is, I think I should be able to learn how much memory is allocated to create it. What procedure should I use to correctly profile the memory usage of each step of my larger, more complex python program?
What's the difference between memory_profiler and guppy.hpy?
Do you understand the difference between your internal view of the heap and the OS's external view of your program? (For example, when the Python interpreter calls free on 1MB, that doesn't immediately—or maybe even ever—return 1MB worth of pages to the OS, for multiple reasons.) If you do, then the answer is pretty easy: memory_profiler is asking the OS for your memory use; guppy is figuring it out internally from the heap structures.
Beyond that, memory_profiler has one feature guppy doesn't: automatically instrumenting your function to print a report after each line of code. It's otherwise much simpler and easier, but less flexible. If there's something you know you want to do and memory_profiler doesn't seem to do it, it probably can't; with guppy, maybe it can, so study the docs and the source.
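A rough way to see the two views side by side (a sketch, assuming psutil and guppy, or guppy3 on Python 3, are installed; the exact numbers will differ between runs):

import os
import psutil
from guppy import hpy

h = hpy()
data = [b'x' * 1000 for _ in range(100000)]   # roughly 100 MB of byte strings

heap_bytes = h.heap().size                                  # interpreter's internal view
rss_bytes = psutil.Process(os.getpid()).memory_info().rss   # the OS's external view
print('guppy heap: %.1f MB' % (heap_bytes / 1e6))
print('OS RSS:     %.1f MB' % (rss_bytes / 1e6))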
Why is one telling me I'm using huge amounts of memory, and the other is telling me I'm not?
It's hard to be sure, but here are some guesses; the answer is likely to be a combination of more than one:
Maybe samtools uses mmap to map small enough files entirely into memory. This would increase your page usage by the size of the file, but not increase your heap usage at all.
Maybe samtools or pysam creates a lot of temporary objects that are quickly freed. You could have lots of fragmentation (only a couple live PyObjects on each page), or your system's malloc may have decided it should keep lots of nodes in its freelist because of the way you've been allocating, or it may not have returned pages to the OS yet, or the OS's VM may not have reclaimed pages that were returned. The exact reason is almost always impossible to guess; the simplest thing to do is to assume that freed memory is never returned.
What procedure should I use to correctly profile the memory usage of each step of my larger, more complex python program?
If you're asking about memory usage from the OS point of view, memory_profiler is doing exactly what you want. While major digging into pysam may be difficult, it should be trivial to wrap a few of the functions with the @profile decorator. Then you'll know which C functions are responsible for memory; if you want to dig any deeper, you obviously have to profile at the C level (unless there's information in the samtools docs or from the samtools community).
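If decorating functions inside pysam itself is awkward, memory_profiler also exposes a memory_usage() helper that samples RSS while a single call runs (a sketch reusing the question's test file path):

from memory_profiler import memory_usage
import pysam

TESTFILENAME = '/projectnb/scv/yannpaul/MAR_CEJ082/test.sam'

def samopen(filename):
    samf = pysam.Samfile(filename)

# memory_usage samples the process's RSS (in MiB) while samopen runs.
usage = memory_usage((samopen, (TESTFILENAME,)))
print('peak increase: %.2f MiB' % (max(usage) - min(usage)))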
I think I have the opposite of the problem described here. I have one process writing data to a log, and I want a second process to read it, but I don't want the second process to be able to modify the contents. This is potentially a large file, and I need random access, so I'm using Python's mmap module.
If I create the mmap as read/write (for the 2nd process), I have no problem creating ctypes object as a "view" of the mmap object using from_buffer. From a cursory look at the c-code, it looks like this is a cast, not a copy, which is what I want. However, this breaks if I make the mmap ACCESS_READ, throwing an exception that from_buffer requires write privileges.
I think I want to use ctypes from_address() method instead, which doesn't appear to need write access. I'm probably missing something simple, but I'm not sure how to get the address of the location within an mmap. I know I can use ACCESS_COPY (so write operations show up in memory, but aren't persisted to disk), but I'd rather keep things read only.
Any suggestions?
I ran into a similar issue (being unable to set up a read-only mmap), but I was using only the Python mmap module: Python mmap 'Permission denied' on Linux
I'm not sure it is of any help to you since you don't want the mmap to be private?
OK, from looking at the mmap C code, I don't believe it supports this use case. Also, I found that the performance pretty much sucks for my use case. I'd be curious what kind of performance others see, but I found that it took about 40 seconds to walk through a 500 MB binary file in Python. This is creating an mmap, turning the location into a ctypes object with from_buffer(), and using the ctypes object to decipher the size of the object so I could step to the next one. I tried doing the same thing directly in C++ from MSVC. Obviously there I could cast directly into an object of the correct type, and it was fast: less than a second (this is with a Core 2 Quad and an SSD).
I did find that I could get a pointer with the following
firstHeader = CEL_HEADER.from_buffer(map, 0) #CEL_HEADER is a ctypes Structure
pHeader = pointer(firstHeader)
#Now I can use pHeader[ind] to get a CEL_HEADER object
#at an arbitrary point in the file
This doesn't get around the original problem: the mmap isn't read-only, since I still need to use from_buffer for the first call. In this configuration it still took around 40 seconds to process the whole file, so it looks like the conversion from a pointer into ctypes structs is what's killing the performance. That's just a guess, but I don't see a lot of value in tracking it down further.
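For reference, the walking pattern described above looks roughly like this (a sketch; CEL_HEADER, its size field, and the file name are hypothetical stand-ins for the actual structures):

import ctypes
import mmap

class CEL_HEADER(ctypes.Structure):
    # Hypothetical layout: each record begins with its total size in bytes.
    _fields_ = [('size', ctypes.c_uint32),
                ('kind', ctypes.c_uint32)]

f = open('data.bin', 'r+b')  # must be writable: from_buffer refuses a read-only map
buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)

offset = 0
while offset + ctypes.sizeof(CEL_HEADER) <= len(buf):
    header = CEL_HEADER.from_buffer(buf, offset)  # a cast over the map, not a copy
    # ... inspect header fields here ...
    if header.size == 0:
        break                     # avoid looping forever on a corrupt record
    offset += header.size         # step to the next record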
I'm not sure my plan will help anyone else, but I'm going to try to create a c module specific to my needs based on the mmap code. I think I can use the fast c-code handling to index the binary file, then expose only small parts of the file at a time through calls into ctypes/python objects. Wish me luck.
Also, as a side note, Python 2.7.2 was released today (6/12/11), and one of the changes is an update to the mmap code so that you can use a python long to set the file offset. This lets you use mmap for files over 4GB on 32-bit systems. See Issue #4681 here
I ran into this same problem: we needed the from_buffer interface and wanted read-only access. From the Python docs (https://docs.python.org/3/library/mmap.html): "Assignment to an ACCESS_COPY memory map affects memory but does not update the underlying file."
If it's acceptable for you to use an anonymous file backing, you can use ACCESS_COPY.
An example: open two cmd.exe windows or terminals, and in the first one run:
import ctypes
import mmap

mm_file_write = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
mm_file_read = mmap.mmap(-1, 4096, access=mmap.ACCESS_COPY, tagname="shmem")

write = ctypes.c_int.from_buffer(mm_file_write)
read = ctypes.c_int.from_buffer(mm_file_read)

try:
    while True:
        value = int(input('enter an integer using mm_file_write: '))
        write.value = value
        print('updated value')

        value = int(input('enter an integer using mm_file_read: '))
        # read.value assignment doesn't update the anonymous backed file
        read.value = value
        print('updated value')
except KeyboardInterrupt:
    print('got exit event')
In the other terminal do:
import mmap
import struct
import time

mm_file = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")

i = None
try:
    while True:
        new_i = struct.unpack('i', mm_file[:4])
        if i != new_i:
            print('i: {} => {}'.format(i, new_i))
            i = new_i
        time.sleep(0.1)
except KeyboardInterrupt:
    print('Stopped . . .')
And you will see that the second process does not receive updates when the first process writes through the ACCESS_COPY mapping.