Python mmap ctypes - read only

I think I have the opposite problem to the one described here. I have one process writing data to a log, and I want a second process to read it, but I don't want the second process to be able to modify the contents. This is potentially a large file, and I need random access, so I'm using Python's mmap module.
If I create the mmap as read/write (for the second process), I have no problem creating a ctypes object as a "view" of the mmap object using from_buffer. From a cursory look at the C code, it looks like this is a cast, not a copy, which is what I want. However, this breaks if I open the mmap with ACCESS_READ, throwing an exception that from_buffer requires write privileges.
I think I want to use the ctypes from_address() method instead, which doesn't appear to need write access. I'm probably missing something simple, but I'm not sure how to get the address of a location within an mmap. I know I can use ACCESS_COPY (so write operations show up in memory but aren't persisted to disk), but I'd rather keep things read-only.
Any suggestions?
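For concreteness, the failure and the copying fallback look roughly like this (a sketch; data.bin and Record are illustrative names, and from_buffer_copy copies each struct rather than aliasing the mapped memory):
import ctypes
import mmap
class Record(ctypes.Structure):  # illustrative layout
    _fields_ = [("kind", ctypes.c_uint32), ("size", ctypes.c_uint32)]
with open("data.bin", "wb") as f:  # create a small stand-in file
    f.write(bytes(16))
with open("data.bin", "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        rec = Record.from_buffer(mm, 0)  # zero-copy view: rejected, buffer is read-only
    except (TypeError, ValueError) as exc:
        print("from_buffer on ACCESS_READ:", exc)
    rec = Record.from_buffer_copy(mm, 0)  # works read-only, but copies the bytes
    print(rec.kind, rec.size)
    mm.close()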

I ran into a similar issue (unable to set up a read-only mmap), but I was using only the Python mmap module: Python mmap 'Permission denied' on Linux
I'm not sure it is of any help to you, since you don't want the mmap to be private?

Ok, from looking at the mmap C source, I don't believe it supports this use case. Also, I found that the performance pretty much sucks for my use case. I'd be curious what kind of performance others see, but I found that it took about 40 sec to walk through a 500 MB binary file in Python. This is creating an mmap, then turning the location into a ctypes object with from_buffer(), and using the ctypes object to decipher the size of the object so I could step to the next object. I tried doing the same thing directly in C++ with MSVC. Obviously there I could cast directly into an object of the correct type, and it was fast: less than a second (this is with a Core 2 Quad and an SSD).
I did find that I could get a pointer with the following
firstHeader = CEL_HEADER.from_buffer(map, 0) #CEL_HEADER is a ctypes Structure
pHeader = pointer(firstHeader)
#Now I can use pHeader[ind] to get a CEL_HEADER object
#at an arbitrary point in the file
This doesn't get around the original problem - the mmap isn't read-only, since I still need to use from_buffer for the first call. In this config, it still took around 40 sec to process the whole file, so it looks like the conversion from a pointer into ctypes structs is killing the performance. That's just a guess, but I don't see a lot of value in tracking it down further.
I'm not sure my plan will help anyone else, but I'm going to try to create a c module specific to my needs based on the mmap code. I think I can use the fast c-code handling to index the binary file, then expose only small parts of the file at a time through calls into ctypes/python objects. Wish me luck.
Also, as a side note, Python 2.7.2 was released today (6/12/11), and one of the changes is an update to the mmap code so that you can use a Python long to set the file offset. This lets you use mmap for files over 4 GB on 32-bit systems. See Issue #4681 here.
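On that note, for very large files it can also help to map only a window at a time instead of the whole file; the offset has to be a multiple of mmap.ALLOCATIONGRANULARITY. A rough sketch (data.bin is an illustrative name and the window size is arbitrary):
import mmap
WINDOW = 16 * 1024 * 1024  # map 16 MiB at a time
with open("data.bin", "rb") as f:
    f.seek(0, 2)
    filesize = f.tell()
    offset = 0
    while offset < filesize:
        # the offset passed to mmap must be a multiple of ALLOCATIONGRANULARITY
        aligned = (offset // mmap.ALLOCATIONGRANULARITY) * mmap.ALLOCATIONGRANULARITY
        length = min(WINDOW, filesize - aligned)
        mm = mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ, offset=aligned)
        # ... parse the records that live inside this window ...
        mm.close()
        offset = aligned + length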

Ran into this same problem; we needed the from_buffer interface and wanted read-only access. From the Python docs https://docs.python.org/3/library/mmap.html: "Assignment to an ACCESS_COPY memory map affects memory but does not update the underlying file."
If it's acceptable for you to use an anonymous file backing, you can use ACCESS_COPY.
An example: open two cmd.exe windows (or terminals), and in the first one run:
import ctypes
import mmap
mm_file_write = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
mm_file_read = mmap.mmap(-1, 4096, access=mmap.ACCESS_COPY, tagname="shmem")
write = ctypes.c_int.from_buffer(mm_file_write)
read = ctypes.c_int.from_buffer(mm_file_read)
try:
    while True:
        value = int(input('enter an integer using mm_file_write: '))
        write.value = value
        print('updated value')
        value = int(input('enter an integer using mm_file_read: '))
        # read.value assignment doesn't update the anonymous-backed file
        read.value = value
        print('updated value')
except KeyboardInterrupt:
    print('got exit event')
In the other terminal do:
import mmap
import struct
import time
mm_file = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
i = None
try:
    while True:
        new_i = struct.unpack('i', mm_file[:4])
        if i != new_i:
            print('i: {} => {}'.format(i, new_i))
            i = new_i
        time.sleep(0.1)
except KeyboardInterrupt:
    print('Stopped . . .')
And you will see that the second process does not receive updates when the first process writes through the ACCESS_COPY mapping.

Related

python multiprocessing - OverflowError('cannot serialize a bytes object larger than 4GiB')

We are running a script using the multiprocessing library (Python 3.6), where a big pd.DataFrame is passed as an argument to a function:
from multiprocessing import Pool
import time
def my_function(big_df):
    # do something time consuming
    time.sleep(50)
if __name__ == '__main__':
    with Pool(10) as p:
        res = {}
        output = {}
        for id, big_df in some_dict_of_big_dfs.items():
            res[id] = p.apply_async(my_function, (big_df,))
        output = {id: res[id].get() for id in id_list}
The problem is that we are getting an error from the pickle library.
Reason: 'OverflowError('cannot serialize a bytes object larger than 4GiB',)'
We are aware that pickle protocol 4 can serialize larger objects (related question, link), but we don't know how to modify the protocol that multiprocessing is using.
Does anybody know what to do?
Thanks!
Apparently there is an open issue about this topic, and there are a few related initiatives described in this particular answer. I found a way to change the default pickle protocol used by the multiprocessing library, based on this answer. As was pointed out in the comments, this solution only works with Linux and the OS X multiprocessing lib.
Solution:
You first create a new separate module:
pickle4reducer.py
from multiprocessing.reduction import ForkingPickler, AbstractReducer
class ForkingPickler4(ForkingPickler):
    def __init__(self, *args):
        args = list(args)  # *args arrives as a tuple; copy it so it can be modified
        if len(args) > 1:
            args[1] = 2
        else:
            args.append(2)
        super().__init__(*args)
    @classmethod
    def dumps(cls, obj, protocol=4):
        return ForkingPickler.dumps(obj, protocol)
def dump(obj, file, protocol=4):
    ForkingPickler4(file, protocol).dump(obj)
class Pickle4Reducer(AbstractReducer):
    ForkingPickler = ForkingPickler4
    register = ForkingPickler4.register
    dump = dump
And then, in your main script you need to add the following:
import pickle4reducer
import multiprocessing as mp
ctx = mp.get_context()
ctx.reducer = pickle4reducer.Pickle4Reducer()
with mp.Pool(4) as p:
    ...  # do something
That will probably solve the problem of the overflow.
But, as a warning, you might consider reading this before doing anything, or you might reach the same error as me:
'i' format requires -2147483648 <= number <= 2147483647
(the reason for this error is well explained in the link above). Long story short: multiprocessing sends data between its processes using the pickle protocol, so if you are already hitting the 4 GB limit, you might consider redefining your functions as "void" methods rather than input/output methods. All this inbound/outbound data increases RAM usage, is probably inefficient by construction (my case), and it may be better to point all processes at the same object rather than create a new copy for each call.
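If upgrading is an option, one way to avoid pushing the data through pickle at all is to place it in shared memory and pass only the block's name to the workers. A rough sketch using multiprocessing.shared_memory (Python 3.8+; the array size and names are illustrative):
import numpy as np
from multiprocessing import Pool, shared_memory
def worker(shm_name, shape, dtype):
    shm = shared_memory.SharedMemory(name=shm_name)  # attach; only the name was pickled
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    total = float(arr.sum())  # read-only use of the shared data
    shm.close()
    return total
if __name__ == '__main__':
    data = np.ones(10_000_000, dtype=np.float64)
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)[:] = data  # one copy in
    with Pool(2) as p:
        print(p.apply_async(worker, (shm.name, data.shape, data.dtype.str)).get())
    shm.close()
    shm.unlink()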
hope this helps.
Supplementing the answer from Pablo:
The following problem is resolved by Python 3.8, if you are okay with using that version of Python:
'i' format requires -2147483648 <= number <= 2147483647
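For code that pickles large blobs itself (outside of multiprocessing), the limit can also be avoided simply by asking for protocol 4 explicitly; a minimal illustration:
import pickle
payload = {"frame": bytes(1024)}  # stand-in for a multi-GiB object
blob = pickle.dumps(payload, protocol=4)  # protocol 4 (Python 3.4+) supports objects > 4 GiB
assert pickle.loads(blob) == payload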

CTypes error in return value

I'm testing a very simple NASM DLL (64-bit) called from ctypes. I pass a single int64 and the function is expected to return a different int64.
I get the same error every time:
OSError: exception: access violation writing 0x000000000000092A
where the hex value corresponds to the value I am returning (in this case, 2346). If I change that value, the hex value changes to match, so the problem is in the value I am returning in rax. I get the same error if I assign mov rax, 2346 directly.
I have tested this repeatedly, trying different things, and I've done a lot of research, but this seemingly simple problem is still not solved.
Here is the Python code:
import ctypes
def lcm_ctypes():
    input_value = ctypes.c_int64(235)
    hDLL = ctypes.WinDLL(r"C:/NASM_Test_Projects/While_Loop_01/While_loops-01.dll")
    CallTest = hDLL.lcm
    CallTest.argtypes = [ctypes.c_int64]
    CallTest.restype = ctypes.c_int64
    retvar = CallTest(input_value)
    return retvar
Here is the NASM code:
[BITS 64]
export lcm
section .data
return_val: dq 2346
section .text
finit
lcm:
push rdi
push rbp
mov rax,qword[return_val]
pop rbp
pop rdi
Thanks for any information to help solve this problem.
Your function correctly loads 2346 (0x92a) into RAX. Then execution continues into some following bytes because you didn't jmp or ret.
In this case, we can deduce that the following bytes are probably 00 00, which decodes as add byte [rax], al, hence the access violation writing 0x000000000000092A error message. (i.e. it's not a coincidence that the address it's complaining about is your constant).
As Michael Petch said, using a debugger would have found the problem.
You also don't need to save/restore rdi because you're not touching it. The Windows x86-64 calling convention has many call-clobbered registers, so for a least-common-multiple function you shouldn't need to save/restore anything, just use rax, rcx, rdx, r8, r9, and whatever else Windows lets you clobber (I forget, check the calling convention docs in the x86 tag wiki, especially Agner Fog's guide).
You should definitely use default rel at the top of your file, so the [return_val] load will use a RIP-relative addressing mode instead of absolute.
Also, finit never executes because it's before your function label. But you don't need it either. You had the same finit in your previous asm question: Passing arrays to NASM DLL, pointer value gets reset to zero, where it was also not needed and not executed. The calling convention requires that on function entry (and return), the x87 FPU is already in the state that finit puts it in, more or less. So you don't need it before executing x87 instructions like fmulp and fidivr. But you weren't doing that anyway, you were using SSE FP instructions (which is recommended, especially in 64-bit mode), which don't touch the x87 state at all.
Go read a good tutorial and some docs (some links in the x86 tag wiki) so you understand what's going on well enough to debug a problem like this on your own, or you will have a bad time writing anything more complicated. Guessing what might work doesn't work very well for asm.
From a deleted non-answer: https://www.cs.uaf.edu/2017/fall/cs301/reference/nasm_vs/ shows how to set up Visual Studio to build an executable out of C++ and NASM sources, so you can debug it.

Python, why does mmap.move() fill up the memory?

edit: Using Windows 10 and Python 3.5
I have a function that uses mmap to remove bytes from a file at a certain offset:
import mmap
def delete_bytes(fobj, offset, size):
    fobj.seek(0, 2)
    filesize = fobj.tell()
    move_size = filesize - offset - size
    fobj.flush()
    file_map = mmap.mmap(fobj.fileno(), filesize)
    file_map.move(offset, offset + size, move_size)
    file_map.close()
    fobj.truncate(filesize - size)
    fobj.flush()
It works super fast, but when I run it on a large number of files, the memory quickly fills up and my system becomes unresponsive.
After some experimenting, I found that the move() method was the culprit here, and in particular the amount of data being moved (move_size).
The amount of memory being used is equivalent to the total amount of data being moved by mmap.move().
If I process 100 files with ~30 MB moved in each, memory usage grows by ~3 GB.
Why isn't the moved data released from memory?
Things I tried that had no effect:
calling gc.collect() at the end of the function.
rewriting the function to move in small chunks.
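For comparison, a version that shifts the tail with plain buffered reads and writes (no mmap at all) would look roughly like the sketch below; delete_bytes_chunked and bufsize are illustrative, and I haven't benchmarked it against the mmap version:
import os
def delete_bytes_chunked(fobj, offset, size, bufsize=1024 * 1024):
    # Shift the tail of the file left by `size` bytes in bufsize-sized pieces,
    # so only bufsize bytes are held in memory at any time.
    fobj.seek(0, os.SEEK_END)
    filesize = fobj.tell()
    read_pos = offset + size
    write_pos = offset
    while read_pos < filesize:
        fobj.seek(read_pos)
        chunk = fobj.read(min(bufsize, filesize - read_pos))
        fobj.seek(write_pos)
        fobj.write(chunk)
        read_pos += len(chunk)
        write_pos += len(chunk)
    fobj.truncate(filesize - size)
    fobj.flush()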
This seems like it should work. I did find one suspicious bit in the mmapmodule.c source code, inside an #ifdef MS_WINDOWS block. Specifically, after all the setup to parse arguments, the code then does this:
if (fileno != -1 && fileno != 0) {
    /* Ensure that fileno is within the CRT's valid range */
    if (_PyVerify_fd(fileno) == 0) {
        PyErr_SetFromErrno(PyExc_OSError);
        return NULL;
    }
    fh = (HANDLE)_get_osfhandle(fileno);
    if (fh==(HANDLE)-1) {
        PyErr_SetFromErrno(PyExc_OSError);
        return NULL;
    }
    /* Win9x appears to need us seeked to zero */
    lseek(fileno, 0, SEEK_SET);
}
which moves your underlying file object's offset from "end of file" to "start of file" and then leaves it there. That seems like it should not break anything, but it might be worth doing your own seek-to-start-of-file just before calling mmap.mmap to map the file.
(Everything below is wrong, but left in since there are comments on it.)
In general, after using mmap(), you must use munmap() to undo the mapping. Simply closing the file descriptor has no effect. The Linux documentation calls this out explicitly:
munmap()
The munmap() system call deletes the mappings for the specified address range, and causes further references to addresses within the range to generate invalid memory references. The region is also automatically unmapped when the process is terminated. On the other hand, closing the file descriptor does not unmap the region.
(The BSD documentation is similar. Windows may behave differently from Unix-like systems here, but what you are seeing suggests that they work the same way.)
Unfortunately, Python's mmap module does not bind the munmap system call (nor mprotect), at least as of both 2.7.11 and 3.4.4. As a workaround you can use the ctypes module. See this question for an example (it calls reboot but the same technique works for all C library functions). Or, for a somewhat nicer method, you can write wrappers in cython.

Why is multiprocessing copying my data if I don't touch it?

I was tracking down an out-of-memory bug, and was horrified to find that Python's multiprocessing appears to copy large arrays, even if I have no intention of using them.
Why is Python (on Linux) doing this? I thought copy-on-write would protect me from any extra copying. I imagine that whenever I reference the object some kind of trap is invoked and only then is the copy made.
Is the correct way to solve this problem, for an arbitrary data type like a 30-gigabyte custom dictionary, to use a Monitor? Is there some way to build Python so that it doesn't have this nonsense?
import numpy as np
import psutil
from multiprocessing import Process
mem = psutil.virtual_memory()
large_amount = int(0.75 * mem.available)
def florp():
    print("florp")
def bigdata():
    return np.ones(large_amount, dtype=np.int8)
if __name__ == '__main__':
    foo = bigdata()  # Allocated 0.75 of the ram, no problems
    p = Process(target=florp)
    p.start()  # Out of memory because bigdata is copied?
    print("Wow")
    p.join()
Running:
[ebuild R ] dev-lang/python-3.4.1:3.4::gentoo USE="gdbm ipv6 ncurses readline ssl threads xml -build -examples -hardened -sqlite -tk -wininst" 0 KiB
I'd expect this behavior -- when you pass code to Python to compile, anything that's not guarded behind a function or object is immediately executed for evaluation.
In your case, bigdata=np.ones(large_amount,dtype=np.int8) has to be evaluated -- unless your actual code has different behavior, florp() not being called has nothing to do with it.
To see an immediate example:
>>> f = 0/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
>>> def f():
...     return 0/0
...
>>>
To apply this to your code, put bigdata = np.ones(large_amount, dtype=np.int8) behind a function and call it as you need it; otherwise, Python is trying to be helpful by having that variable available to you at runtime.
If bigdata doesn't change, you could write a function that gets or sets it on an object that you keep around for the duration of the process.
edit: Coffee just started working. When you make a new process, Python will need to copy all objects into that new process for access. You can avoid this by using threads, or by a mechanism that lets you share memory between processes, such as shared memory maps or shared ctypes (see the sketch below).
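A minimal sketch of the shared-ctypes route (array size and names are illustrative); the child re-wraps the same buffer instead of receiving a copy:
import numpy as np
from multiprocessing import Process, Array
def worker(shared_arr, n):
    arr = np.frombuffer(shared_arr.get_obj(), dtype=np.int8, count=n)  # no copy
    print("first ten bytes in child:", arr[:10].tolist())
if __name__ == '__main__':
    n = 10_000_000
    shared_arr = Array('b', n)  # shared ctypes buffer, zero-initialised
    np.frombuffer(shared_arr.get_obj(), dtype=np.int8, count=n)[:] = 1
    p = Process(target=worker, args=(shared_arr, n))
    p.start()
    p.join()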
The problem was that by default Linux checks for the worst-case memory usage, which can indeed exceed memory capacity. This is true even if the Python code doesn't expose the variables. You need to turn off "overcommit" system-wide to achieve the expected COW behavior.
sysctl vm.overcommit_memory=2
See https://www.kernel.org/doc/Documentation/vm/overcommit-accounting
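A quick way to check the current policy before changing anything (Linux only; 0 = heuristic default, 1 = always overcommit, 2 = strict accounting):
with open('/proc/sys/vm/overcommit_memory') as f:
    print('vm.overcommit_memory =', f.read().strip())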

Strange altered behaviour when linking from .so file with ctypes in python

I am writing a program to handle data from a high-speed camera for my Ph.D. project. This camera comes with an SDK in the form of a .so file on Linux, for communicating with the camera and getting images out. As said, it is a high-speed camera delivering lots of data (several GB a minute). To handle this amount of data, the SDK has a very handy spool function that spools data directly to the hard drive via DMA, in the form of a FITS file, a raw binary format with a header that is used in astronomy.
This function works fine when I write a small C program, link the .so file in, and call the spool function that way. But when I wrap the .so file with ctypes and call the functions from Python, all the functions work except the spool function. When I call the spool function it returns no errors, but the spooled data file is garbled: the file has the right format, but half of all the frames are zeros.
In my world it does not make sense that a function in a .so file should behave differently depending on which program it's called from, my own little C program or Python, which after all is only a bigger C program. Does anybody have any clue as to what is different when calling the .so from different programs?
I would be very thankful for any suggestions.
Even though the camera is commercial, some of the driver is GPLed and available, though a bit complicated (unfortunately not the spool function, it seems). I have a Python class for handling the camera.
The beginning of the class reads:
class Andor:
    def __init__(self, handle=100):
        #cdll.LoadLibrary("/usr/local/lib/libandor.so")
        self.dll = CDLL("/usr/local/lib/libandor.so")
        error = self.dll.SetCurrentCamera(c_long(handle))
        error = self.dll.Initialize("/usr/local/etc/andor/")
        cw = c_int()
        ch = c_int()
        self.dll.GetDetector(byref(cw), byref(ch))
The relevant function reads:
    def SetSpool(self, active, method, path, framebuffersize):
        error = self.dll.SetSpool(active, method, c_char_p(path), framebuffersize)
        self.verbose(ERROR_CODE[error], sys._getframe().f_code.co_name)
        return ERROR_CODE[error]
And in the corresponding header it reads:
unsigned int SetSingleTrackHBin(int bin);
unsigned int SetSpool(int active, int method, char * path, int framebuffersize);
unsigned int SetStorageMode(at_32 mode);
unsigned int SetTemperature(int temperature);
The code to get the camera running would read something like:
cam = andor.Andor()
cam.SetReadMode(4)
cam.SetFrameTransferMode(1)
cam.SetShutter(0, 1, 0, 0)
cam.SetSpool(1, 5, '/tmp/test.fits', 10)
cam.GetStatus()
if cam.status == 'DRV_IDLE':
    acquireEvent.clear()
    cam.SetAcquisitionMode(3)
    cam.SetExposureTime(0.0)
    cam.SetNumberKinetics(exposureNumber)
    cam.StartAcquisition()
My guess is that it isn't the call to the spooling function itself, but a call series which results in corrupted values being fed to/from the library.
Are you on a 64-bit platform? Not specifying restype for anything which returns a 64-bit integer (long with gcc) or a pointer will result in those values being silently truncated to 32 bits. Additionally, ctypes.c_void_p handling is a little surprising: restype values of ctypes.c_void_p aren't truncated, but are returned to the Python interpreter as type int, with predictably hilarious results if high pointers are fed back as parameters to other functions without a cast.
I haven't tested it, but both these conditions might also affect 32-bit platforms for values larger than sys.maxint.
The only way to be 100% certain you're passing and receiving the values you expect is to specify the argtypes and restype for all the functions you call. And that includes creating Structure classes and associated POINTERs for all structs those functions operate on, even opaque structs.
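For the SetSpool prototype shown in the question, that would look roughly like this (a sketch only; the library path is taken from the question, and in practice these declarations would sit next to the existing self.dll setup, after Initialize has been called):
import ctypes
dll = ctypes.CDLL("/usr/local/lib/libandor.so")
# unsigned int SetSpool(int active, int method, char *path, int framebuffersize);
dll.SetSpool.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_char_p, ctypes.c_int]
dll.SetSpool.restype = ctypes.c_uint
error = dll.SetSpool(1, 5, b"/tmp/test.fits", 10)  # c_char_p expects bytes on Python 3
print("SetSpool returned", error)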
