I have a kernel module in C which continuously reads data from a photodiode and writes the values, together with the current time, into a memory-mapped file. From a C program in user space I can access the data from the kernel. I tried to do the same in Python via the mmap module. However, when I try to mmap the file I get errors like "mmap length is greater than file size" or "mmap file is empty". Does that mean Python cannot access the file mmapped from C? In the end, I need a numpy array of the photodiode data for further processing.
Details about the kernel data structure:
The mmap contains a struct with an index to the latest voltage values and a struct array holding voltage and time. The kernel has one big struct array and writes the photodiode data into it in chunks of page size. The C user space program then reads each chunk for further processing.
Python code to read the mmapped C file:
import ctypes
import mmap
import os

num_pages = 103
page_size = 10000
max_buffer_size = num_pages * page_size

class buf_element(ctypes.Structure):
    _fields_ = [("voltage", ctypes.c_int),
                ("time", ctypes.c_uint)]

class data(ctypes.Structure):
    _fields_ = [("latest_page_offset", ctypes.c_int),
                ("buffer", ctypes.POINTER(buf_element))]

length_data = ctypes.sizeof(ctypes.c_int) + max_buffer_size * ctypes.sizeof(buf_element)
fd = os.open(data_file, os.O_RDWR)
buf = mmap.mmap(fd, length_data, mmap.MAP_SHARED, mmap.PROT_READ)
test_data = data.from_buffer(buf)
print test_data.latest_page_offset
os.close(fd)
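For the final goal (a numpy array of the samples), one possible route is to view the mmap buffer directly with a structured dtype rather than going through ctypes. A minimal sketch, assuming the C layout really is a 4-byte index followed immediately by tightly packed (voltage, time) pairs with no extra padding:

import numpy as np

# structured dtype mirroring buf_element: int voltage, unsigned int time
buf_dtype = np.dtype([("voltage", np.int32), ("time", np.uint32)])

header_size = ctypes.sizeof(ctypes.c_int)   # skip latest_page_offset
samples = np.frombuffer(buf, dtype=buf_dtype,
                        count=max_buffer_size, offset=header_size)
voltages = samples["voltage"]               # a view into the mapping, no copy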
My idea was to use the already existing and working C code from Python via a C extension: Python calls C and hands over a numpy array, and C writes the data into it. Is that the fastest way? Any other recommendations?
To get it working, I now call the existing C code from Python via Cython.
Related
First of all, I just started learning Python, and I have tried hard to find what fits my needs. What I want to build is a simple file system for Linux, but to tell the truth I'm not even sure whether it is achievable in Python. So I need a bit of help here.
I tried both a class structure and named tuples (one at a time, whichever fits), and I decided classes would be better for me. The problem is that I couldn't read the data byte by byte, because the size of my class was 888 while in C it was 44 (I used sys.getsizeof() there). The code below should make clearer what I want to achieve.
For this structure
struct sb {
    int inode_bitmap;
    int data_bitmap[10];
};
I used
# SUPER BLOCK
class sb(object):
    __slots__ = ['inode_bitmap', 'data_bitmap']  # REDUCE RAM USAGE
    def __init__(bruh, inode_bitmap, data_bitmap):
        bruh.inode_bitmap = inode_bitmap
        bruh.data_bitmap = [None] * 10  # DEFINITION OF ARRAY
Everything was fine until I tried to read it back.
FILE * fin = fopen("simplefs.bin", "r");
struct inode slash;
fseek(fin, sizeof(struct sb), SEEK_SET);
fread(&slash,sizeof(slash),1,fin);
fin = open("simplefs.bin", "rb")
slash = inode
print("pos:", fin.tell())
contents = fin.read(sys.getsizeof(sb))
print(contents)
The actual file size was something like 4800, but when I read it the size came out at approximately 318.
I am well aware that Python is not C; I am just experimenting to see whether this is achievable.
You cannot just design a structure, read/write it to a file, and expect it to be binary identical to the C version. If you want to parse binary data, the struct module allows you to interpret the bytes you have read as ints, floats and a dozen other formats. You still have to write the format strings manually. In your particular case:
import struct

with open('datafile.dat', 'rb') as fin:
    raw_data = fin.read()

data = struct.unpack_from('11I', raw_data)  # 11 unsigned ints: inode_bitmap + data_bitmap[10]
inode_bitmap = data[0]
data_bitmap = data[1:]
Or something along those lines...
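To mirror the C flow (seek past the superblock, then read the next structure), struct.calcsize reports the same size as C's sizeof, assuming default alignment; a minimal sketch along those lines:

import struct

SB_FORMAT = '11I'                      # int inode_bitmap + int data_bitmap[10]
SB_SIZE = struct.calcsize(SB_FORMAT)   # 44 bytes, matching sizeof(struct sb) in C

with open('simplefs.bin', 'rb') as fin:
    sb_fields = struct.unpack(SB_FORMAT, fin.read(SB_SIZE))
    inode_bitmap, data_bitmap = sb_fields[0], list(sb_fields[1:])
    # fin.tell() is now sizeof(struct sb), i.e. where the C code fseek()s
    # before fread()ing the first inode

The inode would be read the same way, with a format string matching its C layout.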
Let's say I have some big matrix saved on disk. Storing it all in memory is not really feasible, so I use memmap to access it:
A = np.memmap(filename, dtype='float32', mode='r', shape=(3000000,162))
Now let's say I want to iterate over this matrix (not necessarily in an ordered fashion) such that each row is accessed exactly once.
p = some_permutation_of_0_to_2999999()
I would like to do something like this:
start = 0
end = 3000000
num_rows_to_load_at_once = some_size_that_will_fit_in_memory()
while start < end:
    indices_to_access = p[start:start+num_rows_to_load_at_once]
    do_stuff_with(A[indices_to_access, :])
    start = min(end, start+num_rows_to_load_at_once)
As this process goes on, my computer becomes slower and slower, and my RAM and virtual memory usage explodes.
Is there some way to force np.memmap to use up to a certain amount of memory? (I know I won't need more than the amount of rows I'm planning to read at a time and that caching won't really help me since I'm accessing each row exactly once)
Or is there some other way to iterate (generator-like) over a np array in a custom order? I could write it manually using file.seek, but that turns out to be much slower than the np.memmap implementation.
do_stuff_with() does not keep any reference to the array it receives, so there are no "memory leaks" in that respect.
thanks
This has been an issue that I've been trying to deal with for a while. I work with large image datasets and numpy.memmap offers a convenient solution for working with these large sets.
However, as you've pointed out, if I need to access each frame (or row in your case) to perform some operation, RAM usage will max out eventually.
Fortunately, I recently found a solution that will allow you to iterate through the entire memmap array while capping the RAM usage.
Solution:
import numpy as np
# create a memmap array
input = np.memmap('input', dtype='uint16', shape=(10000,800,800), mode='w+')
# create a memmap array to store the output
output = np.memmap('output', dtype='uint16', shape=(10000,800,800), mode='w+')
def iterate_efficiently(input, output, chunk_size):
    # create an empty array to hold each chunk
    # the size of this array will determine the amount of RAM usage
    holder = np.zeros([chunk_size, 800, 800], dtype='uint16')

    # iterate through the input in chunks, perform some operation, and write to output
    for i in range(input.shape[0]):
        if i % chunk_size == 0:
            holder[:] = input[i:i+chunk_size]    # read in chunk from input
            holder += 5                          # perform some operation
            output[i:i+chunk_size] = holder      # write chunk to output

def iterate_inefficiently(input, output):
    output[:] = input[:] + 5
Timing Results:
In [11]: %timeit iterate_efficiently(input,output,1000)
1 loop, best of 3: 1min 48s per loop
In [12]: %timeit iterate_inefficiently(input,output)
1 loop, best of 3: 2min 22s per loop
The size of the array on disk is ~12GB. Using the iterate_efficiently function keeps the memory usage to 1.28GB whereas the iterate_inefficiently function eventually reaches 12GB in RAM.
This was tested on Mac OS.
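Building on that chunking idea, a possible sketch (not part of the answer above) that also honours the permuted row order from the original question, assuming it is acceptable to sort the indices within each chunk so the reads stay mostly sequential:

import numpy as np

A = np.memmap(filename, dtype='float32', mode='r', shape=(3000000, 162))
p = some_permutation_of_0_to_2999999()
chunk = 100000                              # rows that comfortably fit in RAM

for start in range(0, len(p), chunk):
    idx = np.sort(p[start:start + chunk])   # sorted indices -> fewer random page touches
    rows = np.array(A[idx, :])              # fancy indexing copies the rows into RAM
    do_stuff_with(rows)
    del rows                                # drop the only reference before the next chunk

Here filename, some_permutation_of_0_to_2999999() and do_stuff_with() are the placeholders from the question.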
I've been experimenting with this problem for a couple of days now, and it appears there are two ways to control memory consumption using np.memmap. The first is reliable, while the second would require some testing and will be OS dependent.
Option 1 - reconstruct the memory map with each read / write:
def MoveMMapNPArray(data, output_filename):
    CHUNK_SIZE = 4096
    for idx in range(0, data.shape[1], CHUNK_SIZE):
        x = np.memmap(data.filename, dtype=data.dtype, mode='r', shape=data.shape, order='F')
        y = np.memmap(output_filename, dtype=data.dtype, mode='r+', shape=data.shape, order='F')
        end = min(idx + CHUNK_SIZE, data.shape[1])
        y[:, idx:end] = x[:, idx:end]
Where data is of type np.memmap. Discarding and recreating the memmap object on each iteration keeps the array from being accumulated in memory and keeps memory consumption very low as long as the chunk size is small. It likely introduces some CPU overhead, but this was found to be small on my setup (macOS).
Option 2 - construct the mmap buffer yourself and provide memory advice
If you look at the np.memmap source code here, you can see that it is relatively simple to create your own memmapped numpy array. Specifically, with the snippet:
mm = mmap.mmap(fid.fileno(), bytes, access=acc, offset=start)
mmap_np_array = ndarray.__new__(subtype, shape, dtype=descr, buffer=mm, offset=array_offset, order=order)
Note that this Python mmap instance is stored as the np.memmap's private _mmap attribute.
With access to the Python mmap object, and Python 3.8+, you can use its madvise method, described here.
This allows you to advise the OS to free memory where available. The various madvise constants are described here for Linux, with some generic cross-platform options specified.
The MADV_DONTDUMP constant looks promising but I haven't tested memory consumption with it like I have for option 1.
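A minimal sketch of what option 2 could look like (Python 3.8+ on Linux; using MADV_DONTNEED here is an assumption on my part, not something I have benchmarked the way I did option 1):

import mmap
import numpy as np

A = np.memmap('input', dtype='float32', mode='r', shape=(3000000, 162))
chunk = 100000

for start in range(0, A.shape[0], chunk):
    block = np.array(A[start:start + chunk])   # copy the chunk into RAM
    # ... process block ...
    # advise the kernel that the pages backing the mapping may be reclaimed
    A._mmap.madvise(mmap.MADV_DONTNEED)

Since _mmap is a private attribute, this leans on an implementation detail of np.memmap.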
I am using the multiprocessing functionality of Python to run my code in parallel on a machine with roughly 500 GB of RAM. To share some arrays between the different workers I am creating an Array object:
N = 150
ndata = 10000
sigma = 3
ddim = 3
shared_data_base = multiprocessing.Array(ctypes.c_double, ndata*N*N*ddim*sigma*sigma)
shared_data = np.ctypeslib.as_array(shared_data_base.get_obj())
shared_data = shared_data.reshape(-1, N, N, ddim*sigma*sigma)
This works perfectly for sigma=1, but for sigma=3 one of the hard drives of the machine slowly fills up until there is no free space left, and then the process fails with this exception:
OSError: [Errno 28] No space left on device
Now I've got 2 questions:
Why does this code write anything to disk at all? Why isn't everything stored in memory?
How can I solve this problem? Can I make Python store it entirely in RAM without writing it to the HDD? Or can I change the HDD to which this array is written?
EDIT: I found something online which suggests that the array is stored in "shared memory". But the /dev/shm device has plenty more free space than /dev/sda1, which is filled up by the code above.
Here is the (relevant part of the) strace log of this code.
Edit #2: I think I have found a workaround for this problem. Looking at the source, I found that multiprocessing tries to create a temporary file in a directory which is determined by using
process.current_process()._config.get('tempdir')
Setting this value manually at the beginning of the script
from multiprocessing import process
process.current_process()._config['tempdir'] = '/data/tmp/'
seems to solve this issue. But I don't think this is the best way to solve it, so: are there any other suggestions for how to handle it?
These data are larger than 500GB. Just shared_data_base would be 826.2GB on my machine by sys.getsizeof() and 1506.6GB by pympler.asizeof.asizeof(). Even if they were only 500GB, your machine needs some of that memory in order to run. This is why the data are going to disk.
import ctypes
from pympler.asizeof import asizeof
import sys
N = 150
ndata = 10000
sigma = 3
ddim = 3
print(sys.getsizeof(ctypes.c_double(1.0)) * ndata*N*N*ddim*sigma*sigma)
print(asizeof(ctypes.c_double(1.0)) * ndata*N*N*ddim*sigma*sigma)
Note that on my machine (Debian 9), /tmp is the location that fills up. If you find that you must use disk, be certain that the location on disk has enough available space; typically /tmp isn't a large partition.
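If spilling to disk is unavoidable, an alternative to poking the private _config dict is to point the temporary directory at a larger partition through the environment. A sketch, assuming multiprocessing falls back to tempfile.gettempdir(), which honours TMPDIR as long as it is set before the first temporary file is created:

import os
os.environ['TMPDIR'] = '/data/tmp'   # set before tempfile/multiprocessing pick a temp dir

import ctypes
import multiprocessing
import numpy as np

N, ndata, sigma, ddim = 150, 10000, 3, 3
shared_data_base = multiprocessing.Array(ctypes.c_double, ndata * N * N * ddim * sigma * sigma)
shared_data = np.ctypeslib.as_array(shared_data_base.get_obj())
shared_data = shared_data.reshape(-1, N, N, ddim * sigma * sigma)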
Background
The binary file contains successive raw output from a camera sensor in the form of a Bayer pattern, i.e. the data is a sequence of blocks of the form shown below, where each block is one image in the image stream:
[(bayer width) * (bayer height) * sizeof(short)]
Objective
To read the information from a specific block of data and store it as an array for processing. I was digging through the OpenCV documentation and am totally lost on how to proceed. I apologize for the novice question, but any suggestions?
Assuming you can read the binary file (as a whole), I would try to use Numpy to read it into a numpy.array. You can use numpy.fromstring and, depending on the system the file was written on (little or big endian), use >i2 or <i2 as your data type (you can find the list of data types here).
Also note that > means big endian and < means little endian (more on that here)
You can set an offset and specify the length in order to read a certain block.
import numpy as np

with open('datafile.bin', 'rb') as f:
    dataBytes = f.read()

data = np.fromstring(dataBytes[blockStartIndex:blockEndIndex], dtype='>i2')
In case you cannot read the file as a whole, I would use mmap (requires a little knowledge of C) in order to break it down to multiple files and then use the method above.
OP here: following @lsxliron's suggestion I looked into using Numpy to achieve my goals, and this is what I ended up doing.
import numpy as np

# Data read length (number of shorts per frame)
length = bayer_width * bayer_height
# In terms of bytes: short = 2
step = 2 * length

# Open the file
img = open("filename", "rb")

# Seek to the block we are interested in, i
img.seek(i * step)

# Copy the data as a Numpy array
Bayer = np.fromfile(img, dtype=np.uint16, count=length)
Bayer now holds the Bayer pattern values in the form of a numpy array. Success!
I am writing code for an addon to XBMC that copies an image provided in a bytearray to a slice of a mmap object. Using Kern's line profiler, the bottleneck in my code is when I copy the bytearray into the mmap object at the appropriate location. In essence:
length = 1920 * 1080 * 4
mymmap = mmap.mmap(0, length + 11, 'Name', mmap.ACCESS_WRITE)
image = capture.getImage() # This returns a bytearray that is 4*1920*1080 in size
mymmap[11:(11 + length)] = str(image) # This is the bottleneck according to the line profiler
I cannot change the data types of either the image or mmap. XBMC provides the image as a bytearray and the mmap interface was designed by a developer who won't change the implementation. It is NOT being used to write to a file, however - it was his way of getting the data out of XBMC and into his C++ application. I recognize that he could write an interface using ctypes that might handle this better, but he is not interested in further development. The python implementation in XBMC is 2.7.
I looked at the possibility of using ctypes (in a self-contained way within Python) with memmove, but I can't quite figure out how to convert the bytearray and the mmap slice into C structures that can be used with memmove, and I don't know whether that would be any faster. Any advice on a fast way to move these bytes between these two data types?
If the slice assignment to the mmap object is your bottleneck I don't think anything can be done to improve the performance of your code. All the assignment does internally is call memcpy.
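For completeness, the ctypes route asked about in the question could look roughly like the sketch below (the anonymous mmap is a stand-in for the XBMC-provided mapping); it avoids the intermediate str(image) copy, though as noted above the slice assignment already boils down to memcpy, so any gain is likely small:

import ctypes
import mmap

length = 1920 * 1080 * 4
mymmap = mmap.mmap(-1, length + 11)        # stand-in for the shared mapping
image = bytearray(length)                  # stand-in for capture.getImage()

# ctypes views over the source bytearray and the destination region of the mmap
src = (ctypes.c_char * length).from_buffer(image)
dst = (ctypes.c_char * length).from_buffer(mymmap, 11)   # offset 11 into the mmap
ctypes.memmove(dst, src, length)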