Byte limit when transferring Python objects between Processes using a Pipe? - python

I have a custom simulator (for biology) running on a 64-bit Linux (kernel
version 2.6.28.4) machine using a 64-bit Python 3.3.0 CPython interpreter.
Because the simulator depends on many independent experiments for valid results,
I built in parallel processing for running experiments. Communication between
the processes primarily occurs under a producer-consumer pattern with managed
multiprocessing Queues
(doc).
The rundown of the architecture is as follows:
a master process that handles spawning and managing the Processes and the various Queues
N worker processes that do simulations
1 result consumer process that consumes the results of a simulation and sorts and analyzes the results
The master process and the worker processes communicate via an input Queue.
Similarly, the worker processes place their results in an output Queue which
the result consumer process consumes items from. The final ResultConsumer
object is passed via a multiprocessing Pipe
(doc)
back to the master process.
Everything works fine until it tries to pass the ResultConsumer object back to
the master process via the Pipe:
Traceback (most recent call last):
File "/home/cmccorma/.local/lib/python3.3/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/cmccorma/.local/lib/python3.3/multiprocessing/process.py", line 95, in run
self._target(*self._args, **self._kwargs)
File "DomainArchitectureGenerator.py", line 93, in ResultsConsumerHandler
pipeConn.send(resCon)
File "/home/cmccorma/.local/lib/python3.3/multiprocessing/connection.py", line 207, in send
self._send_bytes(buf.getbuffer())
File "/home/cmccorma/.local/lib/python3.3/multiprocessing/connection.py", line 394, in _send_bytes
self._send(struct.pack("!i", n))
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
I understand the first two traces (unhandled exits in the Process library),
and the third is my line of code for sending the ResultConsumer object down the
Pipe to the master process. The last two traces are where it gets
interesting. A Pipe pickles any object that is sent to it and passes the
resulting bytes to the other end (matching connection) where it is unpickled
upon running recv(). self._send_bytes(buf.getbuffer()) is attempting to
send the bytes of the pickled object. self._send(struct.pack("!i", n)) is
attempting to pack the integer n, the length of that buffer, into a
network/big-endian 4-byte C int that is sent ahead of the data as a length
prefix (the struct library handles conversions between Python values and C
structs represented as Python bytes objects; see the doc).
This error only occurs when attempting a lot of experiments, e.g. 10 experiments
will not cause it, but 1,000 consistently will (all other parameters being constant). My best
hypothesis so far as to why struct.error is thrown is that the number of bytes
being pushed down the pipe exceeds 2^31-1 (2147483647), or ~2 GB.
So my question is two-fold:
I'm getting stuck with my investigations as struct.py essentially just
imports from _struct and I have no idea where that is.
The byte limit seems arbitrary given that the underlying architecture is all
64-bit. So, why can't I pass anything larger than that? Additionally, if I
can't change this, are there any good (read: easy) workarounds to this issue?
Note: I don't think that using a Queue in place of a Pipe will solve the issue,
as I suspect that Queues use a similar pickling intermediate step. EDIT: This note is entirely incorrect, as pointed out in abarnert's answer.

I'm getting stuck with my investigations as struct.py essentially just imports from _struct and I have no idea where that is.
In CPython, _struct is a C extension module built from _struct.c in the Modules directory in the source tree. You can find the code online here.
Whenever foo.py does an import _foo, that's almost always a C extension module, usually built from _foo.c. And if you can't find a foo.py at all, it's probably a C extension module, built from _foomodule.c.
It's also often worth looking at the equivalent PyPy source, even if you're not using PyPy. They reimplement almost all extension modules in pure Python—and for the remainder (including this case), the underlying "extension language" is RPython, not C.
However, in this case, you don't need to know anything about how struct is working beyond what's in the docs.
The byte limit seems arbitrary given that the underlying architecture is all 64-bit.
Look at the code it's calling:
self._send(struct.pack("!i", n))
If you look at the documentation, the 'i' format character explicitly means "4-byte C integer", not "whatever ssize_t is". For that, you'd have to use 'n'. Or you might want to explicitly use a long long, with 'q'.
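A quick way to see the difference between the format characters (a standalone demonstration, not part of multiprocessing; note that 'n' is only available in native byte order, which is part of why a patch would use 'q'):
import struct

struct.pack("!i", 2**31 - 1)   # fine: fits in a signed 4-byte C int
struct.pack("!q", 2**31)       # fine: 'q' is a signed 8-byte long long
try:
    struct.pack("!i", 2**31)   # one past the 4-byte limit
except struct.error as e:
    print(e)   # e.g. "'i' format requires -2147483648 <= number <= 2147483647"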
You can monkeypatch multiprocessing to use struct.pack('!q', n), or encode the length in some way other than struct. This will, of course, break wire compatibility with non-patched multiprocessing, which could be a problem if you're trying to do distributed processing across multiple computers or something. But it should be pretty simple:
def _send_bytes(self, buf):
    # For wire compatibility with 3.2 and lower
    n = len(buf)
    self._send(struct.pack("!q", n))  # was "!i"
    # The condition is necessary to avoid "broken pipe" errors
    # when sending a 0-length buffer if the other end closed the pipe.
    if n > 0:
        self._send(buf)

def _recv_bytes(self, maxsize=None):
    buf = self._recv(8)  # was 4
    size, = struct.unpack("!q", buf.getvalue())  # was "!i"
    if maxsize is not None and size > maxsize:
        return None
    return self._recv(size)
Of course there's no guarantee that this change is sufficient; you'll want to read through the rest of the surrounding code and test the hell out of it.
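One way to install the patch (a sketch only; it assumes the pure-Python Connection class in multiprocessing.connection used on POSIX in 3.3, and both ends of the Pipe must run the patched code or the length prefix will be misread):
import struct
import multiprocessing.connection as mpc

def _send_bytes_q(self, buf):
    n = len(buf)
    self._send(struct.pack("!q", n))   # 8-byte length prefix instead of 4
    if n > 0:
        self._send(buf)

def _recv_bytes_q(self, maxsize=None):
    buf = self._recv(8)
    size, = struct.unpack("!q", buf.getvalue())
    if maxsize is not None and size > maxsize:
        return None
    return self._recv(size)

mpc.Connection._send_bytes = _send_bytes_q
mpc.Connection._recv_bytes = _recv_bytes_q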
Note: I suspect that using a Queue in place of a Pipe will not solve the issue, as I suspect that Queues use a similar pickling intermediate step.
Well, the problem has nothing to do with pickling. Pipe isn't using pickle to send the length, it's using struct. You can verify that pickle wouldn't have this problem: pickle.loads(pickle.dumps(1<<100)) == 1<<100 will return True.
(In earlier versions, pickle also had problems with huge objects—e.g., a list of 2G elements—which could have caused problems at a scale about 8x as high as the one you're currently hitting. But that's been fixed by 3.3.)
Meanwhile… wouldn't it be faster to just try it and see, instead of digging through the source to try to figure out whether it would work?
Also, are you sure you really want to pass around a 2GB data structure by implicit pickling?
If I were doing something that slow and memory-hungry, I'd prefer to make that explicit—e.g., pickle to a tempfile and send the path or fd. (If you're using numpy or pandas or something, use its binary file format instead of pickle, but same idea.)
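A sketch of that explicit approach (the function names and the temp-file handling are illustrative, not from the simulator):
import os
import pickle
import tempfile

def ship_result(conn, result):
    # Runs in the result consumer: pickle to a temp file, send only the path.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pkl") as f:
        pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)
        path = f.name
    conn.send(path)   # a short string easily fits in the 4-byte length prefix

def receive_result(conn):
    # Runs in the master: load the pickle back and clean up.
    path = conn.recv()
    with open(path, "rb") as f:
        result = pickle.load(f)
    os.unlink(path)
    return result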
Or, even better, share the data. Yes, mutable shared state is bad… but sharing immutable objects is fine. Whatever you've got 2GB of, can you put it in a multiprocessing.Array, or put it in a ctypes array or struct (of arrays or structs of …) that you can share via multiprocessing.sharedctypes, or ctypes it out of a file that you mmap on both sides, or…? There's a bit of extra code to define and pick apart the structures, but when the benefits are likely to be this big, it's worth trying.
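For example, a minimal sketch of the shared-array idea (the worker function, sizes, and fill values are illustrative, and it assumes a fork-based start on Linux):
import multiprocessing as mp

N_DOUBLES = 1000 * 1000

def worker(shared, start, count):
    # Each worker fills its own slice of the shared buffer; nothing large
    # is pickled, only the small (start, count) arguments.
    for i in range(start, start + count):
        shared[i] = i * 0.5

if __name__ == "__main__":
    shared = mp.Array('d', N_DOUBLES, lock=False)   # flat array of C doubles in shared memory
    chunk = N_DOUBLES // 4
    procs = [mp.Process(target=worker, args=(shared, k * chunk, chunk))
             for k in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(shared[0], shared[N_DOUBLES - 1])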
Finally, when you think you've found a bug/obvious missing feature/unreasonable limitation in Python, it's worth looking at the bug tracker. It looks like issue 17560: problem using multiprocessing with really big objects? is exactly your problem, and has lots of information, including suggested workarounds.

Related

Make python process writes be scheduled for writeback immediately without being marked dirty

We are building a Python framework that captures data from a framegrabber card through a cffi interface. After some manipulation, we try to write RAW images (numpy arrays, using the tofile method) to disk at a rate of around 120 MB/s. We are well aware that our disks are capable of handling this throughput.
The problem we were experiencing was dropped frames, often entire seconds of data completely missing from the framegrabber output. What we found was that these frame drops were occurring when our Debian system hit the dirty_background_ratio set in sysctl. The system would wake its background flusher threads, which would choke up the framegrabber and cause it to skip frames.
Not surprisingly, setting dirty_background_ratio to 0% got rid of the problem entirely (it is worth noting that even small values like 1% and 2% still resulted in ~40% frame loss).
So, my question is: is there any way to get this Python process to write in such a way that it is immediately scheduled for writeout, bypassing the dirty buffer entirely?
Thanks
So here's one way I've managed to do it.
By using the numpy memmap object you can instantiate an array that corresponds directly to a part of the disk. Calling the method flush(), or letting Python's del drop the object, syncs the array to disk rather than leaving the data sitting in the dirty page cache waiting for background writeback. I've successfully written ~280 GB to disk at maximum throughput using this method.
Will continue researching.
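A minimal sketch of the memmap approach (the file name, frame shape, and frame count are illustrative):
import numpy as np

frame_shape = (1024, 1024)   # example frame geometry
n_frames = 120

out = np.memmap("frames.raw", dtype=np.uint8, mode="w+",
                shape=(n_frames,) + frame_shape)

for i in range(n_frames):
    frame = np.zeros(frame_shape, dtype=np.uint8)   # stand-in for a grabbed frame
    out[i] = frame

out.flush()   # msync the mapping so the data reaches the disk
del out       # drop the mapping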
Another option is to get the OS-level file descriptor and call os.fsync on it. This forces the data out to the device immediately rather than waiting for background writeback.
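A sketch of that option (the file name and data are placeholders, not from the framework above):
import os

frame_bytes = b"\x00" * (1024 * 1024)   # placeholder for real frame data

with open("frames.raw", "wb") as f:
    f.write(frame_bytes)    # lands in Python's and the OS's buffers first
    f.flush()               # push Python's userspace buffer down to the OS
    os.fsync(f.fileno())    # block until the kernel has written it to the device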

Serial port writing style

I am using two libraries to connect to a port, and the two of them use different styles for writing commands. I want to understand the difference because I want to use the second one, but it results in the port becoming unresponsive after some time; I wonder if it causes some kind of overloading. Here are the methods.
Method 1:
if self.port:
    self.port.flushOutput()
    self.port.flushInput()
    for c in cmd:
        self.port.write(c)
    self.port.write("\r\n")
Method 2:
if self.port:
    cmd += b"\r\n"
    self.port.flushInput()
    self.port.write(cmd)
    self.port.flush()
The major difference I noticed first is that the first one splits the command into individual characters and then sends them one by one. I wonder if this makes any difference. And as I said, the second one fails after some time (it is unclear whether these methods are the problem). I don't understand what the flushes do there. I want to understand the difference between these two and to know whether the second one is prone to errors.
Note: self.port is a serial.Serial object.
Any advice appreciated.
Well, from the pySerial documentation the function flushInput has been renamed to reset_input_buffer and flushOutput to reset_output_buffer:
reset_input_buffer()
Flush input buffer, discarding all its contents.
Changed in version 3.0: renamed from flushInput()
reset_output_buffer()
Clear output buffer, aborting the current output and discarding all that is in the buffer.
Changed in version 3.0: renamed from flushOutput()
The first method is less likely to fail because the output buffer is reset before attempting a write. This implies that the buffer is always empty before writing, hence the odds the write will fail are lower.
The problem is that both methods are error-prone:
There is no guarantee that all the data you are attempting to write will actually be written by the write() function, with or without the for loop. This can happen if the output buffer is already full. However, write() returns the number of bytes successfully written to the buffer, so you should loop until the number of written bytes equals the number of bytes you wanted to write:
import time

toWrite = b"the command\r\n"
written = 0
while written < len(toWrite):
    n = self.port.write(toWrite[written:])
    if n == 0:
        # the buffer is full: wait 100 ms until some bytes
        # are actually transmitted, then try again
        time.sleep(0.1)
    written += n
Note that "writing to the buffer" doesn't mean that the data is instantly transmitted on the serial port; the buffer is flushed to the serial port when the operating system decides it is time to do so, or when you force it by calling the flush() function, which waits for all the data to be written to the port.
Note also that this approach blocks the thread until the write has completed, which can take a while if the serial port is slow or you are writing a large amount of data.
If your program is OK with that, you are fine; otherwise you can dedicate a separate thread to serial port communication (see the sketch below) or adopt a non-blocking approach. In the former case you will have to handle inter-thread communication; in the latter you will have to manage your own buffer and discard only the bytes that were successfully written.
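A rough sketch of the dedicated-thread option (the SerialWriter class and the queue-based hand-off are illustrative, not part of either library):
import threading
import queue

class SerialWriter:
    def __init__(self, port):
        self.port = port                  # an already-open serial.Serial object
        self.q = queue.Queue()
        t = threading.Thread(target=self._run, daemon=True)
        t.start()

    def send(self, cmd):
        # Returns immediately; the background thread does the blocking work.
        self.q.put(cmd + b"\r\n")

    def _run(self):
        while True:
            data = self.q.get()
            written = 0
            while written < len(data):
                written += self.port.write(data[written:])
            self.port.flush()
Usage would then be something like writer = SerialWriter(self.port) followed by writer.send(b"the command").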
Finally, if your program is really simple, an approach like this should do the trick:
if self.port:
    cmd += b"\r\n"
    for c in cmd:
        self.port.write(c)
        self.port.flush()
But it will be extremely inefficient.

Inconsistent python mmap behaviour with /dev/mem

I've been working on a project in PHP which requires mmap'ing /dev/mem to gain access to the hardware peripheral registers. As there's no native mmap in PHP, the simplest way I could think of to achieve this was to spawn a Python subprocess that communicates with the PHP app via stdin/stdout.
I have run into a strange issue which only occurs while reading addresses, not writing them. The subprocess functions correctly for writes into the mapped memory with the following:
mem.write(sys.stdin.read(length))
So, I expected that I could conversely write memory segments back to the parent using the following:
sys.stdout.write(mem.read(length))
If I mmap a standard file, both commands work as expected (regardless of the length of the read/write). If I map the /dev/mem "file", I get nonsense back during the read. It's worth noting that the area I'm mapping is outside the system RAM range and is used to access the peripheral registers.
The work-around I have in place is the following:
for x in range(0, length / 4):
    sys.stdout.write(str(struct.pack('L', struct.unpack_from('L', mem, mem.tell())[0])))
    mem.seek(4, os.SEEK_CUR)
This makes the reads behave as expected.
What I can't understand is why reading from the address using unpack_from should see anything different from reading it directly. The same (non-working) thing occurs if I try to just assign a read to a variable.
In case additional context is helpful, I'm running this on a Raspberry Pi/Debian 8. The file that contains the above issue is here. The project that uses it is here.

Feasibility of using pipe for ruby-python communication

Currently, I have two programs, one running in Ruby and the other in Python. I need to read a file in Ruby, but I first need a library written in Python to parse the file. Currently, I use XMLRPC to have the two programs communicate. Porting the Python library to Ruby is out of the question. However, I have read that using XMLRPC has some performance overhead. Recently, I read that another solution to the Ruby-Python conundrum is the use of pipes. So I tried to experiment with that. For example, I wrote this master script in Ruby:
(0..2).each do
  slave = IO.popen(['python','slave.py'], mode='r+')
  slave.write "master"
  slave.close_write
  line = slave.readline
  while line do
    sleep 1
    p eval line
    break if slave.eof
    line = slave.readline
  end
end
The following is the Python slave:
import sys

cmd = sys.stdin.read()
while cmd:
    x = cmd
    for i in range(0, 5):
        print "{'%i'=>'%s'}" % (i, x)
        sys.stdout.flush()
    cmd = sys.stdin.read()
Everything seems to work fine:
~$ ruby master.rb
{"0"=>"master"}
{"1"=>"master"}
{"2"=>"master"}
{"3"=>"master"}
{"4"=>"master"}
{"0"=>"master"}
{"1"=>"master"}
{"2"=>"master"}
{"3"=>"master"}
{"4"=>"master"}
{"0"=>"master"}
{"1"=>"master"}
{"2"=>"master"}
{"3"=>"master"}
{"4"=>"master"}
My question is, is it really feasible to use pipes for working with objects between Ruby and Python? One consideration is that there may be multiple instances of master.rb running. Will concurrency be an issue? Can pipes handle extensive operations and objects being passed in between? If so, would it be a better alternative to RPC?
Yes. No. If you implement it, yes. Depends on what your application needs.
Basically, if all you need is simple data passing, pipes are fine; if you need to be constantly calling functions on objects in your remote process, then you'll probably be better off using some form of existing RPC instead of reinventing the wheel. Whether that should be XMLRPC or something else is another matter.
Note that RPC will have to use some underlying IPC mechanism, which could well be pipes, but might also be sockets, message queues, shared memory, or whatever.
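If you do stick with plain pipes, a line-oriented protocol with a standard serialization is safer than eval-ing the slave's output on the Ruby side. A sketch of the Python end only (the per-line dictionary mirrors the example above and is illustrative):
import sys
import json

for line in sys.stdin:
    cmd = line.strip()
    if not cmd:
        continue
    for i in range(5):
        # one JSON object per line; the Ruby side can JSON.parse each line
        sys.stdout.write(json.dumps({str(i): cmd}) + "\n")
        sys.stdout.flush()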

Using GDB in a virtual machine to debug Python ctypes segfaults

I have been working on a project that has been working fine on a dedicated Linux CentOS system.
The general idea is that there is a Python workflow manager that calls shared libraries written in C using ctypes. It works fine.
However, the need for me to have a local instance of the project for development purposes has come up. I set up a Linux Mint virtual machine with VMWare under Windows 7. For the most part, everything works fine.
The problem is one module is crashing with a segmentation fault upon calling a function in one of the shared libraries. Normally this is OK, and on the dedicated Linux machine, using something like "gdb python corename" lets me see exactly where it crashed and work the problem out.
However, with the local setup I am having problems. The thing I notice most is that GDB does not report correct memory addresses. It's a huge project so I can't post all the code, but I'll give a rundown:
The Python module creates a "file_path" variable, a string. It first passes this to a certain shared library to load the file. If I execute the following command in Python,
hex(id(file_path))
it will return something like '46cb4ec'. In the first shared library, which is C, I start it out with
printf("file_path address = %x\n", &file_path[0]);
and it outputs 'file_path address = 46cb4ec', which is the same as I get through Python's 'id()' function. I guess this is expected...?
Anyways, I send this same variable to another shared library, but it crashes immediately on this call. If I analyze the core file, it shows that it crashes on the function call itself, and not on a line within the function. The strange thing, though, is that it outputs something like:
Program terminated with signal 11, Segmentation fault.
#0 0x00007f124448c9fc in seam_processor (irm_filename=<error reading variable: Cannot access memory at address 0x7fff5fab51b8>,
seam_data_path=<error reading variable: Cannot access memory at address 0x7fff5fab51b0>,
full_data_path=<error reading variable: Cannot access memory at address 0x7fff5fab51a8>, ranges=<error reading variable: Cannot access memory at address 0x7fff5fab51a0>,
job_id=<error reading variable: Cannot access memory at address 0x7fff5fab519c>, job_owner=<error reading variable: Cannot access memory at address 0x7fff5fab5198>,
y_tile_in=1, x_tile_in=1, fc_int=10650000000, atmos_props=..., surf_props=..., extra_props=<error reading variable: Cannot access memory at address 0x7fff5fab5190>,
gascalc_ftype=513, len_gas_sectrum=16, vect_spec_fq=<error reading variable: Cannot access memory at address 0x7fff5fab5188>, surfscat_theta_inc_vector_size=6,
surfscat_theta_inc_vector=<error reading variable: Cannot access memory at address 0x7fff5fab5180>, surfscat_phi_inc_vector_size=6,
surfscat_phi_inc_vector=<error reading variable: Cannot access memory at address 0x7fff5fab5178>, surfscat_theta_scat_vector_size=6,
surfscat_theta_scat_vector=<error reading variable: Cannot access memory at address 0x7fff5fab5170>, surfscat_phi_scat_vector_size=6,
surfscat_phi_scat_vector=<error reading variable: Cannot access memory at address 0x7fff5fab5168>) at src/seam_processor.c:47
So, what I can't figure out is why GDB is reporting these memory addresses as such. In this case, the 'irm_filename' variable is what the Python passed in as 'file_path', so its address should be what the other library and the id() function report, 0x46CB4EC. Why is it different? However, the strange thing is that some of the variables are just fine, like 'y_tile_in'. If I do in gdb:
(gdb) print &y_tile_in
$1 = (int *) 0x7fff60543f80
So, although it can read this memory address, this is not the same as what Python's id() would report, or what a similar C printf() of the address would report in a library that doesn't crash. Also, these memory addresses are really big numbers, larger than the amount of memory I have by far... What do they really mean?
My question, then, is what exactly is going on here? Is the fact that this is being run in a virtual machine doing this? Is there some mapping going on? I haven't been able to find anything online about something different I'd have to do if using gdb in a virtual machine, so I'm at a loss...
Anyone know what's going on?
Thanks.
EDIT
So, the problem has gotten stranger. Basically, I commented out all the code from the library that does anything and left the function call the same. When I do this and run it in gdb with a breakpoint, all of the memory addresses that it prints in the function call are normal, match the Python id() function and match printf() on the addresses.
I started un-commenting out code to see what the problem could be. The problem is with a declaration:
double nrcs_h_d[MAX_NINC_S*MAX_SCAT_S];
double nrcs_v_d[MAX_NINC_S*MAX_SCAT_S];
If I comment out both lines, there is no crash. If I comment out only the second line, there is no crash. If no lines are commented out, though, it crashes.
The strange thing is that MAX_NINC_S and MAX_SCAT_S both equal 500. So, these arrays are only a couple of megabytes in size... Elsewhere in the code arrays of several hundred megabytes are allocated just fine.
Also, if I replace the above lines with:
double *nrcs_h_d, *nrcs_v_d;
nrcs_h_d = (double *)malloc(MAX_NINC_S*MAX_SCAT_S*sizeof(double));
nrcs_v_d = (double *)malloc(MAX_NINC_S*MAX_SCAT_S*sizeof(double));
It seems to work fine... So apparently the problem was related to trying to allocate too much on the stack.
So, the questions become:
Why does gdb not show this is the line of the code where the segmentation fault happens, but instead says it is the function call?
Why do the memory addresses of the core dump file seem to get all screwed up if that allocation is made?
Thanks.
Recall that stack space grows downward, while heap space grows upward; addresses close to the top of your virtual memory space (e.g. 0x7fff5fab51b0) are stack-allocated variables, while addresses closer to the bottom (e.g. 0x46cb4ec) are heap-allocated variables. Note that the virtual address space is usually much, much larger than physical memory; on 64-bit Linux each process gets on the order of 128 TiB (47 bits) of user virtual address space, which is why stack addresses can be far larger than the amount of RAM you actually have.
Since Python depends heavily on dynamic memory allocation, it will end up putting its objects on the heap, which is why id() tends to return addresses on the low side. If C code copies any values to stack-allocated variables and subsequently tries to call functions using those local copies, you'll see addresses on the high side again.
There is a line number provided: src/seam_processor.c:47. Is there anything interesting there? The addresses that GDB is complaining about are all on the stack, sequential, and 8 bytes apart, so almost all of them appear to be the slots holding pointer-sized arguments.
(This is the best answer I can give with the information at hand; feel free to suggest revisions or provide additional information.)
One possible explanation is that the allocation of data for local variables in a function occurs before any code in a particular function is executed.
The compiler works out the amount of required stack space and allocates it before getting to the statements in the function. So the actual execution of your function goes a bit like this:
Allocate the amount of storage for all local function variables,
Start executing the statements in the function
That way, when function1 calls another function, say function2, the local variables of function1 will not be disturbed by the local variables of function2.
Thus, in your case, the space required for your local variables is more than the stack has available, and the fault is raised before the function's code starts executing, which is why GDB reports the crash at the call itself rather than at a line inside the function.
