python multiprocessing - OverflowError('cannot serialize a bytes object larger than 4GiB')


We are running a script using the multiprocessing library (Python 3.6), where a big pd.DataFrame is passed as an argument to a function:
from multiprocessing import Pool
import time
def my_function(big_df):
    # do something time consuming
    time.sleep(50)
if __name__ == '__main__':
    with Pool(10) as p:
        res = {}
        output = {}
        for id, big_df in some_dict_of_big_dfs.items():
            res[id] = p.apply_async(my_function, (big_df,))
        output = {id: res[id].get() for id in res}
The problem is that we are getting an error from the pickle library:
Reason: 'OverflowError('cannot serialize a bytes object larger than 4GiB',)'
We are aware that pickle protocol 4 can serialize larger objects (related question, link), but we don't know how to change the protocol that multiprocessing uses.
Does anybody know what to do?
Thanks!

Apparently there is an open issue about this topic, and a few related initiatives are described in this particular answer. I found a way to change the default pickle protocol used by the multiprocessing library, based on this answer. As was pointed out in the comments, this solution only works with Linux and OS multiprocessing lib.
Solution:
First create a new, separate module:
pickle4reducer.py
from multiprocessing.reduction import ForkingPickler, AbstractReducer
class ForkingPickler4(ForkingPickler):
    def __init__(self, *args):
        # args arrives as a tuple; copy it to a list so the protocol slot can be set
        args = list(args)
        if len(args) > 1:
            args[1] = 4
        else:
            args.append(4)
        super().__init__(*args)
    @classmethod
    def dumps(cls, obj, protocol=4):
        return ForkingPickler.dumps(obj, protocol)
def dump(obj, file, protocol=4):
    ForkingPickler4(file, protocol).dump(obj)
class Pickle4Reducer(AbstractReducer):
    ForkingPickler = ForkingPickler4
    register = ForkingPickler4.register
    dump = dump
And then, in your main script, add the following:
import pickle4reducer
import multiprocessing as mp
ctx = mp.get_context()
ctx.reducer = pickle4reducer.Pickle4Reducer()
with mp.Pool(4) as p:
    # do something
    pass
That will probably solve the overflow problem.
But be warned: you might want to read this first, or you may run into the same error I did:
'i' format requires -2147483648 <= number <= 2147483647
(the reason for this error is well explained in the link above). Long story short, multiprocessing sends data between its processes using the pickle protocol. If you are already reaching the 4 GB limit, you should probably consider redefining your functions as "void" methods rather than input/output methods. All this inbound/outbound data increases RAM usage and is probably inefficient by construction (as in my case), so it might be better to point all processes to the same object rather than create a new copy for each call.
Hope this helps.
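One way to read the "point all processes to the same object" advice is to hand the big data to each worker once, through a pool initializer, and then send only small keys per task. The following is a minimal sketch under that assumption; the names init_worker, summarize and run_parallel are hypothetical and not part of the original answer.

```python
import multiprocessing as mp

_shared = {}  # per-process storage, filled once by the initializer


def init_worker(frames):
    """Runs once in every worker: keep a reference instead of re-sending per task."""
    _shared["frames"] = frames


def summarize(key):
    """Task body: receives only a small key and looks the big object up locally."""
    frame = _shared["frames"][key]
    return key, len(frame)


def run_parallel(frames, processes=2):
    # With the fork start method the workers inherit `frames` from the parent.
    # With spawn, `frames` is still pickled once per worker, so a very large
    # object could hit the same size limit there.
    with mp.Pool(processes, initializer=init_worker, initargs=(frames,)) as pool:
        return dict(pool.map(summarize, list(frames)))
```

Calling run_parallel(some_dict_of_big_dfs) from under the if __name__ == '__main__': guard would then return one small result per key, and only keys and results cross the process boundary for each task.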

Supplementing answer from Pablo
The following problem can be resolved by moving to Python 3.8, if you are okay with using that version of Python:
'i' format requires -2147483648 <= number <= 2147483647
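The wording of that message comes from the struct module: a signed 32-bit field cannot hold a length of 2 GiB or more. As far as I know, multiprocessing before 3.8 wrote each message length with such a field, but treat that as an assumption; the sketch below only reproduces the limit with struct directly and is not multiprocessing's own code.

```python
import struct


def frame_header(n):
    """Pack a message length as a signed 32-bit big-endian integer."""
    return struct.pack("!i", n)


def fits(n):
    """True if a payload of n bytes can be described by that header."""
    try:
        frame_header(n)
        return True
    except struct.error:
        return False


largest_ok = 2**31 - 1   # 2147483647, the upper bound quoted in the error
too_big = 2**31          # one byte more no longer fits
```

Here fits(largest_ok) succeeds while fits(too_big) fails with the same struct.error text as above.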

Related

Python: BigFloat+Multiprocessing

I am trying to parallelise a series of computations that use bigfloat. However, there is the error
Error sending result: '[BigFloat.exact('1.0000000', precision=20)]'. Reason: 'TypeError('self._value cannot be converted to a Python object for pickling')'
A MWE to reproduce the error is
from bigfloat import *
from multiprocessing import Pool
def f(x,a,b,N):
    with precision(20):
        X=BigFloat(x)
        for i in range(N):
            X = a*X*X-b
    return X
if __name__ == '__main__':
    pool = Pool(processes=2)
    out1,out2 = pool.starmap(f,[(1,2,1,3),(2,2,2,2)])
(the function itself is not important at all). If I do not use bigfloat, then everything is fine. So, it is definitely some sort of interaction between multiprocessing and bigfloat.
So, I imagine that multiprocessing is having troubles saving the BigFloat object. I do not seem to be able to "extract" only the value thrown by BigFloat. How can I resolve this issue?

Apparently bigfloat doesn't support pickling; I get the same error when doing pickle.dumps(BigFloat(1)).
https://github.com/mdickinson/bigfloat/issues/106 notes this as needing to be done.
As a workaround, why not just convert to strings when transferring between processes? I.e. change f to return str(X) and then have other routines parse the strings into BigFloats as needed.
Otherwise, you could write some code to support this and submit it to the project.
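The string workaround can be sketched without bigfloat installed. In the snippet below, Opaque is a hypothetical stand-in for any type that refuses to be pickled; with bigfloat, to_wire would be str(X) and from_wire would be BigFloat(text), which is an assumption about how you would parse it back.

```python
import pickle


class Opaque:
    """Hypothetical stand-in for a value type that cannot be pickled."""

    def __init__(self, text):
        self.text = text

    def __reduce__(self):
        raise TypeError("cannot be converted to a Python object for pickling")

    def __str__(self):
        return self.text


def to_wire(value):
    """What the worker returns: a plain string, which pickles fine."""
    return str(value)


def from_wire(text):
    """What the parent does with the result: rebuild the rich object."""
    return Opaque(text)


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


value = Opaque("1.0000000")
payload = pickle.dumps(to_wire(value))      # this is what crosses processes
restored = from_wire(pickle.loads(payload))
```

The trade-off is that the string must carry enough digits to represent the value at the precision you need.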

Multiprocessing IOError: bad message length

I get an IOError: bad message length when passing large arguments to the map function. How can I avoid this?
The error occurs when I set N=1500 or bigger.
The code is:
import numpy as np
import multiprocessing
def func(args):
    i=args[0]
    images=args[1]
    print i
    return 0
N=1500 #N=1000 works fine
images=[]
for i in np.arange(N):
    images.append(np.random.random_integers(1,100,size=(500,500)))
iter_args=[]
for i in range(0,1):
    iter_args.append([i,images])
pool=multiprocessing.Pool()
print pool
pool.map(func,iter_args)
In the docs of multiprocessing there is the function recv_bytes that raises an IOError. Could it be because of this? (https://python.readthedocs.org/en/v2.7.2/library/multiprocessing.html)
EDIT
If I use images as a numpy array instead of a list, I get a different error: SystemError: NULL result without error in PyObject_Call.
A bit different code:
import numpy as np
import multiprocessing
def func(args):
    i=args[0]
    images=args[1]
    print i
    return 0
N=1500 #N=1000 works fine
images=[]
for i in np.arange(N):
    images.append(np.random.random_integers(1,100,size=(500,500)))
images=np.array(images) #new
iter_args=[]
for i in range(0,1):
    iter_args.append([i,images])
pool=multiprocessing.Pool()
print pool
pool.map(func,iter_args)
EDIT2 The actual function that I use is:
def func(args):
    i=args[0]
    images=args[1]
    image=np.mean(images,axis=0)
    np.savetxt("image%d.txt"%(i),image)
    return 0
Additionally, the iter_args do not contain the same set of images:
iter_args=[]
for i in range(0,1):
    rand_ind=np.random.random_integers(0,N-1,N)
    iter_args.append([i,images[rand_ind]])

You're creating a pool and sending all the images at once to func(). If you can get away with working on a single image at once, try something like this, which runs to completion with N=10000 in 35s with Python 2.7.10 for me:
import numpy as np
import multiprocessing
def func(args):
    i = args[0]
    img = args[1]
    print "{}: {} {}".format(i, img.shape, img.sum())
    return 0
N=10000
images = ((i, np.random.random_integers(1,100,size=(500,500))) for i in xrange(N))
pool=multiprocessing.Pool(4)
pool.imap(func, images)
pool.close()
pool.join()
The key here is to use iterators so you don't have to hold all the data in memory at once. For instance I converted images from an array holding all the data to a generator expression to create the image only when needed. You could modify this to load your images from disk or whatever. I also used pool.imap instead of pool.map.
If you can, try to load the image data in the worker function. Right now you have to serialize all the data and ship it across to another process. If your image data is larger, this might be a bottleneck.
[update now that we know func has to handle all images at once]
You could do an iterative mean on your images. Here's a solution without using multiprocessing. To use multiprocessing, you could divide your images into chunks, and farm those chunks out to the pool.
import numpy as np
N=10000
shape = (500,500)
def func(images):
    # float accumulator, so the running mean is not truncated to integers
    average = np.zeros(shape, dtype=float)
    for i, img in images:
        average += img / float(N)
    return average
images = ((i, np.full(shape,i)) for i in range(N))
print func(images)
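The chunking idea can be sketched as follows: each chunk produces a partial sum plus a count, and the partials are combined at the end. This is a minimal sketch in Python 3 in which flat lists of floats stand in for the image arrays so that it runs without numpy; chunked, chunk_sum and combine are hypothetical names.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def chunk_sum(images):
    """Per-chunk work: element-wise sum of the images, plus how many there were."""
    total = [0.0] * len(images[0])
    for img in images:
        for k, px in enumerate(img):
            total[k] += px
    return total, len(images)


def combine(partials):
    """Merge the (sum, count) pairs from all chunks into one mean image."""
    partials = list(partials)
    n = sum(count for _, count in partials)
    width = len(partials[0][0])
    return [sum(total[k] for total, _ in partials) / n for k in range(width)]


images = [[float(i)] * 4 for i in range(10)]      # ten tiny 4-pixel "images"
partials = map(chunk_sum, chunked(images, 3))     # serial stand-in for pool.map
mean_image = combine(partials)
```

Replacing the built-in map with pool.map(chunk_sum, list(chunked(images, 3))) would farm the chunks out to worker processes; only one partial sum per chunk travels back, which keeps the returned messages small.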

Python is likely to load your data into RAM, and that memory needs to be available. Have you checked your computer's memory usage?
Also, as Patrick mentioned, you're loading 3 GB of data, so make sure you use the 64-bit version of Python, as you are reaching the 32-bit memory constraint. This could cause your process to crash: 32 vs 64 bits Python
Another improvement would be to use Python 3.4 instead of 2.7. The Python 3 implementation seems to be optimized for very large ranges; see Python3 vs Python2 list/generator range performance

When running your program it actually gives me a clear error:
OSError: [Errno 12] Cannot allocate memory
As other users mentioned, the solution to your problem is simple: add memory (a lot), or change the way your program handles the images.
It uses so much memory because you allocate the images at module level. So when multiprocessing forks your process, it also copies all the images (which isn't free, according to Shared-memory objects in python multiprocessing). This is not necessary, because you also pass the images as an argument to the function, which the multiprocessing module copies again using IPC and pickle; this would still likely result in a lack of memory. Try one of the solutions proposed by the other users.

This is what solved the problem: declaring the images global.
import numpy as np
import multiprocessing
N=1500 #N=1000 works fine
images=[]
for i in np.arange(N):
    images.append(np.random.random_integers(1,100,size=(500,500)))
def func(args):
    i=args[0]
    local_images=images # reads the module-level list; assigning to the name images here would make it local and fail
    print i
    return 0
iter_args=[]
for i in range(0,1):
    iter_args.append([i])
pool=multiprocessing.Pool()
print pool
pool.map(func,iter_args)

You get IOError: bad message length when passing around large objects because of a hard-coded limit in older CPython versions (3.2 and earlier) of 0x7fffffff bytes, or around 2.1 GB: https://github.com/python/cpython/blob/v2.7.5/Modules/_multiprocessing/multiprocessing.h#L182
This CPython changeset (which is in CPython 3.3 and later) removed the hard coded limit: https://github.com/python/cpython/commit/87cf220972c9cb400ddcd577962883dcc5dca51a#diff-4711c9abeca41b149f648d4b3c15b6a7d2baa06aa066f46359e4498eb8e39f60L182
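A back-of-the-envelope check is consistent with the question's observation that N=1000 works and N=1500 fails. It assumes each array element takes 8 bytes (the usual default integer size on 64-bit Linux, which is an assumption here) and ignores pickle overhead; payload_bytes is a hypothetical helper.

```python
LIMIT = 0x7fffffff  # 2147483647 bytes, the hard-coded limit quoted above


def payload_bytes(n_images, shape=(500, 500), itemsize=8):
    """Raw size of n_images arrays of the given shape, without pickle overhead."""
    return n_images * shape[0] * shape[1] * itemsize


works = payload_bytes(1000)   # 2,000,000,000 bytes, just under the limit
fails = payload_bytes(1500)   # 3,000,000,000 bytes, well over it
```

With 4-byte integers the threshold would sit at roughly twice as many images, so the exact N at which the error appears depends on the dtype.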

python MPI sendrecv() to pass a python object

I am trying to use mpi4py's sendrecv() to pass a dictionary obj.
from mpi4py import MPI
comm=MPI.COMM_WORLD
rnk=comm.Get_rank()
size=comm.Get_size()
idxdict={1:2}
buffer=None
comm.sendrecv(idxdict,dest=(rnk+1)%size,sendtag=rnk,recvobj=buffer,source=(rnk-1+size)%size,recvtag=(rnk-1+size)%size)
idxdict=buffer
If I print idxdict at the last step, I get a bunch of "None"s, so the dictionary idxdict is not passed between cores. If I use a dictionary as the buffer (buffer={}), I get TypeError: expected a writeable buffer object.
What did I do wrong? Many thanks for your help.

I believe the documentation is misleading here; sendrecv returns the received buffer, and doesn't use the receive object argument at all that I can see (at least in older versions, 1.2.x). So your above code doesn't work (although the receive does in fact happen), but the below does:
from mpi4py import MPI
comm=MPI.COMM_WORLD
rnk=comm.Get_rank()
size=comm.Get_size()
idxdict={1:2}
buffer = comm.sendrecv(sendobj=idxdict,dest=(rnk+1)%size,source=(rnk-1+size)%size)
print "idxdict = ", idxdict
print "buffer = ", buffer

Python: how to run several scripts (or functions) at the same time under windows 7 multicore processor 64bit

Sorry for this question, since there are several examples on Stack Overflow. I am writing to clarify some of my doubts because I am quite new to Python.
I wrote a function:
def clipmyfile(inFile,poly,outFile):
    ... # doing something with inFile and poly and return outFile
Normally I do this:
clipmyfile(inFile="File1.txt",poly="poly1.shp",outFile="res1.txt")
clipmyfile(inFile="File2.txt",poly="poly2.shp",outFile="res2.txt")
clipmyfile(inFile="File3.txt",poly="poly3.shp",outFile="res3.txt")
......
clipmyfile(inFile="File21.txt",poly="poly21.shp",outFile="res21.txt")
I read in this example, Run several python programs at the same time, that I can use (but I am probably wrong)
from multiprocessing import Pool
p = Pool(21) # like in your example, running 21 separate processes
to run the function calls at the same time and speed up my analysis.
Honestly, I didn't understand the next step.
Thanks in advance for help and suggestions
Gianni

The map used in the example you provided only works for functions that receive one argument. You can see a solution to this here: Python multiprocessing pool.map for multiple arguments
In your case what you would do is (assuming you have 3 lists with files, polys, outs):
from multiprocessing import Pool
def expand_args(f_p_o):
    clipmyfile(*f_p_o)
files = ["file1.txt", "file2.txt"]
polis = ["poly1.txt", "poly2.txt"]
outis = ["out1.txt", "out2.txt"]
if __name__ == '__main__': # required on Windows, where worker processes re-import this module
    len_f = len(files)
    p = Pool()
    p.map(expand_args, [(files[i], polis[i], outis[i]) for i in xrange(len_f)])
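On Python 3.3 and later, Pool.starmap unpacks argument tuples itself, so the expand_args wrapper is not needed. A minimal sketch follows, in which clipmyfile is a hypothetical stand-in for the real routine and build_jobs and run_all are made-up helper names; the file names follow the pattern from the question.

```python
from multiprocessing import Pool


def clipmyfile(inFile, poly, outFile):
    # Hypothetical stand-in for the real clipping routine.
    return "{} + {} -> {}".format(inFile, poly, outFile)


def build_jobs(n):
    """One (inFile, poly, outFile) tuple per job, numbered 1..n."""
    return [("File%d.txt" % i, "poly%d.shp" % i, "res%d.txt" % i)
            for i in range(1, n + 1)]


def run_all(n, processes=4):
    # starmap unpacks each tuple into clipmyfile's three parameters.
    with Pool(processes) as pool:
        return pool.starmap(clipmyfile, build_jobs(n))
```

On Windows, call run_all(21) from inside an if __name__ == '__main__': block. The pool size does not have to match the number of jobs; a pool of about as many processes as there are cores works through all 21 tuples.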

Python mmap ctypes - read only

I think I have the opposite problem as described here. I have one process writing data to a log, and I want a second process to read it, but I don't want the 2nd process to be able to modify the contents. This is potentially a large file, and I need random access, so I'm using python's mmap module.
If I create the mmap as read/write (for the 2nd process), I have no problem creating ctypes object as a "view" of the mmap object using from_buffer. From a cursory look at the c-code, it looks like this is a cast, not a copy, which is what I want. However, this breaks if I make the mmap ACCESS_READ, throwing an exception that from_buffer requires write privileges.
I think I want to use ctypes from_address() method instead, which doesn't appear to need write access. I'm probably missing something simple, but I'm not sure how to get the address of the location within an mmap. I know I can use ACCESS_COPY (so write operations show up in memory, but aren't persisted to disk), but I'd rather keep things read only.
Any suggestions?

I ran into a similar issue (unable to setup a readonly mmap) but I was using only the python mmap module. Python mmap 'Permission denied' on Linux
I'm not sure it is of any help to you since you don't want the mmap to be private?

Ok, from looking at the mmap .c code, I don't believe it supports this use case. Also, I found that the performance pretty much sucks for my use case. I'd be curious what kind of performance others see, but I found that it took about 40 sec to walk through a binary file of 500 MB in Python. This is creating a mmap, then turning the location into a ctypes object with from_buffer(), and using the ctypes object to decipher the size of the object so I could step to the next object. I tried doing the same thing directly in C++ from MSVC. Obviously here I could cast directly into an object of the correct type, and it was fast: less than a second (this is with a Core 2 Quad and SSD).
I did find that I could get a pointer with the following
firstHeader = CEL_HEADER.from_buffer(map, 0) #CEL_HEADER is a ctypes Structure
pHeader = pointer(firstHeader)
#Now I can use pHeader[ind] to get a CEL_HEADER object
#at an arbitrary point in the file
This doesn't get around the original problem - the mmap isn't read-only, since I still need to use from_buffer for the first call. In this config, it still took around 40 sec to process the whole file, so it looks like the conversion from a pointer into ctypes structs is killing the performance. That's just a guess, but I don't see a lot of value in tracking it down further.
I'm not sure my plan will help anyone else, but I'm going to try to create a c module specific to my needs based on the mmap code. I think I can use the fast c-code handling to index the binary file, then expose only small parts of the file at a time through calls into ctypes/python objects. Wish me luck.
Also, as a side note, Python 2.7.2 was released today (6/12/11), and one of the changes is an update to the mmap code so that you can use a python long to set the file offset. This lets you use mmap for files over 4GB on 32-bit systems. See Issue #4681 here

Ran into this same problem, we needed the from_buffer interface and wanted read only access. From the python docs https://docs.python.org/3/library/mmap.html "Assignment to an ACCESS_COPY memory map affects memory but does not update the underlying file."
If it's acceptable for you to use an anonymous file backing you can use ACCESS_COPY
An example: open two cmd.exe or terminals and in one terminal:
import ctypes
import mmap
mm_file_write = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
mm_file_read = mmap.mmap(-1, 4096, access=mmap.ACCESS_COPY, tagname="shmem")
write = ctypes.c_int.from_buffer(mm_file_write)
read = ctypes.c_int.from_buffer(mm_file_read)
try:
    while True:
        value = int(input('enter an integer using mm_file_write: '))
        write.value = value
        print('updated value')
        value = int(input('enter an integer using mm_file_read: '))
        #read.value assignment doesnt update anonymous backed file
        read.value = value
        print('updated value')
except KeyboardInterrupt:
    print('got exit event')
In the other terminal do:
import mmap
import struct
import time
mm_file = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
i = None
try:
    while True:
        new_i = struct.unpack('i', mm_file[:4])
        if i != new_i:
            print('i: {} => {}'.format(i, new_i))
            i = new_i
        time.sleep(0.1)
except KeyboardInterrupt:
    print('Stopped . . .')
And you will see that the second process does not receive updates when the first process writes using ACCESS_COPY
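If a per-record copy is acceptable, another way to keep the mapping strictly read-only is to skip from_buffer and use ctypes' from_buffer_copy, which accepts read-only input because it copies the bytes. The sketch below is a minimal illustration: the Header layout, the temporary file and its contents are made up for the example and are not the CEL_HEADER from the earlier answer.

```python
import ctypes
import mmap
import os
import struct
import tempfile


class Header(ctypes.LittleEndianStructure):
    """Hypothetical record header: total record size, then a type tag."""
    _fields_ = [("size", ctypes.c_uint32), ("kind", ctypes.c_uint32)]


def read_header(mm, offset):
    # Slicing an mmap returns bytes, so this copies one record's worth of
    # data and never needs write access to the mapping.
    raw = mm[offset:offset + ctypes.sizeof(Header)]
    return Header.from_buffer_copy(raw)


# Build a small file holding two 8-byte records.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(struct.pack("<II", 8, 1))
    f.write(struct.pack("<II", 8, 2))

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = read_header(mm, 0)
    second = read_header(mm, first.size)   # step to the next record
    kinds = (first.kind, second.kind)
    mm.close()
os.remove(path)
```

The copy costs time on every record, so this trades speed for the read-only guarantee; it does not address the performance concerns raised above.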
