python MPI sendrecv() to pass a python object - python

I am trying to use mpi4py's sendrecv() to pass a dictionary obj.
from mpi4py import MPI
comm=MPI_COMM_WORLD
rnk=comm.Get_rank()
size=comm.Get_size()
idxdict={1:2}
buffer=None
comm.sendrecv(idxdict,dest=(rnk+1)%size,sendtag=rnk,recvobj=buffer,source=(rnk-1+size)%size,recvtag=(rnk-1+size)%size)
idxdict=buffer
If I print idxidctat the last step, I will get a bunch of "None"s, so the dictionary idxdict is not passed between cores. If I use a dictionary as buffer: buffer={}, then there is typeerror:TypeError: expected a writeable buffer object.
What did I do wrong? Many thanks for your help.

I believe the documentation is misleading here; sendrecv returns the received buffer, and doesn't use the receive object argument at all that I can see (at least in older versions, 1.2.x). So your above code doesn't work (although the receive does in fact happen), but the below does:
from mpi4py import MPI
comm=MPI.COMM_WORLD
rnk=comm.Get_rank()
size=comm.Get_size()
idxdict={1:2}
buffer = comm.sendrecv(sendobj=idxdict,dest=(rnk+1)%size,source=(rnk-1+size)%size)
print "idxdict = ", idxdict
print "buffer = ", buffer

Related

Python: BigFloat+Multiprocessing

I am trying to parallelise a series of computations that use bigfloat. However, there is the error
Error sending result: '[BigFloat.exact('1.0000000', precision=20)]'. Reason: 'TypeError('self._value cannot be converted to a Python object for pickling')'
I MWE to reproduce the error is
from bigfloat import *
from multiprocessing import Pool
def f(x,a,b,N):
with precision(20):
X=BigFloat(x)
for i in range(N):
X = a*X*X-b
return X
if __name__ == '__main__':
pool = Pool(processes=2)
out1,out2 = pool.starmap(f,[(1,2,1,3),(2,2,2,2)])
(the function itself is not important at all). If I do not use bigfloat, then everything is fine. So, it is definitely some sort of interaction between multiprocessing and bigfloat.
So, I imagine that multiprocessing is having troubles saving the BigFloat object. I do not seem to be able to "extract" only the value thrown by BigFloat. How can I resolve this issue?
apparently bigfloat doesn't support pickling, I get the same error when doing pickle.dumps(BigFloat(1))
https://github.com/mdickinson/bigfloat/issues/106 notes this as needing to be done
as a work around, why not just convert to strings when transferring between processes? i.e. change f to return str(X) and then have other routines parse the strings into BigFloats as needed
otherwise, you could write some code to support this and submit it to the project

Pickle dumps and timestamps not matching what is expected

I'm manipulating scapy packets object with pickle in order to share them to different processes.
However I'm witnessing that pickle changes the pkt.time attribute of my capture file :
In order to reproduce this behaviour you just need a small pcap :
import pickle
from scapy.all import * # v2.4.3
def ppacket(pkt):
print(pkt.time)
print(pickle.loads(pickle.dumps(pkt)).time)
sniff(offline="test.pcap", prn=ppacket, count=10) #You only need one packet
Now running this on a pcap that was created earlier this month here's what I get :
1562587696.325424 #7/8/2019, 2:08:16 PM
1567437619.227692 #9/2/2019, 5:20:19 PM
From what I understand the problem comes from the fact that this attribute is actually a defined as a function call :
#In scapy: packet.py
def __init__(self, _pkt=b"", post_transform=None, _internal=0, _underlayer=None, **fields): # noqa: E501
self.time = time.time()
How can I avoid this behavior ? pickle was kind of great for me as I didn"t needed to care about formatting data before sending it to other processes.
Thank you for your help.
pickle version :
$ pip freeze |grep pickle
jsonpickle==1.1
pickleshare==0.7.4
Edit 1 :
After further digging I found that if I pickle only the time attribute it works as expected I don't understand how it is supposed to change anything.
print(pkt.time)
# > 1562587696.325424 #7/8/2019, 2:08:16 PM
print(pickle.loads(pickle.dumps(pkt.time)))
# > 1562587696.325424 #7/8/2019, 2:08:16 PM
I found a way to cope with this issue, since pickling only pkt.time works I just pickle the pkt and add pkt.time next to it :
dmp = pickle.dumps((pkt,pkt.time))
ld = pickle.loads(dmp)
print(ld[1]) #Works as expected.

python multiprocessing - OverflowError('cannot serialize a bytes object larger than 4GiB')

We are running a script using the multiprocessing library (python 3.6), where a big pd.DataFrame is passed as an argument to a function :
from multiprocessing import Pool
import time
def my_function(big_df):
# do something time consuming
time.sleep(50)
if __name__ == '__main__':
with Pool(10) as p:
res = {}
output = {}
for id, big_df in some_dict_of_big_dfs:
res[id] = p.apply_async(my_function,(big_df ,))
output = {id : res[id].get() for id in id_list}
The problem is that we are getting an error from the pickle library.
Reason: 'OverflowError('cannot serialize a bytes objects larger than
4GiB',)'
We are aware than pickle v4 can serialize larger objects question related, link, but we don't know how to modify the protocol that multiprocessing is using.
does anybody know what to do?
Thanks !!
Apparently is there an open issue about this topic , and there is a few related initiatives described on this particular answer. I Found a way to change the default pickle protocol that is used in the multiprocessing library based on this answer. As was pointed out in the comments this solution Only works with Linux and OS multiprocessing lib
Solution:
You first create a new separated module
pickle4reducer.py
from multiprocessing.reduction import ForkingPickler, AbstractReducer
class ForkingPickler4(ForkingPickler):
def __init__(self, *args):
if len(args) > 1:
args[1] = 2
else:
args.append(2)
super().__init__(*args)
#classmethod
def dumps(cls, obj, protocol=4):
return ForkingPickler.dumps(obj, protocol)
def dump(obj, file, protocol=4):
ForkingPickler4(file, protocol).dump(obj)
class Pickle4Reducer(AbstractReducer):
ForkingPickler = ForkingPickler4
register = ForkingPickler4.register
dump = dump
And then, in your main script you need to add the following:
import pickle4reducer
import multiprocessing as mp
ctx = mp.get_context()
ctx.reducer = pickle4reducer.Pickle4Reducer()
with mp.Pool(4) as p:
# do something
That will probably solve the problem of the overflow.
But, warning, you might consider reading this before doing anything or you might reach the same error as me:
'i' format requires -2147483648 <= number <= 2147483647
(the reason of this error is well explained in the link above). Long story short, multiprocessing send data through all its process using the pickle protocol, if you are already reaching the 4gb limit, that probably means that you might consider redefining your functions more as "void" methods rather than input/output methods. All this inbound/outbound data increase the RAM usage, is probably inefficient by construction (my case) and it might be better to point all process to the same object rather than create a new copy for each call.
hope this helps.
Supplementing answer from Pablo
The following problem can be resolved be Python3.8, if you are okay to use this version of python:
'i' format requires -2147483648 <= number <= 2147483647

Using pymodbus to read registers

I'm trying to read modbus registers from a PLC using pymodbus. I am following the example posted here. When I attempt print.registers I get the following error: object has no attribute 'registers'
The example doesn't show the modules being imported but seems to be the accepted answer. I think the error may be that I'm importing the wrong module or that I am missing a module. I am simply trying to read a register.
Here is my code:
from pymodbus.client.sync import ModbusTcpClient
c = ModbusTcpClient(host="192.168.1.20")
chk = c.read_holding_registers(257,10, unit = 1)
response = c.execute(chk)
print response.registers
From reading the pymodbus code, it appears that the read_holding_registers object's execute method will return either a response object or an ExceptionResponse object that contains an error. I would guess you're receiving the latter. You need to try something like this:
from pymodbus.register_read_message import ReadHoldingRegistersResponse
#...
response = c.execute(chk)
if isinstance(response, ReadHoldingRegistersResponse):
print response.registers
else:
pass # handle error condition here

Python mmap ctypes - read only

I think I have the opposite problem as described here. I have one process writing data to a log, and I want a second process to read it, but I don't want the 2nd process to be able to modify the contents. This is potentially a large file, and I need random access, so I'm using python's mmap module.
If I create the mmap as read/write (for the 2nd process), I have no problem creating ctypes object as a "view" of the mmap object using from_buffer. From a cursory look at the c-code, it looks like this is a cast, not a copy, which is what I want. However, this breaks if I make the mmap ACCESS_READ, throwing an exception that from_buffer requires write privileges.
I think I want to use ctypes from_address() method instead, which doesn't appear to need write access. I'm probably missing something simple, but I'm not sure how to get the address of the location within an mmap. I know I can use ACCESS_COPY (so write operations show up in memory, but aren't persisted to disk), but I'd rather keep things read only.
Any suggestions?
I ran into a similar issue (unable to setup a readonly mmap) but I was using only the python mmap module. Python mmap 'Permission denied' on Linux
I'm not sure it is of any help to you since you don't want the mmap to be private?
Ok, from looking at the mmap .c code, I don't believe it supports this use case. Also, I found that the performance pretty much sucks - for my use case. I'd be curious what kind performance others see, but I found that it took about 40 sec to walk through a binary file of 500 MB in Python. This is creating a mmap, then turning the location into a ctype object with from_buffer(), and using the ctypes object to decipher the size of the object so I could step to the next object. I tried doing the same thing directly in c++ from msvc. Obviously here I could cast directly into an object of the correct type, and it was fast - less than a second (this is with a core 2 quad and ssd).
I did find that I could get a pointer with the following
firstHeader = CEL_HEADER.from_buffer(map, 0) #CEL_HEADER is a ctypes Structure
pHeader = pointer(firstHeader)
#Now I can use pHeader[ind] to get a CEL_HEADER object
#at an arbitrary point in the file
This doesn't get around the original problem - the mmap isn't read-only, since I still need to use from_buffer for the first call. In this config, it still took around 40 sec to process the whole file, so it looks like the conversion from a pointer into ctypes structs is killing the performance. That's just a guess, but I don't see a lot of value in tracking it down further.
I'm not sure my plan will help anyone else, but I'm going to try to create a c module specific to my needs based on the mmap code. I think I can use the fast c-code handling to index the binary file, then expose only small parts of the file at a time through calls into ctypes/python objects. Wish me luck.
Also, as a side note, Python 2.7.2 was released today (6/12/11), and one of the changes is an update to the mmap code so that you can use a python long to set the file offset. This lets you use mmap for files over 4GB on 32-bit systems. See Issue #4681 here
Ran into this same problem, we needed the from_buffer interface and wanted read only access. From the python docs https://docs.python.org/3/library/mmap.html "Assignment to an ACCESS_COPY memory map affects memory but does not update the underlying file."
If it's acceptable for you to use an anonymous file backing you can use ACCESS_COPY
An example: open two cmd.exe or terminals and in one terminal:
mm_file_write = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
mm_file_read = mmap.mmap(-1, 4096, access=mmap.ACCESS_COPY, tagname="shmem")
write = ctypes.c_int.from_buffer(mm_file_write)
read = ctypes.c_int.from_buffer(mm_file_read)
try:
while True:
value = int(input('enter an integer using mm_file_write: '))
write.value = value
print('updated value')
value = int(input('enter an integer using mm_file_read: '))
#read.value assignment doesnt update anonymous backed file
read.value = value
print('updated value')
except KeyboardInterrupt:
print('got exit event')
In the other terminal do:
mm_file = mmap.mmap(-1, 4096, access=mmap.ACCESS_WRITE, tagname="shmem")
i = None
try:
while True:
new_i = struct.unpack('i', mm_file[:4])
if i != new_i:
print('i: {} => {}'.format(i, new_i))
i = new_i
time.sleep(0.1)
except KeyboardInterrupt:
print('Stopped . . .')
And you will see that the second process does not receive updates when the first process writes using ACCESS_COPY

Categories