Pickle dumps and timestamps not matching what is expected - python

I'm serializing scapy packet objects with pickle in order to share them between different processes.
However, I'm seeing that pickle changes the pkt.time attribute of the packets from my capture file.
To reproduce this behaviour you just need a small pcap:
import pickle
from scapy.all import *  # v2.4.3

def ppacket(pkt):
    print(pkt.time)
    print(pickle.loads(pickle.dumps(pkt)).time)

sniff(offline="test.pcap", prn=ppacket, count=10)  # You only need one packet
Running this on a pcap that was created earlier this month, here is what I get:
1562587696.325424 #7/8/2019, 2:08:16 PM
1567437619.227692 #9/2/2019, 5:20:19 PM
From what I understand, the problem comes from the fact that this attribute is set from a function call when the packet is constructed:
# In scapy: packet.py
def __init__(self, _pkt=b"", post_transform=None, _internal=0, _underlayer=None, **fields):  # noqa: E501
    self.time = time.time()
How can I avoid this behavior? pickle was great for me because I didn't need to care about formatting data before sending it to other processes.
Thank you for your help.
pickle version :
$ pip freeze |grep pickle
jsonpickle==1.1
pickleshare==0.7.4
Edit 1 :
After further digging I found that if I pickle only the time attribute, it works as expected. I don't understand why that should change anything.
print(pkt.time)
# > 1562587696.325424 #7/8/2019, 2:08:16 PM
print(pickle.loads(pickle.dumps(pkt.time)))
# > 1562587696.325424 #7/8/2019, 2:08:16 PM

I found a way to cope with this issue: since pickling only pkt.time works, I just pickle the pkt and put pkt.time next to it:
dmp = pickle.dumps((pkt, pkt.time))
ld = pickle.loads(dmp)
print(ld[1])  # Works as expected.
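A minimal sketch of the workaround above, wrapped in two helpers. FakePacket is a hypothetical stand-in, not scapy's real Packet: its constructor stamps the current time and its pickle support rebuilds the object through that constructor, which is one way a timestamp can get replaced on load. Whether scapy 2.4.3 does exactly this is an assumption, and dump_with_time / load_with_time are illustrative names.

```python
import pickle
import time


class FakePacket:
    """Hypothetical stand-in for a packet class; not scapy's implementation."""

    def __init__(self, raw=b""):
        self.raw = raw
        self.time = time.time()  # a fresh timestamp on every construction

    def __reduce__(self):
        # Rebuild from the raw bytes only, so __init__ runs again on load
        return (self.__class__, (self.raw,))


def dump_with_time(pkt):
    """Pickle the packet together with its original timestamp."""
    return pickle.dumps((pkt, pkt.time))


def load_with_time(blob):
    """Unpickle and put the original timestamp back on the packet."""
    pkt, original_time = pickle.loads(blob)
    pkt.time = original_time
    return pkt


pkt = FakePacket(b"\x00\x01")
pkt.time = 1562587696.325424  # pretend this came from the capture file

plain = pickle.loads(pickle.dumps(pkt))      # timestamp is regenerated
fixed = load_with_time(dump_with_time(pkt))  # timestamp is restored

print(plain.time == pkt.time)
print(fixed.time == pkt.time)
```

The receiving process then only ever calls load_with_time, so the tuple handling stays in one place.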

Related

Problems reading a .pcap file in Python using scapy

I'm trying to create a program that reads a pcap file and then counts the number of packets related to some IPs. I'm not used to programming in Python, but I have to use it because this runs on a Raspberry Pi and, depending on the output, I have to control several pins.
Right now I have this, but I get an error and I don't know how to solve it.
from scapy.all import *
from scapy.utils import RawPcapReader
from scapy.layers.l2 import Ether
from scapy.layers.inet import IP, TCP

def read_pcap(name_pcap):
    print("Opening", name_pcap)
    client_1 = '192.168.4.4:48878'
    server = '10.0.0.2:80'
    (client_1_ip, client_1_port) = client_1.split(':')
    (server_ip, server_port) = server.split(':')
    counter = 0
    for(pkt_data, pkt_metadata,) in RawPcapReader(name_pcap):
        counter += 1
        ether_pkt = Ether(pkt_data)
        # Below here are functions to filter the data

read_pcap("captura.pcap")
And the error is this one:
NameError: name 'Packet' is not defined
The error appears to be in this line: (for(pkt_data, pkt_metadata,) in RawPcapReader(name_pcap):).
Does anyone know how to solve it?
Thank you :)
As Carcigenicate pointed out, that's a known bug. It's fixed in https://github.com/secdev/scapy/commit/ff644181d9bee35979a84671690d8cd1aa1971fa
You can use the development version (see https://scapy.readthedocs.io/en/latest/installation.html#current-development-version) in the meantime.
Uninstall the previous version and install the latest version from https://pypi.org/project/scapy/:
pip install scapy==2.5.0rc1
This should fix the error.
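Once the pcap can be read, the counting the question describes could look roughly like the sketch below. It is deliberately independent of scapy: count_by_ip is a hypothetical helper that takes plain (source, destination) address pairs, and the sample addresses are made up apart from the two taken from the question.

```python
from collections import Counter


def count_by_ip(endpoints, watched_ips):
    """Count packets whose source or destination is one of watched_ips.

    endpoints: iterable of (src_ip, dst_ip) string pairs, one per packet.
    """
    watched = set(watched_ips)
    counts = Counter()
    for src, dst in endpoints:
        # A packet counts once for each watched address it involves
        for ip in {src, dst} & watched:
            counts[ip] += 1
    return counts


sample = [
    ("192.168.4.4", "10.0.0.2"),
    ("10.0.0.2", "192.168.4.4"),
    ("192.168.4.9", "10.0.0.2"),
]
counts = count_by_ip(sample, ["192.168.4.4", "10.0.0.2"])
print(counts["192.168.4.4"], counts["10.0.0.2"])
```

With scapy, the pairs could be produced from packets that carry an IP layer, for example (pkt[IP].src, pkt[IP].dst) for each pkt where IP in pkt is true.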

python multiprocessing - OverflowError('cannot serialize a bytes object larger than 4GiB')

We are running a script using the multiprocessing library (python 3.6), where a big pd.DataFrame is passed as an argument to a function :
from multiprocessing import Pool
import time

def my_function(big_df):
    # do something time consuming
    time.sleep(50)

if __name__ == '__main__':
    with Pool(10) as p:
        res = {}
        output = {}
        # .items() is needed to iterate over (id, DataFrame) pairs of a dict
        for id, big_df in some_dict_of_big_dfs.items():
            res[id] = p.apply_async(my_function, (big_df,))
        output = {id: res[id].get() for id in id_list}
The problem is that we are getting an error from the pickle library.
Reason: 'OverflowError('cannot serialize a bytes object larger than 4GiB',)'
We are aware that pickle protocol 4 can serialize larger objects (see the related question), but we don't know how to change the protocol that multiprocessing is using.
Does anybody know what to do?
Thanks!
Apparently there is an open issue about this topic, and there are a few related initiatives described in this particular answer. I found a way to change the default pickle protocol used by the multiprocessing library, based on this answer. As was pointed out in the comments, this solution only works with Linux and OS multiprocessing lib.
Solution:
First create a new, separate module, pickle4reducer.py:
from multiprocessing.reduction import ForkingPickler, AbstractReducer

class ForkingPickler4(ForkingPickler):
    def __init__(self, *args):
        # *args arrives as a tuple, so copy it to a list before changing it
        args = list(args)
        # force pickle protocol 4, which supports objects larger than 4 GiB
        if len(args) > 1:
            args[1] = 4
        else:
            args.append(4)
        super().__init__(*args)

    @classmethod
    def dumps(cls, obj, protocol=4):
        return ForkingPickler.dumps(obj, protocol)

def dump(obj, file, protocol=4):
    ForkingPickler4(file, protocol).dump(obj)

class Pickle4Reducer(AbstractReducer):
    ForkingPickler = ForkingPickler4
    register = ForkingPickler4.register
    dump = dump
And then, in your main script you need to add the following:
import pickle4reducer
import multiprocessing as mp

ctx = mp.get_context()
ctx.reducer = pickle4reducer.Pickle4Reducer()

with mp.Pool(4) as p:
    pass  # do something
That will probably solve the problem of the overflow.
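To check which protocol a pickled payload was actually written with, the stream header can be inspected. The sketch below is only an illustration: pickle_protocol is a hypothetical helper, and it relies on the fact that pickles written with protocol 2 or higher start with the PROTO opcode (0x80) followed by the protocol number.

```python
import pickle
from multiprocessing.reduction import ForkingPickler


def pickle_protocol(blob):
    """Return the protocol number from a pickle header, or None for protocols 0 and 1."""
    header = bytes(blob[:2])
    if len(header) == 2 and header[0] == 0x80:  # 0x80 is the PROTO opcode
        return header[1]
    return None


payload = {"rows": list(range(5))}

# ForkingPickler.dumps is what multiprocessing uses to serialize objects
blob = ForkingPickler.dumps(payload, 4)

print(pickle_protocol(blob))
print(pickle.loads(blob) == payload)
```

Running the same check on the output of the custom reducer is a cheap way to confirm that protocol 4 is really in effect before sending a multi-gigabyte object through it.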
But be warned: you might want to read this before doing anything, or you might hit the same error as me:
'i' format requires -2147483648 <= number <= 2147483647
(the reason for this error is well explained in the link above). Long story short, multiprocessing sends data between its processes using the pickle protocol. If you are already reaching the 4 GB limit, that probably means you should consider redefining your functions as "void" methods rather than input/output methods. All this inbound/outbound data increases RAM usage, is probably inefficient by construction (my case), and it might be better to point all processes to the same object rather than create a new copy for each call.
Hope this helps.
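One way to read the "point all processes to the same object" advice is to hand the large objects to each worker once and send only small keys per task. The sketch below assumes the fork start method (the default on Linux), where the initializer arguments are inherited rather than pickled; with spawn they would still be pickled once per worker. The names _SHARED, init_worker and run are illustrative, and the lists stand in for the DataFrames.

```python
from multiprocessing import Pool

# Hypothetical stand-in for the dictionary of large DataFrames
_SHARED = {}


def init_worker(objects):
    """Runs once in each worker and keeps a reference to the large objects."""
    global _SHARED
    _SHARED = objects


def my_function(key):
    """Look the large object up by key; only the key and the result cross processes."""
    return key, len(_SHARED[key])


def run(objects, processes=2):
    with Pool(processes, initializer=init_worker, initargs=(objects,)) as pool:
        return dict(pool.map(my_function, list(objects)))


if __name__ == "__main__":
    big = {"a": list(range(1000)), "b": list(range(10))}
    print(run(big))
```

The per-task traffic is then a short string and a small tuple, so the pickle size limit no longer depends on how big the shared objects are.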
Supplementing the answer from Pablo:
The following problem can be resolved by upgrading to Python 3.8, if you are okay with using that version of Python:
'i' format requires -2147483648 <= number <= 2147483647

Using pymodbus to read registers

I'm trying to read modbus registers from a PLC using pymodbus. I am following the example posted here. When I attempt to print response.registers I get the following error: object has no attribute 'registers'
The example doesn't show the modules being imported, but it seems to be the accepted answer. I think the error may be that I'm importing the wrong module or that I am missing a module. I am simply trying to read a register.
Here is my code:
from pymodbus.client.sync import ModbusTcpClient
c = ModbusTcpClient(host="192.168.1.20")
chk = c.read_holding_registers(257,10, unit = 1)
response = c.execute(chk)
print(response.registers)
From reading the pymodbus code, it appears that the read_holding_registers object's execute method will return either a response object or an ExceptionResponse object that contains an error. I would guess you're receiving the latter. You need to try something like this:
from pymodbus.register_read_message import ReadHoldingRegistersResponse
#...
response = c.execute(chk)
if isinstance(response, ReadHoldingRegistersResponse):
    print(response.registers)
else:
    pass  # handle error condition here

paraview python script : Delete(renderView1) does not free memory

I am trying to load multiple state files by running a Python script in the Python shell. The script is the following:
#### import the simple module from the paraview
from paraview.simple import *

for time in range(0, 200):
    renderView1 = GetActiveViewOrCreate('RenderView')
    # destroy renderView1
    Delete(renderView1)
    del renderView1
    filename = 'filepath/filename-%s.pvsm' % time
    servermanager.LoadState(filename)
    renderView = SetActiveView(GetRenderView())
    Render()
    # get layout
    viewLayout = GetLayout()
    # save screenshot
    SaveScreenshot('filepath/filename-%s.png' % time, layout=viewLayout, magnification=3, quality=100)
I was monitoring the memory used by the machine, and after some time steps the script has used up the whole RAM. Can anybody tell me what I am doing wrong here?
I appreciate your time and support.
Thanks
Raj
As suggested by Utkarsh the following script solved my problem. For details see this answer.
--------------------------------------------------------------------------------
from paraview.simple import *

def ResetSession():
    pxm = servermanager.ProxyManager()
    pxm.UnRegisterProxies()
    del pxm
    Disconnect()
    Connect()

for i in range(0, 10):
    ResetSession()
    servermanager.LoadState("/tmp/sample.pvsm")
    renderView = SetActiveView(GetRenderView())
    Render()
--------------------------------------------------------------------------------
Python is garbage-collected, which means there is no guarantee that an object is actually removed from memory when you do 'del someBigObject'. In fact, doing 'del someObject' is not only pointless here, it's often considered bad style.
del only removes the binding between that identifier and the object. Thus del by itself doesn't result in the object being deleted from memory; that can only happen once no other references to it remain.
See this for full details:
https://www.quora.com/Why-doesnt-Python-release-the-memory-when-I-delete-a-large-object
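The point about del removing only a binding can be seen with a weak reference, which observes an object without keeping it alive. This is a small sketch assuming CPython, where an object is reclaimed as soon as its last reference disappears; BigObject and the variable names are placeholders.

```python
import gc
import weakref


class BigObject:
    """Hypothetical placeholder for something large."""


first_name = BigObject()
second_name = first_name           # a second binding to the same object
probe = weakref.ref(first_name)    # does not keep the object alive

del first_name                     # removes one name; the object is still referenced
alive_after_first_del = probe() is not None

del second_name                    # removes the last reference
gc.collect()                       # not needed on CPython, added for other interpreters
alive_after_second_del = probe() is not None

print(alive_after_first_del, alive_after_second_del)
```

On CPython this prints True False: the first del frees nothing because another name still refers to the object. The same applies to the ParaView case, where the server side still holds the proxies after the Python name is deleted.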

python MPI sendrecv() to pass a python object

I am trying to use mpi4py's sendrecv() to pass a dictionary obj.
from mpi4py import MPI
comm = MPI.COMM_WORLD
rnk = comm.Get_rank()
size = comm.Get_size()
idxdict = {1: 2}
buffer = None
comm.sendrecv(idxdict, dest=(rnk+1)%size, sendtag=rnk, recvobj=buffer, source=(rnk-1+size)%size, recvtag=(rnk-1+size)%size)
idxdict = buffer
If I print idxdict at the last step, I get a bunch of "None"s, so the dictionary idxdict is not passed between cores. If I use a dictionary as the buffer (buffer={}), I get a TypeError: expected a writeable buffer object.
What did I do wrong? Many thanks for your help.
I believe the documentation is misleading here; sendrecv returns the received buffer, and doesn't use the receive object argument at all that I can see (at least in older versions, 1.2.x). So your above code doesn't work (although the receive does in fact happen), but the below does:
from mpi4py import MPI
comm = MPI.COMM_WORLD
rnk = comm.Get_rank()
size = comm.Get_size()
idxdict = {1: 2}
# sendrecv returns the received object; keep the return value
buffer = comm.sendrecv(sendobj=idxdict, dest=(rnk+1)%size, source=(rnk-1+size)%size)
print("idxdict = ", idxdict)
print("buffer = ", buffer)
