Async ping with multiprocessing.pool

I am trying to ping a few hundred devices in Python using multiprocessing.pool on Windows as part of a larger program. The responses are parsed into success cases and failure cases (i.e., the request times out or the host is unreachable).
The code below works fine. However, another part of the program takes the average from the response and calculates a rolling average with previous results fetched from a database.
The rolling average function fails very occasionally on int(new_average) because the new_average passed in is None. Note that the rolling average is only calculated in success cases.
I think the error must be in the parse function (this seems unlikely), or with how I am using multiprocessing.pool.
My question: am I using multiprocessing correctly? More generally, is there a better way to implement this asynchronous ping? I have looked at using Twisted, but I didn't see any ICMP protocol implementation (there is txnettools on GitHub, but I am not sure of the correctness of this, and it doesn't look maintained anymore).
A node object looks like:
class Node(object):
    def __init__(self, ip):
        self.ip = ip
        self.ping_result = None
    # other attributes and methods...
Here is the idea of the async ping code:
import os
import re
from multiprocessing.pool import ThreadPool

def parse_ping_response(response):
    '''Parses a response into a list of the format:
    [ip_address, packets_lost, average, success_or_failure]
    Ex: ['10.10.10.10', '0', '90', 'success']
    Ex: [None, 5, None, 'failure']
    '''
    # Raw strings keep the regex escapes (\s, \d) intact.
    reply = re.compile(r"Reply\sfrom\s(.*?)\:")
    lost = re.compile(r"Lost\s=\s(\d*)\s")
    average = re.compile(r"Average\s=\s(\d+)m")
    results = [x.findall(response) for x in [reply, lost, average]]
    # Get reply, if it was found. Set [] to None.
    results = [x[0] if len(x) > 0 else None for x in results]
    # Check for host unreachable error.
    # If we cannot find an ip address, the request timed out.
    if results[0] is None:
        return results + ['failure']
    elif 'Destination host unreachable' in response:
        return results + ['failure']
    else:
        return results + ['success']

def ping_ip(node):
    ping = os.popen("ping -n 5 " + node.ip, "r")
    node.ping_result = parse_ping_response(ping.read())
    return

def run_ping_tests(nodelist):
    pool = ThreadPool(processes=100)
    pool.map(ping_ip, nodelist)
    return

if __name__ == "__main__":
    # nodelist is a list of node objects
    run_ping_tests(nodelist)
An example ping response for reference (from the Microsoft docs):
Pinging 131.107.8.1 with 1450 bytes of data:
Reply from 131.107.8.1: bytes=1450 time<10ms TTL=32
Reply from 131.107.8.1: bytes=1450 time<10ms TTL=32
Ping statistics for 131.107.8.1:
Packets: Sent = 2, Received = 2, Lost = 0 (0% loss),
Approximate roundtrip times in milliseconds:
Minimum = 0ms, Maximum = 10ms, Average = 2ms

I recommend using gevent (http://www.gevent.org, an asynchronous I/O library based on libev and greenlet coroutines) instead of multiprocessing.
It turns out there is an implementation of ICMP for gevent:
https://github.com/mastahyeti/gping
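Independently of the library used to send the pings, the occasional None average may come from the parser rather than from the thread pool. This is an assumption, not something the question confirms: any output that contains a "Reply from" line but no "Average = ..." line would be reported as a success with a None average. A minimal sketch of a stricter parser, which reports success only when an average was actually found, could look like this (the regular expressions are simplified versions of the ones in the question):

```python
import re

REPLY_RE = re.compile(r"Reply from (.*?):")
LOST_RE = re.compile(r"Lost = (\d+)")
AVERAGE_RE = re.compile(r"Average = (\d+)ms")


def parse_ping_response(response):
    """Return [ip, packets_lost, average_ms, status] for Windows ping output."""
    def first(pattern):
        match = pattern.search(response)
        return match.group(1) if match else None

    ip, lost, average = first(REPLY_RE), first(LOST_RE), first(AVERAGE_RE)
    # Require an average before reporting success, so a caller never
    # receives status 'success' together with an average of None.
    ok = (ip is not None
          and average is not None
          and "Destination host unreachable" not in response)
    return [ip, lost, average, "success" if ok else "failure"]
```

With the sample response from the question this returns ['131.107.8.1', '0', '2', 'success'], while output without an "Average" line is classified as a failure.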

Related

Zero simulation time

I am creating a kind of RAM model. The idea was to first create the RAM "write" functionality, as you can see in the code below. Besides the RAM model, there is a RAM model driver, which is supposed to write data to the RAM (just to briefly verify that the write functionality works properly).
The RAM model driver and the RAM model are connected to each other and some transactions should occur, but the problem is that the simulation completes within zero simulation seconds.
Does anybody have an idea what the problem could be?
@gear
def ram_model(write_addr: Uint,
              write_data: Queue['dtype'], *,
              ram_mem = None,
              dtype = b'dtype',
              mem_granularity_in_bytes = 1) -> (Queue['dtype']):
    if(ram_mem is None and type(ram_mem) is not dict):
        ram_mem = {}
    ram_write_op(write_addr = write_addr,
                 write_data = write_data,
                 ram_memory = ram_mem)

@gear
async def ram_write_op(write_addr: Uint,
                       write_data: Queue, *,
                       ram_memory = None,
                       mem_granularity_in_bytes = 1):
    if(ram_memory is None and type(ram_mem) is not dict):
        SystemError("Ram memory is %s but it should be dictionary",(type(ram_memory)))
    byte_t = Array[Uint[8], mem_granularity_in_bytes]
    async with write_addr as addr:
        async for data, _ in write_data:
            for b in code(data, byte_t):
                ram_memory[addr] = b
                addr += 1

@gear
async def ram_model_drv(*, addr_bus_width = b'asize',
                        data_type = b'dtype') -> (Uint[8], Queue['data_type']):
    num_of_w_comnds = 15
    matrix = np.random.randint(10, size = (num_of_w_comnds, 10))
    for command_id in range(num_of_w_comnds):
        for i in range(matrix[command_id].size):
            yield (command_id, (matrix[command_id][i], i == matrix[command_id].size))

stimul = ram_model_drv(addr_bus_width = 8, data_type = Fixp[8,8])
out = ram_model(stimul[0], stimul[1])
sim()
Here is the output message:
python ram_model.py
- [INFO]: Running sim with seed: 3934280405122873233
0 [INFO]: -------------- Simulation start --------------
0 [INFO]: ----------- Simulation done ---------------
0 [INFO]: Elapsed: 0.00

Yeah, this one is a bit convoluted. The gist of the issue is that in the ram_model_drv module you are synchronously outputting data on both of its output interfaces with the yield statement. For PyGears this means that you would like the data on both of these interfaces acknowledged before continuing. The ram_write_op module is connected to both of these interfaces via the write_addr and write_data arguments. Inside that module, you acknowledge data from the write_addr interface only after you've read multiple data items from the write_data interface. Hence there is a deadlock: the PyGears simulator detects that no further simulation steps are possible and exits at the end of step 0.
There are also two additional issues in the driver:
It will never generate an eot for the output data Queue. Instead eot should be generated when i == matrix[command_id].size - 1.
The async modules are run in an endless loop by PyGears, so your ram_model_drv will generate the data endlessly unless you explicitly generate a GearDone exception.
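The first of these two issues is a plain off-by-one and can be checked without PyGears. The following sketch (eot_flags is a hypothetical helper, not part of the PyGears API) lists the eot flag that each comparison produces for a row of 10 elements:

```python
def eot_flags(size, last_index):
    """Return the eot flag produced for each i in range(size)."""
    return [i == last_index for i in range(size)]


size = 10
# Comparison used in the question: i never reaches `size` inside range(size).
flags_original = eot_flags(size, size)
# Corrected comparison: the last index of the row is size - 1.
flags_fixed = eot_flags(size, size - 1)

print(any(flags_original))  # False: eot is never raised
print(flags_fixed[-1])      # True: eot is raised on the last element
```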
OK, back to the main issue. There are several possibilities to circumvent it:
Use decoupling
For this to work, you first need to split data output in two yield statements, one for the write_addr and the other for the write_data, since your ram_write_op will use only one address per several write data.
@gear
async def ram_model_drv(*, addr_bus_width, data_type) -> (Uint[8], Queue['data_type']):
    num_of_w_comnds = 15
    matrix = np.random.randint(10, size=(num_of_w_comnds, 10))
    for command_id in range(num_of_w_comnds):
        yield (command_id, None)
        for i in range(matrix[command_id].size):
            yield (None, (matrix[command_id][i], i == matrix[command_id].size - 1))
    raise GearDone
You can use either dreg or decouple modules to temporarily store output data from ram_model_drv before they are consumed by ram_write_op.
out = ram_model(stimul[0] | decouple, stimul[1] | decouple)
Split driver into two modules, one driving each of the two interfaces
Use low level synchronization API for interfaces
Underneath the yield mechanism, there is a lower-level API for communicating via PyGears interfaces. Handles to the output interfaces can be obtained via the module().dout field. Data can be sent via an interface without waiting for it to be acknowledged by using the put_nb() method. Later, in order to wait for the acknowledgment, the ready() method can be awaited. Finally, the put() method combines the two in one call, so it will both send the data and wait for the acknowledgment.
@gear
async def ram_model_drv(*,
                        addr_bus_width=b'asize',
                        data_type=b'dtype') -> (Uint[8], Queue['data_type']):
    addr, data = module().dout
    num_of_w_comnds = 15
    matrix = np.random.randint(10, size=(num_of_w_comnds, 10))
    for command_id in range(num_of_w_comnds):
        addr.put_nb(command_id)
        for i in range(matrix[command_id].size):
            await data.put((matrix[command_id][i], i == matrix[command_id].size - 1))
        await addr.ready()
    raise GearDone

unable to unpack information between custom Preamble in Python and telnetlib

I have an industrial sensor which provides me information via telnet over port 10001.
It has a Data Format as follows:
Also the manual:
All the measuring values are transmitted int32 or uint32 or float depending on the sensors
Code
import telnetlib
import struct
import time
# IP Address, Port, timeout for Telnet
tn = telnetlib.Telnet("169.254.168.150", 10001, 10)
while True:
    op = tn.read_eager() # currently read information limit this till preamble
    print(op[::-1]) # make little-endian
    if not len(op[::-1]) == 0: # initially an empty bit starts (b'')
        data = struct.unpack('!4c', op[::-1]) # unpacking `MEAS`
    time.sleep(0.1)
my initial attempt:
Connect to the sensor
read data
make it to little-endian
OUTPUT
b''
b'MEAS\x85\x8c\x8c\x07\xa7\x9d\x01\x0c\x15\x04\xf6MEAS'
b'\x04\xf6MEAS\x86\x8c\x8c\x07\xa7\x9e\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x85\x8c\x8c\x07\xa7\x9f\x01\x0c\x15'
b'\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xa0\x01\x0c'
b'\xa7\xa2\x01\x0c\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xa1\x01\x0c'
b'\x8c\x07\xa7\xa3\x01\x0c\x15\x04\xf6MEAS\x87\x8c\x8c\x07'
b'\x88\x8c\x8c\x07\xa7\xa4\x01\x0c\x15\x04\xf6MEAS\x88\x8c'
b'MEAS\x8b\x8c\x8c\x07\xa7\xa5\x01\x0c\x15\x04\xf6MEAS'
b'\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xa6\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xa7\x01\x0c\x15'
b'\x15\x04\xf6MEAS\x88\x8c\x8c\x07\xa7\xa8\x01\x0c'
b'\x01\x0c\x15\x04\xf6MEAS\x88\x8c\x8c\x07\xa7\xa9\x01\x0c'
b'\x8c\x07\xa7\xab\x01\x0c\x15\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xaa'
b'\x8c\x8c\x07\xa7\xac\x01\x0c\x15\x04\xf6MEAS\x8c\x8c'
b'AS\x89\x8c\x8c\x07\xa7\xad\x01\x0c\x15\x04\xf6MEAS\x8a'
b'MEAS\x88\x8c\x8c\x07\xa7\xae\x01\x0c\x15\x04\xf6ME'
b'\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xaf\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xb0\x01\x0c'
b'\x0c\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xb1\x01\x0c'
b'\x07\xa7\xb3\x01\x0c\x15\x04\xf6MEAS\x89\x8c\x8c\x07\xa7\xb2\x01'
b'\x8c\x8c\x07\xa7\xb4\x01\x0c\x15\x04\xf6MEAS\x89\x8c\x8c'
b'\x85\x8c\x8c\x07\xa7\xb5\x01\x0c\x15\x04\xf6MEAS\x84'
b'MEAS\x87\x8c\x8c\x07\xa7\xb6\x01\x0c\x15\x04\xf6MEAS'
b'\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xb7\x01\x0c\x15\x04\xf6'
b'\x15\x04\xf6MEAS\x8b\x8c\x8c\x07\xa7\xb8\x01\x0c\x15'
b'\x15\x04\xf6MEAS\x8a\x8c\x8c\x07\xa7\xb9\x01\x0c'
b'\xa7\xbb\x01\x0c\x15\x04\xf6MEAS\x87\x8c\x8c\x07\xa7\xba\x01\x0c'
try to unpack the preamble !?
How do I read information like Article number, Serial number, Channel, Status, Measuring Value between the preamble?
The payload size seems to be fixed here for 22 Bytes (via Wireshark)

Parsing the reversed buffer is just weird; please use struct's support for endianness. Using big-endian '!' in a little-endian context is also odd.
The first four bytes are a text constant. OK, fine, perhaps you'll need to reverse those. But just those, please.
After that, use struct.unpack to parse out 'IIQI'. So far, that was kind of working OK with your approach, since all fields consume 4 bytes or a pair of 4 bytes. But finding frame M's length is the fly in the ointment since it is just 2 bytes, so parse it with 'H', giving you a combined 'IIQIH'. After that, you'll need to advance by only that many bytes, and then expect another 'MEAS' text constant once you've exhausted that set of measurements.
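As a rough sketch of that suggestion: the little-endian format '<4sIIQIH' reads the preamble and the header fields in one call, without reversing the buffer. The field names below are hypothetical, because the data-format table is not reproduced here, and the wire preamble is assumed to be b'SAEM' (the byte-reversed form of MEAS seen in the reversed dumps).

```python
import struct

# Assumed layout: 4-byte preamble followed by the 'IIQIH' fields from the
# answer. "<" selects little-endian with no alignment padding.
HEADER = struct.Struct("<4sIIQIH")
PREAMBLE = b"SAEM"  # assumption: "MEAS" as it appears on the wire


def parse_header(buf, offset=0):
    """Parse one header at `offset`; return (fields, offset of the payload)."""
    preamble, article, serial, channels, status, frame_len = \
        HEADER.unpack_from(buf, offset)
    if preamble != PREAMBLE:
        raise ValueError("buffer does not start at a preamble")
    fields = {
        "article": article,      # hypothetical field names
        "serial": serial,
        "channels": channels,
        "status": status,
        "frame_len": frame_len,  # number of payload bytes that follow
    }
    return fields, offset + HEADER.size
```

After parsing, the caller would advance by frame_len bytes and expect the next preamble at that position.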

I managed to avoid telnetlib altogether and created a TCP client using Python 3. I already had the payload size from my Wireshark dump (22 bytes), so I keep receiving 22 bytes of information at a time. Apparently the module sends two distinct 22-byte payloads:
First (frame) payload has the preamble, serial, article, channel information
Second (frame) payload has the information like bytes per frame, measuring value counter, measuring value Channel 1, measuring value Channel 2, measuring value Channel 3
The information is in int32 and thus needs a formula to be converted to real readings (mentioned in the instruction manual)
(The unpacking follows what @J_H described in his answer, with small changes.)
Code
import socket
import time
import struct
DRANGEMIN = 3261
DRANGEMAX = 15853
MEASRANGE = 50
OFFSET = 35
# Create a TCP/IP socket
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_address = ('169.254.168.150', 10001)
print('connecting to %s port %s' % server_address)
sock.connect(server_address)
def value_mm(raw_val):
    return (((raw_val - DRANGEMIN) * MEASRANGE) / (DRANGEMAX - DRANGEMIN) + OFFSET)

if __name__ == '__main__':
    while True:
        Laser_Value = 0
        data = sock.recv(22)
        preamble, article, serial, x1, x2 = struct.unpack('<4sIIQH', data)
        if not preamble == b'SAEM':
            status, bpf, mValCounter, CH1, CH2, CH3 = struct.unpack('<hIIIII',data)
            #print(CH1, CH2, CH3)
            Laser_Value = CH3
            print(str(value_mm(Laser_Value)) + " mm")
            #print('RAW: ' + str(len(data)))
            print('\n')
        #time.sleep(0.1)
Sure enough, this provides me with the information I need, and I compared it against the proprietary software which the company provides.
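One caveat with this approach: on a TCP stream, sock.recv(22) may return fewer than 22 bytes, in which case struct.unpack raises an error. A minimal sketch of a helper that keeps reading until a full frame has arrived (recv_exact is a hypothetical name, not part of the socket module):

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket, or raise if it closes early."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty result means the peer closed the connection.
            raise ConnectionError("socket closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

In the loop above, data = recv_exact(sock, 22) could then replace data = sock.recv(22).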

Implement realtime signal processing in Python - how to capture audio continuously?

I'm planning to implement a "DSP-like" signal processor in Python. It should capture small fragments of audio via ALSA, process them, then play them back via ALSA.
To get things started, I wrote the following (very simple) code.
import alsaaudio
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
inp.setchannels(1)
inp.setrate(96000)
inp.setformat(alsaaudio.PCM_FORMAT_U32_LE)
inp.setperiodsize(1920)
outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
outp.setchannels(1)
outp.setrate(96000)
outp.setformat(alsaaudio.PCM_FORMAT_U32_LE)
outp.setperiodsize(1920)
while True:
    l, data = inp.read()
    # TODO: Perform some processing.
    outp.write(data)
The problem is, that the audio "stutters" and is not gapless. I tried experimenting with the PCM mode, setting it to either PCM_ASYNC or PCM_NONBLOCK, but the problem remains. I think the problem is that samples "between" two subsequent calls to "inp.read()" are lost.
Is there a way to capture audio "continuously" in Python (preferably without the need for too "specific"/"non-standard" libraries)? I'd like the signal to always get captured "in the background" into some buffer, from which I can read some "momentary state", while audio is further being captured into the buffer even during the time, when I perform my read operations. How can I achieve this?
Even if I use a dedicated process/thread to capture the audio, this process/thread will always at least have to (1) read audio from the source, (2) then put it into some buffer (from which the "signal processing" process/thread then reads). These two operations will therefore still be sequential in time and thus samples will get lost. How do I avoid this?
Thanks a lot for your advice!
EDIT 2: Now I have it running.
import alsaaudio
from multiprocessing import Process, Queue
import numpy as np
import struct

"""
A class implementing buffered audio I/O.
"""
class Audio:

    """
    Initialize the audio buffer.
    """
    def __init__(self):
        #self.__rate = 96000
        self.__rate = 8000
        self.__stride = 4
        self.__pre_post = 4
        self.__read_queue = Queue()
        self.__write_queue = Queue()

    """
    Reads audio from an ALSA audio device into the read queue.
    Supposed to run in its own process.
    """
    def __read(self):
        inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NORMAL)
        inp.setchannels(1)
        inp.setrate(self.__rate)
        inp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        inp.setperiodsize(self.__rate / 50)
        while True:
            _, data = inp.read()
            self.__read_queue.put(data)

    """
    Writes audio to an ALSA audio device from the write queue.
    Supposed to run in its own process.
    """
    def __write(self):
        outp = alsaaudio.PCM(alsaaudio.PCM_PLAYBACK, alsaaudio.PCM_NORMAL)
        outp.setchannels(1)
        outp.setrate(self.__rate)
        outp.setformat(alsaaudio.PCM_FORMAT_U32_BE)
        outp.setperiodsize(self.__rate / 50)
        while True:
            data = self.__write_queue.get()
            outp.write(data)

    """
    Pre-post data into the output buffer to avoid buffer underrun.
    """
    def __pre_post_data(self):
        zeros = np.zeros(self.__rate / 50, dtype = np.uint32)
        for i in range(0, self.__pre_post):
            self.__write_queue.put(zeros)

    """
    Runs the read and write processes.
    """
    def run(self):
        self.__pre_post_data()
        read_process = Process(target = self.__read)
        write_process = Process(target = self.__write)
        read_process.start()
        write_process.start()

    """
    Reads audio samples from the queue captured from the reading thread.
    """
    def read(self):
        return self.__read_queue.get()

    """
    Writes audio samples to the queue to be played by the writing thread.
    """
    def write(self, data):
        self.__write_queue.put(data)

    """
    Pseudonymize the audio samples from a binary string into an array of integers.
    """
    def pseudonymize(self, s):
        return struct.unpack(">" + ("I" * (len(s) / self.__stride)), s)

    """
    Depseudonymize the audio samples from an array of integers into a binary string.
    """
    def depseudonymize(self, a):
        s = ""
        for elem in a:
            s += struct.pack(">I", elem)
        return s

    """
    Normalize the audio samples from an array of integers into an array of floats with unity level.
    """
    def normalize(self, data, max_val):
        data = np.array(data)
        bias = int(0.5 * max_val)
        fac = 1.0 / (0.5 * max_val)
        data = fac * (data - bias)
        return data

    """
    Denormalize the data from an array of floats with unity level into an array of integers.
    """
    def denormalize(self, data, max_val):
        bias = int(0.5 * max_val)
        fac = 0.5 * max_val
        data = np.array(data)
        data = (fac * data).astype(np.int64) + bias
        return data

debug = True
audio = Audio()
audio.run()
while True:
    data = audio.read()
    pdata = audio.pseudonymize(data)
    if debug:
        print "[PRE-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))
    ndata = audio.normalize(pdata, 0xffffffff)
    if debug:
        print "[PRE-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))
        print "[PRE-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))
    #ndata += 0.01 # When I comment in this line, it wreaks complete havoc!
    if debug:
        print "[POST-NORMALIZED] Level: " + str(int(10.0 * np.log10(np.max(np.absolute(ndata)))))
        print "[POST-NORMALIZED] Min: " + str(np.min(ndata)) + ", Max: " + str(np.max(ndata))
    pdata = audio.denormalize(ndata, 0xffffffff)
    if debug:
        print "[POST-PSEUDONYMIZED] Min: " + str(np.min(pdata)) + ", Max: " + str(np.max(pdata))
        print ""
    data = audio.depseudonymize(pdata)
    audio.write(data)
However, when I even perform the slightest modification to the audio data (e. g. comment that line in), I get a lot of noise and extreme distortion at the output. It seems like I don't handle the PCM data correctly. The strange thing is that the output of the "level meter", etc. all appears to make sense. However, the output is completely distorted (but continuous) when I offset it just slightly.
EDIT 3: I just found out that my algorithms (not included here) work when I apply them to wave files. So the problem really appears to actually boil down to the ALSA API.
EDIT 4: I finally found the problems. They were the following.
1st - ALSA quietly "fell back" to PCM_FORMAT_U8_LE upon requesting PCM_FORMAT_U32_LE, thus I interpreted the data incorrectly by assuming that each sample was 4 bytes wide. It works when I request PCM_FORMAT_S32_LE.
2nd - The ALSA output seems to expect the period size in bytes, even though the specification explicitly states that it is expected in frames. So you have to set the period size four times as high for output if you use 32 bit sample depth.
3rd - Even in Python (where there is a "global interpreter lock"), processes are slow compared to Threads. You can get latency down a lot by changing to threads, since the I/O threads basically don't do anything that's computationally intensive.

When you
read one chunk of data,
write one chunk of data,
then wait for the second chunk of data to be read,
then the buffer of the output device will become empty if the second chunk is not shorter than the first chunk.
You should fill up the output device's buffer with silence before starting the actual processing. Then small delays in either the input or output processing will not matter.
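A minimal sketch of that idea, using a thread and a queue. The write_chunk callable is a stand-in for the real device write (for example outp.write from the question), and start_playback is a hypothetical helper name; the number of pre-filled chunks is an arbitrary choice.

```python
import queue
import threading


def playback_worker(out_q, write_chunk):
    """Forward chunks from the queue to the device until None arrives."""
    while True:
        chunk = out_q.get()
        if chunk is None:  # sentinel: stop the worker
            return
        write_chunk(chunk)


def start_playback(write_chunk, silence, prefill=4):
    """Queue `prefill` chunks of silence, then start the playback thread."""
    out_q = queue.Queue()
    for _ in range(prefill):
        out_q.put(silence)
    worker = threading.Thread(target=playback_worker,
                              args=(out_q, write_chunk), daemon=True)
    worker.start()
    return out_q, worker
```

The processing loop then only calls out_q.put(processed_chunk). The silence queued up front gives the output a head start, so a short delay in capture or processing does not immediately empty the output buffer.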

You can do all of that manually, as @CL recommends in his/her answer, but I'd recommend just using
GNU Radio instead:
It's a framework that takes care of doing all the "getting small chunks of samples in and out your algorithm"; it scales very well, and you can write your signal processing either in Python or C++.
In fact, it comes with an Audio Source and an Audio Sink that directly talk to ALSA and just give/take continuous samples. I'd recommend reading through GNU Radio's Guided Tutorials; they explain exactly what is necessary to do your signal processing for an audio application.
A really minimal flow graph would look like:
You can substitute the high pass filter for your own signal processing block, or use any combination of the existing blocks.
There are helpful things like file and wav file sinks and sources, filters, resamplers, amplifiers (OK, multipliers), …

I finally found the problems. They were the following.
1st - ALSA quietly "fell back" to PCM_FORMAT_U8_LE upon requesting PCM_FORMAT_U32_LE, thus I interpreted the data incorrectly by assuming that each sample was 4 bytes wide. It works when I request PCM_FORMAT_S32_LE.
2nd - The ALSA output seems to expect the period size in bytes, even though the specification explicitly states that it is expected in frames. So you have to set the period size four times as high for output if you use 32 bit sample depth.
3rd - Even in Python (where there is a "global interpreter lock"), processes are slow compared to Threads. You can get latency down a lot by changing to threads, since the I/O threads basically don't do anything that's computationally intensive.
Audio is gapless and undistorted now, but latency is far too high.
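For the first point, once the device really delivers PCM_FORMAT_S32_LE, the conversion between raw bytes and unity-level floats needs no bias term because the samples are signed. A minimal sketch using only the struct module (the function names are hypothetical, and mono data is assumed):

```python
import struct

FULL_SCALE = 2147483648.0  # 2**31, magnitude of the most negative S32 sample


def s32le_to_floats(raw):
    """Convert signed 32-bit little-endian PCM bytes to floats in [-1.0, 1.0)."""
    count = len(raw) // 4
    return [s / FULL_SCALE for s in struct.unpack("<%di" % count, raw)]


def floats_to_s32le(samples):
    """Convert floats back to S32_LE bytes, clipping to the valid range."""
    ints = [max(-2147483648, min(2147483647, int(round(x * FULL_SCALE))))
            for x in samples]
    return struct.pack("<%di" % len(ints), *ints)
```

Clipping in floats_to_s32le matters as soon as processing adds an offset or gain, because a value outside the 32-bit range cannot be packed.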

Python stop multiple process when one returns a result?

I am trying to write a simple proof-of-work nonce-finder in python.
def proof_of_work(b, nBytes):
    nonce = 0
    # while the first nBytes of hash(b + nonce) are not 0
    while sha256(b + uint2bytes(nonce))[:nBytes] != bytes(nBytes):
        nonce = nonce + 1
    return nonce
Now I am trying to do this multiprocessed, so it can use all CPU cores and find the nonce faster. My idea is to use multiprocessing.Pool and execute the function proof_of_work multiple times, passing two params num_of_cpus_running and this_cpu_id like so:
def proof_of_work(b, nBytes, num_of_cpus_running, this_cpu_id):
    nonce = this_cpu_id
    while sha256(b + uint2bytes(nonce))[:nBytes] != bytes(nBytes):
        nonce = nonce + num_of_cpus_running
    return nonce
So, if there are 4 cores, every one will calculate nonces like this:
core 0: 0, 4, 8, 12, 16 ...
core 1: 1, 5, 9, 13, 17 ...
core 2: 2, 6, 10, 14, 18 ...
core 3: 3, 7, 11, 15, 19 ...
So, I have to rewrite proof_of_work so when anyone of the processes finds a nonce, everyone else stops looking for nonces, taking into account that the found nonce has to be the lowest value possible for which the required bytes are 0. If a CPU speeds up for some reason, and returns a valid nonce higher than the lowest valid nonce, then the proof of work is not valid.
The only thing I don't know how to do is the part in which a process A will only stop if process B found a nonce that is lower than the nonce currently being calculated by process A. If it's higher, A keeps calculating (just in case) until it reaches the nonce provided by B.
I hope I explained myself correctly. Also, if there is a faster implementation of anything I wrote, I would love to hear about it. Thank you very much!

One easy option is to use micro-batches and check if an answer was found. Too small batches incur overhead from starting parallel jobs, too large size causes other processes to do extra work while one process already found an answer. Each batch should take 1 - 10 seconds to be efficient.
Sample code:
from multiprocessing import Pool
from hashlib import sha256
from time import time

def find_solution(args):
    salt, nBytes, nonce_range = args
    # nBytes is compared against hex digits of the digest
    target = '0' * nBytes
    for nonce in range(nonce_range[0], nonce_range[1]):
        # encode() so the text can be hashed on Python 3 as well
        result = sha256((salt + str(nonce)).encode()).hexdigest()
        #print('%s %s vs %s' % (result, result[:nBytes], target)); sleep(0.1)
        if result[:nBytes] == target:
            return (nonce, result)
    return None

def proof_of_work(salt, nBytes):
    n_processes = 8
    batch_size = int(2.5e5)
    pool = Pool(n_processes)
    nonce = 0
    while True:
        nonce_ranges = [
            (nonce + i * batch_size, nonce + (i+1) * batch_size)
            for i in range(n_processes)
        ]
        params = [
            (salt, nBytes, nonce_range) for nonce_range in nonce_ranges
        ]
        # Single-process search:
        #solutions = map(find_solution, params)
        # Multi-process search:
        solutions = pool.map(find_solution, params)
        print('Searched %d to %d' % (nonce_ranges[0][0], nonce_ranges[-1][1]-1))
        # Find non-None results (a list, so the truth test works on Python 3)
        solutions = [s for s in solutions if s]
        if solutions:
            return solutions
        nonce += n_processes * batch_size

if __name__ == '__main__':
    start = time()
    solutions = proof_of_work('abc', 6)
    print('\n'.join('%d => %s' % s for s in solutions))
    print('Solution found in %.3f seconds' % (time() - start))
Output (a laptop with Core i7):
Searched 0 to 1999999
Searched 2000000 to 3999999
Searched 4000000 to 5999999
Searched 6000000 to 7999999
Searched 8000000 to 9999999
Searched 10000000 to 11999999
Searched 12000000 to 13999999
Searched 14000000 to 15999999
Searched 16000000 to 17999999
Searched 18000000 to 19999999
Searched 20000000 to 21999999
Searched 22000000 to 23999999
Searched 24000000 to 25999999
Searched 26000000 to 27999999
Searched 28000000 to 29999999
Searched 30000000 to 31999999
Searched 32000000 to 33999999
Searched 34000000 to 35999999
Searched 36000000 to 37999999
37196346 => 000000f4c9aee9d427dc94316fd49192a07f1aeca52f6b7c3bb76be10c5adf4d
Solution found in 20.536 seconds
With a single core it took 76.468 seconds. This is by no means the most efficient way to find a solution, but it works. For example, if the salt is long, the SHA-256 state could be pre-computed after the salt has been absorbed, and the brute-force search could continue from there. Comparing raw digest bytes could also be more efficient than using hexdigest().
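The pre-computation mentioned above can be sketched with the copy() method of hashlib objects: the salt is absorbed once, and each nonce only updates a copy of that state. make_hasher is a hypothetical helper name, and the nonce is encoded as decimal ASCII text to match the sample code.

```python
from hashlib import sha256


def make_hasher(salt):
    """Return a function that hashes salt + str(nonce), absorbing salt once."""
    base = sha256(salt)  # state after the salt has been absorbed

    def digest_for(nonce):
        h = base.copy()  # cheap copy of the pre-computed state
        h.update(str(nonce).encode("ascii"))
        return h.digest()  # raw bytes, no hexdigest() conversion

    return digest_for
```

The benefit grows with the length of the salt; for a three-byte salt such as 'abc' it is likely negligible.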

A general method to do this is to:
think of work packets, e.g. to perform the calculation for a particular range, a range should not take long, say 0.1 seconds to a second
have some manager distribute the work packets to the worker
after a work packet has been concluded, tell the manager the result and request a new work packet
if the work is done and a result has been found accept the results from workers and give them a signal that no more work is to be performed - the workers can now safely terminate
This way you don't have to check with the manager each iteration (which would slow down everything), or do nasty things like stopping a thread mid-session. Needless to say, the manager needs to be thread safe.
This fits perfectly with your model, as you still need the results of the other workers, even if a result has been found.
Note that in your model, it could be that a thread may go out of sync with the other threads, lagging behind. You don't want to do another million calculations once a result is found. I'm just reiterating this from the question because I think the model is wrong. You should fix the model instead of fixing the implementation.

You can use multiprocessing.Queue(). Have a Queue per CPU/process. When a process finds a nonce, it puts it on the queue of other processes. Other processes check their queue (non-blocking) in each iteration of the while loop and if there is anything on it, they decide to continue or terminate based on the value in the queue:
from queue import Empty  # raised by get(block=False) when nothing is queued

def proof_of_work(b, nBytes, num_of_cpus_running, this_cpu_id, qSelf, qOthers):
    nonce = this_cpu_id
    while sha256(b + uint2bytes(nonce))[:nBytes] != bytes(nBytes):
        nonce = nonce + num_of_cpus_running
        try:
            otherNonce = qSelf.get(block=False)
            if otherNonce < nonce:
                return
        except Empty:
            pass
    for q in qOthers:
        q.put(nonce)
    return nonce
qOthers is a list of queues ( each queue=multiprocessing.Queue() ) belonging to other processes.
If you decide to use queues as I suggested, you should be able to write a better/nicer implementation of above approach.

I would like to improve NikoNyrh's answer by changing pool.map to pool.imap_unordered. Using imap_unordered returns each result as soon as any worker finishes, without waiting for all of them to complete. So once any of the results is a tuple, we can exit the while loop.
def proof_of_work(salt, nBytes):
    n_processes = 8
    batch_size = int(2.5e5)
    with Pool(n_processes) as pool:
        nonce = 0
        while True:
            nonce_ranges = [
                (nonce + i * batch_size, nonce + (i+1) * batch_size)
                for i in range(n_processes)
            ]
            params = [
                (salt, nBytes, nonce_range) for nonce_range in nonce_ranges
            ]
            print('Searched %d to %d' % (nonce_ranges[0][0], nonce_ranges[-1][1]-1))
            for result in pool.imap_unordered(find_solution, params):
                if isinstance(result,tuple): return result
            nonce += n_processes * batch_size
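Note that returning the first tuple that arrives does not guarantee the lowest nonce, which the question requires: a worker searching a higher range in the same round may finish first. One possible adjustment, sketched here under the assumption that each worker returns either None or a (nonce, digest) tuple for the lowest hit in its range, is to collect the whole round and take the minimum (pick_lowest is a hypothetical helper):

```python
def pick_lowest(results):
    """Return the (nonce, digest) tuple with the smallest nonce, or None."""
    found = [r for r in results if r is not None]
    if not found:
        return None
    # Earlier rounds were searched completely, so the smallest nonce
    # found in this round is the smallest valid nonce overall.
    return min(found, key=lambda r: r[0])
```

In the loop above, this would be used as result = pick_lowest(pool.map(find_solution, params)), at the cost of waiting for all workers of the round to finish.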

Scapy how get ping time?

I'm trying to write a scapy script which can make an average on the ping time, so I need to get the time elapsed between ICMP echo/reply packet sent and reply packet received. For now, I have this:
#! /usr/bin/env python
from scapy.all import *
from time import *

def QoS_ping(host, count=3):
    packet = Ether()/IP(dst=host)/ICMP()
    t=0.0
    for x in range(count):
        t1=time()
        ans=srp(packet,iface="eth0", verbose=0)
        t2=time()
        t+=t2-t1
    return (t/count)*1000
The problem is that using the time() function doesn't give a good result. For example, I measure 134 ms on one domain, while the system ping command on the same domain gives 30 ms (on average, of course).
My question is: is there a way to get the exact time elapsed between the sent packet and the received packet with scapy?
I don't want to use popen() or another system call because I need scapy for future packet management.

Is there a way to get the exact time elapsed between the sent packet and the received packet with scapy?
You can use pak.time and pak.sent_time
I modified your script to use them...
import statistics
import os
from scapy.all import Ether, IP, ICMP, srp

if os.geteuid() > 0:
    raise OSError("This script must run as root")

ping_rtt_list = list()

def ping_addr(host, count=3):
    packet = Ether()/IP(dst=host)/ICMP()
    t=0.0
    for x in range(count):
        x += 1 # Start with x = 1 (not zero)
        ans, unans = srp(packet, iface="eth0", filter='icmp', verbose=0)
        rx = ans[0][1]
        tx = ans[0][0]
        delta = rx.time - tx.sent_time
        print("ping #{0} rtt: {1} second".format(x, round(delta, 6)))
        ping_rtt_list.append(round(delta, 6))
    return ping_rtt_list

if __name__=="__main__":
    ping_rtt_list = ping_addr('172.16.15.1')
    rtt_avg = round(statistics.mean(ping_rtt_list), 6)
    print("Avg ping rtt (seconds):", rtt_avg)
An example run:
$ sudo /opt/virtual_env/py37_test/bin/python ./ping_w_scapy.py
ping #1 rtt: 0.002019 second
ping #2 rtt: 0.002347 second
ping #3 rtt: 0.001807 second
Avg ping rtt (seconds): 0.002058
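Since the question asks for the average in milliseconds, the per-packet deltas can be summarised in the same units that the system ping prints. A small sketch using only the standard library (rtt_summary_ms is a hypothetical helper and does not depend on scapy):

```python
import statistics


def rtt_summary_ms(rtts_seconds):
    """Summarise round-trip times given in seconds as min/avg/max in ms."""
    rtts_ms = [rtt * 1000.0 for rtt in rtts_seconds]
    return {
        "min": min(rtts_ms),
        "avg": statistics.mean(rtts_ms),
        "max": max(rtts_ms),
    }
```

Applied to the three deltas of the example run, the average comes out at roughly 2.06 ms.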

BTW, using unbuffered python (python -u to start it) increases the timing accuracy as python is not waiting for the buffers to decide to dump. Using your above script, it changed my results from being off by 0.4 ms to being off by 0.1-ish.

Mike, just a small fix in order to get the average time, change:
print "Average %0.3f" % float(match.group(1))
to:
print "Average %0.3f" % float(match.group(2))
since (match.group(1)) will get the min time and not the avg as mentioned.
