Sending Sparse Files - python

I've been messing around with sockets in Python and I'd like to be able to send a sparse image file from one machine to another. As expected, sending a sparse file over a Python socket doesn't preserve the sparseness of the file. I'd like to do a sparse tar and send it that way, but I just can't figure it out.
The tarfile module says it supports reading sparse files with the GNU format, which doesn't help me for creating them... but the Python docs say the pax format has "virtually no limits". I'm not sure whether that means I can create an archive that preserves the sparse file using the pax format... I've been trying, but I just have no idea how it might work.
If this solution isn't an option, is there any other way to send a sparse file over a socket? I hate to have to call 'tar -xSf' via a system command from my application...
Thanks,
Server
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.bind((socket.gethostname(), 50001))
s.listen(1)
img = open('test.img', 'rb')
client, addr = s.accept()
l = img.read(8192)
while l:
    client.sendall(l)
    l = img.read(8192)
img.close()
client.close()
s.close()
Client
import socket

host = ''
port = 50001
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
s.connect((host, port))
img = open('./newimg.img', 'wb')
l = s.recv(8192)
while l:
    img.write(l)
    l = s.recv(8192)
img.close()
s.close()
On the server, I make a new sparse file: truncate -s 1G test.img
A du -h shows: 0 test.img
I run my server and client. Here is a du -h on the transferred file: 1.0G newimg.img
As you can see, it expands the file and it is no longer sparse.
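The same check can be done from Python (a minimal sketch; st_blocks is reported in 512-byte units on Linux):
import os

st = os.stat('test.img')
# a file is sparse when it occupies fewer blocks on disk than its logical size suggests
print('apparent size:', st.st_size, 'bytes')
print('allocated:    ', st.st_blocks * 512, 'bytes')
print('sparse?      ', st.st_blocks * 512 < st.st_size)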

Holes in files are normally created when you write to the beginning of a file, then seek past the end and write again there. When you read the file you get zeros back even where there are holes, so when you send it over a socket those literal zero bytes are transmitted and read on the other side. When the receiver then writes them out, every byte really is written, and the filesystem never gets a chance to create the holes.
To mitigate that, you can first locate the holes in the file, send their positions, and then send only the data portions of the file.
The following is not polished, but it should give you a starting point.
import json
import os

f = open(path, "rb")
fd = f.fileno()
end = os.stat(fd).st_size
holes = []
offset = os.lseek(fd, 0, os.SEEK_HOLE)
while offset != end:
    try:
        end_hole = os.lseek(fd, offset, os.SEEK_DATA)  # end of this hole = start of next data
    except OSError:
        end_hole = end  # the hole runs to the end of the file
    holes.append((offset, end_hole))
    if end_hole == end:
        break
    offset = os.lseek(fd, end_hole, os.SEEK_HOLE)  # find the start of the next hole
# [open socket and stuff; "sock" below is the connected socket]
# send the hole list first
sock.sendall(json.dumps(holes).encode())  # encode/frame appropriately
# send the file data, skipping the holes
f.seek(0)
total = 0
for start, stop in holes:
    # send the data that precedes this hole
    while total < start:
        l = f.read(min(8192, start - total))
        sock.sendall(l)
        total += len(l)
    # skip over the hole itself
    f.seek(stop)
    total = stop
# send whatever data remains after the last hole
l = f.read(8192)
while l:
    sock.sendall(l)
    l = f.read(8192)
Then on the client side:
# read until the JSON hole list has arrived; check_json_end and parse_json are
# placeholders for whatever framing/decoding you choose
still_json = True
a = []
l = s.recv(8192)
while still_json:
    a.append(l)
    if check_json_end(l):
        still_json = False
    else:
        l = s.recv(8192)
# the last chunk can contain bytes that are not JSON; assume parse_json returns
# the hole list plus those leftover file bytes
holes, rest = parse_json(a)
fout = open(outfile, "wb")
total = 0
fout.write(rest)  # assumes the leftover bytes end before the first hole
total += len(rest)
for start, stop in holes:
    # write the data that precedes this hole
    while total < start:
        l = s.recv(min(8192, start - total))
        if not l:
            break
        fout.write(l)
        total += len(l)
    # skip over the hole: seeking forward leaves it sparse in the output file
    fout.seek(stop)
    total = stop
# write whatever data remains after the last hole
l = s.recv(8192)
while l:
    fout.write(l)
    l = s.recv(8192)
fout.close()
There are probably lots of bugs in it and you should rethink each line, but the general principle should be alright. JSON is of course arbitrarily chosen; there are probably other encodings better suited to this, and you could also create your own.
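An alternative that avoids sending the hole list at all (a rough sketch, not what the answer above does): have the receiving side detect all-zero chunks and seek over them instead of writing them, then truncate to the final length so a trailing hole survives. Block-aligned chunk sizes give the best results.
import socket

host = ''
port = 50001
CHUNK = 8192

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, port))
with open('./newimg.img', 'wb') as img:
    total = 0
    l = s.recv(CHUNK)
    while l:
        if l == b'\x00' * len(l):
            img.seek(len(l), 1)   # chunk is all zeros: skip it and let the filesystem leave a hole
        else:
            img.write(l)
        total += len(l)
        l = s.recv(CHUNK)
    img.truncate(total)           # extend the file so a trailing hole still counts toward its size
s.close()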

Related

python TCP how many bytes to read

I want to send a filename from one machine to another. Would I need to find out exactly how big the string is and read exactly that many bytes? Right now I'm reading an arbitrary number of bytes (200) that I know is bigger than the string. Is this a problem? I want to send more commands in the future too; is reading more bytes than necessary going to mess that up?
send code:
filenameToSend = 'images/capture' + str(captureNumber) + '-' + str(pairNumber + 1) + '-' + 'IMAGEINPAIRNUMBER' + '-' + currentTime + '.jpg'
# send message to capture image pair on slave
sock.sendto(filenameToSend.encode('utf-8'), (slave1_TCP_IP, TCP_PORT))
receive code:
c, addr = sock.accept()
data = c.recv(200)
command = data.decode('utf-8')
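A common way to handle this (a sketch, not from the thread itself): prefix every message with a fixed-size length field, so the receiver knows exactly how many bytes belong to each command. Here the length is packed as a 4-byte big-endian integer with struct; send_msg, recv_msg and recv_exact are illustrative names:
import struct

def send_msg(sock, text):
    payload = text.encode('utf-8')
    # 4-byte big-endian length prefix, then the payload itself
    sock.sendall(struct.pack('>I', len(payload)) + payload)

def recv_exact(sock, n):
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed mid-message')
        buf += chunk
    return buf

def recv_msg(sock):
    (length,) = struct.unpack('>I', recv_exact(sock, 4))
    return recv_exact(sock, length).decode('utf-8')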

Python (pySerial): cannot trim/remove CR, LF or unexpected characters from string

I am using an Arduino to output Temp & Hum data from a sensor; this is being read by a PC running Python using pySerial. The data is reading in correctly, but I would like to remove the CR/LF and unexpected characters. One idea I found on this site was to use lstrip or lreplace, but they do not seem to work correctly. They will remove one instance of the character, but even repeating the line or making a small loop has no effect.
This is what the program prints (the bottom line is the string after I've tried to cut out the unnecessary characters):
[b'\n', b'\r\n', b'Read sensor: OK\r\n', b'Hum40.00\r\n', b'TempC18.00\r\n']
[" b'Hum40.00\r\n'", " b'TempC18.00\r\n']"]
I am aiming for it to read:
[Hum40.00, TempC18.00]
I can hopefully fine tune the message later.
This is the code:
import serial as ser
import time

count = 0
msgArray = []
saveMsg = []
ser = ser.Serial('COM16', 9600, timeout=1, parity='N', stopbits=1, bytesize=8, xonxoff=0)  # setting up and opening COM port
ser.close()
ser.open()

def readSerial():  # reads a whole line from the COM port
    serLine = ser.readline()
    return serLine

def sveMsgCut():  # saves the buffer as a message then cuts the message
    cutMsg = saveMsg
    words = cutMsg.split(',')
    return words

while True:  # main program
    dataSerial = readSerial()
    if count < 5:  # reads COM port 5 times and passes along to buffer msgArray
        msgArray.append(dataSerial)
        count = count + 1
    else:
        print(msgArray)  # display msgArray
        saveMsg = str(msgArray)  # convert to string
        splitMsg = saveMsg.split(',')  # splits string (csv)
        phrase = splitMsg[3:5]  # cuts out excess either side of Temp & Hum
        phraseString = str(phrase)
        phraseNew = phraseString.lstrip("/n")  # an attempt to remove CR
        print(phraseNew)  # print adjusted string
        saveMsg = msgArray
        count = 0  # resets msgArray and counter
        msgArray = []
        time.sleep(5)
I am fairly new to programming, especially Python, so it may be something simple that I've missed, but I have tried several different ideas and cannot remove the extra characters.
Not sure why rstrip/lstrip do not work for you.
This code runs as expected on my machine:
s = '\r\nHum40.00\r\n'
print(s.rstrip().lstrip())
The only difference I see is the "/n" parameter, so instead try:
phraseNew = phraseString.lstrip()
Decided to go about this another way. Placed the raw serial data into a list and extracted the temp and humidity readings then joined them together as a string:
else:
    print(msgArray)  # raw serial data
    msgString = str(msgArray)  # convert serial data to str
    character = list(msgString)  # str to list
    # the next two lines extract the temp & hum readings, then convert
    # them to their respective strings
    humidity = ''.join(character[46:51])
    temperature = ''.join(character[65:70])
    print('H:' + humidity + ' T:' + temperature)
The output now looks like this (Raw then processed data):
[b'\n', b'\r\n', b'Read sensor: OK\r\n', b'Hum40.00\r\n', b'TempC21.00\r\n']
H:40.00 T:21.00
Now onto the next stage.
One thought:
So what you have now is:
"b'Hum40.00\r\n'"
Now what I see here is that the ' symbol is now a part of the string. This means that ' is the last symbol instead of \r\n. I have found success with starting on the outside and working inwards. Let's first start by removing the b. To do this try:
x = str(ser.readline())
x = x.lstrip("b")
Now we should see this:
"'Hum40.00\r\n'"
Next remove the '
x = x.strip("'")
Now we see this:
"Hum40.00\r\n"
Now the tricky part here is deleting the \r\n. There is a post here: Can't delete "\r\n" from a string
that explains why:
x = x.rstrip("\r\n")
will not work.
Instead type:
x = x.rstrip("\\r\\n")
Final code:
x = str(ser.readline())
x = x.lstrip("b")
x = x.strip("'")
x = x.rstrip("\\r\\n")
print(x)
Should yield:
"Hum40.00"

How can I generate all possible IPs from a CIDR list in Python?

Let's say I have a text file that contains a bunch of CIDR IP ranges like this:
x.x.x.x/24
x.x.x.x/24
x.x.x.x/23
x.x.x.x/23
x.x.x.x/22
x.x.x.x/22
x.x.x.x/21
and goes on...
How can I convert these CIDR notations into a list of all possible IPs in a new text file in Python?
You can use netaddr for this. The code below will create a file on your disk and fill it with every ip address in the requested block:
from netaddr import *

f = open("everyip.txt", "w")
ip = IPNetwork('10.0.0.0/8')
for addr in ip:
    f.write(str(addr) + '\n')
f.close()
If you don't need the satisfaction of writing your script from scratch, you could use the python cidrize package.
Based on "How can I generate all possible IPs from a list of IP ranges in Python?":
import struct, socket

def ips(start, end):
    start = struct.unpack('>I', socket.inet_aton(start))[0]
    end = struct.unpack('>I', socket.inet_aton(end))[0]
    return [socket.inet_ntoa(struct.pack('>I', i)) for i in range(start, end + 1)]

# ip/prefix, e.g. 12.123.234.34/22 -> 32 - 22 = 10 host bits
ip = '12.123.234.34'
host_bits = 32 - 22
i = struct.unpack('>I', socket.inet_aton(ip))[0]  # IP as an integer
start = (i >> host_bits) << host_bits  # clear the host bits to get the first address
end = i | ((1 << host_bits) - 1)       # set the host bits to get the last address
start = socket.inet_ntoa(struct.pack('>I', start))  # back to dotted-quad strings
end = socket.inet_ntoa(struct.pack('>I', end))
ips(start, end)
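On Python 3.3+ the standard-library ipaddress module can also do this without any third-party package. A minimal sketch that reads CIDR blocks from cidrs.txt (a hypothetical input file name) and writes every address out:
import ipaddress

with open('cidrs.txt') as infile, open('everyip.txt', 'w') as outfile:
    for line in infile:
        line = line.strip()
        if not line:
            continue
        # strict=False accepts entries whose host bits are set, e.g. 10.0.0.5/24
        network = ipaddress.ip_network(line, strict=False)
        for addr in network:
            outfile.write(str(addr) + '\n')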

Read Wireshark dump file for packet times

My instructions were to read the wireshark.bin data file dumped from the Wireshark program and pick out the packet times. I have no idea how to skip the header and find the first time.
"""
reads the wireshark.bin data file dumped from the wireshark program
"""
from datetime import datetime
import struct
import datetime
#file = r"V:\workspace\Python3_Homework08\src\wireshark.bin"
file = open("wireshark.bin", "rb")
idList = [ ]
with open("wireshark.bin", "rb") as f:
while True:
bytes_read = file.read(struct.calcsize("=l"))
if not bytes_read:
break
else:
if len(bytes_read) > 3:
idList.append(struct.unpack("=l", bytes_read)[0])
o = struct.unpack("=l111", bytes_read)[0]
print( datetime.date.fromtimestamp(o))
Try reading the entire file at once, and then slicing it:
data = open("wireshark.bin", "rb").read() # let Python automatically close file
magic = data[:4] # magic wireshark number (also reveals byte order)
gmt_correction = data[8:12] # GMT offset
data = data[24:] # actual packets
Now you can loop through data in (16?) byte size chunks, looking at the appropriate offset in each chunk for the timestamp.
The magic number is 0xa1b2c3d4, which takes four bytes. We can determine the byte order (big-endian or little-endian) by examining those first four bytes with the struct module:
magic = struct.unpack('>L', data[0:4])[0]  # before the data = data[24:] line above
if magic == 0xa1b2c3d4:
    order = '>'
elif magic == 0xd4c3b2a1:
    order = '<'
else:
    raise NotWireSharkFile()
Now that we have the order (and know it's a wireshark file), we can loop through the packets:
field0, field1, field2, field3 = \
    struct.unpack('%sllll' % order, data[:16])
payload = data[16 : 16 + field?]
data = data[16 + field? :]
I left the names vague, since this is homework, but those field? names represent the information stored in the packet header which includes the timestamp and the length of the following packet data.
This code is incomplete, but hopefully will be enough to get you going.
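For completeness, a minimal sketch under the assumption that this is a standard libpcap capture: a 24-byte global header, then per-packet records whose 16-byte headers contain ts_sec, ts_usec, incl_len, and orig_len:
import struct
from datetime import datetime

with open("wireshark.bin", "rb") as f:
    data = f.read()

# the reversed magic bytes mean the file was written little-endian
order = '<' if data[:4] == b'\xd4\xc3\xb2\xa1' else '>'
offset = 24  # skip the global header
while offset + 16 <= len(data):
    ts_sec, ts_usec, incl_len, orig_len = struct.unpack(
        order + 'IIII', data[offset:offset + 16])
    print(datetime.fromtimestamp(ts_sec))
    offset += 16 + incl_len  # jump over this record's captured bytes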

Decoding tcp packets using python

I am trying to decode data received over a TCP connection. The packets are small, no more than 100 bytes. However, when there are a lot of them I receive some of the packets joined together. Is there a way to prevent this? I am using Python.
I have tried to separate the packets; my source is below. The packets start with an STX byte and end with an ETX byte; the byte following the STX is the packet length (packet lengths less than 5 are invalid), and the checksum comes right before the ETX.
def decode(data):
    while True:
        start = data.find(STX)
        if start == -1:  # no STX in message
            pkt = ''
            data = ''
            break
        # STX found, the next byte is the length
        pktlen = ord(data[start + 1])
        # check the message ends in ETX (at pktlen - 1) and the checksum is valid
        if pktlen < 5 or data[start + pktlen - 1] != ETX or not checksum_valid(data[start:start + pktlen]):
            print "Invalid Pkt"
            data = data[start + 1:]
            continue
        else:
            pkt = data[start:start + pktlen]
            data = data[start + pktlen:]
            break
    return data, pkt
I use it like this
#process reports
try:
    data = sock.recv(256)
except:
    continue
else:
    while data:
        data, pkt = decode(data)
        if pkt:
            process(pkt)
Also, if there are multiple packets in the data stream, is it best to return the packets as a list or just return the first packet?
I am not that familiar with Python, only C; is this method OK? Any advice would be most appreciated. Thanks in advance.
Thanks
I would create a class that is responsible for decoding the packets from a stream, like this:
class PacketDecoder(object):
    STX = ...
    ETX = ...

    def __init__(self):
        self._stream = ''

    def feed(self, buffer):
        self._stream += buffer

    def decode(self):
        '''
        Yields packets from the current stream.
        '''
        while len(self._stream) > 2:
            end = self._stream.find(self.ETX)
            if end == -1:
                break
            packet_len = ord(self._stream[1])
            packet = self._stream[:end]
            if packet_len >= 5 and check_sum_valid(packet):
                yield packet
            self._stream = self._stream[end + 1:]
And then use like this:
decoder = PacketDecoder()
while True:
data = sock.recv(256)
if not data:
# handle lost connection...
decoder.feed(data)
for packet in decoder.decode():
process(packet)
TCP provides a data stream, not individual packets, at the interface level. If you want discrete packets, you can use UDP (and handle lost or out of order packets on your own), or put some data separator inline. It sounds like you are doing that already, with STX/ETX as your separators. However, as you note, you get multiple messages in one data chunk from your TCP stack.
Note that unless you are doing some other processing, data in the code you show does not necessarily contain an integral number of messages. That is, it is likely that the last STX will not have a matching ETX. The ETX will be in the next data chunk without an STX.
You should probably read individual messages from the TCP data stream and return them as they occur.
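A rough sketch of that buffering, assuming STX is a single byte and the length byte counts the whole packet including STX, checksum, and ETX (adjust to your actual framing); any incomplete tail is simply kept until the next recv():
buffer = ''
while True:
    chunk = sock.recv(256)
    if not chunk:
        break                           # connection closed
    buffer += chunk
    while True:
        start = buffer.find(STX)
        if start == -1 or len(buffer) < start + 2:
            break                       # no (complete) header in the buffer yet
        pktlen = ord(buffer[start + 1:start + 2])
        if len(buffer) < start + pktlen:
            break                       # the rest of this packet has not arrived yet
        process(buffer[start:start + pktlen])
        buffer = buffer[start + pktlen:]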
Try scapy, a powerful interactive packet manipulation program.
Where does the data come from? Instead of trying to decode it by hand, why not use the excellent Impacket package:
http://oss.coresecurity.com/projects/impacket.html
Nice and simple... :)
The trick is in the file object.
f = sock.makefile()
while True:
    STX = f.read(1)
    pktlen = f.read(1)
    wholePacket = STX + pktlen + f.read(ord(pktlen) - 2)
    doSomethingWithPacket(wholePacket)
And that's it! (There is also no need to check checksums when using TCP.)
And here is a more "robust"(?) version (it uses STX and checksum):
f = sock.makefile()
while True:
    while f.read(1) != STX:
        continue
    pktlen = f.read(1)
    wholePacket = STX + pktlen + f.read(ord(pktlen) - 2)
    if checksum_valid(wholePacket):
        doSomethingWithPacket(wholePacket)
