Python 3 Socket Receive Data

I am working with Python sockets, and although I have read everything I can find on Stack Overflow regarding the socket.recv byte stream, I still have a few questions I hope some of you can answer.
So I have this line:
chunk = s.recv(1024)
I receive the following packet: '\x12\x03\x10\x00\x00\x00\x00\x23\x45\x34\x56'
I need to get the bytes at index 1 and 2 ('\x03\x10') and convert that 2-byte data to an integer, which should be 784.
What is the best way to handle this?
In addition, what is the best way to extract a portion of the chunk data and pass it into a function? For example, I need to take the bytes at index 7 to 9 and pass that string of 3 bytes into a function.

Getting substrings out of a larger string is simple; for example, given:
>>> thestring = 'this is a test'
If I want bytes 1 and 2, I can ask for:
>>> thestring[1:3]
'hi'
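In your example, once you have the two bytes of interest (in Python 3, recv() returns a bytes object, hence the b prefix; struct.unpack needs bytes, not str):
>>> chunk = b'\x12\x03\x10\x00\x00\x00\x00\x23\x45\x34\x56'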
>>> data = chunk[1:3]
You can use the struct module to unpack those into a single integer:
>>> import struct
>>> struct.unpack('H', data)
(4099,)
That assumes native byte order; the 4099 above is what a little-endian machine such as x86 returns. You can explicitly request big endian like this:
>>> struct.unpack('>H', data)
(784,)
Read the struct documentation for more information. Note that H (used here) means "unsigned short", while h (used in another answer) means "signed short", and which one is appropriate depends on your application.
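To see the byte-order difference side by side in Python 3, here is a small sketch using the packet from the question. int.from_bytes is shown as an alternative to struct when you only need a single field.

```python
import struct

# The packet from the question, as the bytes object recv() returns in Python 3
chunk = b'\x12\x03\x10\x00\x00\x00\x00\x23\x45\x34\x56'
data = chunk[1:3]  # b'\x03\x10'

big = struct.unpack('>H', data)[0]     # explicit big-endian: 0x0310
little = struct.unpack('<H', data)[0]  # explicit little-endian: 0x1003
alt = int.from_bytes(data, 'big')      # same result as '>H', without struct

print(big, little, alt)  # 784 4099 784
```

Using an explicit '>' or '<' prefix makes the result independent of the machine the script runs on.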

Use the struct module:
>>> import struct
>>> pkt = b'\x12\x03\x10\x00\x00\x00\x00\x23\x45\x34\x56'
>>> i = struct.unpack_from('>h', pkt, 1)[0]
>>> print(i)
784
The format string '>h' means that the data has big-endian byte order (>) and should be treated as a signed short, 2-byte integer (h). There is also an unsigned variety, denoted by H.
To grab a substring of the data use slicing:
>>> idx = 7
>>> length = 3
>>> pkt[idx:idx+length]
b'#E4'
and pass it to a function:
>>> func(pkt[idx:idx+length])
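Because TCP delivers a byte stream, a single recv(1024) can return less than one full packet (or parts of two). A minimal sketch, assuming fixed 11-byte packets like the one in the question, reads exactly one packet before slicing it. PACKET_SIZE, recv_exact and handle are hypothetical names, and socket.socketpair() only stands in for a real connection so the example runs on its own.

```python
import socket
import struct

PACKET_SIZE = 11  # hypothetical: assumes every packet is exactly 11 bytes


def recv_exact(sock, n):
    """Call recv() until exactly n bytes have arrived; raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError('socket closed before a full packet arrived')
        buf += part
    return bytes(buf)


def handle(payload):
    """Hypothetical consumer of the 3-byte field; here it just returns it as hex."""
    return payload.hex()


a, b = socket.socketpair()
a.sendall(b'\x12\x03\x10\x00\x00\x00\x00\x23\x45\x34\x56')
pkt = recv_exact(b, PACKET_SIZE)
a.close()
b.close()

value = struct.unpack_from('>H', pkt, 1)[0]  # bytes 1-2 as a big-endian unsigned short
field = handle(pkt[7:10])                    # bytes 7-9 passed to a function
print(value, field)  # 784 234534
```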

Related

Python3 reading a binary file, 4 bytes at a time and xor it with a 4 byte long key

I want to read a binary file, get the content four bytes at a time, and perform int operations on these packets.
Using a dummy binary file, opened this way:
with open('MEM_10001000_0000B000.mem', 'br') as f:
    for byte in f.read():
        print(hex(byte))
I want to perform an encryption with a 4-byte key, 0x9485A347 for example.
Is there a simple way I can read my files 4 bytes at a time and get them as int or do I need to put them in a temporary result using a counter?
My original idea is the following:
current_tmp = []
for byte in data:
    current_tmp.append(int(byte))
    if len(current_tmp) == 4:
        print(current_tmp)
        # but current_tmp is an array, not a single int
        current_tmp = []
In my example, instead of having [132, 4, 240, 215] I would rather have 0x8404f0d7
Just use the size argument of read() to read 4 bytes at a time, and the from_bytes constructor of Python 3's int to convert each chunk:
with open('MEM_10001000_0000B000.mem', 'br') as f:
    data = f.read(4)
    while data:
        number = int.from_bytes(data, "big")
        ...
        data = f.read(4)
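Putting the loop together with the XOR step from the question gives something like the sketch below. xor_words is a hypothetical helper; io.BytesIO stands in for the files, so for real data you would pass open(path, 'rb') and open(out_path, 'wb') instead. It assumes the input length is a multiple of 4 and raises otherwise.

```python
import io

KEY = 0x9485A347  # the example key from the question


def xor_words(src, dst, key=KEY):
    """Read src 4 bytes at a time, XOR each big-endian word with key, write it to dst."""
    data = src.read(4)
    while data:
        if len(data) != 4:
            raise ValueError('input length is not a multiple of 4 bytes')
        number = int.from_bytes(data, 'big')
        dst.write((number ^ key).to_bytes(4, 'big'))
        data = src.read(4)


plain = bytes([132, 4, 240, 215])  # 0x8404f0d7, the word from the question
out = io.BytesIO()
xor_words(io.BytesIO(plain), out)
cipher = out.getvalue()
print(hex(int.from_bytes(cipher, 'big')))  # 0x10815390
```

Since XOR is its own inverse, running xor_words over the output with the same key restores the original data.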
If you are not on Python 3 yet for some reason, int won't have a from_bytes method; in that case you can use the struct module:
import struct
...
number = struct.unpack(">I", data)[0]  # ">I" = big-endian unsigned 32-bit; use ">i" for signed
...
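On Python 3.4 and later, struct.iter_unpack can also walk a whole buffer in fixed-size steps without a manual counter. A short sketch, assuming the data length is a multiple of 4 (iter_unpack raises struct.error otherwise):

```python
import struct

data = bytes([132, 4, 240, 215, 0, 0, 0, 1])  # two big-endian 32-bit words

# '>I' = big-endian unsigned 32-bit; each item comes back as a 1-tuple
words = [w for (w,) in struct.iter_unpack('>I', data)]
print([hex(w) for w in words])  # ['0x8404f0d7', '0x1']
```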
These methods, however, are fine for a couple of iterations but can get slow for a large file. Python offers a way to fill an array of 4-byte integers directly in memory from an open file, which is more likely what you should be using:
import array, os, sys
numbers = array.array("I")  # unsigned int; itemsize is 4 on common platforms
with open('MEM_10001000_0000B000.mem', 'br') as f:
    numbers.fromfile(f, os.stat('MEM_10001000_0000B000.mem').st_size // numbers.itemsize)
if sys.byteorder == 'little':
    numbers.byteswap()  # the file data is big-endian; convert to native order
This gives you a numbers sequence with all numbers in your file read in as 4-byte, big-endian, unsigned ints.
Once you have the array, you can XOR each number with the key:
key = 0x9485A347
result = [n ^ key for n in numbers]
If your file is not a multiple of 4 bytes, the first two methods need some adjustment: the last read returns fewer than 4 bytes, which struct.unpack rejects, so check len(data) == 4 in the while condition and handle the leftover bytes separately. The array method simply ignores the trailing bytes.
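To write the encrypted words back out, the same kind of array can be modified in place and converted with tobytes(). This sketch works on an in-memory bytes object (pass f.read() for a file), uses the hypothetical name xor_encrypt, and assumes array type 'I' is 4 bytes on your platform; it checks that instead of guessing.

```python
import array
import sys

KEY = 0x9485A347  # example key from the question


def xor_encrypt(raw, key=KEY):
    """XOR every big-endian 32-bit word in raw with key and return the new bytes."""
    words = array.array('I')
    if words.itemsize != 4:
        raise RuntimeError("this sketch assumes a 4-byte 'I' array type")
    words.frombytes(raw)  # raises ValueError if len(raw) is not a multiple of 4
    if sys.byteorder == 'little':
        words.byteswap()  # big-endian input -> native ints
    for i, w in enumerate(words):
        words[i] = w ^ key
    if sys.byteorder == 'little':
        words.byteswap()  # native ints -> big-endian output
    return words.tobytes()


cipher = xor_encrypt(bytes([132, 4, 240, 215]))
print(cipher.hex())  # 10815390
```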

Why does bytes(5) return b'\x00\x00\x00\x00\x00' instead of b'\x05'?

I'm converting an int to bytes using this command in Python:
a = 5
b = bytes(a, 'utf-8')
but when I print b I get this value:
b'\x00\x00\x00\x00\x00'
what is wrong with this piece of code?
The bytes() function documentation points to the bytearray() documentation, which states:
The optional source parameter can be used to initialize the array in a few different ways:
[....]
If it is an integer, the array will have that size and will be initialized with null bytes.
You asked for a bytes() object of size 5, initialised to null bytes.
You probably want to turn a into a string first:
bytes(str(a), 'utf-8')
Demo:
>>> a = 5
>>> bytes(str(a), 'utf-8')
b'5'
If you wanted to have the byte value 5 (so the ENQ ASCII control code, or whatever else you might want it to mean), you'll need to put a in a list:
bytes([a])
(no need to provide an encoding then):
>>> bytes([a])
b'\x05'
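For comparison, here are the different conversions side by side. int.to_bytes is one more option not mentioned above; it is useful when you want to state the length and byte order explicitly.

```python
a = 5
print(bytes(a))                # b'\x00\x00\x00\x00\x00' (size 5, zero-filled)
print(bytes([a]))              # b'\x05' (one byte with value 5)
print(str(a).encode('utf-8'))  # b'5' (the digit character, byte 0x35)
print(a.to_bytes(1, 'big'))    # b'\x05' (int method, explicit length)
```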
You are creating a bytes object of length 5.
To get the binary representation of the number 5 as a string, you can use bin():
>>> bin(5)
'0b101'
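Note that bin() returns text, not bytes. If you need a fixed-width bit string, format() with a width specifier works, and int(..., 2) parses it back:

```python
n = 5
bits = bin(n)              # '0b101' (a str, not bytes)
padded = format(n, '08b')  # '00000101' (fixed 8-bit width, no prefix)
print(bits, padded)        # 0b101 00000101
print(int(padded, 2))      # 5
```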

How to make a fixed-size byte variable in Python

Let's say I have a string (Unicode, if it matters) variable which is less than 100 bytes. I want to create another variable exactly 100 bytes in size which includes this string and is padded with zeros or whatever. How would I do it in Python 3?
For assembling packets to go over the network, or for assembling byte-perfect binary files, I suggest using the struct module.
struct — Interpret bytes as packed binary data
Just for the string, you might not need struct, but as soon as you start also packing binary values, struct will make your life much easier.
Depending on your needs, you might be better off with an off-the-shelf network serialization library, such as Protocol Buffers; or you might even just use JSON for the wire format.
Protocol Buffer Basics: Python
PyMOTW - JavaScript Object Notation Serializer
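For the fixed-size case specifically, struct's s format code pads with null bytes (and truncates anything longer). A sketch, assuming UTF-8 encoding and adding a hypothetical 2-byte length field to show why struct pays off once binary values appear:

```python
import struct

st = "具有"
encoded = st.encode('utf-8')  # 6 bytes for these two characters

# '>H100s': big-endian unsigned short (hypothetical length field) + 100-byte field
packet = struct.pack('>H100s', len(encoded), encoded)
print(len(packet))  # 102

length, field = struct.unpack('>H100s', packet)
print(field[:length].decode('utf-8'))  # 具有
```

Storing the length avoids having to guess where the real data ends if it could itself contain null bytes.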
Something like this should work (it pads with the ASCII character "0", byte 0x30, rather than null bytes):
st = "具有"
by = bytes(st, "utf-8")
by += b"0" * (100 - len(by))
print(by)
# b'\xe5\x85\xb7\xe6\x9c\x890000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000'
Obligatory addendum since your original post seems to conflate strings with the length of their encoded byte representation: Python unicode explanation
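A quick illustration of that distinction: the number of characters and the number of encoded bytes differ, and the byte count depends on the encoding you choose.

```python
st = "具有"
print(len(st))                      # 2 (characters)
print(len(st.encode('utf-8')))      # 6 (bytes in UTF-8)
print(len(st.encode('utf-16-le')))  # 4 (bytes in UTF-16, no BOM)
```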
To pad with null bytes you can do it the way they do it in the stdlib base64 module.
some_data = b'foosdsfkl\x05'
null_padded = some_data + bytes(100 - len(some_data))
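bytes.ljust does the same null padding in one call. Like the version above, it never truncates, and stripping trailing nulls recovers the original only if the data itself does not end in null bytes.

```python
some_data = b'foosdsfkl\x05'
null_padded = some_data.ljust(100, b'\x00')  # same as some_data + bytes(100 - len(some_data))
print(len(null_padded))  # 100

too_long = (b'x' * 120).ljust(100, b'\x00')  # longer input is left unchanged
print(len(too_long))  # 120

print(null_padded.rstrip(b'\x00') == some_data)  # True
```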
Here's a roundabout way of doing it:
>>> import sys
>>> a = "a"
>>> sys.getsizeof(a)
22
>>> a = "aa"
>>> sys.getsizeof(a)
23
>>> a = "aaa"
>>> sys.getsizeof(a)
24
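So, following this, an ASCII string that takes up 100 bytes in memory needs to be 79 characters long. (These figures come from one particular interpreter: sys.getsizeof reports the in-memory size of the str object, including overhead that varies between Python versions and platforms, not the number of bytes the string encodes to.)
>>> a = "a" * 79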
>>> len(a)
79
>>> sys.getsizeof(a)
100
This approach above is a fairly simple way of "calibrating" strings to figure out their lengths. You could automate a script to pad a string out to the appropriate memory size to account for other encodings.
import sys

def padder(strng):
    TARGETSIZE = 100
    padChar = "0"
    curSize = sys.getsizeof(strng)
    if curSize <= TARGETSIZE:
        # Prepend one pad character per missing byte of object size
        for i in range(TARGETSIZE - curSize):
            strng = padChar + strng
        return strng
    else:
        return strng  # Not sure if you need to handle strings that start longer than your target, but you can do that here

How can I convert two bytes of an integer back into an integer in Python?

I am currently using an Arduino that's outputting some integers (int) through Serial (using pySerial) to a Python script that I'm writing for the Arduino to communicate with X-Plane, a flight simulation program.
I managed to separate the original into two bytes so that I could send it over to the script, but I'm having a little trouble reconstructing the original integer.
I tried using basic bitwise operators (<<, >> etc.) as I would have done in a C++-like program, but it does not seem to be working.
I suspect it has to do with data types. I may be using integers with bytes in the same operations, but I can't really tell which type each variable holds, since you don't really declare variables in Python, as far as I know (I'm very new to Python).
self.pot=self.myline[2]<<8
self.pot|=self.myline[3]
You can use the struct module to convert between integers and representation as bytes. In your case, to convert from a Python integer to two bytes and back, you'd use:
>>> import struct
>>> struct.pack('>H', 12345)
b'09'
>>> struct.unpack('>H', b'09')
(12345,)
The first argument to struct.pack and struct.unpack represents how you want your data to be formatted. Here, I ask for it to be in big-endian mode by using the > prefix (you can use < for little-endian, or = for native) and then say there is a single unsigned short (16-bit integer), represented by the H.
Other possibilities are b for a signed byte, B for an unsigned byte, h for a signed short (16-bit), i for a signed 32-bit integer, and I for an unsigned 32-bit integer. You can get the complete list in the documentation of the struct module.
For example, using big-endian encoding:
int.from_bytes(my_bytes, byteorder='big')
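On AVR-based Arduino boards such as the Uno, int is a signed 16-bit value, so negative readings need signed=True. The byte order here is an assumption; it must match however the Arduino sketch splits the value.

```python
# Round trip of a negative 16-bit reading
raw = (-1234).to_bytes(2, byteorder='big', signed=True)
print(raw.hex())                                          # fb2e
print(int.from_bytes(raw, byteorder='big'))               # 64302 (read as unsigned)
print(int.from_bytes(raw, byteorder='big', signed=True))  # -1234
```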
What you have seems basically like it should work, assuming the data stored in myline has the high byte first:
myline = [0, 1, 2, 3]
pot = myline[2]<<8 | myline[3]
print('pot: {:d}, 0x{:04x}'.format(pot, pot))  # outputs "pot: 515, 0x0203"
Otherwise, if it's low-byte first you'd need to do the opposite way:
myline = [0, 1, 2, 3]
pot = myline[3]<<8 | myline[2]
print('pot: {:d}, 0x{:04x}'.format(pot, pot))  # outputs "pot: 770, 0x0302"
This totally works:
long = 500
first = long & 0xff #244
second = long >> 8 #1
result = (second << 8) + first #500
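The shift-and-OR approach only ever yields non-negative values. If the Arduino sends signed ints, a minimal sketch (to_int16 is a hypothetical helper name, high byte first assumed) applies the two's-complement correction by hand:

```python
def to_int16(hi, lo):
    """Combine two byte values (0-255), high byte first, into a signed 16-bit int."""
    value = (hi << 8) | lo
    if value >= 0x8000:  # top bit set -> negative in two's complement
        value -= 0x10000
    return value


print(to_int16(0x01, 0xF4))  # 500
print(to_int16(0xFB, 0x2E))  # -1234
```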
If you are not sure of types in 'myline' please check Stack Overflow question How to determine the variable type in Python?.
To convert a byte or char to the number it represents, use ord(). Here's a simple round trip from an int to bytes and back:
>>> number = 3**9
>>> hibyte = chr(number // 256)
>>> lobyte = chr(number % 256)
>>> hibyte, lobyte
('L', 'ã')
>>> print(number == (ord(hibyte) << 8) + ord(lobyte))
True
If your myline variable is a str, you can use the formula in the last line above. If it is a bytes object or a list of integers, indexing already gives you ints, so you don't need ord.

Convert int to single byte in a string?

I'm implementing PKCS#7 padding right now in Python and need to pad chunks of my file so that their length is a multiple of sixteen. I've been recommended to use the following method to append these bytes:
input_chunk += '\x00'*(-len(input_chunk)%16)
What I need to do is the following:
input_chunk_remainder = len(input_chunk) % 16
input_chunk += input_chunk_remainder * input_chunk_remainder
Obviously, the second line above is wrong; I need to convert the first input_chunk_remainder to a single byte string. How can I do this in Python?
In Python 3, you can create bytes of a given numeric value with the bytes() type; you can pass in a list of integers (between 0 and 255):
>>> bytes([5])
b'\x05'
>>> bytes([5] * 5)
b'\x05\x05\x05\x05\x05'
An alternative method is to use an array.array() with the right number of integers:
>>> import array
>>> array.array('B', 5*[5]).tobytes()
b'\x05\x05\x05\x05\x05'
or use the struct.pack() function to pack your integers into bytes:
>>> import struct
>>> struct.pack('{}B'.format(5), *(5 * [5]))
b'\x05\x05\x05\x05\x05'
There may be more ways... :-)
In Python 2 (ancient now), you can do the same by using the chr() function:
>>> chr(5)
'\x05'
>>> chr(5) * 5
'\x05\x05\x05\x05\x05'
In Python 3, the bytes built-in accepts a sequence of integers. So for just one integer:
>>> bytes([5])
b'\x05'
Of course, that's bytes, not a string. But in the Python 3 world, the OP would probably use bytes for the app he described anyway.
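For the padding itself: PKCS#7 (as described in RFC 5652) appends N bytes, each with value N, where N = 16 - len % 16. So a full block of 16 bytes is added when the length is already a multiple of 16, and the pad value is not the remainder used in the question. A sketch with hypothetical pkcs7_pad/pkcs7_unpad helpers:

```python
BLOCK = 16  # block size from the question


def pkcs7_pad(data, block=BLOCK):
    """Append n bytes of value n, where n = block - len(data) % block (1..block)."""
    n = block - len(data) % block
    return data + bytes([n]) * n


def pkcs7_unpad(data, block=BLOCK):
    """Strip PKCS#7 padding, raising ValueError if it is malformed."""
    if not data or len(data) % block:
        raise ValueError('padded data length is not a multiple of the block size')
    n = data[-1]
    if not 1 <= n <= block or data[-n:] != bytes([n]) * n:
        raise ValueError('invalid PKCS#7 padding')
    return data[:-n]


padded = pkcs7_pad(b'abc')
print(len(padded), padded[-1])  # 16 13
print(pkcs7_unpad(padded))      # b'abc'
```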
