What is a pad byte (x) in the python struct type? Is it an unsigned char with value 0, or what exactly does it look like / why is it one of the types that are available in the struct object?
For example, what would be the difference between doing one of the following:
>>> struct.pack('BB', 0, ord('a'))
b'\x00a'
>>> struct.pack('xB', ord('a'))
b'\x00a'
It's useful for matching required length of another system.
For example. In my work, there is a server that sends a fixed sized header and expects fixed sized messages. This guarantees that, lets say the first 20 bytes are a header, with bytes 0-8 being the message size.
It doesn't really matter what type the pad is. It's basically junk data. unsigned char 0 is a good choice though and the one that struct.pack uses.
I'm trying to send a float as a series of 4 bytes over serial.
I have code that looks like this which works:
ser.write(b'\xcd') #sending the byte representation of 0.1
ser.write(b'\xcc')
ser.write(b'\xcc')
ser.write(b'\x3d')
but I want to be able to send an arbitary float.
I also want to be able to go through each byte individually so this won't do for example:
bytes = struct.pack('f',float(0.1))
ser.write(bytes)
because I want to check each byte.
I'm using python 2.7
How can I do this?
You can use the struct module to pack the float as binary data. Then loop through each byte of the bytearray and write them to your output.
import struct
value = 13.37 # arbitrary float
bin = struct.pack('f', value)
for b in bin:
ser.write(b)
I'm using Python to communicate data over a TCP socket between a server and client application. I need to send a 4 bytes which represent a data sample. The initial sample is an 32-bit unsigned integer. How can I send those 4 bytes of raw data through the socket?
I want to send the data: 0x12345678 and 0xFEDCBA98
The raw data sent over the socket should be exactly that if I read it on wireshark/tcpdump/etc. I don't want each value in the 8 hex numbers to be represented as an ascii character, I want the raw data to remain intact.
Thank you
The main method to send binary data in Python is using the struct module.
For example, packing 3 4-byte unsigned integers is done like this
In [3]: struct.pack("III", 3, 4, 5)
Out[3]: '\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00'
Note to keep the endianess correct, using "<", ">", and so on.
you could use bytes(). Thats what I use to send strings over a network but you can use it for ints as well. Its usage is bytes([3])
Edit: bytes converts an int (3) to bytes. a string of bytes is represented as b'\x03'. so like your byte string: 0x12345678 would be b'\x12345678'
check the docs for more info:
https://docs.python.org/3.1/library/functions.html#bytes
i got an issue with structs not packing a string
i currently create a random 20 byte long string and when i try to pack this using structs in 20 octets by the code below
payload = struct.pack("H" * 20, *rendezvous_cookie)
rendezvous_cookie calculated by os.urandom(20)
i get the error struct.error: cannot convert argument to integer
is there any quick easy way of encoding the string so it can be packed this way?
Thanks
Edit managed to fix it by doing :
payload = struct.pack('!20s', rendezvous_cookie)
this way it takes the input as a string fine and is still of 20 octets
os.urandom(n) returns a random str of length n.
If you want to make a list of integers out of it, use:
[ord(b) for b in os.urandom(n)]
You can feed that as arguments to struct.pack.
Note, however, that os.urandom(n) already returns a serialized list of bytes. You may be able to use that directly. Using struct.pack("H", ...) makes each number occupy two bytes (one of which will hold no data).
I recently came across the dataType called bytearray in python. Could someone provide scenarios where bytearrays are required?
This answer has been shameless ripped off from here
Example 1: Assembling a message from fragments
Suppose you're writing some network code that is receiving a large message on a socket connection. If you know about sockets, you know that the recv() operation doesn't wait for all of the data to arrive. Instead, it merely returns what's currently available in the system buffers. Therefore, to get all of the data, you might write code that looks like this:
# remaining = number of bytes being received (determined already)
msg = b""
while remaining > 0:
chunk = s.recv(remaining) # Get available data
msg += chunk # Add it to the message
remaining -= len(chunk)
The only problem with this code is that concatenation (+=) has horrible performance. Therefore, a common performance optimization in Python 2 is to collect all of the chunks in a list and perform a join when you're done. Like this:
# remaining = number of bytes being received (determined already)
msgparts = []
while remaining > 0:
chunk = s.recv(remaining) # Get available data
msgparts.append(chunk) # Add it to list of chunks
remaining -= len(chunk)
msg = b"".join(msgparts) # Make the final message
Now, here's a third solution using a bytearray:
# remaining = number of bytes being received (determined already)
msg = bytearray()
while remaining > 0:
chunk = s.recv(remaining) # Get available data
msg.extend(chunk) # Add to message
remaining -= len(chunk)
Notice how the bytearray version is really clean. You don't collect parts in a list and you don't perform that cryptic join at the end. Nice.
Of course, the big question is whether or not it performs. To test this out, I first made a list of small byte fragments like this:
chunks = [b"x"*16]*512
I then used the timeit module to compare the following two code fragments:
# Version 1
msgparts = []
for chunk in chunks:
msgparts.append(chunk)
msg = b"".join(msgparts)
#Version 2
msg = bytearray()
for chunk in chunks:
msg.extend(chunk)
When tested, version 1 of the code ran in 99.8s whereas version 2 ran in 116.6s (a version using += concatenation takes 230.3s by comparison). So while performing a join operation is still faster, it's only faster by about 16%. Personally, I think the cleaner programming of the bytearray version might make up for it.
Example 2: Binary record packing
This example is an slight twist on the last example. Suppose you had a large Python list of integer (x,y) coordinates. Something like this:
points = [(1,2),(3,4),(9,10),(23,14),(50,90),...]
Now, suppose you need to write that data out as a binary encoded file consisting of a 32-bit integer length followed by each point packed into a pair of 32-bit integers. One way to do it would be to use the struct module like this:
import struct
f = open("points.bin","wb")
f.write(struct.pack("I",len(points)))
for x,y in points:
f.write(struct.pack("II",x,y))
f.close()
The only problem with this code is that it performs a large number of small write() operations. An alternative approach is to pack everything into a bytearray and only perform one write at the end. For example:
import struct
f = open("points.bin","wb")
msg = bytearray()
msg.extend(struct.pack("I",len(points))
for x,y in points:
msg.extend(struct.pack("II",x,y))
f.write(msg)
f.close()
Sure enough, the version that uses bytearray runs much faster. In a simple timing test involving a list of 100000 points, it runs in about half the time as the version that makes a lot of small writes.
Example 3: Mathematical processing of byte values
The fact that bytearrays present themselves as arrays of integers makes it easier to perform certain kinds of calculations. In a recent embedded systems project, I was using Python to communicate with a device over a serial port. As part of the communications protocol, all messages had to be signed with a Longitudinal Redundancy Check (LRC) byte. An LRC is computed by taking an XOR across all of the byte values.
Bytearrays make such calculations easy. Here's one version:
message = bytearray(...) # Message already created
lrc = 0
for b in message:
lrc ^= b
message.append(lrc) # Add to the end of the message
Here's a version that increases your job security:
message.append(functools.reduce(lambda x,y:x^y,message))
And here's the same calculation in Python 2 without bytearrays:
message = "..." # Message already created
lrc = 0
for b in message:
lrc ^= ord(b)
message += chr(lrc) # Add the LRC byte
Personally, I like the bytearray version. There's no need to use ord() and you can just append the result at the end of the message instead of using concatenation.
Here's another cute example. Suppose you wanted to run a bytearray through a simple XOR-cipher. Here's a one-liner to do it:
>>> key = 37
>>> message = bytearray(b"Hello World")
>>> s = bytearray(x ^ key for x in message)
>>> s
bytearray(b'm#IIJ\x05rJWIA')
>>> bytearray(x ^ key for x in s)
bytearray(b"Hello World")
>>>
Here is a link to the presentation
A bytearray is very similar to a regular python string (str in python2.x, bytes in python3) but with an important difference, whereas strings are immutable, bytearrays are mutable, a bit like a list of single character strings.
This is useful because some applications use byte sequences in ways that perform poorly with immutable strings. When you are making lots of little changes in the middle of large chunks of memory, as in a database engine, or image library, strings perform quite poorly; since you have to make a copy of the whole (possibly large) string. bytearrays have the advantage of making it possible to make that kind of change without making a copy of the memory first.
But this particular case is actually more the exception, rather than the rule. Most uses involve comparing strings, or string formatting. For the latter, there's usually a copy anyway, so a mutable type would offer no advantage, and for the former, since immutable strings cannot change, you can calculate a hash of the string and compare that as a shortcut to comparing each byte in order, which is almost always a big win; and so it's the immutable type (str or bytes) that is the default; and bytearray is the exception when you need it's special features.
If you look at the documentation for bytearray, it says:
Return a new array of bytes. The bytearray type is a mutable sequence of integers in the range 0 <= x < 256.
In contrast, the documentation for bytes says:
Return a new “bytes” object, which is an immutable sequence of integers in the range 0 <= x < 256. bytes is an immutable version of bytearray – it has the same non-mutating methods and the same indexing and slicing behaviors.
As you can see, the primary distinction is mutability. str methods that "change" the string actually return a new string with the desired modification. Whereas bytearray methods that change the sequence actually change the sequence.
You would prefer using bytearray, if you are editing a large object (e.g. an image's pixel buffer) through its binary representation and you want the modifications to be done in-place for efficiency.
Wikipedia provides an example of XOR cipher using Python's bytearrays (docstrings reduced):
#!/usr/bin/python2.7
from os import urandom
def vernam_genkey(length):
"""Generating a key"""
return bytearray(urandom(length))
def vernam_encrypt(plaintext, key):
"""Encrypting the message."""
return bytearray([ord(plaintext[i]) ^ key[i] for i in xrange(len(plaintext))])
def vernam_decrypt(ciphertext, key):
"""Decrypting the message"""
return bytearray([ciphertext[i] ^ key[i] for i in xrange(len(ciphertext))])
def main():
myMessage = """This is a topsecret message..."""
print 'message:',myMessage
key = vernam_genkey(len(myMessage))
print 'key:', str(key)
cipherText = vernam_encrypt(myMessage, key)
print 'cipherText:', str(cipherText)
print 'decrypted:', vernam_decrypt(cipherText,key)
if vernam_decrypt(vernam_encrypt(myMessage, key),key)==myMessage:
print ('Unit Test Passed')
else:
print('Unit Test Failed - Check Your Python Distribution')
if __name__ == '__main__':
main()