Sending concatenated fixed-length values with Python sockets

I implemented a simple socket in Python, for sending integer values via UDP:
socket_client.sendto(str(value).encode('utf-8'), (UDP_IP, UDP_PORT))
Now I would like to send two different values in a single packet, concatenated into a 64-bit payload.
The problem is that the converted and encoded values have no fixed length, so the receiver cannot figure out how to split the received payload.
My goal is to send a 64-bit payload that can be split into two 32-bit integers. Is that possible?

Binary encoding
Python integers have a built-in to_bytes method, which conveniently serialises integers to bytes objects of a given length.
You have to specify the byteorder and need to decide whether integers can be negative.
For example, this results in 32-bit big-endian unsigned representation (can represent 0..4294967295):
>>> x = 127
>>> b = x.to_bytes(4, 'big', signed=False)
>>> b
b'\x00\x00\x00\x7f'
You can simply concatenate two of those to get a 64-bit package.
NB: Send the bytes objects as such. Don't call str(...).encode() on them.
On the receiver side, you can use the int.from_bytes class method to parse a 4-byte string:
>>> int.from_bytes(b'\x00\x00\x00\x7f', 'big', signed=False)
127
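Putting the two steps together, a round trip might look roughly like the sketch below. The helper names pack_pair and unpack_pair are made up for illustration, and the sketch assumes both values fit in the unsigned 32-bit range.

```python
def pack_pair(a, b):
    """Concatenate two unsigned 32-bit integers into an 8-byte payload."""
    return a.to_bytes(4, 'big', signed=False) + b.to_bytes(4, 'big', signed=False)


def unpack_pair(payload):
    """Split an 8-byte payload back into two unsigned 32-bit integers."""
    if len(payload) != 8:
        raise ValueError('expected exactly 8 bytes')
    a = int.from_bytes(payload[:4], 'big', signed=False)
    b = int.from_bytes(payload[4:], 'big', signed=False)
    return a, b


payload = pack_pair(127, 4294967295)
print(payload)               # b'\x00\x00\x00\x7f\xff\xff\xff\xff'
print(unpack_pair(payload))  # (127, 4294967295)
```

The payload can be handed to sendto() as it is. Note that to_bytes raises OverflowError when a value does not fit in 4 bytes, so out-of-range input fails on the sender rather than being silently truncated.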
Encoding with decimal digits
Your post isn't entirely clear.
Your snippet contains str(value).encode('utf8'), which could suggest that you intend to send numbers encoded with ASCII digits.
This is also possible – simply pad your numbers with zeros or spaces:
>>> '{:04d}'.format(127)
'0127'
>>> '{: 4d}'.format(127)
' 127'
On the receiver, parse the 4-byte strings with int(...).
Note that this representation only allows for the range 0..9999.
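If you go the decimal route for two values, something like the following sketch could work. The field width of 10 digits is an assumption chosen so that any unsigned 32-bit value fits, and encode_pair and decode_pair are hypothetical names. Note that the payload is then 20 bytes rather than 8.

```python
WIDTH = 10  # assumed field width: 4294967295 has 10 decimal digits


def encode_pair(a, b):
    """Zero-pad both numbers to WIDTH digits and join them as ASCII."""
    return f'{a:0{WIDTH}d}{b:0{WIDTH}d}'.encode('ascii')


def decode_pair(payload):
    """Split the payload at the fixed width and parse each half."""
    text = payload.decode('ascii')
    return int(text[:WIDTH]), int(text[WIDTH:])


payload = encode_pair(127, 4294967295)
print(payload)               # b'00000001274294967295'
print(decode_pair(payload))  # (127, 4294967295)
```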

Related

How can I extract mixed binary and ascii values from a bytes string like I did in 2.x?

The following represents a binary image extracted from a file (spaces inserted between bytes to make reading easier). File is opened with 'rb' mode.
01 77 33 9F 41 42 43 44 00 11 11 11
In Python 2.7, I read it as a character string and I use ord() to extract the binary values and then I can extract or even search the string for a specific text value (such as the "ABCD" in characters 4-7). The binary bytes can be anything from 0-FF. I've been putting off conversion to python 3 partly because of this.
I need to be able, in Python 3, to treat a string of bytes as a mixture of binary and ascii (not unicode) values. The format is not fixed, it consists of data structures. For example, the 33 in byte 2 might be a record length that tells me where the start of the next record is. In other words, I can't just say that I know the text string is always in location 4.
I don't write the file, I just use it, so changing it is not an option.
I've seen lots of examples of using b' and other things to convert fixed strings but I need a way to intermix these values, extracting bytes, 2-byte to 8-byte values as 16-bit to 64-bit words, and extracting/searching for ASCII strings within the larger string.
The byte/character separation in Python 3 seems somewhat inflexible for what I need. I'm sure there's a way to do this I just haven't found an example or an answered question that seems to cover this case.
This is a simplified example, I can't provide real data (it's proprietary) but this illustrates the problem. The real files may be short (<1K) or huge (>100K), containing multiple records of different sizes.
Is there an easy, straightforward way to essentially replicate the functionality I have in Python 2.7?
This is on Windows.
Thanks

I need to be able, in Python 3, to treat a string of bytes as a mixture of binary and ascii (not unicode) values. The format is not fixed, it consists of data structures. For example, the 33 in byte 2 might be a record length that tells me where the start of the next record is. In other words, I can't just say that I know the text string is always in location 4.
Read the file in binary mode, as you are doing. This produces a bytes object, which in 3.x is not the same as a str (as it would be in 2.x).
Interpret the bytes as bytes, as needed, to figure out the general structure of the data. Slicing the bytes produces another bytes as before; indexing produces an int with the numeric value of that single byte (not as before) - no ord required.
When you have determined a subset of the bytes that represent a string (let's say for convenience that you have sliced it out), convert to string using the appropriate encoding: e.g. str(my_bytes, 'ascii'). Note that ASCII will not handle byte values 0x80 through 0xFF; especially with binary-ish legacy file formats, there's a good chance your data is actually something like Latin-1: str(my_bytes, 'iso-8859-1').
search the string for a specific text value
You can search at either the text or the byte level - bytes objects support the in operator, searching for either a subsequence of bytes or a single integer value. Whether it makes more sense to search before or after string conversion will depend on what you are doing.
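As a rough illustration of the points so far, here is a sketch using the sample bytes from the question. The interpretation of each field (for example, byte 2 as a record length) is only assumed for the sake of the example.

```python
# The sample bytes from the question; in practice this comes from
# open(path, 'rb').read().
data = bytes.fromhex('01 77 33 9F 41 42 43 44 00 11 11 11')

first_byte = data[0]              # indexing gives an int, no ord() needed
record_len = data[2]              # assumed to be a length field (0x33)
text = data[4:8].decode('ascii')  # slicing gives bytes; decode to get a str

print(first_byte)             # 1
print(hex(record_len))        # 0x33
print(text)                   # ABCD

# Searching at the byte level, before any decoding.
print(data.find(b'ABCD'))     # 4
print(b'ABCD' in data)        # True
print(0x9F in data)           # True
```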
using b' and other things to convert fixed strings
b'' is just the syntax for a literal bytes object. It's what you'll see if you ask for the repr of what you read from the file. Prefixing a b onto an existing string literal in your code isn't really "converting" anything, but replacing it with the value you should have had in the first place.
2-byte to 8-byte values as 16-bit to 64-bit words
The documentation says it at least as well as I could:
>>> help(int.from_bytes)
Help on built-in function from_bytes:
from_bytes(...) method of builtins.type instance
int.from_bytes(bytes, byteorder, *, signed=False) -> int
Return the integer represented by the given array of bytes.
The bytes argument must be a bytes-like object (e.g. bytes or bytearray).
The byteorder argument determines the byte order used to represent the
integer. If byteorder is 'big', the most significant byte is at the
beginning of the byte array. If byteorder is 'little', the most
significant byte is at the end of the byte array. To request the native
byte order of the host system, use `sys.byteorder' as the byte order value.
The signed keyword-only argument indicates whether two's complement is
used to represent the integer.
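For multi-byte words, a sketch along these lines may help. The offsets and byte orders are assumptions for illustration only, since the real file format is not given. struct.unpack_from is shown as an alternative that reads at an offset without slicing.

```python
import struct

data = bytes.fromhex('01 77 33 9F 41 42 43 44 00 11 11 11')

# 16-bit word from bytes 01 77, read in both byte orders.
word16_le = int.from_bytes(data[0:2], 'little')
word16_be = int.from_bytes(data[0:2], 'big')

# 64-bit word from the first eight bytes.
word64_be = int.from_bytes(data[0:8], 'big')

# 32-bit little-endian word starting at offset 8 (bytes 00 11 11 11).
(word32_le,) = struct.unpack_from('<I', data, 8)

print(hex(word16_le))  # 0x7701
print(hex(word16_be))  # 0x177
print(hex(word64_be))  # 0x177339f41424344
print(hex(word32_le))  # 0x11111100
```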

Proper way for converting to bigendian for network submission

I need to get an int through the network. Is this the proper way to convert to bytes in big-endian?
pack("I",socket.htonl(integer_value))
I unpack it as:
socket.ntohl(unpack("I",data)[0])
I noticed that pack-unpack also have the <> to use for endian conversion so I am not sure if I could just directly use that instead or if htonl is safer.

You should use only the struct module for communicating with another system. By using the htonl first, you'll end up with an indeterminate order being transmitted.
Since you need to convert the integer into a string of bytes in order to send it to another system, you'll need to use struct.pack (because htonl just returns a different integer than the one passed as argument and you cannot directly send an integer). And in using struct.pack you must choose an endianness for that string of bytes (if you don't specify one, you'll get a default ordering which may not be the same on the receiving side so you really need to choose one).
Converting an integer to a sequence of bytes in a definite order is exactly what struct.pack("!I", integer_value) does and a sequence of bytes in a definite order is exactly what you need on the receiving end.
On the other hand, if you use struct.pack("!I", socket.htonl(integer_value)), what does that do? Well, first it puts the integer into big-endian order (network byte order), then it takes your already big-endian integer and converts it to bytes in "big-endian order". But, on a little endian machine, that will actually reverse the ordering again, and you will end up transmitting the integer in little-endian byte order if you do both those two operations.
But on a big-endian machine htonl is a no-op, and then you're converting the result into bytes in big-endian order.
So adding htonl (or ntohl on the receiving side) actually defeats the purpose: the receiving machine would have to know the byte order used on the sending machine in order to decode the value properly. Observe:
Little-endian box:
>>> print(socket.htonl(27))
452984832
>>> print(struct.pack("!I", 27))
b'\x00\x00\x00\x1b'
>>> print(struct.pack("!I", socket.htonl(27)))
b'\x1b\x00\x00\x00'
Big-endian box:
>>> print(socket.htonl(27))
27
>>> print(struct.pack("!I", 27))
b'\x00\x00\x00\x1b'
>>> print(struct.pack("!I", socket.htonl(27)))
b'\x00\x00\x00\x1b'
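A complete round trip using only struct might look like the sketch below. The helper names encode_u32 and decode_u32 are hypothetical. The result is the same on little-endian and big-endian hosts because "!" fixes the byte order.

```python
import struct


def encode_u32(value):
    """Serialise an unsigned 32-bit integer in network (big-endian) order."""
    return struct.pack('!I', value)


def decode_u32(data):
    """Parse 4 bytes in network order back into an integer."""
    (value,) = struct.unpack('!I', data)
    return value


wire = encode_u32(27)
print(wire)              # b'\x00\x00\x00\x1b'
print(decode_u32(wire))  # 27
```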

struct.unpack() uses '!' in the format specifiers for network byte order, but it's the same as '>' (big-endian).

Sending hexadecimal or ASCII value stored in a variable using pyserial

I was trying to send a byte containing hex value over serial port using pyserial. The hex value has to be in a variable (so that I can do some manipulations before sending). Sample code will explain my intent:
import serial
com=serial.Serial('COM1')
a_var=64
a_var=a_var+1
com.write(a_var) #This of course throws error
I want to receive 'A' or 0x41 on the other side. I could send hex using
com.write(b'\x41')
but not using a variable. Converting it to string or character or encoding the string did not help. I am using python 3.5.
Thanks

First of all, the name you chose for your variable was not optimal: input is a built-in function, and you would shadow it.
There are many ways to put bytes into a variable:
to_send = b'A'
to_send = b'\x41'
to_send = bytes([65])
These show, in turn, an ASCII character literal, a hex escape sequence, and a list of integers.
Now send via
com.write(to_send)
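Applied to the integer variable from the question, the list-of-integers form could be used as in this sketch. The serial write is left as a comment so the snippet runs without a port, and com refers to the serial.Serial object from the question.

```python
a_var = 64
a_var = a_var + 1            # manipulate the value as an ordinary int

to_send = bytes([a_var])     # one byte with the value 0x41
print(to_send)               # b'A'

# Equivalent for a single byte: int.to_bytes with a length of 1.
print(a_var.to_bytes(1, 'big') == to_send)  # True

# com.write(to_send)         # com is the serial.Serial object from the question
```

bytes([...]) raises ValueError if any value falls outside 0..255, so an overflow after the arithmetic is caught before anything is written.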

bytearray can be used to send bytes (as hex or ascii). They are mutable, hence numerical manipulations are possible. Any number of bytes can be sent using it.
import serial
com=serial.Serial('COM2')
elements=[65,67,69,71] #Create an array of your choice
a_var=bytearray(elements) #Create byte array of it
com.write(a_var[0:3]) #Write desired elements at serial port
a_var[0]=a_var[0]+1 #Do your mathematical manipulation
com.write(a_var[0:1]) #Write again as desired
com.close()

Python: Send Integer value over socket

I'm using Python to communicate data over a TCP socket between a server and client application. I need to send 4 bytes which represent a data sample. The initial sample is a 32-bit unsigned integer. How can I send those 4 bytes of raw data through the socket?
I want to send the data: 0x12345678 and 0xFEDCBA98
The raw data sent over the socket should be exactly that if I read it on wireshark/tcpdump/etc. I don't want each value in the 8 hex numbers to be represented as an ascii character, I want the raw data to remain intact.
Thank you

The main method to send binary data in Python is using the struct module.
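For example, packing three 4-byte unsigned integers is done like this (Python 2 output from a little-endian machine; Python 3 prints a bytes literal with a b prefix):
In [3]: struct.pack("III", 3, 4, 5)
Out[3]: '\x03\x00\x00\x00\x04\x00\x00\x00\x05\x00\x00\x00'
Take care to keep the endianness correct by prefixing the format string with "<", ">", "!", and so on. Without a prefix, struct uses the machine's native byte order, size, and alignment.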
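For the two values in the question, a sketch could look like the following. It assumes network (big-endian) byte order, which is what makes the bytes appear in Wireshark in the same order as the hex digits are written. The socket call is shown only as a comment, with sock as a hypothetical connected TCP socket.

```python
import struct

# "!" selects network byte order; "I" is an unsigned 32-bit integer.
payload = struct.pack('!II', 0x12345678, 0xFEDCBA98)

print(len(payload))   # 8
print(payload.hex())  # 12345678fedcba98

# sock.sendall(payload)   # sock: a connected TCP socket (hypothetical name)

# The receiver reverses the operation once it has read 8 bytes.
first, second = struct.unpack('!II', payload)
print(hex(first), hex(second))  # 0x12345678 0xfedcba98
```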

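You could use bytes(). That's what I use to send strings over a network, but you can use it for ints as well. Its usage is bytes([3]).
Edit: bytes([3]) converts the list of byte values [3] to a bytes object, which is represented as b'\x03'. Note that bytes(3), without the list, produces three zero bytes instead. Each byte needs its own escape, so your value 0x12345678 would be b'\x12\x34\x56\x78' (b'\x12345678' would be the byte 0x12 followed by the ASCII characters "345678").
Check the docs for more info: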
https://docs.python.org/3.1/library/functions.html#bytes

Binary data with pyserial(python serial port)

serial.write() method in pyserial seems to only send string data. I have arrays like [0xc0,0x04,0x00] and want to be able to send/receive them via the serial port? Are there any separate methods for raw I/O?
I think I might need to change the arrays to ['\xc0','\x04','\x00'], still, null character might pose a problem.

An alternative method, without using the array module:
def a2s(arr):
    """Array of integer byte values --> binary string (Python 2 str).

    In Python 3, bytes(arr) does the same job and returns bytes.
    """
    return ''.join(chr(b) for b in arr)

You need to convert your data to a string
"\xc0\x04\x00"
Null characters are not a problem in Python: strings are not null-terminated, so the zero byte "\x00" behaves just like any other byte.
One way to do this (Python 2; in Python 3 use tobytes(), which returns a bytes object, since tostring() was removed in Python 3.9):
>>> import array
>>> array.array('B', [0xc0, 0x04, 0x00]).tostring()
'\xc0\x04\x00'
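In Python 3 the conversion can be done with the built-in bytes type alone, as in the sketch below. The serial call is commented out so the snippet runs without a port, and ser is a hypothetical, already opened serial.Serial object.

```python
values = [0xc0, 0x04, 0x00]

# A list of ints in 0..255 converts directly to a bytes object.
packet = bytes(values)
print(packet)        # b'\xc0\x04\x00'

# ser.write(packet)  # ser: an open serial.Serial instance (hypothetical)

# Received bytes convert back to a list of ints just as easily.
print(list(packet))  # [192, 4, 0]
```

The zero byte is carried through unchanged, so the null character is not a problem here either.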

I faced a similar (but arguably worse) issue, having to send control bits through a UART from a Python script to test an embedded device. My data definition was "field1: 8 bits, field2: 3 bits, field3: 7 bits", etc. It turns out you can build a robust and clean interface for this using the BitArray class from the bitstring library. Here's a snippet (minus the serial set-up):
from bitstring import BitArray
cmdbuf = BitArray(length = 50) # length is in bits: a 50-bit BitArray (use length=50*8 for 50 bytes)
cmdbuf.overwrite('0xAA', 0) # Init the marker byte at the head
Here's where it gets flexible. The command below replaces the 4 bits at
bit position 23 with the 4 bits passed. Note that it took a binary
bit value, given in string form. I can set/clear any bits at any location
in the buffer this way, without having to worry about stepping on
values in adjacent bytes or bits.
cmdbuf.overwrite('0b0110', 23)
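# To send on the (previously opened) serial port, convert to bytes first;
# tobytes() pads the final byte with zero bits if the length is not a multiple of 8
ser.write( cmdbuf.tobytes() )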
