Using bytearray.translate() with a table - python

I'm trying to remove certain characters from a bytearray (specifically, certain control characters that are messing up my formatting).
I manually listed individual translates and it worked, but can I format this as a single translate?
In the string variant, the input can be a dictionary table. But when I tried this, I got an error saying the parameter must be a bytearray object.
translation_table_0A = bytes.maketrans(b"\x0A", b"\x00")
translation_table_0B = bytes.maketrans(b"\x0B", b"\x00")
translation_table_0C = bytes.maketrans(b"\x0C", b"\x00")
translation_table_0D = bytes.maketrans(b"\x0D", b"\x00")
translation_table_04 = bytes.maketrans(b"\x04", b"\x00")
test_bytes = bytearray(b"\x75\x66\x73\x62\x0D\x73\x62\x0B\x00\x74\xF1\x74\x73\x62\x61\x76\x00\x0C\x76\x02\x04\x01\x62\x68\x72\x74\x00\x00\x00\x0A\x01\x00")
out_list = test_bytes.translate(translation_table_0A) # remove \x0A
out_list = out_list.translate(translation_table_0B) # remove \x0B
out_list = out_list.translate(translation_table_0C) # remove \x0C
out_list = out_list.translate(translation_table_0D) # remove \x0D
out_list = out_list.translate(translation_table_04) # remove \x04
print(f"Output coded: {out_list}")
print(f"Output decoded: {out_list.decode('mac-roman')}")
I would think it would work like this:
translate_dict = {b"\x0A" : b"\x00", b"\x0B" : b"\x00", b"\x0C" : b"\x00", b"\x0D" : b"\x00", b"\x04" : b"\x00", }
out_list = test_bytes.translate(translate_dict) # remove Control Chars
But it doesn't. Does anyone know how to get this working?
Unfortunately the documentation is lacking in details:
bytes
bytes maketrans()
bytes methods
bytes translate()
From the maketrans method, a table can be generated, but 'from' and 'to' must be bytes-like objects, so tuples, lists, or dictionaries won't work.
note: Not interested in regex solutions, or other libraries. Specifically looking for this application.

If you want a bytes translation table, you get a 256-byte mapping that is indexed by the source byte value: byte value n is translated to the nth byte in the mapping. You don't have to set up 5 different translation tables to translate 5 bytes; you can do it like this:
>>> translation_table = bytes.maketrans(b"\x0A\x0B\x0C\x0D\x04", b"\x00\x00\x00\x00\x00")
That will let you change the unwanted byte values to \x00 like this:
>>> test_bytes=bytearray(b"\x75\x66\x73\x62\x0D\x73\x62\x0B\x00\x74\xF1\x74\x73\x62\x61\x76\x00\x0C\x76\x02\x04\x01\x62\x68\x72\x74\x00\x00\x00\x0A\x01\x00")
>>> test_bytes.translate(translation_table)
bytearray(b'ufsb\x00sb\x00\x00t\xf1tsbav\x00\x00v\x02\x00\x01bhrt\x00\x00\x00\x00\x01\x00')
which does not look exactly like test_bytes with 5 byte values changed, because the default representation of a printable character in a bytestring is the printable character and not the hex escape. You can see this if you ask for test_bytes back:
>>> test_bytes
bytearray(b'ufsb\rsb\x0b\x00t\xf1tsbav\x00\x0cv\x02\x04\x01bhrt\x00\x00\x00\n\x01\x00')
Here sequences such as tsbav and bhrt appear as printable characters and not as hex escapes. But it is only the representation that differs.
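As a quick check that only the representation differs, here is a minimal sketch (the variable names are illustrative, not from the question) comparing the two spellings of the same bytes:

```python
# Two spellings of the same four bytes: hex escapes vs. printable characters.
escaped = b"\x75\x66\x73\x62"
printable = b"ufsb"

print(escaped == printable)   # the values are identical
print(escaped.hex())          # hex view of the same bytes
print(list(printable))        # integer view of the same bytes
```

The hex and integer views can be handy when the default repr makes it hard to see which bytes changed.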
If you are working with bytes, you can't use a dictionary as a translation table. In Python 3, where strings are Unicode, a 256-byte mapping table won't work, because there are 1,114,112 possible codepoints that the table might need to translate. So for strings, translate() uses a dict instead. While efficient, a dict can't match a 256-byte character map for efficiency. So bytes.maketrans() makes a 256-byte character map, but str.maketrans() makes a dict, and the corresponding translate() methods expect the corresponding kind of translation table.
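Since the question talks about removing bytes rather than replacing them with \x00, it may be worth noting that translate() on bytes and bytearray also accepts a second argument listing bytes to delete, and that None can be passed as the table when no remapping is wanted. The sketch below uses a shortened sample (an assumption for brevity, not the full test_bytes) and contrasts the two kinds of table:

```python
# Shortened sample data (an assumption for brevity, not the full test_bytes above).
sample = bytearray(b"ufsb\x0Dsb\x0B\x00t\x0A")
control = b"\x0A\x0B\x0C\x0D\x04"

# bytes.maketrans() builds a 256-byte table indexed by byte value.
table = bytes.maketrans(control, b"\x00" * len(control))
replaced = sample.translate(table)          # control bytes become \x00

# With None as the table, the second argument lists bytes to drop entirely.
deleted = sample.translate(None, control)

# str.maketrans() builds a dict keyed by code point instead.
str_table = str.maketrans({"\n": None, "\r": None})

print(len(table), replaced, deleted, str_table)
```

Whether replacing or deleting is the right choice depends on whether the positions of the remaining bytes matter to whatever reads the data afterwards.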

Related

deleting escape characters python

temp = str(read_temp())
### temp is 29.12
temp = binascii.hexlify(temp)
### now temp is 32392e3132
n = 2
ta = [temp[i:i+n] for i in range(0, len(temp), n)]
### now ta[0]=32 ta[1]=39 ta[2]=2e ta[3]=31 ta[4]=32
print(type(ta[0]))
data_send = r'\x00\x00\x00\x00\x'+ta[0]+r'\x'+ta[1]+r'\x'+ta[2]+r'\x'+ta[3]+r'\x'+ta[4]
data_send = literal_eval("'%s'" %data_send) # that can be delete
yield Task(self.send, data_send)
Hi, python version=2.7.1.6
I read the temperature, for example 29.22 °C. I want to add this temperature value to data_send as ASCII codes. Then I will send the data using a Tornado web server over the IEC 104 protocol.
When I print the data, the result is '\x00\x00\x00\x0028.87'. I want the data to look like '\x00\x00\x00\x00\x32\x38\x2e\x38\x37'. But the result comes out like this: \\x00\\x00\\x00\\x00\\x32\\x38\\x2e\\x38\\x37
I want to delete the extra escaping character \
Please help me
You're using r-prefixed strings (raw strings). Within raw strings, any backslashes are interpreted literally, not as an escape character. If you want a string in which each character has the actual hex value you're encoding, like '\x00' for 0, remove the r prefix from the string.
Then, when printing the string, use the repr function to reverse the encoding (i.e. to see the escape sequences used):
>>> s = b"\x61\x00\x12"
>>> print(repr(s))
b'a\x00\x12'
Note that any hex value that corresponds to a printable character (like x61 above) will be shown as the actual character (a in this case), instead of the escape sequence.
The string will contain the actual values encoded with a hex escape sequence:
>>> print(*s)
97 0 18
If you just want a string of literal escape sequences, regardless of whether the character is printable or not, you'll have to do it manually.
Given a list of numbers you want to encode as hex sequences,
nums = [97, 0, 18]
you can do
escaped = ''.join(r'\x{:02x}'.format(num) for num in nums)
(in the format specification, 0 is the fill character, 2 is the width, and x indicates hexadecimal). Now, if you print escaped, you will see a string of escape sequences:
>>> print(escaped)
\x61\x00\x12
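To make the difference between raw and ordinary string literals concrete, here is a small sketch (the names raw and cooked are illustrative) comparing the two:

```python
raw = r"\x00\x32"      # raw string: backslash, 'x', '0', '0', ... kept as literal characters
cooked = "\x00\x32"    # normal string: two characters, NUL and '2'

print(len(raw), len(cooked))
print(repr(raw))       # repr doubles the backslashes because they are real characters
print(repr(cooked))    # repr shows '2' as itself because it is printable
```

The doubled backslashes seen in the question are the repr of a string that really does contain backslash characters, which is what the r prefix produces.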
If you need to send a temperature as plain text characters after four null characters, this will work:
temp = str(read_temp())
data_send = b'\x00\x00\x00\x00' + temp.encode('ascii')
yield Task(self.send, data_send)
Also, just:
print(b'\x00\x00\x00\x00' + '28.87'.encode('ascii'))
Result:
b'\x00\x00\x00\x0028.87'
Which is exactly what you need, i.e. a string of bytes, four chr(0) followed by a chr(0x32), chr(0x38), chr(0x2e), chr(0x38) and chr(0x37).
Unless of course the service somehow expects a Python string representation of the data, which would be more than a bit odd, but not impossible.
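The claim that both spellings denote the same bytes can be checked directly. This is a sketch assuming Python 3, with a fixed string standing in for str(read_temp()), which is not defined here:

```python
temp = "28.87"  # stand-in for str(read_temp()); read_temp() is not defined here
data_send = b"\x00\x00\x00\x00" + temp.encode("ascii")

# The printable form and the all-hex form are the same nine bytes.
print(data_send == b"\x00\x00\x00\x00\x32\x38\x2e\x38\x37")
print(data_send.hex())
print(len(data_send))
```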

How to replace 'b' with 0 or 1 in the binary representation of a string

import binascii
a = input('enter the message')
def str2bin(message):
    binary = bin(int(binascii.hexlify(message.encode("ascii")), 16))
    return binary[1:]
print(str2bin(a))
Input string : hai
Output : b11010000110000101101001
How to remove or replace the 'b' from the output and replace it with another binary digit ?
Python strings cannot be changed after they have been created; they are immutable. You will have to create a new string, combining the digit and a substring of the original string, like this:
data = str2bin(a)
data0 = "0" + data[1:]
data1 = "1" + data[1:]
[1:] is a slice. In this case, it makes a copy of the string with the first character (at index 0) removed.
The bin function isn't suitable for this task. Not only does it give you that unwanted 'b', it also removes leading zeros, so the encoded bit strings vary in length, making them difficult to decode correctly. Instead, you can use the format function or method, and specify the bit length so no leading zeros are lost.
In Python 3, binascii.hexlify isn't required, we can get the necessary integers directly from the bytes object. The code below ensures that the bit string for each byte has exactly 8 bits, padding with zeros on the left when necessary. It uses the default UTF-8 encoding, but you can change that to 'ascii' if you want. Both encodings give the same result if the input string is pure ASCII, but 'utf8' handles any Unicode. Of course, for characters outside the ASCII range a single character will be encoded as 2 or more bytes.
s = 'hai'
bits = ''.join([format(u, '08b') for u in s.encode()])
print(bits)
output
011010000110000101101001
If you have Python 3.6+, you can do this using the more compact (and faster) f-string syntax:
bits = ''.join([f'{u:08b}' for u in s.encode()])
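The reason the fixed 8-bit width matters shows up when decoding. The following sketch (the names decoded and short are illustrative) turns the bit string back into text and shows what bin() alone would have produced:

```python
s = 'hai'
bits = ''.join(format(b, '08b') for b in s.encode())

# Split into 8-bit chunks and rebuild the original bytes.
decoded = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode()
print(bits, decoded)

# bin() alone drops the leading zero of 'h' (01101000), so the length
# is no longer a multiple of 8.
short = bin(int.from_bytes(s.encode(), 'big'))[2:]
print(short, len(short))
```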

Converting Byte to String and Back Properly in Python3?

Given a random byte (i.e. not only numbers/characters!), I need to convert it to a string and then back to the initial byte without losing information. This seems like a basic task, but I ran into the following problems:
Assuming:
rnd_bytes = b'w\x12\x96\xb8'
len(rnd_bytes)
prints: 4
Now, converting it to a string. Note: I need to set backslashreplace, as it otherwise raises a 'UnicodeDecodeError' or would lose information if set to another flag value.
my_str = rnd_bytes.decode('utf-8' , 'backslashreplace')
Now, I have the string.
I want to convert it back to exactly the original byte (size 4!):
According to Python resources and this answer, there are different possibilities:
conv_bytes = bytes(my_str, 'utf-8')
conv_bytes = my_str.encode('utf-8')
But len(conv_bytes) returns 10.
I tried to analyse the outcome:
>>> repr(rnd_bytes)
"b'w\\x12\\x96\\xb8'"
>>> repr(my_str)
"'w\\x12\\\\x96\\\\xb8'"
>>> repr(conv_bytes)
"b'w\\x12\\\\x96\\\\xb8'"
It would make sense to replace '\\\\'. my_str.replace('\\\\','\\') doesn't change anything. Probably, because four backslashes represent only two. So, my_str.replace('\\','\') would find the '\\\\', but leads to
SyntaxError: EOL while scanning string literal
due to the last argument '\'. This had been discussed here, where the following suggestion came up:
>>> my_str2=my_str.encode('utf_8').decode('unicode_escape')
>>> repr(my_str2)
"'w\\x12\\x96¸'"
This replaces the '\\\\' but seems to add / change some other characters:
>>> conv_bytes2 = my_str2.encode('utf8')
>>> len(conv_bytes2)
6
>>> repr(conv_bytes2)
"b'w\\x12\\xc2\\x96\\xc2\\xb8'"
There must be a proper way to convert a (complex) byte to a string and back. How can I achieve that?
Note: Some codes found on the Internet.
You could try to convert it to hex format. Then it is easy to convert it back to byte format.
Sample code to convert bytes to string:
hex_str = rnd_bytes.hex()
Here is what 'hex_str' looks like:
'771296b8'
And code for converting it back to bytes:
new_rnd_bytes = bytes.fromhex(hex_str)
The result is:
b'w\x12\x96\xb8'
For processing you can use:
readable_str = ''.join(chr(int(hex_str[i:i+2], 16)) for i in range(0, len(hex_str), 2))
But never try to encode the readable string. Here is what the readable string looks like:
'w\x12\x96¸'
After processing the readable string, convert it back to hex format before converting it back to a bytes string, like this (each character is formatted as exactly two hex digits so that values below 0x10 keep their leading zero):
hex_str = ''.join(format(ord(c), '02x') for c in readable_str)
Now, converting it to a string. Note: I need to set backslashreplace, as it otherwise raises a 'UnicodeDecodeError' or would lose information if set to another flag value.
The UTF-8 encoding cannot interpret every possible sequence of bytes as a string. Using backslashreplace gives you a string that preserves the information for bytes that couldn't be converted:
>>> rnd_bytes = b'w\x12\x96\xb8'
>>> rnd_bytes.decode('utf-8', 'backslashreplace')
'w\x12\\x96\\xb8'
but that representation is not very useful for converting back.
Instead, use an encoding that does interpret every possible sequence of bytes as a string. The most straightforward of these is ISO-8859-1, which simply maps each byte one at a time to the first 256 Unicode code points respectively.
>>> rnd_bytes.decode('iso-8859-1')
'w\x12\x96¸'
>>> rnd_bytes.decode('iso-8859-1').encode('iso-8859-1') == rnd_bytes
True
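To check that these conversions are lossless for every possible byte value and not just for the sample, here is a minimal sketch. The surrogateescape variant at the end is an addition not mentioned in the answers above: it is a standard error handler that keeps undecodable bytes recoverable, but the resulting string contains lone surrogates and cannot be encoded without the same handler.

```python
all_bytes = bytes(range(256))

# ISO-8859-1 maps byte n to code point n, so every byte round-trips.
text = all_bytes.decode('iso-8859-1')
print(text.encode('iso-8859-1') == all_bytes)

# Hex is another lossless text form.
print(bytes.fromhex(all_bytes.hex()) == all_bytes)

# Alternative (not from the answers above): UTF-8 with 'surrogateescape'.
rnd_bytes = b'w\x12\x96\xb8'
as_text = rnd_bytes.decode('utf-8', 'surrogateescape')
restored = as_text.encode('utf-8', 'surrogateescape')
print(restored == rnd_bytes, len(restored))
```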

What does Python string.maketrans("","") do?

string.maketrans("","")
gives
\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13
\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~
\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90
\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2
\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4
\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9
\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde
\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed
\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff
What does this mean?
And how does it help in removing punctuation in a string with the following call:
import string
myStr.translate(string.maketrans("",""), string.punctuation)
I'll take some liberties, since Python 2 muddles the line between strings and bytes. There are 256 bytes, ranging from 0 to 255. You can get their byte representation by using chr(). So, all the bytes from 0 to 255 look like this:
>>> ''.join(map(chr, range(256)))
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\
x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;
<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80
\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93
\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6
\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9
\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc
\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf
\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2
\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
string.maketrans(from, to) creates a string of 256 characters, where the characters in from will be replaced by to. For example, string.maketrans('ab01', 'AB89') will return the string from above, but a will be replaced by A, b by B, 0 by 8 and 1 by 9.
>>> string.maketrans('ab01', 'AB89')
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\
x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./8923456789:;
<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`ABcdefghijklmnopqrstuvwxyz{|}~\x7f\x80
\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93
\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6
\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9
\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc
\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf
\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2
\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
Effectively, string.maketrans('', '') == ''.join(map(chr, range(256))).
This serves as a map which, when provided to str.translate(), can be used to replace multiple characters in one pass over your string. For the example map above, all characters remain the same, except that every a turns into A, b into B, etc. If you do myStr.translate(string.maketrans('', '')), you simply don't change anything in myStr.
Finally, translate() has one additional argument, deletechars. If you pass a string for that argument, translate() will translate all characters according to the mapping you provide, but it will drop any characters that appear in deletechars. So, putting it all together, myStr.translate(string.maketrans('', ''), string.punctuation) does not change any character in the string, but in the process drops every character in string.punctuation. Effectively, you have removed the punctuation from the output string.
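The call above is Python 2 only. For comparison, a sketch of the same punctuation removal in Python 3, where str.maketrans() takes the characters to delete as its third argument (the sample string is made up for illustration):

```python
import string

my_str = "Hello, world! (It's a test.)"   # illustrative sample text

# Python 3: the third argument of str.maketrans() lists characters to delete.
table = str.maketrans('', '', string.punctuation)
cleaned = my_str.translate(table)
print(cleaned)
```

Here the table is a dict mapping each punctuation code point to None, rather than a 256-character string.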
string.maketrans(intab, outtab) returns a translation table that maps each character in the intab string into the character at the same position in the outtab string.
tran_table = string.maketrans(intab, outtab)
print myStr.translate(tran_table)
The code above will then translate myStr using your created table. In your case the table generates all characters because you do not specify anything.
Python 2.7's string.maketrans() returns a byte string, like your result, which can be used with string.translate().
string.translate(s, table) translates each character c in s into table[ord(c)]. So \x00 is translated into table[0], and so on. In your case, it's just returning an identity table.
Note that string.translate is deprecated in Python 2.7; from Python 3.1 onwards, these functions are replaced by bytes.maketrans(), bytes.translate(), and the corresponding methods for str and bytearray.
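The table[ord(c)] rule can be reproduced by hand with the Python 3 replacements. This is a sketch using bytes, where iterating already yields integers so no ord() call is needed (the names identity, manual and data are illustrative):

```python
table = bytes.maketrans(b'ab01', b'AB89')
identity = bytes.maketrans(b'', b'')

data = b'abc012'
manual = bytes(table[b] for b in data)   # look each byte up in the table

print(identity == bytes(range(256)))     # empty arguments give the identity table
print(manual == data.translate(table), manual)
```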

Python: Converting HEX string to bytes

I'm trying to make byte frame which I will send via UDP. I have class Frame which has attributes sync, frameSize, data, checksum etc. I'm using hex strings for value representation. Like this:
testFrame = Frame("AA01","0034","44853600","D43F")
Now, I need to concatenate these hex values and convert them to a byte array, like this:
from base64 import b16decode  # needed for b16decode below

def convertToBits(self):
    stringMessage = self.sync + self.frameSize + self.data + self.chk
    return b16decode(stringMessage)
But when I print the converted value I don't get the same values, or I don't know how to read Python notation correctly:
This is sync: AA01
This is frame size: 0034
This is data:44853600
This is checksum: D43F
b'\xaa\x01\x004D\x856\x00\xd4?'
So, the first word is converted OK (AA01 -> \xaa\x01) but the second (0034 -> \x004D) is not the same. I tried to use bytearray.fromhex because I can use spaces between bytes, but I got the same result. Can you help me send the same hex words via UDP?
Python displays any byte that can represent a printable ASCII character as that character. 4 is the same as \x34, but Python opted to print the ASCII character in the representation.
So \x004 is really the same as \x00\x34, D\x856\x00 is the same as \x44\x85\x36\x00, and \xd4? is the same as \xd4\x3f, because:
>>> b'\x34'
b'4'
>>> b'\x44'
b'D'
>>> b'\x36'
b'6'
>>> b'\x3f'
b'?'
This is just the representation of the bytes value; the value is entirely correct and you don't need to do anything else.
If it helps, you can visualise the bytes values as hex again using binascii.hexlify():
>>> import binascii
>>> binascii.hexlify(b'\xaa\x01\x004D\x856\x00\xd4?')
b'aa01003444853600d43f'
and you'll see that 4, D, 6 and ? are once again represented by the correct hexadecimal characters.
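In Python 3 the same frame can also be built and inspected without base64 or binascii. This is a sketch using the four field values from the question (the list name fields is illustrative):

```python
# sync, frame size, data, checksum as given in the question
fields = ["AA01", "0034", "44853600", "D43F"]
frame = bytes.fromhex("".join(fields))

print(frame)                 # default repr mixes escapes and printable characters
print(frame.hex())           # hex view of the same bytes
print(list(frame[2:4]))      # the frame-size word as integers: 0x00, 0x34
```

The bytes passed to the UDP socket are the ones shown by frame.hex(); the mixed notation only affects how they are printed.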
