What does Python string.maketrans("","") do?

string.maketrans("","")
gives
\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13
\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~
\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90
\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2
\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4
\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9
\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde
\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed
\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff
What does this mean?
And how does it help in removing punctuation in a string with the following call:
import string
myStr.translate(string.maketrans("",""), string.punctuation)

I'll take some liberties, since Python 2 muddles the line between strings and bytes. There are 256 possible byte values, ranging from 0 to 255. You can get the one-character string for each value by using chr(). So, all the bytes from 0 to 255 look like this:
>>> ''.join(map(chr, range(256)))
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\
x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;
<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80
\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93
\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6
\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9
\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc
\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf
\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2
\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
string.maketrans(from, to) creates a string of 256 characters in which each character in from is replaced by the character at the same position in to. For example, string.maketrans('ab01', 'AB89') will return the string from above, but a will be replaced by A, b by B, 0 by 8 and 1 by 9.
>>> string.maketrans('ab01', 'AB89')
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\
x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./8923456789:;
<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`ABcdefghijklmnopqrstuvwxyz{|}~\x7f\x80
\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93
\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6
\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9
\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc
\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf
\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2
\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
Effectively, string.maketrans('', '') == ''.join(map(chr, range(256))).
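The identity claim can be checked directly. This is a minimal sketch that assumes Python 3, where bytes.maketrans() plays the role of Python 2's string.maketrans(); the variable names are only illustrative.

```python
# Python 3 sketch: bytes.maketrans() is the counterpart of Python 2's string.maketrans().
identity = bytes.maketrans(b"", b"")

# With empty arguments the table maps every byte value to itself.
is_identity = identity == bytes(range(256))

table = bytes.maketrans(b"ab01", b"AB89")

# Collect only the positions where the table differs from the identity table.
changed = {chr(i): chr(table[i]) for i in range(256) if table[i] != i}

print(is_identity)
print(changed)
```

Only the four entries named in the call differ from the identity table; the other 252 entries are untouched.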
This table serves as a map: when you pass it to str.translate(), it replaces multiple characters in one pass over your string. For the example map above, all characters remain the same, except that every a turns into A, b into B, etc. If you do myStr.translate(string.maketrans('', '')), you simply don't change anything in myStr.
Finally, translate() has one additional argument, deletechars. If you pass a string for that argument, translate() deletes every character found in deletechars and translates the remaining characters according to the mapping you provide. So, putting it all together, myStr.translate(string.maketrans('', ''), string.punctuation) does not change any character in the string, but in the process drops every character in string.punctuation. Effectively, you have removed the punctuation in the output string.
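For readers on Python 3, where the two-argument str.translate() call from the question no longer exists, the same idea might look like the sketch below. It assumes Python 3; my_str is a made-up sample value.

```python
import string

my_str = "Hello, world! (It's 5 o'clock.)"  # sample input, not from the question

# str version: the third argument of str.maketrans() lists characters to delete.
no_punct = my_str.translate(str.maketrans("", "", string.punctuation))

# bytes version: bytes.translate() takes the bytes to delete as its second argument;
# None as the table means "leave the remaining bytes unchanged".
no_punct_bytes = my_str.encode("ascii").translate(None, string.punctuation.encode("ascii"))

print(no_punct)
print(no_punct_bytes)
```

In both variants the mapping part is an identity mapping and only the delete list does any work, which is exactly the role string.maketrans("", "") plays in the Python 2 call.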

string.maketrans(intab, outtab) returns a translation table that maps each character in the intab string into the character at the same position in the outtab string.
tran_table = string.maketrans(intab, outtab)
print myStr.translate(tran_table)
The code above will then translate myStr using the table you created. In your case both arguments are empty, so the table maps every character to itself.

Python 2.7's string.maketrans() returns a 256-character byte string, like your result, which can be used with string.translate().
string.translate(s, table) translates each character c in s into table[ord(c)]. So \x00 is translated into table[0], and so on. In your case, it's just returning an identity table.
It should be noted that string.translate is deprecated in Python 2.7, and in Python 3.1 and onwards, both functions are replaced by bytes.maketrans(), bytes.translate(), and the corresponding methods for str and bytearray.
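The table[ord(c)] lookup described above can be written out by hand. This is a sketch for illustration only, assuming Python 3 bytes (where iterating already yields integers, so ord() is not needed); translate_by_hand is a hypothetical helper, not a library function.

```python
def translate_by_hand(data: bytes, table: bytes) -> bytes:
    """Replace every byte b in data with table[b]; table must have 256 entries."""
    if len(table) != 256:
        raise ValueError("translation table must be 256 bytes long")
    return bytes(table[b] for b in data)

table = bytes.maketrans(b"ab01", b"AB89")
sample = b"abc 0123"

by_hand = translate_by_hand(sample, table)
built_in = sample.translate(table)

print(by_hand, built_in)
```

Both calls give the same result; the built-in method simply does the lookup in C.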

Related

Using bytearray.translate() with a table

I'm trying to remove certain characters from a bytearray (specifically, certain control characters that are messing up my formatting)
I manually listed individual translates and it worked, but can I format this as a single translate?
In the string variant, the table can be a dictionary. But when I tried this, I got an error saying that the parameters must be a bytearray object.
translation_table_0A = bytes.maketrans(b"\x0A", b"\x00")
translation_table_0B = bytes.maketrans(b"\x0B", b"\x00")
translation_table_0C = bytes.maketrans(b"\x0C", b"\x00")
translation_table_0D = bytes.maketrans(b"\x0D", b"\x00")
translation_table_04 = bytes.maketrans(b"\x04", b"\x00")
test_bytes = bytearray(b"\x75\x66\x73\x62\x0D\x73\x62\x0B\x00\x74\xF1\x74\x73\x62\x61\x76\x00\x0C\x76\x02\x04\x01\x62\x68\x72\x74\x00\x00\x00\x0A\x01\x00")
out_list = test_bytes.translate(translation_table_0A) # remove \x0A
out_list = out_list.translate(translation_table_0B) # remove \x0B
out_list = out_list.translate(translation_table_0C) # remove \x0C
out_list = out_list.translate(translation_table_0D) # remove \x0D
out_list = out_list.translate(translation_table_04) # remove \x04
print(f"Output coded: {out_list}")
print(f"Output decoded: {out_list.decode('mac-roman')}")
I would think it would work like this:
translate_dict = {b"\x0A" : b"\x00", b"\x0B" : b"\x00", b"\x0C" : b"\x00", b"\x0D" : b"\x00", b"\x04" : b"\x00", }
out_list = test_bytes.translate(translate_dict) # remove Control Chars
But it doesn't. Does anyone know how to get this working?
Unfortunately the documentation is lacking in details:
bytes
bytes maketrans()
bytes methods
bytes translate()
From the maketrans method, a table can be generated, but 'from' and 'to' must be bytes-like objects, so tuples, lists, or dictionaries won't work.
note: Not interested in regex solutions, or other libraries. Specifically looking for this application.
If you want a bytes translation table, you get a 256-byte mapping that is indexed by the source byte value and returns the byte stored at that position. You don't have to set up five different translation tables to translate five bytes; you can do it like this:
>>> translation_table = bytes.maketrans(b"\x0A\x0B\x0C\x0D\x04", b"\x00\x00\x00\x00\x00")
That will let you change the unwanted byte values to \x00 like this:
>>> test_bytes=bytearray(b"\x75\x66\x73\x62\x0D\x73\x62\x0B\x00\x74\xF1\x74\x73\x62\x61\x76\x00\x0C\x76\x02\x04\x01\x62\x68\x72\x74\x00\x00\x00\x0A\x01\x00")
>>> test_bytes.translate(translation_table)
bytearray(b'ufsb\x00sb\x00\x00t\xf1tsbav\x00\x00v\x02\x00\x01bhrt\x00\x00\x00\x00\x01\x00')
which does not look exactly like test_bytes with five byte values changed, because the default representation of a printable character in a bytestring is the printable character and not the hex escape. You can see this if you ask for test_bytes back:
>>> test_bytes
bytearray(b'ufsb\rsb\x0b\x00t\xf1tsbav\x00\x0cv\x02\x04\x01bhrt\x00\x00\x00\n\x01\x00')
Here sequences such as tsbav and bhrt appear as printable characters and not as hex escapes. But it is only the representation that differs.
If you are working with bytes, you can't use a dictionary as a translation table. In Python 3, where strings are Unicode, a 256-byte mapping table won't work, because there are 1,114,112 possible codepoints that the table might need to translate. So for strings, translate() uses a dict instead. While efficient, a dict can't match a 256-byte character map for efficiency. So bytes.maketrans() makes a 256-byte character map, but str.maketrans() makes a dict, and the corresponding translate() methods expect the corresponding kind of translation table.
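If the goal is to drop the control bytes entirely rather than turn them into \x00, the delete argument of translate() may be a better fit. The sketch below assumes Python 3 and uses a shortened, made-up test value; it also shows that str.maketrans() really returns a dict.

```python
test_bytes = bytearray(b"ufsb\x0dsb\x0b\x00t\x0c\x04end\x0a")  # shortened sample data

# Passing None as the table leaves the other bytes unchanged;
# the delete argument lists the byte values to drop.
cleaned = test_bytes.translate(None, b"\x0a\x0b\x0c\x0d\x04")

# str.maketrans() returns a dict keyed by code point; None as a value means "delete".
str_table = str.maketrans({"\n": None, "\r": None})
text_cleaned = "line1\r\nline2".translate(str_table)

print(cleaned)
print(str_table)
print(text_cleaned)
```

Note that the \x00 byte already present in the data is kept, because it is not listed in the delete argument.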

deleting escape characters python

temp = str(read_temp())
### temp is 29.12
temp = binascii.hexlify(temp)
### now temp is 32392e3132
n = 2
ta = [temp[i:i+n] for i in range(0, len(temp), n)]
### now ta[0]=32 ta[1]=39 ta[2]=2e ta[3]=31 ta[4]=32
print(type(ta[0]))
data_send = r'\x00\x00\x00\x00\x'+ta[0]+r'\x'+ta[1]+r'\x'+ta[2]+r'\x'+ta[3]+r'\x'+ta[4]
data_send = literal_eval("'%s'" %data_send) # that can be delete
yield Task(self.send, data_send)
Hi, python version=2.7.1.6
I read the temperature. An example temperature is 29.22 °C. I want to add this temperature value to data_send as ASCII codes. Then I will send the data on a Tornado web server using the iec104 protocol.
When I print the data, the result is '\x00\x00\x00\x0028.87'. I want the data to look like this: '\x00\x00\x00\x00\x32\x38\x2e\x38\x37'. But the result comes out like this: \\x00\\x00\\x00\\x00\\x32\\x38\\x2e\\x38\\x37
I want to delete this extra escaping character \
Please help me
You're using r-prefixed strings (raw strings). Within raw strings, any backslashes are interpreted literally, not as an escape character. If you want a string in which each character has the actual hex value you're encoding, like '\x00' for 0, remove the r prefix from the string.
Then, when printing the string, use the repr function to reverse the encoding (i.e. to see the escape sequences used):
>>> s = b"\x61\x00\x12"
>>> print(repr(s))
b'a\x00\x12'
Note that any hex value that corresponds to a printable character (like x61 above) will be shown as the actual character (a in this case), instead of the escape sequence.
The string will contain the actual values encoded with a hex escape sequence:
>>> print(*s)
97 0 18
If you just want a string of literal escape sequences, regardless of whether the character is printable or not, you'll have to do it manually.
Given a list of numbers you want to encode as hex sequences,
nums = [97, 0, 18]
you can do
escaped = ''.join(r'\x{:02x}'.format(num) for num in nums)
(in the format specification, 0 is the fill character, 2 is the width, and x indicates hexadecimal). Now, if you print escaped, you will see a string of escape sequences:
>>> print(escaped)
\x61\x00\x12
If you need to send a temperature as plain text characters after four null characters, this will work:
temp = str(read_temp())
data_send = b'\x00\x00\x00\x00' + temp.encode('ascii')
yield Task(self.send, data_send)
Also, just:
print(b'\x00\x00\x00\x00' + '28.87'.encode('ascii'))
Result:
b'\x00\x00\x00\x0028.87'
Which is exactly what you need, i.e. a string of bytes, four chr(0) followed by a chr(0x32), chr(0x38), chr(0x2e), chr(0x38) and chr(0x37).
Unless of course the service somehow expects a Python string representation of the data, which would be more than a bit odd, but not impossible.
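That the two spellings are the same bytes can be verified with a comparison. This is a small sketch assuming Python 3; the temperature value is hard-coded here because read_temp() from the question is not available.

```python
temp = "28.87"  # hypothetical reading; the question obtains it from read_temp()
data_send = b"\x00\x00\x00\x00" + temp.encode("ascii")

# The same bytes written entirely as hex escapes compare equal.
same = data_send == b"\x00\x00\x00\x00\x32\x38\x2e\x38\x37"

# A space-separated hex dump is handy for checking a payload byte by byte.
hex_dump = " ".join(f"{b:02x}" for b in data_send)

print(same)
print(hex_dump)
```

The escapes only affect how the bytes are displayed; the payload that goes over the wire is identical either way.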

Convert escaped utf-8 string to utf in python 3

I have a py3 string that includes escaped UTF-8 sequences, such as "Company\\ffffffc2\\ffffffae", which I would like to convert to the correct UTF-8 string (which would in the example be "Company®", since the escaped sequence is c2 ae). I've tried
print (bytes("Company\\\\ffffffc2\\\\ffffffae".replace(
"\\\\ffffff", "\\x"), "ascii").decode("utf-8"))
result: Company\xc2\xae
print (bytes("Company\\\\ffffffc2\\\\ffffffae".replace (
"\\\\ffffff", "\\x"), "ascii").decode("unicode_escape"))
result: CompanyÂ®
(wrong, since the two bytes are decoded as separate characters, but they should be decoded together as one UTF-8 sequence).
If I do
print (b"Company\xc2\xae".decode("utf-8"))
It gives the correct result.
Company®
How can I achieve that programmatically (i.e. starting from a py3 str)?
A simple solution is:
import ast
test_in = "Company\\\\ffffffc2\\\\ffffffae"
test_out = ast.literal_eval("b'''" + test_in.replace('\\\\ffffff','\\x') + "'''").decode('utf-8')
print(test_out)
However, it will fail if there is a triple quote ''' in the input string itself.
The following code does not have this problem, but it is not as simple as the first one.
In the first step the string is split on a regular expression. Because the pattern has a capturing group, the resulting list alternates: items at even indexes (0, 2, ...) are ASCII parts, e.g. "Company", and items at odd indexes (1, 3, ...) each hold one escaped UTF-8 byte, e.g. "\\\\ffffffc2". Each substring is converted to bytes according to its meaning in the input string. Finally all parts are joined together and decoded from bytes to a string.
import re
REGEXP = re.compile(r'(\\\\ffffff[0-9a-f]{2})', flags=re.I)
def convert(estr):
    def split(estr):
        for i, substr in enumerate(REGEXP.split(estr)):
            if i % 2:
                yield bytes.fromhex(substr[-2:])
            elif substr:
                yield bytes(substr, 'ascii')
    return b''.join(split(estr)).decode('utf-8')
test_in = "Company\\\\ffffffc2\\\\ffffffae"
print(convert(test_in))
The code could be optimized. Ascii parts do not need encode/decode and consecutive hex codes should be concatenated.
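One possible way to avoid the per-part handling is a different technique from the split above: encode the whole string once and let a single re.sub() over the bytes replace each escape. This is only a sketch under that assumption; ESCAPED_BYTE and convert_in_one_pass are hypothetical names.

```python
import re

# Matches two literal backslashes, "ffffff", then two hex digits (captured).
ESCAPED_BYTE = re.compile(rb"\\\\ffffff([0-9a-f]{2})", flags=re.I)

def convert_in_one_pass(estr: str) -> str:
    raw = estr.encode("utf-8")
    # A function replacement avoids backslash processing in the replacement text.
    decoded = ESCAPED_BYTE.sub(
        lambda m: bytes.fromhex(m.group(1).decode("ascii")), raw
    )
    return decoded.decode("utf-8")

test_in = "Company\\\\ffffffc2\\\\ffffffae"
result = convert_in_one_pass(test_in)
print(result)
```

Because the substitution happens on bytes, consecutive escapes end up next to each other automatically and are decoded together as one UTF-8 sequence.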

Creating \x Single Char Hex Values in Python

How do you dynamically create single char hex values?
For instance, I tried
a = "ff"
"\x{0}".format(a)
and
a = "ff"
"\x" + a
I ultimately was looking for something like
\xff
However, neither of the combinations above appear to work.
Additionally, I was originally using chr to obtain single char hex representations of integers but I noticed that chr(63) would return ? (as that is its ascii representation).
Is there another function aside from chr that will return chr(63) as \x_ _ where _ _ is its single char hex representation? In other words, a function that only produces single char hex representations.
When you write \x{0}, Python treats \x as the start of an escape sequence and expects the next two characters to be hexadecimal digits, but they are not. From the table of string escape sequences in the language reference:
\xhh Character with hex value hh (4,5)
4. Unlike in Standard C, exactly two hex digits are required.
5. In a string literal, hexadecimal and octal escapes denote the byte with the given value; it is not necessary that the byte encodes a character in the source character set. In a Unicode literal, these escapes denote a Unicode character with the given value.
So, you have to escape \ in \x, like this
print "\\x{0}".format(a)
# \xff
Try str.decode with 'hex' encoding:
In [204]: a.decode('hex')
Out[204]: '\xff'
Besides, chr returns a single-character string; how that string is displayed does not change its value:
In [219]: c = chr(31)
In [220]: c
Out[220]: '\x1f'
In [221]: print c #invisible printout
In [222]:
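The answers above use Python 2 (print statements and a.decode('hex')). In Python 3 the same three results might be obtained as in the sketch below; as_escape is a hypothetical helper name introduced only for this example.

```python
a = "ff"

# One byte from a two-digit hex string (the Python 3 counterpart of a.decode('hex')).
single_byte = bytes.fromhex(a)

# One character (code point U+00FF) from the same hex digits.
single_char = chr(int(a, 16))

def as_escape(value: int) -> str:
    """Return the four-character text \\xhh for display; this is not a single character."""
    return "\\x{:02x}".format(value)

escape_text = as_escape(63)

print(single_byte, repr(single_char), escape_text)
```

The first two values hold one byte or one character; the last is four characters of text that merely look like an escape sequence, which is what the question asks for in the chr(63) case.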

how to compare backslash in python

I have a set of strings that are read from a file say ['\x1\p1', '\x2\p2', '\x3\p3', ... etc.].
When I read them into variables and print them the strings displayed as ['\\x1\\p1', '\\x2\\p2', '\\x3\\p3', ... etc.]. I understand that the variable is represented as '\x1\p1', ... etc. internally, but when it is displayed it is displayed with double slash.
Now I want to search for the elements of this list in a sentence and replace them, i.e. if \x1\p1 is in the sentence "How are you doing \x1\p1", then replace '\x1\p1' with 'Y'. But the replace method does not work in this case, and I wonder why.
Let me explain further:
My text file (codes.txt) has the entries \xs1\x32, \xs2\x54, delimited by newlines. I read it using
with open('codes') as codes:
    code_list = codes.readlines()
Next, I do, let's say, code_list_element_1 = code_list[1].rstrip()
When I print code_list_element_1, it displays as '\\xs1\\x32'
Next, let my target string be target_string = 'Hi! my name is \xs1\x32'
Now I want to replace code_list_element_1, which is supposed to be \xs1\x32, in target_string with, say, 'Y'.
So I tried code_list_element_1 in target_string. I get False.
Next, instead of reading the codes from a text file, I initialized a variable find_me = '\xs1\x32'
Now I try find_me in target_string. I get True,
and hence target_string.replace(find_me,"Y") displays what I want: "Hi! my name is Y"
You are looking at a string representation that can be pasted back into Python; the backslashes are doubled to make sure the values are not interpreted as escape sequences (such as \n, meaning a newline, or \xfe, meaning the byte with value 254, hex FE).
If you are building new string values, you also need to use those doubled backslashes to prevent Python from seeing escape sequences where there are none, or use raw string literals:
>>> '\\x1\\p1'
'\\x1\\p1'
>>> r'\x1\p1'
'\\x1\\p1'
For this specific example, not handling the backslashes properly actually results in an exception:
>>> '\x1\p1'
ValueError: invalid \x escape
because Python expects to find two hex digits after a \x escape.
Raw strings (those prefixed by r) are very useful for backslash-itis.
In [9]: a=r"How are you doing \x1\p1"
In [10]: a
Out[10]: 'How are you doing \\x1\\p1'
In [11]: a.replace(r'\x1\p1', 'Y')
Out[11]: 'How are you doing Y'
In [12]:
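The mismatch from the question can be reproduced without a file. The sketch below assumes Python 3 and uses \x41\x42 instead of the question's \xs1\x32, because \xs1 is not a valid escape in a normal string literal; code_from_file stands in for a line read with readlines().

```python
# What reading the line \x41\x42 from a file would give: eight literal characters.
code_from_file = r"\x41\x42"

target_literal = "Hi! my name is \x41\x42"  # escapes interpreted, so it ends in "AB"
target_raw = r"Hi! my name is \x41\x42"     # backslashes kept literally

found_in_literal = code_from_file in target_literal
found_in_raw = code_from_file in target_raw
replaced = target_raw.replace(code_from_file, "Y")

print(len(code_from_file), found_in_literal, found_in_raw, replaced)
```

The text read from the file never goes through escape processing, so it only matches a target that also contains literal backslashes, for example one written as a raw string.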
