deleting escape characters python

temp = str(read_temp())
### temp is 29.12
temp = binascii.hexlify(temp)
### now temp is 32392e3132
n = 2
ta = [temp[i:i+n] for i in range(0, len(temp), n)]
### now ta[0]=32 ta[1]=39 ta[2]=2e ta[3]=31 ta[4]=32
print(type(ta[0]))
data_send = r'\x00\x00\x00\x00\x'+ta[0]+r'\x'+ta[1]+r'\x'+ta[2]+r'\x'+ta[3]+r'\x'+ta[4]
data_send = literal_eval("'%s'" %data_send) # that can be delete
yield Task(self.send, data_send)
Hi, my Python version is 2.7.1.6.
I read a temperature, for example 29.22 °C. I want to append this value to data_send as ASCII codes, and then send the data from a Tornado web server over the IEC 104 protocol.
When I print the data, the result is '\x00\x00\x00\x0028.87'. I want the data to look like '\x00\x00\x00\x00\x32\x38\x2e\x38\x37' instead, but what I get is: \\x00\\x00\\x00\\x00\\x32\\x38\\x2e\\x38\\x37
I want to remove the extra escape character \.
Please help.

You're using r-prefixed strings (raw strings). Within raw strings, any backslashes are interpreted literally, not as an escape character. If you want a string in which each character has the actual hex value you're encoding, like '\x00' for 0, remove the r prefix from the string.
Then, when printing the string, use the repr function to reverse the encoding (i.e. to see the escape sequences used):
>>> s = b"\x61\x00\x12"
>>> print(repr(s))
b'a\x00\x12'
Note that any hex value that corresponds to a printable character (like x61 above) will be shown as the actual character (a in this case), instead of the escape sequence.
The string will contain the actual values encoded with a hex escape sequence:
>>> print(*s)
97 0 18
If you just want a string of literal escape sequences, regardless of whether the character is printable or not, you'll have to do it manually.
Given a list of numbers you want to encode as hex sequences,
nums = [97, 0, 18]
you can do
escaped = ''.join(r'\x{:02x}'.format(num) for num in nums)
(in the format specification, 0 is the fill character, 2 is the width, and x indicates hexadecimal). Now, if you print escaped, you will see a string of escape sequences:
>>> print(escaped)
\x61\x00\x12
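To make the raw-string point concrete, here is a minimal sketch (the variable names are illustrative and not from the question). It compares a raw literal with a normal one, and shows that the actual character has to be built from the hex digits numerically rather than by gluing a backslash and an x in front of them.

```python
# A raw literal keeps the backslash as an ordinary character;
# a normal literal turns \x61 into the single character 'a'.
raw_text = r'\x61'
real_char = '\x61'

print(len(raw_text), len(real_char))   # 4 1

# Concatenating raw pieces only ever builds text that *looks like* an escape.
# To get the actual character, convert the hex digits to a number instead.
hex_digits = '61'
from_digits = chr(int(hex_digits, 16))
print(from_digits == real_char)        # True
```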

If you need to send a temperature as plain text characters after four null characters, this will work:
temp = str(read_temp())
data_send = b'\x00\x00\x00\x00' + temp.encode('ascii')
yield Task(self.send, data_send)
Also, just:
print(b'\x00\x00\x00\x00' + '28.87'.encode('ascii'))
Result:
b'\x00\x00\x00\x0028.87'
Which is exactly what you need, i.e. a string of bytes, four chr(0) followed by a chr(0x32), chr(0x38), chr(0x2e), chr(0x38) and chr(0x37).
Unless of course the service somehow expects a Python string representation of the data, which would be more than a bit odd, but not impossible.
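As a quick check of the claim above, the following sketch assumes Python 3 and uses a hypothetical helper name, build_payload, that does not appear in the question. It shows that the plain-text form and the fully escaped form are the same nine bytes; only the way repr displays them differs.

```python
def build_payload(temperature_text):
    """Hypothetical helper: four NUL bytes followed by the ASCII text."""
    return b'\x00\x00\x00\x00' + temperature_text.encode('ascii')


payload = build_payload('28.87')

# Both spellings describe the same bytes; repr() merely prefers to show
# printable bytes as characters instead of \x escapes.
same = payload == b'\x00\x00\x00\x00\x32\x38\x2e\x38\x37'
print(same)            # True
print(list(payload))   # [0, 0, 0, 0, 50, 56, 46, 56, 55]
```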

Related

How to replace 'b' with 0 or 1 in the binary representation of a string

import binascii
a = []
a = input('enter the message')
def str2bin(message):
    binary = bin(int(binascii.hexlify(message.encode("ascii")), 16))
    return binary[1:]
print(str2bin(a))
Input string : hai
Output : b11010000110000101101001
How to remove or replace the 'b' from the output and replace it with another binary digit ?

Python strings cannot be changed after they have been created, they are immutable. You will have to create a new string, combining the digit and a substring of the original string, like this:
data = str2bin(a)
data0 = "0" + data[1:]
data1 = "1" + data[1:]
[1:] is a slice. In this case, it makes a copy of the string with the first character (at index 0) removed.

The bin function isn't suitable for this task. Not only does it give you that unwanted 'b', it also removes leading zeros, so the encoded bit strings vary in length, making them difficult to decode correctly. Instead, you can use the format function or method, and specify the bit length so no leading zeros are lost.
In Python 3, binascii.hexlify isn't required, we can get the necessary integers directly from the bytes object. The code below ensures that the bit string for each byte has exactly 8 bits, padding with zeros on the left when necessary. It uses the default UTF-8 encoding, but you can change that to 'ascii' if you want. Both encodings give the same result if the input string is pure ASCII, but 'utf8' handles any Unicode. Of course, for characters outside the ASCII range a single character will be encoded as 2 or more bytes.
s = 'hai'
bits = ''.join([format(u, '08b') for u in s.encode()])
print(bits)
output
011010000110000101101001
If you have Python 3.6+, you can do this using the more compact (and faster) f-string syntax:
bits = ''.join([f'{u:08b}' for u in s.encode()])
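The reason a fixed width of 8 bits matters is that it makes decoding trivial. The round trip below is a sketch under that assumption; the function names text_to_bits and bits_to_text are made up for illustration.

```python
def text_to_bits(text, encoding='utf-8'):
    """Encode text as a bit string with exactly 8 bits per byte."""
    return ''.join(format(byte, '08b') for byte in text.encode(encoding))


def bits_to_text(bits, encoding='utf-8'):
    """Decode a bit string produced by text_to_bits."""
    # Every byte occupies exactly 8 characters, so fixed-size slicing works.
    chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return bytes(int(chunk, 2) for chunk in chunks).decode(encoding)


bits = text_to_bits('hai')
print(bits)                 # 011010000110000101101001
print(bits_to_text(bits))   # hai
```

With bin() the leading zero would be lost and the slices would no longer line up on byte boundaries.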

What does Python string.maketrans("","") do?

string.maketrans("","")
gives
\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13
\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~
\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90
\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2
\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4
\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9
\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde
\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed
\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff
What does this mean?
And how does it help in removing punctuation in a string with the following call:
import string
myStr.translate(string.maketrans("",""), string.punctuation)

I'll take some liberties, since Python 2 muddles the line between strings and bytes. There are 256 possible byte values, ranging from 0 to 255. You can get their byte representation by using chr(). So, all the bytes from 0 to 255 look like this:
>>> ''.join(map(chr, range(256)))
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\
x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;
<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80
\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93
\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6
\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9
\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc
\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf
\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2
\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
string.maketrans(from, to) creates a string of 256 characters, where the characters in from will be replaced by to. For example, string.maketrans('ab01', 'AB89') will return the string from above, but a will be replaced by A, b by B, 0 by 8 and 1 by 9.
>>> string.maketrans('ab01', 'AB89')
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\
x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./8923456789:;
<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`ABcdefghijklmnopqrstuvwxyz{|}~\x7f\x80
\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93
\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6
\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9
\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc
\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf
\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2
\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
Effectively, string.maketrans('', '') == ''.join(map(chr, range(256))).
This serves as a map which, when provided to str.translate(), can be used to replace multiple characters in one pass over your string. For the example map above, all characters remain the same, except that every a turns into A, b into B, etc. If you do myStr.translate(string.maketrans('', '')), you simply don't change anything in myStr.
Finally, translate() has one additional argument, deletechars. If you pass a string for that argument, translate() will translate all characters according to the mapping you provide, but it will drop any characters that appear in deletechars. So, putting it all together, myStr.translate(string.maketrans('', ''), string.punctuation) does not change any character in the string, but in the process drops every character in string.punctuation. Effectively, you have removed the punctuation in the output string.
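The same idea can be tried in Python 3, where the table lives on bytes rather than on the string module. This is only a sketch of the equivalent calls (bytes.maketrans and bytes.translate), not the Python 2 code from the question.

```python
import string

# With two empty arguments the table maps every byte value to itself.
identity = bytes.maketrans(b'', b'')
print(identity == bytes(range(256)))   # True

# A non-trivial table changes only the listed bytes.
swap = bytes.maketrans(b'ab01', b'AB89')
print(b'abc 012'.translate(swap))      # b'ABc 892'

# The second argument of translate() lists bytes to delete.
punctuation = string.punctuation.encode('ascii')
print(b'Hello, world!'.translate(identity, punctuation))   # b'Hello world'
```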

string.maketrans(intab, outtab) returns a translation table that maps each character in the intab string into the character at the same position in the outtab string.
tran_table = string.maketrans(intab, outtab)
print myStr.translate(tran_table)
The code above will then translate myStr using the table you created. In your case the table maps every character to itself, because both arguments are empty.

Python 2.7's string.maketrans() returns a byte string, like your result, which can be used with string.translate().
string.translate(s, table) translates each character c in s into table[ord(c)]. So \x00 is translated into table[0], and so on. In your case, it's just returning an identity table.
It should be noted that string.translate is deprecated in Python 2.7, and that in Python 3.1 and onwards these functions are replaced by bytes.maketrans(), bytes.translate(), and the corresponding methods for str and bytearray.
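For text (str) in Python 3, the punctuation-removal idiom from the question could look like the sketch below. It relies on the three-argument form of str.maketrans, whose third argument lists characters to delete; the sample sentence is invented.

```python
import string

# The third argument maps each listed character to None, i.e. "delete it".
remove_punct = str.maketrans('', '', string.punctuation)

cleaned = 'Hello, world! (test)'.translate(remove_punct)
print(cleaned)   # Hello world test

# Unlike the 256-character Python 2 table, this table is a dict keyed by
# code point.
print(remove_punct[ord(',')])   # None
```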

Convert escaped utf-8 string to utf in python 3

I have a py3 string that includes escaped UTF-8 sequences, such as "Company\\ffffffc2\\ffffffae", which I would like to convert to the correct UTF-8 string (which in the example would be "Company®", since the escaped sequence is c2 ae). I've tried
print (bytes("Company\\\\ffffffc2\\\\ffffffae".replace(
"\\\\ffffff", "\\x"), "ascii").decode("utf-8"))
result: Company\xc2\xae
print (bytes("Company\\\\ffffffc2\\\\ffffffae".replace (
"\\\\ffffff", "\\x"), "ascii").decode("unicode_escape"))
result: CompanyÂ®
(wrong, since the characters are treated separately, but they should be treated together).
If I do
print (b"Company\xc2\xae".decode("utf-8"))
It gives the correct result.
Company®
How can I achieve that programmatically (i.e. starting from a py3 str)?

A simple solution is:
import ast
test_in = "Company\\\\ffffffc2\\\\ffffffae"
test_out = ast.literal_eval("b'''" + test_in.replace('\\\\ffffff','\\x') + "'''").decode('utf-8')
print(test_out)
However it will fail if there is a triple quote ''' in the input string itself.
Following code does not have this problem, but it is not as simple as the first one.
In the first step the string is split on a regular expression. Because the pattern is a capturing group, the items at even indices (0, 2, ...) are the ASCII parts, e.g. "Company", and each item at an odd index corresponds to one escaped UTF-8 byte, e.g. "\\\\ffffffc2". Each substring is converted to bytes according to its meaning in the input string. Finally all parts are joined together and decoded from bytes to a string.
import re
REGEXP = re.compile(r'(\\\\ffffff[0-9a-f]{2})', flags=re.I)
def convert(estr):
    def split(estr):
        for i, substr in enumerate(REGEXP.split(estr)):
            if i % 2:
                yield bytes.fromhex(substr[-2:])
            elif substr:
                yield bytes(substr, 'ascii')
    return b''.join(split(estr)).decode('utf-8')
test_in = "Company\\\\ffffffc2\\\\ffffffae"
print(convert(test_in))
The code could be optimized. Ascii parts do not need encode/decode and consecutive hex codes should be concatenated.
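One possible shortcut, offered as a sketch rather than as the answer's own method: replace each escaped byte with the character of the same code point, then round-trip through Latin-1, which maps code points 0 to 255 one-to-one onto bytes. It assumes the unescaped parts contain only characters that Latin-1 can encode; the name convert_latin1 is made up.

```python
import re

# Two literal backslashes, "ffffff", then the two hex digits we care about.
PATTERN = re.compile(r'\\\\ffffff([0-9a-f]{2})', flags=re.I)


def convert_latin1(estr):
    # Step 1: turn every escaped byte into the character with that code point.
    as_chars = PATTERN.sub(lambda m: chr(int(m.group(1), 16)), estr)
    # Step 2: Latin-1 turns those characters back into the same byte values,
    # which can then be decoded as UTF-8.
    return as_chars.encode('latin-1').decode('utf-8')


test_in = "Company\\\\ffffffc2\\\\ffffffae"
print(convert_latin1(test_in))   # Company®
```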

Creating \x Single Char Hex Values in Python

How do you dynamically create single char hex values?
For instance, I tried
a = "ff"
"\x{0}".format(a)
and
a = "ff"
"\x" + a
I ultimately was looking for something like
\xff
However, neither of the combinations above appear to work.
Additionally, I was originally using chr to obtain single char hex representations of integers but I noticed that chr(63) would return ? (as that is its ascii representation).
Is there another function aside from chr that will return chr(63) as \x_ _ where _ _ is its single char hex representation? In other words, a function that only produces single char hex representations.

When you write "\x{0}", Python sees the \x escape and expects the next two characters to be hexadecimal digits, but {0 is not a pair of hex digits. The relevant entry in the escape-sequence table of the Python language reference reads:
\xhh Character with hex value hh (4,5)
4 . Unlike in Standard C, exactly two hex digits are required.
5 . In a string literal, hexadecimal and octal escapes denote the byte with the given value; it is not necessary that the byte encodes a character in the source character set. In a Unicode literal, these escapes denote a Unicode character with the given value.
So, you have to escape \ in \x, like this
print "\\x{0}".format(a)
# \xff

Try str.decode with 'hex' encoding:
In [204]: a.decode('hex')
Out[204]: '\xff'
Besides, chr returns a single-char string, you don't need to worry about the output of this string:
In [219]: c = chr(31)
In [220]: c
Out[220]: '\x1f'
In [221]: print c #invisible printout
In [222]:
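The answers above target Python 2. In Python 3 the 'hex' codec is not available on str, so a rough equivalent might look like the sketch below; which of the three results is wanted depends on whether you need the byte, the character, or the escape sequence as text.

```python
a = "ff"

as_bytes = bytes.fromhex(a)       # the single byte b'\xff'
as_char = chr(int(a, 16))         # the single character U+00FF
as_text = "\\x{0}".format(a)      # four characters: backslash, x, f, f

# Always show a value as a \x escape, even when it is printable:
always_escaped = "\\x{0:02x}".format(63)

print(as_bytes, len(as_char), len(as_text), always_escaped)
```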

Show non printable characters in a string

Is it possible to visualize non-printable characters in a python string with its hex values?
e.g. If I have a string with a newline inside I would like to replace it with \x0a.
I know there is repr() which will give me ...\n, but I'm looking for the hex version.

I don't know of any built-in method, but it's fairly easy to do using a comprehension:
import string
printable = string.ascii_letters + string.digits + string.punctuation + ' '
def hex_escape(s):
    return ''.join(c if c in printable else r'\x{0:02x}'.format(ord(c)) for c in s)

I'm kind of late to the party, but if you need it for simple debugging, I found that this works:
string = "\n\t\nHELLO\n\t\n\a\17"
procd = [c for c in string]
print(procd)
# Prints ['\n', '\t', '\n', 'H', 'E', 'L', 'L', 'O', '\n', '\t', '\n', '\x07', '\x0f']
While just list(string) is simpler, a comprehension makes it easier to add filtering/mapping if necessary.

You'll have to make the translation manually; go through the string with a regular expression for example, and replace each occurrence with the hex equivalent.
import re
replchars = re.compile(r'[\n\r]')
def replchars_to_hex(match):
    return r'\x{0:02x}'.format(ord(match.group()))
replchars.sub(replchars_to_hex, inputtext)
The above example only matches newlines and carriage returns, but you can expand what characters are matched, including using \x escape codes and ranges.
>>> inputtext = 'Some example containing a newline.\nRight there.\n'
>>> replchars.sub(replchars_to_hex, inputtext)
'Some example containing a newline.\\x0aRight there.\\x0a'
>>> print(replchars.sub(replchars_to_hex, inputtext))
Some example containing a newline.\x0aRight there.\x0a
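As an illustration of expanding the character class with \x escapes and ranges, the sketch below matches all ASCII control characters (0x00 to 0x1f, plus 0x7f). The names CONTROL_CHARS and show_control_chars are hypothetical.

```python
import re

# The re module itself understands \xhh inside a raw pattern string.
CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')


def to_hex_escape(match):
    return r'\x{0:02x}'.format(ord(match.group()))


def show_control_chars(text):
    """Replace every ASCII control character with its \\xhh form."""
    return CONTROL_CHARS.sub(to_hex_escape, text)


print(show_control_chars('a\tb\n\x07'))   # a\x09b\x0a\x07
```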

Modifying ecatmur's solution to handle non-printable non-ASCII characters makes it less trivial and more obnoxious:
def escape(c):
    if c.isprintable():
        return c
    c = ord(c)
    if c <= 0xff:
        return r'\x{0:02x}'.format(c)
    elif c <= 0xffff:
        return r'\u{0:04x}'.format(c)
    else:
        return r'\U{0:08x}'.format(c)
def hex_escape(s):
    return ''.join(escape(c) for c in s)
Of course if str.isprintable isn't exactly the definition you want, you can write a different function. (Note that it's a very different set from what's in string.printable: besides handling non-ASCII printable and non-printable characters, it also considers \n, \r, \t, \x0b, and \x0c as non-printable.)
You can make this more compact; this is explicit just to show all the steps involved in handling Unicode strings. For example:
def escape(c):
    if c.isprintable():
        return c
    elif c <= '\xff':
        return r'\x{0:02x}'.format(ord(c))
    else:
        return c.encode('unicode_escape').decode('ascii')
Really, no matter what you do, you're going to have to handle \r, \n, and \t explicitly, because all of the built-in and stdlib functions I know of will escape them via those special sequences instead of their hex versions.
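To see why the explicit handling is needed, this small sketch compares what the built-in unicode_escape codec produces with a forced hex form. The helper names builtin_escape and forced_hex are invented for the comparison.

```python
def builtin_escape(ch):
    """What the unicode_escape codec produces for one character."""
    return ch.encode('unicode_escape').decode('ascii')


def forced_hex(ch):
    """Always use \\xhh for code points up to 0xff."""
    if ord(ch) <= 0xff:
        return r'\x{0:02x}'.format(ord(ch))
    return builtin_escape(ch)


for ch in ['\n', '\t', '\x07', '\u2028']:
    print(repr(ch), builtin_escape(ch), forced_hex(ch))
```

Newline and tab come back from the codec as \n and \t, while the forced variant yields \x0a and \x09; for the other two characters both helpers agree.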

I did something similar once by deriving a str subclass with a custom __repr__() method which did what I wanted. It's not exactly what you're looking for, but may give you some ideas.
# -*- coding: iso-8859-1 -*-
# special string subclass to override the default
# representation method. main purpose is to
# prefer using double quotes and avoid hex
# representation on chars with an ord > 128
class MsgStr(str):
    def __repr__(self):
        # use double quotes unless there are more of them within the string than
        # single quotes
        if self.count("'") >= self.count('"'):
            quotechar = '"'
        else:
            quotechar = "'"
        rep = [quotechar]
        for ch in self:
            # control char?
            if ord(ch) < ord(' '):
                # remove the single quotes around the escaped representation
                rep += repr(str(ch)).strip("'")
            # embedded quote matching quotechar being used?
            elif ch == quotechar:
                rep += "\\"
                rep += ch
            # else just use others as they are
            else:
                rep += ch
        rep += quotechar
        return "".join(rep)
if __name__ == "__main__":
    s1 = '\tWürttemberg'
    s2 = MsgStr(s1)
    print "str s1:", s1
    print "MsgStr s2:", s2
    print "--only the next two should differ--"
    print "repr(s1):", repr(s1), "# uses built-in string 'repr'"
    print "repr(s2):", repr(s2), "# uses custom MsgStr 'repr'"
    print "str(s1):", str(s1)
    print "str(s2):", str(s2)
    print "repr(str(s1)):", repr(str(s1))
    print "repr(str(s2)):", repr(str(s2))
    print "MsgStr(repr(MsgStr('\tWürttemberg'))):", MsgStr(repr(MsgStr('\tWürttemberg')))

Non-printable characters can also be present in a printed string without being visible: they act as control commands instead of showing a glyph. You can observe their presence by measuring the string with len, or by placing the cursor at the start of the string and counting how many times you have to press the arrow key to reach the end; a stretch that looks like a single character can take three key presses, which seems perplexing at first. (I'm not sure whether this was already demonstrated in prior answers.)
In the example below, I pasted a 135-bit string with a particular structure and format (which I had to create manually beforehand, for certain bit positions and its overall length) so that the particular program I'm running interprets it as ASCII. The resulting printed string contains non-printable characters such as the form feed (new page), which produces an entire extra blank line in the printed output (see below):
[Screenshot: example of printing non-printable characters that appear in a printed string]
Input a string:100100001010000000111000101000101000111011001110001000100001100010111010010101101011100001011000111011001000101001000010011101001000000
HPQGg]+\,vE!:@
>>> len('HPQGg]+\,vE!:@')
17
>>>
In the above code excerpt, try to copy-paste the string HPQGg]+\,vE!:@ straight from this site and see what happens when you paste it into Python IDLE.
Hint: You have to press the arrow key three times to get across the two letters from P to Q even though they appear next to each other, because there is actually a File Separator control character between them.
However, although we get the same starting value when converting the bytes to hex, the bytes look different when we convert that hex back, because the non-printable characters are now shown as escape sequences. Either way, the program output above prints non-printable characters (I came across this by chance while experimenting with a compression method).
>>> bytes(b'HPQGg]+\,vE!:@').hex()
'48501c514767110c5d2b5c2c7645213a40'
>>> bytes.fromhex('48501c514767110c5d2b5c2c7645213a40')
b'HP\x1cQGg\x11\x0c]+\\,vE!:@'
>>> (0x48501c514767110c5d2b5c2c7645213a40 == 0b100100001010000000111000101000101000111011001110001000100001100010111010010101101011100001011000111011001000101001000010011101001000000)
True
>>>
In the above 135-bit string, the leading zero of the first byte is dropped, so the first group has only 7 bits; each of the remaining 16 groups of 8 bits encodes one character (including the non-printable ones), as seen below:
[Image: technical breakdown of the format of the above 135-bit string]
And here as text is the breakdown of the 135-bit string:
1001000 = H (72)
01010000 = P (80)
00011100 = \x1c (28, File Separator) *
01010001 = Q (81)
01000111 = G (71)
01100111 = g (103)
00010001 = \x11 (17, Device Control 1) *
00001100 = \x0c (12, form feed / new page) *
01011101 = ] (93, right bracket)
00101011 = + (43, plus sign)
01011100 = \ (92, backslash)
00101100 = , (44, comma)
01110110 = v (118)
01000101 = E (69)
00100001 = ! (33, exclamation mark)
00111010 = : (58, colon)
01000000 = @ (64, at sign)
So in closing, to answer the sub-question about showing non-printable characters as hex: the bytes repr further above contains \x1c, which denotes the File Separator control character mentioned in the hint. The bytes object can be read as a string if you ignore the b prefix on the left, and the character is also present in the printed string, although it is invisible there (its presence can be observed with len and the cursor test described in the hint).
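The breakdown can be reproduced with a few lines of Python 3. This sketch starts from the hex value shown earlier (to avoid retyping 135 digits), pads the bit string back to a multiple of 8, and lists each group with its byte value; the variable names are illustrative.

```python
hex_digits = '48501c514767110c5d2b5c2c7645213a40'
value = int(hex_digits, 16)

# bin() drops leading zeros, which is why the bit string has 135 digits
# rather than 17 * 8 = 136.
bits = bin(value)[2:]
print(len(bits))   # 135

# Pad on the left to the next multiple of 8, then cut into bytes.
padded = bits.zfill(-(-len(bits) // 8) * 8)
groups = [padded[i:i + 8] for i in range(0, len(padded), 8)]
data = bytes(int(group, 2) for group in groups)

for group, byte in zip(groups, data):
    # repr() shows control characters such as 0x1c as escape sequences.
    print(group, byte, repr(chr(byte)))
```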
