Creating \x Single Char Hex Values in Python - python

How do you dynamically create single char hex values?
For instance, I tried
a = "ff"
"\x{0}".format(a)
and
a = "ff"
"\x" + a
I ultimately was looking for something like
\xff
However, neither of the combinations above appear to work.
Additionally, I was originally using chr to obtain single char hex representations of integers but I noticed that chr(63) would return ? (as that is its ascii representation).
Is there another function aside from chr that will return chr(63) as \x_ _ where _ _ is its single char hex representation? In other words, a function that only produces single char hex representations.

When you write \x{0}, Python treats \x as the start of a hex escape and expects the next two characters to be hexadecimal digits, which {0 is not. Refer to the table here.
\xhh Character with hex value hh (4,5)
4 . Unlike in Standard C, exactly two hex digits are required.
5 . In a string literal, hexadecimal and octal escapes denote the byte with the given value; it is not necessary that the byte encodes a character in the source character set. In a Unicode literal, these escapes denote a Unicode character with the given value.
So, you have to escape \ in \x, like this
print "\\x{0}".format(a)
# \xff

Try str.decode with 'hex' encoding:
In [204]: a.decode('hex')
Out[204]: '\xff'
Besides, chr returns a single-character string either way; how the interpreter displays that string does not change its value:
In [219]: c = chr(31)
In [220]: c
Out[220]: '\x1f'
In [221]: print c #invisible printout
In [222]:
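On Python 3, str.decode('hex') is no longer available, so the same two ideas can be sketched as follows. The helper names hex_to_byte and escape_text are made up for illustration; bytes.fromhex and str.format are standard. The first helper gives the real one-byte value, the second gives the four-character text \xNN regardless of whether the value is printable, which seems to be what the question asks for.

```python
def hex_to_byte(hex_digits):
    """Return the raw byte(s) named by a string of hex digits, e.g. 'ff'."""
    return bytes.fromhex(hex_digits)


def escape_text(value):
    """Return the literal 4-character text \\xNN for an integer 0..255."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    return "\\x{:02x}".format(value)


raw = hex_to_byte("ff")   # one byte with value 255
text = escape_text(63)    # backslash, 'x', '3', 'f' (chr(63) would give '?')

print(raw)    # b'\xff'
print(text)   # \x3f
```

Note that text is only a description of the byte; it is four characters long and is not itself the byte 0x3f.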


deleting escape characters python

temp = str(read_temp())
### temp is 29.12
temp = binascii.hexlify(temp)
### now temp is 32392e3132
n = 2
ta = [temp[i:i+n] for i in range(0, len(temp), n)]
### now ta[0]=32 ta[1]=39 ta[2]=2e ta[3]=31 ta[4]=32
print(type(ta[0]))
data_send = r'\x00\x00\x00\x00\x'+ta[0]+r'\x'+ta[1]+r'\x'+ta[2]+r'\x'+ta[3]+r'\x'+ta[4]
data_send = literal_eval("'%s'" %data_send) # that can be delete
yield Task(self.send, data_send)
Hi, python version=2.7.1.6
I read the temperature, for example 29.22 °C. I want to append this value to data_send as ASCII codes. Then I will send the data to a Tornado web server over the IEC 104 protocol.
When I print the data, the result is '\x00\x00\x00\x0028.87'. I want the data to look like '\x00\x00\x00\x00\x32\x38\x2e\x38\x37', but the result comes out as \\x00\\x00\\x00\\x00\\x32\\x38\\x2e\\x38\\x37.
I want to remove the extra escape character \.
Please help me.
You're using r-prefixed strings (raw strings). Within raw strings, any backslashes are interpreted literally, not as an escape character. If you want a string in which each character has the actual hex value you're encoding, like '\x00' for 0, remove the r prefix from the string.
Then, when printing the string, use the repr function to reverse the encoding (i.e. to see the escape sequences used):
>>> s = b"\x61\x00\x12"
>>> print(repr(s))
b'a\x00\x12'
Note that any hex value that corresponds to a printable character (like \x61 above) will be shown as the actual character (a in this case) instead of the escape sequence.
The string will contain the actual values encoded with a hex escape sequence:
>>> print(*s)
97 0 18
If you just want a string of literal escape sequences, regardless of whether the character is printable or not, you'll have to do it manually.
Given a list of numbers you want to encode as hex sequences,
nums = [97, 0, 18]
you can do
escaped = ''.join(r'\x{:02x}'.format(num) for num in nums)
(in the format specification, 0 is the fill character, 2 is the width, and x indicates hexadecimal). Now, if you print escaped, you will see a string of escape sequences:
>>> print(escaped)
\x61\x00\x12
If you need to send a temperature as plain text characters after four null characters, this will work:
temp = str(read_temp())
data_send = b'\x00\x00\x00\x00' + temp.encode('ascii')
yield Task(self.send, data_send)
Also, just:
print(b'\x00\x00\x00\x00' + '28.87'.encode('ascii'))
Result:
b'\x00\x00\x00\x0028.87'
Which is exactly what you need, i.e. a string of bytes, four chr(0) followed by a chr(0x32), chr(0x38), chr(0x2e), chr(0x38) and chr(0x37).
Unless of course the service somehow expects a Python string representation of the data, which would be more than a bit odd, but not impossible.
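To make the point concrete, here is a small Python 3 sketch comparing the two routes. The value "28.87" stands in for str(read_temp()), which is not defined here. It suggests that the hexlify, split, and literal_eval detour from the question only reproduces the bytes that a plain encode already gives.

```python
import binascii

temp = "28.87"  # stand-in for str(read_temp()); read_temp is not defined here

# Route 1: encode the text directly.
direct = b"\x00\x00\x00\x00" + temp.encode("ascii")

# Route 2: hexlify, then unhexlify, which only round-trips the same bytes.
hex_digits = binascii.hexlify(temp.encode("ascii"))  # b'32382e3837'
via_hex = b"\x00\x00\x00\x00" + binascii.unhexlify(hex_digits)

print(direct == via_hex)  # True
print(list(direct))       # [0, 0, 0, 0, 50, 56, 46, 56, 55]
```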

Int to Byte conversion not representing in hexadecimal in Python

I'm having some difficulty in understanding how python converts between int and byte data types and specifically why it isn't consistent with representing it as hexadecimal numbers.
Consider the following where I convert the number 13 into a 2 byte representation:
>>> (13).to_bytes(2, byteorder='big')
b'\x00\r'
Why does it use the character r in the second byte location?
In this case I would have expected it to output:
b'\x00\xD'
Doing the reverse in both cases outputs the correct answer.
>>> int.from_bytes(b'\x00\x0D', byteorder='big')
13
>>> int.from_bytes(b'\x00\r', byteorder='big')
13
And both have the correct number of bytes
>>> len(b'\x00\x0D')
2
>>> len(b'\x00\r')
2
There is a difference between bytes and a hexadecimal representation. bytes is a datatype; hexadecimal is a way of representing bit patterns on the screen.
A bytes is an immutable sequence of 8-bit values. The interpreter displays it, where possible, as characters or as string escape sequences, and where not possible, in hexadecimal. The corresponding literal is called a bytestring. In other words, hexadecimal is a sort of last resort. You can construct the bytes b'ABC' using hexadecimal notation: b'\x41\x42\x43' but the interpreter will still report it as b'ABC'. That is no different from the way quotes are handled:
>>> a = "ABC"
>>> a
'ABC'
>>> a = 'AB\'C'
>>> a
"AB'C"
The interpreter has a standard way of displaying its data and takes no account of the way you entered that data in the first place. This isn't a roundtrip failure, because no information is lost. You are seeing an equivalent representation rather than the same representation, is all.
If you want to see a hexadecimal representation then you should ask for it explicitly, instead of relying on the interpreter's default way of displaying a particular datatype.
>>> fmt = '\\x{0:02X}\\x{1:02X}'
>>> print(fmt.format(*((13).to_bytes(2, byteorder='big'))))
\x00\x0D
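The same explicit request can be written for a bytes object of any length. This is a minimal Python 3 sketch; the variable names are illustrative, and bytes.hex() assumes Python 3.5 or later.

```python
data = (13).to_bytes(2, byteorder="big")

# Format every byte as \xNN, whether or not it is printable.
escaped = "".join("\\x{:02X}".format(b) for b in data)

# bytes.hex() gives the bare hex digits without the \x prefix.
plain = data.hex()

print(escaped)  # \x00\x0D
print(plain)    # 000d
```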
There are some special escape sequences which are so common that they are not explicitly represented with their hex values:
\a <-> \x07 alert
\b <-> \x08 backspace
\t <-> \x09 tab (horizontal)
\n <-> \x0A new line
\v <-> \x0B vertical tab
\f <-> \x0C formfeed
\r <-> \x0D carriage return
\" <-> \x22 "
\' <-> \x27 '
\\ <-> \x5C \
You can find a full list of escape sequences and how they work here.
Note that there is also \0 <-> \x00 in the C language.
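All of the short forms in the table are accepted when you write a literal, but the default display is narrower. In CPython 3, repr() of a bytes object appears to use short forms only for tab, newline, carriage return, backslash and the quote character; other control bytes such as \a come back as \xNN. A quick check, with arbitrarily chosen sample values:

```python
# Map a few byte values to the text the interpreter would display for them.
shown = {value: repr(bytes([value])) for value in (7, 9, 10, 13, 65)}

for value, text in shown.items():
    print(value, text)
# 7 b'\x07'
# 9 b'\t'
# 10 b'\n'
# 13 b'\r'
# 65 b'A'
```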

What does Python string.maketrans("","") do?

string.maketrans("","")
gives
\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13
\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~
\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90
\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2
\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4
\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9
\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde
\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed
\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff
What does this mean?
And how does it help in removing punctuation in a string with the following call:
import string
myStr.translate(string.maketrans("",""), string.punctuation)
I'll take some liberties, since Python 2 muddles the line between strings and bytes. There are 256 possible byte values, ranging from 0 to 255. You can get the one-character string for each of them by using chr(). So, all the bytes from 0 to 255 look like this
>>> ''.join(map(chr, range(256)))
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\
x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;
<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80
\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93
\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6
\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9
\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc
\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf
\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2
\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
string.maketrans(from, to) creates a string of 256 characters, where the characters in from will be replaced by to. For example, string.maketrans('ab01', 'AB89') will return the string from above, but a will be replaced by A, b by B, 0 by 8 and 1 by 9.
>>> string.maketrans('ab01', 'AB89')
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\
x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./8923456789:;
<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`ABcdefghijklmnopqrstuvwxyz{|}~\x7f\x80
\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93
\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6
\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9
\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc
\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf
\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2
\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
Effectively, string.maketrans('', '') == ''.join(map(chr, range(256))).
This serves as a map which, when provided to str.translate(), can be used to replace multiple characters in one pass over your string. For the example map above, all characters will remain the same, except that every a turns into A, b into B, etc. If you do myStr.translate(string.maketrans('', '')), you simply don't change anything in myStr.
Finally, translate() has one additional argument, deletechars. If you pass a string for that argument, translate() will translate all characters according to the mapping you provide, but it will drop any characters found in deletechars. So, putting it all together, myStr.translate(string.maketrans('', ''), string.punctuation) does not change any character in the string, but in the process drops every character in string.punctuation. Effectively, you have removed the punctuation from the output string.
string.maketrans(intab, outtab) returns a translation table that maps each character in the intab string into the character at the same position in the outtab string.
tran_table = string.maketrans(intab, outtab)
print myStr.translate(tran_table)
The code above will then translate myStr using your created table. In your case the table generates all characters because you do not specify anything.
Python 2.7's string.maketrans() returns a byte string, like your result, which can be used with string.translate().
string.translate(s, table) translates each character c in s into table[ord(c)]. So \x00 is translated into table[0], and so on. In your case, it's just returning an identity table.
It should be noted that string.translate is deprecated in Python 2.7, and in Python 3.1 and onwards these functions are replaced by bytes.maketrans(), bytes.translate(), and the corresponding methods for str and bytearray.
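For reference, a minimal Python 3 sketch of the same punctuation removal. The sample strings are made up; bytes.maketrans, str.maketrans with a third argument, and bytes.translate with a delete argument are standard.

```python
import string

# The identity table from the question, now a bytes object.
identity = bytes.maketrans(b"", b"")
print(identity == bytes(range(256)))  # True

# For str, the third argument to str.maketrans lists characters to delete.
my_str = "Hello, world! (test)"
table = str.maketrans("", "", string.punctuation)
cleaned = my_str.translate(table)
print(cleaned)  # Hello world test

# For bytes, pass None as the table and the bytes to delete as the second argument.
cleaned_bytes = b"a.b,c".translate(None, string.punctuation.encode("ascii"))
print(cleaned_bytes)  # b'abc'
```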

How can I slice a substring from a unicode string with Python?

I have a unicode string as a result : u'splunk>\xae\uf001'
How can I get the substring 'uf001'
as a simple string in python?
The characters uf001 are not actually present in the string, so you can't just slice them off. You can do
repr(s)[-6:-1]
or
'u' + hex(ord(s[-1]))[2:]
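A slightly more defensive variant of the second approach, sketched for Python 3 (the variable names are illustrative). Using format with a width of four keeps leading zeros, which hex(...)[2:] would drop for code points below U+1000.

```python
s = "splunk>\xae\uf001"

last = s[-1]                                 # one character, code point U+F001
code_text = "u" + format(ord(last), "04x")   # the five characters u, f, 0, 0, 1

print(len(last))   # 1
print(code_text)   # uf001
```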
Since you want the actual character (as seen from the comments), just take the last character with index [-1]. Example:
>>> a = u'splunk>\xae\uf001'
>>> print(a)
splunk>®ï€
>>> a[-1]
'\uf001'
>>> print(a[-1])
ï€
If you want the escaped representation (\uf001), then take repr(a[-1]). Example:
>>> repr(a[-1])
"'\\uf001'"
\uf001 is a single Unicode character (not several characters), so you can get that character directly as shown above.
You see \uf001 because you are looking at the result of repr() on the string. If you print it or use it somewhere else (writing it to a file, for example), it will be the actual U+F001 character.
u'' is how a Unicode string is represented in Python source code. The REPL uses this representation by default to display unicode objects:
>>> u'splunk>\xae\uf001'
u'splunk>\xae\uf001'
>>> print(u'splunk>\xae\uf001')
splunk>®
>>> print(u'splunk>\xae\uf001'[-1])

If your terminal is not configured to display Unicode, or if you are on a narrow build (which is likely for Python 2 on Windows), then the result may be different.
A Unicode string is an immutable sequence of Unicode code points in Python. len(u'\uf001') == 1: it does not contain uf001 (5 characters) in it. You could write it as u'' (on Python 2 it is necessary to declare the character encoding of your source file if you use non-ASCII characters):
>>> u'\uf001' == u''
True
It is just a different way to represent exactly the same Unicode character (a single codepoint in this case).
Note: some user-perceived characters may span several Unicode codepoints e.g.:
>>> import unicodedata
>>> unicodedata.normalize('NFKD', u'ё')
u'\u0435\u0308'
>>> print(unicodedata.normalize('NFKD', u'ё'))
ё

Python issue with incorrectly formatted strings that contain \x

At some point our Python script receives a string like this:
In [1]: ab = 'asd\xeffe\ctive'
In [2]: print ab
asd�fe\ctve \ \\ \\\k\\\
The data is damaged. We need \x to be escaped so that it is treated as a literal \x, but \c has no special meaning in a string and so must be left intact.
So far the closest solution I found is do something like:
In [1]: ab = 'asd\xeffe\ctve \\ \\\\ \\\\\\k\\\\\\'
In [2]: print ab.encode('string-escape').replace('\\\\', '\\').replace("\\'", "'")
asd\xeffe\ctve \ \\ \\\k\\\
The output is taken from IPython. I assumed that ab is a str, not a unicode string (in the latter case we would have to do something like this):
def escape_string(s):
    # Python 2 only: escape the string, then undo the doubling of
    # backslashes and the escaping of single quotes.
    if isinstance(s, str):
        s = s.encode('string-escape').replace('\\\\', '\\').replace("\\'", "'")
    elif isinstance(s, unicode):
        s = s.encode('unicode-escape').replace('\\\\', '\\').replace("\\'", "'")
    return s
\xhh is an escape sequence, and \x is seen as the start of this escape.
'\\' is the same as '\x5c'. It is just two different ways to write the backslash character as a Python string literal.
These literal strings: r'\c', '\\c', '\x5cc', '\x5c\x63' are identical str objects in memory.
'\xef' is a single byte (239 as an integer), but r'\xef' (same as '\\xef') is a 4-byte string: '\x5c\x78\x65\x66'.
If s[0] returns '\xef' then it is what s object actually contains. If it is wrong then fix the source of the data.
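These claims are easy to check. The sketch below is written for Python 3, so bytes literals stand in for Python 2's str where byte counts matter; the variable names are illustrative.

```python
# Four spellings of the same two-character string: backslash followed by c.
variants = [r"\c", "\\c", "\x5cc", "\x5c\x63"]
print(all(v == variants[0] for v in variants))  # True

single = b"\xef"    # one byte with value 239
literal = rb"\xef"  # four bytes: backslash, x, e, f

print(len(single), single[0])       # 1 239
print(len(literal), list(literal))  # 4 [92, 120, 101, 102]
```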
Note: string-escape also escapes \n and the like:
>>> print u'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.encode('unicode-escape')
\xef\\c\\\u2603"'\u2603\u2603"'\n\xa0
>>> print b'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.encode('string-escape')
\xef\\c\\\\N{SNOWMAN}"\'\xe2\x98\x83\\u2603"\'\n\xa0
backslashreplace is used only on characters that cause UnicodeEncodeError:
>>> print u'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''
ï\c\☃"'☃☃"'
>>> print b'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''
�\c\\N{SNOWMAN}"'☃\u2603"'
�
>>> print u'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.encode('ascii', 'backslashreplace')
\xef\c\\u2603"'\u2603\u2603"'
\xa0
>>> print b'''\xef\c\\\N{SNOWMAN}"'\
... ☃\u2603\"\'\n\xa0'''.decode('latin1').encode('ascii', 'backslashreplace')
\xef\c\\N{SNOWMAN}"'\xe2\x98\x83\u2603"'
\xa0
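On Python 3 the string-escape codec is gone, but the backslashreplace error handler behaves in a comparable way. A minimal sketch with a made-up sample string: only the characters that cannot be encoded as ASCII are turned into escape text, and everything else is left alone.

```python
text = "\xef\u2603 ok"  # i-diaeresis, snowman, then plain ASCII

encoded = text.encode("ascii", "backslashreplace")

print(encoded)                  # b'\\xef\\u2603 ok'
print(encoded.decode("ascii"))  # \xef\u2603 ok
```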
Backslashes introduce "escape sequences". \x specifically allows you to specify a byte, which is given as two hexadecimal digits after the x. ef are two hexadecimal digits, hence you get no error. Double the backslash to escape it, or use a raw string r"\xeffective".
Edit: While the Python console may show you '\\', this is precisely what you expect. You just say you expect something else because you confuse the string and its representation. It's a string containing a single backslash. If you were to output it with print, you'd see a single backslash.
But the string literal '\' is ill-formed (not closed because \' is an apostrophe, not a backslash and end-of-string-literal), so repr, which formats the results at the interactive shell, does not produce it. Instead it produces a string literal which you could paste into Python source code and get the same string object. For example, len('\\') == 1.
The \x escape sequence introduces a character given by its hex code, and ef is being interpreted as that hex code. You can sanitize the string by adding an additional \, or else make it a raw string (r'\xeffective').
>>> r'\xeffective'[0]
'\\'
EDIT: You could convert an existing string using the following hack:
>>> a = '\xeffective'
>>> b = repr(a).strip("'")
>>> b
'\\xeffective'
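The repr hack relies on Python 2 escaping the non-ASCII byte. On Python 3, repr leaves a printable character such as ï as it is, so a different route is needed there. One possibility, offered as a sketch rather than a drop-in replacement, is the unicode_escape codec (note that it also escapes newlines and backslashes):

```python
a = "\xeffective"  # 8 characters: i-diaeresis followed by 'fective'

# Encode to escape text, then decode the ASCII bytes back to str.
b = a.encode("unicode_escape").decode("ascii")

print(b)               # \xeffective
print(len(a), len(b))  # 8 11
```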
