Python: convert a dot separated hex values to string? - python

In this post: Print a string as hex bytes? I learned how to print as string into an "array" of hex bytes now I need something the other way around:
So for example the input would be: 73.69.67.6e.61.74.75.72.65 and the output would be a string.

you can use the built in binascii module. Do note however that this function will only work on ASCII encoded characters.
binascii.unhexlify(hexstr)
Your input string will need to be dotless however, but that is quite easy with a simple
string = string.replace('.','')
another (arguably safer) method would be to use base64 in the following way:
import base64
encoded = base64.b16encode(b'data to be encoded')
print (encoded)
data = base64.b16decode(encoded)
print (data)
or in your example:
data = base64.b16decode(b"7369676e6174757265", True)
print (data.decode("utf-8"))
The string can be sanitised before input into the b16decode method.
Note that I am using python 3.2 and you may not necessarily need the b out the front of the string to denote bytes.
Example was found here

Without binascii:
>>> a="73.69.67.6e.61.74.75.72.65"
>>> "".join(chr(int(e, 16)) for e in a.split('.'))
'signature'
>>>
or better:
>>> a="73.69.67.6e.61.74.75.72.65"
>>> "".join(e.decode('hex') for e in a.split('.'))
PS: works with unicode:
>>> a='.'.join(x.encode('hex') for x in 'Hellö Wörld!')
>>> a
'48.65.6c.6c.94.20.57.94.72.6c.64.21'
>>> print "".join(e.decode('hex') for e in a.split('.'))
Hellö Wörld!
>>>
EDIT:
No need for a generator expression here (thx to thg435):
a.replace('.', '').decode('hex')

Use string split to get a list of strings, then base 16 for decoding the bytes.
>>> inp="73.69.67.6e.61.74.75.72.65"
>>> ''.join((chr(int(i,16)) for i in inp.split('.')))
'signature'
>>>

Related

How to print byte representation of a string?

So I have a string like "abcd" and I want to convert it into bytes and print it.
I tried print(b'abcd') which prints exactly b'abcd' but I want '\x61\x62\x63\x64'.
Is there a single function for this purpose or do I have to use unhexlify with join?
Note: This is a simplified example of what I'm acutally doing. I need the aforementioned representation for a regex search.
There's no single function to do it, so you would need to do the formatting manually:
s = 'abcd'
print(r'\x' + r'\x'.join(f'{b:02x}' for b in bytes(s, 'utf8')))
Output:
\x61\x62\x63\x64
You can get hex values of a string like this:
string = "abcd"
print(".".join(hex(ord(c))[2:] for c in string))

Convert hex to decimal/string in python

So I wrote this small socket program to send a udp packet and receive the response
sock.sendto(data, (MCAST_GRP, MCAST_PORT))
msgFromServer = sock.recvfrom(1024)
banner=msgFromServer[0]
print(msgFromServer[0])
#name = msgFromServer[0].decode('ascii', 'ignore')
#print(name)
Response is
b'\xff\xff\xff\xffI\x11server banner\x00map\x00game\x00Counter-Strike: Global Offensive\x00\xda\x02\x00\x10\x00dl\x01\x011.38.2.2\x00\xa1\x87iempty,secure\x00\xda\x02\x00\x00\x00\x00\x00\x00'
Now the thing is I wanted to convert all hex value to decimal,
I tried the decode; but then I endup loosing all the hex values.
How can I convert all the hex values to decimal in my case
example: \x13 = 19
EDIT: I guess better way to iterate my question is
How do I convert only the hex values to decimal in the given response
There are two problems here:
handling the non-ASCII bytes
handling \xhh sequences which are legitimate characters in Python strings
We can address both with a mix of regular expressions and string methods.
First, decode the bytes to ASCII using the backslashreplace error handler to avoid losing the non-ASCII bytes.
>>> import re
>>>
>>> decoded = msgFromServer[0].decode('ascii', errors='backslashreplace')
>>> decoded
'\\xff\\xff\\xff\\xffI\x11server banner\x00map\x00game\x00Counter-Strike: Global Offensive\x00\\xda\x02\x00\x10\x00dl\x01\x011.38.2.2\x00\\xa1\\x87iempty,secure\x00\\xda\x02\x00\x00\x00\x00\x00\x00'
Next, use a regular expression to replace the non-ASCII '\\xhh' sequences with their numeric equivalents:
>>> temp = re.sub(r'\\x([a-fA-F0-9]{2})', lambda m: str(int(m.group(1), 16)), decoded)
>>> temp
'255255255255I\x11server banner\x00map\x00game\x00Counter-Strike: Global Offensive\x00218\x02\x00\x10\x00dl\x01\x011.38.2.2\x00161135iempty,secure\x00218\x02\x00\x00\x00\x00\x00\x00'
Finally, map \xhh escape sequences to their decimal values using str.translate:
>>> tt = str.maketrans({x: str(x) for x in range(32)})
>>> final = temp.translate(tt)
>>> final
'255255255255I17server banner0map0game0Counter-Strike: Global Offensive021820160dl111.38.2.20161135iempty,secure02182000000'
You can first convert the bytes representation to hex using the bytes.hex method and then cast it into an integer with the appropriate base with int(x, base)
>>> b'\x13'.hex()
'13'
>>> int(b'\x13'.hex(), 16)
19
Assume v contains the response, what you are asking for is
[int(i) for i in v]
I suspect it's not what you want, it is what I read from the question

Decode unicode string in python

I'd like to decode the following string:
t\u028c\u02c8m\u0251\u0279o\u028a\u032f
It should be the IPA of 'tomorrow' as given in a JSON string from http://rhymebrain.com/talk?function=getWordInfo&word=tomorrow
My understanding is that it should be something like:
x = 't\u028c\u02c8m\u0251\u0279o\u028a\u032f'
print x.decode()
I have tried the solutions from here , here , here, and here (and several other that more or less apply), and several permutations of its parts, but I can't get it to work.
Thank you
You need a u before your string (in Python 2.x, which you appear to be using) to indicate that this is a unicode string:
>>> x = u't\u028c\u02c8m\u0251\u0279o\u028a\u032f' # note the u
>>> print x
tʌˈmɑɹoʊ̯
If you have already stored the string in a variable, you can use the following constructor to convert the string into unicode:
>>> s = 't\u028c\u02c8m\u0251\u0279o\u028a\u032f' # your string has a unicode-escape encoding but is not unicode
>>> x = unicode(s, encoding='unicode-escape')
>>> print x
tʌˈmɑɹoʊ̯
>>> x
u't\u028c\u02c8m\u0251\u0279o\u028a\u032f' # a unicode string

How do you decode an ascii string in python?

For example, in your python shell(IDLE):
>>> a = "\x3cdiv\x3e"
>>> print a
The result you get is:
<div>
but if a is an ascii encoded string:
>>> a = "\\x3cdiv\\x3e" ## it's the actual \x3cdiv\x3e string if you read it from a file
>>> print a
The result you get is:
\x3cdiv\x3e
Now what i really want from a is <div>, so I did this:
>>> b = a.decode("ascii")
>>> print b
BUT surprisingly I did NOT get the result I want, it's still:
\x3cdiv\x3e
So basically what do I do to convert a, which is \x3cdiv\x3e to b, which should be <div>?
Thanks
>>> a = rb"\x3cdiv\x3e"
>>> a.decode('unicode_escape')
'<div>'
Also check out some interesting codecs.
With python 3.x, you would adapt Kabie answer to
a = b"\x3cdiv\x3e"
a.decode('unicode_escape')
or
a = b"\x3cdiv\x3e"
a.decode('ascii')
both give
>>> a
b'<div>'
What is b prefix for ?
Bytes literals are always prefixed with 'b' or 'B'; they produce an
instance of the bytes type instead of the str type. They may only
contain ASCII characters; bytes with a numeric value of 128 or greater
must be expressed with escapes.

Python strip() unicode string?

How can you use string methods like strip() on a unicode string? and can't you access characters of a unicode string like with oridnary strings? (ex: mystring[0:4] )
It's working as usual, as long as they are actually unicode, not str (note: every string literal must be preceded by u, like in this example):
>>> a = u"coțofană"
>>> a
u'co\u021bofan\u0103'
>>> a[-1]
u'\u0103'
>>> a[2]
u'\u021b'
>>> a[3]
u'o'
>>> a.strip(u'ă')
u'co\u021bofan'
Maybe it's a bit late to answer to this, but if you are looking for the library function and not the instance method, you can use that as well.
Just use:
yourunicodestring = u' a unicode string with spaces all around '
unicode.strip(yourunicodestring)
In some cases it's easier to use this one, for example inside a map function like:
unicodelist=[u'a',u' a ',u' foo is just...foo ']
map (unicode.strip,unicodelist)
You can do every string operation, actually in Python 3, all str's are unicode.
>>> my_unicode_string = u"abcşiüğ"
>>> my_unicode_string[4]
u'i'
>>> my_unicode_string[3]
u'\u015f'
>>> print(my_unicode_string[3])
ş
>>> my_unicode_string[3:]
u'\u015fi\xfc\u011f'
>>> print(my_unicode_string[3:])
şiüğ
>>> print(my_unicode_string.strip(u"ğ"))
abcşiü
See the Python docs on Unicode strings and the following section on string methods. Unicode strings support all of the usual methods and operations as normal ASCII strings.

Categories