I have an integer that I converted to hexadecimal as follows:
int_N = 193402
hex_value = hex(int_N)
It gives me the following hex: 0x2f37a.
I want to convert the hexadecimal to string.
I tried this:
bytes.fromhex(hex_value[2:]).decode('ASCII')
# [2:] to get rid of the 0x
However, it gives me this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 1: ordinal not in range(128)
Then I tried with decode('utf-8') instead of ASCII, but it gave me this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 1: invalid start byte
Any suggestions on how to fix that? Why isn't it converting the hexadecimal '0x2f37a' to a string?
After reading some documentation, I assume the hexadecimal should contain an even number of digits in order to convert to a string, but I wasn't able to make it even, as I'm using hex() and that is the value it gave me.
Thanks and really appreciate any help!
You should look at the struct and binascii modules.
import struct
import binascii
int_N = 193402
s = struct.Struct(">l")
val = s.pack(int_N)
output = binascii.hexlify(val)
print(output)  # b'0002f37a' (hexlify returns bytes in Python 3)
Find out more about C-type packing at PyMOTW-3.
If you simply want to convert it to a string, no other requirements, then this works (I'll pad it to 8 characters here):
int_N = 193402
s = hex(int_N)[2:].rjust(8, '0') # get rid of '0x' and pad to 8 characters
print(s, type(s))
Output:
0002f37a <class 'str'>
...proving that it's a string type. If you're interested in getting the individual bytes, then something like this will demonstrate:
for b in bytes.fromhex(s):
    print(b, type(b))
Output:
0 <class 'int'>
2 <class 'int'>
243 <class 'int'>
122 <class 'int'>
... showing all four bytes (from eight hex digits) and proving they're integers. The key here is an even number of characters (I chose 8) so that fromhex() can decode it. An odd number of hex digits will give a ValueError.
Now you can either use the string or the bytes as you please.
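As an alternative to padding the hex text by hand, here is a minimal sketch (assuming Python 3) that uses int.to_bytes. The helper name int_to_bytes is made up for illustration and does not come from the answers above. It sizes the result from bit_length(), so the byte string is always a whole number of bytes and its hex form always has an even number of digits.

```python
def int_to_bytes(n: int) -> bytes:
    """Big-endian bytes for a non-negative int (hypothetical helper)."""
    # Round the bit length up to whole bytes; use one byte for n == 0.
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, byteorder="big")

int_N = 193402
raw = int_to_bytes(int_N)
hex_text = raw.hex()                  # always an even number of hex digits
round_trip = bytes.fromhex(hex_text)  # decodes without a ValueError
print(raw, hex_text, round_trip == raw)
```

Whether those bytes then decode as ASCII or UTF-8 text is a separate question: arbitrary integers generally do not correspond to valid encoded text.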
Format numbers the way you like with f-strings (format strings). Here are examples of various forms of hexadecimal and binary:
>>> n=193402
>>> f'{n:x} {n:08x} {n:#x} {n:020b}'
'2f37a 0002f37a 0x2f37a 00101111001101111010'
See Format Specification Mini-Language.
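The width in a format spec can itself be a nested replacement field, which gives one way to pad to an even number of hex digits without hard-coding 8. This is only a sketch for non-negative integers; even_hex is a name invented here.

```python
def even_hex(n: int) -> str:
    """Hex digits of a non-negative int, zero-padded to an even length (hypothetical helper)."""
    digits = len(f'{n:x}')
    width = digits + digits % 2      # round an odd digit count up by one
    return f'{n:0{width}x}'          # nested field supplies the width

n = 193402
padded = even_hex(n)
print(padded, bytes.fromhex(padded))
```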
I have a problem.
I get data like:
hex_num='0EE6'
data_decode=str(codecs.decode(hex_num, 'hex'))[(0):(80)]
print(data_decode)
>>>b'\x0e\xe6'
And I want to encode this like:
data_enc=str(codecs.encode(data_decode, 'hex'))[(2):(6)]
print(str(int(data_enc,16)))
>>>TypeError: encoding with 'hex' codec failed (TypeError: a bytes-like object is required, not 'str')
If I write this:
data_enc=str(codecs.encode(b'\x0e\xe6', 'hex'))[(2):(6)]
print(str(int(data_enc,16)))
>>>3814
it returns the number I want (3814).
Please help.
You can remove the quotation marks like this: data = b'\x0e\xe6'
The Python 3 documentation states:
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
When the b is inside a string, it is not a bytes literal prefix, just an ordinary character. You have to remove the quotation marks for the literal to work, so that the value is bytes rather than text.
Corrected code:
import codecs
data = b'\x0e\xe6'
data_enc=str(codecs.encode(data, 'hex'))[(2):(6)]
print(str(int(data_enc,16)))
Output:
3814
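The original failure comes from calling str() on a bytes object, which returns the text of its repr rather than the bytes themselves. The sketch below, assuming Python 3, keeps the value as bytes until the final conversion. The variable names are illustrative only.

```python
import codecs

hex_num = '0EE6'
raw = codecs.decode(hex_num, 'hex')   # a bytes object holding two bytes
as_text = str(raw)                    # the repr as text; it starts with "b'"

# Encode the bytes (not the repr text) back to hex, then parse as base 16.
hex_again = codecs.encode(raw, 'hex').decode('ascii')
number = int(hex_again, 16)
print(type(raw).__name__, type(as_text).__name__, hex_again, number)
```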
To change from a hex string to binary data, binascii.unhexlify is a convenient method, e.g.:
>>> hex_num='0EE6'
>>> import binascii
>>> binascii.unhexlify(hex_num)
b'\x0e\xe6'
Then, to convert the binary data to an integer, int.from_bytes gives you control over the endianness of the data and whether it is signed, e.g.:
>>> bytes_data = b'\x0e\xe6'
>>> int.from_bytes(bytes_data, byteorder='little', signed=False)
58894
>>> int.from_bytes(bytes_data, byteorder='big', signed=False)
3814
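Putting the two steps together, a small sketch could look like this. The function name hex_to_int is hypothetical, and it uses bytes.fromhex, which accepts the same plain hex text as binascii.unhexlify.

```python
def hex_to_int(hex_text: str, byteorder: str = 'big', signed: bool = False) -> int:
    """Hex text -> bytes -> int (hypothetical helper for illustration)."""
    raw = bytes.fromhex(hex_text)
    return int.from_bytes(raw, byteorder=byteorder, signed=signed)

print(hex_to_int('0EE6'))                                   # big-endian, unsigned
print(hex_to_int('0EE6', byteorder='little'))               # little-endian, unsigned
print(hex_to_int('0EE6', byteorder='little', signed=True))  # little-endian, signed
```

If byte order and sign don't matter, int('0EE6', 16) gives the big-endian unsigned value directly.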
I am trying to create a random Unicode generator and made a function that can create 16-bit Unicode characters. This is my code:
import random
import string
def rand_unicode():
    list = []
    list.append(str(random.randint(0,1)))
    for i in range(0,3):
        if random.randint(0,1):
            list.append(string.ascii_letters[random.randint(0, \
            len(string.ascii_letters))-1].upper())
        else:
            list.append(str(random.randint(0,9)))
    return ''.join(list)
print(rand_unicode())
The problem is that whenever I try to add a '\u' in the print statement, Python gives me the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
I tried raw strings but that only gives me output like '\u0070' without turning it into a unicode character. How can I properly connect the strings to create a unicode character? Any help is appreciated.
From:
The problem is that whenever I try to add a '\u' in the print statement, Python gives me the following error:
it sounds like the problem may be in code you haven't included in your question:
print('\u' + rand_unicode())
This won't do what you expect, because the '\u' is interpreted before the strings are concatenated. See Process escape sequences in a string in Python and try:
print(bytes('\\u' + rand_unicode(), 'us-ascii').decode('unicode_escape'))
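To see the same idea with a fixed value instead of a random one, here is a sketch assuming Python 3. The escape text is built at run time with an escaped backslash, and the unicode_escape codec then interprets it. The value '0070' is chosen only for illustration.

```python
code = '0070'                 # four hex digits, picked for this example
escaped = '\\u' + code        # six characters: backslash, u, 0, 0, 7, 0
char = escaped.encode('ascii').decode('unicode_escape')
print(len(escaped), char)
```

This only works when the four characters after \u are valid hex digits; any other letter makes the decode step fail.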
A unicode escape sequence such as \u0070 is a single character. It is not the concatenation of \u and the ordinal.
>>> '\u0070' == 'p'
True
>>> '\u0070' == (r'\u' + '0070')
False
To convert an ordinal to a unicode character, you can pass the numerical ordinal to the chr builtin function. Use int(literal, 16) to convert a hex-literal ordinal to a numerical one:
>>> ordinal = '0070'
>>> chr(int(ordinal, 16)) # convert literal to number to unicode
'p'
>>> chr(int(rand_unicode(), 16))
'ᚈ'
Note that creating a literal ordinal is not required. You can directly create the numerical ordinal:
>>> chr(112) # convert decimal number to unicode
'p'
>>> chr(0x0070) # convert hexadecimal number to unicode
'p'
>>> chr(random.randint(0, 0x10FFF))
'嚟'
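For the random generator itself, one possible sketch draws the code point as a number rather than building hex text. It assumes you want the 16-bit range (the Basic Multilingual Plane) and want to skip the surrogate range U+D800 to U+DFFF, which cannot be encoded to UTF-8 on its own. The name rand_bmp_char is made up for this example. Many code points in that range are unassigned, so the result may not display in your font.

```python
import random

def rand_bmp_char(rng: random.Random) -> str:
    """Random character from U+0000..U+FFFF, excluding surrogates (hypothetical helper)."""
    while True:
        cp = rng.randint(0, 0xFFFF)
        if not 0xD800 <= cp <= 0xDFFF:   # retry if we landed on a surrogate
            return chr(cp)

rng = random.Random()
ch = rand_bmp_char(rng)
print(f'U+{ord(ch):04X}')
```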
I know this type of question is asked a lot, but no answer was able to specifically help me with my problem setup.
I have a list of ONLY Unicode code points, in this form:
304E
304F
...
No U+XXXX, no '\XXXX' version.
Now I've tried to use string manipulation to recreate such strings
so I can simply print the corresponding Unicode character.
What I tried:
x = u'\\u' + listString
x = '\\u' + listString
x = '\u' + listString
The first two, when printed, just give me a '\uXXXX' string, but I have no idea
how to make it print the character and not that string.
The last one gives me this error:
(unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
It's probably just something I don't get about Unicode and string manipulation, but I hope someone can help me out here.
Thanks in advance o/
You can use chr to get the character for a unicode code point:
>>> chr(0x304E)
'ぎ'
You can use int to convert a hexadecimal string to an integer:
>>> int('304E', 16)
12366
>>> chr(int('304E', 16))
'ぎ'
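Applied to a whole list of code points, this could look like the following sketch. The list holds the two values from the question, and the variable names are illustrative.

```python
code_points = ['304E', '304F']                     # hex text without any prefix
chars = [chr(int(cp, 16)) for cp in code_points]   # hex text -> int -> character
text = ''.join(chars)
print(text)
```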
In Python, if I type
euro = u'\u20AC'
euroUTF8 = euro.encode('utf-8')
print(euroUTF8, type(euroUTF8), len(euroUTF8))
the output is
('\xe2\x82\xac', <type 'str'>, 3)
I have two questions:
1. It looks like euroUTF8 is encoded over 3 bytes, but how do I get its binary representation to see how many bits it contains?
2. What does 'x' in '\xe2\x82\xac' mean? I don't think 'x' is a hex number. And why are there three '\'?
In Python 2, print is a statement, not a function. You are printing a tuple here. Print the individual elements by removing the (..):
>>> euro = u'\u20AC'
>>> euroUTF8 = euro.encode('utf-8')
>>> print euroUTF8, type(euroUTF8), len(euroUTF8)
€ <type 'str'> 3
Now you get the 3 individual objects written as strings to stdout; my terminal just happens to be configured to interpret anything written to it as UTF-8, so the bytes correctly result in the € Euro symbol being displayed.
The \x<hh> sequences are Python string literal escape sequences (see the reference documentation); they are the default output for the repr() applied to a string with non-ASCII, non-printable bytes in them. You'll see the same thing when echoing the value in an interactive interpreter:
>>> euroUTF8
'\xe2\x82\xac'
>>> euroUTF8[0]
'\xe2'
>>> euroUTF8[1]
'\x82'
>>> euroUTF8[2]
'\xac'
They provide you with ASCII-safe debugging output. The contents of all Python standard library containers use this format, including lists, tuples and dictionaries.
If you want to format to see the bits that make up these values, convert each byte to an integer by using the ord() function, then format the integer as binary:
>>> ' '.join([format(ord(b), '08b') for b in euroUTF8])
'11100010 10000010 10101100'
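In Python 3, iterating over a bytes object already yields integers, so ord() is not needed. A rough equivalent, assuming Python 3 and using variable names of my own:

```python
euro_utf8 = '\u20ac'.encode('utf-8')              # the three UTF-8 bytes of the euro sign
bits = ' '.join(f'{b:08b}' for b in euro_utf8)    # each byte as eight binary digits
print(bits, len(euro_utf8) * 8)
```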
The number of bits used per character depends on the encoding. UTF-8 is built from 8-bit units and uses one to four of them per character, so the bit count is just eight times the byte count (3 bytes here, so 24 bits) and you don't need a binary representation to find it. (If you still want to see the bits, refer to Martijn's answer.)
\x means that the two hex digits that follow give the value of one byte. So x itself is not a hex number that you should convert or read; it marks the value you are interested in. Each \ starts an escape sequence; the backslashes are not part of the value, which is why there are three of them, one per byte.
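A short sketch in Python 3 to illustrate both points: UTF-8 uses a different number of bytes for different characters, and each \xhh escape in a bytes literal stands for exactly one byte. The sample characters are my own choice.

```python
# Characters that need 1, 2, 3 and 4 bytes in UTF-8 (samples chosen for illustration).
samples = ['A', '\u00e9', '\u20ac', '\U0001f600']
lengths = [len(ch.encode('utf-8')) for ch in samples]
print(lengths)

# Three \x escapes describe three bytes; the backslashes and x are not stored.
euro = b'\xe2\x82\xac'
print(len(euro), list(euro), euro == bytes([0xe2, 0x82, 0xac]))
```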
Let us use the character Latin Capital Letter A with Ogonek (U+0104) as an example.
I have an int that represents its UTF-8 encoded form:
my_int = 0xC484
# Decimal: `50308`
# Binary: `0b1100010010000100`
If I use the unichr function I get: \uC484 or 쒄 (U+C484)
But, I need it to output: Ą
How do I convert my_int to a Unicode code point?
To convert the integer 0xC484 to the bytestring '\xc4\x84' (the UTF-8 representation of the Unicode character Ą), you can use struct.pack():
>>> import struct
>>> struct.pack(">H", 0xC484)
'\xc4\x84'
... where > in the format string represents big-endian, and H represents unsigned short int.
Once you have your UTF-8 bytestring, you can decode it to Unicode as usual:
>>> struct.pack(">H", 0xC484).decode("utf8")
u'\u0104'
>>> print struct.pack(">H", 0xC484).decode("utf8")
Ą
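The transcript above is Python 2. Under Python 3, struct.pack() returns bytes, and the same approach would look roughly like this sketch (variable names are mine):

```python
import struct

my_int = 0xC484
utf8_bytes = struct.pack('>H', my_int)    # big-endian unsigned short -> 2 bytes
char = utf8_bytes.decode('utf-8')         # interpret those bytes as UTF-8
print(utf8_bytes, f'U+{ord(char):04X}')
```

The ">H" format is fixed at two bytes, so this form only suits characters whose UTF-8 encoding is exactly two bytes long.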
>>> int2bytes(0xC484).decode('utf-8')
u'\u0104'
>>> print(_)
Ą
where int2bytes() is defined here.
Encode the number to a hex string, using hex() or %x. Then you can interpret that as a series of hex bytes using the hex decoder. Finally use the utf-8 decoder to get a unicode string:
def weird_utf8_integer_to_unicode(n):
    s= '%x' % n
    if len(s) % 2:
        s= '0'+s
    return s.decode('hex').decode('utf-8')
The len check is in case the first byte is in the range 0x1–0xF, which would leave it missing a leading zero. This should be able to cope with a string of any length and any character (however, encoding a byte sequence in an integer like this cannot preserve leading zero bytes).
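The function above relies on str.decode('hex'), which exists only in Python 2. A possible Python 3 counterpart, offered as a sketch with an invented name, uses int.to_bytes instead of going through hex text:

```python
def utf8_integer_to_str(n: int) -> str:
    """Treat a non-negative int as big-endian UTF-8 bytes (hypothetical helper)."""
    # Round the bit length up to whole bytes; use one byte for n == 0.
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, byteorder='big').decode('utf-8')

# ascii() shows the result as an escape, independent of the terminal encoding.
print(ascii(utf8_integer_to_str(0xC484)))
print(ascii(utf8_integer_to_str(0xE282AC)))
```

As with the original, leading zero bytes are lost, because the integer itself does not record them.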