Convert string into hex in Python [duplicate]

I have this string:
string = "{'id':'other_aud1_aud2','kW':15}"
Simply put, I would like to turn it into a hex string like this: '7b276964273a276f746865725f617564315f61756432272c276b57273a31357d'
I have been trying binascii.hexlify(string), but it keeps raising:
TypeError: a bytes-like object is required, not 'str'
I only need this so that it works with the following call: bytearray.fromhex(data['string_hex']).decode()
Here is the entire code:
string_data = "{'id':'"+self.id+"','kW':"+str(value)+"}"
print(string_data)
string_data_hex = hexlify(string_data)
get_json = bytearray.fromhex(data['string_hex']).decode()
This is Python 3.6.

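You can encode() the string to get bytes, then hexlify those:
from binascii import hexlify, unhexlify
string = "{'id':'other_aud1_aud2','kW':15}"
h = hexlify(string.encode())  # bytes containing the hex digits
print(h.decode())
# 7b276964273a276f746865725f617564315f61756432272c276b57273a31357d
s = unhexlify(h).decode()  # reverse: hex digits -> bytes -> str
print(s)
# {'id':'other_aud1_aud2','kW':15}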
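As an alternative sketch, Python 3.5 and later also provide the built-in bytes.hex() and bytes.fromhex() methods, which avoid the binascii import. The names string_hex and restored below are illustrative and not taken from the original code:

```python
# Round trip: str -> hex text -> str, using only built-in methods (Python 3.5+).
string = "{'id':'other_aud1_aud2','kW':15}"

# encode() turns the str into bytes (UTF-8 by default);
# hex() returns a str of hexadecimal digits.
string_hex = string.encode().hex()
print(string_hex)

# bytes.fromhex() is the inverse; decode() turns the bytes back into a str.
restored = bytes.fromhex(string_hex).decode()
print(restored)
```

Because string_hex is already a str, it can be passed directly to the bytearray.fromhex(...).decode() call from the question.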

The tricky bit here is that a Python 3 string is a sequence of Unicode characters, which is not the same as a sequence of bytes (or of ASCII characters).
In Python 2, the str type and the bytes type are synonyms, and there is a separate type, unicode, that represents a sequence of Unicode characters. This makes it something of a mystery when you have a string: is it a sequence of bytes, or is it a sequence of characters in some character set?
In Python 3, str now means unicode, and we use bytes for what used to be str. Given a string (a sequence of Unicode characters), we use encode to convert it to some byte sequence that can represent it, if there is such a sequence:
>>> 'hello'.encode('ascii')
b'hello'
>>> 'sch\N{latin small letter o with diaeresis}n'
'schön'
>>> 'sch\N{latin small letter o with diaeresis}n'.encode('utf-8')
b'sch\xc3\xb6n'
but:
>>> 'sch\N{latin small letter o with diaeresis}n'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 3: ordinal not in range(128)
Once you have the bytes object, you already know what to do. In Python 2, if you have a str, you already have a bytes object; in Python 3, call .encode with your chosen encoding.
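To make the role of the encoding concrete, here is a small sketch showing that the same text produces different hex output depending on the codec chosen. The variable names are illustrative:

```python
text = 'sch\N{LATIN SMALL LETTER O WITH DIAERESIS}n'

# In UTF-8 the o-with-diaeresis becomes two bytes (c3 b6).
utf8_hex = text.encode('utf-8').hex()

# In Latin-1 the same character is a single byte (f6).
latin1_hex = text.encode('latin-1').hex()

# errors='replace' substitutes '?' instead of raising UnicodeEncodeError.
ascii_lossy = text.encode('ascii', errors='replace')

print(utf8_hex, latin1_hex, ascii_lossy)
```

Whoever decodes the hex string has to use the same encoding that produced it, so it is worth passing the encoding explicitly on both sides.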

Related

How can I encode string data (convert to bytes) in Python 3.7

I have a problem.
I get data like this:
hex_num='0EE6'
data_decode=str(codecs.decode(hex_num, 'hex'))[(0):(80)]
print(data_decode)
>>>b'\x0e\xe6'
And I want to encode it like this:
data_enc=str(codecs.encode(data_decode, 'hex'))[(2):(6)]
print(str(int(data_enc,16)))
>>>TypeError: encoding with 'hex' codec failed (TypeError: a bytes-like object is required, not 'str')
If I write this instead:
data_enc=str(codecs.encode(b'\x0e\xe6', 'hex'))[(2):(6)]
print(str(int(data_enc,16)))
>>>3814
it returns the number I want (3814).
Please help.
You can remove the quotation marks like this: data = b'\x0e\xe6'
The Python 3 documentation states:
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
When the b sits inside a string, it is ordinary text rather than a literal prefix. Calling str() on a bytes object produces exactly such a string, so drop the str() call (or the surrounding quotation marks) and work with the bytes object directly.
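A short sketch of the difference, using the same byte values as the question (the names data and as_text are illustrative):

```python
data = b'\x0e\xe6'

# str() on a bytes object returns its printable representation as text,
# including the leading b and the quotes; it does not decode the bytes.
as_text = str(data)

print(type(data).__name__, len(data))        # a bytes object holding two bytes
print(type(as_text).__name__, len(as_text))  # a str holding the characters b, ', \, x, ...
```

This is why codecs.encode(..., 'hex') rejects the value in the question: by that point it is a str that merely looks like a bytes literal.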
Corrected code:
import codecs
data = b'\x0e\xe6'
data_enc=str(codecs.encode(data, 'hex'))[(2):(6)]
print(str(int(data_enc,16)))
Output:
3814
To convert a hex string to binary data, binascii.unhexlify is a convenient method, e.g.:
>>> hex_num='0EE6'
>>> import binascii
>>> binascii.unhexlify(hex_num)
b'\x0e\xe6'
Then, to convert the binary data to an integer, int.from_bytes gives you control over the endianness of the data and whether it is signed, e.g.:
>>> bytes_data = b'\x0e\xe6'
>>> int.from_bytes(bytes_data, byteorder='little', signed=False)
58894
>>> int.from_bytes(bytes_data, byteorder='big', signed=False)
3814
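If only the integer is needed, a sketch of the shortest route is to parse the hex text directly with int(..., 16). The names below are illustrative:

```python
hex_num = '0EE6'

# int() with base 16 parses the hex digits directly.
direct = int(hex_num, 16)

# Equivalent route through bytes, with an explicit byte order.
via_bytes = int.from_bytes(bytes.fromhex(hex_num), byteorder='big')

# Back to hex text: two bytes, big-endian, then hex digits in upper case.
back_to_hex = direct.to_bytes(2, byteorder='big').hex().upper()

print(direct, via_bytes, back_to_hex)
```

The direct form always reads the digits most-significant first, so it matches byteorder='big'; go through int.from_bytes when the data is little-endian or signed.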

joining strings together to make a unicode character

I am trying to create a random Unicode generator and made a function that can create 16-bit Unicode characters. This is my code:
import random
import string
def rand_unicode():
    list = []
    list.append(str(random.randint(0,1)))
    for i in range(0,3):
        if random.randint(0,1):
            list.append(string.ascii_letters[random.randint(0, \
            len(string.ascii_letters))-1].upper())
        else:
            list.append(str(random.randint(0,9)))
    return ''.join(list)
print(rand_unicode())
The problem is that whenever I try to add a '\u' in the print statement, Python gives me the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
I tried raw strings but that only gives me output like '\u0070' without turning it into a unicode character. How can I properly connect the strings to create a unicode character? Any help is appreciated.
From:
The problem is that whenever I try to add a '\u' in the print statement, Python gives me the following error:
it sounds like the problem may be in code you haven't included in your question:
print('\u' + rand_unicode())
This won't do what you expect, because the '\u' is interpreted before the strings are concatenated. See Process escape sequences in a string in Python and try:
print(bytes('\\u' + rand_unicode(), 'us-ascii').decode('unicode_escape'))
A unicode escape sequence such as \u0070 is a single character. It is not the concatenation of \u and the ordinal.
>>> '\u0070' == 'p'
True
>>> '\u0070' == (r'\u' + '0070')
False
To convert an ordinal to a unicode character, you can pass the numerical ordinal to the chr builtin function. Use int(literal, 16) to convert a hex-literal ordinal to a numerical one:
>>> ordinal = '0070'
>>> chr(int(ordinal, 16)) # convert literal to number to unicode
'p'
>>> chr(int(rand_unicode(), 16))
'ᚈ'
Note that creating a literal ordinal is not required. You can directly create the numerical ordinal:
>>> chr(112) # convert decimal number to unicode
'p'
>>> chr(0x0070) # convert hexadecimal number to unicode
'p'
>>> chr(random.randint(0, 0x10FFF))
'嚟'
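Putting this together, here is a minimal sketch of a random 16-bit character generator built on chr. The function names rand_bmp_char and rand_hex_ordinal are hypothetical, and skipping the surrogate range is an assumption about what is wanted. Note that the rand_unicode function from the question draws letters from all of A-Z, so it can return strings that are not valid hexadecimal and would make int(..., 16) raise ValueError.

```python
import random

def rand_bmp_char(rng=random):
    """Return one random character from U+0000..U+FFFF, skipping surrogates."""
    while True:
        code_point = rng.randint(0x0000, 0xFFFF)
        # U+D800..U+DFFF are surrogate code points, not characters; retry.
        if not 0xD800 <= code_point <= 0xDFFF:
            return chr(code_point)

def rand_hex_ordinal(rng=random):
    """Return a random 4-digit hex string such as '0A3F' (always valid hex)."""
    return format(rng.randint(0x0000, 0xFFFF), '04X')

print(repr(rand_bmp_char()))
print(rand_hex_ordinal())
```

Many code points in this range are unassigned or are control characters, so the result will often not display as a visible glyph.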

How to programmatically retrieve the unicode char from hexadecimals?

Given a list of hexadecimal strings that correspond to Unicode code points, how can I programmatically retrieve the Unicode characters?
E.g. Given the list:
>>> l = ['9359', '935A', '935B']
how to achieve this list:
>>> u = [u'\u9359', u'\u935A', u'\u935B']
>>> u
['鍙', '鍚', '鍛']
I've tried this but it throws a SyntaxError:
>>> u'\u' + l[0]
  File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
\uhhhh escapes are only valid in string literals; you can't use them to turn arbitrary hex values into characters. In other words, they are part of a larger syntax and can't be used stand-alone.
Decode the hex value to an integer and pass it to the chr() function (or, on Python 2, the unichr() function):
[chr(int(v, 16)) for v in l]
You could ask Python to interpret a string containing literal \uhhhh text as a Unicode string literal with the unicode_escape codec, but that feels like overkill for individual code points:
[(b'\\u' + v.encode('ascii')).decode('unicode_escape') for v in l]
Note the double backslash in the prefix added, and that we have to create byte strings for this to work at all.
Demo:
>>> l = ['9359', '935A', '935B']
>>> [chr(int(v, 16)) for v in l]
['鍙', '鍚', '鍛']
>>> [(b'\\u' + v.encode('ascii')).decode('unicode_escape') for v in l]
['鍙', '鍚', '鍛']
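One practical difference between the two approaches, sketched below: chr(int(v, 16)) accepts code points of any length, while a \u escape consumes exactly four hex digits. The helper name hex_to_char and the code point 1F600 are illustrative additions, not part of the question.

```python
def hex_to_char(hex_code_point):
    """Convert a hex code point string such as '9359' or '1F600' to a character."""
    return chr(int(hex_code_point, 16))

code_points = ['9359', '935A', '1F600']
chars = [hex_to_char(v) for v in code_points]
print(chars)

# A \u escape reads exactly four hex digits, so for '1F600' it decodes
# U+1F60 and leaves the trailing '0' as a separate character.
escaped = (b'\\u' + b'1F600').decode('unicode_escape')
print(len(escaped))
```

For code points above U+FFFF the escape-based approach would need the eight-digit \U form instead, which is one more reason to prefer chr.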

Scrapy item pipeline

I am using a Scrapy spider with my own item pipeline:
value['Title'] = item['Title'][0] if ('Title' in item) else ''
value['Name'] = item['Name'][0] if ('CompanyName' in item) else ''
value['Description'] = item['Description'][0] if ('Description' in item) else ''
When I do this, the values come out prefixed with u.
Example: when I pass the value to the output and print it, I get
value['Title'] = u'hospital'
What went wrong in my code, why am I getting the u, and how do I remove it?
Can anyone help me?
Thanks,
The u means that the string is a unicode object (this is Python 2). You can remove the u by passing the string to str, as in str(u'test'), but you can treat it as a normal string for most purposes. For example:
>>> u'test' == 'test'
True
If you have characters that cannot be represented with plain ascii you should keep the unicode way. If you call str on non ascii characters you will get an exception.
>>> test=u'বাংলা'
>>> test
u'\u09ac\u09be\u0982\u09b2\u09be'
>>> str(test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-4: ordinal not in range(128)
The u is not part of the string, it is just a way to indicate the type of the string.
>>> type('test')
<type 'str'>
>>> type(u'test')
<type 'unicode'>
See the following question for more details:
What does the 'u' symbol mean in front of string values?
To remove the u sign you may encode the string as ASCII like this: value['Title'].encode("ascii").
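A small sketch of where the prefix comes from: it appears only in the repr() of a value, which is what you see when a dict or list is printed, and never in the string's own characters. The dict below is a stand-in for the pipeline's value, not Scrapy code.

```python
value = {'Title': u'hospital'}

# Printing a container shows the repr() of each element, including quotes
# (and, on Python 2 only, the u prefix for unicode strings).
print(value)

# Printing the string itself shows only its characters.
print(value['Title'])

title_repr = repr(value['Title'])  # quoted form used inside containers
title_text = str(value['Title'])   # plain characters (ASCII-only content here)
```

So in most cases nothing needs to be removed: the stored text is already just hospital.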
