This question already has answers here:
Print without b' prefix for bytes in Python 3
(8 answers)
Closed 4 years ago.
I am new in python programming and i am a bit confused. I try to get the bytes from a string to hash and encrypt but i got
b'...'
b character in front of string just like the below example. Is any way avoid this?.Can anyone give a solution? Sorry for this silly question
import hashlib
text = "my secret data"
pw_bytes = text.encode('utf-8')
print('print',pw_bytes)
m = hashlib.md5()
m.update(pw_bytes)
OUTPUT:
print b'my secret data'
This should do the trick:
pw_bytes.decode("utf-8")
Here u Go
f = open('test.txt','rb+')
ch=f.read(1)
ch=str(ch,'utf-8')
print(ch)
Decoding is redundant
You only had this "error" in the first place, because of a misunderstanding of what's happening.
You get the b because you encoded to utf-8 and now it's a bytes object.
>> type("text".encode("utf-8"))
>> <class 'bytes'>
Fixes:
You can just print the string first
Redundantly decode it after encoding
Related
This question already has answers here:
Url decode UTF-8 in Python
(5 answers)
Closed 6 months ago.
So I have the following string
"%E3%83%9C%E3%83%89%E3%82%AB%E3%81%95%E3%82%93"
It actually means this
ボドカさん
This string seems to be encoded in UTF-8 because when I write this in python
encoded_str = b'\xe3\x83\x9c\xe3\x83\x89\xe3\x82\xab\xe3\x81\x95\xe3\x82\x93'
print(encoded_str)
print(encoded_str.decode('utf-8'))
Here is the output I get
b'\xe3\x83\x9c\xe3\x83\x89\xe3\x82\xab\xe3\x81\x95\xe3\x82\x93'
ボドカさん
But now I would like a script that will allow me to decode any string in the initial format and here is my code.
import re
import os
mystr = "%E3%83%9C%E3%83%89%E3%82%AB%E3%81%95%E3%82%93"
mystr = mystr.lower()
mystr = re.sub('%', r'\\x', mystr)
encoded_str = bytes(mystr, "utf-8")
print(mystr)
print(encoded_str)
print(encoded_str.decode('utf-8'))
Output:
\xe3\x83\x9c\xe3\x83\x89\xe3\x82\xab\xe3\x81\x95\xe3\x82\x93
b'\\xe3\\x83\\x9c\\xe3\\x83\\x89\\xe3\\x82\\xab\\xe3\\x81\\x95\\xe3\\x82\\x93'
\xe3\x83\x9c\xe3\x83\x89\xe3\x82\xab\xe3\x81\x95\xe3\x82\x93
I tried so many possibilities but I couldn't find the right way to encode proprely my string like the b'STRING' thing would do. I always get extra \ characters from the encoding process that then spoil the decoding process too.
I tried all the encoding methods existing in python for the bytes() function.
I need help please. Thank you.
Stack overflow banned me for that question lol
mystr = "%E3%83%9C%E3%83%89%E3%82%AB%E3%81%95%E3%82%93"
encoded_str = bytes.fromhex(mystr.replace('%', ''))
print(encoded_str.decode('utf-8'))
Output:
ボドカさん
This question already has answers here:
getting bytes from unicode string in python
(6 answers)
Closed 2 years ago.
I have a string which I get from a function
>>> example = Some_function()
This Some_function return a very long combination of Unicode and ASCII string like 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'
My Problem is that when I try to convert this unicode string to bytes it gives me an error that \ud919 cannot be encoded by utf-8. I tried :
>>> further=bytes(example,encoding='utf-8')
Note: I cannot ignore this \ud919. If there is a way to solve this problem or how can I convert 'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123' to 'gn1\ud123a\ud123\ud123\ud123\\ud919\ud123\ud123' to treat \ud919 as simple string not unicode.
based on the version.
print type(unicode_string), repr(unicode_string) Python 3.x : print type(unicode_string), ascii(unicode_string)
\ud919 is a surrogate character, one does not simply convert it. Use surrogatepass flag:
'gn1\ud123a\ud123\ud123\ud123\ud919\ud123\ud123'.encode('utf-8', 'surrogatepass')
>>> b'gn1\xed\x84\xa3a\xed\x84\xa3\xed\x84\xa3\xed\x84\xa3\xed\xa4\x99\xed\x84\xa3\xed\x84\xa3'
This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 2 years ago.
I'm scraping some data from a website and using regex i was able to extract some strings in UTF-16 format. Using this site I'm able to decode the strings i extract but i want to do it all in Python.
The extracted text is in String format, not bytes. So a simple .encode() doesn't work.
For example:
String: \u0074\u0065\u0073\u0074 --> String: test
I can think of solving this by treating the string as a byte object, but i have no idea how to do this.
EDIT: The data chunk i've extracted from using regex:
I = new Array();
I[0] = new Array();
I[0][1] = new Array();
I[0][1][0] = new Array();
I[0][1][0][0] = '\u0074\u0065\u0073\u0074';
I[0][2]='';
Any help is appreciated.
Thanks
If you don't care about security, the simplest way is eval:
eval('"' + yourstringhere + '"')
But the correct way to do this would be:
x = bytes(yourstringhere, "utf-8").decode("unicode_escape")
the variable yourstringhere should contain the "String" object you mentioned before running this, obviously.
At first we convert the string to bytes using the bytes function. Then we decode using a special encoding called unicode_escape which parses the \u.... sequences into actual unicode characters
This question already has answers here:
Print without b' prefix for bytes in Python 3
(8 answers)
Closed 4 years ago.
I am new in python programming and i am a bit confused. I try to get the bytes from a string to hash and encrypt but i got
b'...'
b character in front of string just like the below example. Is any way avoid this?.Can anyone give a solution? Sorry for this silly question
import hashlib
text = "my secret data"
pw_bytes = text.encode('utf-8')
print('print',pw_bytes)
m = hashlib.md5()
m.update(pw_bytes)
OUTPUT:
print b'my secret data'
This should do the trick:
pw_bytes.decode("utf-8")
Here u Go
f = open('test.txt','rb+')
ch=f.read(1)
ch=str(ch,'utf-8')
print(ch)
Decoding is redundant
You only had this "error" in the first place, because of a misunderstanding of what's happening.
You get the b because you encoded to utf-8 and now it's a bytes object.
>> type("text".encode("utf-8"))
>> <class 'bytes'>
Fixes:
You can just print the string first
Redundantly decode it after encoding
This question already has answers here:
Chinese and Japanese character support in python
(3 answers)
Closed 7 years ago.
I have used Python to get some info through urllib2, but the info is unicode string.
I've tried something like below:
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print unicode(a).encode("gb2312")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.encode("utf-8").decode("utf-8")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print u""+a
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).decode("utf-8")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print str(a).encode("utf-8")
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a.decode("utf-8").encode("gb2312")
but all results are the same:
\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728
And I want to get the following Chinese text:
方法,删除存储在
You need to convert the string to a unicode string.
First of all, the backslashes in a are auto-escaped:
a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print a # Prints \u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728
a # Prints '\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'
So playing with the encoding / decoding of this escaped string makes no difference.
You can either use unicode literal or convert the string into a unicode string.
To use unicode literal, just add a u in the front of the string:
a = u"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
To convert existing string into a unicode string, you can call unicode, with unicode_escape as the encoding parameter:
print unicode(a, encoding='unicode_escape') # Prints 方法,删除存储在
I bet you are getting the string from a JSON response, so the second method is likely to be what you need.
BTW, the unicode_escape encoding is a Python specific encoding which is used to
Produce a string that is suitable as Unicode literal in Python source
code
Where are you getting this data from? Perhaps you could share the method by which you are downloading and extracting it.
Anyway, it kind of looks like a remnant of some JSON encoded string? Based on that assumption, here is a very hacky (and not entirely serious) way to do it:
>>> a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
>>> a
'\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728'
>>> s = '"{}"'.format(a)
>>> s
'"\\u65b9\\u6cd5\\uff0c\\u5220\\u9664\\u5b58\\u50a8\\u5728"'
>>> import json
>>> json.loads(s)
u'\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728'
>>> print json.loads(s)
方法,删除存储在
This involves recreating a valid JSON encoded string by wrapping the given string in a in double quotes, then decoding the JSON string into a Python unicode string.