purpose of '"sss".decode("base64").decode("zlib")' - python

ACTIVATE_THIS = """
eJx1UsGOnDAMvecrIlYriDRlKvU20h5aaY+teuilGo1QALO4CwlKAjP8fe1QGGalRoLEefbzs+Mk
Sb7NcvRo3iTcoGqwgyy06As+HWSNVciKaBTFywYoJWc7yit2ndBVwEkHkIzKCV0YdQdmkvShs6YH
E3IhfjFaaSNLoHxQy2sLJrL0ow98JQmEG/rAYn7OobVGogngBgf0P0hjgwgt7HOUaI5DdBVJkggR
3HwSktaqWcCtgiHIH7qHV+esW2CnkRJ+9R5cQGsikkWEV/J7leVGs9TV4TvcO5QOOrTHYI+xeCjY
JR/m9GPDHv2oSZunUokS2A/WBelnvx6tF6LUJO2FjjlH5zU6Q+Kz/9m69LxvSZVSwiOlGnT1rt/A
77j+WDQZ8x9k2mFJetOle88+lc8sJJ/AeerI+fTlQigTfVqJUiXoKaaC3AqmI+KOnivjMLbvBVFU
1JDruuadNGcPmkgiBTnQXUGUDd6IK9JEQ9yPdM96xZP8bieeMRqTuqbxIbbey2DjVUNzRs1rosFS
TsLAdS/0fBGNdTGKhuqD7mUmsFlgGjN2eSj1tM3GnjfXwwCmzjhMbR4rLZXXk+Z/6Hp7Pn2+kJ49
jfgLHgI4Jg==
""".decode("base64").decode("zlib")
my code:
import zlib
print 'dsss'.decode('base64').decode('zlib')#error
Traceback (most recent call last):
File "D:\zjm_code\b.py", line 4, in <module>
print 'dsss'.decode('base64').decode('zlib')
File "D:\Python25\lib\encodings\zlib_codec.py", line 43, in zlib_decode
output = zlib.decompress(input)
zlib.error: Error -3 while decompressing data: unknown compression method
a='dsss'.encode('zlib')
print a
a.encode('base64')
print a
a.decode('base64')#error
print a
a.decode('zlib')
print a
x\x9cK)..Traceback (most recent call last):
File "D:\zjm_code\b.py", line 7, in <module>
a.decode('base64')
File "D:\Python25\lib\encodings\base64_codec.py", line 42, in base64_decode
output = base64.decodestring(input)
File "D:\Python25\lib\base64.py", line 321, in decodestring
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
a='dsss'
a=a.encode('zlib')
print a
a=a.decode('zlib')
print a#why can't print 'dsss'
x\x9cK)..
a='dsss'
a=a.encode('zlib')
#print a
a=a.decode('zlib')
print a#its ok
i think the 'print a' encode the a with 'uhf-8'.
so:
#encoding:utf-8
a='dsss'
a=a.encode('zlib')
print a
a=a.decode('utf-8')#but error.
a=a.decode('zlib')
print a#
x\x9cK)..Traceback (most recent call last):
File "D:\zjm_code\b.py", line 5, in <module>
a=a.decode('utf-8')
File "D:\Python25\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 1: unexpected code byte

The data in the strings is encoded and compressed binary data. The .decode("base64").decode("zlib") unencodes and decompresses it.
The error you got was because 'dsss' decoded from base64 is not valid zlib compressed data.

What is the purpose of x.decode(”base64”).decode(”zlib”) for x in ("sss", "dsss", random_garbage)? Excuse me, you should know; you are the one who is doing it!
Edit after OP's addition of various puzzles
Puzzle 1
a='dsss'.encode('zlib')
print a
a.encode('base64')
print a
a.decode('base64')#error
print a
a.decode('zlib')
print a
Resolution: all 3 statements of the form
a.XXcode('encoding')
should be
a = a.XXcode('encoding')
Puzzle 2
a='dsss'
a=a.encode('zlib')
print a
a=a.decode('zlib')
print a#why can't print 'dsss'
x\x9cK)..
But it does print 'dsss':
>>> a='dsss'
>>> a=a.encode('zlib')
>>> print a
x£K)..♠ ♦F☺¥
>>> a=a.decode('zlib')
>>> print a#why can't print 'dsss'
dsss
>>>
Puzzle 3
"""i think the 'print a' encode the a with 'uhf-8'."""
Resolution: You think extremely incorrectly. What follows the print is an expression. There are no such side effects. What do you imagine happens when you do this:
print 'start text ' + a + 'end text'
?
What do you imagine happens if you do print a twice? Encoding the already-encoded text again? Why don't you stop imagining and try it out?
In any case, note that the output of str.encode('zlib') is an str object, not a unicode object:
>>> print repr('dsss'.encode('zlib'))
'x\x9cK)..\x06\x00\x04F\x01\xbe'
Getting from that to UTF-8 is going to be somewhat difficult ... it would have to be decoded into unicode first -- with what codec? ascii and utf8 are going to have trouble with the '\x9c' and the '\xbe' ...

It is the reverse of:
original_message.encode('zlib').encode('base64')
zlib is a binary compression algorithm. base64 is a text encoding of binary data, which is useful to send binary message through text protocols like SMTP.
After 'dsss' was decoded from base64 (the three bytes 76h, CBh, 2Ch), the result was not valid zlib compressed data so it couldn't be decoded.
Try printing ACTIVATE_THIS to see the result of the decoding. It turns out to be some Python code.

.decode('base64') can be called only on a string that's encoded as "base-64, in order to retrieve the byte sequence that was there encoded. Presumably that byte sequence, in the example you bring, was zlib-compressed, and so the .decode('zlib') part decompresses it.
Now, for your case:
>>> 'dsss'.decode('base64')
'v\xcb,'
But 'v\xcv,' is not a zlib-compressed string! And so of course you cannot ask zlib to "decompress" it. Fortunately zlib recognizes the fact (that 'v\xcv,' could not possibly have been produced by applying any of the compression algorithms zlib knows about to any input whatsoever) and so gives you a helpful error message (instead of a random-ish string of bytes, which you might well have gotten if you had randomly supplied a different but equally crazy input string!-)
Edit: the error in
a.encode('base64')
print a
a.decode('base64')#error
is obviously due to the fact that strings are immutable: just calling a.encode (or any other method) does not alter a, it produces a new string object (and here you're just printing it).
In the next snippet, the error is only in the OP's mind:
>>> a='dsss'
>>> a=a.encode('zlib')
>>> print a
x?K)..F?
>>> a=a.decode('zlib')
>>> print a#why can't print 'dsss'
dsss
>>>
that "why can't print" question is truly peculiar, applied to code that does print 'dsss'. Finally,
i think the 'print a' encode the a
with 'uhf-8'.
You think wrongly: there's no such thing as "uhf-8" (you mean "utf-8" maybe?), and anyway print a does not alter a, any more than just calling a.encode does.

Related

Convert str to unicode str

I need to convert a str to text in Python 2.7
a = u'"\u0274\u1d1c\u0274\u1d04\u1d00 \u1d00\u028f\u1d1c\u1d05\u1d07s \u1d00 \u1d1c\u0274 \u0274\u026a\xf1\u1d0f \u1d0f \u1d1c\u0274\u1d00 \u0274\u026a\xf1\u1d00 \u1d04\u1d0f\u0274 \u1d1c\u0274\u1d00 \u1d1b\u1d00\u0280\u1d07\u1d00 \u1d07\u0274 \u029f\u1d00 \u01eb\u1d1c\u1d07 s\u026a\u1d07\u0274\u1d1b\u1d07 \u01eb\u1d1c\u1d07 \u1d18\u1d1c\u1d07\u1d05\u1d07 \u1d1b\u1d07\u0274\u1d07\u0280 \u1d07x\u026a\u1d1b\u1d0f"'
I try with a.decode('utf8') but the truth is I don't know what kind of code is the str a
The output I need is:
"ɴᴜɴᴄᴀ ᴀʏᴜᴅᴇs ᴀ ᴜɴ ɴɪñᴏ ᴏ ᴜɴᴀ ɴɪñᴀ ᴄᴏɴ ᴜɴᴀ ᴛᴀʀᴇᴀ ᴇɴ ʟᴀ ǫᴜᴇ sɪᴇɴᴛᴇ ǫᴜᴇ ᴘᴜᴇᴅᴇ ᴛᴇɴᴇʀ ᴇxɪᴛᴏ"
ERROR:
>>> print(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "F:\WinPython-64bit-2.7.13.1Zero\python-2.7.13.amd64\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-5: character maps to <undefined>
Since you are on Python2, you have to encode the string contents - which are already text, to your terminal encoding.
So, if you are on windows, print(a.encode("cp-850")), if you are on Linux, Mac-OS, or other O.S.: print(a.encode("utf-8"))
On Python3 the encoding should be done automatically.
Also, it is important to understand that characters codified like \uNNNN in Python correspond to Unicode codepoints - and not to specific character encodings like "utf-8", "latin1" or "utf-16". In Python 3 most readable characters encoding like this will be shown even with the string internal representation, which is displayed by default in a Python interactive session (otherwise use the built-in repr call to see it). By using the built-in "str" or a call to print, you see the rendered string, and all \uXXXX, \UXXXXXXXX, \xNN and \N{unicode character name} tokens are rendered as the actual characters. (In Python2 you need to manually encode this representation to the character encoding used in your device)
In other words, if you are using Python 3, this is as simple as:
In [15]: a = u'"\u0274\u1d1c\u0274\u1d04\u1d00 \u1d00\u028f\u1d1c\u1d05\u1d07s \u1d00 \u1d1c\u0274 \u0274\u026a\xf1\u1d0f \u1d0f \u1d1c\u0274\u1d00 \u0274\u026a\xf1\u1d00 \u1d04\u1d0f\u0274 \u1d1c\u0274\u1d00 \u1d1b\u1d00\u0280\u1d07\u1d00 \u1d07\u0274 \u029f\u1d00 \u01eb\u1d1c\u1d07 s\u026a\u1d07\u0274\u1d1b\u1d07 \u01eb\u1d1c\u1d07 \u1d18\u1d1c\u1d07\u1d05\u1d07 \u1d1b\u1d07\u0274\u1d07\u0280 \u1d07x\u026a\u1d1b\u1d0f"'
...:
In [16]: a
Out[16]: '"ɴᴜɴᴄᴀ ᴀʏᴜᴅᴇs ᴀ ᴜɴ ɴɪñᴏ ᴏ ᴜɴᴀ ɴɪñᴀ ᴄᴏɴ ᴜɴᴀ ᴛᴀʀᴇᴀ ᴇɴ ʟᴀ ǫᴜᴇ sɪᴇɴᴛᴇ ǫᴜᴇ ᴘᴜᴇᴅᴇ ᴛᴇɴᴇʀ ᴇxɪᴛᴏ"'
Or:
In [17]: print(a)
"ɴᴜɴᴄᴀ ᴀʏᴜᴅᴇs ᴀ ᴜɴ ɴɪñᴏ ᴏ ᴜɴᴀ ɴɪñᴀ ᴄᴏɴ ᴜɴᴀ ᴛᴀʀᴇᴀ ᴇɴ ʟᴀ ǫᴜᴇ sɪᴇɴᴛᴇ ǫᴜᴇ ᴘᴜᴇᴅᴇ ᴛᴇɴᴇʀ ᴇxɪᴛᴏ"

Reconstruct the source file from string output

I use stepic3 to hide some data. Multiple files are compressed into a zip file, which will be the hidden message. However, when I use the following code
from PIL import Image
import stepic
def enc_():
im = Image.open("secret.png")
text = str(open("source.zip", "rb").read())
im = stepic.encode(im, text)
im.save('stegolena.png','PNG')
def dec_():
im1=Image.open('stegolena.png')
out = stepic.decode(im1)
plaintext = open("out.zip", "w")
plaintext.write(out)
plaintext.close()
I get the error
Complete Trace back
Traceback (most recent call last):
File "C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\simple.py", line 28, in enc_()
File "C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\simple.py", line 8, in enc_
im = stepic.encode(im, text)
File "C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\stepic.py", line 89, in encode
encode_inplace(image, data)
File "C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\stepic.py", line 75, in encode_inplace
for pixel in encode_imdata(image.getdata(), data):
File "C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\stepic.py", line 58, in encode_imdata
byte = ord(data[i])
TypeError: ord() expected string of length 1, but int found
There are two ways to convert to a string.
text = open("source.zip", "r", encoding='utf-8', errors='ignore').read()
with output
PKn!K\Z
sec.txt13 byte 1.10mPKn!K\Z
sec.txtPK52
or
text = str(open("source.zip", "rb").read())
with output
b'PK\x03\x04\x14\x00\x00\x00\x00\x00n\x8f!K\\\xac\xdaZ\r\x00\x00\x00\r\x00\x00\x00\x07\x00\x00\x00sec.txt13 byte 1.10mPK\x01\x02\x14\x00\x14\x00\x00\x00\x00\x00n\x8f!K\\\xac\xdaZ\r\x00\x00\x00\r\x00\x00\x00\x07\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xb6\x81\x00\x00\x00\x00sec.txtPK\x05\x06\x00\x00\x00\x00\x01\x00\x01\x005\x00\x00\x002\x00\x00\x00\x00\x00'
I used the second and I got the same string back from the retrieval.
In order to reconstruct the zip file (output is string), I use the code
plaintext = open("out.zip", "w")
plaintext.write(output)
plaintext.close()
but the written file says is corrupted when I try to open it. When I try to read what was written to it, with either
output = output.encode(encoding='utf_8', errors='strict')
or
output = bytes(output, 'utf_8')
the output is
b"b'PK\\x03\\x04\\x14\\x00\\x00\\x00\\x00\\x00n\\x8f!K\\\\\\xac\\xdaZ\\r\\x00\\x00\\x00\\r\\x00\\x00\\x00\\x07\\x00\\x00\\x00sec.txt13 byte 1.10mPK\\x01\\x02\\x14\\x00\\x14\\x00\\x00\\x00\\x00\\x00n\\x8f!K\\\\\\xac\\xdaZ\\r\\x00\\x00\\x00\\r\\x00\\x00\\x00\\x07\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\xb6\\x81\\x00\\x00\\x00\\x00sec.txtPK\\x05\\x06\\x00\\x00\\x00\\x00\\x01\\x00\\x01\\x005\\x00\\x00\\x002\\x00\\x00\\x00\\x00\\x00'"
which is different from the source file.
What do I have to reconstruct the embedded file faithfully?
When you read a file in rb mode, you'll get a byte array. If you print it, it may look like a string, but each individual element is actually an integer.
>>> my_bytes = b'hello'
>>> my_bytes
b'hello'
>>> my_bytes[0]
104
This explain the error
"C:\Users\Sherif\OneDrive\Pyhton Projects\Kivy Tests\stepic.py", line 58, in encode_imdata byte = ord(data[i]) TypeError: ord() expected string of length 1, but int found
ord() expects a string, so you have to convert all the bytes to strings. Unfortunately, str(some_byte_array) doesn't do what you think it does. It creates a literal string representation of your byte array, including the preceeding "b" and the surrounding quotes.
>>> string = str(my_bytes)
>>> string[0]
'b'
>>> string[1]
"'"
>>> string[2]
'h'
What you want instead is to convert each byte (integer) to a string individually. map(chr, some_byte_array) will do this for you. We have to do this simply because stepic expects a string. When it embeds a character, it does ord(data[i]), which converts a string of length one to its Unicode code (integer).
Furthermore, we can't leave our string as a map object, because the code needs to calculate the length of the whole string before embedding it. Therefore, ''.join(map(chr, some_bytearray)) is what we have to use for our input secret.
For extraction stepic does the opposite. It extracts the secret byte by byte and turns them into strings with chr(byte). In order to reverse that, we need to get the ordinal value of each character individually. map(ord, out) should do the trick. And since we want to write our file in binary, further feeding that into bytearray() will take care of everything.
Overall, these are the changes you should make to your code.
def enc_():
im = Image.open("secret.png")
text = ''.join(map(chr, open("source.zip", "rb").read()))
im = stepic.encode(im, text)
im.save('stegolena.png','PNG')
def dec_():
im1=Image.open('stegolena.png')
out = stepic.decode(im1)
plaintext = open("out.zip", "wb")
plaintext.write(bytearray(map(ord, out)))
plaintext.close()

Convert Python string between iso8859_1 and utf-8

I am trying to do the same thing in python as the java code below.
String decoded = new String("中".getBytes("ISO8859_1"), "UTF-8");
System.out.println(decoded);
The output is a Chinese String "中".
In Python I tried the encode/decode/bytearray thing but I always got unreadable string. I think my problem is that I don't really understand how the java/python encoding mechanism works. Also I cannot find a solution from the existing answers.
#coding=utf-8
def p(s):
print s + ' -- ' + str(type(s))
ch1 = 'ä¸-'
p(ch1)
chu1 = ch1.decode('ISO8859_1')
p(chu1.encode('utf-8'))
utf_8 = bytearray(chu1, 'utf-8')
p(utf_8)
p(utf_8.decode('utf-8').encode('utf-8'))
#utfstr = utf_8.decode('utf-8').decode('utf-8')
#p(utfstr)
p(ch1.decode('iso-8859-1').encode('utf8'))
ä¸- -- <type 'str'>
ä¸Â- -- <type 'str'>
ä¸Â- -- <type 'bytearray'>
ä¸Â- -- <type 'str'>
ä¸Â- -- <type 'str'>
Daniel Roseman's answer is really close. Thank you. But when it comes to my real case:
ch = 'masanori harigae ã\201®ã\203\221ã\203¼ã\202½ã\203\212ã\203«ä¼\232è-°å®¤'
print ch.decode('utf-8').encode('iso-8859-1')
I got
Traceback (most recent call last):
File "", line 1, in
File "/apps/Python/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x81 in position 19: invalid start byte
Java code:
String decoded = new String("masanori harigae ã\201®ã\203\221ã\203¼ã\202½ã\203\212ã\203«ä¼\232è-°å®¤".getBytes("ISO8859_1"), "UTF-8");
System.out.println(decoded);
The output is masanori harigae のパーソナル会�-�室
You are doing this the wrong way round. You have a bytestring that is wrongly encoded as utf-8 and you want it to be interpreted as iso-8859-1:
>>> ch = "中"
>>> print u.decode('utf-8').encode('iso-8859-1')
中

Python to show special characters

I know there are tons of threads regarding this issue but I have not managed to find one which solves my problem.
I am trying to print a string but when printed it doesn't show special characters (e.g. æ, ø, å, ö and ü). When I print the string using repr() this is what I get:
u'Von D\xc3\xbc' and u'\xc3\x96berg'
Does anyone know how I can convert this to Von Dü and Öberg? It's important to me that these characters are not ignored, e.g. myStr.encode("ascii", "ignore").
EDIT
This is the code I use. I use BeautifulSoup to scrape a website. The contents of a cell (<td>) in a table (<table>), is put into the variable name. This is the variable which contains special characters that I cannot print.
web = urllib2.urlopen(url);
soup = BeautifulSoup(web)
tables = soup.find_all("table")
scene_tables = [2, 3, 6, 7, 10]
scene_index = 0
# Iterate over the <table>s we want to work with
for scene_table in scene_tables:
i = 0
# Iterate over < td> to find time and name
for td in tables[scene_table].find_all("td"):
if i % 2 == 0: # td contains the time
time = remove_whitespace(td.get_text())
else: # td contains the name
name = remove_whitespace(td.get_text()) # This is the variable containing "nonsense"
print "%s: %s" % (time, name,)
i += 1
scene_index += 1
Prevention is better than cure. What you need is to find out how that rubbish is being created. Please edit your question to show the code that creates it, and then we can help you fix it. It looks like somebody has done:
your_unicode_string = original_utf8_encoded_bytestring.decode('latin1')
The cure is to reverse the process, simply, and then decode.
correct_unicode_string = your_unicode_string.encode('latin1').decode('utf8')
Update Based on the code that you supplied, the probable cause is that the website declares that it is encoded in ISO-8859-1 (aka latin1) but in reality it is encoded in UTF-8. Please update your question to show us the url.
If you can't show it, read the BS docs; it looks like you'll need to use:
BeautifulSoup(web, from_encoding='utf8')
Unicode support in many languages is confusing, so your error here is understandable. Those strings are UTF-8 bytes, which would work properly if you drop the u at the front:
>>> err = u'\xc3\x96berg'
>>> print err
Ã?berg
>>> x = '\xc3\x96berg'
>>> print x
Öberg
>>> u = x.decode('utf-8')
>>> u
u'\xd6berg'
>>> print u
Öberg
For lots more information:
http://www.joelonsoftware.com/articles/Unicode.html
http://docs.python.org/howto/unicode.html
You should really really read those links and understand what is going on before proceeding. If, however, you absolutely need to have something that works today, you can use this horrible hack that I am embarrassed to post publicly:
def convert_fake_unicode_to_real_unicode(string):
return ''.join(map(chr, map(ord, string))).decode('utf-8')
The contents of the strings are not unicode, they are UTF-8 encoded.
>>> print u'Von D\xc3\xbc'
Von Dü
>>> print 'Von D\xc3\xbc'
Von Dü
>>> print unicode('Von D\xc3\xbc', 'utf-8')
Von Dü
>>>
Edit:
>>> print '\xc3\x96berg' # no unicode identifier, works as expected because it's an UTF-8 encoded string
Öberg
>>> print u'\xc3\x96berg' # has unicode identifier, means print uses the unicode charset now, outputs weird stuff
Ãberg
# Look at the differing object types:
>>> type('\xc3\x96berg')
<type 'str'>
>>> type(u'\xc3\x96berg')
<type 'unicode'>
>>> '\xc3\x96berg'.decode('utf-8') # this command converts from UTF-8 to unicode, look at the unicode identifier in the output
u'\xd6berg'
>>> unicode('\xc3\x96berg', 'utf-8') # this does the same thing
u'\xd6berg'
>>> unicode(u'foo bar', 'utf-8') # trying to convert a unicode string to unicode will fail as expected
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: decoding Unicode is not supported

how to show the right word in my code, my code is : os.urandom(64)

My code is:
print os.urandom(64)
which outputs:
> "D:\Python25\pythonw.exe" "D:\zjm_code\a.py"
\xd0\xc8=<\xdbD'
\xdf\xf0\xb3>\xfc\xf2\x99\x93
=S\xb2\xcd'\xdbD\x8d\xd0\\xbc{&YkD[\xdd\x8b\xbd\x82\x9e\xad\xd5\x90\x90\xdcD9\xbf9.\xeb\x9b>\xef#n\x84
which isn't readable, so I tried this:
print os.urandom(64).decode("utf-8")
but then I get:
> "D:\Python25\pythonw.exe" "D:\zjm_code\a.py"
Traceback (most recent call last):
File "D:\zjm_code\a.py", line 17, in <module>
print os.urandom(64).decode("utf-8")
File "D:\Python25\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-3: invalid data
What should I do to get human-readable output?
No shortage of choices. Here's a couple:
>>> os.urandom(64).encode('hex')
'0bf760072ea10140d57261d2cd16bf7af1747e964c2e117700bd84b7acee331ee39fae5cff6f3f3fc3ee3f9501c9fa38ecda4385d40f10faeb75eb3a8f557909'
>>> os.urandom(64).encode('base64')
'ZuYDN1BiB0ln73+9P8eoQ3qn3Q74QzCXSViu8lqueKAOUYchMXYgmz6WDmgJm1DyTX598zE2lClX\n4iEXXYZfRA==\n'
os.urandom is giving you a 64-bytes string. Encoding it in hex is probably the best way to make it "human readable" to some extent. E.g.:
>>> s = os.urandom(64)
>>> s.encode('hex')
'4c28351a834d80674df3b6eb5f59a2fd0df2ed2a708d14548e4a88c7139e91ef4445a8b88db28ceb3727851c02ce1822b3c7b55a977fa4f4c4f2a0e278ca569e'
Of course this gives you 128 characters in the result, which may be too long a line to read comfortably; it's easy to split it up, though -- e.g.:
>>> print s[:32].encode('hex')
4c28351a834d80674df3b6eb5f59a2fd0df2ed2a708d14548e4a88c7139e91ef
>>> print s[32:].encode('hex')
4445a8b88db28ceb3727851c02ce1822b3c7b55a977fa4f4c4f2a0e278ca569e
two chunks of 64 characters each shown on separate lines may be easier on the eye.
Random bytes are not likely to be unicode characters, so I'm not suprised that you get encoding errors. Instead you need to convert them somehow. If all you're trying to do is see what they are, then something like:
print [ord(o) for o in os.urandom(64)]
Or, if you'd prefer to have it as hex 0-9a-f:
print ''.join( [hex(ord(o))[2:] for o in os.urandom(64)] )

Categories