I have large binary zip files to transport as part of a JSON interface. I've converted them to base64 for this purpose but I'm unable to read them back cleanly as shown with a simple case here:
~ $ ipython --nobanner
In [1]: epub = 'trial/epubs/9780857863812.epub'
In [2]: import base64
In [3]: import json
In [4]: f = open(epub, 'rb')
In [5]: content = f.read()
In [7]: base64.urlsafe_b64decode(base64.urlsafe_b64encode(content)) == content
Out[7]: True
In [8]: base64.urlsafe_b64decode(json.loads(json.dumps(base64.urlsafe_b64encode(content)))) == content
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/paul/wk/evanidus/repo_wc/branches/services/carroll/dev/<ipython console> in <module>()
/usr/lib/python2.6/base64.pyc in urlsafe_b64decode(s)
110 The alphabet uses '-' instead of '+' and '_' instead of '/'.
111 """
--> 112 return b64decode(s, '-_')
113
114
/usr/lib/python2.6/base64.pyc in b64decode(s, altchars)
69 """
70 if altchars is not None:
---> 71 s = _translate(s, {altchars[0]: '+', altchars[1]: '/'})
72 try:
73 return binascii.a2b_base64(s)
/usr/lib/python2.6/base64.pyc in _translate(s, altchars)
34 for k, v in altchars.items():
35 translation[ord(k)] = v
---> 36 return s.translate(''.join(translation))
37
38
TypeError: character mapping must return integer, None or unicode
It seems that the json processing is corrupting the base64 content somehow.
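The problem was with the string type, not the content. json.loads returns a unicode object, which the Python 2.6 base64 module cannot handle: as the traceback shows, its _translate helper calls s.translate with a 256-character table, and that only works on a byte str. The correction is therefore to encode('ascii') the JSON-decoded string before passing it to base64: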
In [8]: base64.urlsafe_b64decode(
   ...:     json.loads(
   ...:         json.dumps(base64.urlsafe_b64encode(content))
   ...:     ).encode('ascii')
   ...: ) == content
Out[8]: True
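For anyone on Python 3, the same round trip needs the conversion on the other side: b64encode returns bytes, which json.dumps refuses, so decode to ASCII text before serializing. A minimal sketch follows; the helper names to_json_field and from_json_field and the "epub" key are made up for illustration, and the content value is a stand-in for the real file contents.

```python
import base64
import json


def to_json_field(raw: bytes) -> str:
    """Base64-encode raw bytes and return text that json.dumps accepts."""
    return base64.urlsafe_b64encode(raw).decode("ascii")


def from_json_field(text: str) -> bytes:
    """Reverse to_json_field; the decoder accepts an ASCII-only str directly."""
    return base64.urlsafe_b64decode(text)


# Stand-in for the zip file contents (hypothetical data).
content = bytes(range(256)) * 4

payload = json.dumps({"epub": to_json_field(content)})
restored = from_json_field(json.loads(payload)["epub"])
print(restored == content)
```

Keeping the encode and decode steps in a pair of small helpers makes it harder to forget the bytes/str conversion at either end of the interface.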
I have a hexadecimal string like this:
s = '\x83 \x01\x86\x01p\t\x89oA'
I decoded it to hex values like this, getting the following output:
>>> ' '.join('{:02x}'.format(ord(ch)) for ch in s)
'83 20 01 86 01 70 09 89 6f 41'
But now I have trouble decoding a string that is exactly the same as the previous one, except that it comes from a binary file and has a b at the beginning. The error is below:
with open('file.dat', 'rb') as infile:
    data = infile.read()
>>> data
b'\x83 \x01\x86\x01p\t\x89oA'
>>> ' '.join('{:02x}'.format(ord(ch)) for ch in data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <genexpr>
TypeError: ord() expected string of length 1, but int found
How can I fix this? Thanks
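Use the .hex() method (available since Python 3.5) on the bytes object instead. In Python 3, iterating over a bytes object yields integers rather than one-character strings, which is why ord() fails.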
In [25]: data = b'\x83 \x01\x86\x01p\t\x89oA'
In [26]: data.hex()
Out[26]: '83200186017009896f41'
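If you want the space-separated output from the question, here is a small sketch of two ways to get it. It assumes the same data value as above; the separator argument to hex() needs Python 3.8 or later.

```python
data = b'\x83 \x01\x86\x01p\t\x89oA'

# Iterating over bytes yields ints, so format each one directly (no ord()).
spaced = ' '.join('{:02x}'.format(byte) for byte in data)
print(spaced)

# bytes.hex() gives the same digits without separators.
compact = data.hex()
print(compact)

# Python 3.8+: hex() accepts a separator, matching the spaced form.
spaced_with_sep = data.hex(' ')
print(spaced_with_sep)
```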
I have python 2 code that works:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
from os import path
filename = "test.bin" # file contents in hex: 57 58 59 5A 12 00 00 00 4E 44
ID = 4
myfile = open(filename, 'rb')
filesize = path.getsize(filename)
data = list(myfile.read(filesize))
myfile.close()
temp_ptr = data[ID:ID+2]
pointer = int(''.join(reversed(temp_ptr)).encode('hex'), 16)
print(pointer)
Prints "18"
However, it does not work in python 3. I get:
Traceback (most recent call last):
File "py2vs3.py", line 13, in <module>
ptr = int(''.join(reversed(temp_ptr)).encode('hex'), 16)
TypeError: sequence item 0: expected str instance, int found
I am simply grabbing one 32-bit field from a file and printing how C would see it. How do I make this work in Py3? All the code examples I find are for python 2, and the docs make no sense to me.
Python 3 distinguishes between binary and text I/O. Files opened in binary mode (including 'b' in the mode argument) return their contents as bytes objects without any decoding; see https://docs.python.org/3/library/functions.html#open
Below I reproduce your example with an inline byte string instead of reading from a file.
# Python 2
frame = "\x57\x58\x59\x5A\x12\x00\x00\x00\x4E\x44"
int(''.join(reversed(frame[4:6])).encode('hex'), 16)
# Result is 18
Same thing in Python 3
# Python 3
# The b'' prefix marks this as a bytes literal, the same type
# returned when reading a file opened in binary mode
frame = b"\x57\x58\x59\x5A\x12\x00\x00\x00\x4E\x44"
int.from_bytes(frame[4:6], "little")
# The 2nd argument, "little", is the byte order: the least significant
# byte comes first; more details in the link below
# Result is 18
https://docs.python.org/3/library/stdtypes.html#int.from_bytes has more information about the method
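Since the question mentions a 32-bit field, here is a sketch that reads all four bytes, once with int.from_bytes and once with the struct module, which is closer to how C would see the data. The OFFSET name is mine and plays the role of ID in the question; the frame bytes are the ones from the example above.

```python
import struct

frame = b"\x57\x58\x59\x5A\x12\x00\x00\x00\x4E\x44"
OFFSET = 4  # same role as ID in the question (name chosen for illustration)

# Read four bytes as an unsigned little-endian integer.
value_from_bytes = int.from_bytes(frame[OFFSET:OFFSET + 4], "little")

# Same field via struct: "<" is little-endian, "I" is an unsigned 32-bit int.
(value_from_struct,) = struct.unpack_from("<I", frame, OFFSET)

# For comparison, the same bytes read as big-endian give a different number.
value_big_endian = int.from_bytes(frame[OFFSET:OFFSET + 4], "big")

print(value_from_bytes, value_from_struct, value_big_endian)
```

struct.unpack_from tends to be the more convenient choice once you need several fields from the same buffer, because one format string can describe them all.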
As Mad Wombat commented, Python 3 reads the file as a bytes object rather than a string. The following snippet essentially reproduces the process:
data = [char for char in myfile.read()]+['\n']
I am trying to decode about 200 Base64 values into hex and I am getting the following error. It decodes 60 of them and then stops.
ABHvPdSaxrhjAWA=
0011ef3dd49ac6b8630160
ABHPdSaxrhjAWA=
Traceback (most recent call last):
File "tt.py", line 36, in <module>
csvlines[0] = csvlines[0].decode("base64").encode("hex")
File "C:\Python27\lib\encodings\base64_codec.py", line 43, in base64_decode
output = base64.decodestring(input)
File "C:\Python27\lib\base64.py", line 325, in decodestring
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
Some original Base64 source from CSV
ABHPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdS4xriiAVQ=
ABDPdSqxrizAU4=
ABDPdSrxrjPAUo=
You have at least one string in your CSV file that is either not a Base64 string, is a corrupted (damaged) Base64 string, or is a string that is missing the required = padding. Your example value, ABHPdSaxrhjAWA=, is short one = or is missing another data character.
Base64 strings, properly padded, have a length that is a multiple of 4, so you can easily re-add the padding:
value = csvlines[0]
if len(value) % 4:
    # not a multiple of 4, add padding:
    value += '=' * (4 - len(value) % 4)
csvlines[0] = value.decode("base64").encode("hex")
If the value then still fails to decode, then your input was corrupted or not valid Base64 to begin with.
For the example error, ABHPdSaxrhjAWA=, the above adds one = to make it decodable:
>>> value = 'ABHPdSaxrhjAWA='
>>> if len(value) % 4:
...     # not a multiple of 4, add padding:
...     value += '=' * (4 - len(value) % 4)
...
>>> value
'ABHPdSaxrhjAWA=='
>>> value.decode('base64')
'\x00\x11\xcfu&\xb1\xae\x18\xc0X'
>>> value.decode('base64').encode('hex')
'0011cf7526b1ae18c058'
I need to emphasise that your data may simply be corrupted. Your console output includes one value that worked, and one that failed. The one that worked is one character longer, and that's the only difference:
ABHvPdSaxrhjAWA=
ABHPdSaxrhjAWA=
Note the v in the 4th place; this is missing from the second example. This could indicate that something happened to your CSV data that caused that character to be dropped from the second example. Adding in padding can make the second value decodable again, but the result would be wrong. We can't tell you which of those two options is the cause here.
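The code above is Python 2 only, since str.decode("base64") and str.encode("hex") no longer exist in Python 3. As a rough Python 3 equivalent of the same re-padding idea, something like the sketch below should work; the function names b64_to_hex and try_b64_to_hex are invented for this example, and returning None for undecodable values is just one possible policy.

```python
import base64
import binascii


def b64_to_hex(value: str) -> str:
    """Re-add any missing '=' padding, then decode Base64 and return hex."""
    # -len(value) % 4 is the number of characters needed to reach a multiple of 4.
    padded = value + "=" * (-len(value) % 4)
    return base64.b64decode(padded).hex()


def try_b64_to_hex(value: str):
    """Return the hex string, or None if the value still cannot be decoded."""
    try:
        return b64_to_hex(value)
    except binascii.Error:
        return None


print(b64_to_hex("ABHvPdSaxrhjAWA="))
print(b64_to_hex("ABHPdSaxrhjAWA="))
print(try_b64_to_hex("A"))
```

The same caveat applies: a value that only decodes after re-padding may have lost a data character, in which case the hex result is not the original data.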
Simple demonstration:
In [1]: import base64
In [2]: data = 'demonstration yo'
In [3]: code = base64.b64encode(data)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 code = base64.b64encode(data)
File ~/anaconda3/envs/dedup/lib/python3.8/base64.py:58, in b64encode(s, altchars)
51 def b64encode(s, altchars=None):
52 """Encode the bytes-like object s using Base64 and return a bytes object.
53
54 Optional altchars should be a byte string of length 2 which specifies an
55 alternative alphabet for the '+' and '/' characters. This allows an
56 application to e.g. generate url or filesystem safe Base64 strings.
57 """
---> 58 encoded = binascii.b2a_base64(s, newline=False)
59 if altchars is not None:
60 assert len(altchars) == 2, repr(altchars)
TypeError: a bytes-like object is required, not 'str'
In [4]: data = 'demonstration yo'.encode("ascii")
In [5]: code = base64.b64encode(data)
In [6]: code
Out[6]: b'ZGVtb25zdHJhdGlvbiB5bw=='
In [7]: base64.b64decode(code) == data
Out[7]: True
In [8]: base64.b64decode(code[0:18]) == data
---------------------------------------------------------------------------
Error Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 base64.b64decode(code[0:18]) == data
File ~/anaconda3/envs/dedup/lib/python3.8/base64.py:87, in b64decode(s, altchars, validate)
85 if validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):
86 raise binascii.Error('Non-base64 digit found')
---> 87 return binascii.a2b_base64(s)
Error: Incorrect padding
What's cool: by default, b64decode ignores extra padding.
In [13]: code
Out[13]: b'ZGVtb25zdHJhdGlvbiB5bw=='
In [14]: base64.b64decode(code + "=========".encode("ascii")) == data
Out[14]: True
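If you would rather have the extra padding rejected, b64decode takes a validate flag. The sketch below contrasts the two modes using the same demonstration string; the variable names are mine.

```python
import base64
import binascii

data = b"demonstration yo"
code = base64.b64encode(data)

# Default mode: trailing '=' beyond the required padding is ignored.
lenient = base64.b64decode(code + b"=====")

# validate=True: the same input is rejected with binascii.Error.
try:
    base64.b64decode(code + b"=====", validate=True)
    strict_rejected = False
except binascii.Error:
    strict_rejected = True

print(lenient == data, strict_rejected)
```

Which mode is appropriate depends on whether you trust the source of the strings; the lenient default is convenient, while the strict mode surfaces malformed input early.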
When trying to parse an empty string I get a SyntaxError. Why does it raise a different error than parsing 'foo'? In the source of ast.literal_eval only ValueError is explicitly raised.
In [1]: import ast
In [2]: ast.literal_eval('foo')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-d8695a7c4a9f> in <module>()
----> 1 ast.literal_eval('foo')
/usr/lib/python2.7/ast.pyc in literal_eval(node_or_string)
78 return left - right
79 raise ValueError('malformed string')
---> 80 return _convert(node_or_string)
81
82
/usr/lib/python2.7/ast.pyc in _convert(node)
77 else:
78 return left - right
---> 79 raise ValueError('malformed string')
80 return _convert(node_or_string)
81
ValueError: malformed string
In [3]: ast.literal_eval('')
File "<unknown>", line 0
^
SyntaxError: unexpected EOF while parsing
ast.literal_eval uses compile (via ast.parse) to compile the source string (which must be an expression) into an AST.
If the source string is not a valid expression (like an empty string), compile raises a SyntaxError. If, on the other hand, the source string is a valid expression but not a literal (e.g. a variable name like foo), compile succeeds but literal_eval then fails with a ValueError.
Therefore, you should catch both SyntaxError and ValueError when using literal_eval.
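In practice that can be wrapped in a small helper. This is only a sketch: safe_literal_eval and its default parameter are names I made up, and it handles only the two exceptions discussed here.

```python
import ast


def safe_literal_eval(text, default=None):
    """Evaluate a Python literal, returning default when parsing fails."""
    try:
        return ast.literal_eval(text)
    except (SyntaxError, ValueError):
        # SyntaxError: not a valid expression at all (e.g. an empty string).
        # ValueError: a valid expression that is not a literal (e.g. foo).
        return default


print(safe_literal_eval(''))
print(safe_literal_eval('foo'))
print(safe_literal_eval('[1, 2, 3]'))
```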