Base64 Incorrect padding error using Python

I am trying to decode about 200 Base64 values into hex, and I am getting the following error. It decodes 60 of them, then stops.
ABHvPdSaxrhjAWA=
0011ef3dd49ac6b8630160
ABHPdSaxrhjAWA=
Traceback (most recent call last):
File "tt.py", line 36, in <module>
csvlines[0] = csvlines[0].decode("base64").encode("hex")
File "C:\Python27\lib\encodings\base64_codec.py", line 43, in base64_decode
output = base64.decodestring(input)
File "C:\Python27\lib\base64.py", line 325, in decodestring
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
Some original Base64 source from CSV
ABHPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdS4xriiAVQ=
ABDPdSqxrizAU4=
ABDPdSrxrjPAUo=

You have at least one string in your CSV file that is either not a Base64 string, is a corrupted (damaged) Base64 string, or is a string that is missing the required = padding. Your example value, ABHPdSaxrhjAWA=, is short one = or is missing another data character.
Base64 strings, properly padded, have a length that is a multiple of 4, so you can easily re-add the padding:
value = csvlines[0]
if len(value) % 4:
    # not a multiple of 4, add padding:
    value += '=' * (4 - len(value) % 4)
csvlines[0] = value.decode("base64").encode("hex")
If the value then still fails to decode, then your input was corrupted or not valid Base64 to begin with.
For the example error, ABHPdSaxrhjAWA=, the above adds one = to make it decodable:
>>> value = 'ABHPdSaxrhjAWA='
>>> if len(value) % 4:
...     # not a multiple of 4, add padding:
...     value += '=' * (4 - len(value) % 4)
...
>>> value
'ABHPdSaxrhjAWA=='
>>> value.decode('base64')
'\x00\x11\xcfu&\xb1\xae\x18\xc0X'
>>> value.decode('base64').encode('hex')
'0011cf7526b1ae18c058'
I need to emphasise that your data may simply be corrupted. Your console output includes one value that worked, and one that failed. The one that worked is one character longer, and that's the only difference:
ABHvPdSaxrhjAWA=
ABHPdSaxrhjAWA=
Note the v in the 4th place; this is missing from the second example. This could indicate that something happened to your CSV data that caused that character to be dropped from the second example. Adding in padding can make the second value decodable again, but the result would be wrong. We can't tell you which of those two options is the cause here.
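For readers on Python 3, where str.decode("base64") no longer exists, the same padding repair can be sketched with the base64 module. The helper name b64_to_hex is made up for illustration, and returning None for undecodable input is only one possible policy. It restores missing padding; it cannot detect a dropped data character.

```python
import base64


def b64_to_hex(value):
    """Re-pad a Base64 string to a multiple of 4 and return its hex form.

    Hypothetical helper: returns None when the input still cannot be decoded.
    """
    value = value.strip()
    # -len(value) % 4 is the number of '=' characters needed (0 to 3).
    padded = value + "=" * (-len(value) % 4)
    try:
        raw = base64.b64decode(padded)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    return raw.hex()


print(b64_to_hex("ABHvPdSaxrhjAWA="))  # 0011ef3dd49ac6b8630160
print(b64_to_hex("ABHPdSaxrhjAWA="))   # 0011cf7526b1ae18c058
```

Applied to each CSV row in a loop, a None result marks the rows that need a closer look instead of stopping the whole run.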

Simple demonstration:
In [1]: import base64
In [2]: data = 'demonstration yo'
In [3]: code = base64.b64encode(data)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 code = base64.b64encode(data)
File ~/anaconda3/envs/dedup/lib/python3.8/base64.py:58, in b64encode(s, altchars)
51 def b64encode(s, altchars=None):
52 """Encode the bytes-like object s using Base64 and return a bytes object.
53
54 Optional altchars should be a byte string of length 2 which specifies an
55 alternative alphabet for the '+' and '/' characters. This allows an
56 application to e.g. generate url or filesystem safe Base64 strings.
57 """
---> 58 encoded = binascii.b2a_base64(s, newline=False)
59 if altchars is not None:
60 assert len(altchars) == 2, repr(altchars)
TypeError: a bytes-like object is required, not 'str'
In [4]: data = 'demonstration yo'.encode("ascii")
In [5]: code = base64.b64encode(data)
In [6]: code
Out[6]: b'ZGVtb25zdHJhdGlvbiB5bw=='
In [7]: base64.b64decode(code) == data
Out[7]: True
In [8]: base64.b64decode(code[0:18]) == data
---------------------------------------------------------------------------
Error Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 base64.b64decode(code[0:18]) == data
File ~/anaconda3/envs/dedup/lib/python3.8/base64.py:87, in b64decode(s, altchars, validate)
85 if validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):
86 raise binascii.Error('Non-base64 digit found')
---> 87 return binascii.a2b_base64(s)
Error: Incorrect padding
What's cool: b64decode ignores extra padding.
In [13]: code
Out[13]: b'ZGVtb25zdHJhdGlvbiB5bw=='
In [14]: base64.b64decode(code + "=========".encode("ascii")) == data
Out[14]: True
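Because surplus padding is ignored, one common shortcut is to append "==" unconditionally instead of computing the exact amount. The sketch below assumes Python 3 and the default, non-validating mode of base64.b64decode; decode_lenient is a hypothetical name, not part of the standard library.

```python
import base64

data = b"demonstration yo"
code = base64.b64encode(data)   # b'ZGVtb25zdHJhdGlvbiB5bw=='
stripped = code.rstrip(b"=")    # same value with its padding removed


def decode_lenient(s):
    """Decode Base64 bytes whose trailing '=' padding may be missing."""
    # Two '=' always cover the worst case; any excess is ignored by default.
    return base64.b64decode(s + b"==")


print(decode_lenient(stripped) == data)  # True
print(decode_lenient(code) == data)      # True
```

This only papers over missing padding. A value that lost a data character will either still fail or decode to the wrong bytes.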

Related

How to decode hexadecimal string with "b" at the beginning?

I have a hexadecimal string like this:
s = '\x83 \x01\x86\x01p\t\x89oA'
I decoded to hex values like this, getting the following output.
>>> ' '.join('{:02x}'.format(ord(ch)) for ch in s)
'83 20 01 86 01 70 09 89 6f 41'
But now I have trouble decoding a hex string that is exactly like the previous one, except that it comes from a binary file and has a b at the beginning. The error is shown below:
with open('file.dat', 'rb') as infile:
    data = infile.read()
>>> data
b'\x83 \x01\x86\x01p\t\x89oA'
>>> ' '.join('{:02x}'.format(ord(ch)) for ch in data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <genexpr>
TypeError: ord() expected string of length 1, but int found
How can I fix this? Thanks
Use .hex() method on the byte string instead.
In [25]: data = b'\x83 \x01\x86\x01p\t\x89oA'
In [26]: data.hex()
Out[26]: '83200186017009896f41'
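The ord() call fails because, in Python 3, iterating over a bytes object yields integers rather than one-character strings. A minimal sketch, reusing the data value from the question, shows both ways to get the hex digits:

```python
data = b'\x83 \x01\x86\x01p\t\x89oA'

# Each item is already an int, so format it directly without ord().
spaced = ' '.join('{:02x}'.format(b) for b in data)
print(spaced)      # 83 20 01 86 01 70 09 89 6f 41

# bytes.hex() gives the same digits without separators.
print(data.hex())  # 83200186017009896f41
```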

Why am I receiving this error when sorting a Text file by a certain column based upon number?

My code for the sorting of the file.
g = open('Lapse File.txt', 'r')
column = []
i = 1
next(g)
for line in g:
    column.append(int(line.split('\t')[2]))
column.sort()
This is the error I get.
Traceback (most recent call last):
File "E:/Owles/new lapse .py", line 51, in <module>
column.append(int(line.split('\t')[2]))
ValueError: invalid literal for int() with base 10: '-8.3\n'
My main question is why is there a \n. Earlier in the code I had written to another text file and wrote it by column from a previously read in file.
This is my code for writing the file
for columns in (raw.strip().split() for raw in Sounding):
    if (i >2 and i <=33):
        G.write(columns [3]+'\t'+columns[2]+'\t'+columns[4]+'\n')
        i = i + 1
    elif (i >= 34):
        G.write(columns [0]+'\t'+columns[1]+'\t'+columns[2]+'\n')
        i = i + 1
    else:
        i = i + 1
I am unsure if writing the lines like that is the issue because I have inserted the new line function.
The traceback is telling you exactly what happened:
ValueError: invalid literal for int() with base 10: '-8.3\n'
The problem here is that, while int() can handle the negative sign and the trailing newline character, it can't handle the decimal point, '.'. As you know, -8.3 may be a real, rational number, but it's not an integer. If you want to preserve the fractional value to end up with -8.3, use float() instead of int(). If you want to discard the fractional value to end up with -8, use float() to parse the string and then use int() on the result.
-8.3:
column.append(float(line.split('\t')[2]))
-8:
column.append(int(float(line.split('\t')[2])))
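As a rough end-to-end sketch of the fix, the snippet below parses the third tab-separated column with float() and sorts it. The in-memory sample is a made-up stand-in for 'Lapse File.txt' (a header row followed by data rows), so the values are illustrative only.

```python
import io

# Hypothetical stand-in for 'Lapse File.txt': header row, then tab-separated data.
sample = io.StringIO("a\tb\tc\n1\t2\t-8.3\n3\t4\t5.1\n5\t6\t-10\n")

next(sample)  # skip the header line
# float() tolerates the trailing newline and accepts the decimal point.
column = [float(line.split("\t")[2]) for line in sample]
column.sort()
print(column)     # [-10.0, -8.3, 5.1]

# int() on a float truncates toward zero, so -8.3 becomes -8.
truncated = [int(v) for v in column]
print(truncated)  # [-10, -8, 5]
```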
Because only strings that represent whole numbers can be converted with int(); look at this:
numeric_string = "109"
not_numeric_string = "f9"
This is okay:
>>> int(numeric_string)
109
And it cannot be cast:
>>> int(not_numeric_string)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'f9'
So somewhere in your script int() is being given a string that is not a whole number.
Here the offending string is "-8.3\n". The trailing newline is harmless, because int() ignores surrounding whitespace; the decimal point is what makes the conversion fail.

ValueError: invalid literal for int() with base 10: ''

I'm new to Python and I don't know why I sometimes get this error.
This is the code:
import random
sorteio = []
urna = open("urna.txt")
y = 1
while y <= 50:
    sort = int(random.random() * 392)
    print sort
    while sort > 0:
        x = urna.readline()
        sort = sort - 1
    print x
    sorteio = sorteio + [int(x)]
    y = y + 1
print sorteio
Where urna.txt is a file in this format:
1156
459
277
166
638
885
482
879
33
559
I'll be grateful if anyone knows why this error appears and how to fix it.
Upon attempting to read past the end of the file, you're getting an empty string '' which cannot be converted to an int.
>>> int('')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
To satisfy the requirement of selecting 50 random lines from the text file, if I understand your problem correctly:
import random
with open("urna.txt") as urna:
    sorteio = [int(line) for line in urna] # all lines of the file as ints
selection = random.sample(sorteio, 50)
print selection
.readline() returns an empty string when you come to the end of the file, and that is not a valid number.
Test for it:
if x.strip(): # not empty apart from whitespace
    sorteio = sorteio + [int(x)]
You appear to be appending to a list; lists have a method for that:
sorteio.append(int(x))
If you want to get a random sample from your file, there are better methods. One is to read all values, then use random.sample(), or you can pick values as you read the file line by line all the while adjusting the likelihood the next line is part of the sample. See a previous answer of mine for a more in-depth discussion on that subject.
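The line-by-line approach mentioned above is usually called reservoir sampling. Here is a minimal Python 3 sketch of it (Algorithm R), not the code from the referenced answer; sample_ints and the in-memory sample file are hypothetical names and data used only for illustration.

```python
import io
import random


def sample_ints(lines, k, rng=random):
    """Pick up to k integers uniformly at random from an iterable of lines."""
    reservoir = []
    seen = 0  # number of non-blank lines read so far
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines instead of calling int('')
        seen += 1
        value = int(text)
        if len(reservoir) < k:
            reservoir.append(value)  # fill the reservoir first
        else:
            # Keep the new value with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                reservoir[j] = value
    return reservoir


# Hypothetical stand-in for urna.txt, using the values shown in the question.
urna = io.StringIO("1156\n459\n277\n166\n638\n885\n482\n879\n33\n559\n")
picked = sample_ints(urna, 3)
print(picked)
```

Only k values are held in memory at any time, which matters for files too large to load whole. If the file has fewer than k lines, every value is returned.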

How can I get python base64 encoding to play nice with json?

I have large binary zip files to transport as part of a JSON interface. I've converted them to base64 for this purpose but I'm unable to read them back cleanly as shown with a simple case here:
~ $ ipython --nobanner
In [1]: epub = 'trial/epubs/9780857863812.epub'
In [2]: import base64
In [3]: import json
In [4]: f = open(epub, 'rb')
In [5]: content = f.read()
In [7]: base64.urlsafe_b64decode(base64.urlsafe_b64encode(content)) == content
Out[7]: True
In [8]: base64.urlsafe_b64decode(json.loads(json.dumps(base64.urlsafe_b64encode(content)))) == content
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/paul/wk/evanidus/repo_wc/branches/services/carroll/dev/<ipython console> in <module>()
/usr/lib/python2.6/base64.pyc in urlsafe_b64decode(s)
110 The alphabet uses '-' instead of '+' and '_' instead of '/'.
111 """
--> 112 return b64decode(s, '-_')
113
114
/usr/lib/python2.6/base64.pyc in b64decode(s, altchars)
69 """
70 if altchars is not None:
---> 71 s = _translate(s, {altchars[0]: '+', altchars[1]: '/'})
72 try:
73 return binascii.a2b_base64(s)
/usr/lib/python2.6/base64.pyc in _translate(s, altchars)
34 for k, v in altchars.items():
35 translation[ord(k)] = v
---> 36 return s.translate(''.join(translation))
37
38
TypeError: character mapping must return integer, None or unicode
It seems that the json processing is corrupting the base64 content somehow.
The problem was with the string type. In Python 2, json.loads returns unicode objects, which this version of the base64 module cannot handle (it wants an ASCII byte string). The correction is therefore to encode('ascii') the json decoded string before passing it to base64:
In [8]: base64.urlsafe_b64decode(
    json.loads(
        json.dumps(base64.urlsafe_b64encode(content))
    )
    .encode('ascii')
) == content
Out[8]: True
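On Python 3 the conversion runs the other way: base64 returns bytes, which json.dumps refuses to serialize, so the encoded value is decoded to an ASCII str before it goes into the JSON document. A minimal sketch, with made-up binary content standing in for the epub file and a hypothetical "data" key:

```python
import base64
import json

# Stand-in for the binary file content read with open(path, 'rb').
content = bytes(range(256)) * 4

# bytes -> Base64 bytes -> ASCII str, which JSON can carry.
encoded_text = base64.urlsafe_b64encode(content).decode("ascii")
payload = json.dumps({"data": encoded_text})

# On the receiving side, urlsafe_b64decode accepts the ASCII str directly.
restored = base64.urlsafe_b64decode(json.loads(payload)["data"])
print(restored == content)  # True
```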
