Convert base64 to string using base64.b64decode method [duplicate] - python

I am trying to decode Base64 into Hex for about 200 Base64 data and I am getting this following error. It does decoding for 60 of them then stops.
ABHvPdSaxrhjAWA=
0011ef3dd49ac6b8630160
ABHPdSaxrhjAWA=
Traceback (most recent call last):
File "tt.py", line 36, in <module>
csvlines[0] = csvlines[0].decode("base64").encode("hex")
File "C:\Python27\lib\encodings\base64_codec.py", line 43, in base64_decode
output = base64.decodestring(input)
File "C:\Python27\lib\base64.py", line 325, in decodestring
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
Some original Base64 source from CSV
ABHPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdS4xriiAVQ=
ABDPdSqxrizAU4=
ABDPdSrxrjPAUo=

You have at least one string in your CSV file that is either not a Base64 string, is a corrupted (damaged) Base64 string, or is a string that is missing the required = padding. Your example value, ABHPdSaxrhjAWA=, is short one = or is missing another data character.
Base64 strings, properly padded, have a length that is a multiple of 4, so you can easily re-add the padding:
value = csvlines[0]
if len(value) % 4:
# not a multiple of 4, add padding:
value += '=' * (4 - len(value) % 4)
csvlines[0] = value.decode("base64").encode("hex")
If the value then still fails to decode, then your input was corrupted or not valid Base64 to begin with.
For the example error, ABHPdSaxrhjAWA=, the above adds one = to make it decodable:
>>> value = 'ABHPdSaxrhjAWA='
>>> if len(value) % 4:
... # not a multiple of 4, add padding:
... value += '=' * (4 - len(value) % 4)
...
>>> value
'ABHPdSaxrhjAWA=='
>>> value.decode('base64')
'\x00\x11\xcfu&\xb1\xae\x18\xc0X'
>>> value.decode('base64').encode('hex')
'0011cf7526b1ae18c058'
I need to emphasise that your data may simply be corrupted. Your console output includes one value that worked, and one that failed. The one that worked is one character longer, and that's the only difference:
ABHvPdSaxrhjAWA=
ABHPdSaxrhjAWA=
Note the v in the 4th place; this is missing from the second example. This could indicate that something happened to your CSV data that caused that character to be dropped from the second example. Adding in padding can make the second value decodable again, but the result would be wrong. We can't tell you which of those two options is the cause here.

Simple demonstration:
In [1]: import base64
In [2]: data = 'demonstration yo'
In [3]: code = base64.b64encode(data)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 code = base64.b64encode(data)
File ~/anaconda3/envs/dedup/lib/python3.8/base64.py:58, in b64encode(s, altchars)
51 def b64encode(s, altchars=None):
52 """Encode the bytes-like object s using Base64 and return a bytes object.
53
54 Optional altchars should be a byte string of length 2 which specifies an
55 alternative alphabet for the '+' and '/' characters. This allows an
56 application to e.g. generate url or filesystem safe Base64 strings.
57 """
---> 58 encoded = binascii.b2a_base64(s, newline=False)
59 if altchars is not None:
60 assert len(altchars) == 2, repr(altchars)
TypeError: a bytes-like object is required, not 'str'
In [4]: data = 'demonstration yo'.encode("ascii")
In [5]: code = base64.b64encode(data)
In [6]: code
Out[6]: b'ZGVtb25zdHJhdGlvbiB5bw=='
In [7]: base64.b64decode(code) == data
Out[7]: True
In [8]: base64.b64decode(code[0:18]) == data
---------------------------------------------------------------------------
Error Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 base64.b64decode(code[0:18]) == data
File ~/anaconda3/envs/dedup/lib/python3.8/base64.py:87, in b64decode(s, altchars, validate)
85 if validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):
86 raise binascii.Error('Non-base64 digit found')
---> 87 return binascii.a2b_base64(s)
Error: Incorrect padding
What's cool:
It ignores extra padding.
In [13]: code
Out[13]: b'ZGVtb25zdHJhdGlvbiB5bw=='
In [14]: base64.b64decode(code + "=========".encode("ascii")) == data
Out[14]: True

Related

TypeError: cannot use a string pattern on a bytes-like object python3

I have updated my project to Python 3.7 and Django 3.0
Here is code of models.py
def get_fields(self):
fields = []
html_text = self.html_file.read()
self.html_file.seek(0)
# for now just find singleline, multiline, img editable
# may put repeater in there later (!!)
for m in re.findall("(<(singleline|multiline|img editable)[^>]*>)", html_text):
# m is ('<img editable="true" label="Image" class="w300" width="300" border="0">', 'img editable')
# or similar
# first is full tag, second is tag type
# append as a list
# MUST also save value in here
data = {'tag':m[0], 'type':m[1], 'label':'', 'value':None}
title_list = re.findall("label\s*=\s*\"([^\"]*)", m[0])
if(len(title_list) == 1):
data['label'] = title_list[0]
# store the data
fields.append(data)
return fields
Here is my error traceback
File "/home/harika/krishna test/dev-1.8/mcam/server/mcam/emails/models.py", line 91, in get_fields
for m in re.findall("(<(singleline|multiline|img editable)[^>]*>)", html_text):
File "/usr/lib/python3.7/re.py", line 225, in findall
return _compile(pattern, flags).findall(string)
TypeError: cannot use a string pattern on a bytes-like object
How can I solve my issue?
The thing is that python3's read returns bytes (i.e. "raw" representation) and not string. You can convert between bytes and string if you specify encoding, i.e. how are characters converted to bytes:
>>> '☺'.encode('utf8')
b'\xe2\x98\xba'
>>> '☺'.encode('utf16')
b'\xff\xfe:&'
the b before string signifies that the value is not string but rather bytes. You can also supply raw bytes if you use that prefix:
>>> bytes_x = b'x'
>>> string_x = 'x'
>>> bytes_x == string_x
False
>>> bytes_x.decode('ascii') == string_x
True
>>> bytes_x == string_x.encode('ascii')
True
Note you can only use basic (ASCII) characters if you are using b prefix:
>>> b'☺'
File "<stdin>", line 1
SyntaxError: bytes can only contain ASCII literal characters.
So to fix your problem you need to either convert the input to a string with appropriate encoding:
html_text = self.html_file.read().decode('utf-8') # or 'ascii' or something else
Or -- probably better option -- is to use bytes in the findalls instead of strings:
for m in re.findall(b"(<(singleline|multiline|img editable)[^>]*>)", html_text):
...
title_list = re.findall(b"label\s*=\s*\"([^\"]*)", m[0])
(note the b in front of each "string")

How to decode hexadecimal string with "b" at the beggining?

I have an hexadecimal string like this:
s = '\x83 \x01\x86\x01p\t\x89oA'
I decoded to hex values like this, getting the following output.
>>> ' '.join('{:02x}'.format(ord(ch)) for ch in s)
'83 20 01 86 01 70 09 89 6f 41'
But now I have issues to decode a hex string that is exactly as the previous one, but this comes from a binary file. and has a b at the begining. The error below:
with open('file.dat', 'rb') as infile:
data = infile.read()
>>> data
b'\x83 \x01\x86\x01p\t\x89oA'
>>> ' '.join('{:02x}'.format(ord(ch)) for ch in data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <genexpr>
TypeError: ord() expected string of length 1, but int found
How would be the way to fix this? Thanks
Use .hex() method on the byte string instead.
In [25]: data = b'\x83 \x01\x86\x01p\t\x89oA'
In [26]: data.hex()
Out[26]: '83200186017009896f41'

Base64 Incorrect padding error using Python

I am trying to decode Base64 into Hex for about 200 Base64 data and I am getting this following error. It does decoding for 60 of them then stops.
ABHvPdSaxrhjAWA=
0011ef3dd49ac6b8630160
ABHPdSaxrhjAWA=
Traceback (most recent call last):
File "tt.py", line 36, in <module>
csvlines[0] = csvlines[0].decode("base64").encode("hex")
File "C:\Python27\lib\encodings\base64_codec.py", line 43, in base64_decode
output = base64.decodestring(input)
File "C:\Python27\lib\base64.py", line 325, in decodestring
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
Some original Base64 source from CSV
ABHPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdS4xriiAVQ=
ABDPdSqxrizAU4=
ABDPdSrxrjPAUo=
You have at least one string in your CSV file that is either not a Base64 string, is a corrupted (damaged) Base64 string, or is a string that is missing the required = padding. Your example value, ABHPdSaxrhjAWA=, is short one = or is missing another data character.
Base64 strings, properly padded, have a length that is a multiple of 4, so you can easily re-add the padding:
value = csvlines[0]
if len(value) % 4:
# not a multiple of 4, add padding:
value += '=' * (4 - len(value) % 4)
csvlines[0] = value.decode("base64").encode("hex")
If the value then still fails to decode, then your input was corrupted or not valid Base64 to begin with.
For the example error, ABHPdSaxrhjAWA=, the above adds one = to make it decodable:
>>> value = 'ABHPdSaxrhjAWA='
>>> if len(value) % 4:
... # not a multiple of 4, add padding:
... value += '=' * (4 - len(value) % 4)
...
>>> value
'ABHPdSaxrhjAWA=='
>>> value.decode('base64')
'\x00\x11\xcfu&\xb1\xae\x18\xc0X'
>>> value.decode('base64').encode('hex')
'0011cf7526b1ae18c058'
I need to emphasise that your data may simply be corrupted. Your console output includes one value that worked, and one that failed. The one that worked is one character longer, and that's the only difference:
ABHvPdSaxrhjAWA=
ABHPdSaxrhjAWA=
Note the v in the 4th place; this is missing from the second example. This could indicate that something happened to your CSV data that caused that character to be dropped from the second example. Adding in padding can make the second value decodable again, but the result would be wrong. We can't tell you which of those two options is the cause here.
Simple demonstration:
In [1]: import base64
In [2]: data = 'demonstration yo'
In [3]: code = base64.b64encode(data)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 code = base64.b64encode(data)
File ~/anaconda3/envs/dedup/lib/python3.8/base64.py:58, in b64encode(s, altchars)
51 def b64encode(s, altchars=None):
52 """Encode the bytes-like object s using Base64 and return a bytes object.
53
54 Optional altchars should be a byte string of length 2 which specifies an
55 alternative alphabet for the '+' and '/' characters. This allows an
56 application to e.g. generate url or filesystem safe Base64 strings.
57 """
---> 58 encoded = binascii.b2a_base64(s, newline=False)
59 if altchars is not None:
60 assert len(altchars) == 2, repr(altchars)
TypeError: a bytes-like object is required, not 'str'
In [4]: data = 'demonstration yo'.encode("ascii")
In [5]: code = base64.b64encode(data)
In [6]: code
Out[6]: b'ZGVtb25zdHJhdGlvbiB5bw=='
In [7]: base64.b64decode(code) == data
Out[7]: True
In [8]: base64.b64decode(code[0:18]) == data
---------------------------------------------------------------------------
Error Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 base64.b64decode(code[0:18]) == data
File ~/anaconda3/envs/dedup/lib/python3.8/base64.py:87, in b64decode(s, altchars, validate)
85 if validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):
86 raise binascii.Error('Non-base64 digit found')
---> 87 return binascii.a2b_base64(s)
Error: Incorrect padding
What's cool:
It ignores extra padding.
In [13]: code
Out[13]: b'ZGVtb25zdHJhdGlvbiB5bw=='
In [14]: base64.b64decode(code + "=========".encode("ascii")) == data
Out[14]: True

Value Error : invalid literal for int() with base 10: ''

I'm new in Python and I don't know why I'm getting this error sometimes.
This is the code:
import random
sorteio = []
urna = open("urna.txt")
y = 1
while y <= 50:
sort = int(random.random() * 392)
print sort
while sort > 0:
x = urna.readline()
sort = sort - 1
print x
sorteio = sorteio + [int(x)]
y = y + 1
print sorteio
Where urna.txt is a file on this format:
1156
459
277
166
638
885
482
879
33
559
I'll be grateful if anyone knows why this error appears and how to fix it.
Upon attempting to read past the end of the file, you're getting an empty string '' which cannot be converted to an int.
>>> int('')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: ''
to satisfy the requirement of selecting 50 random lines from the text value, if I understand your problem correctly:
import random
with open("urna.txt") as urna:
sorteio = [int(line) for line in urna] # all lines of the file as ints
selection = random.sample(sorteio, 50)
print selection
.readline() returns an empty string when you come to the end of the file, and that is not a valid number.
Test for it:
if x.strip(): # not empty apart from whitespace
sorteio = sorteio + [int(x)]
You appear to beappending to a list; lists have a method for that:
sorteio.append(int(x))
If you want to get a random sample from your file, there are better methods. One is to read all values, then use random.sample(), or you can pick values as you read the file line by line all the while adjusting the likelihood the next line is part of the sample. See a previous answer of mine for a more in-depth discussion on that subject.

How can I get python base64 encoding to play nice with json?

I have large binary zip files to transport as part of a JSON interface. I've converted them to base64 for this purpose but I'm unable to read them back cleanly as shown with a simple case here:
~ $ ipython --nobanner
In [1]: epub = 'trial/epubs/9780857863812.epub'
In [2]: import base64
In [3]: import json
In [4]: f = open(epub, 'rb')
In [5]: content = f.read()
In [7]: base64.urlsafe_b64decode(base64.urlsafe_b64encode(content)) == content
Out[7]: True
In [8]: base64.urlsafe_b64decode(json.loads(json.dumps(base64.urlsafe_b64encode(content)))) == content
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/paul/wk/evanidus/repo_wc/branches/services/carroll/dev/<ipython console> in <module>()
/usr/lib/python2.6/base64.pyc in urlsafe_b64decode(s)
110 The alphabet uses '-' instead of '+' and '_' instead of '/'.
111 """
--> 112 return b64decode(s, '-_')
113
114
/usr/lib/python2.6/base64.pyc in b64decode(s, altchars)
69 """
70 if altchars is not None:
---> 71 s = _translate(s, {altchars[0]: '+', altchars[1]: '/'})
72 try:
73 return binascii.a2b_base64(s)
/usr/lib/python2.6/base64.pyc in _translate(s, altchars)
34 for k, v in altchars.items():
35 translation[ord(k)] = v
---> 36 return s.translate(''.join(translation))
37
38
TypeError: character mapping must return integer, None or unicode
It seems that the json processing is corrupting the base64 content somehow.
The problem was with the encoding. json returns utf-8 encoded text which the base64 module cannot handle (it wants ascii). The correction is therefore to encode('ascii') the json decoded string before passing it to base64:
In [8]: base64.urlsafe_b64decode(
json.loads(
json.dumps(base64.urlsafe_b64encode(content))
)
.encode('ascii')
) == content
Out[7]: True

Categories