When I try to parse an empty string with ast.literal_eval, I get a SyntaxError. Why does it raise a different error than parsing 'foo'? In the source of ast.literal_eval, only ValueError is explicitly raised.
In [1]: import ast
In [2]: ast.literal_eval('foo')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-d8695a7c4a9f> in <module>()
----> 1 ast.literal_eval('foo')
/usr/lib/python2.7/ast.pyc in literal_eval(node_or_string)
78 return left - right
79 raise ValueError('malformed string')
---> 80 return _convert(node_or_string)
81
82
/usr/lib/python2.7/ast.pyc in _convert(node)
77 else:
78 return left - right
---> 79 raise ValueError('malformed string')
80 return _convert(node_or_string)
81
ValueError: malformed string
In [3]: ast.literal_eval('')
File "<unknown>", line 0
^
SyntaxError: unexpected EOF while parsing
ast uses compile to compile the source string (which must be an expression) into an AST.
If the source string is not a valid expression (like an empty string), a SyntaxError will be raised by compile. If, on the other hand, the source string would be a valid expression (e.g. a variable name like foo), compile will succeed but then literal_eval might fail with a ValueError.
Therefore, you should catch both SyntaxError and ValueError when using literal_eval.
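As a minimal sketch of that advice, the wrapper below catches both exceptions and falls back to a default. The name safe_literal_eval and its default parameter are made up for illustration; they are not part of the ast module.

```python
import ast


def safe_literal_eval(text, default=None):
    """Return the literal encoded in text, or default if it cannot be parsed."""
    try:
        return ast.literal_eval(text)
    except (SyntaxError, ValueError):
        # SyntaxError: compile() rejected the source (e.g. an empty string).
        # ValueError: the source compiled, but is not a literal (e.g. a name).
        return default


print(safe_literal_eval("[1, 2, 3]"))  # a valid literal
print(safe_literal_eval("foo"))        # a name, not a literal
print(safe_literal_eval(""))           # not a valid expression
```

Returning a default is only one possible design; re-raising a single custom exception would work just as well if the caller needs to distinguish bad input from a legitimate None.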
I am trying to collect all the internal links from the Python Requests library page and filter out all the external links.
I am using a regular expression to do this, but it raises a TypeError that I am unable to solve.
My code:
import requests
from bs4 import BeautifulSoup
import re
r = requests.get('https://2.python-requests.org/en/master/')
content = BeautifulSoup(r.text)
[i['href'] for i in content.find_all('a') if not re.match("http", i)]
Error:
TypeError Traceback (most recent call last)
<ipython-input-10-b7d82067fe9c> in <module>
----> 1 [i['href'] for i in content.find_all('a') if not re.match("http", i)]
<ipython-input-10-b7d82067fe9c> in <listcomp>(.0)
----> 1 [i['href'] for i in content.find_all('a') if not re.match("http", i)]
~\Anaconda3\lib\re.py in match(pattern, string, flags)
171 """Try to apply the pattern at the start of the string, returning
172 a Match object, or None if no match was found."""
--> 173 return _compile(pattern, flags).match(string)
174
175 def fullmatch(pattern, string, flags=0):
TypeError: expected string or bytes-like object
You are passing re.match a BeautifulSoup node object, not a string. Try this:
[i['href'] for i in content.find_all('a') if not re.match("http", i['href'])]
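One thing to watch for with the line above: an a tag without an href attribute raises a KeyError on i['href']. The sketch below guards against that with .get('href'). To keep it runnable without a network call, plain dicts stand in for the tags returned by content.find_all('a'), and the sample URLs are invented for illustration.

```python
import re

# Plain dicts stand in for the tag objects from content.find_all('a');
# like those tags, they support both ['href'] and .get('href').
# The href values below are made-up sample data.
anchors = [
    {"href": "https://github.com/psf/requests"},
    {"href": "user/install/"},
    {"href": "#the-user-guide"},
    {"class": "no-link"},  # an anchor without an href attribute
]

internal_links = [
    a.get("href")
    for a in anchors
    # Skip anchors with no href, then drop anything starting with "http".
    if a.get("href") and not re.match("http", a.get("href"))
]
print(internal_links)
```

With real BeautifulSoup tags the comprehension stays the same; only the anchors list is replaced by content.find_all('a').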
I am trying to decode about 200 Base64 values into hex and I am getting the following error. It decodes 60 of them and then stops.
ABHvPdSaxrhjAWA=
0011ef3dd49ac6b8630160
ABHPdSaxrhjAWA=
Traceback (most recent call last):
File "tt.py", line 36, in <module>
csvlines[0] = csvlines[0].decode("base64").encode("hex")
File "C:\Python27\lib\encodings\base64_codec.py", line 43, in base64_decode
output = base64.decodestring(input)
File "C:\Python27\lib\base64.py", line 325, in decodestring
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
Some original Base64 source from CSV
ABHPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdSaxrhjAWA=
ABDPdS4xriiAVQ=
ABDPdSqxrizAU4=
ABDPdSrxrjPAUo=
You have at least one string in your CSV file that is either not a Base64 string, is a corrupted (damaged) Base64 string, or is a string that is missing the required = padding. Your example value, ABHPdSaxrhjAWA=, is short one = or is missing another data character.
Base64 strings, properly padded, have a length that is a multiple of 4, so you can easily re-add the padding:
value = csvlines[0]
if len(value) % 4:
    # not a multiple of 4, add padding:
    value += '=' * (4 - len(value) % 4)
csvlines[0] = value.decode("base64").encode("hex")
If the value then still fails to decode, then your input was corrupted or not valid Base64 to begin with.
For the example error, ABHPdSaxrhjAWA=, the above adds one = to make it decodable:
>>> value = 'ABHPdSaxrhjAWA='
>>> if len(value) % 4:
...     # not a multiple of 4, add padding:
...     value += '=' * (4 - len(value) % 4)
...
>>> value
'ABHPdSaxrhjAWA=='
>>> value.decode('base64')
'\x00\x11\xcfu&\xb1\xae\x18\xc0X'
>>> value.decode('base64').encode('hex')
'0011cf7526b1ae18c058'
I need to emphasise that your data may simply be corrupted. Your console output includes one value that worked, and one that failed. The one that worked is one character longer, and that's the only difference:
ABHvPdSaxrhjAWA=
ABHPdSaxrhjAWA=
Note the v in the 4th place; this is missing from the second example. This could indicate that something happened to your CSV data that caused that character to be dropped from the second example. Adding in padding can make the second value decodable again, but the result would be wrong. We can't tell you which of those two options is the cause here.
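The str.decode("base64") and str.encode("hex") codecs used above exist only in Python 2. As a rough Python 3 equivalent of the same padding fix, the sketch below uses the base64 and binascii modules; the helper name base64_to_hex is an assumption for illustration, and the two input values are the ones from the question.

```python
import base64
import binascii


def base64_to_hex(value):
    """Decode a Base64 str to a hex str, restoring missing '=' padding first."""
    remainder = len(value) % 4
    if remainder:
        # Not a multiple of 4: add the missing padding characters.
        value += "=" * (4 - remainder)
    return binascii.hexlify(base64.b64decode(value)).decode("ascii")


for value in ["ABHvPdSaxrhjAWA=", "ABHPdSaxrhjAWA="]:
    try:
        print(value, "->", base64_to_hex(value))
    except binascii.Error as exc:
        # Still undecodable after padding: corrupted or not Base64 at all.
        print(value, "-> could not decode:", exc)
```

The same caveat applies as before: padding makes the second value decodable, but it cannot tell you whether a data character was lost upstream.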
Simple demonstration:
In [1]: import base64
In [2]: data = 'demonstration yo'
In [3]: code = base64.b64encode(data)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 code = base64.b64encode(data)
File ~/anaconda3/envs/dedup/lib/python3.8/base64.py:58, in b64encode(s, altchars)
51 def b64encode(s, altchars=None):
52 """Encode the bytes-like object s using Base64 and return a bytes object.
53
54 Optional altchars should be a byte string of length 2 which specifies an
55 alternative alphabet for the '+' and '/' characters. This allows an
56 application to e.g. generate url or filesystem safe Base64 strings.
57 """
---> 58 encoded = binascii.b2a_base64(s, newline=False)
59 if altchars is not None:
60 assert len(altchars) == 2, repr(altchars)
TypeError: a bytes-like object is required, not 'str'
In [4]: data = 'demonstration yo'.encode("ascii")
In [5]: code = base64.b64encode(data)
In [6]: code
Out[6]: b'ZGVtb25zdHJhdGlvbiB5bw=='
In [7]: base64.b64decode(code) == data
Out[7]: True
In [8]: base64.b64decode(code[0:18]) == data
---------------------------------------------------------------------------
Error Traceback (most recent call last)
Input In [8], in <cell line: 1>()
----> 1 base64.b64decode(code[0:18]) == data
File ~/anaconda3/envs/dedup/lib/python3.8/base64.py:87, in b64decode(s, altchars, validate)
85 if validate and not re.fullmatch(b'[A-Za-z0-9+/]*={0,2}', s):
86 raise binascii.Error('Non-base64 digit found')
---> 87 return binascii.a2b_base64(s)
Error: Incorrect padding
What's cool:
It ignores extra padding.
In [13]: code
Out[13]: b'ZGVtb25zdHJhdGlvbiB5bw=='
In [14]: base64.b64decode(code + "=========".encode("ascii")) == data
Out[14]: True
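That leniency is the default behaviour only. If surplus padding should count as an error, base64.b64decode takes a validate flag; the sketch below, reusing the data and code names from the session above, shows the same input being accepted by default and rejected with validate=True.

```python
import base64
import binascii

data = b"demonstration yo"
code = base64.b64encode(data)

# Default (lenient) decoding tolerates the surplus '=' characters.
lenient_ok = base64.b64decode(code + b"=========") == data
print(lenient_ok)

# With validate=True the same input raises binascii.Error instead.
try:
    base64.b64decode(code + b"=========", validate=True)
    strict_rejected = False
except binascii.Error as exc:
    strict_rejected = True
    print("rejected:", exc)
```

The exact error message differs between Python versions, so the sketch prints it rather than relying on its wording.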
I'm facing this error and I'm really not able to find the reason for it.
Can somebody please point out the cause?
for i in tweet_raw.comments:
mns_proc.append(processComUni(i))
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-416-439073b420d1> in <module>()
1 for i in tweet_raw.comments:
----> 2 tweet_processed.append(processtwt(i))
3
<ipython-input-414-4e1b8a8fb285> in processtwt(tweet)
4 #Convert to lower case
5 #tweet = re.sub('RT[\s]+','',tweet)
----> 6 tweet = tweet.lower()
7 #Convert www.* or https?://* to URL
8 #tweet = re.sub('((www\.[\s]+)|(https?://[^\s]+))','',tweet)
AttributeError: 'float' object has no attribute 'lower'
A second, similar error that I'm facing is this:
for i in tweet_raw.comments:
tweet_proc.append(processtwt(i))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-423-439073b420d1> in <module>()
1 for i in tweet_raw.comments:
----> 2 tweet_proc.append(processtwt(i))
3
<ipython-input-421-38fab2ef704e> in processComUni(tweet)
11 tweet=re.sub(('[http]+s?://[^\s<>"]+|www\.[^\s<>"]+'),'', tweet)
12 #Convert #username to AT_USER
---> 13 tweet = re.sub('#[^\s]+',' ',tweet)
14 #Remove additional white spaces
15 tweet = re.sub('[\s]+', ' ', tweet)
C:\Users\m1027201\AppData\Local\Continuum\Anaconda\lib\re.pyc in sub(pattern, repl, string, count, flags)
149 a callable, it's passed the match object and must return
150 a replacement string to be used."""
--> 151 return _compile(pattern, flags).sub(repl, string, count)
152
153 def subn(pattern, repl, string, count=0, flags=0):
TypeError: expected string or buffer
Should I check whether or not a particular tweet is a string before passing it to the processtwt() function? For this error I don't even know which line it's failing at.
Just try using this:
tweet = str(tweet).lower()
Lately, I've been facing many of these errors, and converting the value to a string before applying lower() has always worked for me.
My answer will be broader than shalini's answer. If you want to check whether the object is of type str, I suggest using isinstance() as shown below. This is the more Pythonic way.
tweet = "stackoverflow"
## best way of doing it
if isinstance(tweet, (str,)):
    print(tweet)
## other way of doing it
if type(tweet) is str:
    print(tweet)
## This is one more way to do it
if type(tweet) == str:
    print(tweet)
All of the above work fine for checking whether an object is a string or not.
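To tie the two answers together, here is a minimal sketch of checking the type before calling lower(). The function name lowercase_comments and the sample list are hypothetical. The float in the sample is a NaN, which is a common way for a missing value to show up in a text column, although the question does not confirm that this is the cause here.

```python
def lowercase_comments(comments):
    """Lowercase each str comment; collect everything else for inspection."""
    processed, skipped = [], []
    for comment in comments:
        if isinstance(comment, str):
            processed.append(comment.lower())
        else:
            # Non-string values (e.g. a float) are set aside, not converted.
            skipped.append(comment)
    return processed, skipped


# Hypothetical sample data: two text comments and one missing value.
sample = ["Great POST", float("nan"), "RT Hello"]
processed, skipped = lowercase_comments(sample)
print(processed)
print(len(skipped))

# For comparison, str() turns the missing value into the literal text 'nan'.
print(str(float("nan")).lower())
```

The last line shows the trade-off with the str(tweet) approach: it silences the AttributeError, but a missing value then flows through the rest of the pipeline as the word "nan".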
I have large binary zip files to transport as part of a JSON interface. I've converted them to base64 for this purpose but I'm unable to read them back cleanly as shown with a simple case here:
~ $ ipython --nobanner
In [1]: epub = 'trial/epubs/9780857863812.epub'
In [2]: import base64
In [3]: import json
In [4]: f = open(epub, 'rb')
In [5]: content = f.read()
In [7]: base64.urlsafe_b64decode(base64.urlsafe_b64encode(content)) == content
Out[7]: True
In [8]: base64.urlsafe_b64decode(json.loads(json.dumps(base64.urlsafe_b64encode(content)))) == content
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/home/paul/wk/evanidus/repo_wc/branches/services/carroll/dev/<ipython console> in <module>()
/usr/lib/python2.6/base64.pyc in urlsafe_b64decode(s)
110 The alphabet uses '-' instead of '+' and '_' instead of '/'.
111 """
--> 112 return b64decode(s, '-_')
113
114
/usr/lib/python2.6/base64.pyc in b64decode(s, altchars)
69 """
70 if altchars is not None:
---> 71 s = _translate(s, {altchars[0]: '+', altchars[1]: '/'})
72 try:
73 return binascii.a2b_base64(s)
/usr/lib/python2.6/base64.pyc in _translate(s, altchars)
34 for k, v in altchars.items():
35 translation[ord(k)] = v
---> 36 return s.translate(''.join(translation))
37
38
TypeError: character mapping must return integer, None or unicode
It seems that the json processing is corrupting the base64 content somehow.
The problem was with the string type. In Python 2, json.loads returns unicode objects, which the base64 module cannot handle here (its translate step expects a byte string). The correction is therefore to encode('ascii') the JSON-decoded string before passing it to base64:
In [8]: base64.urlsafe_b64decode(
json.loads(
json.dumps(base64.urlsafe_b64encode(content))
)
.encode('ascii')
) == content
Out[7]: True
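The session above is Python 2. In Python 3 the mismatch shows up on the other side: urlsafe_b64encode returns bytes, which json.dumps refuses to serialise, so the decode('ascii') step moves to before the dump. A minimal sketch of the round trip follows; the sample bytes and the "epub" key are stand-ins for the real file content and message layout.

```python
import base64
import json

# Stand-in for the binary file content read with open(epub, 'rb').read().
content = bytes(range(256)) * 4

# b64encode returns bytes; JSON needs text, so decode to an ASCII str first.
encoded = base64.urlsafe_b64encode(content).decode("ascii")
payload = json.dumps({"epub": encoded})  # "epub" is a hypothetical key name

# json.loads gives back a str; urlsafe_b64decode accepts ASCII-only str.
restored = base64.urlsafe_b64decode(json.loads(payload)["epub"])
print(restored == content)
```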