Can somebody help me with this code? I'm trying to make a Python script that will play videos, and I found this file that downloads YouTube videos. I am not entirely sure what is going on, and I can't figure out this error.
Error:
AttributeError: 'NoneType' object has no attribute 'group'
Traceback:
Traceback (most recent call last):
  File "youtube.py", line 67, in <module>
    videoUrl = getVideoUrl(content)
  File "youtube.py", line 11, in getVideoUrl
    grps = fmtre.group(0).split('&')
Code snippet:
(lines 66-71)
content = resp.read()
videoUrl = getVideoUrl(content)
if videoUrl is not None:
    print('Video URL cannot be found')
    exit(1)
(lines 9-17)
def getVideoUrl(content):
    fmtre = re.search('(?<=fmt_url_map=).*', content)
    grps = fmtre.group(0).split('&')
    vurls = urllib2.unquote(grps[0])
    videoUrl = None
    for vurl in vurls.split('|'):
        if vurl.find('itag=5') > 0:
            return vurl
    return None
The error is on line 11 of your file: re.search found no match and returned None, and you then call fmtre.group on that None, hence the AttributeError.
You could try:
def getVideoUrl(content):
    fmtre = re.search('(?<=fmt_url_map=).*', content)
    if fmtre is None:
        return None
    grps = fmtre.group(0).split('&')
    vurls = urllib2.unquote(grps[0])
    videoUrl = None
    for vurl in vurls.split('|'):
        if vurl.find('itag=5') > 0:
            return vurl
    return None
You use a regex to match the URL, but nothing matches, so the result is None,
and NoneType has no group attribute.
You should add a check on the result:
if the pattern does not match, the function should not continue to the code below.
def getVideoUrl(content):
    fmtre = re.search('(?<=fmt_url_map=).*', content)
    if fmtre is None:
        return None  # no URL matched: return None to tell the calling function
    grps = fmtre.group(0).split('&')
    vurls = urllib2.unquote(grps[0])
    videoUrl = None
    for vurl in vurls.split('|'):
        if vurl.find('itag=5') > 0:
            return vurl
    return None
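For readers on Python 3, the same guard can be sketched as follows. This is only an illustration under assumptions: urllib2 does not exist in Python 3, so urllib.parse.unquote is swapped in; get_video_url is a renamed stand-in for getVideoUrl; the membership test replaces the original find() call; and the sample strings are invented for the demonstration, not real YouTube responses.

```python
import re
from urllib.parse import unquote  # Python 3 replacement for urllib2.unquote


def get_video_url(content):
    """Return the first URL containing 'itag=5', or None if nothing is found."""
    fmtre = re.search(r'(?<=fmt_url_map=).*', content)
    if fmtre is None:
        # No fmt_url_map in the page: report "not found" instead of crashing.
        return None
    grps = fmtre.group(0).split('&')
    vurls = unquote(grps[0])
    for vurl in vurls.split('|'):
        if 'itag=5' in vurl:
            return vurl
    return None


# Hypothetical inputs, invented for this example.
print(get_video_url('no map here'))
print(get_video_url('fmt_url_map=http%3A//example.com/v%3Fitag%3D5&other=1'))
```

Note that in Python 3 resp.read() returns bytes, so the content would need to be decoded to str before it is passed to a function like this.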
Just wanted to mention the new walrus operator in this context, because this question is marked as a duplicate quite often and the operator can solve this very easily.
Before Python 3.8 we needed:
match = re.search(pattern, string, flags)
if match:
    # do something useful here
As of Python 3.8 we can write the same as:
if (match := re.search(pattern, string, flags)) is not None:
    # do something with match
Other languages had this before (think of C or PHP), but IMO it makes for cleaner code.
For the above code this could be:
def getVideoUrl(content):
    if (fmtre := re.search('(?<=fmt_url_map=).*', content)) is None:
        return None
    ...
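A runnable version of the idea, as a rough sketch for Python 3.8 or later. The function name first_number and the sample strings are hypothetical and only serve the illustration.

```python
import re


def first_number(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    if (match := re.search(r'\d+', text)) is None:
        return None
    return int(match.group(0))


print(first_number('order 66 shipped'))  # 66
print(first_number('no digits here'))    # None
```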
Just to add to the other answers: a pattern has to match one contiguous stretch of the text. If the pattern leaves out a word that sits between its parts in the text, nothing matches and re.search returns None. See the example below. (Note that re.compile is optional here, since re.search also accepts a plain pattern string; it is not deprecated.)
import re
msg = "Malcolm reads lots of books"
# This raises AttributeError: "lots books" is not in msg ("of" sits in between),
# so re.search returns None and None has no group().
book = re.compile('lots books')
book = re.search(book, msg)
print(book.group(0))
# This works as expected because "of books" appears contiguously in msg.
book = re.compile('of books')
book = re.search(book, msg)
print(book.group(0))
# Understanding this concept will help in your further
# research. Cheers.
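If the words you care about are not adjacent in the text, one option is to describe the gap in the pattern itself. The sketch below uses the same sample sentence; the pattern with `.*` is an illustrative choice, not the only way to do it.

```python
import re

msg = "Malcolm reads lots of books"

# "lots books" never appears as-is, so this search finds nothing.
print(re.search('lots books', msg))  # None

# Allowing any characters between the two words makes the pattern match.
match = re.search(r'lots.*books', msg)
if match is not None:
    print(match.group(0))  # lots of books
```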
Related
I keep getting the following error when trying to parse some json:
Traceback (most recent call last):
File "/Users/batch/projects/kl-api/api/helpers.py", line 37, in collect_youtube_data
keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
KeyError: 'brandingSettings'
How do I make sure that I check my JSON output for a key before assigning it to a variable? If a key isn’t found, then I just want to assign a default value. Code below:
try:
    channel_id = channel_id_response_data['items'][0]['id']
    channel_info_url = YOUTUBE_URL + '/channels/?key=' + YOUTUBE_API_KEY + '&id=' + channel_id + '&part=snippet,contentDetails,statistics,brandingSettings'
    print('Querying:', channel_info_url)
    channel_info_response = requests.get(channel_info_url)
    channel_info_response_data = json.loads(channel_info_response.content)
    no_of_videos = int(channel_info_response_data['items'][0]['statistics']['videoCount'])
    no_of_subscribers = int(channel_info_response_data['items'][0]['statistics']['subscriberCount'])
    no_of_views = int(channel_info_response_data['items'][0]['statistics']['viewCount'])
    avg_views = round(no_of_views / no_of_videos, 0)
    photo = channel_info_response_data['items'][0]['snippet']['thumbnails']['high']['url']
    description = channel_info_response_data['items'][0]['snippet']['description']
    start_date = channel_info_response_data['items'][0]['snippet']['publishedAt']
    title = channel_info_response_data['items'][0]['snippet']['title']
    keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
except Exception as e:
    raise Exception(e)
You can either wrap the assignment in something like
try:
    keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
except KeyError:
    keywords = "default value"
or test for the key first with the in operator (dict.has_key() no longer exists in Python 3). IMHO, in your case the first solution is preferable.
Suppose you have a dict. You have two options for handling a missing key:
1) Get the key with a default value, like
d = {}
val = d.get('k', 10)
val will be 10 since there is no key named k.
2) try-except
d = {}
try:
    val = d['k']
except KeyError:
    val = 10
This way is far more flexible since you can do anything in the except block, even ignore the error with a pass statement if you really don't care about it.
"How do I make sure that I check my JSON output"
At this point your "JSON output" is just a plain native Python dict.
"for a key before assigning it to a variable? If a key isn’t found, then I just want to assign a default value"
Now that you know you have a dict, browsing the official documentation for dict methods should answer the question:
https://docs.python.org/3/library/stdtypes.html#dict.get
get(key[, default])
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
so the general case is:
var = data.get(key, default)
Now if you have deeply nested dicts/lists where any key or index could be missing, catching KeyErrors and IndexErrors can be simpler:
try:
    var = data[key1][index1][key2][index2][keyN]
except (KeyError, IndexError):
    var = default
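The try/except pattern can be wrapped in a small helper so each lookup stays on one line. This is a minimal sketch; the name dig and the sample data are hypothetical, not part of any library.

```python
def dig(data, *path, default=None):
    """Follow path (dict keys and list indexes) through data; return default if a step is missing."""
    current = data
    try:
        for step in path:
            current = current[step]
    except (KeyError, IndexError, TypeError):
        # TypeError covers indexing into None or another non-container value.
        return default
    return current


# Hypothetical response shaped like the one in the question.
response = {'items': [{'snippet': {'title': 'My channel'}}]}

title = dig(response, 'items', 0, 'snippet', 'title')
keywords = dig(response, 'items', 0, 'brandingSettings', 'channel', 'keywords', default='')

print(repr(title))     # 'My channel'
print(repr(keywords))  # ''
```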
As a side note: your code snippet is filled with repeated channel_info_response_data['items'][0]['statistics'] and channel_info_response_data['items'][0]['snippet'] expressions. Using intermediate variables will make your code more readable, easier to maintain, AND a bit faster too:
# always set a timeout if you don't want the program to hang forever
channel_info_response = requests.get(channel_info_url, timeout=30)
# always check the response status - having a response doesn't
# mean you got what you expected. Here we use the `raise_for_status()`
# shortcut which will raise an exception if we have anything else than
# a 200 OK.
channel_info_response.raise_for_status()
# requests knows how to deal with json:
channel_info_response_data = channel_info_response.json()
# we assume that the response MUST have `['items'][0]`,
# and that this item MUST have "statistics" and "snippet"
item = channel_info_response_data['items'][0]
stats = item["statistics"]
snippet = item["snippet"]
no_of_videos = int(stats.get('videoCount', 0))
no_of_subscribers = int(stats.get('subscriberCount', 0))
no_of_views = int(stats.get('viewCount', 0))
# guard against dividing by zero when videoCount is missing or 0
avg_views = round(no_of_views / no_of_videos, 0) if no_of_videos else 0
try:
    photo = snippet['thumbnails']['high']['url']
except KeyError:
    photo = None
description = snippet.get('description', "")
start_date = snippet.get('publishedAt', None)
title = snippet.get('title', "")
try:
    keywords = item['brandingSettings']['channel']['keywords']
except KeyError:
    keywords = ""
You may also want to learn about string formatting (concatenating strings is quite error-prone and barely readable), and how to pass arguments to requests.get().
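As a sketch of both suggestions: the query string can be built from a dict instead of being concatenated by hand. requests.get() accepts such a dict through its params argument; the example below uses the standard library's urlencode to show the resulting URL without making a network call. The values of YOUTUBE_URL, YOUTUBE_API_KEY, and channel_id are placeholders, since the question does not show them.

```python
from urllib.parse import urlencode

# Placeholder values: the real ones are not shown in the question.
YOUTUBE_URL = 'https://api.example.invalid/youtube/v3'
YOUTUBE_API_KEY = 'YOUR_API_KEY'
channel_id = 'CHANNEL_ID'

params = {
    'key': YOUTUBE_API_KEY,
    'id': channel_id,
    'part': 'snippet,contentDetails,statistics,brandingSettings',
}

# With requests, the dict can be passed directly:
#     requests.get(f'{YOUTUBE_URL}/channels/', params=params, timeout=30)
# urlencode shows roughly what ends up in the query string.
channel_info_url = f'{YOUTUBE_URL}/channels/?{urlencode(params)}'
print('Querying:', channel_info_url)
```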
I'm working on a small project that retrieves information about books from the Google Books API using Python 3. For this I make a call to the API, read out the variables, and store them in a list. For a search like "linkedin" this works perfectly. However, when I enter "Google", it reads the second title from the JSON input. How can this happen?
Please find my code below (Google_Results is the class I use to initialize the variables):
import requests
def Book_Search(search_term):
    parms = {"q": search_term, "maxResults": 3}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    print(r.url)
    results = r.json()
    i = 0
    for result in results["items"]:
        try:
            isbn13 = str(result["volumeInfo"]["industryIdentifiers"][0]["identifier"])
            isbn10 = str(result["volumeInfo"]["industryIdentifiers"][1]["identifier"])
            title = str(result["volumeInfo"]["title"])
            author = str(result["volumeInfo"]["authors"])[2:-2]
            publisher = str(result["volumeInfo"]["publisher"])
            published_date = str(result["volumeInfo"]["publishedDate"])
            description = str(result["volumeInfo"]["description"])
            pages = str(result["volumeInfo"]["pageCount"])
            genre = str(result["volumeInfo"]["categories"])[2:-2]
            language = str(result["volumeInfo"]["language"])
            image_link = str(result["volumeInfo"]["imageLinks"]["thumbnail"])
            dict = Google_Results(isbn13, isbn10, title, author, publisher, published_date, description, pages, genre,
                                  language, image_link)
            gr.append(dict)
            print(gr[i].title)
            i += 1
        except:
            pass
    return
gr = []
Book_Search("Linkedin")
I am a beginner to Python, so any help would be appreciated!
It does so because there is no publisher entry in volumeInfo of the first entry, thus it raises a KeyError and your except captures it. If you're going to work with fuzzy data you have to account for the fact that it will not always have the expected structure. For simple cases you can rely on dict.get() and its default argument to return a 'valid' default entry if an entry is missing.
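To illustrate the difference, here is a small sketch with made-up data shaped like the API response (the two entries and the default string are hypothetical). The first entry has no publisher, as in the "Google" search.

```python
# Made-up entries shaped like results["items"]; the first one lacks "publisher".
items = [
    {"volumeInfo": {"title": "First book"}},
    {"volumeInfo": {"title": "Second book", "publisher": "Some Press"}},
]

# Direct indexing inside try/except silently drops the first entry.
titles_strict = []
for item in items:
    try:
        info = item["volumeInfo"]
        publisher = info["publisher"]  # KeyError for the first entry
        titles_strict.append(info["title"])
    except KeyError:
        pass

# dict.get() with a default keeps every entry.
titles_tolerant = []
for item in items:
    info = item.get("volumeInfo", {})
    publisher = info.get("publisher", "unknown publisher")
    titles_tolerant.append(info.get("title"))

print(titles_strict)    # ['Second book']
print(titles_tolerant)  # ['First book', 'Second book']
```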
Also, there are a few conceptual problems with your function: it relies on a global gr, which is bad design; it shadows the built-in dict type; and it captures all exceptions, including KeyboardInterrupt, so you may not be able to stop the loop with Ctrl+C (SIGINT). I'd suggest converting it to something a bit more sane:
def book_search(search_term, max_results=3):
    results = []  # a list to store the results
    parms = {"q": search_term, "maxResults": max_results}
    r = requests.get(url="https://www.googleapis.com/books/v1/volumes", params=parms)
    try:  # just in case the server doesn't return valid JSON
        for result in r.json().get("items", []):
            if "volumeInfo" not in result:  # invalid entry - missing volumeInfo
                continue
            result_dict = {}  # a dictionary to store our discovered fields
            result = result["volumeInfo"]  # all the data we're interested in is in volumeInfo
            isbns = result.get("industryIdentifiers", None)  # capture ISBNs
            if isinstance(isbns, list) and isbns:
                # same index order as the question's code: [0] -> isbn13, [1] -> isbn10
                for i, t in enumerate(("isbn13", "isbn10")):
                    if len(isbns) > i and isinstance(isbns[i], dict):
                        result_dict[t] = isbns[i].get("identifier", None)
            result_dict["title"] = result.get("title", None)
            authors = result.get("authors", None)  # capture authors
            if isinstance(authors, list) and authors:
                # the question's str(...)[2:-2] only stripped the list brackets; join is cleaner
                result_dict["author"] = ", ".join(map(str, authors))
            result_dict["publisher"] = result.get("publisher", None)
            result_dict["published_date"] = result.get("publishedDate", None)
            result_dict["description"] = result.get("description", None)
            result_dict["pages"] = result.get("pageCount", None)
            genres = result.get("categories", None)  # capture genres
            if isinstance(genres, list) and genres:
                result_dict["genre"] = ", ".join(map(str, genres))
            result_dict["language"] = result.get("language", None)
            result_dict["image_link"] = result.get("imageLinks", {}).get("thumbnail", None)
            # make sure Google_Results accepts keyword arguments like title, author...
            # and make them optional as they might not be in the returned result
            gr = Google_Results(**result_dict)
            results.append(gr)  # add it to the results list
    except ValueError:
        return None  # invalid response returned, you may raise an error instead
    return results  # return the results
Then you can easily retrieve as much info as possible for a term:
gr = book_search("Google")
And it will be far more tolerant of data omissions, provided that your Google_Results type makes most of the entries optional.
Following @Coldspeed's recommendation, it became clear that missing information in the JSON data caused an exception to be raised. Since I only had a "pass" statement in the except block, the entire result was skipped. I will therefore have to adapt the try/except statements so that errors are handled properly.
Thanks for the help guys!
I am using Python's regex with an if-statement: if the match is None, then it should go to the else clause. But it shows this error:
AttributeError: 'NoneType' object has no attribute 'group'
The script is:
import string
chars = re.escape(string.punctuation)
sub='FW: Re: 29699'
if re.search("^FW: (\w{10})",sub).group(1) is not None :
    d=re.search("^FW: (\w{10})",sub).group(1)
else:
    a=re.sub(r'['+chars+']', ' ',sub)
    d='_'.join(a.split())
Every help is great help!
Your problem is this: if your search doesn't find anything, it will return None. You can't do None.group(1), which is what your code amounts to. Instead, check whether the search result is None—not the search result's first group.
import re
import string
chars = re.escape(string.punctuation)
sub='FW: Re: 29699'
search_result = re.search(r"^FW: (\w{10})", sub)
if search_result is not None:
    d = search_result.group(1)
else:
    a = re.sub(r'['+chars+']', ' ', sub)
    d = '_'.join(a.split())
print(d)
# FW_Re_29699
http://cs1.ucc.ie/~adc2/cgi-bin/lab7/index.html
You can see the error for yourself by entering anything into any one of the boxes (it doesn't have to be all of them). Any help would be great.
The code follows:
from cgitb import enable
enable()
from cgi import FieldStorage,escape
print('Content-Type: text/html')
print()
actor=''
genre=''
theyear=''
director=''
mood=''
result=''
form_data= FieldStorage()
if len(form_data) != 0:
    try:
        actor=escape(form_data.getfirst('actor'))
        genre=escape(form_data.getfirst('genre'))
        theyear=escape(form_data.getfirst('theyear'))
        director=escape(form_data.getfirst('director'))
        mood= escape(form_data.getfirst('mood'))
        connection = db.connect('####', '###', '####', '###')
        cursor = connection.cursor(db.cursors.DictCursor)
        cursor.execute("""SELECT title
FROM films
WHERE (actor = '%s')
OR (actor='%s' AND genre='%s')
OR (actor='%s' AND genre='%s' AND theyear='%i')
OR (actor='%s' AND genre='%s' AND theyear='%i' AND director='%s')
OR (actor='%s' AND genre='%s' AND theyear='%i' AND director='%s' AND mood='%s') % (actor, actor,genre, actor,genre,theyear, actor,genre,theyear,director,actor,genre,theyear,director,mood))
""")
        result = """<table>
<tr><th>Your movie!</th></tr>
<tr><th></th></tr>"""
        for row in cursor.fetchall():
            result+= '<tr><td>%s</td></tr>' ,(row['title'])
        result+= '</table>'
        cursor.close()
        connection.close()
    except db.Error:
        result = '<p>Sorry! We are currently experiencing technical difficulties.</p>'
Your <input> is named year, but you try to run escape(form_data.getfirst('theyear')). getfirst returns None when there is no corresponding form value, and escape fails on None. For similar reasons you need to handle optional fields better, as Willem said in the comments.
According to the error code:
/users/2020/adc2/public_html/cgi-bin/lab7/index.py in ()
24 try:
25 actor=escape(form_data.getfirst('actor'))
=> 26 genre=escape(form_data.getfirst('genre'))
27 theyear=escape(form_data.getfirst('theyear'))
28 director=escape(form_data.getfirst('director'))
genre = '', escape = <function escape>, form_data = FieldStorage(None, None, [MiniFieldStorage('actor', 'i')]), form_data.getfirst = <bound method FieldStorage.getfirst of FieldStorage(None, None, [MiniFieldStorage('actor', 'i')])>
/usr/local/lib/python3.4/cgi.py in escape(s=None, quote=None)
1038 warn("cgi.escape is deprecated, use html.escape instead",
1039 DeprecationWarning, stacklevel=2)
=> 1040 s = s.replace("&", "&amp;") # Must be done first!
1041 s = s.replace("<", "&lt;")
1042 s = s.replace(">", "&gt;")
s = None, s.replace undefined
escape() is being called with None as its argument. According to the snippet above, escape() calls replace() directly on that argument. So the quick fix is to make sure that you never pass None to escape(), but an empty string instead.
my_non_none_value = form_data.getfirst('actor') if form_data.getfirst('actor') else ""
bla = escape(my_non_none_value)
long version:
my_non_none_value = form_data.getfirst('actor')
if my_non_none_value is None:
    my_non_none_value = ""
bla = escape(my_non_none_value)
Side note: escape() in cgi is deprecated, use html.escape() instead.
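Putting the two points together, a None-safe wrapper around html.escape() could look like this. The helper name safe_escape is hypothetical, and the plain dict stands in for the submitted form values only for the sake of a runnable example.

```python
from html import escape


def safe_escape(value):
    """Escape value for HTML output, treating a missing (None) value as an empty string."""
    return escape(value if value is not None else "")


# A plain dict standing in for submitted form fields; 'genre' was left blank.
submitted = {'actor': 'Tom & Jerry'}

actor = safe_escape(submitted.get('actor'))
genre = safe_escape(submitted.get('genre'))

print(repr(actor))  # 'Tom &amp; Jerry'
print(repr(genre))  # ''
```

FieldStorage.getfirst() also accepts a default as its second argument, for example form_data.getfirst('genre', ''), which avoids the None in the first place.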
I have some code below which I admit is very repetitive. I am looking for a cleaner way to exit a while loop based on strings found in a dynamic stdout feed.
I have a config file which outlines various test scenarios and lists the string sequences to search for. Sometimes a test requires only two strings to be found, other times as many as four, hence the try and except blocks below. If a string can't be found in the config file for that particular test scenario, the corresponding stringFoundN is set to None.
Here's what I have so far:
#while loop runs from here looking through stdout from log file:
while not stdout_queue.empty():
    line = stdout_queue.get()
    #There will always be one string to look for
    string1 = config.get(TestType, 'STRING1')
    matchString1 = re.search(string1, line)
    # See if String 2 is in Config file for testtype - else set to None
    try:
        string2 = config.get(TestType, 'STRING2')
        matchString2 = re.search(string2, line)
    except:
        stringFound2 = None
    # See if String 3 is in Config file for testtype - else set to None
    try:
        string3 = config.get(TestType, 'STRING3')
        matchString3 = re.search(string3, line)
    except:
        stringFound3 = None
        pass
    # See if String 3 is in Config file for testtype - else set to None
    try:
        string4 = config.get(TestType, 'STRING4')
        matchString4 = re.search(string4, line)
    except:
        stringFound4 = None
        pass
    if matchString1 and not stringFound1:
        stringFound1 = 1
    if matchString2 and not stringFound2:
        stringFound2 = 1
    if matchString3 and not stringFound3:
        stringFound1 = 1
    if matchString4 and not stringFound4:
        stringFound4 = 1
    if ((stringFound2 and stringFound3 and stringFound4) is None) and stringFound1:
        # do stuff here in cases where only ONE string is entered into Config file testtype
        return
    if ((stringFound3 and stringFound4) is None) and (stringFound1 and stringFound2):
        # do stuff here in cases where only TWO strings are entered into Config file testtype
        return
    if stringFound4 is None and (stringFound1 and stringFound2 and stringFound3):
        # do stuff here in cases where only THREE strings are entered into Config file testtype
        return
    if stringFound1 and stringFound2 and stringFound3 and stringFound4:
        # do stuff here in cases where only THREE strings are entered into Config file testtype
        return
Aside from the possibility of tidying up the loop, I think my problem lies with the "is None" and stringFound if statements at the end. Any ideas on how to streamline this, or on a better way to exit this loop?
Thanks
The re.search function already returns None if the string is not matched so there is no need for 2 variables per string to be matched:
try:
    string2 = config.get(TestType, 'STRING2')
    matchString2 = re.search(string2, line)
except:
    #stringFound2 = None
    matchString2 = None
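None in Python is not the same as False, but it is falsy: it is treated as false in a boolean context. Take this example:
a = None
b = True
c = None
print(a and b)        # None
print(a or b)         # True
print(None == False)  # False
print(not None)       # True
d = a and b
if not d: print('Trueish')  # prints, because d is None and `not None` is True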
So you can remove the first block of if statements and replace the last block with something like:
if matchString1 and not (matchString2 or matchString3 or matchString4):
    # the condition fails if any of matchString2, 3 or 4 matched
    # do stuff here in cases where only ONE string is entered into Config file testtype
    return
You may need to adjust the condition(s), but the idea should be clear.
How about this:
string_count = 4
while not stdout_queue.empty():
    line = stdout_queue.get()
    strings = []
    matches = []
    for i in range(string_count):
        string = None
        match = None
        try:
            string = config.get(TestType, 'STRING{}'.format(i + 1))
            match = re.search(string, line)
        except KeyError:
            # assumes config.get raises KeyError for a missing entry; a
            # configparser.ConfigParser raises configparser.NoOptionError instead
            pass
        strings.append(string)
        matches.append(match)
    valid_strings = tuple(map(bool, strings))
    valid_matches = tuple(map(bool, matches))
    # element-wise AND: `valid_strings and valid_matches` would just return the second tuple
    found = tuple(s and m for s, m in zip(valid_strings, valid_matches))
    found_count = sum(found)
    # If you want to check that specific strings were found
    if found == (True, False, False, False):
        # Do something
        pass
    # If you want to do something based on the number of found strings
    if found_count == 2:
        pass
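If the real goal is to stop reading as soon as every configured string has been seen at least once, another way to sketch it is to keep a set of the patterns that are still missing. The function name and the sample lines are hypothetical; in the question the patterns would come from the config file and the lines from stdout_queue.

```python
import re


def all_patterns_seen(lines, patterns):
    """Return True as soon as every pattern has matched at least one line."""
    remaining = set(patterns)
    for line in lines:
        # Keep only the patterns that this line did not satisfy.
        remaining = {p for p in remaining if not re.search(p, line)}
        if not remaining:
            return True
    return False


log = ['booting', 'link up', 'test passed']
print(all_patterns_seen(log, ['link up', 'passed']))   # True
print(all_patterns_seen(log, ['link up', 'timeout']))  # False
```

Because the number of patterns is just the length of the list, the same code handles one, two, or four configured strings without separate branches.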