Is there any way to return the number of Google search results in Python 3? I tried several approaches from SO, but none of them works anymore:
>>> import requests
>>> from bs4 import BeautifulSoup
>>> def get_results(name):
...     re = requests.get('https://www.google.com/search', params={'q': name})
...     soup = BeautifulSoup(re.text, 'lxml')
...     response = soup.find('div', {'id': 'resultStats'})
...     return int(response.text.replace(',', '').split()[1])
>>> get_results('Leonardo DiCaprio')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in get_results
AttributeError: 'NoneType' object has no attribute 'text'
response in your get_results() function is None because the request to Google returned an error page, so the div you are looking for does not exist. You should check for a successful response status before trying to parse the results.
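As a minimal sketch of that guard, parsing a local HTML snippet in place of a live Google page (Google's markup and error behaviour change over time, so the live div id is not guaranteed):

```python
from bs4 import BeautifulSoup

# stand-in for an error/consent page that lacks the expected div
html = "<div id='something-else'>no result stats here</div>"
soup = BeautifulSoup(html, "html.parser")

stats = soup.find("div", {"id": "resultStats"})
if stats is None:
    print("resultStats div not found - probably an error page")
else:
    print(int(stats.text.replace(",", "").split()[1]))
```

Note that Google actively discourages scraping its result pages, so even with this guard the div may simply never appear; the official Custom Search API is the supported route for result counts.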
I'm trying to create the following variable in my Python shell app:
soup = bs4.BeautifulSoup(offSpring.text, 'lxml')
However, it keeps returning an error saying:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'bs4' is not defined
I have already installed bs using the "pip install bs" command.
Any ideas as to why this isn't working correctly?
Try this:
from bs4 import BeautifulSoup
soup = BeautifulSoup(offSpring.text, 'lxml')
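Alternatively, if you want to keep referring to the module as bs4, import the package itself. Note that the PyPI package is named beautifulsoup4 (installed with pip install beautifulsoup4), not bs. A sketch with a literal string standing in for offSpring.text:

```python
import bs4

html = "<html><body><p>hello</p></body></html>"
soup = bs4.BeautifulSoup(html, "html.parser")
print(soup.p.text)
```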
For a school assignment we need to use some movie data. Instead of copy/pasting all the info I need, I thought I would scrape it off IMDb. However, I am not familiar with Python and I am running into an issue here.
This is my code:
import urllib
import urllib.request
from bs4 import BeautifulSoup
url = "http://www.imdb.com"
values = {'q' : 'The Matrix'}
data = urllib.parse.urlencode(values).encode("utf-8")
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
r = response.read()
soup = BeautifulSoup(r)
That code keeps giving me the error:
> Traceback (most recent call last):
>   File "<pyshell#16>", line 1, in <module>
>     soup = BeautifulSoup(r)
>   File "C:\Users\My Name\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\__init__.py", line 153, in __init__
>     builder = builder_class()
>   File "C:\Users\My Name\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\builder\_htmlparser.py", line 39, in __init__
>     return super(HTMLParserTreeBuilder, self).__init__(*args, **kwargs)
> TypeError: __init__() got an unexpected keyword argument 'strict'
Do any of you great minds know what I am doing wrong? I tried using Google and found a post mentioning it might have something to do with requests, so I uninstalled requests and installed it again... that didn't work.
Here is my code. I want to scrape a list of words from a website, but when I call .string on the items I iterate over, it breaks:
import requests
from bs4 import BeautifulSoup
url = "https://www.merriam-webster.com/browse/thesaurus/a"
source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, "html.parser")
entry_view = soup.find_all('div', {'class': 'entries'})
view = entry_view[0]
list = view.ul
for m in list:
    for x in m:
        title = x.string
        print(title)
What I want is to print the list of words from the website, but what I get is an error:
Traceback (most recent call last):
File "/home/vidu/PycharmProjects/untitled/hello.py", line 14, in <module>
title = x.string
AttributeError: 'str' object has no attribute 'string'
You can achieve what you want by using the following piece of code.
Code:
import requests
from bs4 import BeautifulSoup
url = "https://www.merriam-webster.com/browse/thesaurus/a"
html_source = requests.get(url).text
soup = BeautifulSoup(html_source, "html.parser")
entry_view = soup.find_all('div', {'class': 'entries'})
entries = []
for elem in entry_view:
    for e in elem.find_all('a'):
        entries.append(e.text)
# show the first and last five entries and the total count
print(entries[:5])
print(entries[-5:])
print(len(entries))
Output:
['A1', 'aback', 'abaft', 'abandon', 'abandoned']
['absorbing', 'absorption', 'abstainer', 'abstain from', 'abstemious']
100
In your code:
print(type(list))
<class 'bs4.element.Tag'>
print(type(m))
<class 'bs4.element.NavigableString'>
print(type(x))
<class 'str'>
So, as you can see, the variable x is already a string, so it makes no sense to access the bs4 attribute .string on it.
P.S.: you shouldn't use a variable name like list; it isn't a reserved keyword, but it shadows the built-in list type, which is a bad idea.
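You can see those types in isolation with a small local snippet (no network needed): .string only makes sense on a Tag, because a NavigableString is itself a subclass of str.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<ul><li>aback</li><li>abaft</li></ul>", "html.parser")
item = soup.find("li")
print(type(item))                     # a Tag, so .string works on it
print(isinstance(item.string, str))   # NavigableString subclasses str
print(item.string)
```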
AttributeError: 'str' object has no attribute 'string'
This is telling you that the object is already a string. Try removing the .string access and it should work.
It also tells you that the name of Python's string type is str, not string.
Another thing to take home is that you could convert with title = str(x), but since x is already a string in this case, that is redundant.
To quote Google:
Python has a built-in string class named "str" with many handy features (there is an older module named "string" which you should not use)
I have the following code to load JSON:
import json
import requests
r = requests.get('http://api.reddit.com/controversial?limit=5')
if r.status_code == 200:
    reddit_data = json.loads(r.content)
    print reddit_data['data']['children'][1]['data']
else:
    print "Error."
And I got this message.
arsh#arsh:~$ python q.py
Traceback (most recent call last):
File "q.py", line 1, in <module>
import json
File "/home/arsh/json.py", line 5, in <module>
reddit_data = json.loads(r.content)
AttributeError: 'module' object has no attribute 'loads'
You have a different file called json.py in your home directory:
File "/home/arsh/json.py", line 5, in <module>
This file is in the way: you did not import the standard-library version. Rename it to something else or delete it. You'll also have to remove the stale json.pyc file.
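A quick way to confirm which module actually got imported (standard library only, so safe to run anywhere) is to check json.__file__:

```python
import json

# if this prints a path inside your home directory instead of the
# standard library, a local json.py is shadowing the real module
print(json.__file__)
print(hasattr(json, "loads"))
```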
Note that requests response objects can already handle JSON responses for you:
import requests
r = requests.get('http://api.reddit.com/controversial?limit=5')
r.raise_for_status()
reddit_data = r.json()
print reddit_data['data']['children'][1]['data']
The Response.json() method handles decoding JSON for you, including detecting the correct characterset to use when decoding.
I'm trying to grab all the winner categories from this page:
http://www.chicagoreader.com/chicago/BestOf?category=4053660&year=2013
I've written this in sublime:
import urllib2
from bs4 import BeautifulSoup
url = "http://www.chicagoreader.com/chicago/BestOf?category=4053660&year=2013"
page = urllib2.urlopen(url)
soup_package = BeautifulSoup(page)
page.close()
# find everything in div class="bestOfItem". This works.
all_categories = soup_package.findAll("div",class_="bestOfItem")
# print(all_categories)
#this part breaks it:
soup = BeautifulSoup(all_categories)
winner = soup.a.string
print(winner)
When I run this in terminal, I get the following error:
Traceback (most recent call last):
File "winners.py", line 12, in <module>
soup = BeautifulSoup(all_categories)
File "build/bdist.macosx-10.9-intel/egg/bs4/__init__.py", line 193, in __init__
File "build/bdist.macosx-10.9-intel/egg/bs4/builder/_lxml.py", line 99, in prepare_markup
File "build/bdist.macosx-10.9-intel/egg/bs4/dammit.py", line 249, in encodings
File "build/bdist.macosx-10.9-intel/egg/bs4/dammit.py", line 304, in find_declared_encoding
TypeError: expected string or buffer
Anyone know what's happening there?
You are trying to create a new BeautifulSoup object from a list of elements.
soup = BeautifulSoup(all_categories)
There is absolutely no need to do this here; just loop over each match instead:
for match in all_categories:
    winner = match.a.string
    print(winner)
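A self-contained sketch of that loop, using a literal snippet in place of the live Chicago Reader page (the real markup may differ, so the class name and structure here are illustrative):

```python
from bs4 import BeautifulSoup

html = """
<div class="bestOfItem"><a href="#">Best Bakery: Example A</a></div>
<div class="bestOfItem"><a href="#">Best Diner: Example B</a></div>
"""
soup_package = BeautifulSoup(html, "html.parser")
all_categories = soup_package.find_all("div", class_="bestOfItem")

winners = [match.a.string for match in all_categories]
print(winners)
```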