I'm trying to create the following variable in my Python shell app:
soup = bs4.BeautifulSoup(offSpring.text, 'lxml')
However, it keeps returning an error saying:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'bs4' is not defined
I have already installed bs using the "pip install bs" command.
Any ideas as to why this isn't working correctly?
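The name bs4 is undefined because the module was never imported in that shell session. Import the class directly and call it without the bs4. prefix. If the import itself fails, install the package that provides bs4 with pip install beautifulsoup4.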
from bs4 import BeautifulSoup
soup = BeautifulSoup(offSpring.text, 'lxml')
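Either import style works, as long as the name used in the call matches what was imported. A minimal sketch, assuming beautifulsoup4 is installed; the sample HTML is made up, and html.parser is used here only so the example does not depend on lxml:

```python
import bs4                       # binds the module name "bs4"
from bs4 import BeautifulSoup    # binds the class name "BeautifulSoup"

html = "<p>hello</p>"  # hypothetical stand-in for offSpring.text

# Both names refer to the same class, so the two calls are equivalent.
soup_a = bs4.BeautifulSoup(html, "html.parser")
soup_b = BeautifulSoup(html, "html.parser")

print(soup_a.p.text, soup_b.p.text)
```

The original line fails only because neither import had been run before it.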
Related
I am working on a project that speaks out Google search results. Everything is working fine except this piece of code.
Code
from googlesearch.googlesearch import GoogleSearch
response = GoogleSearch().search("something")
for result in response.results:
    print("Title: " + result.title)
    print("Content: " + result.getText())
Error Received
Traceback (most recent call last):
File "d:/Programming/Python/Projects/Smart-Actions/gSearch.py", line 3, in <module>
from googlesearch.googlesearch import GoogleSearch
File "C:\Users\vinay\anaconda3\lib\site-packages\googlesearch\googlesearch.py", line 6, in <module>
import urllib2
ModuleNotFoundError: No module named 'urllib2'
I am running Python 3.7.6. Please help me resolve this error.
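urllib2 exists only in Python 2; in Python 3 its contents were split across urllib.request and urllib.error, so a package that still does import urllib2 was written for Python 2 and cannot be imported on 3.7 as-is. A minimal sketch of the Python 3 spelling of a urllib2-style request follows. The URL, the User-Agent value and the function name build_search_request are placeholders, and this does not reproduce the googlesearch package's own API:

```python
import urllib.parse
import urllib.request

def build_search_request(query):
    """Build (but do not send) a GET request, Python 3 style."""
    # Placeholder endpoint, for illustration only.
    url = "https://example.com/search?" + urllib.parse.urlencode({"q": query})
    # The Python 2 spelling was: urllib2.Request(url, headers=...)
    return urllib.request.Request(url, headers={"User-Agent": "demo-agent/0.1"})

req = build_search_request("something")
print(req.full_url)
```

Sending it would be urllib.request.urlopen(req), where Python 2 code called urllib2.urlopen(req).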
For a school assignment we need to use some movie data. Instead of copying and pasting all the info I need, I thought I would scrape it off IMDB. However, I am not familiar with Python and I am running into an issue here.
This is my code:
import urllib
import urllib.request
from bs4 import BeautifulSoup
url = "http://www.imdb.com"
values = {'q' : 'The Matrix'}
data = urllib.parse.urlencode(values).encode("utf-8")
req = urllib.request.Request(url, data)
response = urllib.request.urlopen(req)
r = response.read()
soup = BeautifulSoup(r)
That code keeps giving me the error:
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    soup = BeautifulSoup(r)
  File "C:\Users\My Name\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\__init__.py", line 153, in __init__
    builder = builder_class()
  File "C:\Users\My Name\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\builder\_htmlparser.py", line 39, in __init__
    return super(HTMLParserTreeBuilder, self).__init__(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'strict'
Do any of you great minds know what I am doing wrong?
I tried Google and found a post suggesting it might have something to do with requests, so I uninstalled requests and installed it again, but that didn't work.
Here is my code. I want to scrape a list of words from a website, but when I call .string on the items I get an error.
import requests
from bs4 import BeautifulSoup
url = "https://www.merriam-webster.com/browse/thesaurus/a"
source_code = requests.get(url)
plain_text = source_code.text
soup = BeautifulSoup(plain_text, "html.parser")
entry_view = soup.find_all('div', {'class': 'entries'})
view = entry_view[0]
list = view.ul
for m in list:
    for x in m:
        title = x.string
        print(title)
What I want is a list of the text from the website, but what I get is an error:
Traceback (most recent call last):
File "/home/vidu/PycharmProjects/untitled/hello.py", line 14, in <module>
title = x.string
AttributeError: 'str' object has no attribute 'string'
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
import apport.fileutils
File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
from apport.packaging_impl import impl as packaging
File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 23, in <module>
import apt
File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Original exception was:
Traceback (most recent call last):
File "/home/vidu/PycharmProjects/untitled/hello.py", line 14, in <module>
title = x.string
AttributeError: 'str' object has no attribute 'string'
You can achieve what you want by using the following piece of code.
Code:
import requests
from bs4 import BeautifulSoup
url = "https://www.merriam-webster.com/browse/thesaurus/a"
html_source = requests.get(url).text
soup = BeautifulSoup(html_source, "html.parser")
entry_view = soup.find_all('div', {'class': 'entries'})
entries = []
for elem in entry_view:
    for e in elem.find_all('a'):
        entries.append(e.text)
# show the first and last 5 entries, then the total count
print(entries[:5])
print(entries[-5:])
print(len(entries))
Output:
['A1', 'aback', 'abaft', 'abandon', 'abandoned']
['absorbing', 'absorption', 'abstainer', 'abstain from', 'abstemious']
100
In your code:
print(type(list))
<class 'bs4.element.Tag'>
print(type(m))
<class 'bs4.element.NavigableString'>
print(type(x))
<class 'str'>
So, as you can see, the variable x is already a plain string, so it makes no sense to access the bs4 attribute .string on it.
P.S.: you shouldn't use a variable name like list. It is not a reserved keyword, but it shadows the built-in list type.
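A small sketch of both points, using only built-ins (the function name shadowing_demo is made up for illustration): characters pulled out of a string are plain str objects with no .string attribute, and rebinding the name list hides the built-in type.

```python
def shadowing_demo():
    # Rebinding "list" hides the built-in type inside this function.
    list = "abc"
    try:
        list("xyz")  # now calls a str, not the built-in list type
    except TypeError as exc:
        return str(exc)

# Iterating over a string yields plain str objects, one per character.
chars = [ch for ch in "aback"]
has_string_attr = any(hasattr(ch, "string") for ch in chars)

print(shadowing_demo())
print(has_string_attr)
```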
AttributeError: 'str' object has no attribute 'string'
This is telling you that the object is already a string. Remove the .string access and it should work.
It also shows that Python's string type is called str, not string.
Another takeaway: you can convert a value to a string with title = str(x), but since x is already a string here, that would be redundant.
To quote Google:
Python has a built-in string class named "str" with many handy features (there is an older module named "string" which you should not use)
Is there any way to return the number of Google search results in Python 3? I tried several approaches from SO, but none of them still work:
>>> import requests
>>> from bs4 import BeautifulSoup
>>> def get_results(name):
...     re = requests.get('https://www.google.com/search', params={'q':name})
...     soup = BeautifulSoup(re.text, 'lxml')
...     response = soup.find('div', {'id': 'resultStats'})
...     return int(response.text.replace(',', '').split()[1])
...
>>> get_results('Leonardo DiCaprio')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in get_results
AttributeError: 'NoneType' object has no attribute 'text'
response in your get_results() function is None because the request to Google returned an error page, so the div you are looking for does not exist. Check for a successful response status, and for a None result from find(), before trying to parse the results.
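A minimal sketch of those checks, assuming beautifulsoup4 is installed. The function name parse_result_count and the sample HTML are made up for illustration, the resultStats id is taken from the question and may not match what Google serves today, and the function takes the status code and body as arguments so it can be tried without a network call:

```python
from bs4 import BeautifulSoup

def parse_result_count(status_code, html):
    """Return the result count, or None if it cannot be found."""
    if status_code != 200:
        return None  # error page: nothing worth parsing
    soup = BeautifulSoup(html, "html.parser")
    stats = soup.find("div", {"id": "resultStats"})
    if stats is None:
        return None  # the expected div is missing from this page
    # Take the first all-digit token instead of assuming a fixed position.
    for token in stats.text.replace(",", "").split():
        if token.isdigit():
            return int(token)
    return None

sample = '<div id="resultStats">About 1,230,000 results</div>'  # hypothetical markup
print(parse_result_count(200, sample))
print(parse_result_count(200, "<p>blocked</p>"))
```

With requests, the call would be parse_result_count(resp.status_code, resp.text), where resp is the value returned by requests.get.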
I am running Python 3.5 with BeautifulSoup4 and getting an error when I attempt to pass the plain text of a webpage to the constructor.
The source code I am trying to run is
import requests
from bs4 import BeautifulSoup
tcg = 'http://magic.tcgplayer.com/db/deck_search_result.asp?Format=Commander'
sourcecode = requests.get(tcg)
plaintext = sourcecode.text
soup = BeautifulSoup(plaintext)
When running this I get the following error:
Traceback (most recent call last):
File "/Users/Brian/PycharmProjects/magic_crawler/main.py", line 11, in <module>
soup = BeautifulSoup(plaintext)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/bs4/__init__.py", line 202, in __init__
self._feed()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/bs4/__init__.py", line 216, in _feed
self.builder.feed(self.markup)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/bs4/builder/_htmlparser.py", line 156, in feed
parser = BeautifulSoupHTMLParser(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'strict'
Python 3.5 is an alpha release (the first beta is expected this weekend but isn't out just yet at the time of this post). BeautifulSoup certainly hasn't claimed any compatibility with 3.5.
Stick to using Python 3.4 for now.
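The TypeError can be reproduced with the standard library alone. As far as I know, the strict argument of html.parser.HTMLParser was deprecated and then removed in Python 3.5, and BeautifulSoup releases that still passed it fail exactly as in the traceback; upgrading beautifulsoup4 is the commonly reported fix on newer interpreters. A minimal sketch, assuming Python 3.5 or later (the helper name strict_error_message is made up):

```python
from html.parser import HTMLParser

def strict_error_message():
    """Return the error raised when 'strict' is passed, or None if accepted."""
    try:
        HTMLParser(strict=False)
    except TypeError as exc:
        return str(exc)
    return None

message = strict_error_message()
print(message)
```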