Python3 beautifulsoup module 'NoneType' Error

I'm new to the beautifulsoup module and I have a problem. My code is simple. First of all, the site I'm trying to scrape is https://www.bloomberg.com/quote/SPX:IND, and I am trying to scrape the price (the big number starting with 2 near the top of the page).
My code:
import urllib.request
from bs4 import BeautifulSoup

quote_page = 'https://www.bloomberg.com/quote/SPX:IND'
page = urllib.request.urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
price_box = soup.find('div', attrs={'class': 'price'})
price = price_box.text
print(price)
The error I get:
price = price_box.text
AttributeError: 'NoneType' object has no attribute 'text'

I used a more robust CSS selector instead of the find method. Since there is only one div element with class price, I am guessing this is the right element:
import requests
from bs4 import BeautifulSoup
response = requests.get('https://www.bloomberg.com/quote/SPX:IND')
soup = BeautifulSoup(response.content, 'lxml')
price = soup.select_one('.price').text
print(price)
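Note that select_one returns None when nothing matches (for example, when the site serves a block page instead of the quote), which is exactly what produced the original AttributeError. A minimal defensive variant of the lookup, assuming the same page and selector:

price_box = soup.select_one('.price')
if price_box is None:
    raise SystemExit('no .price element found; the request may have been blocked')
print(price_box.text)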

Another solution:
from bs4 import BeautifulSoup
from requests import Session

session = Session()
# pretend to be a desktop browser so the site serves the full page
session.headers['user-agent'] = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/'
    '66.0.3359.181 Safari/537.36'
)
quote_page = 'https://www.bloomberg.com/quote/SPX:IND'
page = session.get(quote_page)
soup = BeautifulSoup(page.text, 'html.parser')
# the price is also exposed in a meta tag with itemprop="price"
price_box = soup.find('meta', itemprop="price")
price = float(price_box['content'])
print(price)
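Even with a browser-like user-agent, the site may still answer with an error or block page, so it is worth failing fast before parsing. A small sketch of that check, under the same assumptions as above:

page = session.get(quote_page)
page.raise_for_status()  # raises on HTTP 4xx/5xx instead of parsing a block page
soup = BeautifulSoup(page.text, 'html.parser')
price_box = soup.find('meta', itemprop='price')
if price_box is not None:
    print(float(price_box['content']))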

Related

How to extract data from dynamic website?

I am trying to get the restaurant name and address of each restaurant from this platform:
https://customers.dlivery.live/en/list
So far I have tried with BeautifulSoup:
import requests
from bs4 import BeautifulSoup
import json

url = 'https://customers.dlivery.live/en/list'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/75.0.3770.80 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")
soup
I noticed that soup does not contain the data about the restaurants.
How can I do this?
If you inspect the page, you will notice that the names are wrapped in the card_heading class and the addresses in the card_distance class.
soup = BeautifulSoup(response.text, 'html.parser')
restaurantAddress = soup.find_all(class_='card_distance')
for address in restaurantAddress:
    print(address.text)
and
soup = BeautifulSoup(response.text, 'html.parser')
restaurantNames = soup.find_all(class_='card_heading')
for name in restaurantNames:
    print(name.text)
Not sure if this exact code will work, but this is pretty close to what you are looking for.
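If both lookups do return matching lists, the names and addresses can also be paired in a single pass. A sketch, assuming the two lists line up one to one:

names = soup.find_all(class_='card_heading')
addresses = soup.find_all(class_='card_distance')
for name, address in zip(names, addresses):
    print(name.text.strip(), '-', address.text.strip())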

Get text from <span class: with Beautifulsoup and requests

So I tried to get a specific text from a website, but it only gives me this error:
floor = soup.find('span', {'class': 'text-white fs-14px text-truncate attribute-value'}).text
AttributeError: 'NoneType' object has no attribute 'text'
I specifically want to get the 'Floor Price' text.
My code:
import requests
from bs4 import BeautifulSoup

#target url
url = "https://magiceden.io/marketplace/solsamo"
#act like browser
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
response = requests.get(url, headers=headers)
#parse the downloaded page
soup = BeautifulSoup(response.content, 'lxml')
floor = soup.find('span', {'class': 'text-white fs-14px text-truncate attribute-value'}).text
print(floor)
The data you need is not in the HTML you receive after:
response = requests.get('https://magiceden.io/marketplace/solsamo')
You can verify this by looking at the page source:
view-source:https://magiceden.io/marketplace/solsamo
You should use Selenium instead of requests to get your data, or you can examine the XHR requests on this page; maybe you can get the data with requests by calling another URL.
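A minimal Selenium sketch along those lines; the span class is taken from the question's own code and is an assumption, since the rendered page may use different markup:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://magiceden.io/marketplace/solsamo')
# wait for the JavaScript-rendered element (selector taken from the question, unverified)
floor = WebDriverWait(driver, 15).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, 'span.attribute-value'))
)
print(floor.text)
driver.quit()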

Web scraping finviz for fundamental data on marketcap

I'm trying to scrape finviz (https://finviz.com/quote.ashx?t=aapl) for the market cap in the fundamental table, but I couldn't for the life of me locate the table or the class with Beautiful Soup. It seems that every time I use soup.find_all() it works for 'div', 'td', 'table', etc., but it returns an empty list when I try to add a class like {'class': 'snapshot-td2'}. Does anyone know how I may fix this?
import requests
import bs4

def parseMarketCap():
    response = requests.get("https://finviz.com/quote.ashx?t=aapl")
    soup = bs4.BeautifulSoup(response.content, "lxml")
    table = soup.find_all('td', {'class': 'snapshot-td2'})
    print(table)
I also tried the following for the table, but had no luck either:
table = soup.find_all('table', {'class': "snapshot-table2"})
(screenshots in the original question: the browser inspect view and the fundamental table)
You need a user-agent header; then you can use :-soup-contains() to target the preceding td by its text and move on to the desired value field:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://finviz.com/quote.ashx?t=aapl', headers = {'User-Agent':'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
print(soup.select_one('td:-soup-contains("Market Cap")').find_next('b').text)
As QHarr suggests, add a user-agent to get a proper response:
response = requests.get("https://finviz.com/quote.ashx?t=aapl", headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'})
soup = BeautifulSoup(response.text, "lxml")
content_table = soup.find('table', {'class': 'snapshot-table2'})
table_rows = content_table.find_all('tr')
market_cap = table_rows[1].find_all('td')[1].text
print(market_cap)
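Since the snapshot table is just a flat sequence of label/value cells, every fundamental can also be collected in one pass. A sketch, assuming the td cells strictly alternate label, value:

cells = [td.text for td in content_table.find_all('td')]
# pair each label cell with the value cell that follows it
fundamentals = dict(zip(cells[::2], cells[1::2]))
print(fundamentals.get('Market Cap'))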
Try to use a User-Agent for your request, like this:
user_agent = {'User-Agent': 'YOUR_USER_AGENT'}
r = requests.get('YOURURL', headers=user_agent)
...

NoneType Object has no attribute “get_text” — Python

I was doing some web scraping on Amazon and I came across this error (mentioned in the title).
This is my code:
import requests
from bs4 import BeautifulSoup
import smtplib

URL = 'https://www.amazon.co.uk/UGREEN-Adapter-Samsung-Oneplus-Blackview/dp/B072V9CNTK/ref=sr_1_2_sspa?keywords=otg+cable&qid=1578610622&sr=8-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzRzRRUUdaR05RVlRJJmVuY3J5cHRlZElkPUEwNjExNjM4MVI4NVZaTFlYTlhGSCZlbmNyeXB0ZWRBZElkPUEwMjg1MTU0OEhROERWQTBSRFAzJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64 AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
title = soup.find(id="productTitle").get_text()
price = soup.find(id="priceblock_ourprice").get_text()
converted_price = float(price[0:3])

def check_price():
    print(soup.find(id="priceblock_ourprice").get_text())
    converted_price = float(price[0:3])
    if converted_price < 7.00:
        send_mail()
That is because the page is dynamically loaded using JavaScript. You can use Selenium to get the HTML code of the website, like this:
from selenium import webdriver
import time

URL = 'https://www.amazon.co.uk/UGREEN-Adapter-Samsung-Oneplus-Blackview/dp/B072V9CNTK/ref=sr_1_2_sspa?keywords=otg+cable&qid=1578610622&sr=8-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzRzRRUUdaR05RVlRJJmVuY3J5cHRlZElkPUEwNjExNjM4MVI4NVZaTFlYTlhGSCZlbmNyeXB0ZWRBZElkPUEwMjg1MTU0OEhROERWQTBSRFAzJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='
driver = webdriver.Chrome()
driver.get(URL)
time.sleep(5)  # give the JavaScript time to render
page = driver.page_source
driver.close()
Thus, here is the full code:
from bs4 import BeautifulSoup
from selenium import webdriver
import time
URL = 'https://www.amazon.co.uk/UGREEN-Adapter-Samsung-Oneplus-Blackview/dp/B072V9CNTK/ref=sr_1_2_sspa?keywords=otg+cable&qid=1578610622&sr=8-2-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzRzRRUUdaR05RVlRJJmVuY3J5cHRlZElkPUEwNjExNjM4MVI4NVZaTFlYTlhGSCZlbmNyeXB0ZWRBZElkPUEwMjg1MTU0OEhROERWQTBSRFAzJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='
driver = webdriver.Chrome()
driver.get(URL)
time.sleep(5)
page = driver.page_source
driver.close()
soup = BeautifulSoup(page, 'html5lib')
title = soup.find(id="productTitle")
price = soup.find(id="priceblock_ourprice")
print(soup.find(id="priceblock_ourprice").get_text())
Output:
£6.99
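One caveat worth noting: with an output like '£6.99', the float(price[0:3]) conversion in the original code would take '£6.' and raise a ValueError. A safer conversion, assuming the price string keeps this currency-symbol format, strips the symbol first:

text = soup.find(id="priceblock_ourprice").get_text()  # e.g. '£6.99'
converted_price = float(text.strip().lstrip('£'))
print(converted_price)  # 6.99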

google search html doesn't contain div id='resultStats'

I'm trying to get the number of search results of a Google search, which looks like this in the HTML if I just save the page from the browser:
<div id="resultStats">About 8,660,000,000 results<nobr> (0.49 seconds) </nobr></div>
But the HTML retrieved by Python looks like a mobile website when I open it in a browser, and it doesn't contain 'resultStats'.
I already tried (1) adding parameters to the URL, like https://www.google.com/search?client=firefox-b-d&q=test, and (2) copying a complete URL from a browser, but it didn't help.
import requests
from bs4 import BeautifulSoup
import re

def google_results(query):
    url = 'https://www.google.com/search?q=' + query
    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', id='resultStats')
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))

print(google_results('test'))
Error:
Traceback: line 11, in google_results
return int(''.join(re.findall(r'\d+', div.text.split()[1])))
AttributeError: 'NoneType' object has no attribute 'text'
The solution is to add headers (Thanks, John):
import requests
from bs4 import BeautifulSoup
import re

def google_results(query):
    url = 'https://www.google.com/search?q=' + query
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'
    }
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', id='resultStats')
    return int(''.join(re.findall(r'\d+', div.text.split()[1])))

print(google_results('test'))
Output:
9280000000
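For reference, the parsing step works on the visible text of the stats div, e.g. 'About 8,660,000,000 results (0.49 seconds)': split()[1] picks out the comma-separated number, and the regex join strips the commas. A worked snippet:

import re
text = 'About 8,660,000,000 results (0.49 seconds)'
count = int(''.join(re.findall(r'\d+', text.split()[1])))
print(count)  # 8660000000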
