Issues with beautifulsoup.find() in python - python

I am trying to make a simple web scraper with python to get stock data. My code was working not long ago and I don't believe I changed anything but now I'm getting the following error:
File "tradingProgram.py", line 69, in
dataArr.append(i.find('div',{'class':'tv-screener-table__symbol-right-part'}).find('a').text)
AttributeError: 'NoneType' object has no attribute 'find'
This is the part of the code that handles beautifulsoup:
content = requests.get("https://www.tradingview.com/markets/stocks-usa/market-movers-gainers/")
soup = BeautifulSoup(content.text,'html.parser')
stockData = soup.find_all('tr',{'class':'tv-data-table__row tv-data-table__stroke tv-screener-table__result-row'})
print(len(stockData))
for i in stockData:
    print(i)
    dataArr.append(i.find('div',{'class':'tv-screener-table__symbol-right-part'}).find('a').text)

Related

AttributeError: 'function' object has no attribute 'text'

I am writing Python on repl.it, and my goal is to build a web scraper.
I think this code is clean, but I'm getting an error:
AttributeError: 'function' object has no attribute 'text'
My code:
import requests
indeed_result = requests.get
("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
print(indeed_result.text)
I do have the requests package installed.
Please give me some advice.
You just need to remove the line break after get, like this:
import requests
indeed_result = requests.get("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
print(indeed_result.text)
If you want to continue the statement on the next line, add a backslash \ at the end of the line, as follows:
indeed_result = requests.get\
("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
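To see why the line break matters, here is a minimal sketch that runs without network access. The get function below is a hypothetical stand-in for requests.get, and https://example.com is a placeholder URL; neither comes from the question.

```python
def get(url):
    """Hypothetical stand-in for requests.get so the sketch runs offline."""
    return {"url": url}

# Broken: the assignment ends at the line break, so `broken` is the
# function object itself, not the result of calling it.
broken = get
("https://example.com")  # a separate expression; its value is discarded

# Fixed: inside parentheses a call may span several lines,
# and no backslash is needed.
fixed = get(
    "https://example.com"
)

print(callable(broken))  # the name is bound to a function
print(fixed["url"])
```

Opening the parenthesis on the same line as get is usually easier to read than a trailing backslash, because Python continues any expression inside brackets automatically.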
Remove the line break after get.
Try this:
import requests
res = requests.get("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
print(res.text)
# on success, res.status_code is 200

Python - Web Scraping exercise - Attribute Error

I am learning how to scrape web information. Below is a snippet of the actual code solution + output from datacamp.
On datacamp, this works perfectly fine, but when I try to run it in Spyder (on my own MacBook), it doesn't work.
This is because on datacamp the URL has already been pre-loaded into a variable named 'response', whereas in Spyder the response has to be defined again.
So I first defined the response variable as response = requests.get('https://www.datacamp.com/courses/all') so that the code points to datacamp's website.
My code looks like:
from scrapy.selector import Selector
import requests
response = requests.get('https://www.datacamp.com/courses/all')
this_url = response.url
this_title = response.xpath('/html/head/title/text()').extract_first()
print_url_title( this_url, this_title )
When I run this on Spyder, I got an error message
Traceback (most recent call last):
File "<ipython-input-30-6a8340fd3a71>", line 11, in <module>
this_title = response.xpath('/html/head/title/text()').extract_first()
AttributeError: 'Response' object has no attribute 'xpath'
Could someone please guide me? I would really like to know how to get this code working on Spyder.. thank you very much.
The value returned by requests.get('https://www.datacamp.com/courses/all') is a Response object, and this object has no attribute xpath, hence the error: AttributeError: 'Response' object has no attribute 'xpath'
I assume that response in your tutorial was assigned a different kind of object (most likely the one returned by etree.HTML), not the value returned by requests.get(url).
You can however do this:
import requests
from lxml import etree #import etree
response = requests.get('https://www.datacamp.com/courses/all') #get the Response object
tree = etree.HTML(response.text) #pass the page's source using the Response object
result = tree.xpath('/html/head/title/text()') #extract the value
print(response.url) #url
print(result) #findings

Python Selenium TypeError: 'NoneType' object is not subscriptable

Been stuck on an error for quite some time so I hope somebody can help!
I have a piece of code as follows:
urls = driver.find_elements_by_xpath('//*[@href]')
for url in urls:
    hopeful = url.get_attribute('ping')
    print(hopeful)
    actual = hopeful[31:]
    driver.get(actual)
    time.sleep(4)
driver.close
When I run the code I get the following error:
TypeError: 'NoneType' object is not subscriptable
In the output I am getting URLs with some unwanted characters in front of them, which is why I am trying to slice off the first 31 characters. That would leave me with the URL, which I could then pass into driver.get().
Is there a way to stop the program from trying to slice None, which is what causes the error?
This might be naive, but why not check for None?
You did not provide the exact line that triggers the error, so I assume it is hopeful[31:].
urls = driver.find_elements_by_xpath('//*[@href]')
for url in urls:
    hopeful = url.get_attribute('ping')
    # get_attribute returns None when the element has no 'ping' attribute
    if hopeful and len(hopeful) > 31:
        print(hopeful)
        actual = hopeful[31:]
        if actual.startswith('uk.linkedin.com'):
            driver.get(actual)
            time.sleep(4)
driver.close()  # the parentheses are required to actually call close
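The None check can also be pulled out into a small helper so it can be tried without a browser. This is only a sketch: strip_prefix is a hypothetical name, and the sample values are made up to stand in for whatever get_attribute('ping') returns on the real page.

```python
def strip_prefix(value, prefix_len=31):
    """Return value without its first prefix_len characters.

    Returns None when value is None, empty, or too short to slice.
    """
    if not value or len(value) <= prefix_len:
        return None
    return value[prefix_len:]

# Made-up stand-ins for get_attribute('ping') results.
samples = [None, "", "x" * 31 + "uk.linkedin.com/in/example"]
cleaned = [strip_prefix(s) for s in samples]
print(cleaned)
```

In the loop, the caller would then skip any element for which strip_prefix returns None before calling driver.get().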

'NoneType' object has no attribute 'group' when debugging shows there is a group

When writing the following code, the console shows an error:
AttributeError: 'NoneType' object has no attribute 'group'
The code:
req = re.search("([0-9]+)([A-Za-z]+)", data).group(0)
But when debugging, I see that there is a group, and the code continues to run instead of crashing. For example, when data is "30DIR /Users/user1/Documents/", the console outputs an error, while debugging shows there is a match: "30DIR".
req = re.search("([0-9]+)([A-Za-z]+)", data)
if req:
    req = req.group(0)
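One way this can happen, assuming the line runs more than once with different values of data, is that the value inspected in the debugger matches while a later value (an empty line, for instance) does not. A small sketch with made-up sample inputs; leading_token is a hypothetical helper name:

```python
import re

PATTERN = re.compile(r"([0-9]+)([A-Za-z]+)")

def leading_token(data):
    """Return the first digits-then-letters run in data, or None if absent."""
    match = PATTERN.search(data)  # re.search returns None when nothing matches
    return match.group(0) if match else None

# The first sample is from the question; the other two are invented.
samples = ["30DIR /Users/user1/Documents/", "", "no digits here"]
results = [leading_token(s) for s in samples]
print(results)
```

Printing or logging data just before the search is a quick way to find which input produces None.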

Python - AttributeError: 'NoneType' object has no attribute 'findAll'

I have written my first bit of python code to scrape a website.
import csv
import urllib2
from BeautifulSoup import BeautifulSoup
c = csv.writer(open("data.csv", "wb"))
soup = BeautifulSoup(urllib2.urlopen('http://www.kitco.com/kitco-gold-index.html').read())
table = soup.find('table', id="datatable_main")
rows = table.findAll('tr')[1:]
for tr in rows:
    cols = tr.findAll('td')
    text = []
    for td in cols:
        text.append(td.find(text=True))
    c.writerow(text)
When I test it locally in my IDE, PyCharm, it works fine, but when I try it on my server, which runs CentOS, I get the following error:
domainname.com [~/public_html/livegold]# python scraper.py
Traceback (most recent call last):
File "scraper.py", line 8, in <module>
rows = table.findAll('tr')[:]
AttributeError: 'NoneType' object has no attribute 'findAll'
I'm guessing I don't have a module installed remotely. I've been hung up on this for two days, so any help would be greatly appreciated! :)
You are ignoring any errors that could occur in urllib2.urlopen. If, for some reason, fetching that page fails on your server but not locally, you are effectively passing an empty string ('') or a page you don't expect (such as a 404 page) to BeautifulSoup.
That in turn makes soup.find('table', id="datatable_main") return None, since the document is not what you expect.
You should either make sure you can get the page you are trying to get on your server, or handle exceptions properly.
There is no table with id datatable_main in the page that the script read.
Try printing the returned page to the terminal - perhaps your script is failing to contact the web server? Sometimes hosting services prevent outgoing HTTP connections.
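A minimal sketch of the "handle exceptions properly" advice. It assumes Python 3, where urllib2.urlopen has become urllib.request.urlopen; fetch_html and its opener parameter are hypothetical names introduced here so the function can be exercised without a network connection.

```python
from urllib.request import urlopen

def fetch_html(url, opener=urlopen):
    """Return the page body as text, or None if the request fails.

    opener is a hypothetical hook: any callable with urlopen's
    (url, timeout=...) shape, which makes offline testing possible.
    """
    try:
        with opener(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError and HTTPError are subclasses of OSError
        print(f"Could not fetch {url}: {exc}")
        return None
```

The caller would then stop early when fetch_html returns None, and likewise check that soup.find('table', id="datatable_main") is not None before calling findAll on it.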
