Python - Web Scraping exercise - AttributeError

I am learning how to scrape web information. Below is a snippet of the code solution from DataCamp.
On DataCamp this works perfectly, but when I try to run it in Spyder (on my own MacBook), it doesn't work.
This is because on DataCamp the URL has already been pre-loaded into a variable named response; in Spyder, the URL needs to be fetched again.
So I first defined the response variable as response = requests.get('https://www.datacamp.com/courses/all') so that the code points to DataCamp's website.
My code looks like this:
from scrapy.selector import Selector
import requests
response = requests.get('https://www.datacamp.com/courses/all')
this_url = response.url
this_title = response.xpath('/html/head/title/text()').extract_first()
print_url_title( this_url, this_title )
When I run this on Spyder, I got an error message
Traceback (most recent call last):
File "<ipython-input-30-6a8340fd3a71>", line 11, in <module>
this_title = response.xpath('/html/head/title/text()').extract_first()
AttributeError: 'Response' object has no attribute 'xpath'
Could someone please guide me? I would really like to know how to get this code working in Spyder. Thank you very much.

The value returned by requests.get('https://www.datacamp.com/courses/all') is a Response object, and this object has no xpath attribute, hence the error: AttributeError: 'Response' object has no attribute 'xpath'.
I assume response in your tutorial has been assigned to a different object (most likely the object returned by etree.HTML), not the value returned by requests.get(url).
You can, however, do this:
import requests
from lxml import etree  # import etree
response = requests.get('https://www.datacamp.com/courses/all')  # get the Response object
tree = etree.HTML(response.text)  # parse the page's source taken from the Response object
result = tree.xpath('/html/head/title/text()')  # extract the value (a list of strings)
print(response.url)  # url
print(result)  # findings

Related

AttributeError: 'function' object has no attribute 'text'

Do you know repl.it? I'm coding Python on that site, and my goal is to build a web scraper.
I think this code is clean, but I'm getting an error:
AttributeError: 'function' object has no attribute 'text'
My code:
import requests
indeed_result = requests.get
("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
print(indeed_result.text)
I definitely have the requests package installed.
Please give me some advice.
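You just need to remove the line break after get. On its own line, indeed_result = requests.get binds the function itself, and the parenthesised URL on the next line is a separate expression that is evaluated and thrown away, hence the 'function' object error. Join the lines like this: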
import requests
indeed_result = requests.get("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
print(indeed_result.text)
If you want to continue the statement on the next line, end the line with a backslash \ as follows:
indeed_result = requests.get\
("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
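The same mistake can be reproduced offline. The sketch below uses urllib.parse.unquote purely as a stand-in for requests.get (an assumption, so it runs without a network); the line-splitting behaviour is identical. It also shows the usual alternative to a backslash: open the parenthesis on the first line and let Python continue the expression implicitly until the matching ")", which PEP 8 prefers.

```python
from urllib.parse import unquote  # stand-in for requests.get in this sketch

# Broken: the "(" is on the next line, so Python reads two separate
# statements. `decoded` is bound to the function object itself, and the
# parenthesised string below is evaluated on its own and discarded.
decoded = unquote
("%EC%9D%B8%EC%B2%9C")

# Fixed: with "(" on the same line, Python joins the lines implicitly
# until the matching ")"; no backslash is needed.
decoded_ok = unquote(
    "%EC%9D%B8%EC%B2%9C"
)

print(callable(decoded), decoded_ok)
```

Here decoded is still the function, just like indeed_result in the question, while decoded_ok holds the decoded query value (인천, Incheon).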
Remove the line break after get; try this:
import requests
res = requests.get("https://kr.indeed.com/jobs?q=python&l=%EC%9D%B8%EC%B2%9C")
print(res.text)
# res.status_code is 200 if the request succeeded

Issues with beautifulsoup.find() in python

I am trying to make a simple web scraper with Python to get stock data. My code was working not long ago, and I don't believe I changed anything, but now I'm getting the following error:
File "tradingProgram.py", line 69, in
dataArr.append(i.find('div',{'class':'tv-screener-table__symbol-right-part'}).find('a').text)
AttributeError: 'NoneType' object has no attribute 'find'
This is the part of the code that handles beautifulsoup:
import requests
from bs4 import BeautifulSoup
dataArr = []
content = requests.get("https://www.tradingview.com/markets/stocks-usa/market-movers-gainers/")
soup = BeautifulSoup(content.text, 'html.parser')
stockData = soup.find_all('tr', {'class': 'tv-data-table__row tv-data-table__stroke tv-screener-table__result-row'})
print(len(stockData))
for i in stockData:
    print(i)
    dataArr.append(i.find('div', {'class': 'tv-screener-table__symbol-right-part'}).find('a').text)
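The loop stops at the first row that lacks the expected div: find() returns None when nothing matches, and the chained .find('a') is then called on None. Plausible causes (not confirmed here) are that TradingView changed its class names, or that the table is now built by JavaScript, so the HTML that requests receives no longer contains it. A minimal sketch that checks each step before chaining; extract_symbols is a hypothetical name, and the class names are copied from the question:

```python
from bs4 import BeautifulSoup


def extract_symbols(html):
    """Collect ticker text from screener rows, skipping rows whose markup differs."""
    soup = BeautifulSoup(html, "html.parser")
    symbols = []
    # class_ with a single class name matches any <tr> whose class list
    # contains it, so extra or reordered classes on the row still match.
    for row in soup.find_all("tr", class_="tv-screener-table__result-row"):
        div = row.find("div", class_="tv-screener-table__symbol-right-part")
        # find() returns None when nothing matches; check before chaining.
        link = div.find("a") if div is not None else None
        if link is None:
            continue  # this row doesn't have the expected structure
        symbols.append(link.get_text(strip=True))
    return symbols
```

If extract_symbols(content.text) comes back empty, printing content.text is a quick way to see whether the table is in the downloaded page at all.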

Python - AttributeError: 'NoneType' object has no attribute 'findAll'

I have written my first bit of python code to scrape a website.
import csv
import urllib2
from BeautifulSoup import BeautifulSoup
c = csv.writer(open("data.csv", "wb"))
soup = BeautifulSoup(urllib2.urlopen('http://www.kitco.com/kitco-gold-index.html').read())
table = soup.find('table', id="datatable_main")
rows = table.findAll('tr')[1:]
for tr in rows:
    cols = tr.findAll('td')
    text = []
    for td in cols:
        text.append(td.find(text=True))
    c.writerow(text)
When I test it locally in my IDE, PyCharm, it works well, but when I try it on my server, which runs CentOS, I get the following error:
domainname.com [~/public_html/livegold]# python scraper.py
Traceback (most recent call last):
File "scraper.py", line 8, in <module>
rows = table.findAll('tr')[:]
AttributeError: 'NoneType' object has no attribute 'findAll'
I'm guessing I don't have a module installed on the server. I've been stuck on this for two days, so any help would be greatly appreciated! :)
You are ignoring any errors that could occur in urllib2.urlopen. If for some reason you get an error fetching that page on your server that you don't get when testing locally, you are effectively passing an empty string ('') or an unexpected page (such as a 404 page) to BeautifulSoup.
That in turn makes soup.find('table', id="datatable_main") return None, since the document isn't what you expect.
You should either make sure your server can fetch the page you are trying to get, or handle exceptions properly.
There is no table with id datatable_main in the page that the script read.
Try printing the returned page to the terminal - perhaps your script is failing to contact the web server? Sometimes hosting services prevent outgoing HTTP connections.
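Both answers come down to checking what was actually downloaded before parsing it. A minimal sketch of that, written for Python 3 with bs4 rather than the question's Python 2 urllib2 and BeautifulSoup 3 (a swapped-in stack); fetch and table_rows are hypothetical helper names, and get_text(strip=True) replaces the original td.find(text=True):

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch(url, timeout=10):
    """Download a page, raising a clear error instead of returning junk."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except HTTPError as exc:  # subclass of URLError, so it must come first
        raise RuntimeError(f"{url} returned HTTP {exc.code}") from exc
    except URLError as exc:  # DNS failure, refused or blocked connection
        raise RuntimeError(f"could not reach {url}: {exc.reason}") from exc


def table_rows(html, table_id="datatable_main"):
    """Return the cell text of every row after the header row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        # Show the start of the page so a 404 or block page is easy to spot.
        raise ValueError(
            f"no <table id={table_id!r}> found; page starts with: {html[:200]!r}"
        )
    return [
        [td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.find_all("tr")[1:]
    ]
```

On the server, table_rows(fetch('http://www.kitco.com/kitco-gold-index.html')) then either returns the rows (which csv.writer(open("data.csv", "w", newline="")).writerows(...) can write) or fails with a message showing what the server actually received.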

renderContents in beautifulsoup (python)

The code I'm trying to get working is:
h = str(heading)
# '<h1>Heading</h1>'
heading.renderContents()
I get this error:
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
print h.renderContents()
AttributeError: 'str' object has no attribute 'renderContents'
Any ideas?
I have a string with HTML tags and I need to clean it; if there is a different way of doing that, please suggest it.
Your error message and your code sample don't line up. You say you're calling:
heading.renderContents()
But your error message says you're calling:
print h.renderContents()
This suggests that you may have a bug in your code: you are calling renderContents() on a string object, which doesn't define that method.
In any case, it would help if you checked what type of object heading is to make sure it's really a BeautifulSoup instance. This works for me with BeautifulSoup 3.2.0:
from BeautifulSoup import BeautifulSoup
heading = BeautifulSoup('<h1>heading</h1>')
repr(heading)
# '<h1>heading</h1>'
print heading.renderContents()
# <h1>heading</h1>
print str(heading)
# <h1>heading</h1>
h = str(heading)
print h
# <h1>heading</h1>
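The answer above uses BeautifulSoup 3. In BeautifulSoup 4, renderContents() survives mainly for backward compatibility (it returns bytes), while decode_contents() returns the inner markup as a str, and get_text() is the usual way to strip tags from a string, which is what the question ultimately asks for. A small sketch, assuming bs4 with the built-in html.parser; the sample markup is made up:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<h1>Heading <b>bold</b></h1>", "html.parser")
heading = soup.h1                        # a bs4 Tag, which has the soup methods

inner_html = heading.decode_contents()   # markup inside the tag, as a str
text_only = heading.get_text()           # every tag stripped, text kept
h = str(heading)                         # the whole tag, now just a plain str

# h is an ordinary string, so calling a BeautifulSoup method on it raises
# AttributeError, which is exactly the error in the question.
print(repr(inner_html), repr(text_only), repr(h))
```

The key point is to call these methods on the Tag (or the soup), never on the result of str().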

pubDate RSS parsing weirdness with Beautifulsoup/Python

I'm trying to parse an RSS/Podcast feed using Beautifulsoup and everything is working nicely except I can't seem to parse the 'pubDate' field.
data = urllib2.urlopen("http://www.democracynow.org/podcast.xml")
dom = BeautifulStoneSoup(data, fromEncoding='utf-8')
items = dom.findAll('item')
for item in items:
    title = item.find('title').string.strip()
    pubDate = item.find('pubDate').string.strip()
The title gets parsed fine but when it gets to pubDate, it says:
Traceback (most recent call last):
File "", line 2, in
AttributeError: 'NoneType' object has no attribute 'string'
However, when I download a copy of the XML file and rename 'pubDate' to something else, then parse it again, it seems to work. Is pubDate a reserved variable or something in Python?
It works with item.find('pubdate').string.strip(): the parser lowercases tag names, so the pubDate tag is stored as pubdate.
Why don't you use feedparser?
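The same lowercasing happens in BeautifulSoup 4 when a feed is parsed with the HTML parser. A minimal sketch, assuming bs4 and html.parser, with a made-up one-item feed; an XML parser (BeautifulSoup(data, "xml"), which needs lxml installed) keeps the original case, and feedparser avoids the issue by parsing the feed for you:

```python
from bs4 import BeautifulSoup

# A tiny hand-written feed standing in for the real podcast XML.
feed = """<rss><channel>
<item><title>Episode 1</title><pubDate>Mon, 01 Jan 2018 08:00:00 -0500</pubDate></item>
</channel></rss>"""

soup = BeautifulSoup(feed, "html.parser")
item = soup.find("item")

# html.parser lowercases tag names, so <pubDate> is stored as "pubdate"
# and searching for the camel-case name finds nothing.
print(item.find("pubDate"))
print(item.find("pubdate").string.strip())
```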
