renderContents in beautifulsoup (python)

The code I'm trying to get working is:
h = str(heading)
# '<h1>Heading</h1>'
heading.renderContents()
I get this error:
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
print h.renderContents()
AttributeError: 'str' object has no attribute 'renderContents'
Any ideas?
I have a string with HTML tags and I need to clean it. If there is a different way of doing that, please suggest it.

Your error message and your code sample don't line up. You say you're calling:
heading.renderContents()
But your error message says you're calling:
print h.renderContents()
Which suggests that perhaps you have a bug in your code, trying to call renderContents() on a string object that doesn't define that method.
In any case, it would help if you checked what type of object heading is to make sure it's really a BeautifulSoup instance. This works for me with BeautifulSoup 3.2.0:
from BeautifulSoup import BeautifulSoup
heading = BeautifulSoup('<h1>heading</h1>')
repr(heading)
# '<h1>heading</h1>'
print heading.renderContents()
# <h1>heading</h1>
print str(heading)
# <h1>heading</h1>
h = str(heading)
print h
# <h1>heading</h1>
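If the underlying goal is only to strip the tags out of a string, here is a minimal sketch that avoids BeautifulSoup entirely. It assumes Python 3 and uses the standard library's html.parser; the names TagStripper and strip_tags are made up for this example.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text found between tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text outside of tags.
        self.parts.append(data)


def strip_tags(html):
    """Return the text content of an HTML fragment with all tags removed."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # flush any buffered text
    return "".join(stripper.parts)


print(strip_tags("<h1>Heading</h1>"))  # Heading
```

This only drops the markup; it does not sanitize untrusted HTML.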

Related

Python - Web Scraping exercise - Attribute Error

I am learning how to scrape web information. Below is a snippet of the actual code solution + output from datacamp.
On datacamp this works perfectly fine, but when I run it in Spyder (on my own MacBook) it doesn't work.
This is because on datacamp the URL has already been pre-loaded into a variable named 'response', whereas in Spyder it has to be defined again.
So I first defined the response variable as response = requests.get('https://www.datacamp.com/courses/all') so that the code points to datacamp's website.
My code looks like:
from scrapy.selector import Selector
import requests
response = requests.get('https://www.datacamp.com/courses/all')
this_url = response.url
this_title = response.xpath('/html/head/title/text()').extract_first()
print_url_title( this_url, this_title )
When I run this on Spyder, I got an error message
Traceback (most recent call last):
File "<ipython-input-30-6a8340fd3a71>", line 11, in <module>
this_title = response.xpath('/html/head/title/text()').extract_first()
AttributeError: 'Response' object has no attribute 'xpath'
Could someone please guide me? I would really like to know how to get this code working in Spyder. Thank you very much.

The value returned by requests.get('https://www.datacamp.com/courses/all') is a Response object, and this object has no attribute xpath, hence the error: AttributeError: 'Response' object has no attribute 'xpath'
I assume that response in your tutorial was a different kind of object that supports xpath (for example a Scrapy response, or the object returned by etree.HTML), not the value returned by requests.get(url).
You can however do this:
import requests
from lxml import etree
response = requests.get('https://www.datacamp.com/courses/all')  # get the Response object
tree = etree.HTML(response.text)  # parse the page source into an element tree
result = tree.xpath('/html/head/title/text()')  # xpath returns a list of matches
print(response.url)  # url
print(result)  # findings

Python Selenium print elements not element

In python Selenium I am attempting to print a list of class_name(meta). When I use browser.find.element only one value is returned. I then amend the script:-
demo = browser.find_elements_by_class_name("meta")
print demo.text
I get the following error:-
Traceback (most recent call last):
File "test.py", line 29, in
print demo.text
AttributeError: 'list' object has no attribute 'text'
I am new to Python and Selenium, but I have searched for a solution with no luck.
Thanks in advance for your help.

That happens because find_elements_by_class_name returns a list of elements, and a list has no text attribute. You forgot to iterate over it:
for lang in demo:
    print lang.text
Example code:
langs = fire.find_elements_by_css_selector("#gt-sl-gms-menu div.goog-menuitem-content")
for lang in langs:
    print lang.text
Hope it will help you :)

urllib3.urlencode googlescholar url from string

I am trying to encode a string into a URL to search Google Scholar, only to realize that urlencode is not provided by urllib3.
>>> import urllib3
>>> string = "https://scholar.google.com/scholar?" + urllib3.urlencode( {"q":"rudra banerjee"} )
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'urlencode'
So, I checked urllib3 doc and found, I possibly need request_encode_url. But I have no experience in using that and failed.
>>> string = "https://scholar.google.com/scholar?" +"rudra banerjee"
>>> url = urllib3.request_encode_url('POST',string)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'request_encode_url'
So, how can I encode a string into a URL?
NB: I have no particular attachment to urllib3, so any other module will also do.

To simply encode fields in a URL, you can use urllib.urlencode.
In Python 2, this should do the trick:
import urllib
s = "https://scholar.google.com/scholar?" + urllib.urlencode({"q":"rudra banerjee"})
print(s)
# Prints: https://scholar.google.com/scholar?q=rudra+banerjee
In Python 3, it lives under urllib.parse.urlencode instead.
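For reference, a minimal sketch of the Python 3 equivalent; the variable names base and url are only illustrative.

```python
from urllib.parse import urlencode

base = "https://scholar.google.com/scholar?"
# urlencode turns the dict into a query string and escapes the space as '+'.
url = base + urlencode({"q": "rudra banerjee"})
print(url)  # https://scholar.google.com/scholar?q=rudra+banerjee
```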

(Edit: I assumed you wanted to download the URL, not simply encode it. My mistake. I'll leave this answer as a reference for others, but see the other answer for encoding a URL.)
If you pass a dictionary into fields, urllib3 will take care of encoding it for you. First, you'll need to instantiate a pool for your connections. Here's a full example:
import urllib3
http = urllib3.PoolManager()
r = http.request('POST', 'https://scholar.google.com/scholar', fields={"q":"rudra banerjee"})
print(r.data)
Calling .request(...) will take care of figuring out the encoding for you based on the method.
Getting started examples are here: https://urllib3.readthedocs.org/en/latest/index.html#usage

Python 3.4.3 save image from url to file using urllib

I tried to make a python program that would allow me to download a jpg file from a website.
Why I'm doing this is really for no reason at all, I just wanted to try it for fun.
Anyways, here is the code:
import urllib
a = 1
while a == 1:
urllib.urlretrieve("http://lemerg.com/data/wallpapers/38/957049.jpg","D:\\Users\\Elias\\Desktop\\FolderName-957049.jpg")
(You may have to properly tab it in, it wouldn't let me here)
So basically what I want it to do is to repeatedly download the same file until I close the program. Just don't ask why.
The error code I get is:
Traceback (most recent call last):
urllib.urlretrieve("http://lemerg.com/data/wallpapers/38/957049.jpg","D:\Users\Elias\Desktop\FolderName-957049.jpg")
AttributeError: 'module' object has no attribute 'urlretrieve'

In Python 3, urlretrieve() lives in the urllib.request module. Do this:
from urllib import request
a = 1
while a == 1:
    request.urlretrieve("http://lemerg.com/data/wallpapers/38/957049.jpg","D:\\Users\\Elias\\Desktop\\FolderName-957049.jpg")
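A slightly more defensive sketch, assuming Python 3: it wraps the call in a function and uses a bounded loop instead of an endless one, so the script stops on its own. The function name download_repeatedly and its times parameter are made up for this example.

```python
from urllib.request import urlretrieve


def download_repeatedly(url, destination, times=3):
    """Download url to destination `times` times, overwriting the file each time."""
    for attempt in range(1, times + 1):
        # urlretrieve returns a (local_filename, headers) tuple.
        saved_path, _headers = urlretrieve(url, destination)
        print("download %d saved to %s" % (attempt, saved_path))
    return destination
```

It could be called as download_repeatedly("http://lemerg.com/data/wallpapers/38/957049.jpg", "D:\\Users\\Elias\\Desktop\\FolderName-957049.jpg"). The doubled backslashes matter on Windows paths, because a single backslash starts an escape sequence in a normal string literal.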

pubDate RSS parsing weirdness with Beautifulsoup/Python

I'm trying to parse an RSS/Podcast feed using Beautifulsoup and everything is working nicely except I can't seem to parse the 'pubDate' field.
import urllib2
from BeautifulSoup import BeautifulStoneSoup
data = urllib2.urlopen("http://www.democracynow.org/podcast.xml")
dom = BeautifulStoneSoup(data, fromEncoding='utf-8')
items = dom.findAll('item')
for item in items:
    title = item.find('title').string.strip()
    pubDate = item.find('pubDate').string.strip()
The title gets parsed fine but when it gets to pubDate, it says:
Traceback (most recent call last):
File "", line 2, in
AttributeError: 'NoneType' object has no attribute 'string'
However, when I download a copy of the XML file and rename 'pubDate' to something else, then parse it again, it seems to work. Is pubDate a reserved variable or something in Python?
Thanks,
g

It works with item.find('pubdate').string.strip(), which suggests the parser lowercases tag names. pubDate is not a reserved word in Python.
Alternatively, why not use feedparser?
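As another option that needs no third-party package, here is a minimal sketch using the standard library's xml.etree.ElementTree on Python 3 (a different parser, not BeautifulSoup or feedparser). It is case-sensitive, so the tag is matched as pubDate exactly as written in the feed. SAMPLE_RSS is made-up data for illustration, not the real podcast feed, and parse_items is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed for illustration only.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title> Episode 1 </title>
      <pubDate>Mon, 01 Jan 2024 08:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def parse_items(rss_text):
    """Return a list of (title, pubDate) tuples, one per <item> element."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        # findtext returns the default when the child element is missing,
        # which avoids the AttributeError on None seen in the question.
        title = item.findtext("title", default="").strip()
        pub_date = item.findtext("pubDate", default="").strip()
        items.append((title, pub_date))
    return items


print(parse_items(SAMPLE_RSS))
```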
