I'm using libxml2 in a Python app I'm writing, and am trying to run some test code to parse an XML file. The program downloads an XML file from the internet and parses it. However, I have run into a problem.
With the following code:
xmldoc = libxml2.parseDoc(gfile_content)
droot = xmldoc.children  # Get document root
dchild = droot.children  # Get child nodes
while dchild is not None:
    if dchild.type == "element":
        print "\tAn element with ", dchild.isCountNode(), "child(ren)"
        print "\tAnd content", repr(dchild.content)
    dchild = dchild.next
xmldoc.freeDoc()
...which is based on the code example in this article on XML.com, I get the following error when I run it on Python 2.4.3 (the CentOS 5.2 package):
Traceback (most recent call last):
File "./xml.py", line 25, in ?
print "\tAn element with ", dchild.isCountNode(), "child(ren)"
AttributeError: xmlNode instance has no attribute 'isCountNode'
I'm rather stuck here.
Edit: I should note that I also tried IsCountNode(), and it still threw an error.
isCountNode should read lsCountNode, with a lower-case "L", not a capital "I".
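If the libxml2 bindings aren't available, roughly the same walk over the root's children can be sketched with the standard library's xml.etree.ElementTree (Python 3). This swaps in a different library, so it is not the libxml2 API: len(child) counts only element children, whereas lsCountNode may also count text nodes, so the numbers can differ. The helper name describe_children is hypothetical.

```python
import xml.etree.ElementTree as ET


def describe_children(xml_text):
    """Return (tag, element-child count, text content) for each child of the root.

    Hypothetical helper; uses ElementTree instead of libxml2.
    """
    root = ET.fromstring(xml_text)  # parse and get the document root element
    result = []
    for child in root:  # iterates element children only; text nodes are skipped
        child_count = len(child)  # number of element children of this child
        text = "".join(child.itertext())  # all descendant text, similar to .content
        result.append((child.tag, child_count, text))
    return result


if __name__ == "__main__":
    for tag, count, text in describe_children("<root><a>x<b>y</b></a><c/></root>"):
        print("\tAn element", tag, "with", count, "child(ren)")
        print("\tAnd content", repr(text))
```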
Related
I am learning how to scrape web information. Below is a snippet of the actual code solution and its output from DataCamp.
On DataCamp this works perfectly fine, but when I try to run it in Spyder on my own MacBook, it doesn't work.
This is because on DataCamp the URL has already been preloaded into a variable named 'response', whereas in Spyder the URL needs to be fetched again.
So I first defined the response variable as response = requests.get('https://www.datacamp.com/courses/all') so that the code points to DataCamp's website.
My code looks like:
from scrapy.selector import Selector
import requests
response = requests.get('https://www.datacamp.com/courses/all')
this_url = response.url
this_title = response.xpath('/html/head/title/text()').extract_first()
print_url_title( this_url, this_title )
When I run this in Spyder, I get this error message:
Traceback (most recent call last):
File "<ipython-input-30-6a8340fd3a71>", line 11, in <module>
this_title = response.xpath('/html/head/title/text()').extract_first()
AttributeError: 'Response' object has no attribute 'xpath'
Could someone please guide me? I would really like to know how to get this code working in Spyder. Thank you very much.
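The value returned by requests.get('https://www.datacamp.com/courses/all') is a requests Response object, and that object has no xpath attribute, hence the error: AttributeError: 'Response' object has no attribute 'xpath'
I assume the response in your tutorial was some other object, not the value returned by requests.get(url). Given the .extract_first() call, it was most likely a Scrapy response (Scrapy's HtmlResponse provides an xpath() shortcut); since you already import Selector, Selector(text=response.text).xpath('/html/head/title/text()').extract_first() should also work.
You can, however, parse the page source yourself with lxml: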
import requests
from lxml import etree  # lxml's HTML parser

response = requests.get('https://www.datacamp.com/courses/all')  # get the Response object
tree = etree.HTML(response.text)  # parse the page source held by the Response object
result = tree.xpath('/html/head/title/text()')  # list of matching text nodes
print(response.url)  # URL
print(result)  # findings, as a list
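If you only need the page title and would rather not install lxml, a minimal sketch with the standard library's html.parser could look like this. It works on an HTML string (for example response.text from requests), does not evaluate XPath, and the names TitleParser and extract_title are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.found = False  # becomes True once a complete <title>...</title> is seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.found:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.found = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html_text):
    """Return the page title, or None if there is no complete <title> element."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts) if parser.found else None
```

Usage would be along the lines of extract_title(requests.get(url).text).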
I am evaluating Saxon XSLT processing from Python 3.8 with Saxon-C 1.2.0 on Windows 10.
I can successfully run the script SaxonHEC.1.2.0\Saxon.C.API\python-saxon\saxon_example.py.
The last print line shows the result of getting an attribute value. I think there is an error in that code.
My question: how to get the value of an XML attribute?
with saxonc.PySaxonProcessor(license=False) as proc:
    # ... code left out from saxon_example.py
    xml2 = """\
<out>
<person att1='value1' att2='value2'>text1</person>
<person>text2</person>
<person>text3</person>
</out>
"""
    node2 = proc.parse_xml(xml_text=xml2)
    outNode = node2.children
    children = outNode[0].children
    attrs = children[1].attributes
    if len(attrs) == 2:
        print('node.children[1].attributes[1].string_value =', attrs[1].string_value)
        print('node.children[1].attributes[1] =', attrs[1])
        print('node.children[1].attributes[1].__str__ =', attrs[1].__str__())
        print('node.children[1].attributes[1].__repr__ =', attrs[1].__repr__())
        print('node.children[1].attributes[1].text =', attrs[1].text)
On the command line I get:
node.children[1].attributes[1].string_value = att2="value2"
node.children[1].attributes[1] = att2="value2"
node.children[1].attributes[1].__str__ = att2="value2"
node.children[1].attributes[1].__repr__ = att2="value2"
Traceback (most recent call last):
File "test-app.py", line 77, in <module>
print('node.children[1].attributes[1].text =', attrs[1].text)
AttributeError: 'saxonc.PyXdmNode' object has no attribute 'text'
whereas I expected to see only "value2", without the attribute name.
As mentioned in the comments, this is a bug (see the linked bug issue, where the fix is described): the string_value property should call the underlying C++ method getStringValue. A fix is available in the next maintenance release.
Another workaround is to use the get_attribute_value function if you know the names of the attributes.
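For comparison only, and not as part of the Saxon-C API, the standard library's xml.etree.ElementTree returns just the value when you ask an element for an attribute by name. The sketch below reuses the xml2 document from the question; attribute_value is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

xml2 = """\
<out>
<person att1='value1' att2='value2'>text1</person>
<person>text2</person>
<person>text3</person>
</out>
"""


def attribute_value(xml_text, name):
    """Return only the value of attribute `name` on the first <person>, or None."""
    root = ET.fromstring(xml_text)       # the <out> element
    first_person = root.find("person")   # first <person> child element
    return first_person.get(name)        # value without the name, None if absent


if __name__ == "__main__":
    print(attribute_value(xml2, "att2"))
```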
for printJobString in logfile:
    userRegex = re.search('(\suser:\s)(.+?)(\sprinter:\s)', printJobString)
    if userRegex:
        userString = userRegex.group(2)
        pagesInt = int(re.search('(\spages:\s)(.+?)(\scode:\s)', printJobString).group(2))
Above is my code. When I run this program, I end up getting:
Traceback (most recent call last):
File "C:\Users\brandon\Desktop\project3\project3\pages.py", line 45, in <module>
log2hist("log") # version 2.
File "C:\Users\brandon\Desktop\project3\project3\pages.py", line 29, in log2hist
pagesInt = int(re.search('(\spages:\s)(.+?)(\scode:\s)', printJobString).group(2))
AttributeError: 'NoneType' object has no attribute 'group'
I know this error means the search is returning None, but I'm not sure how to handle this case. Any help would be appreciated; I'm very new to Python and still learning the basics.
I am writing a program that should print out the number of pages each user has printed.
180.186.109.129 code: k n h user: luis printer: core 2 pages: 32
is a sample target string. My Python file is trying to create a data file that has one line for each user, containing the total number of pages that user printed.
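This happens because your regular expression does not find a match, so re.search returns None:
re.search('(\spages:\s)(.+?)(\scode:\s)', printJobString) returns None
In the sample line you posted, pages: is the last field and code: appears earlier in the line, so there is never a " code: " after the page count and this pattern cannot match.
Use an if statement to check that the match is not None before calling group():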
for printJobString in logfile:
    userRegex = re.search('(\suser:\s)(.+?)(\sprinter:\s)', printJobString)
    if userRegex:
        userString = userRegex.group(2)
        pagesMatch = re.search('(\spages:\s)(.+?)(\scode:\s)', printJobString)
        if pagesMatch:  # only call group() on a successful match
            pagesInt = int(pagesMatch.group(2))
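Since the goal is one line per user with that user's total page count, here is a minimal sketch of the whole loop. It assumes every log line looks like the sample line above, with pages: as the last field, so the page pattern only captures the digits after pages: instead of requiring a following code:. The names total_pages and format_report are hypothetical.

```python
import re
from collections import defaultdict

# Patterns assume the sample format: "... user: NAME printer: ... pages: N"
USER_RE = re.compile(r"\suser:\s(.+?)\sprinter:\s")
PAGES_RE = re.compile(r"\spages:\s(\d+)")


def total_pages(lines):
    """Sum the pages printed per user, skipping lines that don't match."""
    totals = defaultdict(int)
    for line in lines:
        user_match = USER_RE.search(line)
        pages_match = PAGES_RE.search(line)
        if user_match and pages_match:  # both searches must succeed before group()
            totals[user_match.group(1)] += int(pages_match.group(1))
    return dict(totals)


def format_report(totals):
    """One 'user pages' line per user, sorted by user name."""
    return [f"{user} {pages}" for user, pages in sorted(totals.items())]


if __name__ == "__main__":
    sample = ["180.186.109.129 code: k n h user: luis printer: core 2 pages: 32"]
    for row in format_report(total_pages(sample)):
        print(row)
```

An open log file can be passed straight to total_pages, since iterating over it yields one line at a time; each row from format_report can then be written to the data file followed by a newline.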
I installed everything as described on the FlickrAPI homepage, but when I try to run:
import flickrapi
api_key = '1a4c975fa83048436a2086bcab7d2290'
api_password = '5e069eae20e60297'
flickrclient = flickrapi.FlickAPI(api_key, api_password)
favourites = flickrClient.favorites_getPublicList(user_id='userid')
photos = flickr.photos_search(user_id='73509078#N00', per_page='10')
sets = flickr.photosets_getList(user_id='73509078#N00')
for photo in favourites.photos[0].photo:
    print photo['title']
I get this message from the command prompt:
C:\Users\Desktop>python api.py
Traceback (most recent call last):
File "api.py", line 4, in <module>
flickrclient = flickrapi.FlickAPI(api_key, api_password)
AttributeError: 'module' object has no attribute 'FlickAPI'
Any ideas? I have tried almost everything.
FlickAPI is not the same as FlickrAPI. You're missing an r.
The file C:\Users\XXXXXX\Desktop\FLICKR API\flickrapi.py is not part of the flickrapi package. Please rename it: it is masking the real library, so right now it is being imported instead of the installed package.
The flickrapi package itself consists of a directory with a __init__.py file inside of it. Printing flickrapi.__file__ should result in a path ending in flickrapi\__init__.py.
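A quick way to check which file an import will actually load, and whether it resolves to a package or a single module file, is importlib.util.find_spec from the standard library (Python 3). This is a minimal sketch; the helper names module_origin and is_package are hypothetical. Once the stray flickrapi.py is renamed, you would expect the origin for flickrapi to end in flickrapi\__init__.py.

```python
import importlib.util


def module_origin(name):
    """Return the file an import of `name` would load, or None if it can't be found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def is_package(name):
    """True if `name` resolves to a package rather than a single .py module."""
    spec = importlib.util.find_spec(name)
    return spec is not None and spec.submodule_search_locations is not None


if __name__ == "__main__":
    # A stray flickrapi.py next to your script would show up here instead of
    # ...\flickrapi\__init__.py, and is_package would return False.
    print(module_origin("flickrapi"), is_package("flickrapi"))
```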
In your "flickrclient = flickrapi.FlickAPI" line, you're missing an 'r' in FlickAPI.
Also, on the next line, your user_id='userid' argument needs an actual user ID, such as '999999#N99'.
Hopefully you found that & got this working a few months ago! :)
The code I'm trying to get working is:
h = str(heading)
# '<h1>Heading</h1>'
heading.renderContents()
I get this error:
Traceback (most recent call last):
File "<pyshell#6>", line 1, in <module>
print h.renderContents()
AttributeError: 'str' object has no attribute 'renderContents'
Any ideas?
I have a string with HTML tags and I need to clean it. If there is a different way of doing that, please suggest it.
Your error message and your code sample don't line up. You say you're calling:
heading.renderContents()
But your error message says you're calling:
print h.renderContents()
This suggests that you have a bug in your code: you are calling renderContents() on a string object (h = str(heading) is a plain str), which doesn't define that method.
In any case, it would help if you checked what type of object heading is to make sure it's really a BeautifulSoup instance. This works for me with BeautifulSoup 3.2.0:
from BeautifulSoup import BeautifulSoup
heading = BeautifulSoup('<h1>heading</h1>')
repr(heading)
# '<h1>heading</h1>'
print heading.renderContents()
# <h1>heading</h1>
print str(heading)
# <h1>heading</h1>
h = str(heading)
print h
# <h1>heading</h1>
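On the follow-up question of cleaning a string that contains HTML tags: if you only need the text, a different approach is the standard library's html.parser, with no BeautifulSoup at all. This is a minimal sketch in Python 3 (the answer above uses Python 2 and BeautifulSoup 3); strip_tags is a hypothetical helper, and it keeps the text inside script and style elements, so filter those out if your input may contain them.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Accumulates text content and ignores all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup):
    """Return `markup` with all HTML tags removed (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```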