lxml.etree.iterparse closes input file handler?


Filterous uses iterparse to parse a simple XML StringIO object in a unit test. However, when the test later accesses the StringIO object, Python exits with a "ValueError: I/O operation on closed file" message. According to the iterparse documentation, "Starting with lxml 2.3, the .close() method will also be called in the error case," but I get no error message or exception from iterparse. My IO-fu is obviously not up to speed, so does anyone have suggestions?
The command and (hopefully) relevant code:
$ python2.6 setup.py test
setup.py:
from setuptools import setup
from filterous import filterous as package

setup(
    ...
    test_suite = 'tests.tests',
)
tests/tests.py:
from cStringIO import StringIO
import unittest

from filterous import filterous

XML = '''<posts tag="" total="3" ...'''

class TestSearch(unittest.TestCase):
    def setUp(self):
        self.xml = StringIO(XML)
        self.result = StringIO()
    ...
    def test_empty_tag_not(self):
        """Empty tag; should get N results."""
        filterous.search(
            self.xml,
            self.result,
            {'ntag': [u'']},
            ['href'],
            False)
        self.assertEqual(
            len(self.result.getvalue().splitlines()),
            self.xml.getvalue().count('<post '))
filterous/filterous.py:
from lxml import etree
...
def search(file_pointer, out, terms, includes, human_readable = True):
    ...
    context = etree.iterparse(file_pointer, tag='posts')
Traceback:
ERROR: Empty tag; should get N results.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/victor/dev/filterous/tests/tests.py", line 149, in test_empty_tag_not
    self.xml.getvalue().count('<post '))
ValueError: I/O operation on closed file
PS: The tests all ran fine on 2010-07-27.

It seems to work fine with StringIO; try using that instead of cStringIO. I have no idea why the file is getting closed.

Docs-fu is the problem. What you quoted, "Starting with lxml 2.3, the .close() method will also be called in the error case," has nothing to do with iterparse. It appears on your linked page before the section on iterparse, as part of the docs for the target parser interface. It refers to the close() method of the target (output!) object, not to your StringIO. In any case, you also seem to have ignored the little word "also": before 2.3, lxml closed the target object only if the parse was successful; now it also closes it upon error.
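To see which close() the quoted sentence is about, here is a small parser target written against the standard library's xml.etree.ElementTree.XMLParser, which accepts the same kind of target object (start/end/data/close methods) that lxml documents for its target parser interface. The PostCounter class and the sample XML are illustrative, not taken from filterous. The parser calls target.close() when parsing finishes; the input itself is never "closed" by this mechanism.

```python
import xml.etree.ElementTree as ET


class PostCounter:
    """A parser target (illustrative): the parser calls these methods while it parses."""

    def __init__(self):
        self.count = 0
        self.closed = False

    def start(self, tag, attrib):
        if tag == "post":
            self.count += 1

    def end(self, tag):
        pass

    def data(self, text):
        pass

    def close(self):
        # This is the close() the docs sentence refers to: it belongs to
        # the target (output) object, not to the input file.
        self.closed = True
        return self.count


def count_with_target(xml_text):
    target = PostCounter()
    parser = ET.XMLParser(target=target)
    parser.feed(xml_text)
    result = parser.close()  # finishes parsing and returns target.close()
    return result, target.closed
```

For example, count_with_target('<posts><post/><post/></posts>') gives (2, True): the count comes from the target's close(), and nothing happens to any input stream.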
Why do you want to "access" the StringIO object after parsing has finished?
Update: By "trying to access the database afterwards", do you mean all those self.xml.getvalue() calls in your tests? [Show the ferschlugginer traceback in your question so we don't need to guess!] If that's causing the problem (it does count as an I/O operation), forget getvalue(): even if it worked, wouldn't it just return the (unconventionally named) invariant XML?
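A minimal sketch of that suggestion: derive the expected count from the XML constant rather than from the StringIO, so the assertion never touches a stream that may already be closed. It uses the standard library's xml.etree.ElementTree.iterparse as a stand-in for lxml, a hypothetical count_posts in place of filterous.search, and made-up sample XML, since the question's XML is elided.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample data: the question's XML constant is elided.
XML = b'<posts tag="" total="3"><post href="a"/><post href="b"/><post href="c"/></posts>'


def count_posts(stream):
    """Hypothetical stand-in for filterous.search: count <post> elements."""
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "post":
            count += 1
    return count


def run_example():
    stream = io.BytesIO(XML)
    found = count_posts(stream)
    stream.close()  # pretend the parser closed its input, as in the question
    # Take the expected value from the XML constant; stream.getvalue()
    # here would raise ValueError on the closed stream.
    expected = XML.count(b"<post ")
    assert found == expected
    return found
```

Note that b"<post " (with the trailing space) does not match the outer <posts tag, so only the three post elements are counted.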

Related

Passing a list as a url value to urlopen

Motivation
Motivated by this problem: the OP was using urlopen() and accidentally passed the sys.argv list instead of a string as the URL. This error message was thrown:
AttributeError: 'list' object has no attribute 'timeout'
Because of the way urlopen is written, the error message and the traceback are not very informative and may be difficult to understand, especially for a Python newcomer:
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    get_category_links(sys.argv)
  File "test.py", line 10, in get_category_links
    response = urlopen(url)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 420, in open
    req.timeout = timeout
AttributeError: 'list' object has no attribute 'timeout'
Problem
Here is the shortened code I'm working with:
try:
    from urllib.request import urlopen
except ImportError:
    from urllib2 import urlopen
import sys

def get_category_links(url):
    response = urlopen(url)
    # do smth with response
    print(response)

get_category_links(sys.argv)
I'm trying to think whether this kind of error can be caught statically with either smart IDEs like PyCharm, static code analysis tools like flake8 or pylint, or with language features like type annotations.
But I'm failing to detect the problem:
It is probably too specific for flake8 and pylint to catch; they don't warn about the problem.
PyCharm does not warn about sys.argv being passed into urlopen, even though, if you "jump to source" of sys.argv, it is defined as:
argv = [] # real value of type <class 'list'> skipped
If I annotate the function parameter as a string and pass sys.argv, there are no warnings either:
def get_category_links(url: str) -> None:
    response = urlopen(url)
    # do smth with response

get_category_links(sys.argv)
Question
Is it possible to catch this problem statically (without actually executing the code)?

Instead of keeping it editor-specific, you can use mypy to analyze your code. That way the check runs in every development environment, not just for those who use PyCharm.
from urllib.request import urlopen
import sys

def get_category_links(url: str) -> None:
    response = urlopen(url)
    # do smth with response

get_category_links(sys.argv)
response = urlopen(sys.argv)
The issues pointed out by mypy for the above code:
error: Argument 1 to "get_category_links" has incompatible type List[str]; expected "str"
error: Argument 1 to "urlopen" has incompatible type List[str]; expected "Union[str, Request]"
Mypy can infer the type of sys.argv here because of its definition in its stub file. Right now some standard library modules are still missing from typeshed, though, so you will have to either contribute them or ignore the related errors until they are added :-).
When to run mypy?
To catch such errors, you can run mypy on the annotated files alongside your tests in your CI tool. Running it on all files in the project may take some time; for a small project it is your choice.
Add a pre-commit hook that runs mypy on staged files and points out issues right away (this could be a little annoying for developers if it takes a while).

First, check whether url is a string; if it is, catch the ValueError that urlopen raises for an invalid URL:
import sys
from urllib2 import urlopen

def get_category_links(url):
    if not isinstance(url, str):  # Check whether url is a string
        print("Please give string url")
        return
    try:
        response = urlopen(url)
        # do smth with response
        print(response)
    except ValueError:  # url is a string, but not a valid URL
        print("Bad URL")

get_category_links(sys.argv)
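On Python 3 the same runtime guard can be written with isinstance and urllib.request.urlopen. This sketch raises exceptions instead of printing, so the caller (or a test) can tell what went wrong; the error messages are illustrative. It is still a runtime check, not the static check the question asks for.

```python
from urllib.request import urlopen


def get_category_links(url):
    # Reject non-strings up front, so the error names the real problem
    # instead of surfacing later as "'list' object has no attribute 'timeout'".
    if not isinstance(url, str):
        raise TypeError("url must be a str, not %s" % type(url).__name__)
    try:
        response = urlopen(url)
    except ValueError as exc:  # a string, but not a usable URL (e.g. no scheme)
        raise ValueError("bad URL: %r" % url) from exc
    # do smth with response
    return response
```

The call site should then pass a single argument, for example get_category_links(sys.argv[1]), rather than the whole sys.argv list.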

Read-attribute-missing error while uploading file in Python

I am trying to allow a user to upload files to a server from a form, and then display images from my website. The script is Python, which also interfaces to MySQL via cursor.execute commands. I can upload text form fields, but not the file contents, similar to the problem reported at:
uploading html files and using python
I can upload the selected file name, but not read it; I get a read error.
My code is:
#!/home2/snowbear/python/Python-2.7.2/python
import cgi
# Import smtplib for the actual sending function.
import smtplib
import shutil
import datetime
import os
import sys, traceback, re
# Helps troubleshoot python script.
import cgitb; cgitb.enable()
# Import mysql database program.
import mysql.connector
# Windows needs stdio set for binary mode.
try:
    import msvcrt
    msvcrt.setmode(0, os.O_BINARY)  # stdin = 0
    msvcrt.setmode(1, os.O_BINARY)  # stdout = 1
except ImportError:
    message = "No Windows msvcrt to import"
    pass
print '<form name="PB_Form" action="PB_resubmit.py" method="post" enctype="multipart/form-data">'
...
# Get form values.
...
if form.has_key("filePix1") and form["filePix1"].value != "":
    txtImage1 = form['filePix1'].value
    fileItem1 = form['filePix1']
    if not fileItem1.file:
        print "<br><center>No fileItem1: %s</center>" % fileItem1
    else:
        data = fileItem1.file.read()
        objFile = open(txtImage1, "w+")
        objFile.write(data)
        objFile.close()
else:
    newImage1 = False
...
I get an error for the line:
data = fileItem1.file.read()
The error is:
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'read'
args = ("'NoneType' object has no attribute 'read'",)
message = "'NoneType' object has no attribute 'read'"
Although fileItem1 is a proper handle to the file entered into the form (I can get the file name), it does not have a "read" attribute, as the error message says.
I'm using Bluehost for my server. Could the file-read attribute be turned off by the server, or am I missing something, such as another special import for handling files?
Thanks for any suggestions, Walter
&&&&&&&&&&&&&&&&
New note:
The problem was that form['filePix1'] had no "file" or "filename" attributes; only the "value" attribute existed, and it produced only the file name, not the file contents.
After much experimenting, I discovered that the browser, SeaMonkey, was causing the missing attributes. When I used Firefox, the "file", "filename", and "value" attributes were all normal. I have no idea why SeaMonkey doesn't support the file-upload attributes.
Walter
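Given that the fix depended on which attributes the browser's submission produced, a save routine can check for them explicitly instead of assuming they exist. A minimal sketch, assuming the field object looks like a cgi.FieldStorage item (a filename attribute plus a readable binary file attribute); save_upload and the demo data are hypothetical, and it is written for Python 3 rather than the question's 2.7.

```python
import io
import os
import shutil
import tempfile
from types import SimpleNamespace


def save_upload(field, dest_dir):
    """Save an uploaded file field into dest_dir; return the path, or None.

    Assumes `field` looks like a cgi.FieldStorage item: a `filename`
    attribute plus a readable binary `file` attribute.
    """
    filename = getattr(field, "filename", None)
    fileobj = getattr(field, "file", None)
    if not filename or fileobj is None:
        # Only a plain value arrived (no file contents), as in the SeaMonkey case.
        return None
    # Keep only the base name so a client-supplied path can't escape dest_dir.
    safe_name = os.path.basename(filename.replace("\\", "/"))
    path = os.path.join(dest_dir, safe_name)
    with open(path, "wb") as out:  # binary mode: image data is not text
        shutil.copyfileobj(fileobj, out)
    return path


# Usage with a stand-in field object (hypothetical data, no web server needed):
demo_dir = tempfile.mkdtemp()
demo_field = SimpleNamespace(filename="C:\\pics\\cat.png", file=io.BytesIO(b"\x89PNG"))
saved = save_upload(demo_field, demo_dir)
```

Writing in binary mode matters for image uploads, especially on Windows, where text mode would translate newline bytes.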

AttributeError: 'module' object has no attribute

I've been scouring the internet for a solution, and nothing I've come across has helped, so now I turn to you.
Traceback (most recent call last):
  File "cardreader.py", line 9, in <module>
    import ATRdb as ATR
  File "/home/pi/Desktop/CardReader/ATRdb.py", line 4, in <module>
    import cardreader
  File "/home/pi/Desktop/CardReader/cardreader.py", line 113, in <module>
    main()
  File "/home/pi/Desktop/CardReader/cardreader.py", line 40, in main
    getData(db)
  File "/home/pi/Desktop/CardReader/cardreader.py", line 98, in getData
    if ATR.checkPerms(db,track1):
AttributeError: 'module' object has no attribute 'checkPerms'
I have two files cardreader.py & ATRdb.py
---ATRdb.py has this setup:
import sys
import MYSQLdb
import datetime
import cardreader

def checkPerms(db, securitycode):
    try:
        cursor = db.cursor()
        cursor.execute("""SELECT permissions FROM atrsecurity.employee WHERE securitycode = %s""", (securitycode))
        r = cursor.fetchone()
        Permissions = r
        if '3' in Permissions[0]:
            return True
        else:
            return False
    except Exception:
        cardreader.main()
        return False
---cardreader.py has this setup:
import sys
import usb.core
import usb.util
import MYSQLdb
import ATRdb as ATR

def main():
    db = MYSQLdb.connect(HOST,USER, PASS, DB)
    print("Please swipe your card...")
    getData(db)
    main()
    db.close()

def getData(db):
    #
    #lots of code to get card data
    #
    if ATR.checkPerms(db, track1):
        print ("User has permission")
        unlockDoor()
I get the error at the "if ATR.checkPerms(...):" part. Any help would be appreciated
(this is my first Python project).

Your problem is circular imports.
In cardreader, you do this:
import ATRdb as ATR
That starts importing ATRdb, but a few lines into the code, it hits this:
import cardreader
The exact sequence from here depends on whether cardreader.py is your main script or not, and on whether your top-level code that calls main is protected by an if __name__ == '__main__' guard (and assuming that top-level code is in cardreader rather than elsewhere). Rather than try to explain all the possibilities in detail (or wait for you to tell us which one matches your actual code), let's look at what we know is true based on the behavior:
In some way, you're calling main before finishing the import of ATRdb.
This means that, at this point, ATRdb has nothing in it but sys, MYSQLdb, and datetime (and a handful of special attributes that every module gets automatically). In particular, it hasn't gotten to the definition of checkPerms yet, so no such attribute exists in the module yet.
Of course eventually it's going to finish importing the rest of ATRdb, but at that point it's too late; you've already called main and it tried to call ATR.checkPerms and that failed.
While there are various complicated ways to make circular imports work (see the official FAQ for some), the easiest and cleanest solution is to just not do it. If ATRdb needs some functions that are in cardreader, you should probably factor those out into a third module, like cardutils, that both ATRdb and cardreader can import.
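The failure and the fix can both be reproduced in one script by writing small throwaway modules to a temporary directory. The module names (circ_db, circ_reader, fix_utils, fix_db, fix_reader) are hypothetical stand-ins for ATRdb, cardreader, and the suggested third module; importing circ_db first mimics running cardreader.py as a script, which imports ATRdb, which in turn imports a fresh copy of cardreader.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def install_modules(files):
    """Write {module_name: source} into a fresh temp dir and put it on sys.path."""
    root = Path(tempfile.mkdtemp())
    for name, source in files.items():
        (root / f"{name}.py").write_text(textwrap.dedent(source))
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()


# 1) Circular layout, mirroring ATRdb <-> cardreader.
install_modules({
    "circ_db": """
        import circ_reader  # circular: the db module imports the reader back

        def check_perms(code):
            return code == "3"
    """,
    "circ_reader": """
        import circ_db as ATR

        def main():
            return ATR.check_perms("3")

        main()  # unguarded call at import time
    """,
})

try:
    importlib.import_module("circ_db")
    circular_error = None
except AttributeError as exc:
    # circ_db is only partially initialized when main() runs: no check_perms yet.
    circular_error = str(exc)

# 2) Refactored layout: shared helpers live in a third module, nothing
#    imports the reader back, and main() only runs under a __main__ guard.
install_modules({
    "fix_utils": """
        MESSAGES = []

        def log(msg):
            MESSAGES.append(msg)
    """,
    "fix_db": """
        import fix_utils

        def check_perms(code):
            allowed = code == "3"
            fix_utils.log("check_perms(%r) -> %r" % (code, allowed))
            return allowed
    """,
    "fix_reader": """
        import fix_db as ATR
        import fix_utils

        def main(code):
            if ATR.check_perms(code):
                fix_utils.log("User has permission")
                return True
            return False

        if __name__ == "__main__":  # only runs when executed as a script
            main("3")
    """,
})

fix_reader = importlib.import_module("fix_reader")
fixed_result = fix_reader.main("3")
log_messages = importlib.import_module("fix_utils").MESSAGES
```

In the first layout, circular_error holds the AttributeError message about check_perms; in the second, the import succeeds and main() runs only when called.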

Type error writing to file in Python

I am writing a Python script to notify me when changes are made to a webpage and store the current state of the page to a file in order to resume seamlessly after rebooting.
The code is as follows:
import urllib
url="http://example.com"
filepath="/path/to/file.txt"
try:
    html=open(filepath,"r").read() # Restores imported code from previous session
except:
    html="" # Blanks variable on first run of the script
while True:
    imported=urllib.urlopen(url)
    if imported!=html:
        # Alert me
        html=imported
        open(filepath,"w").write(html)
    # Time delay before next iteration
Running the script returns:
Traceback (most recent call last):
  File "April_Fools.py", line 20, in <module>
    open(filepath,"w").write(html)
TypeError: expected a character buffer object
------------------
(program exited with code: 1)
Press return to continue
I've no idea what this means. I'm relatively new to Python. Any help would be much appreciated.

urllib.urlopen does not return a string; it returns a response as a file-like object. You need to read that response:
html = imported.read()
Only then is html a string you can write to a file.

As an aside, using open(filename).read() is not considered good style, because you never close the file. The same goes for writing. Try using a context manager instead:
try:
    with open(filepath,"r") as htmlfile:
        html = htmlfile.read()
except:
    html=""
The with block will automatically close the file when you leave the block.
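Putting both answers together, the watcher loop might look like this on Python 3: read the response before comparing, and use with blocks for every file. This is a sketch under assumptions: the page is taken to be UTF-8, the 60-second delay is a placeholder for the question's unspecified delay, and the alert is left empty as in the question. check_once takes the fetch function as a parameter so it can be exercised without network access.

```python
import time
from urllib.request import urlopen

URL = "http://example.com"      # placeholder from the question
FILEPATH = "/path/to/file.txt"  # placeholder from the question


def load_previous(path):
    """Return the page saved by the previous session, or "" on the first run."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return ""


def save(path, text):
    with open(path, "w", encoding="utf-8") as f:  # closed when the block ends
        f.write(text)


def fetch(url):
    # urlopen returns a file-like response; read() gives bytes.
    # Assumption: the page is UTF-8 encoded.
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def check_once(path, fetch_page, previous):
    """Fetch once; if the page changed, save it. Returns (current, changed)."""
    current = fetch_page()
    changed = current != previous
    if changed:
        # Alert me (left open, as in the question)
        save(path, current)
    return current, changed


def watch(url=URL, path=FILEPATH, delay=60):
    html = load_previous(path)
    while True:
        html, _changed = check_once(path, lambda: fetch(url), html)
        time.sleep(delay)  # assumed delay between checks
```

Calling watch() starts the endless loop; the comparison now happens between two strings rather than between a string and a response object.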

AttributeError: xmlNode instance has no attribute 'isCountNode'

I'm using libxml2 in a Python app I'm writing, and am trying to run some test code to parse an XML file. The program downloads an XML file from the internet and parses it. However, I have run into a problem.
With the following code:
xmldoc = libxml2.parseDoc(gfile_content)
droot = xmldoc.children  # Get document root
dchild = droot.children  # Get child nodes
while dchild is not None:
    if dchild.type == "element":
        print "\tAn element with ", dchild.isCountNode(), "child(ren)"
        print "\tAnd content", repr(dchild.content)
    dchild = dchild.next
xmldoc.freeDoc()
...which is based on the code example in this article on XML.com, I receive the following error when I run this code on Python 2.4.3 (CentOS 5.2 package).
Traceback (most recent call last):
  File "./xml.py", line 25, in ?
    print "\tAn element with ", dchild.isCountNode(), "child(ren)"
AttributeError: xmlNode instance has no attribute 'isCountNode'
I'm rather stuck here.
Edit: I should note here I also tried IsCountNode() and it still threw an error.

isCountNode should read lsCountNode (with a lower-case "L").
