Why is ElementTree.iterparse() raising a ParseError?


import xml.etree.ElementTree as ET
xmldata = open('my_xml_file.xml')
tree = ET.parse(xmldata)
root = tree.getroot()
root_iter = root.iter()
Now I can call root_iter.next() and get my Element objects. The problem is that the real file I am working with is huge and I can't fit all of it in memory. So I am trying to use:
parse_iter = ET.iterparse(xmldata)
If I call parse_iter.next(), it raises the following:
Traceback (most recent call last):
File "<pyshell#38>", line 1, in <module>
parse_iter.next()
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1260, in next
self._root = self._parser.close()
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1636, in close
self._raiseerror(v)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1488, in _raiseerror
raise err
ParseError: no element found: line 1, column 0
What am I doing wrong?

The code I had was perfectly fine, except I was calling ElementTree.iterparse() on a file object I had already read with ElementTree.parse(). D'oh!
So for those who happen to make the same mistake, the solution is to either open a new file object or use file.seek(0) to reset the file cursor.
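A minimal sketch of the seek(0) fix, using an in-memory io.BytesIO stream as a stand-in for the real file (the sample XML is made up). It iterates with a for loop, which works in Python 3 where the Python 2 .next() method no longer exists, and clears each element after use so memory stays bounded on a large file.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for open('my_xml_file.xml', 'rb'); the sample XML is hypothetical.
xmldata = io.BytesIO(b"<root><item>1</item><item>2</item></root>")

tree = ET.parse(xmldata)  # parse() reads the stream all the way to the end
root = tree.getroot()

# Without this rewind, iterparse() would see an empty stream and raise
# "ParseError: no element found: line 1, column 0".
xmldata.seek(0)

tags = []
for event, elem in ET.iterparse(xmldata):  # default events are ("end",)
    tags.append(elem.tag)
    elem.clear()  # drop the element's children and text once it has been handled

print(tags)  # ['item', 'item', 'root']
```

Opening a fresh file object instead of calling seek(0) works just as well; seek(0) simply avoids reopening the file.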

Related

I'm trying to pass a variable to another script as an argument but it isn't working

When I replace the contentfile and stylefile variables with the literal file paths, it works fine. So I know that the content file is there and that the script can find it.
I must be passing a variable incorrectly to the other Python script. I've been trying, but I can't google myself out of this one at the moment.
import os
listStyles = ['/content/neural-style-tf/styles/1.png']
listContent = ['/content/neural-style-tf/image_input/00078.png']
i = 0
for imageName in listStyles:
    stylefile = imageName
    contentfile = listContent[i]
    i = i + 1
    print(stylefile)
    print(contentfile)
    print('')
    !python neural_style.py --content_img contentfile --style_imgs stylefile
Output:
/content/neural-style-tf/styles/1.png
/content/neural-style-tf/image_input/00078.png
Traceback (most recent call last):
File "neural_style.py", line 889, in <module>
main()
File "neural_style.py", line 886, in main
else: render_single_image()
File "neural_style.py", line 849, in render_single_image
content_img = get_content_image(args.content_img)
File "neural_style.py", line 715, in get_content_image
check_image(img, path)
File "neural_style.py", line 552, in check_image
raise OSError(errno.ENOENT, "No such file", path)
FileNotFoundError: [Errno 2] No such file: './image_input/contentfile'

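I'm just dumb; I need to learn a language properly beforehand instead of brute-forcing it when I need it.
If anyone else comes across this: in a notebook shell command such as !python ..., you need to put a $ in front of the variable name ($contentfile, $stylefile) so that IPython substitutes the variable's value instead of passing the literal text (e.g. contentfile) as a string.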
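Inside a notebook, the fixed line reads !python neural_style.py --content_img $contentfile --style_imgs $stylefile (IPython also expands {contentfile}). If you'd rather not depend on IPython's ! syntax at all, a plain-Python alternative (a swapped-in technique, not what the answer used) is to build the argument list yourself and hand it to subprocess.run. A minimal sketch; build_command and run_style_transfer are hypothetical helper names, while the script name and flags come from the question.

```python
import subprocess
import sys


def build_command(content_path, style_path, script="neural_style.py"):
    """Build the argv list; each variable's *value* becomes its own argument."""
    return [sys.executable, script,
            "--content_img", content_path,
            "--style_imgs", style_path]


def run_style_transfer(content_path, style_path):
    """Run the script with the real paths; raises CalledProcessError on failure."""
    subprocess.run(build_command(content_path, style_path), check=True)


listStyles = ['/content/neural-style-tf/styles/1.png']
listContent = ['/content/neural-style-tf/image_input/00078.png']

# zip() pairs each style with its content image, replacing the manual counter i.
for stylefile, contentfile in zip(listStyles, listContent):
    print(build_command(contentfile, stylefile))
```

Calling run_style_transfer(contentfile, stylefile) inside the loop actually launches the script. Because the arguments are passed as a list, no shell is involved, so paths containing spaces need no extra quoting.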

spaCy: errors attempting to load serialized Doc

I am trying to serialize/deserialize spaCy documents (setup is Windows 7, Anaconda) and am getting errors. I haven't been able to find any explanations. Here is a snippet of code and the error it generates:
import spacy
nlp = spacy.load('en')
text = 'This is a test.'
doc = nlp(text)
fout = 'test.spacy' # <-- according to the API for Doc.to_disk(), this needs to be a directory (but for me, spaCy writes a file)
doc.to_disk(fout)
doc.from_disk(fout)
Traceback (most recent call last):
File "<ipython-input-7-aa22bf1b9689>", line 1, in <module>
doc.from_disk(fout)
File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk
File "doc.pyx", line 806, in spacy.tokens.doc.Doc.from_bytes
ValueError: [E033] Cannot load into non-empty Doc of length 5.
I have also tried creating a new Doc object and loading from that, as shown in the example ("Example: Saving and loading a document") in the spaCy docs, which results in a different error:
from spacy.tokens import Doc
from spacy.vocab import Vocab
new_doc = Doc(Vocab()).from_disk(fout)
Traceback (most recent call last):
File "<ipython-input-16-4d99a1199f43>", line 1, in <module>
Doc(Vocab()).from_disk(fout)
File "doc.pyx", line 763, in spacy.tokens.doc.Doc.from_disk
File "doc.pyx", line 838, in spacy.tokens.doc.Doc.from_bytes
File "stringsource", line 646, in View.MemoryView.memoryview_cwrapper
File "stringsource", line 347, in View.MemoryView.memoryview.__cinit__
ValueError: buffer source array is read-only
EDIT:
As pointed out in the replies, the path provided should be a directory. However, the first code snippet creates a file. Changing this to a non-existing directory path doesn't help as spaCy still creates a file. Attempting to write to an existing directory causes an error too:
fout = 'data'
doc.to_disk(fout)
Traceback (most recent call last):
File "<ipython-input-8-6c30638f4750>", line 1, in <module>
doc.to_disk(fout)
File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
opener=self._opener)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
return self._accessor.open(self, flags, mode)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: 'data'
Python has no problem writing at this location via standard file operations (open/read/write).
Trying with a Path object yields the same results:
from pathlib import Path
import os
fout = Path(os.path.join(os.getcwd(), 'data'))
doc.to_disk(fout)
Traceback (most recent call last):
File "<ipython-input-17-6c30638f4750>", line 1, in <module>
doc.to_disk(fout)
File "doc.pyx", line 749, in spacy.tokens.doc.Doc.to_disk
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1161, in open
opener=self._opener)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 1015, in _opener
return self._accessor.open(self, flags, mode)
File "C:\Users\Username\AppData\Local\Continuum\anaconda3\lib\pathlib.py", line 387, in wrapped
return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: 'C:\\Users\\Username\\workspace\\data'
Any ideas why this might be happening?

doc.to_disk(fout)
must be
a path to a directory, which will be created if it doesn't exist.
Paths may be either strings or Path-like objects.
as the documentation for spaCy states in https://spacy.io/api/doc
Try changing fout to a directory, it might do the trick.
EDIT:
Examples from the spaCy documentation:
for doc.to_disk:
doc.to_disk('/path/to/doc')
and for doc.from_disk:
from spacy.tokens import Doc
from spacy.vocab import Vocab
doc = Doc(Vocab()).from_disk('/path/to/doc')
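A minimal round-trip sketch, assuming a recent spaCy release and using spacy.blank('en') instead of spacy.load('en') so no model download is needed. It loads into a fresh Doc that shares nlp.vocab rather than calling from_disk on the original, already-populated doc, which is what triggers the E033 "Cannot load into non-empty Doc" error in the question. The file name and temporary directory are illustrative only.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")  # tokenizer-only pipeline; no trained model required
doc = nlp("This is a test.")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "test.spacy"  # hypothetical output location
    doc.to_disk(path)
    # Load into a new, empty Doc that shares the same vocab as the original.
    new_doc = Doc(nlp.vocab).from_disk(path)

print([token.text for token in new_doc])
```

Sharing nlp.vocab, rather than a brand-new Vocab(), keeps the string store consistent between the saved and the loaded document.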

Error trying to parse XML using Python: xml.etree.ElementTree.ParseError: syntax error: line 1,

In Python, I'm simply trying to parse XML:
import xml.etree.ElementTree as ET
data = 'info.xml'
tree = ET.fromstring(data)
but I got this error:
Traceback (most recent call last):
File "C:\mesh\try1.py", line 3, in <module>
tree = ET.fromstring(data)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1312, in XML
return parser.close()
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1665, in close
self._raiseerror(v)
File "C:\Python27\lib\xml\etree\ElementTree.py", line 1517, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: syntax error: line 1, column 0
Here is a bit of the XML I have:
<?xml version="1.0" encoding="utf-16"?>
<AnalysisData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <BlendOperations OperationNumber="1">
    <ComponentQuality>
      <MaterialName>Oil</MaterialName>
      <Weight>1067.843017578125</Weight>
      <WeightPercent>31.545017776585109</WeightPercent>
Why is it happening?

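You're trying to parse the string 'info.xml' instead of the contents of the file.
You could call tree = ET.parse('info.xml'), which will open the file; use tree.getroot() to get the root element.
Or you could read the file yourself and pass its contents:
ET.fromstring(open('info.xml', 'rb').read())
Reading in binary mode ('rb') lets the parser honour the file's encoding="utf-16" declaration; in Python 3, text mode would decode the file with the platform's default encoding instead.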
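A short sketch contrasting the two calls. It writes a made-up, trimmed-down version of the XML to a temporary info.xml (UTF-8 here rather than UTF-16, purely to keep the example simple) and shows that ET.parse takes a file name while ET.fromstring takes XML text or bytes.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

xml_text = '<AnalysisData><BlendOperations OperationNumber="1"/></AnalysisData>'

# fromstring() expects XML *text*; ET.fromstring('info.xml') would try to parse
# the characters "info.xml" themselves and fail with a syntax error at column 0.
root_from_text = ET.fromstring(xml_text)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "info.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(xml_text)

    # parse() expects a file name or file object and returns an ElementTree.
    root_from_file = ET.parse(path).getroot()

    # Passing bytes lets the parser use the file's own encoding declaration.
    with open(path, "rb") as f:
        root_from_bytes = ET.fromstring(f.read())

print(root_from_text.tag, root_from_file.tag, root_from_bytes.tag)
```

All three approaches end with an Element for the root. Only ET.parse wraps it in an ElementTree, which is why getroot() is needed there.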

Parser.pxi problems when pretty printing xml file

I'm getting some errors when I try to pretty-print an XML file. I've looked everywhere and tried installing the latest version of lxml, but I'm still getting this error.
My script is pretty simple; it looks like this:
import os
import lxml.etree as etree
from lxml.etree import parse
fname = r'C:\Test_folder\SlutR_20150218.xml'
x = etree.parse(fname)
print(etree.tostring(x, pretty_print=True))
And this is the error I'm getting:
Traceback (most recent call last):
File "C:\Users\a.curcic\Desktop\Övriga_python_script\Pretty_print_example.py", line 5, in <module>
x = etree.parse(fname)
File "lxml.etree.pyx", line 3301, in lxml.etree.parse (src\lxml\lxml.etree.c:72453)
File "parser.pxi", line 1791, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:105915)
File "parser.pxi", line 1817, in lxml.etree._parseDocumentFromURL (src\lxml\lxml.etree.c:106214)
XMLSyntaxError: Extra content at the end of the document, line 2, column 909
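One common cause of libxml2's "Extra content at the end of the document" is a file with more than one top-level element; that is an assumption here, since the actual file isn't shown. A minimal sketch reproducing it with lxml on a made-up byte string, plus one possible workaround: wrap the fragments in a synthetic root element before pretty-printing. The records wrapper name is hypothetical, and the workaround only makes sense if the content really is a sequence of fragments without its own XML declaration.

```python
from lxml import etree

# Hypothetical sample: two top-level elements, which is not well-formed XML.
broken = b"<record>1</record><record>2</record>"

try:
    etree.fromstring(broken)
except etree.XMLSyntaxError as exc:
    print("parse failed:", exc)

# Workaround sketch: give the parser a single document element to work with.
root = etree.fromstring(b"<records>" + broken + b"</records>")
print(etree.tostring(root, pretty_print=True).decode())
```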

Parsing XML exception

I'm new to Python and seriously need help! I have a number of errors I can't figure out. I'm using Python 2.7 on a Mac. Here is the list of errors:
Traceback (most recent call last):
File "minihiveosc.py", line 378, in <module>
swhive = SWMiniHiveOSC( options.host, options.hport, options.ip, options.port, options.minibees, options.serial, options.baudrate, options.config, [1,options.minibees], options.verbose, options.apimode )
File "minihiveosc.py", line 280, in __init__
self.hive.load_from_file( config )
File "/Users/Puffin/Documents/python/pydon/pydon/pydonhive.py", line 396, in load_from_file
hiveconf = cfgfile.read_file( filename )
File "/Users/Puffin/Documents/python/pydon/pydon/minibeexml.py", line 116, in read_file
tree = ET.parse( filename )
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1183, in parse
tree.parse(source, parser)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
parser.feed(data)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
self._raiseerror(v)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 164, column 8
Any chance someone can help me?
Thanks!

What you posted in your question is called a "Traceback", and it shows only one error:
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 164, column 8
All the lines before it show how Python got there: in the file minihiveosc.py, some code on line 378 was executed (shown in the traceback), which then led to line 280 of the same file, where something else was called, and so on.
Every time Python calls a function, the current state is pushed onto the stack to make room for the next context, and when an exception occurs, Python can show you this stack to help you diagnose your problem.
In this case, you are feeding the XML parser a document that has an error in it: when the parser reached line 164, column 8, it found something it didn't expect. You'll need to inspect that document around that position to see what the problem is.
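To find the spot programmatically, ElementTree's ParseError carries a position attribute holding a (line, column) tuple. A minimal sketch using a made-up config snippet rather than the real file; for the real config you could pass open(path).read() instead. report_parse_error is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def report_parse_error(xml_text):
    """Parse xml_text; on failure, print the offending line and mark the column."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        bad_line = xml_text.splitlines()[line - 1]
        print(err)
        print(bad_line)
        print(" " * column + "^")
        return None


# Hypothetical config snippet with an unescaped "<" inside element text.
sample = "<config>\n  <item>ok</item>\n  <item>a < b</item>\n</config>"
report_parse_error(sample)
```

An unescaped < or & in element text is one typical source of "not well-formed (invalid token)"; writing it as &lt; or &amp; fixes that particular case.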

It's just because your XML file is not well-formed at line 164, column 8. When the parser reaches that point, it raises this error. Have a look at your document to see what is there.

This is one error, shown with its stack trace.
Creating the SWMiniHiveOSC object caused an error while executing the load_from_file(config) method. The file name or file content is in options.config. Your XML config file is not well-formed: there is an invalid token at line 164, column 8 of this file. The problem is with the XML file, not the Python code.
