I am reading a file with a .jml extension. The code is very simple:
import xml.etree.ElementTree as ET
tree = ET.parse('VOAPoints_2010_M25.jml')
root = tree.getroot()
but I get a parsing error:
ParseError: not well-formed (invalid token): line 75, column 16
The file I am trying to read is a dataset that has been used before, so I am confident that there are no problems with it.
The file is
Can anyone help?
Sorry for using an answer as a question, but formatting this inside a comment is painful.
Does the code below solve your problem?
import xml.etree.ElementTree as ET
myParser = ET.XMLParser(encoding="utf-8")
tree = ET.parse('VOAPoints_2010_M25.jml', parser=myParser)
root = tree.getroot()
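To see what the encoding argument changes, here is a minimal sketch on an in-memory document. It assumes the offending character is stored as a single Latin-1 byte with no encoding declaration in the file, which may or may not be true for the real dataset; `raw` and `parse_with_encoding` are names made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the pound sign is the single Latin-1 byte 0xA3
# and there is no encoding declaration, so the parser assumes UTF-8.
raw = b"<price>\xa3100</price>"

def parse_with_encoding(data, encoding):
    """Parse bytes with an explicit encoding override for the parser."""
    parser = ET.XMLParser(encoding=encoding)
    return ET.fromstring(data, parser=parser)

# With UTF-8 the lone 0xA3 byte is not a valid sequence.
try:
    parse_with_encoding(raw, "utf-8")
    utf8_ok = True
except ET.ParseError:
    utf8_ok = False

# With ISO-8859-1 the same byte decodes to the pound sign.
latin1_root = parse_with_encoding(raw, "iso-8859-1")
print(utf8_ok, latin1_root.text)
```

If the file really is Latin-1, passing `encoding="iso-8859-1"` rather than `"utf-8"` is the variant that would be expected to help.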
Since the pound sign was the issue, you can escape it with the numeric character reference &#163;. Python can automate the replacement by reading the XML file line by line and writing each line, with the pound symbol escaped, to a new file:
import xml.etree.ElementTree as ET
oldfile = "VOAPoints_2010_M25.jml"
newfile = "VOAPoints_2010_M25_new.jml"
# open both files once; 'w' avoids appending to output left over from an earlier run
with open(oldfile, 'r') as otxt, open(newfile, 'w') as ntxt:
    for rline in otxt:
        # replace the raw pound sign with its character reference
        ntxt.write(rline.replace("£", "&#163;"))
tree = ET.parse(newfile)
root = tree.getroot()
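As a quick check of the escaping idea, this small sketch works on an in-memory string instead of the real file. The sample markup and the helper name `escape_pound` are made up for the illustration.

```python
import xml.etree.ElementTree as ET

def escape_pound(text):
    """Replace each raw pound sign with the numeric character reference."""
    return text.replace("£", "&#163;")

escaped = escape_pound("<price>£100</price>")
root = ET.fromstring(escaped)

# The parser turns the reference back into the real character.
print(escaped)
print(root.text)
```

The data is unchanged after parsing; only the on-disk representation of the character differs.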
I am trying to read bulk data from the US Patent and Trademark Office. I have tried several XML files from here and get the same result each time:
import xml.etree.ElementTree as ET
import re
file = 'ipgb20210105.xml'
tree = ET.parse(file)
yields: "ParseError: junk after document element: line 862, column 0"
I have tried the recommendation to wrap the content in a fake root node, but this doesn't work either:
with open(file) as f:
    xml = f.read()
tree = ET.fromstring(re.sub(r"(<\?xml[^>]+\?>)", r"\1<root>", xml) + "</root>")
yields: "ParseError: not well-formed (invalid token): line 2, column 2"
Any help much appreciated!
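"Junk after document element" usually means the file holds more than one root element. If the bulk file is a concatenation of complete XML documents, each with its own `<?xml ...?>` declaration (an assumption about the file, not something confirmed above), one option is to split on the declaration and parse each piece separately. A minimal sketch on made-up sample data, with `split_concatenated_xml` as a hypothetical helper:

```python
import xml.etree.ElementTree as ET

def split_concatenated_xml(text, marker="<?xml "):
    """Split text into chunks that each start with an XML declaration."""
    return [marker + part for part in text.split(marker) if part.strip()]

# Hypothetical sample: two complete documents back to back in one string.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n<doc id="1"/>\n'
    '<?xml version="1.0" encoding="UTF-8"?>\n<doc id="2"/>\n'
)

# Encode each chunk so the parser sees bytes matching the declared encoding.
roots = [ET.fromstring(chunk.encode("utf-8"))
         for chunk in split_concatenated_xml(sample)]
ids = [r.get("id") for r in roots]
print(ids)
```

Wrapping everything in a fake root leaves the later declarations in the middle of the document, which is a likely reason that attempt fails. For a very large file, reading it all into one string may not be practical, so a line-based splitter would be the next step.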
I am trying to write a new XML file, edited from the original by replacing hyphens in the tag names with underscores, and then work with that new file in the rest of the code.
This is my code:
import xml.etree.ElementTree as ET
from lxml import etree
#attaching xml file
xmlfile = "hook_zap.xml"
tree = ET.parse(xmlfile)
root = tree.getroot()
#replace hyphen with underscore within the xml
doc = etree.parse(xmlfile)
for e in doc.xpath('//*[contains(local-name(),"-")]'):
    e.tag = e.tag.replace('-','_')
refracted = etree.tostring(doc, method='xml')
#create a new xml file with refracted file
refracted.write('base.xml')
#print (refracted)
And I keep getting this error:
AttributeError: 'bytes' object has no attribute 'write'
etree.tostring() returns bytes, not a file-like object, so write refracted to a file like any other data:
with open('base.xml', 'w') as f:
    f.write(refracted.decode('utf-8'))
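For a version that runs without lxml, here is a rough sketch of the same rename-and-save flow using only the standard library. The sample markup and the temporary output path are made up for the illustration, and the bytes are written in binary mode, which avoids the decode step.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for hook_zap.xml with hyphenated tag names.
root = ET.fromstring("<hook-zap><inner-item>1</inner-item></hook-zap>")

# Rename every tag that contains a hyphen.
for elem in root.iter():
    elem.tag = elem.tag.replace("-", "_")

# tostring() returns bytes, so open the output file in binary mode.
serialized = ET.tostring(root)
out_path = os.path.join(tempfile.mkdtemp(), "base.xml")
with open(out_path, "wb") as f:
    f.write(serialized)

# Parse the new file again to continue working with it.
reloaded = ET.parse(out_path).getroot()
print(reloaded.tag, reloaded[0].tag)
```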
I wrote some code that should modify some values in an XML file. It appears to work, but when I open the XML file in PyCharm, nothing has changed. If this question has already been answered somewhere, please point me to it. Here is the code as well as the XML.
import xml.etree.ElementTree as ET
tree = ET.parse("farms.xml")
root = tree.getroot()
for elem in root.findall('farm'):
    elem.set('money', '2000')
    money = elem.get('money')
    print(money)
XML:
<farms>
<farm farmId="1" name="Моя ферма" color="1" loan="0.000000" money="213" loanAnnualInterestRate="304.166656">
<players>
</players>
</farm>
</farms>
What you are missing is writing the tree back to disk.
import xml.etree.ElementTree as ET
tree = ET.parse("farms.xml")
root = tree.getroot()
for elem in root.findall('farm'):
    elem.set('money', '2000')
with open('new_farms.xml', 'wb') as f:
    tree.write(f)
It works for me.
Additionally,
print(ET.tostring(root))
will show what you expect.
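Here is a self-contained sketch of the same round trip that can be run without the original file. The trimmed-down XML string and the temporary output path are assumptions for the demo. It also passes `encoding="utf-8"` to `write()`, so the Cyrillic farm name stays readable in the output instead of being written as numeric character references.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical stand-in for farms.xml, trimmed to the attributes that matter.
source = '<farms><farm farmId="1" name="Моя ферма" money="213"/></farms>'
root = ET.fromstring(source)

for farm in root.findall("farm"):
    farm.set("money", "2000")

# Write to a temporary path (an assumption for the demo; use your real path).
out_path = os.path.join(tempfile.mkdtemp(), "new_farms.xml")
ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)

# Read the file back to confirm the change reached the disk.
with open(out_path, encoding="utf-8") as f:
    written = f.read()
reloaded = ET.parse(out_path).getroot()
print(reloaded.find("farm").get("money"))
```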
I am trying to parse an XML file (containing bad characters) using the lxml module with recover=True.
Below is the code snippet:
from lxml import etree
f=open('test.xml')
data=f.read()
f.close()
parser = etree.XMLParser(recover=True)
x = etree.fromstring(data, parser=parser)
Now I want to create another XML file (test1.xml) from the above object (x).
Could anyone please help with this?
Thanks
I think this is what you are looking for:
from lxml import etree
# opening the source file
with open('test.xml','r') as f:
    # reading the file content
    data=f.read()
parser = etree.XMLParser(recover=True)
# fromstring() parses XML from a string directly into an Element
x = etree.fromstring(data, parser=parser)
# serializing the recovered tree back to text
y = etree.tostring(x, pretty_print=True).decode("utf-8")
# writing the content to the output file
with open('test1.xml','w') as f:
    f.write(y)
How can I read huge XML files (more than 1 GB) using this code:
import xml.etree.ElementTree as ET
tree = ET.parse(file)
doc = tree.getroot()
abstracts = doc.findall('PubmedArticle/MedlineCitation/Article/Abstract')
for abstract in abstracts:
    abs_text = abstract.findall('AbstractText')
    ab = ''
    for txt in abs_text:
        ab += txt.text
    collections.col_pubmed_xmls.insert({'text': ab, 'tag': tag})
After executing this code, an error says that the file cannot be opened at this line:
ET.parse(file)
I can read small files using this code.
What to do?
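If the failure comes from running out of memory while building the whole tree (an assumption, since the exact error message is not shown), one common approach is to stream the file with `ET.iterparse` and handle one article at a time. A minimal sketch on a small in-memory sample; the helper name `iter_abstracts` is made up, and the database insert is left out.

```python
import io
import xml.etree.ElementTree as ET

def iter_abstracts(source):
    """Yield the joined AbstractText of each PubmedArticle, one at a time."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "PubmedArticle":
            texts = elem.findall("MedlineCitation/Article/Abstract/AbstractText")
            # itertext() also copes with empty or nested AbstractText elements
            yield "".join("".join(t.itertext()) for t in texts)
            elem.clear()  # drop the children of the finished article

# Hypothetical miniature of the PubMed structure used in the question.
sample = io.BytesIO(
    b"<PubmedArticleSet>"
    b"<PubmedArticle><MedlineCitation><Article><Abstract>"
    b"<AbstractText>first </AbstractText><AbstractText>part</AbstractText>"
    b"</Abstract></Article></MedlineCitation></PubmedArticle>"
    b"<PubmedArticle><MedlineCitation><Article><Abstract>"
    b"<AbstractText>second</AbstractText>"
    b"</Abstract></Article></MedlineCitation></PubmedArticle>"
    b"</PubmedArticleSet>"
)

abstracts = list(iter_abstracts(sample))
print(abstracts)
```

With a real file, pass the file path to `iter_abstracts` instead of the `BytesIO` object and insert each abstract inside the loop rather than collecting them in a list. The cleared elements stay attached to the root as empty shells, so memory still grows slowly on very large inputs.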