Read an xml file in Python

Read an xml file in Python - python

I am reading a file with a jml extension. The code is very simple and it reads
import xml.etree.ElementTree as ET
tree = ET.parse('VOAPoints_2010_M25.jml')
root = tree.getroot()
but I get a parsing error:
ParseError: not well-formed (invalid token): line 75, column 16
the file I am trying to read is a dataset that has been used before so I am confident that there are no problems with it.
The file is
Can anyone help ?

Sorry for using an answer as a question, but formatting this inside a comment is painful.
Does the code below solve your problem?
import xml.etree.ElementTree as ET
myParser = ET.XMLParser(encoding="utf-8")
tree = ET.parse('VOAPoints_2010_M25.jml',parser=myParser)
root = tree.getroot()

Since the pound sign was the issue, you can escape it with the character entity £. Python can even automate the replace in XML file by iteratively reading each line and replacing it conditionally on the pound symbol:
import xml.etree.ElementTree as ET
oldfile = "VOAPoints_2010_M25.jml"
newfile = "VOAPoints_2010_M25_new.jml"
with open(oldfile, 'r') as otxt:
for rline in otxt:
if "£" in rline:
rline = rline.replace("£", "£")
with open(newfile, 'a') as ntxt:
ntxt.write(rline)
tree = ET.parse(newfile)
root = tree.getroot()

Related

python xml won't parse

Trying to read bulk data from US Patent and Trade Office. Have tried several xml files from here, I get the same results:
import xml.etree.ElementTree as ET
import re
file = 'ipgb20210105.xml'
tree = ET.parse(file)
yields: "ParseError: junk after document element: line 862, column 0"
Have tried recommendation to wrap with fake root node, but this doesn't work either:
with open(file) as f:
xml = f.read()
tree = ET.fromstring(re.sub(r"(<\?xml[^>]+\?>)", r"\1<root>", xml) + "</root>")
yields: "ParseError: not well-formed (invalid token): line 2, column 2"
Any help much appreciated!

Write edited xml that replaced hypen with underscore

So I am trying to write a new xml file that I edited from the original by replacing the hyphen with an underscore and then start working on that xml file for the rest of the code.
This is my code:
import xml.etree.ElementTree as ET
from lxml import etree
#attaching xml file
xmlfile = "hook_zap.xml"
tree = ET.parse(xmlfile)
root = tree.getroot()
#replace hypen with underscore within the xml
doc = etree.parse(xmlfile)
for e in doc.xpath('//*[contains(local-name(),"-")]'):
e.tag = e.tag.replace('-','_')
refracted = etree.tostring(doc, method='xml')
#create a new xml file with refracted file
refracted.write('base.xml')
#print (refracted)
And I keep getting this error:
AttributeError: 'bytes' object has no attribute 'write'

Write refracted like any other kind data into a file:
with open('base.xml', 'w') as f:
f.write(refracted.decode('utf-8'))

no modifications in XML file after running python code

I wrote a code that must modify some values in a xml file. it looks to be working, but when i open this xml file threw PyCharm where i have added the modified file, it just doesn't change a thing. If anyone gave a respond to such a question, please point me where is it. Here is the code as well as the xml.
import xml.etree.ElementTree as ET
tree = ET.parse("farms.xml")
root = tree.getroot()
for elem in root.findall('farm'):
elem.set('money', '2000')
money = elem.get('money')
print(money)
xml
<farms>
<farm farmId="1" name="Моя ферма" color="1" loan="0.000000" money="213" loanAnnualInterestRate="304.166656">
<players>
</players>
</farm>
</farms>

What you are missing is writing the tree back to disk.
import xml.etree.ElementTree as ET
tree = ET.parse("farms.xml")
root = tree.getroot()
for elem in root.findall('farm'):
elem.set('money', '2000')
with open('new_farms.xml', 'wb') as f:
tree.write(f)

It works for me.
Additionally,
print(xml.etree.ElementTree.tostring(root))
will show what you expect.

Parsing an xml file and creating another from the parsed object

I am trying to parse an xml file(containing bad characters) using lxml module in recover = True mode.
Below is the code snippet
from lxml import etree
f=open('test.xml')
data=f.read()
f.close()
parser = etree.XMLParser(recover=True)
x = etree.fromstring(data, parser=parser)
Now I want to create another xml file (test1.xml) from the above object (x)
Could anyone please help in this matter.
Thanks

I think this is what you are searching for
from lxml import etree
# opening the source file
with open('test.xml','r') as f:
# reading the number
data=f.read()
parser = etree.XMLParser(recover=True)
# fromstring() parses XML from a string directly into an Element
x = etree.fromstring(data, parser=parser)
# taking the content retrieved
y = etree.tostring(x, pretty_print=True).decode("utf-8")
# writing the content on the output file
with open('test1.xml','w') as f:
f.write(y)

How to read a huge xml file using xml.etree.ElementTree

How can I read huge xml files (more than 1GB) using this code:
import xml.etree.ElementTree as ET
tree = ET.parse(file)
doc = tree.getroot()
abstracts = doc.findall('PubmedArticle/MedlineCitation/Article/Abstract')
for abstract in abstracts:
abs_text = abstract.findall('AbstractText')
ab = ''
for txt in abs_text:
ab += txt.text
collections.col_pubmed_xmls.insert({'text': ab, 'tag': tag})
after executing this code an error says that file can not be openned in this line:
ET.parse(file)
I can read small files using this code.
What to do?

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Read an xml file in Python - python

Sorry for using an answer as a question, but formatting this inside a comment is painful. Does the code below solve your problem? import xml.etree.ElementTree as ET myParser = ET.XMLParser(encoding="utf-8") tree = ET.parse('VOAPoints_2010_M25.jml',parser=myParser) root = tree.getroot()

Related

python xml won't parse

Write edited xml that replaced hypen with underscore

no modifications in XML file after running python code

Parsing an xml file and creating another from the parsed object

How to read a huge xml file using xml.etree.ElementTree

Categories

Resources