Parsing an xml file and creating another from the parsed object - python

I am trying to parse an XML file (containing bad characters) using the lxml module with recover=True.
Below is the code snippet:
from lxml import etree
f=open('test.xml')
data=f.read()
f.close()
parser = etree.XMLParser(recover=True)
x = etree.fromstring(data, parser=parser)
Now I want to create another XML file (test1.xml) from the above object (x).
Could anyone please help with this?
Thanks

I think this is what you are looking for:
from lxml import etree
# open the source file in binary mode so lxml can handle the encoding itself
with open('test.xml', 'rb') as f:
    # read the file content
    data = f.read()
parser = etree.XMLParser(recover=True)
# fromstring() parses XML from a string directly into an Element
x = etree.fromstring(data, parser=parser)
# serialize the recovered tree back to text
y = etree.tostring(x, pretty_print=True).decode("utf-8")
# write the content to the output file
with open('test1.xml', 'w') as f:
    f.write(y)

Related

How to convert xml variable to xml file in python?

I converted my dictionary to XML, but I can't save it to an XML file:
from dict2xml import dict2xml
xml = dict2xml(my_dictionary)
print(xml)
The following assigns an XML string to your variable xml:
xml = dict2xml(my_dictionary)
You can write that string to a file as follows:
from dict2xml import dict2xml
xml = dict2xml(my_dictionary)
with open("my_data.xml", 'w') as f:
    f.write(xml)
This will write your XML data to the file my_data.xml.
You need to build an XML string and then write it to a file:
from xml.dom.minidom import parseString
xml_str = parseString(xml).toprettyxml()
save_path_file = "myfile.xml"
with open(save_path_file, "w") as f:
    f.write(xml_str)
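A minimal sketch of the pretty-print-and-save step using only the standard library. The XML literal stands in for the output of dict2xml(), and the temporary output path is hypothetical; both are assumptions for illustration.

```python
import os
import tempfile
from xml.dom.minidom import parseString

# Illustrative XML string; in the question this would come from dict2xml().
xml = "<address><name>John Doe</name><position>CEO</position></address>"

# toprettyxml() returns a str with an XML declaration and indentation.
xml_str = parseString(xml).toprettyxml(indent="  ")

# Hypothetical output location inside a temporary directory.
save_path_file = os.path.join(tempfile.mkdtemp(), "myfile.xml")
with open(save_path_file, "w", encoding="utf-8") as f:
    f.write(xml_str)

# Read the file back and confirm it is still well-formed XML.
with open(save_path_file, encoding="utf-8") as f:
    round_trip = parseString(f.read())
name_text = round_trip.getElementsByTagName("name")[0].firstChild.data
```

Passing an explicit encoding to open() keeps the result independent of the platform default, which matters once the data contains non-ASCII characters.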

Write edited XML with hyphens replaced by underscores

So I am trying to write a new xml file that I edited from the original by replacing the hyphen with an underscore and then start working on that xml file for the rest of the code.
This is my code:
import xml.etree.ElementTree as ET
from lxml import etree
#attaching xml file
xmlfile = "hook_zap.xml"
tree = ET.parse(xmlfile)
root = tree.getroot()
#replace hyphen with underscore within the xml
doc = etree.parse(xmlfile)
for e in doc.xpath('//*[contains(local-name(),"-")]'):
    e.tag = e.tag.replace('-','_')
refracted = etree.tostring(doc, method='xml')
#create a new xml file with refracted file
refracted.write('base.xml')
#print (refracted)
And I keep getting this error:
AttributeError: 'bytes' object has no attribute 'write'
etree.tostring() returns a bytes object, which has no write() method. Write refracted to a file like any other kind of data:
with open('base.xml', 'w') as f:
    f.write(refracted.decode('utf-8'))
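As a rough sketch of the whole round trip, the following uses the standard library's xml.etree.ElementTree instead of lxml (a deliberate swap so it runs without extra packages). The sample document and the temporary output path are made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Illustrative document with hyphenated tag names (not the asker's file).
source = "<hook-zap><site-name>example</site-name></hook-zap>"
root = ET.fromstring(source)

# Rename every element whose tag contains a hyphen.
for elem in root.iter():
    elem.tag = elem.tag.replace("-", "_")

# tostring() returns bytes when a real encoding such as "utf-8" is given,
# so the output file is opened in binary mode.
refracted = ET.tostring(root, encoding="utf-8")
out_path = os.path.join(tempfile.mkdtemp(), "base.xml")  # hypothetical path
with open(out_path, "wb") as f:
    f.write(refracted)

# The rest of the script can now work from the rewritten file.
new_root = ET.parse(out_path).getroot()
```

Writing the bytes in binary mode avoids the decode step; decoding and writing in text mode, as in the answer above, works as well.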

Can anyone tell me what the error message "line 1182 in parse" means when I'm trying to parse an XML file in Python?

This is the code that results in an error message:
import urllib
import xml.etree.ElementTree as ET
url = raw_input('Enter URL:')
urlhandle = urllib.urlopen(url)
data = urlhandle.read()
tree = ET.parse(data)
The error:
I'm new to python. I did read documentation and a couple of tutorials, but clearly I still have done something wrong. I don't believe it is the xml file itself because it does this to two different xml files.
Consider using ElementTree's fromstring():
import urllib
import xml.etree.ElementTree as ET
url = raw_input('Enter URL:')
# http://feeds.bbci.co.uk/news/rss.xml?edition=int
urlhandle = urllib.urlopen(url)
data = urlhandle.read()
tree = ET.fromstring(data)
print ET.tostring(tree, encoding='utf8', method='xml')
data is a reference to the XML content as a string, but the parse() function expects a filename or file object as its argument. That's why there is an error.
urlhandle is a file object, so tree = ET.parse(urlhandle) should work for you.
The error message indicates that your code is trying to open a file whose name is stored in the variable source.
It fails to open that file (IOError) because the variable source contains a bunch of XML, not a file name.
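The difference between the two entry points can be sketched as follows in Python 3. The byte string stands in for whatever urlhandle.read() would return, and io.BytesIO plays the role of the file object; both are assumptions made so the example runs without network access.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for the bytes returned by urlhandle.read().
data = b"<rss><channel><title>Example feed</title></channel></rss>"

# fromstring() takes the XML content itself and returns the root Element.
root = ET.fromstring(data)

# parse() takes a filename or a file-like object and returns an ElementTree.
tree = ET.parse(io.BytesIO(data))

# Handing the content to parse() makes it treat the XML text as a filename,
# so opening that "file" fails.
try:
    ET.parse(data.decode("utf-8"))
    parse_failed = False
except OSError:
    parse_failed = True

title = root.find("channel/title").text
```

In Python 3 the old IOError is an alias of OSError, which is why the sketch catches OSError.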

How to pass xml as a parameter in python script?

I have the parameters to be passed to my python code saved in an xml file.
How to pass this xml as a parameter to my python code?
Can someone please help on this?
Thanks in advance!
You can pass it as a command-line parameter when executing the script. Use sys.argv, the list that stores all the arguments passed, or the argparse module, which handles customisable command-line parameters.
Assuming you have "file.xml" as:
<?xml version="1.0"?>
<address>
    <name>John Doe</name>
    <position>CEO</position>
</address>
You can either:
Pass the XML file as a command line parameter to your python script.
Usage: script.py path/to/file.xml
import sys
from xml.dom.minidom import parseString
def read_xml(xml_file):
    with open(xml_file, 'r') as f:
        data = f.read()
    return parseString(data)
if (len(sys.argv) < 2):
    print "Error: Missing parameter."
else:
    dom = read_xml(sys.argv[1])
    tag = dom.getElementsByTagName('name')[0].toxml()
    print tag
It would be better to use the argparse module instead of sys.argv, since it validates the arguments and generates a usage message for you.
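A minimal argparse version might look like the sketch below (Python 3). The argument name xml_file, the helper names, and the temporary demo file are all hypothetical; a real script would call parse_args() with no arguments so that it reads sys.argv.

```python
import argparse
import os
import tempfile
from xml.dom.minidom import parse


def build_parser():
    """Create a parser with one required positional argument."""
    parser = argparse.ArgumentParser(description="Read settings from an XML file.")
    parser.add_argument("xml_file", help="path to the XML file")
    return parser


def read_name(xml_file):
    """Return the text of the first <name> element in the file."""
    dom = parse(xml_file)
    return dom.getElementsByTagName("name")[0].firstChild.data


# Demo only: write a small file and pass its path explicitly.
demo_path = os.path.join(tempfile.mkdtemp(), "file.xml")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write('<?xml version="1.0"?><address><name>John Doe</name></address>')

# In a real script this would be build_parser().parse_args().
args = build_parser().parse_args([demo_path])
name = read_name(args.xml_file)
print(name)
```

If the argument is missing, argparse prints a usage message and exits, which replaces the manual len(sys.argv) check.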
Or just open and read the XML file and parse it.
from xml.dom.minidom import parseString
def read_xml(xml_file):
    with open(xml_file, 'r') as f:
        data = f.read()
    return parseString(data)
dom = read_xml("file.xml")
tag = dom.getElementsByTagName('name')[0].toxml()
print tag

Exception when parsing a xml using lxml

I wrote this code to validate my XML file against an XSD:
def parseAndObjectifyXml(xmlPath, xsdPath):
    from lxml import etree
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    xmlContent = xmlinput.read()
    myxml = etree.parse(xmlinput) # In this line xml input is empty
    schema.assertValid(myxml)
But when I try to validate it, xmlinput is empty although xmlContent is not.
What is the problem?
Files in python have a "current position"; it starts at the beginning of the file (position 0), then, as you read the file, the current position pointer moves along until it reaches the end.
You'll need to put that pointer back to the beginning before the lxml parser can read the contents in full. Use the .seek() method for that:
from lxml import etree
def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    xmlContent = xmlinput.read()
    xmlinput.seek(0)
    myxml = etree.parse(xmlinput)
    schema.assertValid(myxml)
You only need to do this if you need xmlContent somewhere else too; you could alternatively pass it into the .parse() method if wrapped in a StringIO object to provide the necessary file object methods:
from lxml import etree
from cStringIO import StringIO
def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    xmlContent = xmlinput.read()
    myxml = etree.parse(StringIO(xmlContent))
    schema.assertValid(myxml)
If you are not using xmlContent for anything else, then you do not need the extra .read() call either, and subsequently won't have problems parsing it with lxml; just omit the call altogether, and you won't need to move the current position pointer back to the start either:
from lxml import etree
def parseAndObjectifyXml(xmlPath, xsdPath):
    xsdFile = open(xsdPath)
    schema = etree.XMLSchema(file=xsdFile)
    xmlinput = open(xmlPath)
    myxml = etree.parse(xmlinput)
    schema.assertValid(myxml)
To learn more about .seek() (and its counterpart, .tell()), read up on file objects in the Python tutorial.
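The "current position" behaviour can be observed directly with a small sketch. It uses the standard library's xml.etree.ElementTree rather than lxml (a deliberate swap so that it runs without extra packages), and the sample file is created in a temporary directory purely for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Create a small sample file to read from (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), "sample.xml")
with open(path, "wb") as f:
    f.write(b"<root><item>1</item></root>")

with open(path, "rb") as xmlinput:
    start = xmlinput.tell()      # position before reading
    content = xmlinput.read()    # reads to the end of the file
    end = xmlinput.tell()        # position is now at the end
    leftover = xmlinput.read()   # nothing left to read from here
    xmlinput.seek(0)             # rewind so a parser sees the full content
    tree = ET.parse(xmlinput)
```

Without the seek(0) call, the parser would start at the end of the file and find no document to parse, which is the situation described in the question.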
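You should use the XML content that you have already read. Because parse() expects a filename or file object, pass the string to fromstring() instead:
xmlContent = xmlinput.read()
myxml = etree.fromstring(xmlContent)
instead of:
myxml = etree.parse(xmlinput)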
