I'm using ElementTree with Python to parse an XML file to find the contents of the tag contentType.
Here's the Python line:
extensionType = ET.parse("src/" + str(filename)).find('contentType')
And here's the XML:
<?xml version="1.0" encoding="UTF-8"?>
<StaticResource xmlns="http://soap.sforce.com/2006/04/metadata">
<cacheControl>Private</cacheControl>
<contentType>image/jpeg</contentType>
</StaticResource>
What am I doing wrong?
Thanks!
you are just parsing the xml file so far. this is how you can get your element using xpath (note how you have to use the given xml namespace xmlns)
import xml.etree.cElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
xmlns = {'soap': '{http://soap.sforce.com/2006/04/metadata}'}
ct_element = root.find('.//{soap}contentType'.format(**xmlns))
print(ct_element.text)
Related
I have the following XML that I need to print out the content of members, I am trying to do this using elementree but I cannot figure out how to do this, below is the xml:
<response status="success"><result><entry name="TEST-ADDRESS-GROUP">
<static>
<member>TEST_1</member>
<member>TEST_2</member>
</static> </entry></result></response>
Try this one :
import xml.etree.ElementTree as ET
xml_data = '''<response status="success"><result><entry name="TEST-ADDRESS-GROUP">
<static>
<member>TEST_1</member>
<member>TEST_2</member>
</static> </entry></result></response>
'''
tree = ET.fromstring(xml_data)
member_elements = tree.findall('.//member')
text_in_member_elements = [element.text for element in member_elements]
I'm trying to get the text content on tag 'Event-id' in the XML, but hyphen is not recognizing as an element on the file, I know script is working well because if a replace the hyphen for a underscore in the XML and run the script it works, anybody knows which could be the problem?
<?xml version="1.0" encoding="UTF-8"?>
<eventsUpdate xmlns="http://nateng.com/xsd/NETworks">
<fullEventsUpdate xmlns="">
<fullEventUpdate xmlns="">
<event-reference xmlns="">
<event-id xmlns="">24425412</event-id>
<event-update xmlns="">34</event-update>
</event-reference>
</fullEventUpdate>
<fullEventUpdate xmlns="">
<event-reference xmlns="">
<event-id xmlns="">24342548</event-id>
<event-update xmlns="">34</event-update>
</event-reference>
</fullEventUpdate>
</fullEventsUpdate>
</eventsUpdate>
from bs4 import BeautifulSoup
dir_path = '20211006085201.xml'
file = open(dir_path, encoding='UTF-8')
contents = file.read()
soup = BeautifulSoup(contents, 'xml')
events = soup.find_all('fullEventUpdate')
print(' \n-------', len(events), 'events calculated on ', dir_path, '--------\n')
idi = soup.find_all('event-reference')
for x in range(0, len(events)):
idText = (idi[x].event-id.get_text())
print(idText)
The problem is you are dealing with namespaced xml, and for that type of document, you should use css selectors instead:
events = soup.select('fullEventUpdate')
for event in events:
print(event.select_one('event-id').text)
Output:
24425412
24342548
More generally, in dealing with xml documents, you are probably better off using something which supports xpath (like lxml or ElementTree).
For XML parsing idiomatic approach is to use xpath selectors.
In python this can be easily achieved with parsel package which is similar to beautifulsoup but built on top of lxml for full xpath support:
body = ...
from parsel import Selector
selector = Selector(body)
for event in sel.xpath("//event-reference"):
print(event.xpath('event-id/text()').get())
results in:
24425412
24342548
Without any external lib (Just ElementTree)
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<eventsUpdate xmlns="http://nateng.com/xsd/NETworks">
<fullEventsUpdate xmlns="">
<fullEventUpdate xmlns="">
<event-reference xmlns="">
<event-id xmlns="">24425412</event-id>
<event-update xmlns="">34</event-update>
</event-reference>
</fullEventUpdate>
<fullEventUpdate xmlns="">
<event-reference xmlns="">
<event-id xmlns="">24342548</event-id>
<event-update xmlns="">34</event-update>
</event-reference>
</fullEventUpdate>
</fullEventsUpdate>
</eventsUpdate> '''
root = ET.fromstring(xml)
ids = [e.text for e in root.findall('.//event-id')]
print(ids)
output
['24425412', '24342548']
I'm trying hard to include my XSLT header in my XML with ElementTree, and I canĀ“t find any information on how to do it.
Here's my Python code:
tree = ET.parse('myfile.xml') #get all tags from this XML document
root = tree.getroot() #get all elements from each tag
root[0][0].text = "ole"
root[0][1].text = "ole"
tree.write('test_file.xml', encoding='utf-8', method="xml") #write XML file
The only problem is to include this header:
<\? xml-stylesheet type="text/xsl" href="myfile.xsl" \?>
Unfortunately xml.etree.ElementTree does not support XSLT (for instance, you can read about write() that method is either xml, text or html).
Luckily though, you can easily do that if you rely on lxml which adds supports to XSLT
I just found the answer!!!
You need to use lxml instead and this is the new code:
from lxml import etree as ET
parser = ET.XMLParser(strip_cdata=False) #strip = false to prevent cdata to be removed/ stripped
tree = ET.parse('myfile.xml', parser)
root = tree.getroot() #get all elements from each tag
tag1 = root.find('TAG1')
tag1.find('TAG2').text = 'text change here'
tree.write('test_file.xml', encoding='utf-8', method="xml")
Your XML template (myfile.xml) is like this:
<?xml-stylesheet type="text/xsl" href="your_file.xsl" ?>
<FirstTAG>
<TAG1>
<TAG2>your text</TAG2>
</TAG1>
</FirstTAG>
And the new one will be like this:
<?xml-stylesheet type="text/xsl" href="your_file.xsl" ?>
<FirstTAG>
<TAG1>
<TAG2>text change here</TAG2>
</TAG1>
</FirstTAG>
<?xml version="1.0"?>
<doc>
<doc1>
<branch name="testing" hash="1cdf045c">
text,source
</branch>
<some name="release01" hash:hashing="f200013e">
<sub-branch name="subrelease01">
xml,sgml
</sub-branch>
</branch>
<bsome name="invalid">
</branch>
</doc1>
</doc>
I wrote something like this:
import xml.etree.ElementTree as ET
from lxml import etree
filename='/testxml'
tree = etree.parse(filename)
root = tree.getroot()
print root
primary = etree.Element("{http://schemas.dmtf.org/xxx/doc/1}doc")
secondary = etree.SubElement(primary, "{http://schemas.dmtf.org/xxx/doc/1}some name")
secondary.text = ????
print(etree.tostring(primary, pretty_print=True))
I would want to change the value for hash:hashing...
If possible I want use xpath query for example /doc/doc1/some name/#hash:hashing
to do this task...
Please help....Thank you!
I have some XML in a unicode-string variable in Python as follows:
<?xml version='1.0' encoding='UTF-8'?>
<results preview='0'>
<meta>
<fieldOrder>
<field>count</field>
</fieldOrder>
</meta>
<result offset='0'>
<field k='count'>
<value><text>6</text></value>
</field>
</result>
</results>
How do I extract the 6 in <value><text>6</text></value> using Python?
With lxml:
import lxml.etree
# xmlstr is your xml in a string
root = lxml.etree.fromstring(xmlstr)
textelem = root.find('result/field/value/text')
print textelem.text
Edit: But I imagine there could be more than one result...
import lxml.etree
# xmlstr is your xml in a string
root = lxml.etree.fromstring(xmlstr)
results = root.findall('result')
textnumbers = [r.find('field/value/text').text for r in results]
BeautifulSoup is the most simple way to parse XML as far as I know...
And assume that you have read the introduction, then just simply use:
soup = BeautifulSoup('your_XML_string')
print soup.find('text').string