I'm using Python 2.5 and ElementTree 1.2 to parse an XML document that looks like this:
<cm:CompositeMessage xmlns:cm="http://www.xyz.com">
<cm:Message>
<cm:Body format="text/xml">
<CHMasterbook >
<event>
<eventName>Snapshot</eventName>
<date>2013-10-25</date>
<time>20:59:02</time>
</event>
</CHMasterbook>
</cm:Body>
</cm:Message>
</cm:CompositeMessage>
After I register the namespace
ET._namespace_map['http://www.xyz.com'] = 'cm'
I can parse the XML document and locate the 'event' node:
tree = ElementTree(fromstring(xml))
tree.findall('./{http://www.xyz.com}Message/{http://www.xyz.com}Body/CHMasterBook/event')
But if the 'CHMasterbook' node declares namespaces, like this:
<CHMasterbook xmlns="http://uri.xyz.com/Chorus/Message" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://uri.xyz.com/Chorus/Message ../schema/chorus-master-book-msg.xsd">
tree.findall returns an empty list and can no longer locate the 'event' node. I also tried registering those namespaces:
ET._namespace_map['http://uri.xyz.com/Chorus/Message'] = 'xmlns'
ET._namespace_map['http://www.w3.org/2001/XMLSchema-instance'] = 'xmlns:xsi'
ET._namespace_map['http://uri.xyz.com/Chorus/Message ../schema/chorus-master-book-msg.xsd'] = 'xsi:schemaLocationi'
But it didn't help.
I can only use Python 2.5 and ElementTree 1.2 (can't use lxml). Does anyone know how to locate the 'event' node with 'CHMasterbook' having those namespaces?
Try this:
tree = ElementTree(fromstring(xml))
tree.findall('./{http://www.xyz.com}Message'
'/{http://www.xyz.com}Body'
'/{http://uri.xyz.com/Chorus/Message}CHMasterbook'
'/{http://uri.xyz.com/Chorus/Message}event')
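Also, your example uses both CHMasterbook and CHMasterBook. XML names are case-sensitive, so the path must match the document exactly. The xmlns="http://uri.xyz.com/Chorus/Message" attribute declares a default namespace that applies to CHMasterbook and to every unprefixed element beneath it, which is why event needs the URI in braces too. Registering prefixes in _namespace_map does not change what findall matches; it only affects the prefixes used when the tree is serialized.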
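A minimal sketch of why the path has to change. It is written against the current standard-library xml.etree.ElementTree rather than ElementTree 1.2, so treat it as an illustration; the path strings themselves should carry over, but that is an assumption. The sample document is trimmed from the question.

```python
import xml.etree.ElementTree as ET

CM = "http://www.xyz.com"
CHORUS = "http://uri.xyz.com/Chorus/Message"

xml = """<cm:CompositeMessage xmlns:cm="http://www.xyz.com">
  <cm:Message>
    <cm:Body format="text/xml">
      <CHMasterbook xmlns="http://uri.xyz.com/Chorus/Message">
        <event>
          <eventName>Snapshot</eventName>
        </event>
      </CHMasterbook>
    </cm:Body>
  </cm:Message>
</cm:CompositeMessage>"""

root = ET.fromstring(xml)

# The default namespace on CHMasterbook is inherited by its children,
# so both CHMasterbook and event need the URI in braces.
path = "./{%s}Message/{%s}Body/{%s}CHMasterbook/{%s}event" % (CM, CM, CHORUS, CHORUS)
events = root.findall(path)

# Without the namespace on the last two steps, nothing matches.
unqualified = root.findall("./{%s}Message/{%s}Body/CHMasterbook/event" % (CM, CM))

print(len(events), len(unqualified))
```

The second lookup is the one from the question; it fails only because the last two steps omit the namespace.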
I am trying to update the XML file below in Python 3 using import xml.etree.ElementTree as ET, but I am not able to add anything between the tags.
The issue is that I cannot fetch the tags below fileSets.
Can someone tell me how to update the XML?
abc.xml
<assembly
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2
http://maven.apache.org/xsd/assembly-1.1.2-xsd"
>
<id></id>
<formats>
<format>zip</format>
</formats>
<fileSets>
<fileSet>
<outputDirectory>/</outputDirectory>
<directory>../</directory>
<useDefaultExcludes>false</useDefaultExcludes>
<includes>
</includes>
</fileSet>
</fileSets>
</assembly>
Expected output (file names will be added dynamically):
abc.xml
<assembly
xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2
http://maven.apache.org/xsd/assembly-1.1.2-xsd"
>
<id></id>
<formats>
<format>zip</format>
</formats>
<fileSets>
<fileSet>
<outputDirectory>/</outputDirectory>
<directory>../</directory>
<useDefaultExcludes>false</useDefaultExcludes>
<includes>
<include>abc.text</include>
<include>def.text</include>
<include>ghi.text</include>
</includes>
</fileSet>
</fileSets>
</assembly>
I tried the code below. It prints the elements it finds, but I don't know how to access includes and then add entries such as abc.text inside it.
import xml.etree.ElementTree as ET
tree = ET.parse("abc.xml")
root = tree.getroot()
for actor in root.findall('{http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2}fileSets'):
    for name in actor.findall('{http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2}fileSet'):
        print(name)
You don't have to do anything with fileSets or fileSet. Since you want to add children to includes, get that element directly.
import xml.etree.ElementTree as ET
# Ensure that the proper prefix is used in the output (in this case, no prefix at all)
ET.register_namespace("", "http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2")
tree = ET.parse("abc.xml")
# Find the 'includes' element (.// means search the whole document).
# {*} is a wildcard and matches any namespace (requires Python 3.8 or later)
includes = tree.find(".//{*}includes")
# Create three new 'include' elements
include1 = ET.Element("include")
include1.text = "abc.text"
include2 = ET.Element("include")
include2.text = "def.text"
include3 = ET.Element("include")
include3.text = "ghi.text"
# Add the new elements as children of 'includes'
includes.append(include1)
includes.append(include2)
includes.append(include3)
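Since the file names are added dynamically, the three hand-written elements can be replaced by a loop. The following is a sketch under a few assumptions: the file_names list is hypothetical, the XML is a trimmed stand-in for abc.xml so the example is self-contained, and each new include is created in the same namespace as its parent (the code above creates them without a namespace, which serializes the same way here but leaves the in-memory tags unqualified).

```python
import xml.etree.ElementTree as ET

NS = "http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2"
# Serialize elements in this namespace without a prefix.
ET.register_namespace("", NS)

# Trimmed stand-in for abc.xml so the sketch is self-contained.
xml = """<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.2">
  <fileSets>
    <fileSet>
      <includes>
      </includes>
    </fileSet>
  </fileSets>
</assembly>"""

root = ET.fromstring(xml)
includes = root.find(".//{%s}includes" % NS)

# Hypothetical list of file names; in practice this would be built at run time.
file_names = ["abc.text", "def.text", "ghi.text"]

for file_name in file_names:
    # Create each child in the same namespace as its parent.
    include = ET.SubElement(includes, "{%s}include" % NS)
    include.text = file_name

result = ET.tostring(root, encoding="unicode")
print(result)
```

To update the file on disk instead of printing, the tree returned by ET.parse("abc.xml") can be saved with tree.write("abc.xml", encoding="utf-8", xml_declaration=True).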
I am new to Python programming. I have a requirement to read an XML structure and build a new SOAP request XML from it by adding a namespace. Here is an example of what I have.
This is the XML I get from the other system:
<foo>
<bar>
<type foobar="1"/>
<type foobar="2"/>
</bar>
</foo>
I want the final result to look like this:
<?xml version="1.0"?>
<soa:foo xmlns:soa="https://www.w3schools.com/furniture">
<soa:bar>
<soa:type foobar="1"/>
<soa:type foobar="2"/>
</soa:bar>
</soa:foo>
I looked in the Python documentation but could not find how to do this.
One option is to use lxml to iterate over all of the elements and add the namespace uri to the .tag property.
You can use register_namespace() to bind the uri to the desired prefix.
Example...
from lxml import etree
tree = etree.parse("input.xml")
etree.register_namespace("soa", "https://www.w3schools.com/furniture")
for elem in tree.iter():
    elem.tag = f"{{https://www.w3schools.com/furniture}}{elem.tag}"
print(etree.tostring(tree, pretty_print=True).decode())
Printed output...
<soa:foo xmlns:soa="https://www.w3schools.com/furniture">
<soa:bar>
<soa:type foobar="1"/>
<soa:type foobar="2"/>
</soa:bar>
</soa:foo>
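If lxml is not available, roughly the same approach appears to work with the standard-library xml.etree.ElementTree. This is a sketch of that substitute, not the lxml code above; the input is inlined instead of read from input.xml, and the output is not pretty-printed.

```python
import xml.etree.ElementTree as ET

SOA = "https://www.w3schools.com/furniture"
# Bind the URI to the desired prefix for serialization.
ET.register_namespace("soa", SOA)

source = """<foo>
  <bar>
    <type foobar="1"/>
    <type foobar="2"/>
  </bar>
</foo>"""

root = ET.fromstring(source)

# Rewrite every tag as {uri}local; attributes are left untouched.
for elem in list(root.iter()):
    elem.tag = "{%s}%s" % (SOA, elem.tag)

result = ET.tostring(root, encoding="unicode")
print(result)
```

The namespace declaration ends up on the root element only, as in the lxml output.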
I need to build an XML file with specially named items. This is my current code:
from lxml import etree
import lxml
from lxml.builder import E
wp = E.wp
tmp = wp("title")
print(etree.tostring(tmp))
The current output is:
b'<wp>title</wp>'
I want it to be:
b'<wp:title>title</title:wp>'
How can I create items with a name like wp:title?
You confused the namespace prefix wp with the tag name. The namespace prefix is a document-local name for a namespace URI. wp:title requires a parser to look for a xmlns:wp="..." attribute to find the namespace itself (usually a URL but any globally unique string would do), either on the tag itself or on a parent tag. This connects tags to a unique value without making tag names too verbose to type out or read.
You need to provide the namespace, and optionally, the namespace mapping (mapping short names to full namespace names) to the element maker object. The default E object provided doesn't have a namespace or namespace map set. I'm going to assume here that wp is the http://wordpress.org/export/1.2/ Wordpress namespace, as that seems the most likely, although it could also be that you are trying to send Windows Phone notifications.
Instead of using the default E element maker, create your own ElementMaker instance and pass it a namespace argument to tell lxml what URL the element belongs to. To get the right prefix on your element names, you also need to give it a nsmap dictionary that maps prefixes to URLs:
from lxml.builder import ElementMaker
namespaces = {"wp": "http://wordpress.org/export/1.2/"}
E = ElementMaker(namespace=namespaces["wp"], nsmap=namespaces)
title = E.title("Value of the wp:title tag")
This produces a tag with both the correct prefix, and the xmlns:wp attribute:
>>> from lxml import etree
>>> from lxml.builder import ElementMaker
>>> namespaces = {"wp": "http://wordpress.org/export/1.2/"}
>>> E = ElementMaker(namespace=namespaces["wp"], nsmap=namespaces)
>>> title = E.title("Value of the wp:title tag")
>>> etree.tostring(title, encoding="unicode")
'<wp:title xmlns:wp="http://wordpress.org/export/1.2/">Value of the wp:title tag</wp:title>'
You can omit the nsmap value, but then you'd want to have such a map on a parent element of the document. In that case, you probably want to make separate ElementMaker objects for each namespace you need to support, and you put the nsmap namespace mapping on the outer-most element. When writing out the document, lxml then uses the short names throughout.
For example, creating a Wordpress WXR format document would require a number of namespaces:
from lxml.builder import ElementMaker
namespaces = {
    "excerpt": "https://wordpress.org/export/1.2/excerpt/",
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wfw": "http://wellformedweb.org/CommentAPI/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "https://wordpress.org/export/1.2/",
}
RootElement = ElementMaker(nsmap=namespaces)
ExcerptElement = ElementMaker(namespace=namespaces["excerpt"])
ContentElement = ElementMaker(namespace=namespaces["content"])
CommentElement = ElementMaker(namespace=namespaces["wfw"])
DublinCoreElement = ElementMaker(namespace=namespaces["dc"])
ExportElement = ElementMaker(namespace=namespaces["wp"])
and then you'd construct a document with
doc = RootElement.rss(
    RootElement.channel(
        ExportElement.wxr_version("1.2"),
        # etc. ...
    ),
    version="2.0"
)
which, when pretty printed with etree.tostring(doc, pretty_print=True, encoding="unicode"), produces:
<rss xmlns:excerpt="https://wordpress.org/export/1.2/excerpt/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:wp="https://wordpress.org/export/1.2/" version="2.0">
<channel>
<wp:wxr_version>1.2</wp:wxr_version>
</channel>
</rss>
Note how only the root <rss> element has xmlns attributes, and how the <wp:wxr_version> tag uses the right prefix even though we only gave it the namespace URI.
To give a different example, if you are building a Windows Phone tile notification, it'd be simpler. After all, there is just a single namespace to use:
from lxml.builder import ElementMaker
namespaces = {"wp": "WPNotification"}
E = ElementMaker(namespace=namespaces["wp"], nsmap=namespaces)
notification = E.Notification(
    E.Tile(
        E.BackgroundImage("https://example.com/someimage.png"),
        E.Count("42"),
        E.Title("The notification title"),
        # ...
    )
)
which produces
<wp:Notification xmlns:wp="WPNotification">
<wp:Tile>
<wp:BackgroundImage>https://example.com/someimage.png</wp:BackgroundImage>
<wp:Count>42</wp:Count>
<wp:Title>The notification title</wp:Title>
</wp:Tile>
</wp:Notification>
Only the outer-most element, <wp:Notification>, now has the xmlns:wp attribute. All other elements only need to include the wp: prefix.
Note that the prefix used is entirely up to you and even optional. It is the namespace URI that is the real key to uniquely identifying elements across different XML documents. If you used E = ElementMaker(namespace="WPNotification", nsmap={None: "WPNotification"}) instead, and so produced a top-level element with <Notification xmlns="WPNotification"> you still have a perfectly legal XML document that, according to the XML standard, has the exact same meaning.
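The claim that the prefix does not matter can be checked with a parser. The sketch below uses the standard-library xml.etree.ElementTree instead of lxml so that it runs without extra dependencies, and the two documents are shortened versions of the notification example. Both forms parse to the same {uri}local tags.

```python
import xml.etree.ElementTree as ET

prefixed = """<wp:Notification xmlns:wp="WPNotification">
  <wp:Tile><wp:Count>42</wp:Count></wp:Tile>
</wp:Notification>"""

default_ns = """<Notification xmlns="WPNotification">
  <Tile><Count>42</Count></Tile>
</Notification>"""


def qualified_tags(xml_text):
    """Return every element tag in document order, in {uri}local form."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


tags_prefixed = qualified_tags(prefixed)
tags_default = qualified_tags(default_ns)

print(tags_prefixed)
print(tags_prefixed == tags_default)
```

After parsing, the prefix is gone and only the namespace URI remains in each tag.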
I have an XML file in which I need to update the values of some specific tags. The header contains tags with namespace prefixes. Using find for those tags works, but when I search for other tags that have no prefix, nothing is found.
I tried relative and absolute paths, but neither finds the element. The code looks like this:
from lxml import etree
tree = etree.parse('test.xml')
root = tree.getroot()
# get its namespace map, excluding default namespace
nsmap = {k:v for k,v in root.nsmap.iteritems() if k}
# Replace values in tags
identity = tree.find('.//env:identity', nsmap)
identity.text = 'Placeholder' # works fine
e01_0017 = tree.find('.//e01_0017') # does not find
e01_0017.text = 'Placeholder' # and then this raises, of course: AttributeError: 'NoneType' object has no attribute 'text'
# Also tried like this, but still not working
e01_0017 = tree.find('Envelope/Body/IVOIC/UNB/cmp04/e01_0017')
I even tried finding, for example, the Body tag, but that is not found either.
This is what the XML structure looks like:
<?xml version="1.0" encoding="ISO-8859-1"?><Envelope xmlns="http://www.someurl.com/TTT" xmlns:env="http://www.someurl.com/TTT_Envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xsi:schemaLocation="http://www.someurl.com/TTT TTT_INVOIC.xsd"><Header>
<env:delivery>
<env:to>
<env:address>Test</env:address>
</env:to>
<env:from>
<env:address>Test2</env:address>
</env:from>
<env:reliability>
<env:sendReceiptTo/>
<env:receiptRequiredBy/>
</env:reliability>
</env:delivery>
<env:properties>
<env:identity>some code</env:identity>
<env:sentAt>2006-03-17T00:38:04+01:00</env:sentAt>
<env:expiresAt/>
<env:topic>http://www.someurl.com/TTT/</env:topic>
</env:properties>
<env:manifest>
<env:reference uri="#INVOIC#D00A">
<env:description>Doc Name Descr</env:description>
</env:reference>
</env:manifest>
<env:process>
<env:type></env:type>
<env:instance/>
<env:handle></env:handle>
</env:process>
</Header>
<Body>
<INVOIC>
<UNB>
<cmp01>
<e01_0001>1</e01_0001>
<e02_0002>1</e02_0002>
</cmp01>
<cmp02>
<e01_0004>from</e01_0004>
</cmp02>
<cmp03>
<e01_0010>to</e01_0010>
</cmp03>
<cmp04>
<e01_0017>060334</e01_0017>
<e02_0019>1652</e02_0019>
</cmp04>
<e01_0020>1</e01_0020>
<cmp05>
<e01_0022>1</e01_0022>
</cmp05>
</UNB>
</INVOIC>
</Body>
</Envelope>
Update: Something seems to be wrong with the Header or Envelope tags. If I use XML without the header and envelope info, the tags are found just fine. If I include the Envelope attributes and the Header, the tags are no longer found. I updated the XML sample with the header info.
The problem is that elements like e01_0017 are in a namespace too. Unprefixed elements inherit the default namespace declared on an ancestor, in this case all the way up on <Envelope>. The namespace for your elements is "http://www.someurl.com/TTT".
You have two options.
Either specify the namespace directly in the path, for example:
e01_0017 = tree.find('.//{http://www.someurl.com/TTT}e01_0017')
Demo (for your XML):
In [39]: e01_0017 = tree.find('.//{http://www.someurl.com/TTT}e01_0017')
In [40]: e01_0017
Out[40]: <Element {http://www.someurl.com/TTT}e01_0017 at 0x2fe78c8>
The other option is to add the default namespace to the nsmap under a placeholder prefix and then use that prefix in the path. Example:
nsmap = {(k or 'def'):v for k,v in root.nsmap.items()}
e01_0017 = tree.find('.//def:e01_0017',nsmap)
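The second option can also be tried without lxml. The sketch below uses the standard-library xml.etree.ElementTree, whose elements have no nsmap attribute, so the prefix mapping is written out by hand; the document is a trimmed version of the one in the question, and 'def' is an arbitrary placeholder prefix.

```python
import xml.etree.ElementTree as ET

xml = """<Envelope xmlns="http://www.someurl.com/TTT"
          xmlns:env="http://www.someurl.com/TTT_Envelope">
  <Header>
    <env:properties>
      <env:identity>some code</env:identity>
    </env:properties>
  </Header>
  <Body>
    <INVOIC><UNB><cmp04>
      <e01_0017>060334</e01_0017>
    </cmp04></UNB></INVOIC>
  </Body>
</Envelope>"""

root = ET.fromstring(xml)

# 'def' stands in for the document's default namespace.
ns = {
    "def": "http://www.someurl.com/TTT",
    "env": "http://www.someurl.com/TTT_Envelope",
}

identity = root.find(".//env:identity", ns)
e01_0017 = root.find(".//def:e01_0017", ns)
missing = root.find(".//e01_0017")  # no namespace given, so no match

e01_0017.text = "Placeholder"
print(identity.text, e01_0017.text, missing)
```

The prefixes in the mapping only need to match the ones used in the path expressions, not the ones used in the document.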
I need to get some information from a specific tag using lxml.
The XML document looks like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://java.sun.com/xml/ns/j2ee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd"
version="2.4">
<display-name>Community Bank</display-name>
<description>WebGoat for Cigital</description>
<context-param>
<param-name>PropertiesPath</param-name>
<param-value>/WEB-INF/properties.txt</param-value>
<description>This is the path to the properties file from the servlet root</description>
</context-param>
<servlet>
<servlet-name>Index</servlet-name>
<servlet-class>com.cigital.boi.servlet.index</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>Index</servlet-name>
<url-pattern>/index</url-pattern>
</servlet-mapping>
<servlet-mapping>
<servlet-name>Index</servlet-name>
<url-pattern>/index.html</url-pattern>
</servlet-mapping>
I want to read com.cigital.boi.servlet.index.
I used this code to read everything under the servlet elements:
context = etree.parse(handle)
list = parser.xpath('//servlet')
print list
The list is empty.
More info: iterating over context, I found these lines:
<Element {http://java.sun.com/xml/ns/j2ee}servlet-name at 2ad19e6eca48>
<Element {http://java.sun.com/xml/ns/j2ee}servlet-class at 2ad19e6ecaf8>
I think the output is an empty list because I did not include the namespace in the search.
Please suggest how to read "com.cigital.boi.servlet.index" from the servlet-class tag.
Try the following:
from lxml import etree
context = etree.parse(handle)
print next(x.text for x in context.xpath('.//*[local-name()="servlet-class"]'))
Alternative:
from lxml import etree
context = etree.parse(handle)
nsmap = context.getroot().nsmap.copy()
nsmap['xmlns'] = nsmap.pop(None)
print next(x.text for x in context.xpath('.//xmlns:servlet-class', namespaces=nsmap))
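For comparison, here is a sketch of the same two ideas using only the standard-library xml.etree.ElementTree, which has no local-name() function. The local_name helper and the 'j2ee' prefix are illustrative names, not part of any library, and the XML is a shortened, well-formed version of the document in the question.

```python
import xml.etree.ElementTree as ET

xml = """<web-app xmlns="http://java.sun.com/xml/ns/j2ee" version="2.4">
  <servlet>
    <servlet-name>Index</servlet-name>
    <servlet-class>com.cigital.boi.servlet.index</servlet-class>
  </servlet>
</web-app>"""

root = ET.fromstring(xml)


def local_name(tag):
    """Strip a leading '{uri}' from a tag, leaving only the local part."""
    return tag.rpartition("}")[2]


# Rough equivalent of the XPath test local-name()="servlet-class".
classes = [el.text for el in root.iter() if local_name(el.tag) == "servlet-class"]

# The same lookup with the namespace spelled out through a prefix mapping.
ns = {"j2ee": "http://java.sun.com/xml/ns/j2ee"}
explicit = root.find("./j2ee:servlet/j2ee:servlet-class", ns).text

print(classes[0], explicit)
```

Matching on the local name ignores the namespace entirely, so the explicit mapping is the safer choice when a document mixes namespaces that reuse element names.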