Add values dynamically in an XML string using Python

I am new to XML and stuck on a feature. I have a list and an XML string (the structure of the XML is not fixed). In the XML string I have defined a placeholder (in my case "{some_values}") with the same name as the list. When my code runs, I want the XML string to pick up that list variable so that the values in the list are added dynamically at run time.
some_values=[1,2,3]
Input XML:
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<Body>
<Add xmlns="http://tempuri.org/">
<intA>{some_values}</intA>
</Add>
</Body>
</Envelope>
Output XML:
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<Body>
<Add xmlns="http://tempuri.org/">
<intA>1</intA>
<intA>2</intA>
<intA>3</intA>
</Add>
</Body>
</Envelope>
I need an approach or solution for this problem. I have looked at some Python XML parser libraries and read that an XML string can also be handled with Python templating, but I could not find a solution that fits this particular problem.

Try something along these lines:
import lxml.etree as ET
parser = ET.XMLParser()
some_values=[1,2,3]
content='''<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<Body>
<Add xmlns="http://tempuri.org/">
<intA>{some_values}</intA>
</Add>
</Body>
</Envelope>
'''
tree = ET.fromstring(content, parser)
item = tree.xpath('.//*[local-name()="intA"]')
par = item[0].getparent()
# Insert the new elements right after the placeholder, in reverse so they end up in order
for val in reversed(some_values):
    new = ET.XML(f'<intA>{val}</intA>')
    par.insert(par.index(item[0])+1,new)
# Drop the placeholder element itself
par.remove(item[0])
print(ET.tostring(tree).decode())
Output (you can fix the formatting later):
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<Body>
<Add xmlns="http://tempuri.org/">
<intA>1</intA><intA>2</intA><intA>3</intA></Add>
</Body>
</Envelope>
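If lxml is not available, the same idea can be sketched with the standard library's xml.etree.ElementTree. This is only a sketch under a few assumptions: the helper name expand_placeholders and the values_by_name dictionary are made up for illustration, and a placeholder is assumed to be the whole text of an element. ElementTree also writes generated prefixes (ns0, ns1) when serializing, so the example inspects the tree rather than the serialized string.

```python
import copy
import xml.etree.ElementTree as ET

def expand_placeholders(root, values_by_name):
    """Replace each element whose text is '{name}' with one copy per value.

    expand_placeholders and values_by_name are hypothetical names used
    only for this sketch.
    """
    # Take a snapshot of the elements so the tree can be modified safely.
    for parent in list(root.iter()):
        # Walk children from last to first so earlier indexes stay valid.
        for index, child in reversed(list(enumerate(parent))):
            text = (child.text or "").strip()
            if not (text.startswith("{") and text.endswith("}")):
                continue
            values = values_by_name.get(text[1:-1])
            if values is None:
                continue  # unknown placeholder: leave the element untouched
            parent.remove(child)
            for offset, value in enumerate(values):
                clone = copy.deepcopy(child)  # keeps tag, attributes and tail
                clone.text = str(value)
                parent.insert(index + offset, clone)
    return root

content = """<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
<Body>
<Add xmlns="http://tempuri.org/">
<intA>{some_values}</intA>
</Add>
</Body>
</Envelope>"""

root = ET.fromstring(content)
expand_placeholders(root, {"some_values": [1, 2, 3]})
texts = [el.text for el in root.iter("{http://tempuri.org/}intA")]
print(texts)
```

Because each clone is a deep copy of the placeholder element, it stays in the same namespace as the original, which avoids creating un-namespaced intA elements by accident.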

Related

How can we read an unstructured XML file in PySpark?

<editors>
<p poid="1232" class="odo">
<person id="1232">Rob Jhon</person>
<br /> **this text need to be read**
<br />
<title>Sto items:</title> **"this text need to be read"**
<br />
<title>Recent items:</title> **this text need to be read**
</p>
</editors>
As you can see, my dataset contains some text that is not wrapped in any tag.
How can I read this XML properly in PySpark so that this text also shows up as a column?
If the XML is in a file called "data.xml", you could start with:
import xml.etree.ElementTree as ET
tree = ET.parse("data.xml")
root = tree.getroot()
print(root[0][1].tail)
This works for me.
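Building on the .tail idea, here is a minimal sketch that collects every untagged text run under each <p> element into a plain Python row. The XML is a copy of the sample from the question with the untagged text replaced by short stand-in strings, and the row layout (poid, person, untagged) is an assumption made for illustration. Turning the rows into a Spark DataFrame is left out.

```python
import xml.etree.ElementTree as ET

# Copy of the question's sample; the untagged strings are stand-ins.
xml_text = """<editors>
<p poid="1232" class="odo">
<person id="1232">Rob Jhon</person>
<br /> first untagged text
<br />
<title>Sto items:</title> second untagged text
<br />
<title>Recent items:</title> third untagged text
</p>
</editors>"""

root = ET.fromstring(xml_text)

rows = []
for p in root.iter("p"):
    # .tail holds the text that follows an element's closing tag.
    tails = [child.tail.strip() for child in p
             if child.tail and child.tail.strip()]
    rows.append({
        "poid": p.get("poid"),
        "person": p.findtext("person"),
        "untagged": tails,
    })

print(rows)
```

Whitespace-only tails (the line breaks between tags) are skipped, so only the real text fragments end up in the untagged list.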

Python xml ElementTree: how to check if an element is present and process it?

<rules>
<entry name="rule name 1">
<to>
<member>untrust</member>
</to>
<from>
<member>trust</member>
</from>
<source>
<member>object1</member>
</source>
<destination>
<member>any</member>
</destination>
<service>any</service>
<description>'NAT Rule 1'</description>
<nat-type>ipv4</nat-type>
<source-translation>
<static-ip>
<bi-directional>yes</bi-directional>
<translated-address>object1-pub</translated-address>
</static-ip>
</source-translation>
</entry>
<entry name="rule name 2">
<to>
<member>untrust</member>
</to>
<from>
<member>trust</member>
</from>
<source>
<member>any</member>
</source>
<destination>
<member>object2-pub</member>
</destination>
<destination-translation>
<translated-address>object2</translated-address>
</destination-translation>
<service>any</service>
<description>'NAT Rule 2'</description>
<tag>
<member>DST NAT</member>
</tag>
</entry>
</rules>
Hi,
I am trying to process the above XML using xml ElementTree in Python. I am looking for a way to check whether <source-translation> or <destination-translation> is present. In short, if it is source-translation, then set the nat-type variable to source nat and proceed to get its values, such as <translated-address>. If <destination-address> is present, then process the logic to get its values. I am putting all this data in a dict with a format like this...
rules{
rule_name: <name>
options:{
src_zone:<from>
source:<source>
dst_zone:<to>
destination:<destination>
nat-type:<application>
service:<service>
translated-address:<translated-address>
destination-address:<destination-address>
}
}
I have tried various combinations however it is not working for me.
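To check whether an element exists, test the result of find() against None. The translation elements are children of each <entry>, not of the root <rules>, so search each entry rather than the root:
import xml.etree.ElementTree as ET
root = ET.parse('PATH_TO_YOUR_FILE').getroot()
for entry in root.findall('entry'):
    if entry.find('source-translation') is not None:
        pass  # PUT YOUR CODE HERE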
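A fuller sketch of the per-entry check might look like the following. It uses a shortened copy of the XML from the question. The helper name parse_rule, the "source nat" / "destination nat" labels, and the choice to store both kinds of address under a single translated-address key are assumptions for illustration, not something the question specifies.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the rules from the question.
xml_text = """<rules>
  <entry name="rule name 1">
    <to><member>untrust</member></to>
    <from><member>trust</member></from>
    <source><member>object1</member></source>
    <destination><member>any</member></destination>
    <service>any</service>
    <source-translation>
      <static-ip>
        <translated-address>object1-pub</translated-address>
      </static-ip>
    </source-translation>
  </entry>
  <entry name="rule name 2">
    <to><member>untrust</member></to>
    <from><member>trust</member></from>
    <source><member>any</member></source>
    <destination><member>object2-pub</member></destination>
    <destination-translation>
      <translated-address>object2</translated-address>
    </destination-translation>
    <service>any</service>
  </entry>
</rules>"""

def parse_rule(entry):
    """Build one dict per <entry> (hypothetical helper for this sketch)."""
    options = {
        "src_zone": entry.findtext("from/member"),
        "source": entry.findtext("source/member"),
        "dst_zone": entry.findtext("to/member"),
        "destination": entry.findtext("destination/member"),
        "service": entry.findtext("service"),
    }
    src = entry.find("source-translation")
    dst = entry.find("destination-translation")
    # find() returns None when the child is absent, so compare with None
    # instead of relying on the element's truthiness.
    if src is not None:
        options["nat-type"] = "source nat"
        options["translated-address"] = src.findtext(".//translated-address")
    elif dst is not None:
        options["nat-type"] = "destination nat"
        options["translated-address"] = dst.findtext("translated-address")
    return {"rule_name": entry.get("name"), "options": options}

rules = [parse_rule(entry) for entry in ET.fromstring(xml_text).findall("entry")]
print(rules)
```

Comparing with None matters here because an Element without children is falsy in ElementTree, so a plain "if src:" test can give misleading results.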

Using python, elementtree, xml parser to get attributes not working for some reason?

I'm new to Python and to parsing XML, and I'm having trouble with a particular XML file which is spat out by a program I work with. I'm trying to parse this file using Python and ElementTree in order to extract the URL data (the URL below is fake). Any ideas as to why this isn't working?
my python code:
import xml.etree.ElementTree as ET

def xmlTreeParser(fileName,attribute,tagName):
    tree = ET.parse(fileName)
    root = tree.getroot()
    attribArray = [element.attrib[attribute] for element in root.findall(tagName)]
    print(attribArray)

xmlTreeParser("xml_file.xml",'text','Expr')
here's my xml file:
<Query id="f9cef041-085d-47e0-8d16-15e36bba1ec8" name="">
<Description />
<JustSortedColumns />
<Conditions linking="All">
<Condition class="PDCT" enabled="True" readOnly="False" linking="Any">
<Condition class="SMPL" enabled="True" readOnly="False">
<Operator id="Contains" />
<Expressions>
<Expr class="ENTATTR" id="Person.LinkedInUrl" />
<Expr class="CONST" type="String" kind="Scalar" value="https://www.linkedin.com/Bill-Smith" text="https://www.linkedin.com/Bill-Smith" />
</Expressions>
</Condition>
</Condition>
</Conditions>
</Query>
The python I wrote works just fine on another, test, xml file that I wrote myself. I'm at a loss as to why I can't parse this particular block of xml. Thanks everyone.
For the specific call you make, you need this path syntax to reach the nested Expr tags, because findall with a bare tag name only searches the direct children of the root:
xmlTreeParser("xml_file.xml",'text','.//Expr')
Also, not every Expr element in your XML has a text attribute, so you should guard against a KeyError like this:
    attribArray = [element.attrib.get(attribute, '') for element in root.findall(tagName)]
    # .get() returns '' when the attribute is missing instead of raising KeyError
    print(attribArray)
xmlTreeParser("xml_file.xml",'text','.//Expr')
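The difference between the two paths can be seen in a small self-contained sketch. It parses a shortened copy of the XML from the question from a string instead of a file, and the variable names are chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question.
xml_text = """<Query id="f9cef041-085d-47e0-8d16-15e36bba1ec8" name="">
  <Conditions linking="All">
    <Condition class="SMPL" enabled="True" readOnly="False">
      <Expressions>
        <Expr class="ENTATTR" id="Person.LinkedInUrl" />
        <Expr class="CONST" type="String" value="https://www.linkedin.com/Bill-Smith" text="https://www.linkedin.com/Bill-Smith" />
      </Expressions>
    </Condition>
  </Conditions>
</Query>"""

root = ET.fromstring(xml_text)

# A bare tag name only matches direct children of the root element.
direct_children = root.findall("Expr")

# './/Expr' matches Expr elements at any depth below the root.
anywhere = root.findall(".//Expr")

# Keep only the elements that actually carry the attribute.
texts = [el.get("text") for el in anywhere if "text" in el.attrib]

print(len(direct_children), len(anywhere), texts)
```

Filtering on "text" in el.attrib drops elements without the attribute entirely, whereas attrib.get(attribute, '') keeps them as empty strings; which one is preferable depends on what the caller expects.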

Blank XML Namespace processing With Python

I am trying to parse an XML file using Python. Example XML snippet:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<raml xmlns="raml21.xsd" version="2.1">
<series xmlns="" scope="USA" name="Arizona">
<header>
<log action="created"/>
</header>
<x_ns color="Blue">
<p name="timeZone">(GMT-10)</p>
</x_ns>
<x_ns color="Red">
<p name="AvgHeight">175</p>
</x_ns>
<x_ns color="black">
<p name="AvgWeight">235</p>
</x_ns>
The problem is that the namespaces keep changing, so as an alternative I tried to read the xmlns string first and then create a dictionary of namespaces using the code below:
root = raw_xml.getroot()
namespace_temp1=root.tag.split("}")
namespace_temp2=namespace_temp1[0].strip('{')
namespaces_auto={}
tag_name =["x","y","z","w","v"]
ns_name=[namespace_temp2,namespace_temp2,namespace_temp2,namespace_temp2,namespace_temp2]
namespace_temp3=zip(tag_name,ns_name)
for tag,ns in namespace_temp3:
    namespaces_auto[tag]=ns
namespaces=namespaces_auto
To access a particular tag with a namespace I am using code like this:
for data in raw_xml.findall('x:x_ns',namespaces):
    ...
This pretty much solves the problem, but it gets stuck when a child node has a blank xmlns, as seen in the series tag (xmlns=""). I am not sure how to incorporate a check for this condition in the code.
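One point that may help here: xmlns="" switches the default namespace off for that element and its descendants, so ElementTree reports their tags without any {uri} part and they can be matched with plain, unprefixed names. The sketch below assumes the truncated snippet is completed with closing tags; split_tag is a hypothetical helper introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the snippet from the question, with closing tags
# added (an assumption, since the original snippet is truncated).
xml_text = """<raml xmlns="raml21.xsd" version="2.1">
  <series xmlns="" scope="USA" name="Arizona">
    <x_ns color="Blue"><p name="timeZone">(GMT-10)</p></x_ns>
    <x_ns color="Red"><p name="AvgHeight">175</p></x_ns>
  </series>
</raml>"""

root = ET.fromstring(xml_text)

def split_tag(tag):
    """Return (namespace, local_name); namespace is '' when there is none."""
    if tag.startswith("{"):
        namespace, local_name = tag[1:].split("}", 1)
        return namespace, local_name
    return "", tag

root_ns, root_name = split_tag(root.tag)
series_ns, series_name = split_tag(root[0].tag)

# <series> and its descendants are in no namespace, so no prefix map
# is needed to find them.
values = {p.get("name"): p.text for p in root.findall("series/x_ns/p")}

print(root_ns, repr(series_ns), values)
```

Checking each tag with a helper like split_tag gives a way to decide per element whether a namespace prefix is needed in the search path.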

How do I parse and write XML using Python's ElementTree without moving namespaces around?

Our project receives XML of this form from upstream:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<runtime>
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
<dependentAssembly>
<assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
<bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0" />
</dependentAssembly>
</assemblyBinding>
</runtime>
<appSettings>
<add key="foo" value="default">
...
</appSettings>
</configuration>
It then parses this XML using ElementTree, and for every app setting matching a certain key ("foo"), it writes a new value that it knows about and the upstream process doesn't (in this case key "foo" should have the value "bar").
The downstream process consuming the filtered XML is, aaahhhh... fragile. It expects to receive the XML in exactly the form above.
If I parse this XML without registering a namespace, then ElementTree mangles my tree like this on input:
<configuration xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
<runtime>
<ns0:assemblyBinding>
<ns0:dependentAssembly>
<ns0:assemblyIdentity culture="neutral" name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" />
<ns0:bindingRedirect newVersion="7.0.0.0" oldVersion="0.0.0.0-6.0.0.0" />
</ns0:dependentAssembly>
</ns0:assemblyBinding>
</runtime>
<appSettings>
<add key="foo" value="default">
...
</appSettings>
</configuration>
The downstream process can't handle this, because it's not clever enough to realize that, semantically, this is the same thing. So I decided to register the namespace I know the upstream process will provide as a default namespace, to avoid the prefixes showing up everywhere, and now I get this:
<configuration xmlns="urn:schemas-microsoft-com:asm.v1">
<runtime>
<assemblyBinding>
<dependentAssembly>
<assemblyIdentity culture="neutral" name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" />
<bindingRedirect newVersion="7.0.0.0" oldVersion="0.0.0.0-6.0.0.0" />
</dependentAssembly>
</assemblyBinding>
</runtime>
<appSettings>
<add key="foo" value="default">
...
</appSettings>
</configuration>
I don't know much about XML, but the downstream component complains about this too. Doesn't this now mean that the default xmlns applies to every element inside <configuration>, whereas before it applied only to the <assemblyBinding> element?
Is there any way, using ElementTree, to handle this namespace so that I can take in the upstream's XML, set foo's value, and then pass that on downstream, without moving the namespace around, and leaving it exactly as I found it?
I could use an lxml-based solution, which seems to handle this, however, lxml has a dependency on C which the downstream component would really like not to have to support: a pure Python solution is preferable.
I could read the document as HTML which would ignore the namespace attribute, let me manipulate the value I want, and then pass on the document; however, I have yet to find a Python parser that doesn't downcase all the element names, and my downstream component requires the casing on all element names to be preserved.
I could resort to string parsing and regular expressions. I would rather not write my own parser.
The only advice I could find so far about namespace handling in ElementTree suggests the "register a default namespace to avoid prefixes" approach, which I assumed would be suitable, but ElementTree then insists on moving the xmlns declaration up to the root node upon dumping.
I could also be clever and build up a string that dumps the tree out in stages, in exactly the right order to put the xmlns declaration back on the "right node", but that strikes me, also, as pretty darned fragile.
Has anyone managed to get past a problem like this?
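The prefix behaviour described above can be reproduced with a few lines. This is a minimal sketch on a shortened copy of the upstream XML, with no namespace registered beforehand.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the upstream XML from the question.
upstream = """<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly />
    </assemblyBinding>
  </runtime>
</configuration>"""

root = ET.fromstring(upstream)

# After parsing, the namespace is stored as part of each tag name.
binding = root.find("runtime")[0]

# When serializing, ElementTree declares every namespace on the root
# element and invents a prefix (ns0) because none was registered.
serialized = ET.tostring(root, encoding="unicode")

print(binding.tag)
print(serialized)
```

The parsed tree only keeps the namespace URI inside the tag name, not the place where it was declared, which is why the serializer cannot put the declaration back on the original element by itself.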
As far as I know, the solution that best suits your needs is to write a custom pure-Python renderer using the features exposed by xml.etree.ElementTree. Here is one possible solution:
from xml.etree import ElementTree as ET
from re import findall, sub

def render(root, buffer='', namespaces=None, level=0, indent_size=2, encoding='utf-8'):
    # Emit the XML declaration only once, at the top level.
    buffer += f'<?xml version="1.0" encoding="{encoding}" ?>\n' if not level else ''
    element = root.getroot() if isinstance(root, ET.ElementTree) else root
    # ET._namespaces is a private helper; it maps each namespace URI to its prefix.
    _, namespaces = ET._namespaces(element) if not level else (None, namespaces)
    indent = ' ' * indent_size * level
    tag = sub(r'({[^}]+}\s*)*', '', element.tag)
    buffer += f'{indent}<{tag}'
    # Declare a namespace on the first element that uses it, then forget it.
    for ns in findall(r'{[^}]+}', element.tag):
        ns_key = ns[1:-1]
        if ns_key not in namespaces:
            continue
        buffer += ' xmlns' + (f':{namespaces[ns_key]}' if namespaces[ns_key] != '' else '') + f'="{ns_key}"'
        del namespaces[ns_key]
    for k, v in element.attrib.items():
        buffer += f' {k}="{v}"'
    buffer += '>' + element.text.strip() if element.text else '>'
    children = list(element)
    # Render each child one level deeper, sharing the same namespace map.
    for child in children:
        sep = '\n' if buffer[-1] != '\n' else ''
        buffer += sep + render(child, level=level+1, indent_size=indent_size, namespaces=namespaces)
    buffer += f'{indent}</{tag}>\n' if 0 != len(children) else f'</{tag}>\n'
    return buffer
Pass the XML data you gave to the render function above, as shown below:
data=\
'''<?xml version="1.0" encoding="utf-8"?>
<configuration>
<runtime>
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
<dependentAssembly>
<assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
<bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0" />
</dependentAssembly>
</assemblyBinding>
</runtime>
<appSettings>
<add key="foo" value="default" />
</appSettings>
</configuration>'''
e = ET.fromstring(data)
ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")
r = ET.ElementTree(e)
print(render(r))
You'll get the following resulting XML having the properties you stated you are looking for:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral"></assemblyIdentity>
        <bindingRedirect oldVersion="0.0.0.0-6.0.0.0" newVersion="7.0.0.0"></bindingRedirect>
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
  <appSettings>
    <add key="foo" value="default"></add>
  </appSettings>
</configuration>
I know I came late to the party. Anyway, I hope this helps you and others having the same issue. Happy coding!
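The step the question is really about, setting the value for key "foo" before the tree is written out, can be sketched with plain ElementTree calls. The overrides dictionary and the extra "other" setting are assumptions added for illustration; the resulting tree could then be passed to whatever serializer is used.

```python
import xml.etree.ElementTree as ET

# Shortened configuration; the "other" setting is a made-up extra entry.
config_text = """<configuration>
  <appSettings>
    <add key="foo" value="default" />
    <add key="other" value="keep" />
  </appSettings>
</configuration>"""

root = ET.fromstring(config_text)

# Hypothetical mapping of setting keys to the values known only locally.
overrides = {"foo": "bar"}

for setting in root.findall("./appSettings/add"):
    key = setting.get("key")
    if key in overrides:
        # Only the value attribute changes; everything else is untouched.
        setting.set("value", overrides[key])

foo_value = root.find("./appSettings/add[@key='foo']").get("value")
other_value = root.find("./appSettings/add[@key='other']").get("value")
print(foo_value, other_value)
```

The appSettings elements are in no namespace, so they can be found with unprefixed paths regardless of how the assemblyBinding namespace is handled.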
